An algorithm for U–Pb geochronology by secondary ion mass spectrometry
Download
- Final revised paper (published on 19 Aug 2022)
- Preprint (discussion started on 04 Mar 2022)
Interactive discussion
Status: closed
RC1: 'A Solid Foundation for Accessible Best Practice in SIMS U-Pb Geochronology Data Reduction', Morgan Williams, 11 Apr 2022
This manuscript provides a well-written and logically-organised description of some of the key challenges in addressing data reduction of SIMS U-Pb geochronology, and succinctly describes best-practice approaches to the required processing steps. The author clearly identifies the main limitations and inaccuracies of commonly used approaches to dealing with mass spectrometric data in compositional systems, which as noted by the author have been acknowledged in the community for some time. The documented approach builds upon other recent work in the community encouraging the use of appropriate compositional data handling, and the preservation of the full covariance matrix throughout the data reduction process where possible. The software accompanying this manuscript (simplex) provides a flexible interface (with online and offline in-browser applications and access to a command line interface) and supports the key data formats for SIMS geochronological data from SHRIMP and Cameca instruments. The author is to be commended for providing the software in an open-source manner with a permissive-use license, and for ensuring the approach is accessible via multiple interfaces (allowing both high-level and low-level interfaces catering to the broader community). Some comments regarding the manuscript and associated software are attached below, with some specific technical suggestions for the manuscript and software noted separately.
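For readers less familiar with the log-ratio methodology mentioned above, a minimal sketch of the additive log-ratio (ALR) transformation and its inverse is given below; the function names and example composition are illustrative only and are not part of simplex itself.
# ALR transform: maps a D-component composition (summing to one) from
# the simplex to (D-1)-dimensional Euclidean space, using the last
# component as the common denominator; alrinv maps back.
alr <- function(x) log(x[-length(x)] / x[length(x)])
alrinv <- function(z) { y <- c(exp(z), 1); y / sum(y) }
comp <- c(Pb204 = 0.001, Pb206 = 0.599, Pb207 = 0.050, U238 = 0.350)
z <- alr(comp)   # three unconstrained coordinates in Euclidean space
alrinv(z)        # recovers the original four-component composition
Statistics computed on the log-ratio coordinates are free of the positivity and constant-sum constraints that complicate them in raw ratio space.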
- The implied extension of other software (e.g. IsoplotR, in the first instance) to accept the fully-specified outputs of simplex will be a key next step to make best use of this work and bridge the full pipeline from instrument outputs to research outputs.
- The key concepts are of general interest beyond the geochronology community, with direct application to other SIMS-based isotope ratio measurements. In encompassing the major steps in data reduction of compositional mass spectrometric data, the software is flexible enough to reduce e.g. Cameca oxygen and sulfur isotope data (as is demonstrated in the application). In some cases, additional steps might be appropriate for specific scenarios (e.g. dealing with electron-induced secondary ion emission for negative-ion measurements on SHRIMP instruments where an electron gun is used for charge compensation in insulating materials, Ickert et al. 2008), but the foundational steps of the workflow are the same.
- Regarding the operation of the software, I can verify that the online application interface generally works as expected and the provided installation instructions (in the GitHub Readme) were sufficient guidance for getting the package running locally (at least for those who either have, or are comfortable setting up, an R distribution). I’ve been able to verify the use of the software across most of the claimed platforms (with a few minor bugs, described in the attached document) – online, on both Windows and Linux (via Windows Subsystem for Linux) using an Anaconda R distribution. The references to simplex outputs in the manuscript corresponding to the demonstration datasets (e.g. Figure 6b, Figure 7) match the current outputs of the software. With appropriate adjustments, I was able to process SHRIMP U-Pb data (both .op and .pd files, measured on the Australian National University SHRIMP II in 2013) in the same manner as the demonstration datasets.
- The software provides some functionality to enable reproducibility of data reduction even in a graphical interface where this is commonly a challenge, with the ability to save and load configurations for data processing. With future development, any extension to include provenance information would be a valuable addition, potentially via metadata extracted from the input files where possible.
- The documentation for processing options within the in-browser application is simple but explains in general terms the processes and options available, and the interface-specific documentation for the R package itself indeed appears to be complete.
- Regarding propagation of matrix-based uncertainties in geochronological data, McLean et al. (2011) provide a relevant digestible overview (although with a focus on specific considerations for ID-TIMS) which may be a useful reference to cite in text or in the appendix, at the author’s discretion.
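As context for the matrix-based propagation mentioned in the last point, the sketch below illustrates the generic first-order formula Vy = J Vx J', where J is the Jacobian of the transformation; this is the textbook approach underlying such workflows (cf. McLean et al., 2011), not simplex's internal code, and the example function is illustrative only.
# First-order (linearised) uncertainty propagation for y = f(x),
# given a covariance matrix Vx on x, via a numerical Jacobian J:
propagate <- function(f, x, Vx, eps = 1e-6) {
  y <- f(x)
  J <- sapply(seq_along(x), function(i) {
    dx <- x; dx[i] <- dx[i] + eps
    (f(dx) - y) / eps            # forward-difference partial derivatives
  })
  J <- matrix(J, nrow = length(y))
  J %*% Vx %*% t(J)              # Vy = J Vx J'
}
# Example: covariance of two log-ratios formed from three raw counts
f <- function(x) c(log(x[1] / x[3]), log(x[2] / x[3]))
x <- c(100, 5000, 200)
Vx <- diag(x)                    # Poisson counts: variance = mean
propagate(f, x, Vx)              # 2x2 covariance, with a nonzero off-diagonal term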
Manuscript: Technical Corrections and Specific Suggestions
- L9: typo “handes” should be “handles”.
- L19: The series of short sentences here is succinct but does not read well; consider rephrasing along the lines of “In contrast to SIMS instrumentation, LA-ICP-MS instruments are built by numerous manufacturers, and the widely used data reduction software packages are generally compatible with all of their data formats”.
- L122, L126: Specifically refer to the additive log ratio (ALR) used.
- L142: The term ‘trick’ here perhaps confounds the intention and doesn’t inspire confidence in its use. When referring to generalization, perhaps refer to Eqn. 4 (which is what is generalized, as the logratio approach is not specific to a dimensionality, even if examples are 2/3 dimensional).
- Figure 2a – I think the labels on the upper figure may be reversed. This might also be better termed a ‘projection’ from a four-component simplex (which would be a tetrahedron). This diagram depicts the compositional data aspect of the manuscript and software rather nicely.
- Eqn 13, Section 9: Dead-time correction – I think this exclusively applies to a non-extending deadtime correction? Generally, the difference between extending/non-extending corrections is likely to be small except for particularly high count-rates (e.g. for very high U, or analysing a minor element on an EM), but it might be worth noting this, at the author’s discretion (see the sketch after this list).
- L224: SHRIMP data processing – Depending on the scenario, I think Dodson interpolation may also be used to similar effect for dealing with smooth intensity changes during cycles of data collection within a single spot (Dodson, 1978).
- Figure 5: Depending on the desired manuscript formatting, the capitalization of this figure caption may be off (e.g. “a) Blank …”)?
- L321: “Figure 6.a” – the references to figures may be easier to read without the period (e.g. Figure 6a; as is used in the Figure 7 caption).
- Fig7b: It may be worth noting that the white ellipse represents the concordia age and uncertainty.
- References: Add DOIs to references where possible.
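To make the extending/non-extending distinction raised in the Eqn 13 comment concrete, here is a minimal sketch of the two classical dead-time corrections; the dead time and count rates are illustrative values, not taken from the manuscript.
# Dead-time corrections for an observed count rate m (counts/s)
# with dead time tau (s):
tau <- 25e-9                                        # an illustrative 25 ns
nonextending <- function(m, tau) m / (1 - m * tau)  # closed form
extending <- function(m, tau) {                     # solve n*exp(-n*tau) = m
  sapply(m, function(mi)
    uniroot(function(n) n * exp(-n * tau) - mi,
            interval = c(mi, 1 / tau))$root)        # physical root, n <= 1/tau
}
m <- c(1e4, 1e6, 5e6)                               # observed count rates
cbind(observed = m,
      nonextending = nonextending(m, tau),
      extending = extending(m, tau))                # divergence grows with rate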
Software and Repository: General Notes
- The license for the software is specified in the GitHub readme and R package description (the GPL-3 copyleft license), but is not in the repository; generally, a copy of this license should be included with the package repository (for visibility, and to meet the terms of the license).
- I would suggest releasing a specific software version corresponding to the publication (e.g. 1.0, or 0.1 for a ‘beta’ release) such that future users can readily recognise updated versions relative to that described here (particularly if this manuscript is intended to be the primary/secondary citation for simplex, rather than citing the software package itself).
- In the locally-run browser-based application, plotting and other functions generally work as expected and behave as in the hosted online version (as far as tested; a few specific bugs, issues, and suggested features are noted below).
- The selenium tests included in the repository include references to local files which are not available; these could perhaps be modified to be distributable (potentially helpful for debugging).
- Unit tests for the R package itself may be a useful addition for future development.
- To get the app to run via WSL, manual specification of the port is necessary, as the browser can’t necessarily be launched directly (I believe the specifics of this are managed by shinylight; this would not be a problem on a standalone Linux distribution, but may be worth documenting).
- All in-app plotting suffered from a Cairo-SVG related bug on Windows (‘svg: Cairo-based devices are not available for this platform’). This may potentially be rectified with an additional dependency, or potentially a change in the plotting output format.
- During my tests in a base-R distribution (i.e. not from RStudio), remotes-based installation from the command line requires the setting of an environment variable (related to unzipping the repository contents; see the sketch below).
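For anyone reproducing the base-R installation, one commonly cited workaround for unzip-related install_github failures is sketched below; the repository location is assumed to be pvermees/simplex (as for the author's other packages), and whether options(unzip = "internal") matches the exact environment variable alluded to above is an assumption rather than the documented fix.
options(unzip = "internal")                  # use R's built-in unzip code
install.packages("remotes")                  # if not already installed
remotes::install_github("pvermees/simplex")  # assumed repository location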
Software and Repository: Browser-based Use
- Suggested Feature: For tab 5 (samples), it would be good to exclude standards by default, and to allow multiple comma-separated patterns for sample matching.
- Suggested Feature: Add a CSV download option for tables (these can currently be copy-pasted into common spreadsheet programs, without headers/indexes).
- Suggested Feature: Potentially preserve source file name on JSON export for some simple provenance metadata. The source files contain more of this metadata themselves, and this could be a feature to consider for the future (e.g. with XML import for SHRIMP data, which is suggested to be planned for future development).
- Suggested Feature: For relevant stages, automatic refreshing for plots when the analysis # is changed would be useful.
- Bug (tab 5, using tabulate): adding more than one option to the table gives notation for duplicated column names (e.g. A, B, C, A.1, B.1, C.1, A.2, …). In particular, the structure of the ‘cov’ option, with vector values indexed by integers and a sparse covariance matrix across all samples, is not immediately clear in the tabulated output. After a second look this table is readily understood, but ideally it would be indexed by appropriate and unique column names.
Software and Repository: Programmatic Use/CLI
- Plotting via the programmatic interface did not exhibit any issues on Windows, in contrast to the browser-based app.
- The example CLI workflow in the GitHub Readme gives errors at the second step, after modification to use the included `./inst/SHRIMP.op` file; using the included ‘SHRIMP_UPb’ dataset (not from an original file) enabled the use of this workflow as below:
library(simplex)                      # attach the package first
lr <- logratios(SHRIMP_UPb)           # built-in SHRIMP U-Pb dataset
stand <- standard(preset="Temora-t")  # reference material preset
paired <- pairing(lr,stand=stand)
cal <- calibration(lr,stand=stand,pairing=paired,prefix="TEM")
result <- calibrate(cal,exterr=TRUE)  # exterr: propagate external errors
References
Dodson, M. H. (1978). A linear method for second-degree interpolation in cyclical data collection. Journal of Physics E: Scientific Instruments, 11(4), 296. https://doi.org/10.1088/0022-3735/11/4/004
Ickert, R. B., Hiess, J., Williams, I. S., Holden, P., Ireland, T. R., Lanc, P., Schram, N., Foster, J. J., & Clement, S. W. (2008). Determining high precision, in situ, oxygen isotope ratios with a SHRIMP II: Analyses of MPI-DING silicate-glass reference materials and zircon from contrasting granites. Chemical Geology, 257(1–2), 114–128. https://doi.org/10.1016/j.chemgeo.2008.08.024
McLean, N. M., Bowring, J. F., & Bowring, S. A. (2011). An algorithm for U-Pb isotope dilution data reduction and uncertainty propagation. Geochemistry, Geophysics, Geosystems, 12(6). https://doi.org/10.1029/2010GC003478
Citation: https://doi.org/10.5194/gchron-2022-4-RC1
AC1: 'Reply on RC1', Pieter Vermeesch, 06 May 2022
I would like to sincerely thank Dr. Williams for thoroughly testing 'simplex' on Windows, Mac and Linux, using the online, offline and command line interfaces. He even installed simplex in the cloud, using AWS! I have already implemented most of the reviewer's pertinent suggestions in the latest version of the software (which is available at http://isoplotr.es.ucl.ac.uk/simplex/). For example, I have:
1. enabled comma-separated lists of sample prefixes
2. added a CSV download option
3. enabled auto-refreshing of drift and logratio plots
4. fixed the issue with duplicate column names
5. added a copy of the GPL license to the GitHub repo
6. added installation instructions for hosting online simplex mirrors
I have also made some other improvements, such as the addition of sample names as row names in the output tables.
I am in two minds about the reviewer's request to include all the local files in the selenium tests. On the one hand, I agree that this would make it easier to collaborate with other developers. But on the other hand, it would inflate the GitHub repository and expose proprietary data online. I honestly do not expect many others to join the simplex development team. However, I would be more than happy to add more test files to the GitHub repository should this change.
With regards to unit tests for the R functions, these actually already exist. The simplex package passes all CRAN checks, which means that all public functions are fully documented, with examples. Building the simplex package with CRAN checks turned on runs all these tests on the user's computer and saves their output in a folder called simplex.Rcheck.
I was not able to reproduce the Cairo-SVG error in Windows. Perhaps this problem expressed itself in an earlier version of the program and has since gone away?
Version 1.0 of the software will be released on GitHub as soon as the GChron manuscript is accepted for publication.
Dr. Williams also makes some useful suggestions to improve the paper itself. These suggestions are all minor and will be implemented in the revised manuscript. I will document them in its cover letter.
Citation: https://doi.org/10.5194/gchron-2022-4-AC1
RC2: 'Comment on gchron-2022-4', Nicole Rayner, 21 Apr 2022
This is a well written, well organized, clear and easy to follow presentation of issues in SIMS data reduction. I agree entirely with the sentiments expressed in the opening paragraph of comment RC1 and thus will not repeat them here.
I approached this review as a manager of a SHRIMP ion probe laboratory and thus someone with a hands-on perspective of dealing with SIMS data in a geological research setting. While this paper shines a light on issues that have been known by parts of the community for some time and provides a novel method of addressing these, at the current level of development the simplex data reduction protocol is not practical, even with the limited functionality that the author is upfront about. To be clear, my reservations about simplex as a tool do NOT indicate reservations regarding the publication of this manuscript and the concepts therein. This is an excellent manuscript that introduces the geochronology community to a novel method of data reduction. As my concerns mainly relate to user needs within simplex (as opposed to the mathematics), I hope that flagging these may lead to changes that engage the geochronology community, enhance the uptake and utility of this approach, and ensure community-wide rigorous testing with a wide variety of “real-world” datasets in all their imperfect glory.
I will give a couple of examples below of my experience trying to use simplex (both online and through the R GUI) that I hope illustrate overarching issues with the current structure that limit the utility of simplex.
I had quite a bit of difficulty importing new files. I was unable to get a .pd file to properly load (online or using the R GUI). When I attempted to load a .pd file I got a cryptic error message; then the numbers in the table in section 2 (Drift) changed, but they did not correspond to the actual data, and the spot names under Aliquots did not change from the test file. Clearly the file did not load properly (and some parts of the previously loaded dataset persisted), but given the opaqueness of the interface I have no idea what the problem was. I think it was partly related to the fact that there doesn’t seem to be a way to “clear” the existing data out of the algorithm and start from scratch. Relatedly, there is no record of what data file is being processed. Better tracking of what is being processed (and how it is being processed – for example, when defining the reference materials) is key to making this a useful tool, because ultimately the data for a given rock sample stands alone but needs to be linked to its data reduction metadata (though not necessarily to other unknown samples in that analytical session; a single JSON output is impractical in that regard).
I had better luck loading a .op file, but even then the behaviour was inconsistent. Part of this inconsistency is due to the fact that some (most?) interfaces seem to require a “refresh” of some sort (e.g. press “Plot” again after excluding a scan in Drift) to redo the calculations. This also means that things get out of sync. If you advance to a different aliquot in Drift, the plots don’t advance, but there is no label on the plots to indicate which aliquot is being displayed. These are just two examples of instances where the algorithm as currently structured does not meet the needs of geochronologists working in SIMS labs producing data and then interpreted ages. Reproducibility and, even more importantly, traceability in the data reduction package is crucial, and this needs to be considered and baked into every step of any new process.
This manuscript is an excellent first step in starting to address the mathematical challenges of dealing with mass spectrometer data, while acknowledging that a great deal of work is still required to translate this contribution from better handling of mass spec counts all the way to geological interpretations of ages. In order to bring the geochronological community along this journey (which I am assuming is one of the reasons for this paper), I think that this manuscript would benefit from a more explicit presentation of what simplex CAN do (today) and CANNOT do (today). Given that this is a first step in a long path toward usability (and while I appreciate that not all the benefits of the covariance matrices can currently be accomplished), I think many readers from the geochronology community would like to see a comparison of the previous approaches (SQUID or the NordSIM spreadsheet) and simplex as a first demonstration of its utility.
Specific comments/Technical corrections
Line 75 – while things like negative lower limits of confidence intervals are physically impossible, the “reliability of analytical uncertainty assigned to dates…” is best addressed by systematic long-term evaluations of laboratory data, including inter-laboratory comparison work, not just by working in compositional space. I suggest the last sentence of this paragraph be removed.
Line 86 – inconsistent use of “standard”, “reference material” and “reference standard” throughout the manuscript. Suggest use of “reference material” throughout, particularly because of the ambiguous meaning of “standard” (e.g. standard error as a statistical term, or as the uncertainty of a reference material). If this change is made, use the abbreviation RM as a subscript when needed in equations, e.g. L102, 105.
Caption Figure 1 – “calibration error” should be calibration uncertainty
Line 108 – here you refer to “samples” when you really mean “aliquots”. Later on you refer to “within-spot” drift or other uncertainties; for consistency of usage/clarity for the reader I suggest you refer to these as “analytical spots”.
Line 129 consider illustrating Table 1 data in a ternary diagram prior to mapping to Euclidean space (data points on Figure 2 perhaps?).
Line 150 – here “inter-sample” should be “inter-spot”, as “inter-sample” to most people will signify 2 different ROCK samples, not analyses/aliquots/spots
Line 164 – typo “black” instead of “blank”
Section 7 “Zeros” – consider merging with the blanks section, they are two parts of the same problem and I find the current breakdown into two sections jarring. Since section 7 is so brief, I am not even very clear why it is needed.
Sections 8 and 9 – as a SIMS mass spectrometrist/geochronologist and not a statistician, I am searching for points of familiarity, which up to this point I am largely able to find. You lose me in these sections. A more direct explanation of the steps in traditional data reduction that are replaced by this approach, and then of how these values get used/incorporated into ratios, would be helpful. Part of the difficulty in following is that in the traditional data reduction approach the deadtime correction happens first, but in the paper “Deadtime” follows the section about “Dealing with count data”, which seems counterintuitive.
Line 250 states that mass-dependent fractionation is commonly ignored. It has been established from long term reproducibility studies that this is not true, and some labs do not ignore it (e.g. GA uses OG1 and the Geological Survey of Canada uses z1242, see Davis et al. 2019). This paper doesn’t propose that this algorithm (or, more broadly, approaching SIMS geochronology data as compositional) will solve all issues related to data processing of SIMS data; however, a number of issues such as this are glossed over as secondary concerns. This is an example of one of my general statements about this paper, where in an attempt to address some of the problems related to the data reduction (sections 2–4, 6, 7), others that are known in real-world SIMS data are minimized (e.g. blanks greater than 204, or overcounts in 204).
Line 283, refer to it as Temora2 throughout the manuscript (not just in the parentheses this one time) as that is its name.
Line 293 refer to mass spec “cycles”, elsewhere “sweeps”. I prefer cycle, but either is fine as long as consistent (including axis labels of figure 5)
Line 301, not negative, positive
Line 302 “207” superscript in error
Line 302 – again I’m trying to map this to the usual approach of secondary beam monitor (SBM) normalization of SHRIMP data. Is this treatment instead of SBM normalization, or before/after? I think in lieu, but I don’t understand how this might be affected by drift in the primary beam intensity, which might either enhance or minimize the depth/oxygen availability effect illustrated in this diagram. For example, if over the time of the analysis the primary beam intensity decreases then increases, the within-spot drift may be U-shaped, not linear, and thus a single slope regression applied to each cycle isn’t appropriate.
Line 307 – the strict enforcement of a positive value for 204 − b doesn’t reflect the real-life behaviour of ion-counted data, where negative values could indicate a real problem with the analytical setup. I am concerned that this approach “hides” a real problem.
Line 312 – insert “within-spot” when referring to fractionation here for clarity.
Caption Figure 5 – since panel a is blank- and drift-corrected (e.g. (204Pb − b)/(206Pb − b)), are the converted counts shown in panel b also blank corrected (in which case the vertical axis should be labelled 204Pb − b, 206Pb − b, etc.)?
Line 328 – edit text to “Figure 7 applies the Pb/U calibration (Equation 23) to 91500….” This makes it easier for the reader to know what “Equation 23” does without having to go back to check.
Line 330 – again “inter-sample” when should be inter-spot (reminder to rationalize this usage throughout)
Line 331 – figure 7 uses Temora2 as the reference material and 91500 as the sample. Figure 8 uses 91500 as the reference material and Temora2 as the sample. Both of these materials are commonly used as RM’s and so switching their usage back and forth is tricky for the reader. Perhaps there is one but I don’t see any reason why Figure 8 can’t be recast with Temora2 as the RM which would streamline things.
Figure 7 caption – specify “using Temora2 as the reference material” at the end of the first sentence.
Line 368 – I’m not sure what is meant by “In contrast with existing data reduction protocols, the new algorithm simultaneously processes all the aliquots in an analytical sequence.” Please clarify.
Line 381 – While I appreciate that not all the benefits of the covariance matrices can currently be accomplished, it would be great to see a comparison of results between the previous approaches and this one using the provided datasets.
Davis, W.J., Pestaj, T., Rayner, N., McNicoll, V.M., 2019. Long-term reproducibility of 207Pb/206Pb age at the GSC SHRIMP lab based on the GSC Archean reference zircon z1242. Geological Survey of Canada, Scientific Presentation 111, 1 poster, https://doi.org/10.4095/321203
Citation: https://doi.org/10.5194/gchron-2022-4-RC2
AC2: 'Reply on RC2', Pieter Vermeesch, 06 May 2022
I am sorry that Dr. Rayner was unable to import her own .pd and .op files into 'simplex'. This reflects the fact that the Cameca side of 'simplex' has undergone more rigorous testing than the SHRIMP side. The reviewer kindly shared the problematic .pd and .op files with me, and the issue was quickly fixed: it turns out that these files contained some comments, which were absent from the Geoscience Australia files that I had previously used to test 'simplex'.
The improved robustness of the data import function also addresses the reviewer's concern that 'simplex' is not able to 'clear' the existing data. Whenever a new dataset is loaded, it replaces the previous data and resets the calculations. I have also fulfilled Dr. Rayner's request for "better tracking of what is being processed". The latest version provides the user with a lot more status updates, which is especially useful when loading large SHRIMP .pd files.
Although the reviewer is correct that 'simplex' was, first and foremost, created as a vehicle to test and demonstrate the algorithms presented in the GChron manuscript, I do hope that it will be used for actual isotope geoscience research in the future. I have posted a request for further feedback on Facebook's 'SHRIMP Fan Club' page. Hopefully, this feedback will iron out any remaining bugs and issues.
The reviewer points out two simplifications in the new algorithm, which are only briefly mentioned in the original manuscript:
1. The assertion that mass-dependent fractionation of the Pb-isotopes can be safely ignored in most cases.
2. The assumption that 204Pb is measured accurately, i.e. is free of isobaric interferences ('204Pb overcounts') and has been adequately peak-centred.
The manuscript already mentions these issues, and suggests solutions to them. Mass dependent fractionation can already be corrected using the stable isotope functionality of 'simplex' (see line 385 of the original manuscript). Possible issues with the 204Pb measurements can be corrected by manually specifying a session blank that brings the common-Pb corrected 207Pb/206Pb ratios into alignment with reference values (line 170 of the manuscript).
These solutions are quite straightforward to implement from the command line, but not with a GUI. In anticipation of a user-friendly implementation of these fixes, I will be more upfront about simplex' existing simplifications in the revised manuscript. I will add a paragraph to the introduction and/or conclusions explaining (to quote the reviewer) "what simplex CAN do (today) and CANNOT do (today)."
Dr. Rayner asks for an example where covariance matrices of isotopic data improve the accuracy and/or precision of geochronological data. I think that an error weighted mean (as in Section 13 of Vermeesch, 2015, doi:10.1016/j.chemgeo.2015.05.004) would be a good way to satisfy this request.
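For concreteness, a matrix-weighted mean of n multivariate measurements x_i with covariance matrices V_i takes mu = (sum W_i)^{-1} (sum W_i x_i) with weights W_i = V_i^{-1}, and the covariance of mu is (sum W_i)^{-1}. The sketch below is a generic implementation of that formula (cf. Vermeesch, 2015, Section 13), not simplex's internal code, and the data are synthetic.
# Matrix-weighted mean of the rows of X, given a list of covariance
# matrices (one per row); returns the mean vector and its covariance.
wtdmean <- function(X, covs) {
  W <- lapply(covs, solve)                   # W_i = V_i^{-1}
  Wsum <- Reduce(`+`, W)
  num <- Reduce(`+`, Map(function(w, x) w %*% x, W, split(X, row(X))))
  list(mean = solve(Wsum, num), cov = solve(Wsum))
}
# Two synthetic 2-D measurements with correlated uncertainties:
X <- rbind(c(1.00, 2.00),
           c(1.10, 1.90))
covs <- list(matrix(c(0.010, 0.004, 0.004, 0.020), 2, 2),
             matrix(c(0.020, 0.008, 0.008, 0.010), 2, 2))
wtdmean(X, covs)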
Her other comments are all minor and easy to address. I will document the corresponding changes in the cover letter of the revised manuscript. There are just two points where I have decided not to follow the reviewer's suggestions:
1. A review of the 'conventional approach' to drift and dead time correction. First, I am not sure if there actually is a 'conventional' way to do these calculations that applies to all instruments and labs. Second, the current practice of applying a dead-time correction to the data before the drift correction turns integer counts into real numbers, which are incompatible with the Poisson distribution (a toy illustration of this incompatibility follows this list). To reduce the reviewer's confusion, I will try to explain the rationale behind Sections 8 and 9 more clearly. However, I would prefer not to dedicate too much space to describing existing procedures that are, in my opinion, suboptimal.
2. Plotting the synthetic data of Table 1 on Figure 2. First, the data of Table 1 are raw measurements, which do not have a 1-to-1 match with the isochron surfaces of Figure 2. Second, the example of Table 1 would produce a tightly clustered set of points in Figure 2, without uncertainties. I am concerned that this would confuse rather than illuminate the reader. Third, the addition of data points to Figure 2 would make it less 'pretty', and less universally applicable.
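To illustrate the Poisson argument in point 1 above, the following toy R snippet (not simplex code; the dead time is an illustrative value) shows that a dead-time-corrected count no longer has a Poisson likelihood:
dpois(100, lambda = 100)                       # integer count: valid likelihood
dpois(100 / (1 - 100 * 25e-9), lambda = 100)   # corrected 'count' is non-integer:
                                               # dpois warns and returns zero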
On a related note, I have completely redrafted Figure 2, because it had some mistakes in it. These mistakes happened because I made the original version of this plot many years ago, and have since used it in several presentations, research proposals etc. The figure was changed numerous times (fonts, orientation, colours), and somewhere down the line the scales were messed up (Reviewer 1 pointed out one of the errors). The landing page of the simplex website contains a fresh version of this crucial figure, which will be also used in the revised manuscript.
Citation: https://doi.org/10.5194/gchron-2022-4-AC2
Peer review completion
The new algorithm treats SIMS measurements as compositional data, which means that the relative abundances of 204Pb, 206Pb, 207Pb, and 238U are processed within a tetrahedral data space or simplex. The new method is implemented in an eponymous computer programme that is compatible with the two dominant types of SIMS instruments.