An algorithm for U–Pb geochronology by secondary ion mass spectrometry
Download
- Final revised paper (published on 19 Aug 2022)
- Preprint (discussion started on 04 Mar 2022)
Interactive discussion
Status: closed
RC1: 'A Solid Foundation for Accessible Best Practice in SIMS U-Pb Geochronology Data Reduction', Morgan Williams, 11 Apr 2022
This manuscript provides a well-written and logically-organised description of some of the key challenges in addressing data reduction of SIMS U-Pb geochronology, and succinctly describes best-practice approaches to the required processing steps. The author clearly identifies the main limitations and inaccuracies of commonly used approaches to dealing with mass spectrometric data in compositional systems, which as noted by the author have been acknowledged in the community for some time. The documented approach builds upon other recent work in the community encouraging the use of appropriate compositional data handling, and the preservation of the full covariance matrix throughout the data reduction process where possible. The software accompanying this manuscript (simplex) provides a flexible interface (with online and offline in-browser applications and access to a command line interface) and supports the key data formats for SIMS geochronological data from SHRIMP and Cameca instruments. The author is to be commended for providing the software in an open-source manner with a permissive-use license, and for ensuring the approach is accessible via multiple interfaces (allowing both high-level and low-level interfaces catering to the broader community). Some comments regarding the manuscript and associated software are attached below, with some specific technical suggestions for the manuscript and software noted separately.
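For readers less familiar with the log-ratio methodology mentioned above, a minimal sketch of the additive log-ratio (ALR) transformation and its inverse is given below; the function names and example composition are illustrative only and are not part of simplex itself.
# ALR transform: maps a D-component composition (summing to one) from
# the simplex to (D-1)-dimensional Euclidean space, using the last
# component as the common denominator; alrinv maps back.
alr <- function(x) log(x[-length(x)] / x[length(x)])
alrinv <- function(z) { y <- c(exp(z), 1); y / sum(y) }
comp <- c(Pb204 = 0.001, Pb206 = 0.599, Pb207 = 0.050, U238 = 0.350)
z <- alr(comp)   # three unconstrained coordinates in Euclidean space
alrinv(z)        # recovers the original four-component composition
Statistics computed on the log-ratio coordinates are free of the positivity and constant-sum constraints that complicate them in raw ratio space.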
- The implied extension of other software (e.g. IsoplotR, in the first instance) to accept the fully-specified outputs of simplex will be a key next step to make best use of this work and bridge the full pipeline from instrument outputs to research outputs.
- The key concepts are of general interest beyond the geochronology community, with direct application to other SIMS-based isotope ratio measurements. In encompassing the major steps in data reduction of compositional mass spectrometric data, the software is flexible enough to reduce e.g. Cameca oxygen and sulfur isotope data (as is demonstrated in the application). In some cases, additional steps might be appropriate for specific scenarios (e.g. dealing with electron-induced secondary ion emission for negative-ion measurements on SHRIMP instruments where an electron gun is used for charge compensation in insulating materials, Ickert et al. 2008), but the foundational steps of the workflow are the same.
- Regarding the operation of the software, I can verify that the online application interface generally works as expected and the provided installation instructions (in the GitHub Readme) were sufficient guidance for getting the package running locally (at least for those who either have, or are comfortable setting up, an R distribution). I’ve been able to verify the use of the software across most of the claimed platforms (with a few minor bugs, described in the attached document) – online, on both Windows and Linux (via Windows Subsystem for Linux) using an Anaconda R distribution. The references to simplex outputs in the manuscript corresponding to the demonstration datasets (e.g. Figure 6b, Figure 7) match the current outputs of the software. With appropriate adjustments, I was able to process SHRIMP U-Pb data (both .op and .pd files, measured on the Australian National University SHRIMP II in 2013) in the same manner as the demonstration datasets.
- The software provides some functionality to enable reproducibility of data reduction even in a graphical interface where this is commonly a challenge, with the ability to save and load configurations for data processing. With future development, any extension to include provenance information would be a valuable addition, potentially via metadata extracted from the input files where possible.
- The documentation for processing options within the in-browser application is simple but explains in general terms the processes and options available, and the interface-specific documentation for the R package itself indeed appears to be complete.
- Regarding propagation of matrix-based uncertainties in geochronological data, McLean et al. (2011) provide a relevant digestible overview (although with a focus on specific considerations for ID-TIMS) which may be a useful reference to cite in text or in the appendix, at the author’s discretion.
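As context for the matrix-based propagation mentioned in the last point, the sketch below illustrates the generic first-order formula Vy = J Vx J', where J is the Jacobian of the transformation; this is the textbook approach underlying such workflows (cf. McLean et al., 2011), not simplex's internal code, and the example function is illustrative only.
# First-order (linearised) uncertainty propagation for y = f(x),
# given a covariance matrix Vx on x, via a numerical Jacobian J:
propagate <- function(f, x, Vx, eps = 1e-6) {
  y <- f(x)
  J <- sapply(seq_along(x), function(i) {
    dx <- x; dx[i] <- dx[i] + eps
    (f(dx) - y) / eps            # forward-difference partial derivatives
  })
  J <- matrix(J, nrow = length(y))
  J %*% Vx %*% t(J)              # Vy = J Vx J'
}
# Example: covariance of two log-ratios formed from three raw counts
f <- function(x) c(log(x[1] / x[3]), log(x[2] / x[3]))
x <- c(100, 5000, 200)
Vx <- diag(x)                    # Poisson counts: variance = mean
propagate(f, x, Vx)              # 2x2 covariance, with a nonzero off-diagonal term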
Manuscript: Technical Corrections and Specific Suggestions
- L9: typo “handes” should be “handles”.
- L19: The series of short sentences here is succinct but does not read well; consider rephrasing along the lines of “In contrast to SIMS instrumentation, LA-ICP-MS instruments are built by numerous manufacturers, and the widely used data reduction software packages are generally compatible with all of their data formats”.
- L122, L126: Specifically refer to the additive log ratio (ALR) used.
- L142: The term ‘trick’ here perhaps confounds the intention and doesn’t inspire confidence in its use. When referring to generalization, perhaps refer to Eqn. 4 (which is what is generalized, as the logratio approach is not specific to a dimensionality, even if examples are 2/3 dimensional).
- Figure 2a – I think the labels on the upper figure may be reversed. This might also be better termed a ‘projection’ from a four-component simplex (which would be a tetrahedron). This diagram depicts the compositional data aspect of the manuscript and software rather nicely.
- Eqn 13, Section 9: Dead-time correction – I think this exclusively applies to a non-extending deadtime correction? Generally, the difference between extending/non-extending corrections is likely to be small except for particularly high count-rates (e.g. for very high U, or analysing a minor element on an EM), but it might be worth noting this, at the author’s discretion (see the sketch after this list).
- L224: SHRIMP data processing – Depending on the scenario, I think Dodson interpolation may also be used to similar effect for dealing with smooth intensity changes during cycles of data collection within a single spot (Dodson, 1978).
- Figure 5: Depending on the desired manuscript formatting, the capitalization of this figure caption may be off (e.g. “a) Blank …”)?
- L321: “Figure 6.a” – the references to figures may be easier to read without the period (e.g. Figure 6a; as is used in the Figure 7 caption).
- Fig7b: It may be worth noting that the white ellipse represents the concordia age and uncertainty.
- References: Add DOIs to references where possible.
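To make the extending/non-extending distinction raised in the Eqn 13 comment concrete, here is a minimal sketch of the two classical dead-time corrections; the dead time and count rates are illustrative values, not taken from the manuscript.
# Dead-time corrections for an observed count rate m (counts/s)
# with dead time tau (s):
tau <- 25e-9                                        # an illustrative 25 ns
nonextending <- function(m, tau) m / (1 - m * tau)  # closed form
extending <- function(m, tau) {                     # solve n*exp(-n*tau) = m
  sapply(m, function(mi)
    uniroot(function(n) n * exp(-n * tau) - mi,
            interval = c(mi, 1 / tau))$root)        # physical root, n <= 1/tau
}
m <- c(1e4, 1e6, 5e6)                               # observed count rates
cbind(observed = m,
      nonextending = nonextending(m, tau),
      extending = extending(m, tau))                # divergence grows with rate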
Software and Repository: General Notes
- The license for the software is specified in the GitHub readme and R package description (the GPL-3 copyleft license), but is not in the repository; generally, a copy of this license should be included with the package repository (for visibility, and to meet the terms of the license).
- I would suggest releasing a specific software version corresponding to the publication (e.g. 1.0, or 0.1 for a ‘beta’ release) such that future users can readily recognise updated versions relative to that described here (particularly if this manuscript is intended to be the primary/secondary citation for simplex, rather than citing the software package itself).
- In the locally-run browser-based application, plotting and other functions generally work as expected and behave as in the hosted online version (as far as tested; a few specific bugs, issues, and suggested features are noted below).
- The selenium tests included in the repository include references to local files which are not available; these could perhaps be modified to be distributable (potentially helpful for debugging).
- Unit tests for the R package itself may be a useful addition for future development.
- To get the app to run via WSL, manual specification of the port is necessary, as the browser can’t necessarily be launched directly (I believe the specifics of this are managed by shinylight; this would not be a problem on a standalone Linux distribution, but may be worth documenting).
- All in-app plotting suffered from a Cairo-SVG related bug on Windows (‘svg: Cairo-based devices are not available for this platform’). This may potentially be rectified with an additional dependency, or potentially a change in the plotting output format.
- During my tests in a base-R distribution (i.e. not from RStudio), remotes-based installation from the command line requires the setting of an environment variable (related to unzipping the repository contents; see the sketch below).
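For anyone reproducing the base-R installation, one commonly cited workaround for unzip-related install_github failures is sketched below; the repository location is assumed to be pvermees/simplex (as for the author's other packages), and whether options(unzip = "internal") matches the exact environment variable alluded to above is an assumption rather than the documented fix.
options(unzip = "internal")                  # use R's built-in unzip code
install.packages("remotes")                  # if not already installed
remotes::install_github("pvermees/simplex")  # assumed repository location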
Software and Repository: Browser-based Use
- Suggested Feature: For tab 5 (samples), it would be good to exclude standards by default, and to allow multiple comma-separated patterns for sample matching.
- Suggested Feature: Add a CSV download option for tables (these can currently be copy-pasted into common spreadsheet programs, without headers/indexes).
- Suggested Feature: Potentially preserve source file name on JSON export for some simple provenance metadata. The source files contain more of this metadata themselves, and this could be a feature to consider for the future (e.g. with XML import for SHRIMP data, which is suggested to be planned for future development).
- Suggested Feature: For relevant stages, automatic refreshing for plots when the analysis # is changed would be useful.
- Bug (tab 5, using tabulate): adding more than one option to the table gives notation for duplicated column names (e.g. A, B, C, A.1, B.1, C.1, A.2, …). In particular, the structure of the ‘cov’ option, with vector values indexed by integers and a sparse covariance matrix across all samples, is not immediately clear in the tabulated output. After a second look this table is readily understood, but ideally it would be indexed by appropriate and unique column names.
Software and Repository: Programmatic Use/CLI
- Plotting via the programmatic interface did not exhibit any issues on Windows, in contrast to the browser-based app.
- The example CLI workflow in the GitHub Readme gives errors at the second step, after modification to use the included `./inst/SHRIMP.op` file; using the included ‘SHRIMP_UPb’ dataset (not from an original file) enabled the use of this workflow as below:
library(simplex)                      # attach the package first
lr <- logratios(SHRIMP_UPb)           # built-in SHRIMP U-Pb dataset
stand <- standard(preset="Temora-t")  # reference material preset
paired <- pairing(lr,stand=stand)
cal <- calibration(lr,stand=stand,pairing=paired,prefix="TEM")
result <- calibrate(cal,exterr=TRUE)  # exterr: propagate external errors
References
Dodson, M. H. (1978). A linear method for second-degree interpolation in cyclical data collection. Journal of Physics E: Scientific Instruments, 11(4), 296. https://doi.org/10.1088/0022-3735/11/4/004
Ickert, R. B., Hiess, J., Williams, I. S., Holden, P., Ireland, T. R., Lanc, P., Schram, N., Foster, J. J., & Clement, S. W. (2008). Determining high precision, in situ, oxygen isotope ratios with a SHRIMP II: Analyses of MPI-DING silicate-glass reference materials and zircon from contrasting granites. Chemical Geology, 257(1–2), 114–128. https://doi.org/10.1016/j.chemgeo.2008.08.024
McLean, N. M., Bowring, J. F., & Bowring, S. A. (2011). An algorithm for U-Pb isotope dilution data reduction and uncertainty propagation. Geochemistry, Geophysics, Geosystems, 12(6). https://doi.org/10.1029/2010GC003478
Citation: https://doi.org/10.5194/gchron-2022-4-RC1
AC1: 'Reply on RC1', Pieter Vermeesch, 06 May 2022
I would like to sincerely thank Dr. Williams for thoroughly testing 'simplex' on Windows, Mac and Linux, using the online, offline and command line interfaces. He even installed simplex in the cloud, using AWS! I have already implemented most of the reviewer's pertinent suggestions in the latest version of the software (which is available at http://isoplotr.es.ucl.ac.uk/simplex/). For example, I have:
1. enabled comma-separated lists of sample prefixes
2. added a CSV download option
3. enabled auto-refreshing of drift and logratio plots
4. fixed the issue with duplicate column names
5. added a copy of the GPL license to the GitHub repo
6. added installation instructions for hosting online simplex mirrors
I have also made some other improvements, such as the addition of sample names as row names in the output tables.
I am in two minds about the reviewer's request to include all the local files in the selenium tests. On the one hand, I agree that this would make it easier to collaborate with other developers. But on the other hand, it would inflate the GitHub repository and expose proprietary data online. I honestly do not expect many others to join the simplex development team. However, I would be more than happy to add more test files to the GitHub repository should this change.
With regards to unit tests for the R functions, these actually already exist. The simplex package passes all CRAN checks, which means that all public functions are fully documented, with examples. Building the simplex package with CRAN checks turned on runs all these tests on the user's computer and saves their output in a folder called simplex.Rcheck.
I was not able to reproduce the Cairo-SVG error in Windows. Perhaps this problem expressed itself in an earlier version of the program and has since gone away?
Version 1.0 of the software will be released on GitHub as soon as the GChron manuscript is accepted for publication.
Dr. Williams also makes some useful suggestions to improve the paper itself. These suggestions are all minor and will be implemented in the revised manuscript. I will document them in its cover letter.
Citation: https://doi.org/10.5194/gchron-2022-4-AC1
RC2: 'Comment on gchron-2022-4', Nicole Rayner, 21 Apr 2022
This is a well written, well organized, clear and easy to follow presentation of issues in SIMS data reduction. I agree entirely with the sentiments expressed in the opening paragraph of comment RC1 and thus will not repeat them here.
I approached this review as a manager of a SHRIMP ion probe laboratory and thus someone with a hands-on perspective of dealing with SIMS data in a geological research setting. While this paper shines a light on issues that have been known by parts of the community for some time and provides a novel method of addressing these, at the current level of development the simplex data reduction protocol is not practical, even with the limited functionality that the author is upfront about. To be clear, my reservations about simplex as a tool do NOT indicate reservations regarding the publication of this manuscript and the concepts therein. This is an excellent manuscript that introduces the geochronology community to a novel method of data reduction. As my concerns mainly relate to user needs within simplex (as opposed to the mathematics), I hope that flagging these may lead to changes that engage the geochronology community, enhance the uptake and utility of this approach, and ensure community-wide rigorous testing with a wide variety of “real-world” datasets in all their imperfect glory.
I will give a couple of examples below of my experience trying to use simplex (both online and through the R GUI) that I hope illustrate overarching issues with the current structure that limit the utility of simplex.
I had quite a bit of difficulty importing new files. I was unable to get a .pd file to properly load (online or using the R GUI). When I attempted to load a .pd file I got a cryptic error message; then the numbers in the table in section 2 (Drift) changed, but they did not correspond to the actual data, and the spot names under Aliquots did not change from the test file. Clearly the file did not load properly (and some parts of the previously loaded dataset persisted), but given the opaqueness of the interface I have no idea what the problem was. I think it was partly related to the fact that there doesn’t seem to be a way to “clear” the existing data out of the algorithm and start from scratch. Relatedly, there is no record of what data file is being processed. Better tracking of what is being processed (and how it is being processed – for example, when defining the reference materials) is key to making this a useful tool, because ultimately the data for a given rock sample stands alone but needs to be linked to its data reduction metadata (though not necessarily to other unknown samples in that analytical session; a single JSON output is impractical in that regard).
I had better luck loading a .op file, but even then the behaviour was inconsistent. Part of this inconsistency is due to the fact that some (most?) interfaces seem to require a “refresh” of some sort (e.g. press “Plot” again after excluding a scan in Drift) to redo the calculations. This also means that things get out of sync. If you advance to a different aliquot in Drift, the plots don’t advance, but there is no label on the plots to indicate which aliquot is being displayed. These are just two examples of instances where the algorithm as currently structured does not meet the needs of geochronologists working in SIMS labs producing data and then interpreted ages. Reproducibility and, even more importantly, traceability in the data reduction package is crucial, and this needs to be considered and baked into every step of any new process.
This manuscript is an excellent first step in starting to address the mathematical challenges of dealing with mass spectrometer data, while acknowledging that a great deal of work is still required to translate this contribution from better handling of mass spec counts all the way to geological interpretations of ages. In order to bring the geochronological community along this journey (which I am assuming is one of the reasons for this paper), I think that this manuscript would benefit from a more explicit presentation of what simplex CAN do (today) and CANNOT do (today). Given that this is a first step in a long path toward usability (and while I appreciate that not all the benefits of the covariance matrices can currently be accomplished), I think many readers from the geochronology community would like to see a comparison of the previous approaches (SQUID or the NordSIM spreadsheet) and simplex as a first demonstration of its utility.
Specific comments/Technical corrections
Line 75 – while things like negative lower limits of confidence intervals are physically impossible, the “reliability of analytical uncertainty assigned to dates…” is best addressed by systematic long-term evaluations of laboratory data, including inter-laboratory comparison work, not just by working in compositional space. I suggest the last sentence of this paragraph be removed.
Line 86 – inconsistent use of “standard”, “reference material” and “reference standard” throughout the manuscript. Suggest use of “reference material” throughout, particularly because of the ambiguous meaning of “standard” (e.g. standard error as a statistical term, or as the uncertainty of a reference material). If this change is made, use the abbreviation RM as a subscript when needed in equations, e.g. L102, 105.
Caption Figure 1 – “calibration error” should be calibration uncertainty
Line 108 – here you refer to “samples” when you really mean “aliquots”. Later on you refer to “within-spot” drift or other uncertainties; for consistency of usage/clarity for the reader I suggest you refer to these as “analytical spots”.
Line 129 consider illustrating Table 1 data in a ternary diagram prior to mapping to Euclidean space (data points on Figure 2 perhaps?).
Line 150 – here “inter-sample” should be “inter-spot”, as “inter-sample” to most people will signify 2 different ROCK samples, not analyses/aliquots/spots
Line 164 – typo “black” instead of “blank”
Section 7 “Zeros” – consider merging with the blanks section, they are two parts of the same problem and I find the current breakdown into two sections jarring. Since section 7 is so brief, I am not even very clear why it is needed.
Sections 8 and 9 – as a SIMS mass spectrometrist/geochronologist and not a statistician, I am searching for points of familiarity, which up to this point I am largely able to find. You lose me in these sections. A more direct explanation of the steps in traditional data reduction that are replaced by this approach, and then of how these values get used/incorporated into ratios, would be helpful. Part of the difficulty in following is that in the traditional data reduction approach the deadtime correction happens first, but in the paper “Deadtime” follows the section about “Dealing with count data”, which seems counterintuitive.
Line 250 states that mass-dependent fractionation is commonly ignored. It has been established from long term reproducibility studies that this is not true, and some labs do not ignore it (e.g. GA uses OG1 and the Geological Survey of Canada uses z1242, see Davis et al. 2019). This paper doesn’t propose that this algorithm (or, more broadly, approaching SIMS geochronology data as compositional) will solve all issues related to data processing of SIMS data; however, a number of issues such as this are glossed over as secondary concerns. This is an example of one of my general statements about this paper, where in an attempt to address some of the problems related to the data reduction (sections 2–4, 6, 7), others that are known in real-world SIMS data are minimized (e.g. blanks greater than 204, or overcounts in 204).
Line 283, refer to it as Temora2 throughout the manuscript (not just in the parentheses this one time) as that is its name.
Line 293 refer to mass spec “cycles”, elsewhere “sweeps”. I prefer cycle, but either is fine as long as consistent (including axis labels of figure 5)
Line 301, not negative, positive
Line 302 “207” superscript in error
Line 302 – again I’m trying to map this to the usual approach of secondary beam monitor (SBM) normalization of SHRIMP data. Is this treatment instead of SBM normalization, or before/after? I think in lieu, but I don’t understand how this might be affected by drift in the primary beam intensity, which might either enhance or minimize the depth/oxygen availability effect illustrated in this diagram. For example, if over the time of the analysis the primary beam intensity decreases then increases, the within-spot drift may be U-shaped, not linear, and thus a single slope regression applied to each cycle isn’t appropriate.
Line 307 – the strict enforcement of a positive value for 204 − b doesn’t reflect the real-life behaviour of ion-counted data, where negative values could indicate a real problem with the analytical setup. I am concerned that this approach “hides” a real problem.
Line 312 – insert “within-spot” when referring to fractionation here for clarity.
Caption Figure 5 – since panel a is blank- and drift-corrected (e.g. (204Pb − b)/(206Pb − b)), are the converted counts shown in panel b also blank corrected (in which case the vertical axis should be labelled 204Pb − b, 206Pb − b, etc.)?
Line 328 – edit text to “Figure 7 applies the Pb/U calibration (Equation 23) to 91500….” This makes it easier for the reader to know what “Equation 23” does without having to go back to check.
Line 330 – again “inter-sample” when should be inter-spot (reminder to rationalize this usage throughout)
Line 331 – figure 7 uses Temora2 as the reference material and 91500 as the sample. Figure 8 uses 91500 as the reference material and Temora2 as the sample. Both of these materials are commonly used as RM’s and so switching their usage back and forth is tricky for the reader. Perhaps there is one but I don’t see any reason why Figure 8 can’t be recast with Temora2 as the RM which would streamline things.
Figure 7 caption – specify “using Temora2 as the reference material” at the end of the first sentence.
Line 368 – I’m not sure what is meant by “In contrast with existing data reduction protocols, the new algorithm simultaneously processes all the aliquots in an analytical sequence.” Please clarify.
Line 381 – While I appreciate that not all the benefits of the covariance matrices can currently be accomplished, it would be great to see a comparison of results between the previous approaches and this one using the provided datasets.
Davis, W.J., Pestaj, T., Rayner, N., McNicoll, V.M., 2019. Long-term reproducibility of 207Pb/206Pb age at the GSC SHRIMP lab based on the GSC Archean reference zircon z1242. Geological Survey of Canada, Scientific Presentation 111, 1 poster, https://doi.org/10.4095/321203
Citation: https://doi.org/10.5194/gchron-2022-4-RC2
AC2: 'Reply on RC2', Pieter Vermeesch, 06 May 2022
I am sorry that Dr. Rayner was unable to import her own .pd and .op files into 'simplex'. This reflects the fact that the Cameca side of 'simplex' has undergone more rigorous testing than the SHRIMP side. The reviewer kindly shared the problematic .pd and .op files with me, and the issue was quickly fixed: it turns out that these files contained some comments, which were absent from the Geoscience Australia files that I had previously used to test 'simplex'.
The improved robustness of the data import function also addresses the reviewer's concern that 'simplex' is not able to 'clear' the existing data. Whenever a new dataset is loaded, it replaces the previous data and resets the calculations. I have also fulfilled Dr. Rayner's request for "better tracking of what is being processed". The latest version provides the user with a lot more status updates, which is especially useful when loading large SHRIMP .pd files.
Although the reviewer is correct that 'simplex' was, first and foremost, created as a vehicle to test and demonstrate the algorithms presented in the GChron manuscript, I do hope that it will be used for actual isotope geoscience research in the future. I have posted a request for further feedback on Facebook's 'SHRIMP Fan Club' page. Hopefully, this feedback will iron out any remaining bugs and issues.
The reviewer points out two simplifications in the new algorithm, which are only briefly mentioned in the original manuscript:
1. The assertion that mass-dependent fractionation of the Pb-isotopes can be safely ignored in most cases.
2. The assumption that 204Pb is measured accurately, i.e. is free of isobaric interferences ('204Pb overcounts') and has been adequately peak-centred.
The manuscript already mentions these issues, and suggests solutions to them. Mass dependent fractionation can already be corrected using the stable isotope functionality of 'simplex' (see line 385 of the original manuscript). Possible issues with the 204Pb measurements can be corrected by manually specifying a session blank that brings the common-Pb corrected 207Pb/206Pb ratios into alignment with reference values (line 170 of the manuscript).
These solutions are quite straightforward to implement from the command line, but not with a GUI. In anticipation of a user-friendly implementation of these fixes, I will be more upfront about simplex' existing simplifications in the revised manuscript. I will add a paragraph to the introduction and/or conclusions explaining (to quote the reviewer) "what simplex CAN do (today) and CANNOT do (today)."
Dr. Rayner asks for an example where covariance matrices of isotopic data improve the accuracy and/or precision of geochronological data. I think that an error weighted mean (as in Section 13 of Vermeesch, 2015, doi:10.1016/j.chemgeo.2015.05.004) would be a good way to satisfy this request.
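For concreteness, a matrix-weighted mean of n multivariate measurements x_i with covariance matrices V_i takes mu = (sum W_i)^{-1} (sum W_i x_i) with weights W_i = V_i^{-1}, and the covariance of mu is (sum W_i)^{-1}. The sketch below is a generic implementation of that formula (cf. Vermeesch, 2015, Section 13), not simplex's internal code, and the data are synthetic.
# Matrix-weighted mean of the rows of X, given a list of covariance
# matrices (one per row); returns the mean vector and its covariance.
wtdmean <- function(X, covs) {
  W <- lapply(covs, solve)                   # W_i = V_i^{-1}
  Wsum <- Reduce(`+`, W)
  num <- Reduce(`+`, Map(function(w, x) w %*% x, W, split(X, row(X))))
  list(mean = solve(Wsum, num), cov = solve(Wsum))
}
# Two synthetic 2-D measurements with correlated uncertainties:
X <- rbind(c(1.00, 2.00),
           c(1.10, 1.90))
covs <- list(matrix(c(0.010, 0.004, 0.004, 0.020), 2, 2),
             matrix(c(0.020, 0.008, 0.008, 0.010), 2, 2))
wtdmean(X, covs)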
Her other comments are all minor and easy to address. I will document the corresponding changes in the cover letter of the revised manuscript. There are just two points where I have decided not to follow the reviewer's suggestions:
1. A review of the 'conventional approach' to drift and dead time correction. First, I am not sure if there actually is a 'conventional' way to do these calculations that applies to all instruments and labs. Second, the current practice of applying a dead-time correction to the data before the drift correction turns integer counts into real numbers, which are incompatible with the Poisson distribution (a toy illustration of this incompatibility follows this list). To reduce the reviewer's confusion, I will try to explain the rationale behind Sections 8 and 9 more clearly. However, I would prefer not to dedicate too much space to describing existing procedures that are, in my opinion, suboptimal.
2. Plotting the synthetic data of Table 1 on Figure 2. First, the data of Table 1 are raw measurements, which do not have a 1-to-1 match with the isochron surfaces of Figure 2. Second, the example of Table 1 would produce a tightly clustered set of points in Figure 2, without uncertainties. I am concerned that this would confuse rather than illuminate the reader. Third, the addition of data points to Figure 2 would make it less 'pretty', and less universally applicable.
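To illustrate the Poisson argument in point 1 above, the following toy R snippet (not simplex code; the dead time is an illustrative value) shows that a dead-time-corrected count no longer has a Poisson likelihood:
dpois(100, lambda = 100)                       # integer count: valid likelihood
dpois(100 / (1 - 100 * 25e-9), lambda = 100)   # corrected 'count' is non-integer:
                                               # dpois warns and returns zero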
On a related note, I have completely redrafted Figure 2, because it had some mistakes in it. These mistakes happened because I made the original version of this plot many years ago, and have since used it in several presentations, research proposals etc. The figure was changed numerous times (fonts, orientation, colours), and somewhere down the line the scales were messed up (Reviewer 1 pointed out one of the errors). The landing page of the simplex website contains a fresh version of this crucial figure, which will be also used in the revised manuscript.
Citation: https://doi.org/10.5194/gchron-2022-4-AC2
Peer review completion
The new algorithm treats SIMS measurements as compositional data, which means that the relative abundances of 204Pb, 206Pb, 207Pb, and 238U are processed within a tetrahedral data space or simplex. The new method is implemented in an eponymous computer programme that is compatible with the two dominant types of SIMS instruments.