the Creative Commons Attribution 4.0 License.
Examination of the accuracy of SHRIMP U–Pb geochronology based on samples dated by both SHRIMP and CA-TIMS
Charles W. Magee Jr.
Simon Bodorkos
Christopher J. Lewis
James L. Crowley
Corey J. Wall
Richard M. Friedman
Download
- Final revised paper (published on 11 Jan 2023)
- Supplement to the final revised paper
- Preprint (discussion started on 04 Aug 2022)
- Supplement to the preprint
Interactive discussion
Status: closed
-
RC1: 'Comment on gchron-2022-20', Anonymous Referee #1, 23 Aug 2022
This study reviews published and, to a lesser extent, new U-Pb zircon geochronology data generated by two methods: secondary ion mass spectrometry (SIMS) and thermal ionization mass spectrometry (TIMS), without and with pre-treatment of zircon by chemical abrasion (CA), respectively. The SIMS analyses were carried out over approximately the past 15 years using the SHRIMP II instrument at Geoscience Australia, and they targeted felsic plutonic and volcanic rocks with ages mostly falling into the Permo-Triassic to Cambrian age range. The exceptions are one Cretaceous “calcilutite” and the Archean secondary reference zircon OG1, which was analyzed in most of the sessions to monitor Pb-isotopic fractionation. The goal of the comparison is to re-assess the reproducibility of SHRIMP U-Pb geochronology considering updated and improved methodology implemented over the past two decades. This is done in reference to CA-TIMS ages obtained for the same samples with an uncertainty that is considered negligible compared to that of SHRIMP.
This sample-based comparison is for an age range and geological provenance of zircon that is typical for the Geoscience Australia lab: the samples fall into an age range, where uncertainties for Pb-Pb ages are typically larger than those of U-Pb dating. Therefore, uncertainties are dominated by how well the U-Pb calibration curve can be defined and reproduced. The materials are mostly igneous rocks with seemingly “simple” crystallization histories (e.g., compared to metamorphic zircon). In comparing both methods, this includes zircon where the penalty of TIMS with its indiscriminate averaging over multiple growth domains in individual zircon is minor, and seemingly negligible compared to analytical uncertainties. The CA pre-treatment selectively removes zircon, but this is commonly regarded as non-detrimental in obtaining the original crystallization age of zircon.
The approach of comparing zircon ages generated by different methods from such materials, however, necessarily has limitations due to a lack of knowledge and temporal resolution regarding the age homogeneity of the samples. Although some aspects of zircon longevity in magma systems are probably indeed negligible, there are other concerns about zircon age heterogeneity, even at the limit of TIMS resolution, that are harder to dismiss. Only with sufficient sampling, which is however rarely achieved in CA-TIMS studies, might this become adequately constrained. Another critical aspect is that CA-untreated and CA-treated zircon are compared; this – at least for zircon with younger SHRIMP relative to CA-TIMS ages – leaves some doubt whether the same materials are indeed being compared, if SHRIMP spots overlap zircon areas affected by Pb-loss that would otherwise have been removed by CA. What the manuscript is also missing is a presentation and discussion of U abundances in the analyzed zircons, as this would control potential metamictization, and there are also documented matrix effects for high-U zircon in SIMS analysis.
Considering these problems and omissions, three critical aspects stand out, where I find that the interpretation overstepped what can be extracted from the data:
- There is speculation that the distribution of the age differences between SHRIMP and TIMS ages is bimodal. I doubt this, and would argue that this apparent bimodality is an artifact of the small sampling size. As a demonstration, a simple test was made using the Excel Rand function 35-times and normalizing the results to a mean of 0 and a standard deviation (using the NORM-INV function). It is very easy to generate an apparent bimodal distribution with a difference in modes at approximately the same value stated for the difference between volcanic and plutonic samples (0.7%):
Moreover, a clear identification of a bimodal distribution requires differences between the modes of 3–5 times the standard error (Keller et al. 2018). This is not the case in a distribution where the two modes only differ by 0.7% if the SHRIMP uncertainty was 1%; the standard error of each subpopulation would then be between 0.2 and 0.3% (because of division by square-root of n). Applying the conservative criterion of 4-times the standard error, the difference between the two modes should be at least 0.8 to 1.2%, which is larger than the postulated difference. I therefore do not believe that it is statistically justified to discriminate between plutonic and volcanic samples regarding the apparent deviations between ages determined by different methods; if the authors think that the bimodality in the data is robust, it is their onus to provide an adequate statistical analysis to demonstrate the validity of their assessment.
- There is a risk of circular argument in the data analysis in that the authors aim to realistically constrain the age uncertainty of SHRIMP (which is a priori unknown), but then they use the apparent overdispersion of the data to identify non-analytical causes for the overdispersion, whereas in fact, an underestimation of the uncertainty could produce the same overdispersion. Therefore, one cannot make the unequivocal statement that the distribution of the age differences is non-Gaussian, as this may be biased by a potential underestimation of the actual uncertainties. An important aspect here is scrutiny about the data in this comparison. Although some of the double-dated samples in the database are based on a comparatively large number of analyzed zircons (both, for CA-TIMS and SHRIMP results), there are also samples with severe limitations in one or both data sets (e.g., 1528025, 1594761, 1954030 with only three zircon analyses in the TIMS-group, and often 2 to 3 excluded grains, as well as 1978295 and 1978296 with only 6 or 7 zircons analyzed by SHRIMP, again with a comparatively large number of data excluded). The selection of data to be included/excluded may play an important role (i.e., the sample with the largest deviation has the least SHRIMP data points, and many excluded analyses). It would be reasonable to restrict the comparison to those samples with a robust number of data in each population. In this same context, I think the regression in Fig. 3 is dubious; it is not a valid fit (see P value), and it may be biased by a few data points of questionable quality. Especially as the authors make a point that there is no linear bias in the age difference vs. age, I find this diagram unnecessary and distractive.
- The discussion on potential “geological” causes for the age difference is oversimplified, and it does not adequately reflect the current research status on how and to what extent “real” age heterogeneity in zircon is generated (e.g., Burgess et al., 2019, for a well-documented case study for tephras). Studies of young (e.g., Quaternary) systems show age variability within a single volcanic or plutonic sample of several 100’s of ka. This is, however, much less than the percent-level uncertainties of SHRIMP (e.g., 1% for a 400 Ma zircon = 4 Ma) and even less than that of CA-TIMS (e.g., 0.1% uncertainty = 400 ka). Hence, calling for “real age differences in crystallization age” due to “…eruption-aged zircons grains with zircon crystallized earlier in the history of the volcanic edifice” needs to be better justified, and may not advance the understanding of how zircon age variability for comparatively ancient samples arises. There are, however, other ways of generating age heterogeneity in volcanoclastic rocks, for example when ashes are mixed during or after the eruption with materials from other eruptions (please note that this is different from zircon crystallization in the same magma system). In this case, age differences in a single sample may reflect the longevity of a volcanic region, which can span multi-million years. This, however, raises the question of how many crystals need to be analyzed for representative sampling (see the discussion on maximum depositional ages from detrital zircon studies). Lastly, I do not understand why plutonic zircon would intrinsically be more vulnerable to Pb-loss than volcanic zircon (“plutonic zircons, which are more likely to have accumulated enough radiation damage to have undergone minor Pb loss”)? Due to comparatively slow cooling of plutonic rocks relative to volcanic deposits, one may argue that volcanic zircons spend more time at low temperature where they are vulnerable to accumulating radiation damage.
This assumption is admittedly also simplistic, and accumulation of radiation damage will depend on the exact thermal history of the samples, but it serves to demonstrate that the authors’ preference is not at all straightforward. Because of this ambiguity, more explanation is required on what the authors insinuate, especially as U-abundances and structural state of the investigated zircon crystals (e.g., through EBSD, or Raman spectroscopy) are not mentioned or discussed.
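The small-sample argument in the first point above can be sketched numerically. The Python below is an illustration only: the subgroup sizes (11 and 25, chosen to bracket the 0.2 to 0.3% range quoted above), the histogram bin width, and the crude peak-counting rule are all assumptions made here, not values or methods from the manuscript or the review. It computes the standard error of a subgroup mean for an assumed 1% single-sample uncertainty, the mode separation required by a 4-times-standard-error criterion, and how often 35 draws from one Gaussian give a histogram with two or more peaks.

```python
import math
import random


def standard_error(sigma_pct, n):
    """Standard error of the mean of n samples, each with uncertainty sigma_pct."""
    return sigma_pct / math.sqrt(n)


def min_resolvable_gap(sigma_pct, n, k=4.0):
    """Smallest mode separation accepted under a k-times-standard-error rule."""
    return k * standard_error(sigma_pct, n)


def count_histogram_modes(values, bin_width):
    """Count local maxima in a fixed-width histogram (a crude bimodality check)."""
    lo = min(values)
    nbins = int((max(values) - lo) / bin_width) + 1
    counts = [0] * nbins
    for v in values:
        counts[int((v - lo) / bin_width)] += 1
    # Collapse runs of equal counts so a flat-topped peak is counted once.
    profile = [0]
    for c in counts:
        if c != profile[-1]:
            profile.append(c)
    profile.append(0)
    return sum(
        1
        for i in range(1, len(profile) - 1)
        if profile[i] > profile[i - 1] and profile[i] > profile[i + 1]
    )


def fraction_apparently_multimodal(n=35, sigma_pct=1.0, bin_width=0.5,
                                   trials=1000, seed=0):
    """Fraction of single-Gaussian samples whose histogram shows >= 2 peaks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma_pct) for _ in range(n)]
        if count_histogram_modes(sample, bin_width) >= 2:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (11, 25):  # hypothetical subgroup sizes
        print(n, round(standard_error(1.0, n), 3), round(min_resolvable_gap(1.0, n), 3))
    print("fraction with >= 2 histogram peaks:", fraction_apparently_multimodal())
```

Histogram peak counting is sensitive to the bin width, so the simulated fraction should be read as a qualitative indication of how easily apparent modes arise, not as a formal test.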
The manuscript refers to another study in preparation, where a direct comparison of CA-treated zircon analyzed with both methods, and involving reference zircons will be made. I actually expect more insight on the analytical comparability and the assessment of realistic uncertainties from this future study. Due to the significant uncertainties regarding the homogeneity of the natural samples studied here, especially those of pyroclastic origin, there is little new insight, and in fact, the main conclusion stated in this study is that an uncertainty <0.7 % for SHRIMP data is over-optimistic. Hence, in essence, the 20 year old estimate for SHRIMP reproducibility at ~1% appears to be still valid.
Additional comments:
Line 23: This apparent bimodality needs to be statistically verified.
Line 25: “better single-grain age-resolution of TIMS” = this is a bit awkward to read, as the integration of multiple age domains is the main drawback of TIMS. I also doubt if CA-TIMS can resolve genuine pre-eruptive zircon crystallization in the same magma system (= antecrysts) in the age range presented. Even if it did, what would this mean for dating a geological event such as deposition of a tephra (see Keller et al., 2018)?
Line 298: is not included
Line 310: This section is repetitive and tedious to read; this can be condensed to summarizing the main points in a table. In fact, I think section 4.1 which is presently in the discussion, should be presented as the main result.
Line 402: Pb-loss after 3-5 million years seems highly speculative, and not supported by any experimental data on zircon interaction with fluids. As U abundances are not discussed, there is no way to gauge timescales for metamictization, but this is something that the authors should look into and add to the presentation.
Line 405: Yes, I totally agree that this is not statistically robust.
Line 408: I disagree that the shape of the distribution has been assessed in a statistically robust way.
Line 416. Not sure if “p-hacking” is an adequate term; in any case, it has a negative connotation.
Line 456: Why would this be more likely? There seem to be some underlying assumptions here that should be explicitly stated.
Line 457: This sentence is awkward: What are natural ages? What are chemically abraded ages?
Line 465: I am not convinced that this is a valid interpretation.
Line 469: “SIMS geochronology is not the best method in geologic settings where grains may have real differences in crystallization age that are smaller than the precision of a single spot, but larger than the precision of the final age of the pooled spot values.” I don’t agree with this statement, as a bulk method will create artificially small uncertainties for an age that may not have any geological significance (see discussion in Keller et al., 2018, and elsewhere).
Line 478: “improvements in SHRIMP manufacturing and installation may have reduced the fundamental uncertainty associated with the calibration equation” “May have” reads awkward; the data and interpretation in this paper at least do not support this.
Fig. 2: The PDF is based on assigned uncertainties that may or may not be adequate (see comment to Fig. 4)
Fig. 3: I would omit this plot; the fit has a probability of only 0.003, the slope generated is probably an artifact of the data selection, and the results for OG1 show that this relation is invalid (including younger reference zircon would probably also confirm this). “Cherry picking” and “p-hacking”: why even go there?
Fig. 4: MSWD and probability of fit suggest that there is overdispersion/underestimation of uncertainties for the SHRIMP results.
Fig. 5: Statistical testing of the difference/equivalence of both distributions would be required to demonstrate that this distinction is significant (e.g., using a Kolmogorov-Smirnov comparison).
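The two-sample Kolmogorov-Smirnov comparison suggested for Fig. 5 could be sketched as follows. This is a minimal pure-Python illustration, not the authors' analysis: the two lists of SHRIMP minus CA-TIMS offsets are hypothetical placeholder values, and the p-value uses the standard asymptotic series with a small-sample correction, which is only approximate for groups of this size.

```python
import math


def ks_statistic(a, b):
    """Maximum distance between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        x = min(a[i], b[j])
        # Step both empirical CDFs past x so ties are handled together.
        while i < n1 and a[i] <= x:
            i += 1
        while j < n2 and b[j] <= x:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    return d


def ks_pvalue(d, n1, n2, terms=100):
    """Approximate two-sided p-value from the asymptotic Kolmogorov series."""
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:  # the series converges poorly here and the value is ~1
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    # Hypothetical age offsets in percent; NOT data from the manuscript.
    volcanic = [0.9, 0.4, 1.2, 0.1, 0.7, 0.5, 1.0, -0.2]
    plutonic = [-0.3, 0.2, -0.6, 0.1, -0.1, 0.4, -0.5, 0.0]
    d = ks_statistic(volcanic, plutonic)
    print("D =", round(d, 3), "p ~", round(ks_pvalue(d, len(volcanic), len(plutonic)), 3))
```

A small p-value would support treating the two groups as different distributions; a large one would mean the data do not distinguish them.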
Additional references:
Burgess, S. D., Coble, M. A., Vazquez, J. A., Coombs, M. L., & Wallace, K. L. (2019). On the eruption age and provenance of the Old Crow tephra. Quaternary Science Reviews, 207, 64-79.
Keller, C. B., Schoene, B., & Samperton, K. M. (2018). A stochastic sampling approach to zircon eruption age interpretation. Geochemical Perspectives Letters (Online), 8(LLNL-JRNL-738859).
Citation: https://doi.org/10.5194/gchron-2022-20-RC1 -
AC1: 'Author responce to RC 1 and 2', Charles Magee, 17 Oct 2022
Reply to referee reports:
We thank both reviewers for their insightful reviews. In response, we would like to take this opportunity to improve the structure of the paper. The source of dispersion (excess to counting stats) in U-Pb SIMS data has been debated for decades; this debate can most succinctly be summarized by the arguments of Black and Jagodzinski (2003) on the one hand, who say that an excess error term needs to be applied if the source cannot be identified, and Compston (2000) on the other, who argues that all the scatter can be traced to bad zircons. We didn’t include either paper in our introduction as the actual data they were discussing (multi-grain pre-CA TIMS aliquots vs SHRIMP 1 data) are obsolete, but the ideas are still current. Indeed, Black and Jagodzinski (2003) say in their subsection “The Way Ahead”:
“ There are two alternative courses of action to adopt once it is accepted that uncertainties exceeding those predicted from counting statistics can be generated as part of the SHRIMP analytical process. The first is to empirically quantify the magnitude of variation by means of replicate analyses, and then to use SHRIMP only for those projects where such variation (e.g. 1–2%) is acceptable. The other approach is to delve as deeply and objectively as possible into the various sources of uncertainty in SHRIMP dating, so that they can be identified, understood and ultimately either minimised or removed altogether.”
Moving our treatment of uncertainty from the intro to the methods (as reviewer 2 suggests) will allow us to streamline the introduction to highlight how we can now examine both calibration issues and inhomogeneous natural sample issues using our data of doubly dated zircons.
As both reviewers recommend removing figure 3 and minimizing the associated analysis, we can replace this with discussion about the calibration. In the current manuscript we skip that analysis and go straight to looking at natural mineral-based explanations, which is jarring when you consider how much of the introduction is spent explaining the calibration. We have re-examined the calibration-related data enough to know that there is information there to discuss, and this would make the paper more interesting and help tease out the competing factors of calibration quality vs geological accuracy.
This leads us to the relevance and usefulness of considering sub-percent SIMS data. It should not be surprising that most of our data are reported with sub-percent precision, as most current SIMS U-Pb data falls into this category. In both rounds of the recent G-Chron U-Pb proficiency testing (Webb et al., submitted), the reported median 2sig uncertainty for the 206Pb/238U SIMS age was 0.5% in round 1 and 0.6% in round 2. As our dataset contains many samples in this range, it is worth considering how SIMS data in that range of precision compares to CA-ID-TIMS.
This leads us to the questions about the source and application of uncertainty that are central to the comments from reviewer 1 about the statistical validity of our data analysis.
Using the data as reported, even with the exclusion of the outliers discussed in section 4.2.2, the weighted mean of the reported data has an MSWD of 1.9 and a probability of fit of 0.002, indicating that it is not a homogeneous population. Obviously the probability of fit can be increased by arbitrarily increasing the uncertainties of each age; increasing them to 2% gives an MSWD of 0.35 and a 100% probability of fit, for example. But as Reiners et al. (2017) suggest in chapter 4, hiding dispersion by the use of excess error should only be done if a physical explanation cannot be found; this is why it is important to consider explanations related to the physical samples. As far as our hypothesis that the data contains two populations is concerned, we would like to point out that the mixture-modelling approach of Sambridge & Compston (1994) shows that the statistically most likely population split is similar to that derived from dividing the samples into intrusive and extrusive rocks.
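The dependence of the MSWD on the assigned uncertainties can be illustrated with a short sketch. The offsets and uncertainties below are hypothetical placeholders, not the reported data; the point is only that the MSWD about the weighted mean scales with the inverse square of a uniform inflation factor applied to the uncertainties.

```python
import math


def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its 1-sigma uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    wsum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / wsum
    return mean, math.sqrt(1.0 / wsum)


def mswd(values, sigmas):
    """Mean square of weighted deviates about the weighted mean."""
    mean, _ = weighted_mean(values, sigmas)
    chi2 = sum(((v - mean) / s) ** 2 for v, s in zip(values, sigmas))
    return chi2 / (len(values) - 1)


if __name__ == "__main__":
    # Hypothetical SHRIMP minus CA-TIMS offsets (%) and 1-sigma uncertainties (%).
    offsets = [0.8, -0.4, 1.1, 0.2, -0.9, 0.6, -0.1, 1.3]
    sigmas = [0.5, 0.4, 0.6, 0.5, 0.4, 0.5, 0.6, 0.5]
    base = mswd(offsets, sigmas)
    inflated = mswd(offsets, [2.0 * s for s in sigmas])
    # Doubling every uncertainty divides the MSWD by four.
    print(round(base, 3), round(inflated, 3), round(base / inflated, 3))
```

Because any sufficiently large inflation drives the MSWD below 1, a low MSWD obtained this way says nothing about where the dispersion comes from.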
Obviously we cannot accommodate Reviewer 1’s request to both cull data and have a larger dataset. One result of compiling this data, which was first done in the late 2010’s, is that it tells us which samples are not likely to yield geologically useful ages from SHRIMP analyses, and should be sent straight to a CA-ID-TIMS lab. As a result, the incidence of double dating has been reduced, and there is only one additional doubly dated sample which has appeared since 2016. As it wasn’t available until the manuscript was in internal review, and it is yet another outcrop of the Emmaville Volcanics, which are already over-represented relative to the rest of the Australian continent in this study, we didn’t add it in between reviewers. However, we would be happy to include it in the final paper. As for culling data, we feel that a discussion of the calibration behaviour and quality (see above) will help put various outliers in context.
We appreciate reviewer 1’s suggestion of Burgess et al. (2019), and agree that it is a great paper on Pleistocene tuff dating. However, the papers on Permian tuffs cited by us which describe the zircons discussed in this paper are more representative of the problems we have in these rocks. For example, figure 12 in Metcalfe et al. (2015) shows that most of the samples in this drill core which have been analysed by CA-ID-TIMS contain zircon crystals which predate the eruption age. A particularly striking example is the second lowest tuff, GA2122738, with an eruption age of 254.34 +/- 0.08 Ma. The next three tuffs above this in the drill core contain inherited (or contaminant) zircons whose age is within uncertainty of the GA2122738 eruption age. These grains are excluded from the weighted mean eruption age in each case because the TIMS is precise enough to identify them.
The uppermost tuff in this sequence (GA2122750) is one of the SHRIMP analyses in which the SHRIMP age is older than, and not within uncertainty of, the CA-ID-TIMS age. However, the old outlier grain in the CA-ID-TIMS analysis of GA2122750 is within error of the SHRIMP age, as are the next six tuffs lower down in the drill hole. The 2sig uncertainty on each individual SHRIMP spot for this sample is on the order of 5 Ma, so distinguishing spots on antecrysts instead of eruption-age zircons by U-Pb date alone is impossible, allowing accidental antecryst analyses to bias the SHRIMP ages older.
We are happy to incorporate the line-by-line corrections not addressed above as appropriate when revising the paper.
References cited:
Black, L. P., Jagodzinski, E. A. Importance of establishing sources of uncertainty for the derivation of reliable SHRIMP ages, Australian Journal of Earth Sciences, 50:4, 503-512, 2003. DOI: 10.1046/j.1440-0952.2003.01007.x
Compston, W. Interpretations of SHRIMP and isotope dilution zircon ages for the geological time-scale: 1. The early Ordovician and late Cambrian, Mineralogical Magazine, 64:1, 43-57, 2000.
Reiners, P., Carlson, R., Renne, P., Cooper, K., Granger, D., McLean, N., Schoene, B. Interpretational approaches: making sense of data, in Geochronology and Thermochronology, Wiley, 2017. DOI: 10.1002/9781118455876.ch4
Sambridge, M. S., Compston, W. Mixture modelling of multi-component data sets with application to ion-probe zircon ages, EPSL, 128, 373-390, 1994.
Webb, P., Wiedenbeck, M., Glodny, J. An International Proficiency Test for U-Pb Geochronology Laboratories - Report on the 2019 Round of G-Chron based on Palaeozoic Zircon Rak-17, Scientific Technical Report STR - Data 21/06, ISSN 2190-7110, G-Chron 2019 - Round 1, submitted.
-
RC2: 'Comment on gchron-2022-20', Yuri Amelin, 08 Sep 2022
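This manuscript presents a cross-calibration between two widely used techniques of U-Pb dating: isotope dilution TIMS assisted with chemical abrasion, and SIMS. Such studies are needed to ensure that the global geochronological dataset is consistent. Still, these studies are rare. Furthermore, previous similar comparisons were performed before the recent developments in both dating techniques, and thus have only limited applicability to modern studies.
Overall, I think this is an important and generally very good paper that can be published after minor revisions. The data, both previously published and new, are high quality, and the interpretations are generally sound. Some questions and suggestions for further improvements are below.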
General comments
- It is important to emphasize two points about the “older – younger” distinction. First, the time interval between two points in time, e.g. a CA-ID-TIMS date and a SHRIMP date, is only meaningful if the value of this interval is greater than its uncertainty. For example, the age difference of 5±4 Ma is significant and can be interpreted and discussed, whereas the age difference of 5±6 Ma has to be interpreted as two events occurring simultaneously. Second, the above distinction depends on the confidence level of the uncertainties, and this confidence level must be specified. It would be even better not only to report the confidence level, but also to justify its choice. I suggest that the treatment of age intervals and their uncertainties should be described in a separate section in the Methods.
- I suggest exploring possible correlations between the SHRIMP − ID-TIMS age difference and parameters such as U concentration, radiation dose (estimated from U, Th and Sm concentrations and the age), and Th/U ratio. This could possibly add new dimensions to the story.
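The first general comment above can be expressed as a short calculation. The sketch below is illustrative: the function names and the example ages are hypothetical, and it assumes the two uncertainties are independent and quoted at the same confidence level, so that they can be combined in quadrature.

```python
import math


def age_difference(age_a, unc_a, age_b, unc_b):
    """Difference between two ages and its uncertainty (quadratic sum).

    Both uncertainties must be at the same confidence level (e.g. both 95%).
    """
    return age_a - age_b, math.hypot(unc_a, unc_b)


def is_resolvable(diff, unc):
    """True if the interval exceeds its own uncertainty."""
    return abs(diff) > unc


if __name__ == "__main__":
    # The two cases quoted in the comment: 5 +/- 4 Ma and 5 +/- 6 Ma.
    print(is_resolvable(5.0, 4.0))  # True: the difference can be interpreted
    print(is_resolvable(5.0, 6.0))  # False: the two dates are indistinguishable
    # Hypothetical pair of ages in Ma, uncertainties at the same confidence level.
    diff, unc = age_difference(255.0, 3.0, 250.0, 4.0)
    print(diff, unc, is_resolvable(diff, unc))
```

When one uncertainty is much smaller than the other, as for CA-TIMS relative to SHRIMP, the quadratic sum is close to the larger term alone.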
Specific comments
Lines 45-46. To what extent is the study of Jeon and Whitehouse (2014) relevant to SHRIMP usage? Is the difference between the Cameca and SHRIMP design sufficiently big to make their results inapplicable to SHRIMPs?
Lines 78-80. The question here is when the uncertainty of calibration is applied: before or after averaging the sample spot analyses. I think the latter is the correct approach, as it prevents artificial uncertainty reduction due to repetition. The same dilemma exists, and is widely acknowledged, in Ar-Ar geochronology.
Lines 85-91. Make it clear that you talk about age (or 238U/206Pb) standards here. Temora-2 is indeed superior to SL13 as an age standard. However, as a concentration standard SL13 is significantly better than any Temora zircon. This is why modern SHRIMP studies use both standards together, each to its strength.
Line 102. Another good paper on the basics of chemical abrasion is Mattinson (2011) Extending the Krogh legacy: development of the CA–TIMS method for zircon U–Pb geochronology, Canadian Journal of Earth Sciences v.48, pp.95-105 (the special volume dedicated to memory of Tom Krogh).
Lines 131-134. There are pros and cons in using samples collected for geological problem solving vs. dedicated natural standards. The downside of the “in the wild” approach here is that the “geological” uncertainty is typically greater compared to using natural reference materials (i.e., the best preserved and most homogeneous minerals). It would be interesting to discuss this topic in more detail.
Lines 138-139. Consider recalculating SHRIMP ages using the age of Temora zircon reported by Schaltegger et al. (2021) JAAS DOI: 10.1039/d1ja00116g. The difference is likely to be small, but this would still make the data a bit more accurate.
Line 147. Metcalfe (a typo). The same in line 302.
Lines 233-234. Sounds like zircon solutions were put into anion exchange separation in 6M HCl. This does not make sense, and is not consistent with the Krogh (1973) chemistry or its later adaptations. Pb does not stick to the resin in this medium.
Line 235. Eluted together in what medium?
Lines 373-374. Strictly speaking, you should use a quadratic sum of both confidence intervals. The difference from using SHRIMP confidence interval would be small, however.
Section 4.1. I think it should be part of “Results” rather than “Discussion”.
Lines 430-434. Extrapolation of the trend defined from a narrow spread of values (in this case, the age) to a much wider range, such as shown in Fig. 3, is usually unreasonable. At the very least, show the uncertainty envelope for the entire range from 0 to 3500 Ma, not just ~100-500 Ma as it is done now. This will immediately and clearly show how much significance this slope really has.
Lines 424-438. About OG1. All data that are discussed in a paper must be introduced in “Results”. It is not permissible to introduce any new data in the “Discussion” section. Hence add a brief section about OG1 to the “Results”.
Lines 434-435. Add references to support this statement.
Lines 436-438. Consider the distribution of radiation damage in understanding how chemical abrasion works. The damage can vary from individual recoil tracks (of ca. 100-150 nm) to U-rich bands in oscillatory zoning (microns or wider). Dissolution of individual recoil tracks would not make any visible changes in the zircon, but would impact the U-Pb system.
Lines 455-456. The extent of radiation damage can be estimated with the data available in this study, without any additional measurements (especially if we consider U and Th but ignore Sm, which may be a sensible approach). It would be good to search for any possible correlations between radiation damage and U-Pb systematics.
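A radiation-dose estimate of the kind suggested in the last comment could be sketched as below. This is a minimal illustration under stated assumptions, not a method taken from the manuscript: it uses the commonly applied alpha-dose expression (8, 7 and 6 alpha decays per 238U, 235U and 232Th chain), present-day U and Th concentrations in µg/g, the conventional decay constants and 238U/235U = 137.88, ignores Sm, and assumes no annealing since crystallization. The example concentrations are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23          # atoms per mole
LAMBDA_238 = 1.55125e-10          # per year
LAMBDA_235 = 9.8485e-10           # per year
LAMBDA_232 = 4.9475e-11           # per year
U238_U235 = 137.88                # conventional present-day atomic ratio
MASS_U = 238.029                  # g/mol, natural uranium
MASS_TH = 232.038                 # g/mol


def alpha_dose_per_gram(u_ppm, th_ppm, age_ma):
    """Accumulated alpha-decay events per gram of zircon, with no annealing."""
    t = age_ma * 1.0e6
    u_atoms = u_ppm * 1.0e-6 / MASS_U * AVOGADRO      # present-day U atoms per gram
    th_atoms = th_ppm * 1.0e-6 / MASS_TH * AVOGADRO   # present-day Th atoms per gram
    n238 = u_atoms * U238_U235 / (U238_U235 + 1.0)
    n235 = u_atoms / (U238_U235 + 1.0)
    return (8.0 * n238 * math.expm1(LAMBDA_238 * t)
            + 7.0 * n235 * math.expm1(LAMBDA_235 * t)
            + 6.0 * th_atoms * math.expm1(LAMBDA_232 * t))


if __name__ == "__main__":
    # Hypothetical zircon: 500 ppm U, 250 ppm Th, 400 Ma.
    print("%.2e alpha events per gram" % alpha_dose_per_gram(500.0, 250.0, 400.0))
```

Doses computed this way for each analysed grain or spot could then be compared against the SHRIMP minus CA-TIMS age offset to look for the correlations mentioned above.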
Citation: https://doi.org/10.5194/gchron-2022-20-RC2 -