Comment on gchron-2021-22

Multi-kinetic effects in AFT thermochronology have long been neglected by much of the community, I gather in large part because it entails more trouble and expense to acquire sufficient compositional data, and the rewards are unclear, especially since thermal history inversion software will often produce a result without it. Hopefully this paper, and others from this group, will bend the curve. At the same time, to be effective in doing so (or at least transparent in trying), it would be good to better document the costs. For example, how long does the EMPA protocol take per spot? What considerations went into decision(s) of whether to do compositional analysis first versus laser ablation? Does doing laser ablation first save time, by figuring out which grains work and providing evidence of whether there is kinetic dispersion, and does this outweigh the disadvantage of not getting the analysis precisely where the tracks were measured? Are there cases where changing the order would be a good idea? Provide the reader with more data and use cases to enable a cost-benefit analysis.


[line 249] Although replicate values are indeed important for assessing the reproducibility of kinetic parameter values, they may also be taken as an indication of the presence of zoning. The authors do not specify how many spots they took per analysis, but I suspect the answer is one, and that it reflects the usual 2-µm activation zone for EMP; was this driven by the desire for a faster and/or less expensive analysis? Likewise, how many Dpar measurements are averaged for each Dpar determination? The usual procedure is to average four, which ought to make the reproducibility better than observed in Fig. 2c. Also, it's a little unfortunate that the discussion of the downsides of this procedure (lines ~326-340: a compositional analysis might not be obtained near the counting area, or for the grain at all, I gather partly due to the LA-ICP-MS spot) is in the next section; the authors can probably clarify and condense things by briefly mentioning these here, and then referring to them in section 2.3.
[line 269] Although Dpar imprecision is certainly responsible for a lot of the scatter in Fig. 2e, it's not clear it's the main reason; the authors might try only plotting the points within the 20% bars in Fig. 2c and seeing what the Dpar vs. eDpar scatter looks like. The even scatter might simply be an indication that the things that throw Dpar off are bidirectional; a little OH might increase resistance to annealing compared to no OH (i.e. F-apatite), and a lot of OH might decrease it (e.g., OH-apatite HS from Carlson et al. (1999)), but the more OH you have the higher Dpar is.
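To make the suggested check concrete, here is a minimal sketch of what I have in mind. It assumes that the 20% bars in Fig. 2c mark replicate Dpar pairs agreeing to within 20% of their mean, and that each grain has two replicate Dpar values plus one eDpar value; the record keys (`dpar_1`, `dpar_2`, `edpar`) and function names are hypothetical, not taken from the manuscript.

```python
from math import sqrt


def within_tolerance(a, b, tol=0.20):
    """True when two replicate values differ by no more than tol of their mean."""
    mean = (a + b) / 2.0
    return abs(a - b) <= tol * mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance; correlation undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def filtered_correlation(records, tol=0.20):
    """Keep grains whose replicate Dpar values agree within tol, then
    correlate their mean Dpar against eDpar. Returns (n_kept, r)."""
    kept = [r for r in records if within_tolerance(r["dpar_1"], r["dpar_2"], tol)]
    xs = [(r["dpar_1"] + r["dpar_2"]) / 2.0 for r in kept]
    ys = [r["edpar"] for r in kept]
    return len(kept), pearson_r(xs, ys)
```

If the correlation for the filtered subset is not much better than for the full data set, that would suggest Dpar imprecision is not the main source of the scatter.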
[Figure 2] Maybe smaller symbols would be better to avoid some of the "solid cloud" effect; some "N =" annotations also would not hurt, and maybe correlation coefficients for d and e.
[line 291] "colour-coded"
[line 295] It may be worth noting that compositional populations may also be good candidates for shared inheritance. Although eCl is one such possibility, insofar as it combines a number of compositional variables into one number, apatites with similar eCl may get there via different compositional components, and thus not constitute a good candidate for shared inheritance. This is discussed further below.
[line 333-337] Maybe here or elsewhere, discuss the choice between switching which bin a grain is in, versus leaving the grain out altogether.
[line 438] The claim that population 3 has retained tracks from 540 Ma, or from about 245°C (Figure 6) is eye-catching, and probably overly optimistic about the ability of AFT to retain information about such high temperatures. It appears to stem from a difference in how AFTINV evaluates total annealing versus HeFTy's "oldest track". HeFTy assumes total annealing after reduced mean length falls below 0.4095 for non-projected lengths, corresponding to a mean length of just under 7 µm, whereas AFTINV appears to have total annealing correspond to a mean track length of 2 µm (line 419). This may be based on a slight misinterpretation of what's written in Ketcham et al. (2000); the 2 µm limit mentioned there corresponds to the smallest track that can appear in a track length distribution. However, such occurrences are due to including a population of tracks with a higher mean and large standard deviation. The 0.4095 value arises in part from the observation that no annealing experiments reported by Green (1988) or Carlson et al. (1999) had a mean length below 7 µm (although there are some 6's and 5's reported by Barbarand et al. (2003), and even an occasional 4 or 3 by their Analyst 3). Willett (1997) uses a similar value of 0.428 as the zero-density intercept of reduced length versus density reported by Green (1988). In other words, by the time a mean length falls below some limit, the track population becomes undetectable. I believe this provides a more realistic basis for evaluating total annealing and the oldest retained track. Using the revised criterion, the TA for the oldest track for an rmr0 = 0.491 apatite is closer to 200°C, which seems a lot more reasonable considering the closure temperature is 161°C. This is not the most crucial of issues, but it's prudent to avoid distracting claims.
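The difference between the two criteria can be illustrated with a short sketch. The initial mean track length of 16.3 µm is my assumption for illustration only (the appropriate value depends on the apatite and the calibration), and the function names are hypothetical rather than taken from either program.

```python
# Assumed initial (unannealed) mean track length in micrometres.
# This value is an illustrative assumption, not taken from the manuscript.
L0_UM = 16.3

# Reduced-length threshold below which HeFTy treats a population as
# totally annealed (non-projected lengths), as quoted in the comment above.
HEFTY_REDUCED_LIMIT = 0.4095


def mean_length_um(reduced_length, l0=L0_UM):
    """Convert a reduced mean length (l / l0) to a mean length in micrometres."""
    return reduced_length * l0


def is_totally_annealed(mean_len_um, reduced_limit, l0=L0_UM):
    """True when the mean length falls below the given reduced-length limit."""
    return mean_len_um / l0 < reduced_limit


# Reduced-length limit equivalent to a 2 µm mean-length criterion.
two_um_limit = 2.0 / L0_UM

# A population with a 5 µm mean length is gone under the 0.4095 criterion
# but still counted as retained under the 2 µm criterion.
retained_under_2um = not is_totally_annealed(5.0, two_um_limit)
annealed_under_04095 = is_totally_annealed(5.0, HEFTY_REDUCED_LIMIT)
```

Under these assumptions the 0.4095 limit corresponds to a mean length a little below 7 µm, so any population with a mean length between 2 µm and that value is treated as retained by one criterion and as totally annealed by the other.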
[Figures 6, 7] I appreciate the authors' efforts to incorporate the CRS method into AFTINV, and am intrigued by the result; it looks to be a powerful addition. I have long been considering doing something similar myself, having dropped the CRS method when I converted my earlier program AFTSolve to HeFTy. However, one of the reasons I did so may still be evident in the model results here. The CRS method has a tendency to quickly converge to a relatively smooth solution that does not explore the solution space as well as the Monte Carlo method, and thus does not map out the range of solutions that fit well. In HeFTy results, the Monte Carlo approach allows the resolving power of the data to be evaluated by looking at the width of the solution envelopes. In the results here, what puzzles me for P013-12 is the relatively tight band of good solutions above 175°C from 600-450 Ma, and probably a fair bit younger/cooler than that. Given the 161°C closure temperature of the most resistant population, the idea that it would exert much constraint in the 175-250°C temperature range seems improbable, and is not reflected in the QTQt results either. This all is not necessarily a problem, but I think it should be discussed so people interpreting these results have a more complete knowledge of what they are looking at.
Along similar lines, did both the AFTINV and QTQt models assume that all apatites in each sample had the same inherited, pre-depositional history? If so, was the fact that they did so, and their success in fitting their models, an indication that there was shared provenance, or an indication that, for these samples, results are not terribly sensitive to the pre-depositional history? Or, are the results sensitive: do the few earlier-cooling 0.5 paths for P013-12 correspond to the earlier peak T's at ~195 Ma and/or ~70 Ma?
The manual (AFTINV) and automatic (QTQt) raising of the rmr0 values for the most resistant populations in each sample is interesting. What seems to be going on is that the different populations need greater separation in their partial annealing zones to produce their respective divergent age and length distributions. It's further interesting that the higher resistance is corroborated by the vitrinite data for sample LHA003, though less so for P013-12. The authors' recommended approach of "anchoring" on low-resistance kinetics seems like a good one. Another possible "advantage" of the Ketcham et al. (1999) model over the (2007) one beyond the different rmr0 equation is that it has a much higher temperature range, which these results may imply is necessary to create these divergent populations.
Lastly, the comparison between AFTINV and QTQt results appears to gloss over their differences a bit. For P013-12, the first reheating peaks at ~168 Ma in AFTINV and could go as far back as 195 Ma, whereas QTQt appears to strongly say that it was at about 140 Ma. Similarly, AFTINV implies that the first peak reheating for LHA003 was at 345 Ma, compared to 300 Ma for QTQt. If you lay the model pairs on top of each other, they appear to exclude each other at these times. Is this because QTQt calculated different kinetics than the manually-shifted ones in AFTINV, or because of QTQt favoring simpler histories, or some combination of these and possibly other factors?