the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
(anchored) isochrons in IsoplotR
Abstract. Isochrons are usually fitted by 'York regression', which uses a weighted least squares approach that accounts for correlated uncertainties in both variables. Despite its tremendous popularity in modern geochronology, the York algorithm has two important limitations that reduce its utility in several applications. First, it does not provide a satisfactory mechanism to deal with socalled 'errochrons', i.e. datasets that are overdispersed with respect to the analytical uncertainties. Second, York regression is not readily amenable to anchoring, in which either the slope or the intercept of the isochron is fixed based on some external information. Anchored isochrons can be very useful in cases where the data are insufficiently spread out to constrain both the radiogenic and nonradiogenic isotopic composition.
This paper addresses both of these issues by extending a maximum likelihood algorithm that was first proposed by Titterington and Halliday (1979, Chemical Geology 26.34: 183195). The new algorithm offers the ability to attribute any excess dispersion to either the inherited component ('model 3a') or to diachronous closure of the isotopic system ('model 3b'). It provides an opportunity to anchor isochrons either to a fixed age, or to a fixed nonradiogenic composition. Last but not least, it allows the user to attach meaningful analytical uncertainty to the anchor. The new method has been implemented in IsoplotR for immediate use in Ar/Ar, Pb/Pb, U/Pb, Th/Pb, Rb/Sr, Sm/Nd, Lu/Hf, Re/Os, K/Ca and UThHe geochronology.
 Preprint
(589 KB)  Metadata XML
 BibTeX
 EndNote
Status: final response (author comments only)

RC1: 'Comment on gchron20245', Donald Davis, 07 Mar 2024
I found this manuscript to be very interesting, although more for what it omits than for what is in it. According to the abstract, the manuscript presents a maximum likelihood approach for improving estimates of age and initial composition for isochron data that are problematic due to clustering and/or excess scatter (data that are well distributed along an isochron but give a high MSWD). For scattered data an approach is described for cases involving constant age but variable initial ratio (model 3a) and constant initial ratio but variable age (model 3b) but it is only applied to clustered data where the MSWD is not really that high. This initially created some confusion in my mind as to what the manuscript was about. Calculating estimates of the range of initial ratios for model 3a and the range of ages for model 3b seems to me a much more interesting problem, which doesn’t seem to get addressed.
The author’s presentation is succinct and elegant, although I am afraid that it will not be transparent to most geochronologists without a strong mathematical background (including me). We are fortunate as geochronologists to have had people like York, Ludwig and the present author who have taken it upon themselves to provide us with the tools necessary for interpreting our data. I hope that my critique will improve the manuscript and I apologize in advance for anything that I have misunderstood.
As explained in the manuscript, data from measurements of radioactive parent, radiogenic daughter and nonradiogenic daughter isotope ratios from a suite of samples having variable parent/daughter elemental ratios should be distributed along a line called an isochron, provided that every sample has the same age and formed with the same isotopic composition of the daughter element (such as minerals that crystallized from the same magma). The isochron represents a mixing line between a hypothetical nonradiogenic end member and an end member where all of the daughter element is radiogenic. In conventional isochrons (e.g. Fig 1 i) of the manuscript), the latter end member is at infinity, but the age can be derived from the slope of the line, whereas the nonradiogenic component is given by the Y intercept. An equivalent ‘inverse’ isochron (e.g. Fig 1 iv) can also be constructed in which both age and daughter element initial isotope ratio are represented by intercepts with the X and Y axis, respectively.
Classical regression methods such as described in the manuscript have been developed that use the measured data and assigned errors to determine a bestfit isochron that gives the most likely values for age and intercept along with their errors. An equally important parameter determined by the regression program is the scatter of data around the bestfit line, which is usually quantified by the mean of the squares of weighted deviations (MSWD, which is the reduced chi square value and can be converted to a probability of fit). Two practical problems that can arise are when data show excess scatter relative to assigned errors around the best fit isochron (MSWD significantly greater than 1) and when data show limited dispersion along an isochron (clustering). It seems to me that only the latter problem can be addressed effectively by anchoring.
Clustered data provide poor constraints to an isochron so regression usually gives large errors for both parameters. Data clustering is similar to having multiple measurements of a single aliquot, approximating a single datum that defines a relationship between the age and primary composition so one cannot determine one without knowledge of the other. A way of determining an age in this case is to add an artificial (not derived from the data) estimate of the primary composition with an equally artificial error. This is the usual meaning of ‘anchoring’ and effectively gives a ‘model’ age. The manuscript suggests that by including extra error parameters in expressions for either the unknown age or primary isotopic composition, errors in the values used for anchoring can be calculated and parameter estimates are improved. It is obvious from the inverse isochron that both parameters cannot be anchored at the same time because that would define a unique line independent of the data.
The object of statistical analysis is to determine how much useful information can be derived from a data set without having to introduce arbitrary conditions. In the case of clustered data, determining a precise age still requires the ‘hack’ of introducing an initial ratio value that is not well constrained by the data, although it may be derived from an assumed model of average crustal composition or from measurements on related samples. Determining an error for this anchor amounts to solving a conditional probability problem (given that the intercept is the anchor, how much can it be allowed to vary and still be compatible with the data). Not having any anchor (normal regression) gives a large error. Positing an anchor restricts the intercept error, therefore will restrict the error in the other parameter as shown in Fig 2. This may be useful in some cases but it still produces what is effectively a model age. The approach would be made more convincing if it could be shown that the same average errors are produced from many random data sets.
High MSWD, provided that it is not the result of underestimated analytical errors or biased data, indicates a probable violation of the assumption that the data reflect a single distinct isochron. Somewhat surprisingly, scatter has no effect on isochron age and primary composition errors, which depend only on errors in the measured data, but their accuracy is questionable when there is a low probability of fit. This does not necessarily mean that the data are useless. Accurate data always contain useful information and it is the job of statistics to extract it. In this case, the scatter could be explained as being due to variation in either age or initial isotopic composition of the aliquots or both. This could be tested if one had a large, widely distributed data set. On a conventional isochron diagram, if one were dealing with a case of fixed initial isotopic composition but diachronous formation of samples over a fixed time range (model 3b in the manuscript), the data should fan out from a point on the Y axis representing the initial ratio and it should be bounded by lines whose slopes give the time range of formation. Conversely, if the samples all have a single age but formed with a range of initial isotopic compositions (model 3a), the data should form a band that intersects the Y axis across a range approximating the range in initial ratios and where the regression slope should approximate the age. The inverse isochron is most useful because it can be readily used to analyze cases where both age and initial ratio are variable. A densely populated data set would form a band whose intersection with the Y axis gives the range in initial ratios and whose intersection with the X axis gives the range in ages. Most data sets are too limited to show the patterns described above but may still be useful for determining the most consistent parameter estimates.
I have found that relationships expressed by equations are most easily understood by being visualized on a plot, in this case a probability density plot laid over the isochron diagram. Adding a third axis representing data probability density to the isochron diagram, allows the sum of normally distributed probability density peaks around each of the data to be visualized as a 3D surface, which will be quite bumpy if data are sparse. Dividing the volume beneath this surface by the number of data normalizes it. As mentioned above, on the inverse isochron the upper and lower limits of the ranges in age and initial ratio could be estimated by a pair of isochrons that encompass say, 95% of the volume. Generally these will not be parallel because the ranges in age and initial ratio will be different. Thus, in this case unlike that of clustering, models 3a and 3b can be assumed to apply together because the data produce a welldefined isochron, even if they are scattered, which can be made meaningful as an average in the context of the models. Again, the approach in the manuscript estimates errors in knowledge of an anchoring (assumed constant) parameter used for clustered data. This is distinct from the most probable range of a variable parameter with dispersed scattered data. The latter could be important information, so it would be interesting if IsoplotR could estimate it.
Maximum likelihood is a powerful statistical approach but it is based on the assumption that input values and parameter estimates follow normal distributions that can be characterized by variances. This is likely to be true for measured data. However, it is unlikely to be true for parameters like initial ratios and age range if these are variable. In this case, or in any case of high MSWD, an alternative approach would be to use nonparametric statistics such as the robust regression method in Ludwig’s Isoplot. It is a question of whether the advantage of improved stability of the mode versus the mean offsets the disadvantage of abandoning knowledge of the measurement errors. It would be interesting to read any comments by the author on the applicability of this approach.
‘(anchored)’ should be capitalized without brackets in the title. If the comments above have sufficient merit to lead to expanding the program, it would be better to make the title more general like ‘Interpreting scattered and clustered isochron data using IsoplotR’.
Line 24: ‘inverse’ should be capitalized at the beginning of a sentence.
Line 199: “no” should be “not”.
Line 204: Remove one “as”.
Line 333: Schaen et al. reference should be to Bulletin of the Geological Society of America.
Table B2 caption: Synthetic data for Figures 1 and 2.
Don Davis
Citation: https://doi.org/10.5194/gchron20245RC1 
AC1: 'Reply on RC1', Pieter Vermeesch, 09 Apr 2024
Dr. Davis's detailed review nicely summarises the key points of my paper whilst making some very useful suggestions, which are addressed below.
Comment 1:
I found this manuscript to be very interesting, although more for what it omits than for what is in it. According to the abstract, the manuscript presents a maximum likelihood approach for improving estimates of age and initial composition for isochron data that are problematic due to clustering and/or excess scatter (data that are well distributed along an isochron but give a high MSWD). For scattered data an approach is described for cases involving constant age but variable initial ratio (model 3a) and constant initial ratio but variable age (model 3b) but it is only applied to clustered data where the MSWD is not really that high. This initially created some confusion in my mind as to what the manuscript was about. Calculating estimates of the range of initial ratios for model 3a and the range of ages for model 3b seems to me a much more interesting problem, which doesn’t seem to get addressed.
Response 1:
Dr. Davis is right. I will add a section with benchmark tests for overdispersed samples.
Comment 2:
The author’s presentation is succinct and elegant, although I am afraid that it will not be transparent to most geochronologists without a strong mathematical background (including me). We are fortunate as geochronologists to have had people like York, Ludwig and the present author who have taken it upon themselves to provide us with the tools necessary for interpreting our data. I hope that my critique will improve the manuscript and I apologize in advance for anything that I have misunderstood.
Response 2:
I thank Dr. Davis for listing my name among those of Derek York and Ken Ludwig. I have found their papers very useful because they provide all the necessary technical detail to write computer code. I hope that others will find my papers useful in the same way. Whilst the comparison to York and Ludwig is flattering, it is too much honour. Reviewer 2 (Dr. Rudge) correctly points that my paper is a straightforward application of maximum likelihood theory to geochronology. It is neither fancy nor difficult but useful nonetheless, I hope.
Comment 3:
The object of statistical analysis is to determine how much useful information can be derived from a data set without having to introduce arbitrary conditions. In the case of clustered data, determining a precise age still requires the ‘hack’ of introducing an initial ratio value that is not well constrained by the data, although it may be derived from an assumed model of average crustal composition or from measurements on related samples. Determining an error for this anchor amounts to solving a conditional probability problem (given that the intercept is the anchor, how much can it be allowed to vary and still be compatible with the data). Not having any anchor (normal regression) gives a large error. Positing an anchor restricts the intercept error, therefore will restrict the error in the other parameter as shown in Fig 2. This may be useful in some cases but it still produces what is effectively a model age. The approach would be made more convincing if it could be shown that the same average errors are produced from many random data sets.
Response 3:
In some cases, anchored isochrons are crucially important. One example is the LuHf system, whose inherited composition invariably hovers around a 177Hf/176Hfratio of 3.55 +/ 0.05. I can add an example of such a dataset to the revised manuscript.
Comment 4:
I have found that relationships expressed by equations are most easily understood by being visualized on a plot, in this case a probability density plot laid over the isochron diagram. Adding a third axis representing data probability density to the isochron diagram, allows the sum of normally distributed probability density peaks around each of the data to be visualized as a 3D surface, which will be quite bumpy if data are sparse. Dividing the volume beneath this surface by the number of data normalizes it.
Response 4:
I have presented some arguments against PDPs elsewhere (doi:10.1016/j.chemgeo.2012.04.021) and won't repeat them here. IsoplotR already allows the user to project data along an isochron and visualise the results as a kernel density estimate. As explained in Section 16.5 of the IsoplotR manual (perpetually under construction at https://github.com/pvermees/geotopes/blob/master/latex/manual.pdf), it does so either by projecting the data along a set of fanning lines in conventional isochron space, or along a set of parallel lines in inverse isochron space.
Comment 5:
As mentioned above, on the inverse isochron the upper and lower limits of the ranges in age and initial ratio could be estimated by a pair of isochrons that encompass say, 95% of the volume. Generally these will not be parallel because the ranges in age and initial ratio will be different. Thus, in this case unlike that of clustering, models 3a and 3b can be assumed to apply together because the data produce a welldefined isochron, even if they are scattered, which can be made meaningful as an average in the context of the models. Again, the approach in the manuscript estimates errors in knowledge of an anchoring (assumed constant) parameter used for clustered data. This is distinct from the most probable range of a variable parameter with dispersed scattered data. The latter could be important information, so it would be interesting if IsoplotR could estimate it.
Response 5:
I must confess that I do not fully understand this comment. However, to follow up on my previous response, I have long wanted to add an option to IsoplotR that would label the horizontal axis of an inverse isochron with units of time. Now would be a good time to implement this feature.
Comment 6:
Maximum likelihood is a powerful statistical approach but it is based on the assumption that input values and parameter estimates follow normal distributions that can be characterized by variances. This is likely to be true for measured data. However, it is unlikely to be true for parameters like initial ratios and age range if these are variable. In this case, or in any case of high MSWD, an alternative approach would be to use nonparametric statistics such as the robust regression method in Ludwig’s Isoplot. It is a question of whether the advantage of improved stability of the mode versus the mean offsets the disadvantage of abandoning knowledge of the measurement errors. It would be interesting to read any comments by the author on the applicability of this approach.
Response 6:
I have resisted the call for robust regression methods because, in my view, they don't address the cause of the dispersion but just mask it. The reviewer is correct that the initial ratios and age do not follow normal statistics, because they are strictly positive (`Jeffreys') quantities. A better solution would be to reformulate the isochron regression problem in terms of logarithmic quantities (and logratios). I am currently working on this problem but am not ready to share the results yet. The problem is not as easy as one might think.
Comment 7:
‘(anchored)’ should be capitalized without brackets in the title. If the comments above have sufficient merit to lead to expanding the program, it would be better to make the title more general like ‘Interpreting scattered and clustered isochron data using IsoplotR’.
Response 7:
I admit that the title looks a bit odd. It aims to reflect the fact that the paper describes an algorithm for anchored isochron regression, which requires a descent into the rabbit hole of maximum likelihood theory and random effects models. Reviewer 2 did not seem to mind the title. I would be grateful for the Associate Editor's opinion on the matter.
I will follow all the other minor fixes by Dr. Davis at the end of his review.
Citation: https://doi.org/10.5194/gchron20245AC1

AC1: 'Reply on RC1', Pieter Vermeesch, 09 Apr 2024

RC2: 'Comment on gchron20245', John Rudge, 28 Mar 2024
This manuscript presents a straightforward application of maximum likelihood estimation to fitting of isochrons with "anchoring". Everything is very clearly described and explained. I have only fairly trivial suggestions for improvement, and I feel this manuscript could be published essentially asis.
Line 57, "model 3b is computationally more challenging than model 3a". Line 125 "Finding x requires many iterations". Line 134 "slow convergence rate". Line 313. "Although it is possible to jointly optimise the entire xvector, this takes considerably longer than model 1". While I agree with the author that doing a optimisation for all the parameters at once is more computationally intensive than those cases where there are some simple closed form expressions for finding part of the solution, I would argue that with modern computational methods it is not a significant challenge to perform a direct optimisation for all parameters at once. The likelihood function the author uses is differentiable, and the analytical derivatives are straightforward to write down (as effectively is done in going from (D2) to (D3)). I would have thought these analytical derivatives could be plugged into a standard gradientbased optimiser using BFGS or similar to rapidly yield a solution. I'm not sure why the author found a particularly slow convergence rate (line 134). It might be worth giving more details of the optimisation method used (Was is it slow NelderMead? That should only be used if you don't know the derivatives.)
Typos:
Line 4: Should be "errorchrons"
Equation 17. In the x equation I think it should be lambda_235 not lambda_238.
Equation D3. I think the (Omega_{1,2}^i + Omega_{1,2}^i) bits are wrong. I think they should be (Omega_{1,1}^i + b* Omega_{1,2}^i) instead. It may also be worth noting that Omega_{1,2}^i = Omega_{2,1}^i (the matrix is symmetric).
John Rudge.
Citation: https://doi.org/10.5194/gchron20245RC2 
AC2: 'Reply on RC2', Pieter Vermeesch, 09 Apr 2024
Comment 1:
This manuscript presents a straightforward application of maximum likelihood estimation to fitting of isochrons with "anchoring". Everything is very clearly described and explained. I have only fairly trivial suggestions for improvement, and I feel this manuscript could be published essentially asis.
Response 1:
Thank you!
Comment 2:
Line 57, "model 3b is computationally more challenging than model 3a". Line 125 "Finding x requires many iterations". Line 134 "slow convergence rate". Line 313. "Although it is possible to jointly optimise the entire xvector, this takes considerably longer than model 1". While I agree with the author that doing a optimisation for all the parameters at once is more computationally intensive than those cases where there are some simple closed form expressions for finding part of the solution, I would argue that with modern computational methods it is not a significant challenge to perform a direct optimisation for all parameters at once. The likelihood function the author uses is differentiable, and the analytical derivatives are straightforward to write down (as effectively is done in going from (D2) to (D3)). I would have thought these analytical derivatives could be plugged into a standard gradientbased optimiser using BFGS or similar to rapidly yield a solution. I'm not sure why the author found a particularly slow convergence rate (line 134). It might be worth giving more details of the optimisation method used (Was is it slow NelderMead? That should only be used if you don't know the derivatives.)
Response 2:
The optimisation uses the BFGS algorithm except for anchored model3 regression, where NelderMead is used. One would think that Newtonlike methods work better when an analytical expression is provided for the gradient. However, when I did this for Ludwig (1998) regression, I was surprised that the BFGS algorithm was more stable (if a bit slower) when the gradient was estimated numerically rather than analytically. Perhaps this phenomenon was caused by a programming error on my part (although I did rewrite the code a few times and always got the same result), or a calculation error by Ken Ludwig (although I redid his calculations and that didn't solve the issue). In any case, I have preferred to let R figure out the gradient by itself ever since, except in simple cases. Nevertheless, I will try Dr. Rudge's suggestion and derive the gradient of the loglikelihood with respect to the xvalues.
Comment 3:
Line 4: Should be "errorchrons"
Equation 17. In the x equation I think it should be lambda_235 not lambda_238.Response 3:
Correct. Thank you.
Comment 4:
Equation D3. I think the (Omega_{1,2}^i + Omega_{1,2}^i) bits are wrong. I think they should be (Omega_{1,1}^i + b* Omega_{1,2}^i) instead. It may also be worth noting that Omega_{1,2}^i = Omega_{2,1}^i (the matrix is symmetric).
Response 4:
Good catch! Fortunately, IsoplotR uses the correct expression. In R (lines 162164 of MLyork.R):
num < (O22*Y+O12*XO22*a)*bO12*a+O12*Y+O11*X
den < O22*b^2+2*O12*b+O11
x < num/denCitation: https://doi.org/10.5194/gchron20245AC2

AC2: 'Reply on RC2', Pieter Vermeesch, 09 Apr 2024
Viewed
HTML  XML  Total  BibTeX  EndNote  

390  77  15  482  3  6 
 HTML: 390
 PDF: 77
 XML: 15
 Total: 482
 BibTeX: 3
 EndNote: 6
Viewed (geographical distribution)
Country  #  Views  % 

Total:  0 
HTML:  0 
PDF:  0 
XML:  0 
 1