Robust Isochron Calculation

The standard classical statistics approach to isochron calculation assumes that the distribution of uncertainties on the data arising from isotopic analysis is strictly Gaussian. This effectively excludes from consideration datasets that have more scatter, even though many appear to have age significance. A new approach to isochron calculations is developed in order to circumvent this problem, requiring only that the central part of the uncertainty distribution of the data defines a "spine" in the trend of the data. This central spine can be Gaussian but this is not a requirement. This approach significantly increases the range of datasets from which age information can be extracted but also provides seamless integration with well-behaved datasets, and thus all legacy age determinations. The approach is built on the robust statistics of Huber (1981), but using the data uncertainties for the scale of data scatter around the spine, rather than a scale derived from the scatter itself, ignoring the data uncertainties. This robust data-fitting reliably determines the position of the spine when applied to data with outliers, but converges on the classical statistics approach for datasets without outliers. The spine width is determined by a robust measure, the normalised median absolute deviation of the distances of the data points to the centre of the spine, divided by the uncertainties on the distances. A test is provided to ascertain that there is a spine in the data, requiring that the spine width is consistent with the uncertainties expected for Gaussian-distributed data. An iteratively-reweighted least squares algorithm is presented to calculate the position of the robust line.


Introduction
The ability to fit a straight line through a body of isotope ratio data in order to form an isochron is the cornerstone of many geochronological methods. In detail, however, this is a non-trivial task, since uncertainties are usually associated with all variables, and these are often correlated, precluding simple "least squares" line-fitting techniques. Most of the research in this area was conducted in the late 1960s and early 1970s, being dominated by a classical statistics approach in which data uncertainties, derived from the analytical methods, are taken to be strictly Gaussian distributed (e.g. York, 1969; York et al., 2004, and references therein). This approach, referred to here as YORK, became entrenched in the geochemical community, particularly in the last two decades as the essential component of the very widely used software ISOPLOT (e.g. Ludwig, 2012).
In this contribution, we examine some of the problems inherent in these techniques and suggest an alternative approach. Our primary focus here will be on general-purpose isochron calculations that determine the age of an "event" that established the isotopic compositions of samples in a dataset. This involves what are called model 1 and model 2 calculations in ISOPLOT, as described below. Approaches that try to extract detail within events, including ISOPLOT model 3 calculations, are not considered (but see, e.g. Vermeesch, 2018).

On ISOPLOT
In order to show that there are significant problems in using ISOPLOT for general-purpose isochron calculations, and then to see how they can be addressed, it is first necessary to outline the ISOPLOT protocol, some details of which may not be apparent to the end user. Central to the ISOPLOT workflow, the main tool for considering data scatter is mswd, the mean standard weighted deviates (also called the reduced chi-squared statistic); see Eq. (1). For strictly Gaussian-distributed uncertainties, (n − 2) mswd is distributed as chi-squared (χ²_{n−2}), meaning that if uncertainties are correctly assigned, a strong statistical statement can be made about whether the scatter of a particular dataset is solely consistent with the associated data uncertainties (i.e. with no geological scatter), for example, in the form of a 95 % confidence interval on mswd (Wendt and Carl, 1991). Such a confidence interval is not fixed but depends on the number of data points under consideration: for example, for n = 10, mswd < 1.94 (meaning that mswd extends from less than 1 through to 1.94), while for n = 50, mswd < 1.36. A dataset with mswd in the chosen range for the number of data points has data scatter that is consistent with the data uncertainties. This situation is commonly referred to as mswd "passes"; otherwise, mswd "fails" in relation to χ²_{n−2}. The situation where mswd passes provides a "pure" interpretation of YORK and, in ISOPLOT, is referred to as a model 1 isochron fit.

Figure 1. Age uncertainty under the ISOPLOT protocol for a progressively modified dataset (see text and Appendix A). Under a model 1 fit, the age uncertainty is constant with increasing data scatter (reflected in increasing mswd), until there is a step change in the data treatment at A, when the age uncertainty is multiplied by √mswd. Then, at B, there is another step change in the age-uncertainty calculation with increasing data scatter, forming ISOPLOT model 2 (see text).
This is depicted as the horizontal line in Fig. 1, indicating that in this range of mswd, corresponding to a confidence interval, the calculated uncertainty on an isochron age does not vary with mswd. Such a figure is drawn by taking an actual dataset and progressively modifying it to show what happens as mswd varies, as described in Appendix A.
What if mswd is greater than the upper limit of the chosen confidence interval? Then the data are considered to have excess scatter, in addition to that accounted for by the data uncertainties (assuming that they are strictly Gaussian)¹. At this point, ISOPLOT asks the user whether an alternative, model 2, calculation should be undertaken. This decision point is indicated at A in Fig. 1. If the user declines, ISOPLOT gives results that are referred to here as model 1×, as shown in Fig. 1. The model 1 age uncertainty is multiplied by √mswd to reflect the data scatter being more than expected from the data uncertainties alone. With further scatter, the switch is made to model 2, at B in Fig. 1. Alternatively, if the user accepts the switch to model 2, then model 1× is not used, indicated by the vertical line at A extending up to the model 2 line in Fig. 1. The model 2 calculation in ISOPLOT is unrelated to YORK. The data uncertainties are discarded and the slope of the line through the linear trend is calculated as the geometric mean of the lines calculated by unweighted least squares of y on x and of x on y (see Appendix A).

¹ Excess scatter occurs when the data distribution has higher variability in the tails than is indicated by the variability of the central part of the distribution, for example, if the distribution is Gaussian in the centre but has "fat" tails.
In summary, then, in ISOPLOT, the calculation of ages and their uncertainties involves a number of decision points based around the concept of mswd that impart significant (and in our view unwelcome) step changes in the way that the data are handled and algorithms applied. To assist in further discussion of these matters, we depart from the language of ISOPLOT at this point, reintroducing the term "errorchron", counterposed to isochron, following Brooks et al. (1972). The idea is that isochrons have a higher chance of having age significance, while errorchrons have a lower chance. In particular, it seems to be unhelpful for the results of model 2 calculations to be called isochrons as is the case in ISOPLOT, given that there is excess scatter in the data.

Replacing ISOPLOT
Given that ISOPLOT's implementation of model 1 and 2 line fits is presently the gold standard of isochron calculations, we ask where the problems are and then what can be done to address them. A shortcoming in YORK stems from the assumption that data uncertainties are strictly Gaussian distributed. In real-world applications, this appears to be too restrictive, with datasets that are likely to have age significance being labelled as errorchrons because mswd is too large. While using YORK guided by mswd is optimal statistically if data uncertainties are strictly Gaussian, this logic fails once uncertainties are even slightly non-Gaussian (originally Tukey, 1960). In such circumstances, both mswd and least squares methods themselves, like YORK, become unreliable (e.g. Hampel et al., 1986; Huber, 1981).
Rather than being truly Gaussian, data uncertainties may well be Gaussian distributed in their centres but slightly fat-tailed, distant from the centres. An isotopic dataset looks intuitively acceptable if the data have a central linear "spine" in which scatter is commensurate with stated analytical uncertainty, but this spine is flanked by data of somewhat larger scatter (i.e. excess scatter from the "fat tail"). This excess scatter may originate in the isotopic analysis or as a result of geological disturbance. Age significance in such data manifests primarily via the position of the spine. In the following, we focus on this spine in the data.
Adopting this spine approach, a successful calculation method for a dataset that may not have strictly Gaussian-distributed uncertainties must, firstly, ascertain whether or not such a spine exists in the data, and hence whether calculations yield an isochron or an errorchron. Secondly, in the case of an isochron calculation, the successful method must reliably locate the spine without being perturbed by vagaries in the more scattered data. Classical statistical methods can do neither of these things, tending to be excessively influenced by the data at the extremes of the scatter. However, the field of robust statistics offers calculation methods that can. When a dataset has no excess scatter, reflected in mswd lying within an appropriate χ²-constrained confidence interval, such methods can be devised to retrieve identical (or nearly identical) results to classical statistical methods but, in addition, provide reliable age and age-uncertainty estimates in the presence of excess scatter around a spine. This continuity of operation with increasing mswd contrasts with previous approaches and means that the steps in the ISOPLOT line in Fig. 1, which are certainly undesirable, are circumvented. Moreover, the involvement of potentially unreliable least-squares-based methods, like ISOPLOT model 2, is avoided when the data show excess scatter.

An algorithm for isochron calculations
An algorithm is sought that finds a robust straight line through a two-dimensional linear data trend, while converging with the classical statistical approach of YORK for datasets with consistent scatter (i.e. mswd passes). This section describes the nature of the problem and the theoretical basis for the robust statistical approach that will be adopted. The algorithm adopted

1. determines a preliminary fit of the data, not dependent on vagaries of the data scatter;

2. determines the spine width in relation to this preliminary fit: if the spine width is in an acceptable range, isochron; if not, errorchron;

3. determines a robust fit of the data, starting from the preliminary fit, with this fit converging with YORK for "good" data.
This algorithm is fleshed out below and then evaluated via simulated datasets and applied to a natural dataset. The central calculation in the algorithm is detailed in Appendix B, and a python implementation is provided in Appendix C.

Uncertainty distributions and data fitting
Geochronological datasets are collected on the presumption that the isotopic compositions were established via an "event" the age of which is to be estimated. Given the focus here on data with linear trends, even if the effect of the event is recorded perfectly by the samples analysed -the isotopic compositions lying on a line -the actual data are measured with finite precision and so the data inevitably scatter about the trend. An uncertainty probability distribution can be used to describe the form of the data scatter. Classical statistical methods assume that the underlying uncertainty distribution of a dataset is known, typically taken to be Gaussian. Under the Gaussian assumption, if the analytical uncertainty on the measurements has been appropriately inferred, mswd, the classical statistics parameter used in YORK to validate an isochron, tests that the scatter of data points is consistent with the inferred uncertainties. But, in general, there is no reason to suppose that a given analytical technique generates a truly Gaussian uncertainty distribution. Even small amounts of geological disturbance destroy the optimality of YORK. If the uncertainty distribution is not strictly Gaussian, then classical methods of data fitting become sub-optimal or worse.
While there are many possible non-Gaussian uncertainty distributions, this paper is concerned with a situation commonly occurring in datasets, in which the data points form a linear spine with Gaussian-like scatter, but additional scatter is seen in the tails of the distribution. Such a dataset still encodes meaningful age information in its spine, yet it will typically fail an mswd test due to its departure from a Gaussian distribution. In this work, datasets of this nature are modelled using a contaminated Gaussian uncertainty distribution (Gaussian mixture) (e.g. Tukey, 1960). Such distributions are written c%dN, meaning that with probability (100 − c) % the distribution involves a standard deviation, σ, but with probability c % the distribution has a standard deviation, dσ, both with a mean of zero (see Powell et al., 2002; Maronna et al., 2019, Sect. 2.1). An example is 25%3N, with c = 25 and d = 3, so that with 25 % probability the uncertainty is drawn from N(0, 3σ), and with 75 % probability from N(0, σ), with the N(0, s) notation indicating a Gaussian distribution with a mean of zero and a standard deviation of s. Such distributions provide excess scatter suitable for developing and evaluating a robust line-fitting calculation. It does not matter if excess scatter in real data is drawn from a different distribution. Note that with the sample sizes provided by most modern geochronological techniques, it is not necessarily possible to test for Gaussian behaviour, or such departures from Gaussian behaviour, although they may be evident on quantile-quantile plots (see below).
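As an illustration, a c%dN draw can be simulated directly from its definition. This is a minimal sketch, not the code of Appendix C; `contaminated_gaussian` is a hypothetical helper name:

```python
import random

def contaminated_gaussian(sigma, c, d, rng):
    """Draw one value from a c%dN contaminated Gaussian:
    with probability c % the draw is from N(0, d*sigma),
    otherwise from N(0, sigma); the mean is always zero."""
    if rng.random() < c / 100.0:
        return rng.gauss(0.0, d * sigma)
    return rng.gauss(0.0, sigma)

# e.g. 25%3N: about 25 % of draws come from N(0, 3*sigma)
rng = random.Random(0)  # seeded for reproducibility
draws = [contaminated_gaussian(1.0, 25, 3, rng) for _ in range(10000)]
```

For 25%3N with σ = 1 the mixture variance is 0.75(1) + 0.25(9) = 3, so the sample standard deviation of such draws is close to √3, noticeably wider than the Gaussian core.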

Isochrons and errorchrons
In YORK, assuming that the data uncertainties are strictly Gaussian distributed, the probability distribution of mswd provides bounds that can be used to distinguish isochrons from errorchrons (e.g. Wendt and Carl, 1991). These bounds come from a 95 % confidence interval on mswd, as discussed in Appendix A. Datasets whose scatter gives mswd outside the bounds are deemed to be errorchrons, not isochrons. The focus in this paper is on mswd that is too large, indicating excess scatter. Here, mswd is defined with the residuals, r_k, the distance in y of data point k to the line, e_k, weighted by the uncertainty on this distance, σ_{e_k}:

mswd = (1/(n − 2)) Σ_k r_k²,  (1)

with

r_k = e_k / σ_{e_k} = (y_k − a − b x_k) / sqrt(σ_{y_k}² − 2 b ρ_{x_k y_k} σ_{x_k} σ_{y_k} + b² σ_{x_k}²).

The line being fitted is y = a + b x; data point k is {x_k, y_k}; the analytical uncertainty on x_k is σ_{x_k}; the analytical uncertainty on y_k is σ_{y_k}; and the correlation between x_k and y_k is ρ_{x_k y_k} (see derivation of Eq. B4 in Appendix B). Note that the slope, b, appears in the denominator of r_k, as well as the numerator. If, instead, data uncertainties are c%dN, with unknown c and d, or some other contaminated Gaussian distribution, then there is no equivalent of the mswd argument to say which datasets should give isochrons rather than errorchrons. The approach advocated here is to use a measure that reflects whether the dataset has a linear spine of "good" data within it. The measure suggested, s, coined the spine width, is robust and is defined as

s = nmad(r) = 1.4826 med_k | r_k − med(r) |,  (2)

with a normalisation constant², 1.4826. Given that s is based on a median, its magnitude depends on that half of the data that have the smallest absolute values of centred r; in other words, those that would define a spine. If the data were in fact Gaussian distributed, it is expected that s should be in a range about 1 in the same way that mswd is, given that r already involves the analytical uncertainties. The larger the s, greater than 1, the less pronounced the linear spine in the data (or the uncertainties have been underestimated).
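The residual of Eq. (1) and the spine width of Eq. (2) can be sketched directly from these definitions; the helper names below are illustrative, not those of Appendix C:

```python
import math
from statistics import median

def residual(xk, yk, sxk, syk, rhok, a, b):
    """r_k = e_k / sigma_e_k for the line y = a + b*x, with
    e_k = y_k - a - b*x_k and
    sigma_e_k^2 = sy_k^2 - 2*b*rho_k*sx_k*sy_k + b^2*sx_k^2."""
    e = yk - a - b * xk
    se = math.sqrt(syk ** 2 - 2.0 * b * rhok * sxk * syk + (b * sxk) ** 2)
    return e / se

def mswd(r):
    """Mean standard weighted deviates for a two-parameter line fit."""
    return sum(rk * rk for rk in r) / (len(r) - 2)

def spine_width(r):
    """s = nmad(r) = 1.4826 * median(|r - median(r)|) (Eq. 2)."""
    m = median(r)
    return 1.4826 * median(abs(rk - m) for rk in r)
```

Because s is a median-based statistic, a few badly scattered points move it far less than they move mswd, which squares every residual.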
Whereas the 95 % confidence interval (95 % ci) on mswd for Gaussian-distributed uncertainties comes from a well-established probability distribution, with (n − 2) mswd ∼ χ²_{n−2} (e.g. Wendt and Carl, 1991), the confidence interval on s needs to be found by simulation (see Appendix D), with the simulated datasets just involving Gaussian-distributed uncertainties. The intervals are given in Table 1. Whereas one-sided confidence intervals are advocated in Appendix A (columns marked with an asterisk in the table), two-sided confidence intervals are also given in the table. Using the column with the asterisk, for example, for a dataset with 10 data points (n = 10), the dataset is deemed to yield an isochron if the observed s is less than 1.43. If s is larger, the dataset gives an errorchron. For isochrons, the age uncertainty is calculated as in Appendix B. For errorchrons, the age uncertainty is not calculated because the data scatter is not consistent with the analytical uncertainties at 95 % confidence, s being larger than the bound.

A robust statistics approach to isochron calculation
We seek a statistical approach to isochron calculation that is robust (e.g. Huber, 1981; Hampel et al., 1986), meaning that it is not excessively affected by outliers in the data, while having desirable statistical properties, for example, good efficiency (see below). In addition, we require the approach to converge to YORK for a "good" dataset, one with a near-Gaussian uncertainty distribution, allowing seamless compatibility with classical data interpretation. The overall approach adopted will be referred to as SPINE, involving the use of spine width for isochron-errorchron distinction, combined with robust line fitting. The line fitting is based on the approach of Huber (1981), as outlined in Maronna et al. (2019, Sect. 2.2.2). Whereas most robust line-fitting methods use the scatter of the data as a scale, the data uncertainties having been discarded (e.g. Powell et al., 2002), here the data uncertainties are used. This is necessary in order to preserve continuity of the results with YORK, in which the data uncertainties are an integral part of the calculation.
In both the Huber approach in SPINE and in YORK, a straight line is fitted to a dataset by minimising a function of the residuals, r_k. In the case of YORK, this is just the mswd (Eq. 1). Since isochron data are generally bivariate with correlated analytical uncertainties in x and y, the analytical uncertainty in data point k can be represented as an ellipse, as in Fig. 2. The absolute value of the residual for data point k, |r_k|, is in fact the scaling factor for the size of the ellipse required to expand it or reduce it until it touches the best-fit line (Fig. 2).

Figure 2. For an example data point, {x_k, y_k}, the inner ellipse is calculated with the analytical uncertainties, V_k, at the 1σ level (in black). Given a line, y = a + b x (in blue), the ellipse must be drawn at the |r_k|σ level (in red) to touch the line, in this case |r_k| = 5.73. The data point is x_k = 529.14, y_k = 0.5614, with σ_{x_k} = 1.870, σ_{y_k} = 0.00127, and ρ_{x_k y_k} = −0.967. The line is y = 0.8108 − 0.0004764 x.
The function that is minimised to find the best-fit line can be written ρ(r_k) for both YORK and SPINE. Whereas in YORK, ρ(r_k) = r_k² for all r_k, in SPINE ρ(r_k) = r_k² near the centre of the uncertainty distribution (as in YORK), but SPINE downweights data points for which the absolute value of the residual is greater than a cut-off value, h. Thus, in SPINE, and in Fig. 3, the Huber ρ is

ρ(r_k) = r_k²  for |r_k| ≤ h;  ρ(r_k) = 2 h |r_k| − h²  for |r_k| > h.  (3)

In SPINE, for residuals that have an absolute value less than an adjustable constant, h, the contribution to the sum being minimised is the same as for YORK, but it is linear in the residual for larger absolute values. Note that as h becomes larger and larger, SPINE converges to YORK. The value to use for h is discussed in Maronna et al. (2019)³.
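A minimal sketch of the Huber ρ of Eq. (3) follows; the default h = 1.4 used here is an assumption for illustration (the paper defers the choice of h to Maronna et al., 2019):

```python
def huber_rho(r, h=1.4):
    """Huber rho (Eq. 3): quadratic (as in YORK) for |r| <= h,
    linear beyond. h = 1.4 is an assumed tuning constant."""
    if abs(r) <= h:
        return r * r
    return 2.0 * h * abs(r) - h * h
```

The linear branch 2h|r| − h² is chosen so that ρ and its first derivative are both continuous at |r| = h, which is what lets SPINE merge smoothly into YORK as h grows.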
The iteration developed in Appendix B minimises Σ_k ρ(r_k) with respect to the unknown, θ, a two-element column vector, {a, b}^T, in the line equation, y = a + b x. The iteration is applicable to SPINE and also to YORK. As a starting point of the iteration, the data are fitted for θ with SIEGEL (Siegel, 1982), which is highly robust toward excess scatter. However, SIEGEL is much less efficient than SPINE (see below), so SPINE is a better ultimate estimator. A full iteration is envisaged in Appendix B. Once θ is calculated, the measure of scatter used to distinguish an isochron from an errorchron can be calculated using Table 1. If an isochron is deemed to have been calculated, the uncertainty on θ, V_θ, can be found, as outlined in Appendix B.
The SPINE algorithm can be summarised as follows:

1. determine a preliminary fit of the data using SIEGEL;

2. determine the spine width using nmad: if the spine width is in an acceptable range, from col. 6 in Table 1, isochron; if not, errorchron;

3. determine a robust fit of the data, minimising Σ_k ρ(r_k), with the Huber ρ(r_k) (Eq. 3), starting from the SIEGEL estimate of θ.
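The robust fit of step 3 can be sketched as an iteratively reweighted least squares loop. This simplified illustration assumes uncertainties in y only and a crude two-point starting line in place of SIEGEL; the full calculation, with correlated x-y uncertainties and the proper starting fit, is the one detailed in Appendix B:

```python
def irls_fit(x, y, sy, h=1.4, tol=1e-10, iters=100):
    """Sketch of iteratively reweighted least squares with Huber
    weights: w = 1 for |r| <= h, w = h/|r| otherwise. Simplified
    to uncertainties in y only (an assumption for illustration)."""
    n = len(x)
    # crude starting line (the paper uses the robust SIEGEL fit)
    b = (y[-1] - y[0]) / (x[-1] - x[0])
    a = sum(y) / n - b * sum(x) / n
    for _ in range(iters):
        r = [(yk - a - b * xk) / sk for xk, yk, sk in zip(x, y, sy)]
        w = [1.0 if abs(rk) <= h else h / abs(rk) for rk in r]
        # combined weights: Huber downweighting over analytical weights
        W = [wk / sk ** 2 for wk, sk in zip(w, sy)]
        sw = sum(W)
        xb = sum(Wk * xk for Wk, xk in zip(W, x)) / sw
        yb = sum(Wk * yk for Wk, yk in zip(W, y)) / sw
        b_new = (sum(Wk * (xk - xb) * (yk - yb)
                     for Wk, xk, yk in zip(W, x, y))
                 / sum(Wk * (xk - xb) ** 2 for Wk, xk in zip(W, x)))
        a_new = yb - b_new * xb
        if abs(b_new - b) < tol and abs(a_new - a) < tol:
            a, b = a_new, b_new
            break
        a, b = a_new, b_new
    return a, b
```

For data with all |r_k| ≤ h, every weight is 1 and the loop reduces to ordinary weighted least squares, mirroring the convergence of SPINE to YORK for "good" data.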

Application of SPINE to simulated datasets
Assessing algorithms for data fitting is best done using simulated datasets, so that the true "age" represented by the data is known. In this case, datasets were generated by drawing data points from a range of uncertainty distributions, all centred on a linear trend reflecting an age of 4 Ma. Full details are provided in Appendix D. Two features of the datasets are varied: the number of data points in the dataset and the uncertainty structure adopted, the latter via varying c and d in c%dN. The algorithm is assessed in terms of its ability to retrieve the specified age of the linear trend on which the simulated datasets are built and on the uncertainty in the age. Given that the datasets investigated have fat-tailed contaminated Gaussian uncertainty distributions, the focus is on the effect of excess scatter in the data or, in other words, data scatter over and above what is expected for Gaussian data uncertainties. Nevertheless, a small proportion of datasets do have small scatter, giving s below the lower bound for that number of data points. The analysis below compares the results of YORK, applied only to those simulated datasets that lie within the mswd bounds, with the results of SPINE applied to those datasets that lie within the spine width (s) bounds. The great majority of the former are included in the latter (e.g. > 97 % for n = 10). Importantly, however, SPINE typically identifies reliable age information in many more datasets than YORK. In Table 2, m % excl and s % excl are the percentage of simulated datasets excluded on the basis of the mswd and s bounds, respectively. Note that, for example, for n = 10, datasets drawn from 5%3N in fact have 100 (0.95^10) = 59.9 % of the datasets with all uncertainties Gaussian, while 40.1 % have at least one uncertainty drawn from 3 times the Gaussian (3N). For 25%3N, 5.6 % are Gaussian only, and for 10%10N, it is 34.9 %. The leftmost columns are 2.5 % by definition.
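The all-Gaussian percentages quoted above follow from the binomial structure of c%dN sampling: each of the n points independently avoids contamination with probability (1 − c/100). A one-line check, with `frac_all_gaussian` a hypothetical helper:

```python
def frac_all_gaussian(n, c):
    """Percentage of c%dN datasets of size n in which every
    uncertainty happens to be drawn from the Gaussian core."""
    return 100.0 * (1.0 - c / 100.0) ** n

# n = 10: 5%3N -> 59.9 %, 25%3N -> 5.6 %, 10%10N -> 34.9 %
```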
A 95 % confidence interval on age can be found by ordering the list of ages calculated for the datasets, selecting the lower limit at the 2.5 % point in the list and the upper limit at the 97.5 % point. For datasets that lie within the s bound in SPINE or the mswd bound in YORK, the 95 % confidence intervals on ages are essentially the same, even though a much larger proportion of datasets lie within bounds using SPINE than using YORK. A comparison of ages can also be made between YORK and SPINE for simulated datasets that lie outside the mswd bounds. For n = 10 and Gaussian-distributed data, the 95 % confidence interval is ±0.021 Ma for both YORK and SPINE, but for 5%3N, 25%3N, and 10%10N, the 95 % confidence intervals on the YORK ages are ±0.035, ±0.040, and ±0.092 Ma, respectively, while the SPINE ages are ±0.027, ±0.034, and ±0.034 Ma. The differences between the ages given by the two methods, normalised by the uncertainties on the SPINE ages, are ±1.62, ±1.63, and ±5.54. Comparison of the intervals shows that the mswd bounds are relevant to the application of YORK, in that it generally does not work well outside the bounds, while indicating that SPINE is (much) more reliable for age determination for such datasets.
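The ordering-based confidence interval described above can be sketched as follows, using dummy ages for illustration; the index convention is an assumption, as the paper does not spell out how the 2.5 % and 97.5 % points are interpolated:

```python
def percentile_ci(ages, level=95.0):
    """Empirical confidence interval from simulated ages: order the
    ages and take the values at the (100-level)/2 and (100+level)/2
    percentage points in the ordered list."""
    s = sorted(ages)
    n = len(s)
    lo = s[int(n * (100.0 - level) / 200.0)]
    hi = s[int(n * (100.0 + level) / 200.0) - 1]
    return lo, hi

# 1000 dummy "ages" evenly spread around 4 Ma
ages = [4.0 + 0.001 * k for k in range(-500, 500)]
lo, hi = percentile_ci(ages)
```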
Considering age uncertainty, it might be expected that the uncertainties suffer from the excess scatter in the data in datasets that yield an errorchron with YORK but an isochron with SPINE. This appears not to be the case, but there is a small degradation in the age uncertainties retrieved, caused by an unavoidable efficiency loss using SPINE. Efficiency at the Gaussian distribution is the ratio of the variance obtained by the optimal estimator (YORK) divided by the variance using the chosen robust estimator (in this case, SPINE). SPINE involves an efficiency loss at the Gaussian distribution, which is illustrated in Fig. 4 via kernel density estimate (kde) plots of the age uncertainties calculated for simulated datasets with n = 10 and Gaussian-distributed uncertainties. The kde plots are probability distributions akin to smoothed histograms (Wand and Jones, 1995). The red curve is the kde for datasets that have all |r_k| < h, as would be given by YORK on datasets with Gaussian-distributed uncertainties. The blue curve is the kde for all datasets with at least one |r_k| > h. The overall kde, in black, is the kde of all of the datasets in the red and blue kde, in observed proportion, about 30 % to 70 %. The efficiency loss of SPINE is reflected in the displacement of the black curve to slightly higher age uncertainty than the red curve. The relationships shown in Fig. 4 for n = 10 can be seen for other n in Fig. 5. The pairs of red and black lines correspond to and have the same meaning as the red and black lines in Fig. 4. As expected, the distribution of age uncertainties moves towards larger values as the sample size n decreases.

Figure 5. Kernel density estimates for age uncertainty calculated with SPINE on 10 000 simulated datasets with a range of n values and Gaussian-distributed uncertainties. In each case, the kde for those datasets for which all |r_k| < h is in red and the overall kde is in black.
Whereas SPINE involves an efficiency loss at the Gaussian distribution, it performs better (i.e. it is much more robust) than YORK for data from contaminated Gaussian distributions (e.g. Maronna et al., 2019, Table 2.2). The ability of SPINE to retrieve age uncertainty information for such data varies with the probability and scale of the contamination, as shown in Fig. 6. Not unexpectedly, the more seriously contaminated distributions (25%3N and 10%10N) involve a greater displacement of the kde to higher age uncertainty than the more weakly contaminated 5%3N distribution. Although the displacement of the blue curves from the black curve is real, the ability of SPINE to retrieve age uncertainties from datasets with contaminated distributions is good.

Figure 6. Kernel density estimates for age uncertainty calculated with SPINE on 10 000 simulated datasets with n = 10 and several uncertainty structures. The kde for those datasets for which all |r_k| < h is in red, the kde for datasets with Gaussian-distributed uncertainties is in black, and the kdes for all of the datasets for 5%3N, 25%3N, and 10%10N, respectively, are in blue.

Application of SPINE to a natural dataset
In order to show the real-world utility of SPINE, we consider data for a carbonate flowstone from the Riversleigh World Heritage fossil site in Queensland, Australia (sample 0708). Isotope dilution U-Pb data for the bulk sample were previously published by Woodhead et al. (2016), providing a model 2 isochron with an age of 13.72 ± 0.12 Ma and mswd of 3.7. The new data presented here were obtained by laser ablation ICPMS on the same sample using methods outlined and published in Woodhead and Petrus (2019). Such datasets are typically larger, with little error correlation (rounder error ellipses) but larger uncertainties than isotope dilution data. These new data define an errorchron under the YORK assumptions, with mswd = 1.68 and a model 2 age of 13.68 ± 0.31 Ma. These data might therefore be rejected under the mswd criterion despite exhibiting a well-developed linear trend. With SPINE, s = 1.24, within the s range for an isochron, and the age is 13.69 ± 0.26 Ma (± is 1.96σ). The data for 0708 are plotted in Fig. 7, with 95 % confidence ellipses on the data points. Further calculations with this sample, comparing the results of our new algorithm with existing approaches, are presented in Appendix A.
With 0708, the SPINE and ISOPLOT ages are very similar, but mswd suggests an errorchron, so there is excess scatter on this basis. Quantile-quantile plots can be used to show the nature of the distribution of the residuals that contribute to such excess scatter. Here, in Fig. 8, the ordered residuals normalised to nmad(r) are plotted against the quantiles of the standard Gaussian distribution, N(0, 1) (see Fox, 2016, Sect. 3.1.3). The normalisation means that if the residuals are Gaussian distributed, they should plot close to a line of unit slope through the origin. The figure also includes 95 % pointwise confidence intervals (dashed blue lines). The figure suggests that the residuals are in fact effectively Gaussian distributed, lying between the dashed blue lines (and that mswd has been too sensitive in flagging the residuals as non-Gaussian). Generally, geochronological datasets do not have obvious outliers; isolated data points might be "cleaned" from a dataset before an age calculation, or the dataset might even be discarded. But many of the simulated datasets from contaminated Gaussian distributions, as used above, do contain outliers. In Appendix E, an example of a simulated dataset with outliers (from 25%3N) is used to show how YORK and SPINE behave and what a quantile-quantile plot looks like in those circumstances.
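The quantile-quantile construction described above can be sketched as follows; the plotting positions (k − 0.5)/n are an assumption for illustration, as the paper does not state which convention Fig. 8 uses:

```python
from statistics import NormalDist, median

def qq_points(r):
    """Quantile-quantile pairs: ordered residuals normalised by
    nmad(r), against standard-Gaussian quantiles at plotting
    positions (k - 0.5)/n. Points near the 1:1 line suggest
    effectively Gaussian residuals."""
    n = len(r)
    m = median(r)
    nmad = 1.4826 * median(abs(rk - m) for rk in r)
    obs = sorted(rk / nmad for rk in r)
    theo = [NormalDist().inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]
    return list(zip(theo, obs))
```

Plotting the returned pairs (theoretical quantile on x, normalised residual on y) reproduces the essentials of a figure like Fig. 8, minus the pointwise confidence band.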

Discussion and conclusions
This work was motivated by the belief that many isotopic datasets contain meaningful age information that cannot be identified using classical statistical methods and may therefore be discarded or discounted. In such cases, the age information is contained in a linear spine in the data, but the data also contain scatter that is inconsistent with a Gaussian uncertainty distribution, having fatter tails than Gaussian. A statistical test based on the spine width is devised, akin to using mswd in classical methods, allowing an isochron-errorchron distinction to be made. Many datasets that give isochrons based on the spine width give errorchrons under the assumption of a strictly Gaussian uncertainty distribution. A statistically robust isochron calculation method is able to retrieve this age information and to provide appropriate uncertainty estimates. Calculated ages and age uncertainties are more reliable than ISOPLOT ones for data with excess scatter.
Contaminated Gaussian distributions provide a model for a type of dataset with excess scatter relative to a strictly Gaussian-distributed one. The robust isochron method presented in this work can however be applied in general to data which are Gaussian distributed only in the central spine of the uncertainty distribution, with non-Gaussian scatter occurring in the tails, arising from analytical or geological uncertainty. Indeed, it can be applied to any dataset with a well-developed spine in the data.
In most robust statistics data-fitting approaches, the formal uncertainties output during data measurement are ignored. Instead, the scale used in the data fitting is derived from the scatter in the data, an approach adopted by Powell et al. (2002). The new approach followed here does include the data measurement uncertainties, and this allows the results to converge on those of YORK when the data lack excess scatter. This provides compatibility with older "good" datasets processed using the classical statistical approach but, going forward, allows age information to be extracted from a much wider range of datasets which might otherwise be rejected for having mswd greater than the isochron cutoff.
A problem with SPINE, shared with YORK, is that the effect of high-leverage data is not taken into account. Such data are easily recognised in x-y plots, when a small proportion of the data (even one data point) is separated from the main body of the data along the trend through the data. Data fitting tends to be overly constrained to fit high-leverage data, giving them small residuals, even if the best fit of the main body of the data alone would give the high-leverage data larger residuals. In 0708, the point at highest x is of relatively high leverage (hat = 0.171). Robust approaches have been developed to handle high-leverage data, e.g. Maronna et al. (2019, chap. 5), but they are not yet developed for the situation where the data uncertainties are taken into account, nor for the relatively small datasets that are typical of geochronological studies (cf. Fig. 7). Huber and Ronchetti (2009, chap. 7) have a counter view, advocating data assessment rather than aiming for a black-box method that automatically safeguards against the potentially deleterious effect of high-leverage data; this is the approach we suggest here. In the case of the relatively high-leverage data point in 0708, omitting this data point gives 13.75 ± 0.27 Ma (compare row 11 with row 1 in the table in Appendix A), within uncertainty of the age including this data point.

Figure A1. Age uncertainty (age ±) plotted against √mswd under the ISOPLOT protocol for the progressively modified dataset, 0708 (see text). In model 1, the age uncertainty is constant with increasing data scatter (reflected in increasing mswd), until there is a step change in age uncertainty at A when the ± is multiplied by √mswd. Then, at B, there is another step change with further increase of data scatter to model 2 (see text). The location of the step at A is based on a 95 % confidence interval for mswd (discussed below), whereas the √mswd position of the step at B is arbitrary. The y axis is drawn at the √mswd of the actual data, 1.27 (i.e. no modification of the data). See text.

Appendix A: Algorithms and applications to sample 0708
Here, some results of calculations for sample 0708 are collected (used in Fig. 7), along with some related algorithmic details. The thought experiment sketched in Fig. 1 aimed to show the consequence for ISOPLOT behaviour of modifying the observed data in a dataset to reduce or increase the scatter about the linear trend. The calculated equivalent of Fig. 1 for sample 0708 is shown in Fig. A1, including also the corresponding SPINE results. In calculating the figure, the modification of the dataset is achieved by first taking the data points with their attendant error ellipses (i.e. covariance matrices) and moving them all in to lie on the linear trend, considered as fixed by a YORK calculation. Then the points and ellipses are considered to be displaced away from the trend. This is "move" in the table below, going from −1, when the points lie on the trend, through 0, with the points as in the original data, to positive values, when the points are displaced further away from the trend. "Move" varies more or less linearly with √mswd, from −0.133 at the left edge of the figure to 0.064 at the right edge; the spine width changes from 1.08 to 1.32 across the figure. For these calculations, the last line in the dataset is omitted as it is of relatively high leverage (hat = 0.171), not wishing this data point to affect the results.
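The "move" construction can be sketched as follows. The projection of each point to where its error ellipse touches the fixed trend follows the x′ construction of Appendix B; the linear scaling of the displacement by "move" is an assumption for illustration:

```python
import numpy as np

def move_points(a, b, x, y, V, move):
    """Displace data points relative to the fixed line y = a + b*x.

    move = -1 puts every point on the trend, move = 0 leaves the data
    unchanged, and move > 0 displaces points further from the trend.
    V is an (n, 2, 2) array of data covariance matrices.
    """
    e = a + b * x - y                                   # distance from the line
    se2 = b * b * V[:, 0, 0] - 2 * b * V[:, 0, 1] + V[:, 1, 1]
    # step taking each point to where its uncertainty ellipse touches the
    # line (its nearest point on the line in the Mahalanobis sense)
    dx = -e * (b * V[:, 0, 0] - V[:, 0, 1]) / se2
    dy = -e * (b * V[:, 0, 1] - V[:, 1, 1]) / se2
    return x - move * dx, y - move * dy
```

With move = −1 the returned points satisfy a + b·x − y = 0 exactly, reproducing the "on-trend" starting configuration.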

A1 Thought experiment in Fig. 1
In the figure, extending from the left, through mswd = 1, to A, the ISOPLOT age uncertainty (model 1, i.e. YORK) is constant because the data scatter is consistent with the data uncertainties. Through this range, the SPINE age uncertainty is above the model 1 line because of the efficiency loss embodied in SPINE, as shown in Figs. 4-5. However, after the age uncertainty steps with increasing mswd, to the right of the diagram, the SPINE age uncertainty is smaller than the model 1× and model 2 age uncertainty. This is because SPINE gives an isochron on the basis of spine width up to C at √ mswd ≈ 1.3, whereas model 2 is an errorchron on the basis of the assumption of strictly Gaussian data uncertainties. The small steps in the SPINE age uncertainty line are an artefact of the approximation used in the calculation of the age uncertainty; see Appendix B.
In the top part of Table A1, the results for SPINE and for ISOPLOT are summarised. The column gives the change to the age from the SPINE age, normalised by the uncertainty on the SPINE age. Below the double line in the table are some results from the above thought experiment, Fig. A1.
Additional algorithmic details for ISOPLOT follow next, in part related to the above thought experiment.

A2 ISOPLOT model 1
In the ISOPLOT model 1 calculation, i.e. in YORK, a decision has to be made about the confidence interval on mswd that is used to denote the range of data scatter (i.e. mswd) that is considered to be accounted for by the data uncertainties, without the need either to multiply the age uncertainty by √mswd (i.e. model 1×) or to switch directly to an alternative calculation (model 2 in ISOPLOT). On the understanding that data uncertainties are correctly assigned, a one-sided confidence interval on mswd can be adopted, acknowledging that mswd is not being used to identify the case where assigned data uncertainties are too large or too small. The upper end of the confidence interval on mswd is where excess scatter is considered to start, a conventional choice being derived from a 95 % confidence interval. Note that there is no argument that this should be at mswd = 1 (cf. Dickin, 2005, p. 37), unless the number of data points is huge. Even for a dataset of 50 data points, the 95 % confidence interval on mswd extends to 1.36. In terms of the probability-of-fit measure used in ISOPLOT, this is 100 − 95 = 5 %. It might be noted that the name "probability of fit" seems unhelpful; it is clearer to focus on mswd.
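As a check on the quoted value, the upper end of the one-sided 95 % confidence interval on mswd follows from the chi-squared distribution with n − 2 degrees of freedom for a two-parameter line fit (the function name here is illustrative):

```python
from scipy.stats import chi2

def mswd_cutoff(n, params=2, conf=0.95):
    """Upper end of the one-sided confidence interval on mswd for a
    dataset of n points fitted with `params` parameters (2 for a line)."""
    df = n - params
    return chi2.ppf(conf, df) / df
```

For n = 50 this gives approximately 1.36, the value quoted above, and the cutoff tightens towards 1 as n grows.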

A3 ISOPLOT model 2
The so-called error-in-variables (eiv) or measurement-error problem is avoided in YORK because the uncertainty in the x variable is taken into account explicitly. If it were not, then eiv would result in the calculated slope being biased downwards and the approach being inconsistent (e.g. Fuller, 1987). In the ISOPLOT model 2 calculation, eiv is avoided, even though the data uncertainties are discarded, by making the slope of the line through the data the geometric mean of the slopes of ordinary least squares of y on x, $b_{yx}$, and of x on y, $1/b_{xy}$. These are $b_{yx} = S_{xy}/S_{xx}$ and $b_{xy} = S_{xy}/S_{yy}$, in which $S_{xx} = \sum_k (x_k - \bar{x})^2$, $S_{yy} = \sum_k (y_k - \bar{y})^2$, and $S_{xy} = \sum_k (x_k - \bar{x})(y_k - \bar{y})$, so that $b = \pm\sqrt{b_{yx}/b_{xy}} = \pm\sqrt{S_{yy}/S_{xx}}$, with, in this case, the sign of the square root being negative. The calculation in ISOPLOT does not use this explicit formula, instead adopting an algebraic equivalent that allows the YORK iteration to be used.
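A minimal sketch of the model 2 slope, built directly from the ordinary least squares slopes described above (the helper name and taking the sign from the covariance are assumptions for illustration):

```python
import numpy as np

def model2_slope(x, y):
    """Geometric mean of the OLS slope of y on x (b_yx) and the reciprocal
    OLS slope of x on y (1/b_xy), signed by the covariance; the case in
    the text takes the negative root."""
    xd, yd = x - x.mean(), y - y.mean()
    sxx, syy, sxy = (xd * xd).sum(), (yd * yd).sum(), (xd * yd).sum()
    b_yx = sxy / sxx            # OLS slope of y on x
    b_xy = sxy / syy            # OLS slope of x on y
    return np.sign(sxy) * np.sqrt(b_yx / b_xy)   # = +/- sqrt(syy/sxx)
```

Note the result depends only on the marginal spreads of x and y, which is why model 2 discards the assigned data uncertainties.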

A4 ISOPLOT robust
In ISOPLOT, there is an option to use a robust isochron calculation method. The two methods available have a high breakdown point but low efficiency (e.g. Huber, 1981, Sect. 1.2.3). The second method (Siegel, 1982) can be considered to supersede the first. In fact, here, SIEGEL is used in the implementation of SPINE as the starting point for the iteration (see Appendix C). The fit of the data for sample 0708 with SIEGEL is given in line 5 of the table.
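Siegel's repeated-median estimator is simple to sketch (the function name is illustrative):

```python
from statistics import median

def siegel(x, y):
    """Siegel (1982) repeated-median line fit: for each point take the
    median of pairwise slopes to all other points, then the median of
    those medians; the intercept is the median of y - b*x."""
    n = len(x)
    slopes = [
        median((y[j] - y[i]) / (x[j] - x[i]) for j in range(n) if j != i)
        for i in range(n)
    ]
    b = median(slopes)
    a = median(y[k] - b * x[k] for k in range(n))
    return a, b
```

The double median gives the estimator its high (50 %) breakdown point, at the cost of the low efficiency noted above.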

Appendix B: Iteration in SPINE
The SPINE algorithm involves minimising $\sum_k \rho(r_k)$ with respect to the unknown, $\theta$, a two-element column vector, $\{a, b\}^T$, in the line equation, $y = a + bx$, in order to fit the data. The residual, $r_k$, on data point k is defined below, and the function $\rho$ is defined in Eq. (3) in the main text. The SPINE algorithm subsumes YORK.
Writing the kth data point as $\{x_k, y_k\}$, generally the isotopic data used in isochron calculations involve uncertainties in both $x_k$ and $y_k$, and commonly the $x_k$ and $y_k$ are also correlated. These can be represented by a covariance matrix,

$$\mathbf{V}_k = \begin{pmatrix} \sigma_{x_k}^2 & \sigma_{x_k}\sigma_{y_k}\rho_{x_k y_k} \\ \sigma_{x_k}\sigma_{y_k}\rho_{x_k y_k} & \sigma_{y_k}^2 \end{pmatrix},$$

in which $\sigma_{x_k}$ is the standard deviation on $x_k$, $\sigma_{y_k}$ the standard deviation on $y_k$, $\sigma_{x_k}\sigma_{y_k}\rho_{x_k y_k}$ the covariance between $x_k$ and $y_k$, and $\rho_{x_k y_k}$ the correlation coefficient between $x_k$ and $y_k$. The covariance matrix can be represented by an ellipse around the data point in an x-y diagram, as illustrated in Fig. 2. The residual, $r_k$, a measure of the distance of the point $\{x_k, y_k\}$ to the line, is calculated from the coordinates of the data points, $\{x_k, y_k\}$, and their uncertainties in $\mathbf{V}_k$, by $r_k = e_k/\sigma_{e_k}$, in which $e_k$ is the distance of the data point from the line, $e_k = a + b x_k - y_k$, and $\sigma_{e_k}$ is the standard deviation on $e_k$. The standard deviation, $\sigma_{e_k}$, is calculated by error propagation using $\mathbf{V}_k$:

$$\sigma_{e_k}^2 = \left\{\frac{\partial e_k}{\partial x_k}, \frac{\partial e_k}{\partial y_k}\right\} \mathbf{V}_k \left\{\frac{\partial e_k}{\partial x_k}, \frac{\partial e_k}{\partial y_k}\right\}^T,$$

with the term in curly brackets evaluating to $\{b, -1\}$. The residual is then

$$r_k = \frac{e_k}{\sigma_{e_k}} = \frac{a + b x_k - y_k}{\sigma_{e_k}}.$$

In matrix form, the residuals can be written as a column vector, $\mathbf{r}$:

$$\mathbf{r} = \mathbf{W}_e\, \mathbf{e} = \mathbf{W}_e (\mathbf{X}\theta - \mathbf{y}),$$

with $\mathbf{W}_e$ a diagonal matrix with kkth element $1/\sigma_{e_k}$. The minimisation of $\sum_k \rho(r_k)$ is iterative, starting from a robust but low-efficiency estimate of the line, for example, using Siegel (1982). The minimisation is undertaken using the fact that, at the minimum, the derivative of $\sum_k \rho(r_k)$ with respect to $\theta$ is zero. Defining this function, for the Huber (1981) approach, from Eq. (3),

$$\psi(r_k) = \frac{\partial \rho(r_k)}{\partial r_k} = \begin{cases} r_k & \text{for } |r_k| < h \\ h\,\mathrm{sgn}(r_k) & \text{otherwise.} \end{cases} \quad \text{(B7)}$$

For YORK, $\psi(r_k) = r_k$, equivalent to SPINE with large h. At the minimum of $\sum_k \rho(r_k)$,

$$\sum_k \frac{\psi(r_k)}{\sigma_{e_k}} \left\{1,\; x_k - \frac{e_k\left(b\,\sigma_{x_k}^2 - \sigma_{x_k}\sigma_{y_k}\rho_{x_k y_k}\right)}{\sigma_{e_k}^2}\right\} = \{0, 0\}. \quad \text{(B8)}$$

In the curly brackets, writing the first derivative, $\{1, x_k\}$, as the kth row of a matrix $\mathbf{X}$, and writing the first and second derivatives together as the kth row of a matrix $\mathbf{X}'$, then the kth row of $\mathbf{X}'$ is $\{1, x_k'\}$, in which

$$x_k' = x_k - \frac{e_k\left(b\,\sigma_{x_k}^2 - \sigma_{x_k}\sigma_{y_k}\rho_{x_k y_k}\right)}{\sigma_{e_k}^2},$$

with b the slope of the line.
The second column of $\mathbf{X}$ is simply the data x values, while the second column of $\mathbf{X}'$ is the x values on the line where the uncertainty ellipses around each data point would touch it, $\mathbf{x}'$. In matrix form, Eq. (B8) can be written as

$$\mathbf{X}'^T \mathbf{W}_e\, \psi(\mathbf{r}) = \mathbf{0}, \quad \text{(B9)}$$

in which $\psi(\mathbf{r})$ is a column vector whose kth element is $\psi(r_k)$. Equation (B9) constitutes two non-linear equations requiring iteration to solve. Two iteration schemes are proposed. The first iteration, in fact used for all the simulations, proceeds directly from Eq. (B9), while the second iteration is in iteratively reweighted least squares form (e.g. Maronna et al., 2019, Sect. 4.5.2). In the first scheme, at iteration i, progressing towards the minimum with $\theta_i = \theta_{i-1} + \Delta\theta$, for $|r_k| < h$, $\psi_i(r_k) = \psi_{i-1}(r_k) + \mathbf{X}_k \Delta\theta/\sigma_{e_k}$, and $\psi_i(r_k) = \psi_{i-1}(r_k)$ otherwise. This can be written $\psi_i(r_k) = \psi_{i-1}(r_k) + \dot\psi_{i-1}(r_k)\, \mathbf{X}_k \Delta\theta/\sigma_{e_k}$, in which $\dot\psi(r_k) = \partial\psi(r_k)/\partial r_k$, which is 1 for $|r_k| < h$, and 0 otherwise, from Eq. (B7). Substituting into Eq. (B9) gives

$$\mathbf{X}'^T \mathbf{W}_e \left(\mathbf{W}_e\, \mathbf{I}'\, \mathbf{X}\, \Delta\theta + \psi(\mathbf{r})\right) = \mathbf{0}, \quad \text{(B10)}$$

dropping iteration subscripts, with $\mathbf{I}' = \mathrm{diag}(\dot\psi(\mathbf{r}))$ a modified identity matrix with its kkth element equal to $\dot\psi(r_k)$. Equation (B10) can then be rearranged to give $\Delta\theta$ at the current iteration:

$$\Delta\theta = -\left(\mathbf{X}'^T \mathbf{W}_e^2\, \mathbf{I}'\, \mathbf{X}\right)^{-1} \mathbf{X}'^T \mathbf{W}_e\, \psi(\mathbf{r}). \quad \text{(B11)}$$
The iteration proceeds until $\Delta\theta$ approaches $\mathbf{0}$. The second iteration scheme is the one implemented in the Python code. Such iterations are known to be stable, e.g. Maronna et al. (2019). However, the logic of the first scheme is needed to obtain the covariance matrix of $\theta$, below. In the second scheme, a weight function, $w(r_k)$, is introduced so that the resulting equations have a simple least squares form. For Eq. (B9), this can be done by defining $w(r_k) = \psi(r_k)/r_k$, so that $\psi(r_k) = r_k\, w(r_k)$, introducing $\mathbf{r}$ explicitly into Eq. (B9). Then

$$\mathbf{X}'^T \mathbf{W}\, \mathbf{e} = \mathbf{X}'^T \mathbf{W} (\mathbf{X}\theta - \mathbf{y}) = \mathbf{0}, \quad \text{(B12)}$$

in which the kkth element of the diagonal matrix, $\mathbf{W}$, is $w(r_k)/\sigma_{e_k}^2$, combining the weighting from w with the weighting from the data uncertainties, $\mathbf{W}_e$. By rearranging Eq. (B12), the repeated substitution solution to Eq. (B9) is given by

$$\theta = \left(\mathbf{X}'^T \mathbf{W}\, \mathbf{X}\right)^{-1} \mathbf{X}'^T \mathbf{W}\, \mathbf{y}. \quad \text{(B13)}$$

Iteration is needed, as $\mathbf{X}'$ and $\mathbf{W}$ depend slightly on $\theta$, given a sensible starting point for the iteration as provided by SIEGEL. Accepting that an isochron has been calculated, the covariance matrix of $\theta$, $\mathbf{V}_\theta$, can be calculated by error propagation of the data to $\theta$ using the logic of the first iteration scheme. First it is convenient to transform the problem solved by Eq. (B11) to an identical one in which the data are represented by $\mathbf{e}$ rather than $\{\mathbf{x}, \mathbf{y}\}$. Note that Eq. (B9) involves $\mathbf{r}$, a scalar for each data point; the uncertainty on $\mathbf{e}$ in $\mathbf{W}_e$; and the x position on the best-fit line where the ellipse around each data point touches the line, at $\mathbf{x}'$, the second column of $\mathbf{X}'$. Given the best-fit $\theta$, the y corresponding to $\mathbf{x}'$ is $\mathbf{y}' = \mathbf{X}'\theta$. Defining $\mathbf{y}^* = \mathbf{y}' - \mathbf{e}$, data involving $\{\mathbf{x}', \mathbf{y}^*\}$, rather than $\{\mathbf{x}, \mathbf{y}\}$, pose an identical problem via Eq. (B9). In this identical problem, $\mathbf{x}'$ and $\mathbf{y}^*$ are fixed (have no uncertainty). With this, the change in $\theta$ corresponding to Eq. (B9) becomes

$$\Delta\theta = -\left(\mathbf{X}'^T \mathbf{W}_e^2\, \mathbf{I}'\, \mathbf{X}'\right)^{-1} \mathbf{X}'^T \mathbf{W}_e\, \psi(\mathbf{r}), \quad \text{(B14)}$$

in terms only of $\mathbf{X}'$ and not involving $\mathbf{X}$. This is akin to the transform in York et al. (2004) to convert the $\mathbf{V}_\theta$ of York (1969) to that of the $\mathbf{V}_\theta$ of Titterington and Halliday (1979). Assuming that $\theta$ is approximately linear in each $e_k$ around the minimum in $\sum_k \rho(r_k)$, then

$$\mathbf{V}_\theta = \frac{\partial \theta}{\partial \mathbf{e}}\, \mathbf{V}_e \left(\frac{\partial \theta}{\partial \mathbf{e}}\right)^T = \frac{\partial \theta}{\partial \mathbf{e}}\, \mathbf{W}_e^{-2} \left(\frac{\partial \theta}{\partial \mathbf{e}}\right)^T. \quad \text{(B15)}$$

At the solution of Eq. (B9), $\Delta\theta = \mathbf{0}$, so, by the chain rule using Eq. (B14),

$$\frac{\partial \theta}{\partial \mathbf{e}} = -\left(\mathbf{X}'^T \mathbf{W}_e^2\, \mathbf{I}'\, \mathbf{X}'\right)^{-1} \mathbf{X}'^T \mathbf{W}_e^2\, \mathbf{I}'. \quad \text{(B16)}$$

Substituting into Eq. (B15), cancelling, and using $\mathbf{W}_e^2\, \mathbf{I}'\, \mathbf{W}_e^{-2}\, \mathbf{I}'\, \mathbf{W}_e^2 = \mathbf{W}_e^2\, \mathbf{I}'$,

$$\mathbf{V}_\theta = \left(\mathbf{X}'^T \mathbf{W}_e^2\, \mathbf{I}'\, \mathbf{X}'\right)^{-1}. \quad \text{(B17)}$$
The small steps in the age uncertainty (age ±) curve in Fig. A1 arise when diagonal elements in $\mathbf{I}'$ change from 1 to 0 as mswd increases. In YORK (or if all $|r_k| < h$ in SPINE), $\mathbf{I}' = \mathbf{I}$ and $\psi(\mathbf{r}) = \mathbf{r}$. So

$$\theta = \left(\mathbf{X}'^T \mathbf{W}_e^2\, \mathbf{X}\right)^{-1} \mathbf{X}'^T \mathbf{W}_e^2\, \mathbf{y},$$

and the covariance matrix becomes

$$\mathbf{V}_\theta = \left(\mathbf{X}'^T \mathbf{W}_e^2\, \mathbf{X}'\right)^{-1}.$$
If, in addition, all $\sigma_{x_k} = 0$, then $\mathbf{X}' = \mathbf{X}$ and iteration is not involved, and

$$\theta = \left(\mathbf{X}^T \mathbf{W}_e^2\, \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{W}_e^2\, \mathbf{y},$$

with a covariance matrix of

$$\mathbf{V}_\theta = \left(\mathbf{X}^T \mathbf{W}_e^2\, \mathbf{X}\right)^{-1}.$$

These are the results for fitting data by simple weighted least squares.

In general, with larger contaminated Gaussian datasets, the differences between the SPINE and YORK ages are not dramatic, even if there are obvious outliers in the quantile-quantile plots. In 10 000 simulated datasets with n = 50 and 25 %3N, a 95 % confidence interval on the age difference, normalised by the SPINE age uncertainty, is ±1.5.
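The iteratively reweighted least squares scheme described above can be sketched as follows. The ordinary least squares starting point and the value of h are illustrative assumptions here (the paper starts the iteration from SIEGEL), and the function name is hypothetical:

```python
import numpy as np

def spine_fit(x, y, V, h=1.4, tol=1e-10, max_iter=100):
    """IRLS sketch of the robust line fit with data uncertainties.

    x, y: data arrays; V: (n, 2, 2) covariance matrices; h: Huber cutoff.
    Returns theta = (a, b) for y = a + b*x and a covariance estimate.
    """
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    coef = np.polyfit(x, y, 1)                 # OLS start: [slope, intercept]
    theta = np.array([coef[1], coef[0]])
    for _ in range(max_iter):
        a, b = theta
        e = a + b * x - y                      # distances from the line
        se2 = b*b*V[:, 0, 0] - 2*b*V[:, 0, 1] + V[:, 1, 1]
        r = e / np.sqrt(se2)                   # normalised residuals
        absr = np.abs(r)
        w = np.where(absr < h, 1.0, h / np.maximum(absr, 1e-12))  # psi(r)/r
        W = w / se2                            # combined weights (w / sigma_e^2)
        xp = x - e * (b*V[:, 0, 0] - V[:, 0, 1]) / se2  # touching points x'
        Xp = np.column_stack([np.ones(n), xp])
        new = np.linalg.solve(Xp.T @ (W[:, None] * X), Xp.T @ (W * y))
        if np.max(np.abs(new - theta)) < tol:
            theta = new
            break
        theta = new
    # covariance estimate (X'^T W_e^2 I' X')^{-1} at the solution
    a, b = theta
    e = a + b * x - y
    se2 = b*b*V[:, 0, 0] - 2*b*V[:, 0, 1] + V[:, 1, 1]
    Ip = (np.abs(e / np.sqrt(se2)) < h).astype(float)   # diagonal of I'
    xp = x - e * (b*V[:, 0, 0] - V[:, 0, 1]) / se2
    Xp = np.column_stack([np.ones(n), xp])
    Vtheta = np.linalg.inv(Xp.T @ ((Ip / se2)[:, None] * Xp))
    return theta, Vtheta
```

When all residuals satisfy |r| < h, the weights reduce to 1/σ²_e and the iteration reproduces the YORK solution, illustrating the seamless-convergence property described in the text.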