Improving age-depth correlations by using the LANDO model ensemble

15 Age-depth correlations are the key elements in paleoenvironmental studies to place proxy measurements into a temporal context. However, potential influencing factors of the available radiocarbon data and the associated modeling process can cause serious divergences of age-depth correlations from true chronologies, which is particularly challenging for paleolimnological studies in Arctic regions. This paper provides geoscientists with a tool-assisted approach to compare outputs from age-depth modeling systems and to strengthen the robustness of 20 age-depth correlations. We primarily focused in the development on age determination data from a data collection of high latitude lake systems (50° N to 90° N, 62 sediment cores, and a total of 661 dating points). Our approach used five age-depth modeling systems (Bacon, Bchron, clam, hamstr, Undatable) that we linked through a multilanguage Jupyter Notebook called LANDO (“Linked age and depth modeling”). Within LANDO we have implemented a pipeline from data integration to model comparison to allow users to investigate the outputs of the 25 modeling systems. In this paper, we focused on highlighting three different case studies: comparing multiple modeling systems for one sediment core with a continuous, undisturbed succession of dating points (CS1 “Undisturbed sequence”), for one sediment core with scattered dating points (CS2 “Inconsistent sequence”), and for multiple sediment cores (CS3 “Multiple cores”). For the first case study (CS1), we showed how we facilitate the output data from all modeling systems to create an ensemble age-depth model. In the special case of scattered 30 dating points (CS2), we introduced an adapted method that uses independent proxy data to assess the performance of each modeling system in representing lithological changes. Based on this evaluation, we reproduced the characteristics of an existing age-depth model (Lake Ilirney, EN18208) without removing age determination data. For the multiple sediment core (CS3) we found that when considering the Pleistocene-Holocene transition, the main regime changes in sedimentation rates do not occur synchronously for all lakes. We linked this behavior to 35 the uncertainty within the modeling process as well as the local variability of the sediment cores within the collection.

The objective of this paper is to reduce the effort to apply different methods for determining age-depth correlations and to make their results comparable. We provide a tool to link five selected modeling systems in a single multilanguage Jupyter Notebook. We introduce an ensemble age-depth model that uses uninformed models to create data-driven, semi-informed age-depth correlation. We demonstrate the power of our tool by highlighting three case studies in which we examine our application for individual sediment cores and a collection of multiple sediment 85 cores. Throughout this paper, the term "LANDO" refers to our implementation, which stands for "Linked age and depth modeling".
In this paper, we use both published and unpublished age determination data from 62 sediment cores from high latitude lake systems (50° N to 90° N). This unique collection of age determination data allows us to thoroughly test LANDO by examining changes of sedimentation rates over time for various modeling and lake systems. We 90 provide an overview on the acquired metadata in the repository mentioned in the "Code and data availability" section. The harmonization of the acquired data follows the conceptual framework described in Pfalz et al. (2021).

Methods
A key element in our data-science based approach for developing comparable age-depth correlations was to facilitate the use of modeling systems independent from their original proprietary development environment. A 95 multi-language data analysis environment, such as the SoS notebook (Peng et al., 2018) or GraalVM (Niephaus et al., 2019), provides an interface that enables the comparison of modeling systems without being limited to one programming language or environment. Our implementation used the SoS notebook as its backbone. The SoS notebook is a native Python-and JavaScript-based Jupyter Notebook (Kluyver et al., 2016), which extends to other languages through so-called "Jupyter kernels". We developed our implementation with the focus on four languages 100 and their respective kernels: Python, R, Octave, and MATLAB. This selection allowed us to use the most common modeling systems.
In our study, we were able to connect five of the above-mentioned modeling systems in the SoS notebook, namely: Bacon, Bchron, clam, hamstr, and Undatable. The workflow of LANDO consists of five major components: Input 110 -Preparation -Execution -Result aggregation -Evaluation of model performance.

Input
To work with LANDO users need to provide age determination data, e.g., data from radiocarbon or OSL dating, and associated metadata as listed in Table 1. We developed two import options for the users: through a single spreadsheet or a connection to a database. For this study, we used a connection to a PostgreSQL database, which 115 we developed after the conceptual framework as described in Pfalz et al. (2021), via the Python package "SQLAlchemy" (Bayer, 2012). We divided age determination input data into two attribute categories: necessary and recommended. The category "necessary" focused on the prerequisites of the individual modeling systems as https://doi.org/10.5194/gchron-2021-40 Preprint. Discussion started: 2 December 2021 c Author(s) 2021. CC BY 4.0 License.
well as project-related attributes, such as unique identifiers, i.e., "measurementid", "labid". However, a larger comprehensive set of descriptive metadata helps a better understanding of the data (Cadena-Vela et al., 2020;120 Thanos, 2017). We added four additional attributes from the category "recommended" to facilitate the interpretation of age-depth models regarding their age determination data. and core length, to ensure comparability to our database implementation. We provide an example spreadsheet with all attributes in the expected format in the repository mentioned in the "Code and data availability" section of this paper.

Preparation
The preparation component consisted of two separate steps. First, we checked each age determination dataset, whether a reservoir effect was influencing the radiocarbon data. In the absence of a known reservoir age or recent surface sample, we used available radiocarbon data points and a fast-calculating modeling system to predict the age of the upper most layer within a sediment core. In our approach, we used the hamstr package with a default 135 value of 6000 iterations. We then compared the predicted value for the upper most layer with the year of the core retrieval, i.e., our target age. We accounted for an uncertainty in the estimate by allowing an extra 10% error between predicted age and target age. If a gap between predicted and target age is observable, then we assumed a reservoir effect is present. We calculated the reservoir effect by subtracting the target age from the mean predicted age, whereas the associated error we based on the two-sigma uncertainty ranges of the prediction. LANDO allow 140 users to add the calculated reservoir age and its uncertainty range to the corresponding attributes ("reservoir_age" and "reservoir_error"). Depending on the choice of the user, this addition affects either all radiocarbon samples or only bulk sediment samples, or users completely discard the output for the subsequent modeling process.
As second step in the preparation component, we built a module that automatically changes the format of the available data to the individually desired input of each of the five modeling systems implemented in LANDO. We 145 primarily used the Python package "pandas" (Reback et al., 2020) for the transformation within the module. We transferred the newly transformed age determination data to the corresponding programming language for agedepth modeling using the built-in "%get" function of SoS notebook.

Execution
We developed LANDO with the specific ability of create multiple age-depth models for multiple dating series 150 from spatially distributed lake systems. Hence, reducing overall computing time was one of our highest priorities.
We achieved this reduction by applying existing parallelization back-ends for both R and Python, such as "doParallel" (Microsoft Corporation and Weston, 2020a) and "Dask" (Dask Development Team, 2016), respectively. For each modeling system in R, we wrote a separate script that takes advantage of the parallelization back-end "doParallel". Besides the individual modeling system packages, we made use of different R libraries,
We neglected the use of parallelization for the Undatable software in MATLAB, since even the sequential execution for several sediment cores in our test setup was on the order of a few minutes. However, we achieved comparable results with Undatable in Octave using the parallelization package "parallel" (Fujiwara et al., 2021).

160
As mentioned before, the selection of model priors and parameters has an impact on the modeling outcome, if no objective prior knowledge exist. To lower our impact and to avoid introducing biases in the modeling process, we used the default values from each modeling system as our own default values (Blaauw et al., 2021;Blaauw, 2021;Parnell et al., 2008;Dolman, 2021;Lougheed and Obrochta, 2019). In our adaptation of clam, the parameter "poly_degree" controls the polynomial degree of models for type 2, while the parameter "smoothing" controls the degree of smoothing for type 4 and 5. In the original version of clam, users adjust both parameters with the single option "smooth" (Blaauw, 2021). Furthermore, the default value for "ssize" within the original version of Bacon is 2000. We increased this value to 8000 to ensure good MCMC mixing for problematic cores (Blaauw et al., 2021). In case the user has in-depth knowledge about his sediment core and wants to change certain values, we opted for making crucial parameters accessible within the SoS notebook outside of the executing scripts. Table 2 170 provides an overview of all values which users can access and change for the individual systems. However, we limited the access to some parameters for operational purposes, such as the number of iterations or the resolution of the output.

Result aggregation
After every model run, we received 10000 age estimates (also known as "iterations" or "realizations") per centimeter from each modeling system for every sediment core. We transferred these results back to Python using the built-in "%put" function of SoS notebook, where in the next module, we calculated per centimeter the median and mean age values as well as one-sigma and two-sigma age ranges. For the summarizing statistics, we used 180 standard Python libraries such as "pandas" (Reback et al., 2020) and "numpy" (Harris et al., 2020). We appended the model name as attribute to the statistics to allocate each result to its modeling system. In addition, we implemented a module, which helped us to push the aggregated result to our initial database to reuse in follow-up research projects. In a similar approach to the input component, we established the connection to our designed PostgreSQL database via the package "SQLAlchemy" (Bayer, 2012).

185
Similarly, we used the 10000 age estimates per centimeter for calculating the sedimentation rates. Our calculation used three different approaches to calculate sedimentation rates: "naïve", "moving average over three depths", and "moving average over five depths". Table 3 lists the appropriate equations for each approach. The user can decide which one of the three approaches best applies to the individual sediment record. We summarized the output into the basic summarizing statistics (mean, median, one-sigma ranges, and two sigma ranges) accessible to the users, core for sedimentation rate calculation, then LANDO will automatically execute the sedimentation rate calculation in parallel using the "Dask" back-end (Dask Development Team, 2016) and the "joblib" Python package (Joblib Development Team, 2020).

Approach Equation
Moving average over three depths sedimentation rate ( Moving average over five depths sedimentation rate (

200
To evaluate the performance of each modeling system, we looked at three different case studies: Case Study no. 1 -Comparison of multiple modeling systems for one sediment core with a continuous, undisturbed sequence of dating points ("Undisturbed sequence" -CS1) Case Study no. 2 -Comparison of multiple modeling systems for one sediment core with a disturbed sequence (including inversions) of dating points ("Inconsistent sequence" -CS2)

205
Case Study no. 3 -Comparison of sedimentation rate changes for multiple sediment cores ("Multiple cores" We examined both sedimentation rate and age-depth modeling results in each of the three case studies. For the first case study, we selected the sediment core EN18218 (Vyse et al., 2021) to showcase the generated output of LANDO. The 6.53 m long sediment record obtained from Lake Rauchuvagytgyn, Chukotka (67.78938° N,

210
168.73352° E, core location water depth: 29.5 m) during an expedition in 2018 consisted of 23 bulk sediment samples used for radiocarbon sampling. The authors determined an existing age offset of 785 ± 31 yr BP (years Before Present, i.e., before 1950 CE), which we used in our modeling process as well.
As counterexample for the second case study, we have chosen the sediment core EN18208 (Vyse et al., 2020).
During the same expedition to Russia's Far East in 2018, scientists recovered this EN18208 core from Lake Ilirney, corresponding references. Both cores are also part of the "Multiple cores" case study with a total of 62 sediment 220 cores ( Figure 1).

Figure 1
-Map of geographical distribution of lake sediment cores used for our study (triangles, n = 62). The orange triangles (n = 39) represent sediment cores for which data of age determination were available either through a corresponding publication or a publicly accessible database, e.g., Pangaea (Diepenbroek et al., 225 2002). Purple triangles (n = 23) indicate unpublished or referenced datasets, such as those provided by the PaleoLake Database . ArcGIS Basemap: GEBCO Grid 2014 modified by AWI. The outer ring in the graphic corresponds to 45° N.

Numerical combination of model outputs
To introduce the ensemble model in LANDO, we combined the outputs from all five modeling systems into one 230 composite model. We considered the outermost limits (min. and max. values) of all confidence intervals (onesigma or two-sigma) as our boundary for the ensemble model. By taking these outermost limits into account, we artificially increased the area of uncertainty covered by the ensemble model, but we made sure that we were representing all possible outcomes and maximizing the likelihood of including the true chronology. We also included a weighted averages ( ̅ ) of the age estimates and sedimentation rates, which we calculated using the with m being the number of participating modeling systems, n as the total number of iterations as well as ��� and representing the median value (either for age estimate or sedimentation rate) and the associated number of 240 iterations from each modeling system, respectively. In some cases, the weights from each modeling system are equal, as they produce the same number of iterations. Then we can simplify Eq. 1 to represent the arithmetic mean: For our "Multiple cores" case study (CS3), we additionally had to ensure comparability of sedimentation rates between sediment cores, since each model assigns a different age value to its sedimentation rate value per 245 centimeter. Therefore, we binned sedimentation rate results into 1000-year bins for each age-depth model as well as the ensemble model and calculated the weighted averages and their confidence intervals within these bins.
Inside LANDO, users can change the initial bin size of 1000 years to the desired resolution.

Detection and filtering of unreasonable models
For cases in which age-depth models do not agree with each other, e.g., "Inconsistent sequence" case study (CS2),

250
we have built in the option of importing data from measured sediment properties, also known as proxies. Because of compositional and density variations of deposits, changes in sedimentation rates imply changes in the deposition of proxies (Baud et al., 2021;Biskaborn et al., 2021;Vyse et al., 2021). By including appropriate, independent proxy data on lithological changes within the sediment core, we can weight each model based on its performance to represent these variations in sedimentation rate. Users should provide the independent sediment proxy data as 255 file with two columns, namely "compositedepth" which should be the measurement depths (as mid-point centimeter below sediment surface), and "value" representing the values of the proxy. This simplification makes it possible to import different available proxies or statistical representations of proxy data, i.e., results from ordination techniques (PCA, MDS, etc.), into the optimization process and to visualize the behavior of the agedepth models in comparison to these proxies.

260
In order to evaluate the performance, we adapted the fuzzy change point approach by Hollaway et al. (2021) to work with our input data and desired outcome on a depth-dependent scale instead of a time series. Similarly to Hollaway et al. (2021), our approach firstly detected change points within the proxy data and each modeling system output by fitting an ARIMA model to the data and then extract change points by using the "changepoint" R package (Killick and Eckley, 2014;Killick et al., 2016) on the residuals of the ARIMA model. If we found no change points 265 in the proxy data via this approach, we applied the "changepoint" R package on the raw independent sediment proxy data instead. Through the additional bootstrapping process introduced by Hollaway et al. (2021), we were able to set up confidence intervals for the extracted change points. Subsequently, we searched for the intersection between the change points plus their confidence interval for each age-depth model with the independent proxy data. After converting the change points for both age-depth model and independent proxy data into triangular fuzzy 270 numbers, we obtained similarity scores using the Jaccard similarity score of the fuzzy number pairs as described in Hollaway et al. (2021). The similarity score can reach numbers between zero (no match) and one (perfect match).
However, the threshold of excluding an age-depth model from the generated combined model depends on the imported proxy data and number of detected change points. Therefore, the user can set the threshold accordingly https://doi.org/10.5194/gchron-2021-40 Preprint. Discussion started: 2 December 2021 c Author(s) 2021. CC BY 4.0 License.
to their proxy within LANDO, but we have implemented the default value for this threshold to 0.1, which 275 corresponds to an overlap of 10% of the change points between model and proxy data.
In addition to the criterion of preparing the proxy data in the format of "depth vs. value" in a separate file, we suggest using a proxy with a high resolution. As a high-resolution proxy, we define a proxy with more than 50 measurements per meter of core length. For our "Inconsistent sequence" case study (CS2), we used highresolution elemental proxy data from XRF (X-ray fluorescence) measurement as our independent proxy data. As 280 our evaluation element to optimize the age-depth models, we selected zircon ("Zr"), which itself is an indicator for minerogenic/detrital input (Vyse et al., 2020 and references therein). The zircon proxy data of EN18208 has a resolution of 200 measurements per meter of core length.
To achieve a realistic comparison between sediment cores in the "Multiple cores" case study (CS3), we looked at the individual age-depth model outputs for each sediment core to determine whether an optimization step was 285 required. We have only selected published sediment cores (n = 39) so that we can refer to lithological boundaries from the original publication. During the analysis, we saw that nine sediment cores needed to be optimized due to strong inconsistencies between models over the entire length of each core. In twelve cases, where models within the lower section of the cores did not match, we considered proxy-based optimization to improve the model outcome when high-resolution data was available. 290

Sedimentation rate development over time
To identify similar temporal shifts in sedimentation regimes in our case study "Multiple cores" (CS3), we examined our data collection of 62 sediment cores regarding a general tendency in sedimentation rate shifts. First, we considered the 11 700-yr BP (Before Present, i.e., before 1950 CE) boundary as our marker for the change between Holocene and Pleistocene to separate the datasets (Rasmussen et al., 2006;Lowe and Walker, 2014;295 Walker et al., 2008). We selected this marker because numerous studies suggest a general difference in sedimentation regimes between these periods (e.g., Brosius et al., 2021;Vyse et al., 2021;Baumer et al., 2021;Bjune et al., 2021;Kublitskiy et al., 2020;Müller et al., 2009;Wolfe, 1996). As some of the models were below the 11 700-yr BP marker, the calculation of the mean sedimentation rate for the Pleistocene featured only a subset of sediment cores (total number of sediment cores with measurement in Pleistocene: 20). Then, for each age model 300 of the sediment cores in the subset, we used the two-sigma ranges around 11 700 yr BP to determine whether the maximum absolute change occurred exactly at 11 700 yr BP or around our set marker. For this investigation, we changed the bin size to 100-year bins to allow comparison between each modeling system and the combined models. Using maximum from the interquartile ranges of the two-sigma ranges for each model (see supplementary material Figure S3), we defined the observation period from 8700 to 14 700 yr BP (corresponds to a range of ± 305 3000 years). We then checked the data within the time span to see where the maximum change in sedimentation rate occurred. If the calculated age for the new marker was at the edge of our time span, we iteratively increased the outer limit by 100 years (up to a maximum of 18 000 yr BP) to see if the calculated age still reflected the maximum absolute change. We then used the newly defined marker to calculate the mean sedimentation rate for before and after the marker.
To display the results from age-depth modeling and sedimentation rate calculation, we decided to create our own plots, instead of reusing the plots from each individual modeling system. Our plot header contains the unique CoreID; additionally, the header indicates whether the user decided to apply a reservoir correction on the radiocarbon data or not. Our single core plots consist of two main panels: On the left-hand side, the panel shows 315 the results from the age-depth modeling process with the calibrated ages (in calibrated years Before Present, i.e., before 1950 CE) on the x-axis and the composite depth of the sediment core (in centimeter) on the inverted y-axis.
On the right-hand side, the panel displays the result from the sedimentation rate calculation (in cm/yr, centimeter per year) on the x-axis plotted against the same composite depth on the inverted y-axis. For better readability of the strong variability of sedimentation rate, we used the log scale for the x-axis of the right panel. Generally,

320
LANDO draws the ensemble age-depth model and sedimentation rate in grey with the weighted average as dashed line.
For all models, LANDO will display the median values for age and sedimentation rate as solid lines. Both panels further display the corresponding one-sigma range and two-sigma range per centimeter for each model. Depending on the user's selection, users can plot both sigma ranges, only one of the two sigma ranges, or just the median 325 ages. To include age determination data within the plots, LANDO internally calibrates the radiocarbon data with the "BchronCalibrate" function of the Bchron package (Haslett and Parnell, 2008;Parnell et al., 2008) with either the IntCal20  or Marine20 (Heaton et al., 2020) calibration curve. Per default, the left panel contains each calibrated age determination as single point with its uncertainty as error bars.
If users decide to filter out unreasonable age-depth models, similar to "Inconsistent sequence" case study (CS2),

330
we added the option to plot the independent proxy data and therefrom derived lithology as an additional panel on the left-hand side for a better interpretability. Further, LANDO highlights the boundaries of lithological change and its confidence interval in both sedimentation rate and age-depth model plots. The optimized plot includes a goodness-of-fit for each involved modeling system to represent the change points at the bottom of the plot.
When using LANDO for multiple sediment cores, the overall plot holds for each sediment core the results from 335 the binned weighted average sedimentation rate calculation (as median sedimentation rate in cm/yr, centimeter per year) against the selected age bins (in calibrated years Before Present, i.e., before 1950 CE) for each modeling system. This visual illustration allows user to compare multiple sediment cores based on the time axis.
For people with color vision deficiency, we incorporated the extra option to plot the resulting age-depth plots with different line styles and textures to support the visual differentiation between each model. Figure S4 in the 340 supplementary material shows the color-blind friendly output created by LANDO. With LANDO we want to support inclusivity in science, but we look forward to feedback from the community on how we can improve LANDO in this regard.

345
All five age-depth models were able to produce an age-depth correlation for sediment core EN18218 ("Lake Rauchuvagytgyn") with only small diversions in between some of the calibrated ages. Figure   with their uncertainty as error bar, which were calibrated using the IntCal20 calibration curve

"Inconsistent sequence" -Case Study no. 2
For the second case study, four out of five modeling systems produced an output for sediment core EN18208 ("Lake Ilirney"). The modeling system clam was unable to produce an age-depth model for this core. Figure 3 shows the visual outputs with all models in panel (a) Vyse et al. (2020).
During the optimization process, our adapted algorithm located four lithological boundaries with its uncertainty 430 range from the independent proxy data: 189.5 cm (182 -192.5 cm), 646 cm (638 -657 cm), 890.5 cm (874 -912 cm), and 1051.5 cm (1043 -1061.5 cm). We found the highest matching score from the optimization for Bchron (Score: 0.0796). Table 4 shows the average sedimentation rate for each proxy-derived lithological unit (PLU) of the ensemble model of EN1808.

"Multiple cores" -Case Study no. 3
In contrast to the previous case studies, this case study focused on studying development of sedimentation rates 440 over time, with the emphasis on the transition from the Holocene to the Pleistocene. We used age determination data from 39 published sediment cores to show the standard output of LANDO for multiple sediment cores, while using both published and unpublished age data for the analyses. Figure 5 shows the ensemble models with weighted average sedimentation rates binned into 1000-year bins from our multi-core investigation with 39 published sediment cores (see Figure S1 for the individual models in the supplementary material). We set the 445 boundaries from 0 to 21 000 cal yr BP within these figures to cover the time span from the present to the Last Glacial Maximum (LGM) (Clark et al., 2009). Below the number for each core in Figure 5 are the proxies used for their optimization. In 19 out of 62 cases within our entire collection, the ensemble model was based on four out of five models, as neither clam or Undatable was able to find a suitable age-depth model (for more details, please see Table S1

Figure 5 -Optimized combined models for 39 published sediment cores displayed as weighted average sedimentation rate (in centimeter per year, cm/yr -y-axis) binned into 1000-year bins (in calibrated years
Before Present, cal. yr BP, i.e. before 1950 CE -x-axis) for the last 21 000 years. Dashed line represents the 460 weighted average sedimentation rate, whereas the grey areas are the respective two-sigma ranges. Each grid cell contains the unique core identifier of each involved sediment core. In seven cases, the letters below each number give the name of the independent proxy used for optimization process.
To visualize the difference in sedimentation rates between two neighboring and fundamentally different environmental settings, i.e. Pleistocene glacial and Holocene interglacial, we used the datasets that were split at 465 the Holocene-Pleistocene boundary at 11 700 yr BP. Figure 6 shows the mean sedimentation rate for Holocene and Pleistocene for each model with its one-sigma uncertainty. Figure     the one-sigma range of sedimentation rate within each dataset for each model and sediment core. In addition, filled circles represent the mean value for the optimized models.

Assessment of different case studies
By comparing the cases for the two single sediment cores, it becomes clear how age-depth correlations may diverge depending on the individual modeling system and its treatment of available dating points (cf. Wright et al., 2017;Trachsel and Telford, 2017;Lacourse and Gajewski, 2020). In the case of EN18218 ("Undisturbed sequence" -505 CS1), all five implemented modeling systems yield an agreeing and continuous chronology. However, the two radiocarbon dates at 81.25 cm and 114.75 cm have significant impact on the model's interpretation for these depths. Vyse et al. (2021) argued that these two dates are outliers resulting from reworking and mixing effects within the sediment column. According to the authors, no additional proxy data from EN18218 would support the immediate increase in sedimentation rate for these depths and hence, they excluded both dates from the modeling 510 process. Because we are not considering any additional proxy data to evaluate age-depth models in their geoscientific context, but rather include all provided age determination data into the modeling process, the consideration of these two radiocarbon dates on the basis of all available models leads to higher sedimentation rate. Nonetheless, the comprehensive application of the different modeling systems may help to identify doubtful dating points.

515
We saw a strong disagreement between the modeling systems in the case of sediment record EN18208 ("Inconsistent sequence" -CS2), which we expected prior to the execution of our application, due to the scattered dating points in the original data. Vyse et al. (2020) linked this scatter of age data points observed in the interval between 282 and 755 cm of EN18208 to the redeposition of older carbon. They implied that to produce reliable age-depth model they had to exclude both OSL and radiocarbon dating points for these depths. However, our 520 optimized combined model agrees with their established age-depth model and can reproduce the characteristics of the existing model by Vyse et al. (2020), without removing dating points. In addition, in three out of four cases, our proxy-derived lithology with its uncertainty matches the lithological boundaries set by the authors of the EN18208 study, according to criteria based on acoustic sub-bottom profiling. Only the first original boundary (196 cm) is outside our confidence interval from 182 cm to 192 cm. We still showed that our approach could set logical 525 boundaries for sediment cores by solely relying on high-resolution proxy data.
Even though LANDO can produce age-depth models for multiple sediment cores ("Multiple cores" -CS3), we must assume limitations in the geoscientific validity for some of the results. In a few cases, an optimization of agedepth models with independent proxy data would have been necessary, but such independent data were inaccessible or did not exist. As for these cases age-depth correlations between implemented modeling systems 530 seem to disagree (see Figure S1 in the supplementary material), the results from our combined model might overor underestimate the true sedimentation rate. On the other hand, optimization using proxy data can reduce these biases. For instance, during the examination of the Holocene and the Pleistocene sedimentation rates (Figure 6), we noticed that one sediment core (PG1228) had an extremely high mean sedimentation rate for the Holocene dataset in Undatable. Similar to the second case study ("Inconsistent sequence" -CS2), we found scattered age 535 data points for this sediment core, which influenced the modeling process of Undatable. Further, the result then affected our combined model by increasing the overall sedimentation rate for the Holocene in this core. However, https://doi.org/10.5194/gchron-2021-40 Preprint. Discussion started: 2 December 2021 c Author(s) 2021. CC BY 4.0 License.
LANDO identified the Undatable model as an outlier based on the lithology established through independent TOC proxy data. The optimized model then agreed well with the original publication by Andreev et al., (2003), which further increased the validity of our approach. Our findings suggest that high-resolution proxy data should 540 accompany geochronological studies to enable a more concise and realistic assessment of the development of sedimentation rates over time in high latitude lake systems.
The detection of sedimentation rate change as indicator for the Holocene-Pleistocene boundary yielded contrasting results. While the results from hamstr were closest to the 11 700-year boundary, all other modeling systems place the largest change in sedimentation rate either before or after 11 700 yr BP. We hypothesize that three factors may 545 have influenced all model results. (1) The age uncertainty (one-sigma range) within each individual model varied on average between 1000 and 3000 years for the period of 11 600 to 11 800 yr BP ( Figure S3 in the supplementary material). This wide range of uncertainty does not provide confidence in pinpointing the boundary to an exact time slice. We expect that a higher amount of dating points close to the Holocene-Pleistocene boundary could constrain the models (Blaauw et al., 2018;Lacourse and Gajewski, 2020;Trachsel and Telford, 2017), which would lead to 550 a better estimate of the boundary.
(2) The age output for each model is not evenly distributed, which means that in the period from 11 600 to 11 800 yr BP there are different numbers of observations for each core and each modeling system. We took this behavior into account by using binning (Alasadi and Bhaya, 2017). Otherwise, an interpolation between both age and sedimentation rate values could lead to potential biases in the interpretation.
(3) While we assumed in our first setup that the main sedimentation rate change would occur at 11 700 yr BP 555 consistently for all sediment cores (Figure 6), we cannot rule out the possibility that the sedimentation rate has changed significantly at different times for different lake systems. As our data collection covers a large area both in latitude and longitude (Figure 1), the variability between the models indicate the local variability between the climate and lithological preferences of the lake catchment for the involved sediment cores (e.g., Lozhkin et al., 2018;Finkenbinder et al., 2015;Anderson and Lozhkin, 2015;Kokorowski et al., 2008;Biskaborn et al., 2016;560 Courtin et al., 2021).

Design of LANDO
From the beginning of the development of LANDO, we decided to integrate most of the default settings for each modeling system as default values (Table 2). Regional studies, such as the one performed by Goring et al. (2012), have shown that specific prior information for the Bayesian modeling systems are needed to best fit the models to 565 lakes within a geographical area. Without this regional information, changing settings within the modeling system to an arbitrary higher or lower value without considering the regional diversity could lead to under-or overfitting, if the constraints are too loose or too strict (Trachsel and Telford, 2017). For the special case that users have indepth knowledge for one lake or multiple lake system, users can easily adapt these parameters within LANDO, as we have made these settings accessible in the Jupyter Notebook itself.

570
Part of the reason we made this decision was that we acquired external age determination datasets where we may not necessarily have all the essential information to specify each model. But we also wanted to simplify the process for users who do not have in-depth modeling knowledge. By using the default values, we can compare models based on their ability to work with the available data. On the other hand, we are sure that the developers have set their default values based on systematic testing. Since we did not tune the age-depth models to the existing core, with the available age determination data. By combining these "uninformed" models into one model, we have created an ensemble model that we consider to be data-driven and "semi-informed".
The advantage of this data-driven, semi-informed model approach is that we are reducing the risk of overfitting by considering the uncertainty of all modeling systems. This allows us to reevaluate existing geoscientific 580 interpretations with larger uncertainty by taking advantage of the ensemble outcome. Additionally, we found that the more information is accessible to generate age-depth models, the more accurate and less uncertain these models become. A higher density of age determination along the depth of the sediment core is desirable for future drilling campaigns (cf. Blaauw et al., 2018).
The disadvantage arises in our second case study ("Inconsistent sequence" -CS2) and the multi-core investigation

585
("Multiple cores" -CS3). For both cases we needed the optimization step to narrow down the most suitable agedepth models for each sediment core, since the unoptimized uncertainty band was otherwise too wide for a clear interpretation. The optimization requires additional and independent proxy data, which are not available for some of our cores, especially for sediment cores obtained some decades ago. Our optimizing step is therefore mainly suitable for recently retrieved and analyzed sediment cores.

590
In addition to the assessment of age-modeling quality, we also checked the time and effort to conduct dating routines. We saw that Bacon had the highest runtime overall in all three case studies of our study design, which we link to our adjustment of the "ssize" parameter from 2000 (per default) to 8000 within the application. We increased this value to ensure good MCMC mixing for problematic cores, as suggested by Blaauw et al., (2021), as well as to guarantee we had enough iterations for our summarizing statistics to compare with other modeling 595 systems. If users decide to reduce the value of "ssize", we implemented an iterative process, which checks whether Bacon produced enough iterations. If this is not the case, then LANDO will iteratively rerun the same sediment core with a higher "ssize" to produce 10 000 iterations.
One unique feature of our application is the predominant use of parallelization within the age-depth modeling of multiple sediment cores. For instance, we used the "Dask" back-end for our sedimentation rate calculation. The

600
advantage over popular Scala-based "Apache Spark" and its Python interface "PySpark" (Zaharia et al., 2016 the likelihood of a more probable mean age for their sediment core.

Technical specifications of LANDO
In the further course of development, we decided to limit the resolution of the age-depth correlations. Using a resolution of one-centimeter increments allows us to match most proxy measurements from each sediment core with our age-depth models, apart from high-resolution measurement, such as XRF measurements. To allow a 620 matching with high-resolution proxy data, we tested for a higher resolution of 0.25 cm for our application. In the single sediment core cases (CS1 and CS2), this change did not affect the workflow of LANDO. In turn, the "Multiple cores" case (CS3) ran into memory issues. Since the SoS notebook and our parallel back-ends store the result data frames in memory, expanding the resulting data frames to a 0.25 cm resolution causes a fourfold increase in memory use, which limits our capability to run our application on a single laptop. As an intermediate 625 solution, we stored the results from each parallelization worker on disk to free the memory and performed combining operations later. Based on this experience, we recommend working with data centers or increasing the available main memory (RAM) of the operating computer for multi-core studies with expected high-resolution output.
Another advantage of parallelization is that most modeling systems only run on one CPU/thread. Nowadays,

630
however, both personal computers and data centers are made up of multiple CPUs/threads. Especially for larger multi-site studies, our application has the advantage of cutting the overall computing time by running each modeling system on multiple CPUs/threads simultaneously, even for personal computers. In comparison to serial execution of multiple models on one CPU/thread, which would take several hours, our parallel execution reduced the computing time per modeling system by a factor up to four. When considering that our setup consisted of six 635 CPUs (12 threads) and 16 GB RAM, user can even further increase this factor by using larger computing facilities.
Sediment core length is the most limiting factor that determines the overall computing time in our application.
However, we want to ensure that users can model each sediment core over its entire length to match proxy data with the correct age-depth correlations. Within our LANDO system, we faced this problem by using extrapolation to calculate ages beyond available dating points. The exception here is the modeling system Undatable, which 640 models only between the first and last dating point, as these two dating points act as anchors for the bootstrapping process (Lougheed and Obrochta, 2019). As a result, we saw the sedimentation rate dropping twice to zero at the end of the sedimentation rate calculations. We link this behavior to the end of the individual modeling processes of Undatable as well as the other implemented systems.
Extrapolating the age-depth models beyond age determination points always bares the risk that the extrapolated 645 dates do not reflect the actual age. The implemented modeling systems account for this circumstance by increasing the uncertainty for these undated regions (Blaauw, 2010). While we are aware of this potential issue, we wanted to allow users to take advantage of the full age-depth coverage for their sediment core. Blaauw et al. (2018) pointed out in their findings that "most existing late-Quaternary studies contain fewer than one date per millennium" and recommended to increase the number of dating points to "a minimum of 2 dates per millennium". This 650 recommendation would further decrease the need of extrapolation and reduce the overall uncertainty of age-depth models. We agree that more age control can improve the age-depth modeling results, but until the associated costs to analyze organic material for radiocarbon dating do not decrease more significantly (Hajdas et al., 2021;Zander et al., 2020), we recommend LANDO as tool to improve age-depth modeling.

655
During the development of our approach, we realized that some programs were not executable or parallelizable under the current circumstances. For instance, we tested OxCal 4.4 as stand-alone version on Windows with NodeJS (version 12.13.1.0) and the R package "oxcAAR" (Martin et al., 2021) within our application. In the case of EN18208, execution duration was above 3 hours until the notebook lost connection to the OxCal interface.
Furthermore, some cores never fully reached convergence within OxCal. We tried adapting our set-ups including 660 changing the internal constraints, i.e. placement and number of boundaries, or using different depositions models, i.e. alternating between sequential model ("Sequence()") and Poisson-process deposition model ("P_Sequence()").
According to Bronk Ramsey and Lee (2013), the long-term plan of OxCal is to make the entire source code openly accessible, which we fully support. An open source code would allow us to identify the current bottleneck so that we could implement OxCal in a future release.

665
To determine the most fitting age-depth model through the clam modeling software, we added the "best fit" option to LANDO by default. The "best fit" option utilizes the negative log fit results from all clam outputs and identifies the fit with the lowest result as best fit. We included two further exclusion criteria for clam models within LANDO: if a) there are too many age reversals within the models, or b) the fit reaches infinity. Under specific circumstances, some sediment cores will not have a fitting model, as is the case, for instance, in the "Inconsistent sequence" case 670 study (CS2). Including models that do not fit the data would lead to erroneous estimations of the age-depth correlation. This comes with the cost of losing an established model in the combined model, if no fitting clam model is available. However, we think that the benefit of having a more fitting model outweighs this cost.
Although Undatable is open source and the fastest modeling system within LANDO, its original development environment (MATLAB) is not free of charge. That is why we implemented Undatable in the open source 675 MATLAB-equivalent Octave. Since the Octave version of Undatable was slower than the original MATLAB version, we used the parallelization package "parallel" (Fujiwara et al., 2021) to provide comparable results in terms of computing time. To use Undatable with MATLAB within our application, users must acquire a license of MATLAB and link the MATLAB kernel to their license. Unfortunately, we do not have the capacity to provide individual licenses with LANDO. For users with an active MATLAB license, we provide in the repository 680 mentioned in the "Code and data availability" section the appropriate code to run the MATLAB version of Undatable in LANDO.
A potential expansion option of LANDO within the multi-language environment is to extent the application and allow future data analysis to use powerful tools, such as Python's machine learning libraries, e.g., keras (Chollet and and others, 2015) and tensorflow (Abadi et al., 2016). We anticipate that other developers can use LANDO as 685 their starting point in building larger limnological data analysis application.

Conclusion
This paper introduced our application LANDO -a linked age-depth modeling notebook approach. We presented an improved age-depth modeling procedure for sediment cores from high-latitude lake systems by linking five established systems: Bacon, Bchron, clam, hamstr, and Undatable. The added value of our application is the reduced effort to use established modeling systems in a single Jupyter Notebook for both single and multiple dating series and at the same time make the results comparable. In addition, we introduced an ensemble model that uses the output from all models to create a more robust age-depth correlation. In the case of scattered age determination data, we further implemented an adapted version of the fuzzy change point approach that allow users to integrate independent proxy data as indicator of lithological changes. This option helps evaluate the performance of 695 modeling systems across lithological boundaries while providing a more reliable ensemble age-depth model by filtering inappropriate model runs for problematic datasets. Our application also allows users to run large datasets with multiple sediment cores in parallel to reduce the overall computation time. In our data collection of 62 sediment cores from northern lake systems at high latitudes, we found that the main regime changes in sedimentation rates do not occur synchronously for all lakes at the Pleistocene-Holocene boundary. However, we 700 linked this behavior to the uncertainty within the modeling process as well as the local variability of the sediment cores within the collection.