Miniature radiocarbon measurements (< 150 μg C) from sediments of Lake Żabińskie, Poland: effect of precision and dating density on age–depth models

The recent development of the MIni CArbon DAting System (MICADAS) allows researchers to obtain radiocarbon (14C) ages from a variety of samples with miniature amounts of carbon (< 150 μg C) by using a gas ion source input that bypasses the graphitization step used for conventional 14C dating with accelerator mass spectrometry (AMS). The ability to measure smaller samples, at reduced cost compared with graphitized samples, allows for greater dating density of sediments with low macrofossil concentrations. In this study, we use a section of varved sediments from Lake Żabińskie, NE Poland, as a case study to assess the usefulness of miniature samples from terrestrial plant macrofossils for dating lake sediments. Radiocarbon samples analyzed using gas-source techniques were measured from the same depths as larger graphitized samples to compare the reliability and precision of the two techniques directly. We find that the analytical precision of gas-source measurements decreases as sample mass decreases but is comparable with graphitized samples of a similar size (approximately 150 μg C). For samples larger than 40 μg C and younger than 6000 BP, the uncalibrated 1σ age uncertainty is consistently less than 150 years (±0.010 F14C). The reliability of 14C ages from both techniques is assessed via comparison with a best-age estimate for the sediment sequence, which is the result of an OxCal V sequence that integrates varve counts with 14C ages. No bias is evident in the ages produced by either gas-source input or graphitization. None of the 14C ages in our dataset are clear outliers; the 95 % confidence intervals of all 48 calibrated 14C ages overlap with the median best-age estimate. The effects of sample mass (which defines the expected analytical age uncertainty) and dating density on age–depth models are evaluated via simulated sets of 14C ages that are used as inputs for OxCal P-sequence age–depth models. 
Nine different sampling scenarios were simulated in which the mass of 14C samples and the number of samples were manipulated. The simulated age–depth models suggest that the lower analytical precision associated with miniature samples can be compensated for by increased dating density. The data presented in this paper can improve sampling strategies and can inform expectations of age uncertainty from miniature radiocarbon samples as well as age–depth model outcomes for lacustrine sediments.


Introduction
Radiocarbon (14C) dating is the most widely used technique to date sedimentary sequences that are less than 50,000 years old.
The robustness of age-depth models can be limited by the availability of suitable material for dating; this is particularly a problem for studies on sediments from alpine, polar, or arid regions where terrestrial biomass is scarce. Most accelerator mass spectrometry (AMS) labs recommend that samples contain 1 mg or more of carbon for reliable 14C age estimations. It is well established that terrestrial plant macrofossils are the preferred material type for dating lake sediments because bulk sediments or macrofossils may have an aquatic source of carbon, which can bias 14C ages (Groot et al., 2014; MacDonald et al., 1991; Tornqvist et al., 1992; Barnekow et al., 1998; Grimm et al., 2009). Furthermore, a high density of 14C ages (i.e. one age per 500 years) is recommended to reduce the overall chronologic uncertainty of age-depth models (Blaauw et al., 2018). Researchers working on sediments with low abundances of terrestrial plant macrofossils face difficult choices about whether to date sub-optimal materials (e.g. bulk sediment or aquatic macrofossils), pool material from wide sample intervals, or rely on few ages for their chronologies. The problem of insufficient material can affect age estimates at all scales, from an entire sedimentary sequence to a specific event layer whose age a researcher wishes to determine as precisely as possible.

Recent advances have reduced the required sample mass for AMS 14C analysis, opening new opportunities for researchers (Delqué-Količ et al., 2013; Freeman et al., 2016; Santos et al., 2007; Shah Walter et al., 2015). The MIni CArbon DAting System (MICADAS) can analyze samples in a gaseous form, thus omitting sample graphitization (Ruff et al., 2007, 2010b; Synal et al., 2007; Szidat et al., 2014; Wacker et al., 2010a, 2013). Samples containing as little as a few μg C can be dated using the gas-source input of the MICADAS. The analysis of such small samples provides several potential benefits for dating lake sediments: 1) the possibility to date sediments that were previously not dateable using 14C due to insufficient material, 2) the ability to date sedimentary profiles with a greater sampling density and lower costs per sample, and 3) the ability to be more selective when choosing what material will be analyzed for 14C. The disadvantage of miniature samples is increased analytical uncertainty, which is caused by lower counts of carbon isotopes and the greater impact of contamination on the measurement results. The goal of this study is to assess the potential benefits and limits of applying miniature 14C measurements to dating lake sediments. We aim to answer the following questions: 1) How reliable and how precise are gas-source 14C ages compared with graphitized ages? 2) What is the variability of 14C ages obtained from a single stratigraphic level? 3) How do analytical precision and dating density affect the accuracy and precision of age-depth models for lake sediments?

In this study, we use the sediments of Lake Żabińskie, Poland, as a case study to investigate the application of gas-source 14C measurements to lake sediments. We focus on a continuously varved segment of the core, which spans from roughly 2.1 to 6.8 ka. We report the results of 48 radiocarbon measurements (17 using graphitization and 31 using the gas-source input) in order to compare the precision and reliability of gas-source 14C ages with graphitized samples. The core was sampled such that up to five ages were obtained from each of 14 distinct stratigraphic depths. A floating varve chronology was integrated with the 14C ages to produce a best-age estimate using the OxCal V-sequence routine (Bronk Ramsey, 2008). This best-age estimate is used as a benchmark for the 14C results. The results of our 14C measurements were used to constrain a statistical model designed to simulate sets of 14C ages in order to test nine different hypothetical sampling scenarios in which we manipulate the number of ages and the mass of C per sample, which determines the analytical uncertainty of the simulated ages. By comparing the age-depth models built from these simulated 14C ages with the best-age estimate from which the simulated ages were derived, we can improve our understanding of how the number of ages and their analytical precision influence the accuracy and precision of radiocarbon-based age-depth models.

(Bonk et al., 2015; Żarczyński et al., 2018). Published downcore varve counts stop above a ~90-cm-thick slump/deformed unit that is dated to 1962-2071 cal yr BP (present = 1950). This study focuses on a section of core (7.3-13.1 m depth in our composite sequence) directly below this slump unit, which was selected because it features well-preserved varves continuously throughout the section.
Slices of sediment 1 to 2 cm thick were taken from the core (sample locations and core images are found in Supplementary File 1), then sieved with a 100 μm sieve. Macrofossil remains were identified and photographed (Supplementary File 2), and only identifiable terrestrial plant material was selected for 14C measurements. Suitable macrofossils from a single stratigraphic level were divided into subsamples for analysis, with the goal of producing one graphitized 14C age and 2-4 gas-source ages from each depth. When convenient, we grouped samples by the type of material (leaves, periderm, needles, seeds or woody scales), though 11 samples are a mixture of material types. In most cases, subsamples within a stratigraphic level are assumed to be independent, meaning they may have different true ages. However, some subsamples were taken from single macrofossil fragments (six subsamples taken from two fragments sampled from two different depths); the subsamples from each fragment therefore have the same true age. It is also possible that subsamples from a single depth may be from the same original material without our knowledge (i.e. a macrofossil could break into several pieces while sieving, and these pieces could be analyzed as separate subsamples).

Sample material was treated with an acid-base-acid (ABA) method at 40°C, using 0.5 mol/L HCl, 0.1 mol/L NaOH and 0.5 mol/L HCl for 3 h, 2 h and 3 h, respectively. After drying at room temperature, samples were weighed, and those less than 300 μg were input to the gas ion source via combustion in an Elementar Vario EL Cube elemental analyser (Salazar et al., 2015).
https://doi.org/10.5194/gchron-2019-19 Preprint. Discussion started: 17 December 2019 © Author(s) 2019. CC BY 4.0 License.
Radiocarbon data were processed using the software BATS. Additional corrections were applied to the data to account for cross contamination (carryover) and constant contamination (blanks) (Gottschalk et al., 2018; Salazar et al., 2015). The parameters for these corrections were calculated based on standard materials (the primary NIST standard oxalic acid II (SRM 4990C) and sodium acetate (Sigma-Aldrich, No. 71180) as 14C-free material) run with the sample batches. We applied a constant contamination correction of 1.5 ± 0.2 µg C with 0.72 ± 0.11 F14C and a cross contamination correction of 1.2 ± 0.3 % from the previously run sample. Radiocarbon age uncertainties were fully propagated for each correction. In total, 48 ages were obtained from 14 distinct stratigraphic levels (17 graphitized and 31 gas-source measurements).
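The two corrections can be illustrated with a simple mass-balance sketch. The formulas, the order in which they are applied, and the function names below are assumptions made for illustration; they are not taken from BATS or from the cited correction procedures, and the propagation of uncertainties is omitted.

```python
# Illustrative mass-balance corrections for a gas-source F14C measurement.
# Assumed model (not the published implementation):
#   cross contamination: a fraction phi of the measured carbon comes from
#                        the previously run sample
#   constant contamination: a fixed mass m_c of carbon with F14C = f_c is
#                        added to every sample

def correct_cross(f_meas, f_prev, phi=0.012):
    """Remove the carryover contribution of the previous sample."""
    return (f_meas - phi * f_prev) / (1.0 - phi)

def correct_constant(f_meas, mass_ug, m_c=1.5, f_c=0.72):
    """Remove a constant contaminant of m_c ug C with F14C = f_c."""
    return (f_meas * mass_ug - f_c * m_c) / (mass_ug - m_c)

# The same measured value shifts more for a small sample than a large one.
small = correct_constant(0.50, mass_ug=20.0)
large = correct_constant(0.50, mass_ug=100.0)
```

Under these assumptions the correction scales with the ratio of contaminant mass to sample mass, which is why it matters most for the smallest samples.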

Varve count
Varves in Lake Żabińskie are biogenic, with calcite-rich laminae deposited in spring and summer, and darker laminae containing organic detritus and fine clastic material deposited in winter (Żarczyński et al., 2018). We defined the boundary of each varve year by the onset of calcite precipitation (i.e., the upper boundary of dark laminae and lower boundary of light-colored laminae). Varves were counted using CooRecorder software (Larsson, 2003) on core images obtained from a Specim PFD-CL-65-V10E linescan camera (Butz et al., 2015). Three people performed independent varve counts; these three counts were synthesized and uncertainties were calculated according to the methodology recommended by Żarczyński et al. (2018).
Because of the slump deposit above our section of interest, the varve chronology is 'floating' and must be constrained by the 14C ages. Several different approaches could be used to compare the varve count with the 14C ages, all of which rely on some assumptions. One method would be to select a dated level within the core and tie the varve count to the age at this level. Such an approach assumes that the radiocarbon-based age at the tie point is correct. Instead, we used the OxCal V-sequence to integrate all available chronological information, including varve counting and 14C ages, into a single model to determine a best-age estimate for the sequence. The advantages of this approach are that all ages are considered equally likely to be correct, and the error estimate of the V-sequence is relatively consistent along the profile, whereas the error associated with the varve count is small at the top of the section, but increases downcore.

Age-depth modeling
Age-depth modeling was performed using OxCal 4.3, which integrates the IntCal13 calibration curve for 14C ages with statistical models that can be used to construct age-depth sequences (Bronk Ramsey, 2008, 2009; Bronk Ramsey and Lee, 2013; Reimer et al., 2013). As an initial test to compare the reliability of gas-source ages and graphitized ages, and their effect on age-depth models, we produced three P-sequence models: one using all obtained 14C ages, one using only graphitized ages, and one using only gas-source ages. For all OxCal models in this study, ages measured from the same depth were combined (using the function R_combine) into a single 14C age with uncertainty before calibration and integration into the age-depth sequence. The OxCal P-sequence uses a Bayesian approach for modelling sediment deposition in which a parameter (k) determines the extent to which sedimentation rates are allowed to vary. For all P-sequence models in this study, we used a uniformly distributed prior for k such that $k_0 = 1$ and $\log_{10}(k/k_0) \sim U(-2, 2)$; this allows k to vary between 0.01 and 100.
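Combining several 14C ages from one depth before calibration amounts to an error-weighted mean. The sketch below shows that arithmetic only; it is an illustration with hypothetical input values and a hypothetical function name, not OxCal code, and it leaves out the consistency test that OxCal reports alongside the combined age.

```python
import math

def combine_c14_ages(ages, sigmas):
    """Inverse-variance weighted mean of uncalibrated 14C ages.

    Returns the pooled age and its 1-sigma uncertainty.
    """
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    pooled = sum(w * a for w, a in zip(weights, ages)) / total
    return pooled, math.sqrt(1.0 / total)

# Hypothetical example: one graphitized age and two gas-source ages.
pooled_age, pooled_sigma = combine_c14_ages([3050.0, 3120.0, 2980.0],
                                            [45.0, 110.0, 130.0])
```

Because the weights are inverse variances, the pooled uncertainty is always smaller than the smallest input uncertainty, and a precise graphitized age dominates the result.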
The varve counts and all 14C ages were incorporated into an OxCal V-sequence in an approach similar to that used by Rey et al. (2019). We input the number of varves in 10 cm intervals to the V-sequence as an age 'Gap' with associated uncertainty. The OxCal V-sequence assumes normally distributed uncertainties for each gap, whereas our varve count method produces asymmetric uncertainty estimates. We used the mean of the positive and negative uncertainties as input to the V-sequence.
However, OxCal sets the minimum uncertainty of each 'Gap' equal to 5 years, which in most cases is larger than the mean uncertainty in our varve count over a 10 cm interval. The V-sequence combines the varve information with the 14C ages (both graphitized and gas-source ages) to produce a more precise age-depth sequence.
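As a small worked example of the gap uncertainty described above, the following sketch symmetrizes an asymmetric varve-count uncertainty and applies the 5-year floor. The function name and the input values are hypothetical, and the floor is applied here only to mimic the behaviour attributed to OxCal in the text.

```python
def gap_sigma(plus, minus, floor=5.0):
    """Symmetric uncertainty for a varve-count gap, in years.

    plus, minus: positive and negative count uncertainties (as magnitudes).
    floor: minimum uncertainty, mirroring the 5-year minimum noted above.
    """
    return max((plus + minus) / 2.0, floor)

# Hypothetical 10 cm intervals: (varves counted, +uncertainty, -uncertainty)
intervals = [(82, 3, 5), (64, 6, 10)]
gaps = [(count, gap_sigma(plus, minus)) for count, plus, minus in intervals]
```

In the first hypothetical interval the mean uncertainty (4 years) is below the floor, so the effective gap uncertainty becomes 5 years.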

Age-depth model simulation
In order to test the effects of analytical uncertainty and dating density (number of ages per time interval) on age-depth models, we designed an experiment in which nine different sampling scenarios were simulated for the Lake Żabińskie sedimentary sequence to determine the expected precision and accuracy of resulting age-depth models. Three different sampling densities were simulated for the 5.8-m-long section: 5 ages, 10 ages, and 20 ages (equivalent to approximately 1, 2, and 4 ages per millennium, respectively). For each of these sampling densities, three different sample-size scenarios were simulated: 35, 90, and 500 μg C. These scenarios were designed to represent different sampling circumstances such as high or low abundances of suitable material for 14C analysis, and different budgets for 14C analysis. Radiocarbon ages were simulated using a technique similar to Trachsel and Telford (2017). In brief, we distributed the simulated samples evenly by depth across the 5.8-m-long section, and then used the median output of the OxCal V-sequence as the assumed true age for a given depth. This calibrated assumed true age was back-converted to 14C years using IntCal13 (Reimer et al., 2013). A random error term was added to the 14C age to simulate the analytical uncertainty. The error term was drawn from a normal distribution with mean zero and standard deviation equal to the expected analytical uncertainty, based on the relationship between sample mass and precision found in the results of our 14C measurements (Fig. 1). The same expected analytical uncertainty was used as the age uncertainty for each simulated age. These simulated 14C ages were input into an OxCal P-sequence using the same uniform distribution for the k-parameter as described in the previous section. This experiment was repeated 30 times for each scenario to assess the variability of possible age-model outcomes.
We quantify the accuracy of the age-depth models as the deviation of the median modelled age from the true age at a given depth. We define precision as the width of the age-depth model confidence interval (CI).
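The simulation steps and the two evaluation metrics can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_true_cal_age` and `toy_cal_to_c14` are hypothetical stand-ins for the V-sequence median and the IntCal13 back-conversion, the 120-year uncertainty is an arbitrary example value, samples are placed at the midpoints of equal depth slices (the exact placement is not specified in the text), and the P-sequence modelling step itself is not reproduced.

```python
import random

def evenly_spaced_depths(top_cm, bottom_cm, n):
    """Return n depths at the midpoints of n equal slices of the section."""
    step = (bottom_cm - top_cm) / n
    return [top_cm + step * (i + 0.5) for i in range(n)]

def simulate_c14_ages(depths, true_cal_age, cal_to_c14, sigma, rng):
    """Simulate one set of 14C ages as (depth, 14C age, 1-sigma) tuples."""
    ages = []
    for depth in depths:
        c14_true = cal_to_c14(true_cal_age(depth))  # back-convert to 14C years
        noisy = c14_true + rng.gauss(0.0, sigma)    # add analytical error
        ages.append((depth, noisy, sigma))
    return ages

def mean_abs_deviation(model_median, true_age):
    """Accuracy: mean absolute offset of modelled median ages from the truth."""
    return sum(abs(m - t) for m, t in zip(model_median, true_age)) / len(true_age)

def mean_ci_width(ci_lower, ci_upper):
    """Precision: mean width of the age-depth model confidence interval."""
    return sum(u - l for l, u in zip(ci_lower, ci_upper)) / len(ci_lower)

# Hypothetical stand-ins, NOT the V-sequence median or IntCal13.
def toy_true_cal_age(depth_cm):
    return 2100.0 + 8.1 * (depth_cm - 730.0)

def toy_cal_to_c14(cal_age):
    return 0.9 * cal_age + 100.0

rng = random.Random(42)  # fixed seed so the example is repeatable
depths = evenly_spaced_depths(730.0, 1310.0, 10)
ensemble = [simulate_c14_ages(depths, toy_true_cal_age, toy_cal_to_c14, 120.0, rng)
            for _ in range(30)]  # 30 replicate sets for one scenario
```

Each replicate in `ensemble` would then be passed to the age-depth model, and the two metric functions applied to the model output at a common set of depths.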

Radiocarbon measurements
In total, 48 radiocarbon measurements on terrestrial plant macrofossils were obtained from the section of interest, resulting in a range of ages from 2028 to 5988 14C years (Table 1). Thirty-one ages were measured using the gas-source input; these samples contained between 11 and 168 μg C. Seventeen samples containing between 115 and 691 μg C were measured using graphitization. Analytical uncertainties for the 14C measurements range from ± 41 to ± 328 14C years, with higher values associated with the smallest sample masses. The uncertainties for gas-source measurements and graphitized measurements are comparable for samples that contain a similar amount of carbon (Figure 1). Based on an assumed Poisson distribution of the counting statistics, one would expect age uncertainty to decrease as sample masses become larger following the relationship $N^{-0.5}$, where N is the number of measured 14C atoms in the sample. This relationship fits our data well for larger samples; however, as the mass of C is reduced, the uncertainty becomes greater than predicted by this relationship due to corrections applied for cross-contamination and constant contamination (see Sect. 2.1; Gottschalk et al., 2018; Salazar et al., 2015), which have a greater effect on smaller samples. Samples containing less than 40 μg C (roughly equivalent to 80 μg of dry plant material) produce uncertainties greater than ± 150 years (1σ). We use a power-model fit with least-squares regression to estimate the typical age uncertainty for a given sample mass ($r^2$ = 0.90, p < 0.001, Fig. 1).
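One way to fit such a power model is ordinary least squares on log-transformed mass and uncertainty, sketched below. Whether the fit in this study was done in log space or by nonlinear regression is not stated, so the approach here is an assumption, and the data are synthetic values that follow pure counting statistics rather than the measurements in Table 1.

```python
import math

def fit_power_law(masses, sigmas):
    """Fit sigma = a * mass**b by least squares in log-log space."""
    xs = [math.log(m) for m in masses]
    ys = [math.log(s) for s in sigmas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

def predict_sigma(mass_ug, a, b):
    """Expected 1-sigma age uncertainty for a given sample mass."""
    return a * mass_ug ** b

# Synthetic data obeying the Poisson expectation (exponent -0.5).
masses = [10.0, 40.0, 160.0, 640.0]
sigmas = [1000.0 * m ** -0.5 for m in masses]
a, b = fit_power_law(masses, sigmas)
```

With real measurements, a fitted exponent more negative than -0.5 would reflect the extra uncertainty that the contamination corrections add to the smallest samples.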
When comparing measurements taken from within a single sediment slice, we find good agreement for all 14C ages, regardless of whether the samples were analyzed with the gas-source input or via a graphitized target, and no clear bias based on the type of macrofossil that was dated (Figure 2, Figure 3). One method to test whether the scatter of ages is consistent with the expectations of the analytical uncertainty is a reduced chi-squared statistical test, also known as Mean Square Weighted Deviation (MSWD) in geochronological studies (Reiners et al., 2017). If the spread of ages is exactly what would be expected from the analytical uncertainty, the value of this statistic is 1. Lower values represent less scatter than expected, and larger values represent more scatter than expected. Of the 11 sampled depths with three or more ages, only one grouping of ages (811 cm, MSWD = 3.07) returned an MSWD that exceeds a 95% significance threshold for acceptable MSWD values that are consistent with the assumption that the age scatter is purely the result of analytical uncertainty.
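The MSWD for a group of ages from one depth can be computed as the weighted sum of squared deviations from the error-weighted mean, divided by the degrees of freedom. The sketch below uses hypothetical ages; the 95% threshold, which comes from the chi-squared distribution with n - 1 degrees of freedom, is not computed here.

```python
def mswd(ages, sigmas):
    """Mean Square Weighted Deviation (reduced chi-squared) of a set of ages."""
    weights = [1.0 / s ** 2 for s in sigmas]
    weighted_mean = sum(w * a for w, a in zip(weights, ages)) / sum(weights)
    chi2 = sum(((a - weighted_mean) / s) ** 2 for a, s in zip(ages, sigmas))
    return chi2 / (len(ages) - 1)

# Hypothetical group of four ages from a single sediment slice.
value = mswd([4010.0, 4120.0, 3950.0, 4080.0], [45.0, 120.0, 140.0, 95.0])
```

Large uncertainties in the denominator keep the statistic small, so imprecise gas-source ages can scatter widely without producing a high MSWD.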

Varve count and age-depth modeling
In total, 4644 (+155/-176) varves were counted in the section of interest, with a mean varve thickness of 1.26 ± 0.58 mm (full varve count results are available at https://dx.doi.org/10.7892/boris.134606). Sedimentation rates averaged over 10 cm intervals range from 0.91 to 2.78 mm/year. All chronological data (14C ages and varve counts) were integrated to generate a best-age estimate for the section of interest using an OxCal V-sequence (output of the OxCal V-sequence is available at https://dx.doi.org/10.7892/boris.134606). This produced a well-constrained age-depth model with a 95% confidence interval (CI) width that ranges from 69 to 114 years (mean 86 years). OxCal uses an agreement index to assess how well the posterior distributions produced by the model (modelled ages at the depth of 14C ages) agree with the prior distributions (calibrated 14C ages). The agreement index for our OxCal V-sequence is 66.8%, which is greater than the acceptable index of 60%. Three of the fourteen dated levels in the V-sequence had agreement indices less than the acceptable value of 60% (A = 22.8, 48.5, 52.6% for sample depths = 1283.0, 1176.1, 732.5 cm, respectively). Nonetheless, we find the model fit acceptable, as the calibrated 95% CIs of all 48 14C ages overlap with the median output of the V-sequence. We use the V-sequence as a best-age estimate for subsequent data comparisons and analyses.
To test the reliability of gas-source ages versus graphitized ages, we created three OxCal P-sequences using: 1) all 14C ages, 2) only graphitized ages, and 3) only gas-source ages. The results of all three of these age-depth models agree well with the best-age estimate of the V-sequence, although with larger 95% CIs (Figure 2). The agreement index was greater than the acceptable value of 60% for all three models overall, and for each dated depth within all three models. The P-sequence using all 14C ages spans 4838 ± 235 years, which is slightly greater than, but overlapping with, the total number of varves counted (the V-sequence estimates 4681 ± 79 years in the section). There is no clear bias observed in the age-depth models produced using either the gas-source or graphitized samples. The age-depth model outputs clearly show that a very precise age can narrowly constrain the age-model uncertainty at the depth of that sample; however, if dating density is low, the uncertainty related to interpolation between ages becomes large. Despite the lower precision of the gas-source ages, the model based on only gas-source ages actually has a lower mean CI width than the model with graphitized ages (mean 95% CI width: 373 years for the gas-source model, 438 years for the graphitized model). However, the direct comparison between the gas-source-only and the graphitized-only age models is confounded by differences in the number and spacing of samples. Specifically, there are no graphitized ages between the top of the section (724 cm) and 811 cm, and between 1082 and 1200 cm, which results in wide CIs in these sections. On the other hand, uncertainty is reduced compared to the gas-source model in the depths adjacent to the graphitized ages due to higher precision, such that 40% of the section (in terms of depth) has lower age uncertainty in the graphitized model.

Age-depth model simulations
Nine different sampling scenarios (described in Sect. 2.3) were simulated to test the effects of dating density and analytical precision on age-depth model confidence intervals. For each of the nine scenarios, sets of 14C ages were simulated 30 times to create an ensemble of age-depth models for each scenario. One set of these simulated age-depth models is shown in Figure 4, and an animation of the full set of simulated models is available online (Supplementary File 3). The age-depth models were evaluated for their precision (mean width of the 95% CI) and accuracy (the mean absolute deviation from the best-age estimate; summarized in Figure 5 and Table 2). As expected, we find that increased dating density and increased sample masses improve both the accuracy and precision of the age-depth models. It is notable that increasing the number of ages can compensate for the greater uncertainty associated with smaller sample sizes. For instance, the mean CI of age-depth models based on ten 90 μg C samples is narrower than that of age-depth models with five 500 μg C samples (Table 2). However, the effect of analytical precision is greater on the mean absolute deviation from the best-age estimate. Increased dating density does tend to reduce the deviation from the best-age estimate (especially if the ages are imprecise), but the three scenarios that use 500 μg samples perform better than all other scenarios, as applied to our study site, in terms of deviation from the best-age estimate, regardless of the sampling density. Additionally, increased dating density does not improve the deviation from the best-age estimate for the 500 μg sample scenarios. This result may be due to the relatively constant sedimentation rates in our sedimentary sequence, which reduces errors caused by interpolation in scenarios with low dating density.
Another prominent pattern visible in the simulations is the large spread of performance for models with relatively few and imprecise ages (Figure 5). Increasing the number of samples and, especially, the mass of samples has a large impact on the agreement among the model ensembles.
An additional measure of age-model quality is the Chron Score rating system (Sundqvist et al., 2014), which uses three criteria to assess the reliability of age-depth models: 1) delineation of downcore trend (D), 2) quality of dated materials (Q), and 3) precision of calibrated ages (P). These metrics are combined using a reproducible formula to provide a Chron Score (G) in which higher values represent more reliable age-depth models:

$$G = -w_D D + w_Q Q + w_P P$$
We used the default weighting parameters ($w_D$ = 0.001, $w_Q$ = 1, and $w_P$ = 200) for each component of the Chron Score formula as described in Sundqvist et al. (2014). The Q parameter depends on two factors: the proportion of ages which are not rejected or reversed (i.e. an older age stratigraphically above a younger age), and a qualitative classification scheme for material types. We modified the threshold for determining if an age is considered a reversal such that if a 14C age is older than a stratigraphically lower age by more than the age uncertainty (1σ), the age is considered to be stratigraphically reversed. This is different from the default setting, which is 100 years. For the material type classification (m), the simulated age models were assigned the value 4, which is the value assigned to chronologies based on terrestrial macrofossils. For more details on the Chron Score calculation see Sundqvist et al. (2014). The mean Chron Scores for the simulated age models (Table 2) show that doubling dating density substantially improves the Chron Score, but the effect is greater when moving from 5 to 10 ages than from 10 to 20 ages. The effect of increased precision on the Chron Score is also substantial; it is essentially defined by the Chron Score formula, in which precision is assessed as $P = s^{-1}$, where s is the mean 95% range of all calibrated 14C ages. The effect of precision on the Chron Score is also determined by the weighting factors mentioned above.
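The way the three components combine can be shown with a short sketch. The function implements only the weighted sum and the precision term as stated above; the calculation of D and Q is not reproduced, and the input values in the example are hypothetical.

```python
def chron_score(d, q, s, w_d=0.001, w_q=1.0, w_p=200.0):
    """Chron Score G = -w_D*D + w_Q*Q + w_P*P, with P = 1/s.

    d: delineation of downcore trend (computed elsewhere)
    q: quality of dated materials (computed elsewhere)
    s: mean 95% range of all calibrated 14C ages, in years
    """
    p = 1.0 / s
    return -w_d * d + w_q * q + w_p * p

# Hypothetical inputs: halving the mean calibrated range doubles the
# precision term while leaving the other two terms unchanged.
imprecise = chron_score(d=1000.0, q=4.0, s=400.0)
precise = chron_score(d=1000.0, q=4.0, s=200.0)
```

With the default weights, the precision term contributes 200/s, so it changes by whole score units only when the mean calibrated range is on the order of a few hundred years or less.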

Radiocarbon measurements
The results of our 14C measurements from repeated sampling of single stratigraphic levels provide useful information for other researchers working with miniature 14C analyses, or any 14C samples from lake sediments. We show that there is a power-law relationship between sample mass and the resulting analytical uncertainty. The exact parameters of this relationship will depend on several factors that are not considered here, such as the laboratory conditions and the age of the material (Gottschalk et al., 2018); however, the general shape of the relationship should hold. These data can inform researchers about the expected range of uncertainty for 14C ages from samples of a given size. We find that samples larger than 40 μg C yield ages that are precise enough to be useful for dating Holocene lake sediments in most applications, and even smaller samples can provide useful ages if no other material is available.
It is well documented that 14C ages can be susceptible to sources of error that are not included within the analytical uncertainty of the measurements. Such errors can be due to lab contamination, sample material which is subject to reservoir effects (i.e. bulk sediments or aquatic organic matter; Groot et al., 2014; MacDonald et al., 1991; Tornqvist et al., 1992), or depositional lags (terrestrial organic material which is older than the sediments surrounding it; Bonk et al., 2015; Howarth et al., 2013; Krawiec et al., 2013). Errors related to reservoir effects can be avoided by selecting only terrestrial plant material for dating (Oswald et al., 2005). Dating fragile material such as leaves (as opposed to wood) may reduce the chances of dating reworked material with a depositional lag, but generally this source of error is challenging to predict and dependent on the characteristics of each lake's depositional system. To identify ages affected by depositional lags, it is necessary to compare them with other age information. Consequently, the identification of outlying ages is facilitated by increased dating density.
In our dataset, multiple 14C measurements were performed on material taken from a single layer, which enables outlier detection. We find that the scatter of 14C ages obtained from the same depths is generally consistent with what would be expected based on the analytical uncertainties of the ages. There are no clear outliers in the data, and every single 14C age has a calibrated 95% CI that overlaps with the median of our best-age estimate OxCal V-sequence. This can be explained in part by the fact that the V-sequence is fit to the 14C ages, but it is also evidence that no age in this dataset is incongruent with the other available chronological information (other 14C ages and varve counts). This notion is further demonstrated by the fact that 10 of 11 sampled levels from which we obtained three or more ages returned an MSWD within the 95% confidence threshold for testing age scatter (see Sect. 3.1; Reiners et al., 2017). This test is typically used for repeated measurements on the same sample material; however, in our study, many of the measurements from within a single sediment slice are from material that has different true ages. The MSWD test indicates that the variability in ages among samples from within a single sediment slice can reasonably be expected given the analytical uncertainty. However, in this study, no more than five samples were measured per depth, and thus the range of acceptable values for the MSWD is relatively wide due to the small number of degrees of freedom. Additionally, the analytical uncertainties are relatively large for the gas-source samples, allowing for wide scatter in the data without exceeding the MSWD critical value. Despite these caveats, the consistency between the variability among ages from one level and the analytical uncertainties allows us to draw two important conclusions. 1) The analytical precision estimates are reasonable, even for miniature gas-source samples.
2) When material is carefully selected and taxonomically identified for dating, the sources of error that are not considered in the analytical uncertainty (e.g. contamination or depositional lags) are relatively minor in our case study. However, this second conclusion is highly dependent on the sediment transport and depositional processes, which are site specific. Depositional lags still likely have some impact on our chronology. Six 14C ages from plant material collected from the Lake Żabińskie catchment in 2015 yielded a range of ages from 1978-2014 CE (Bonk et al., 2015), suggesting that the assumption that 14C ages represent the age of the sediments surrounding macrofossils is often invalid. These age offsets are likely on the scale of a few decades for Lake Żabińskie sediments, which is inconsequential for many radiocarbon-based chronologies, but is the same order of magnitude as the uncertainty of our best-age estimate from the OxCal V-sequence, and should be considered when reporting or interpreting radiocarbon-based age determinations with very high precision.
The lack of outliers in our dataset is an apparent contrast with the findings of Bonk et al. (2015), who report that 17 of 32 radiocarbon samples taken from the uppermost 1000 years of the Lake Żabińskie core were outliers. The outlying ages were older than expected based on the varve chronology, and this offset was attributed to reworking of terrestrial plant material. The identification of outliers did not take into account uncertainties of the radiocarbon calibration curve and varve counts, which could explain some of the differences between the 14C and the varve ages. Still, 8 of 32 ages reported by Bonk et al. (2015) have calibrated 2σ age ranges that do not overlap with the varve count age (including the varve count uncertainty). The higher outlier frequency in the Bonk et al. (2015) data might be explained by their generally more precise ages and the fact that their varve count is truly independent from the 14C ages.
Additionally, our dataset allows us to compare the results of 14C ages obtained from different types of macrofossil materials, which we grouped into the following categories: leaves (including associated twigs), needles, seeds, periderm, woody scales, and samples containing mixed material types (Figure 3). When comparing the calibrated median age of each sample to the median of our best-age estimate, we find that the differences among the age offsets of the different material types are not significant at the α = 0.05 level (ANOVA, F = 2.127, p = 0.08). This is likely due to our selective screening of sample material, which only includes terrestrial plant material while avoiding aquatic insect remains or possible aquatic plant material, as well as the relatively small number of samples within each material type. There does appear to be a tendency for seeds to produce younger ages, and two of the three woody scale samples yielded ages that are approximately 300 years older than the best-age estimate. This could be due to the superior durability of woody materials compared with other macrofossil materials, which enables wood to be stored on the landscape prior to being deposited in the lake sediments. A larger number of samples would allow for more robust conclusions about the likelihood of certain material types to produce biased ages.
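The one-way ANOVA used above to compare age offsets among material types can be illustrated with a minimal sketch. The offsets below are hypothetical placeholder values, not the measured data of this study, and the function name is illustrative; the sketch only shows how the F statistic is formed from between-group and within-group variability.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` maps a group label to a list of numeric observations.
    """
    all_values = [v for g in groups.values() for v in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups (material types)
    n = len(all_values)   # total number of observations

    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variability of observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# HYPOTHETICAL offsets (calibrated median age minus best-age estimate, years)
offsets = {
    "leaves": [10, -40, 55, 20],
    "needles": [-15, 30, 5],
    "seeds": [-60, -20, -35],
    "woody scales": [300, 290, 15],
}

f_stat, df_b, df_w = one_way_anova_f(offsets)
print(f"F = {f_stat:.3f} with ({df_b}, {df_w}) degrees of freedom")
```

Converting F to a p-value requires the F distribution with these degrees of freedom, which is usually obtained from a statistics package rather than computed by hand.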

The OxCal V-sequence best-age estimate
Prominent varves in the sediments of Lake Żabińskie provide additional chronological information that we use to inform our assessment of 14C ages. This approach to integrating varve counts with 14C ages can provide more precise and more reliable age estimates than either technique alone. The resulting age-depth relation has a relatively narrow CI (mean 95% CI is 86 yr). Extremely precise age estimates were also produced using this method for Moossee, Switzerland, by Rey et al. (2019). A combination of varve counts and 14C ages from the Moossee sediments generated a V-sequence output with a mean 95% CI of 38 years. The higher precision in the Moossee study compared to our V-sequence output is primarily attributed to the higher dating density in Moossee, with 27 radiocarbon ages over ~3000 years (3.9-7.1 ka), versus our study, which used 48 ages, but from only 14 unique depths, over ~4700 years. This comparison suggests that repeated measurements from the same depth are less useful than analyses from additional depths. This approach to integrating varve counts and 14C ages could potentially be improved by a better integration of varve count uncertainties into the OxCal program. Currently the uncertainties on age 'Gaps' in OxCal are assumed to be normally distributed and cannot be less than 5 years. Nevertheless, the result of the OxCal V-sequence is an age-depth model that is much more precise than those constructed only using 14C ages and provides a useful reference to compare with the 14C ages. It is important to note that the best-age estimate is not independent of the 14C ages; it is directly informed by the 14C ages.
https://doi.org/10.5194/gchron-2019-19 Preprint. Discussion started: 17 December 2019 © Author(s) 2019. CC BY 4.0 License.
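The dating-density comparison between Moossee and Lake Żabińskie can be made explicit with simple arithmetic. The sketch below uses only the rounded figures quoted in the text; the helper name is illustrative, and treating each of the 27 Moossee ages as coming from its own depth is an assumption made here for the comparison.

```python
def years_per_dated_depth(n_depths, span_years):
    """Average spacing (years) between dated depths over a record of given length."""
    return span_years / n_depths

# Moossee: 27 ages over ~3000 years (ASSUMED: one age per depth)
moossee = years_per_dated_depth(27, 3000)

# Lake Zabinskie: 48 ages, but only 14 unique depths, over ~4700 years
zabinskie_by_depth = years_per_dated_depth(14, 4700)
# Spacing that the same 48 ages would give if each came from its own depth
zabinskie_if_spread = years_per_dated_depth(48, 4700)

print(f"Moossee: one dated depth per ~{moossee:.0f} yr")
print(f"Zabinskie (14 depths): one dated depth per ~{zabinskie_by_depth:.0f} yr")
print(f"Zabinskie (48 ages, if spread out): one per ~{zabinskie_if_spread:.0f} yr")
```

Counted by unique depths, the Lake Żabińskie record is dated roughly three times more sparsely than Moossee, even though it contains more individual ages.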

Age-depth model simulations
The simulated age-depth modelling experiment allows us to assess the effects of dating density and sample mass (expected precision) on the outputs of age-depth models constructed for the section of interest in the Lake Żabińskie sediment core.
Models based on relatively few, but very precise, ages are tightly constrained at the sample depths, but the CI widens further away from these depths (Figure 4, Supplementary File 3). In contrast, models based on a greater sampling density produce confidence intervals with relatively constant width. If models are built using a high density of imprecise ages, the CI of the model output can actually be narrower than the CI of the individual ages. Bayesian age-depth models in particular can take advantage of the stratigraphic order of samples to constrain age-depth models to be more precise than the individual ages that make up the model (Blaauw et al., 2018); however, this is only achievable when dating density is sufficiently high. The results from this experiment suggest that, in the case of the Lake Żabińskie sequence, doubling the number of ages can approximately compensate for an increased analytical uncertainty of 50 years.
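The way stratigraphic order can tighten an age estimate beyond the precision of the individual ages can be illustrated with a toy Monte Carlo experiment. This is a simplified rejection sampler written for illustration, not the OxCal P-sequence model used in this study: it ignores calibration and sedimentation-rate priors, and the depths, ages, and uncertainty below are hypothetical. For simplicity, the measured ages are set equal to the true ages.

```python
import random
from statistics import stdev

def ordered_posterior_sd(measured_ages, sigma, n_draws=20000, seed=1):
    """Toy rejection sampler for the stratigraphic-order constraint.

    Each age is drawn from a normal distribution N(measured, sigma); a draw is
    kept only if ages strictly increase with depth. Returns the standard
    deviation of the kept draws at each depth and the number of kept draws.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        draw = [rng.gauss(age, sigma) for age in measured_ages]
        if all(upper < lower for upper, lower in zip(draw, draw[1:])):
            kept.append(draw)
    sds = [stdev(column) for column in zip(*kept)]
    return sds, len(kept)

sigma = 150.0                              # HYPOTHETICAL 1-sigma uncertainty (years)
ages = [2000 + 200 * i for i in range(6)]  # HYPOTHETICAL ages, top to bottom (years)

sds, n_kept = ordered_posterior_sd(ages, sigma)
print(f"kept {n_kept} ordered draws")
for age, sd in zip(ages, sds):
    print(f"age {age}: unconstrained sd {sigma:.0f} yr, order-constrained sd {sd:.0f} yr")
```

Because the spacing between neighbouring ages is comparable to their uncertainty, the ordering requirement rules out part of each distribution, and the constrained spread at every depth is narrower than the input uncertainty. With widely spaced ages the constraint would rarely bind, which is consistent with the observation that the gain depends on dating density.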

The Chron Score results provide a succinct summary of the reliability of the chronologies produced in the different simulated sampling scenarios. The Chron Score becomes more sensitive to changes in precision as precision increases, so the difference in the Chron Scores between the 500 μg and 90 μg scenarios (1σ uncertainty of ±39 and ±92 years, respectively) is greater than the difference between the 90 μg and 35 μg scenarios (1σ uncertainty of ±92 and ±148 years, respectively). Increased dating density consistently improves the Chron Score results, with a stronger impact seen when shifting from 5 to 10 ages compared to shifting from 10 to 20 ages. The improvement of the Chron Score due to increased dating density is generally consistent for each of the different sample mass scenarios. This differs from the age-depth model statistics, where increased dating density has a greater impact on mean age-depth model precision in the larger sample mass scenarios (more precise ages). The opposite effect is seen in the mean absolute deviation results, where mean absolute deviation is reduced substantially as dating density increases for the smaller sample scenarios, and not at all for the 500 μg scenario. For all measures of chronologic performance, we find a greater improvement when increasing the number of ages from 5 to 10 than when increasing from 10 to 20, suggesting diminishing returns from increased dating density. This result is in accordance with the results of Blaauw et al. (2018). While the Chron Score results are strictly dependent on the parameters chosen for the calculation, the patterns are intuitive. Because Chron Score results use only the simulated 14C ages as input and are unaffected by the age modelling routine, the patterns exhibited in the scores may be more applicable to a variety of sedimentary records.
In real-world applications, there are additional advantages from increasing dating density. Many lacustrine sequences have greater variability in sedimentation rates than the sequence modelled here. More fluctuations in sedimentation rate require a greater number of ages to delineate the changes in sedimentation. Additionally, outlying ages and age scatter beyond analytical uncertainty are not considered in this modelling experiment. In most cases, detecting outlying ages becomes easier as dating density increases. Because this experiment is only applied to a single sedimentary sequence, the results may not be directly applicable to other sedimentary records with different depositional conditions. In the future, this type of age model simulation could be applied to a range of sedimentary sequences with a variety of depositional conditions.
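The scenario uncertainties quoted in this section (±39, ±92 and ±148 years for 500, 90 and 35 μg C) can be compared with a simple scaling rule. The sketch below assumes that the 1σ uncertainty scales with the inverse square root of the carbon mass, as would be expected if counting statistics dominate; this scaling is an assumption made here for illustration and is not necessarily the relation fitted to the measurements in this study. The function and constant names are illustrative.

```python
import math

REF_MASS_UG = 500.0   # reference carbon mass (ug C), taken from the 500 ug scenario
REF_SIGMA_YR = 39.0   # 1-sigma uncertainty quoted for that scenario (years)

def expected_sigma(mass_ug):
    """Expected 1-sigma age uncertainty (years) for a given carbon mass.

    ASSUMPTION: sigma is proportional to 1 / sqrt(mass), anchored at the
    500 ug C scenario.
    """
    return REF_SIGMA_YR * math.sqrt(REF_MASS_UG / mass_ug)

# Compare the scaled values with the uncertainties quoted for the other scenarios
for mass, quoted in [(90, 92), (35, 148)]:
    scaled = expected_sigma(mass)
    print(f"{mass} ug C: scaled {scaled:.1f} yr, quoted {quoted} yr")
```

Under this assumption the scaled values fall within about a year of the quoted scenario uncertainties, so the rule may serve as a rough planning aid for Holocene samples; it should not be extrapolated to much older samples or to very small masses without checking against measured data.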

Recommendations for radiocarbon sampling strategy
Radiocarbon sampling strategies will always be highly dependent on project-specific considerations such as how the chronology will affect the scientific goals of the project, budget and labor constraints, the nature of the sedimentary record in question, and the availability of suitable materials. A goal of this study is to provide data that can inform sampling strategies for building robust chronologies, particularly in cases where suitable material may be limited. Firstly, an iterative approach to 14C measurements is preferred. An initial batch of measurements should target a low dating density of perhaps one date per 2000 years. Subsequent samples should aim to fill in gaps where age uncertainty remains highest (Blaauw et al., 2018), or where preliminary age-depth trends appear to be non-linear. In accordance with many previous studies (e.g. Howarth et al., 2013; Oswald et al., 2005), we advocate for careful selection of material identified as terrestrial in origin. If the mass of such material is limited, the MICADAS gas-source is useful for dating miniature samples, and we are convinced that miniature samples of terrestrial material are preferable to dating questionable material or bulk sediments. Samples as small as a few μg C can be measured using the MICADAS, though samples larger than 40 μg C are recommended for more precise results (Holocene samples with 40 μg C are expected to have analytical uncertainty of ~138 years). Dating small amounts of material from single depths is also preferable to pooling material from depth segments that may represent long time intervals. A general rule of thumb is to avoid taking samples with depth intervals representing more time than the expected uncertainty of a 14C age. To improve the accuracy of age-depth models, a higher priority should be placed on achieving sufficiently high dating density (ideally greater than one age per 500 years; Blaauw et al., 2018) using narrow sample-depth intervals. In most cases, this goal should be prioritized over the goal of gathering larger sample masses in order to reduce analytical uncertainties.
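The two numerical guidelines in this paragraph (the sample-thickness rule of thumb and the target dating densities) can be turned into a short planning sketch. The sedimentation rate used below is a hypothetical value chosen for illustration, and the function names are ours; only the ~138-year uncertainty, the ~4700-year span, and the one-per-2000-year and one-per-500-year densities come from the text.

```python
import math

def max_sample_thickness_cm(sed_rate_cm_per_yr, sigma_yr):
    """Thickest sample interval that spans no more time than the expected
    1-sigma uncertainty of a 14C age (the rule of thumb in the text)."""
    return sed_rate_cm_per_yr * sigma_yr

def min_ages_for_density(span_yr, years_per_age):
    """Smallest number of ages giving at least one age per `years_per_age`."""
    return math.ceil(span_yr / years_per_age)

sigma_40ug = 138.0   # expected uncertainty for a 40 ug C Holocene sample (years)
sed_rate = 0.1       # HYPOTHETICAL sedimentation rate (cm per year)

thickness = max_sample_thickness_cm(sed_rate, sigma_40ug)
first_batch = min_ages_for_density(4700, 2000)   # initial low-density batch
target = min_ages_for_density(4700, 500)         # recommended final density

print(f"maximum sample thickness: {thickness:.1f} cm")
print(f"initial batch: {first_batch} ages; target: at least {target} ages")
```

In practice the sedimentation rate is itself uncertain before the first ages are available, so the thickness limit is best recomputed after each batch of the iterative sampling approach described above.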
Multiple measurements from within a single stratigraphic depth, as we have done in this study, can be useful in sediments where age scatter (possibly from reworked material) is expected. In such cases, multiple measurements from a single depth could allow for identification of certain types of material that should be avoided, and if age results do not agree well, the youngest age is most likely to be correct (assuming no contamination by modern carbon). If age scatter is not expected, single measurements of pooled macrofossils are more cost-effective than repeat measurements from a single depth. Although increased dating density does incur greater cost, gas-source ages are substantially cheaper than graphitized ages, allowing for greater dating density at similar cost; the exact analytical costs are laboratory specific. Use of smaller samples can reduce the labor time required to isolate suitable material from the sediment; however, handling and cleaning miniature samples can pose additional challenges.

Conclusions
• AMS 14C analysis of Holocene terrestrial plant macrofossils using the MICADAS gas-ion source produces unbiased ages with similar precision compared to graphitized samples that contain a similar mass of carbon (approximately 120-160 μg C).
• The precision of a 14C age can be approximately estimated based on the amount of carbon within a sample. Holocene samples containing greater than 40 μg C produce ages with analytical uncertainty expected to be less than 150 years.
• The variability among ages obtained from 1- or 2-cm-thick samples in the Lake Żabińskie sediment core is compatible with the variability expected due to analytical uncertainty alone.
• We find no clear evidence in our dataset for age bias based on the type of macrofossil material dated, which we limited to terrestrial plant material.
• Judging from the output of age-depth models, the lower precision of miniature gas-source ages can be compensated for by increasing sampling density. Based on sets of simulated 14C ages that mimic the 14C ages of our study core, together with age-depth models generated using OxCal, doubling dating density roughly compensates for a decrease in analytical precision of 50 years.
• The precision of 14C ages is one of several factors that influence chronological precision. The thickness of the depth interval used to obtain samples, the ability to select identifiable terrestrial materials or to analyze more than one type of material, the reliability of detecting age outliers, and the amount of variability in sedimentation rate all determine the accuracy and precision of an age-depth model, which are both improved by increasing the number of ages.
• This study can inform sampling strategies and provide expectations about radiocarbon-based age-depth model outcomes.

Data Availability
The key datasets associated with this manuscript (Varve count results and the best-age estimate OxCal V-sequence output) are  .
From left to right: OxCal V-sequence using all 14C ages as well as varve counts as inputs; OxCal P-sequence using all 14C ages as inputs; OxCal P-sequence using only gas-source 14C ages; OxCal P-sequence using only graphitized 14C ages. The median age of the V-sequence is considered the best-age estimate and is repeated in all four panels as a red line. Gray lines represent the upper and lower limits of the 95% confidence interval of each model. Black lines represent the median ages of the P-sequences. B) Radiocarbon calibrated age probability density functions for each measured age, grouped by composite depth. The best-age estimates from the OxCal V-sequence are plotted as red lines for comparison. The = symbol adjacent to some probability density functions indicates that these ages (within a single depth) came from the same specimen and have the same true age.
Figure 3: Offsets between median calibrated 14C ages and the best-age estimate from the OxCal V-sequence. Data are grouped by material type. Higher values indicate that the sample age is older than the best-age estimate.
Table 1: Results of the 48 14C analyses obtained for this study. Uncertainties of 14C ages refer to 68% probabilities (1σ) whereas ranges of calibrated and modelled ages represent 95% probabilities.
precision and reliability of OxCal P-sequence models generated from simulated 14C ages. Each of the nine scenarios was simulated 30 times; presented values are the mean of the 30-member ensemble. Precision is assessed by the mean width of the age-depth model 95% confidence interval. Accuracy is measured by the mean absolute deviation from the OxCal V-sequence best-age estimate, which is the reference from which 14C ages were simulated.
Chron Score is a metric designed to assess the reliability of age-depth models, where higher numbers represent greater reliability (