--- title: "CharAnalysis: Diagnostic and analytical tools for peak detection and fire-history interpretations using high-resolution sediment-charcoal records" author: "Philip Higuera" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 3 fig_width: 7 fig_height: 5 vignette: > %\VignetteIndexEntry{CharAnalysis: Diagnostic and analytical tools for peak detection and fire-history interpretations using high-resolution sediment-charcoal records} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align = "center", warning = FALSE, message = FALSE ) ``` ## Overview *CharAnalysis* reconstructs local fire histories from lake-sediment charcoal records. Charcoal preserved in lake sediments is a direct proxy for past fire activity: individual fire events near a lake deposit pulses of charcoal that appear as peaks above a slowly varying background signal. *CharAnalysis* formalises this peak-detection logic as a reproducible, quantitative workflow. This R package (v2.0.0) is a direct translation of *CharAnalysis* v2.0 (MATLAB), validated against reference outputs on four benchmark datasets spanning a range of record lengths, sampling resolutions, ecosystems, and analysis configurations. Analytical methods are described in Higuera et al. (2009). The full workflow proceeds in five steps: 1. **Pretreatment** — Resample the raw charcoal series to equal time steps and compute charcoal accumulation rate (CHAR, pieces cm^-2^ yr^-1^). 2. **Smoothing** — Estimate low-frequency trends (C~background~) using lowess or a moving-window statistic. 3. **Peak decomposition** — Compute the high-frequency residual C~peak~ = C~interp~ − C~background~ (or ratio C~interp~ / C~background~). 4. **Thresholding** — Define a noise threshold and flag C~peak~ values that exceed it as candidate fire events. Thresholds can be global (one distribution fitted to the full record) or local (sliding-window distribution). 5. **Screening and metrics** — Apply a minimum-count significance test, then compute fire-return intervals (FRIs), Weibull statistics, smoothed fire frequency, signal-to-noise index (SNI), and goodness-of-fit. --- ## Installation ```{r install, eval = FALSE} # Install from GitHub (requires devtools) devtools::install_github("phiguera/CharAnalysis", subdir = "CharAnalysis_2_0_R") # Suggested packages for figures install.packages(c("ggplot2", "patchwork", "ggtext")) ``` --- ## Worked example: Code Lake Code Lake (CO; Colorado, USA) is the primary validation dataset for *CharAnalysis*. The record spans approximately 7,300 calibrated years BP and is analysed here using a local Gaussian mixture model (GMM) threshold — the recommended default configuration. ### Input files *CharAnalysis* reads two CSV files: - **`CO_charData.csv`** — the charcoal data table (depth, age, volume, count). - **`CO_charParams.csv`** — the analysis parameter file. Both files are included in the package's `inst/extdata/` directory and can be located with `system.file()`: ```{r paths, eval = FALSE} params_file <- system.file("extdata", "CO_charParams.csv", package = "CharAnalysis") ``` For this vignette we reference the files directly from the repository: ```{r set-paths, eval = FALSE} params_file <- "path/to/CO_charParams.csv" # adjust to your local path ``` ### Running the full pipeline A single call to `CharAnalysis()` runs all five analytical steps and returns a named list of results. ```{r run, eval = FALSE} library(CharAnalysis) out <- CharAnalysis(params_file) ``` The progress messages show each step completing in sequence: ``` (1) Reading input file... ...done. (1b) Validating input parameters... (2) Pretreating charcoal data... ...done. (3) Smoothing resampled CHAR to estimate low-frequency trends and calculating peak CHAR... ...done. (4) Defining possible thresholds for peak identification... ...done. (5) Identifying peaks and applying minimum-count screening... ...done. (6) Post-processing: fire-return intervals, Weibull statistics... ...done. (7) Pipeline complete. Call char_write_results(...) to save output CSV. ``` ### Exploring the output `CharAnalysis()` returns a named list. The most commonly used elements are: ```{r output-structure, eval = FALSE} names(out) #> [1] "charcoal" "pretreatment" "smoothing" "peak_analysis" #> [5] "results" "site" "gap_in" "char_thresh" #> [9] "post" "char_results" ``` #### The `charcoal` object `out$charcoal` holds all time-series outputs, both raw and processed: ```{r charcoal, eval = FALSE} # Inspect the first few rows of key time series head(data.frame( age_BP = out$charcoal$ybpI, # interpolated age (yr BP) CHAR = out$charcoal$accI, # C_interpolated (pieces cm-2 yr-1) C_bkg = out$charcoal$accIS, # C_background C_peak = out$charcoal$peak, # C_peak (residuals) peaks = out$charcoal$charPeaks[, 4] # final-threshold peak flags (0/1) )) ``` #### The `char_thresh` object `out$char_thresh` holds threshold values, SNI, and goodness-of-fit results: ```{r thresh, eval = FALSE} # Threshold at the final percentile (column 4 = threshValues[4]) range(out$char_thresh$pos[, 4], na.rm = TRUE) # Signal-to-noise index (SNI): values > 3 indicate a strong signal summary(out$char_thresh$SNI) ``` #### Post-processing metrics `out$post` holds fire-history summary metrics: ```{r post, eval = FALSE} # Fire-return intervals (FRIs) and mean FRI cat("Number of FRIs:", length(out$post$FRI), "\n") cat("Mean FRI:", round(mean(out$post$FRI), 1), "yr\n") # Per-zone Weibull statistics (zone 1) fri_z1 <- out$post$FRI_params_zone[1, ] cat(sprintf( "Zone 1 — nFRI: %d mFRI: %.1f yr WBLb: %.1f WBLc: %.2f\n", fri_z1[1], fri_z1[2], fri_z1[5], fri_z1[8] )) ``` #### The `char_results` matrix `out$char_results` is the complete N × 33 output matrix, with columns matching the MATLAB `charResults` CSV exactly: ```{r char-results, eval = FALSE} dim(out$char_results) #> [1] 500 33 # Total number of fire events identified sum(out$charcoal$charPeaks[, 4], na.rm = TRUE) #> [1] 39 ``` --- ### Output figures *CharAnalysis* provides five publication-quality ggplot2 figures. All are produced by `char_plot_all()`, or individually by their dedicated functions. #### Figure 3 — C~interp~, C~background~, and C~peak~ ```{r fig3, eval = FALSE} char_plot_peaks(out) ``` The top panel shows resampled CHAR (black bars) with the smoothed C~background~ trend (grey line). The bottom panel shows C~peak~ with the positive and negative threshold lines (red), identified fire events (+ symbols), and peaks that failed the minimum-count screen (grey dots). #### Figure 5 — Cumulative peaks through time ```{r fig5, eval = FALSE} char_plot_cumulative(out) ``` The slope of the cumulative curve at any point reflects the instantaneous fire frequency. Changes in slope indicate periods of higher or lower fire activity. #### Figure 6 — FRI distributions by zone ```{r fig6, eval = FALSE} char_plot_fri(out) ``` Each panel shows a histogram of fire-return intervals within one stratigraphic zone (20-yr bins, normalised to proportions). A two-parameter Weibull probability density function is overlaid where the Kolmogorov-Smirnov goodness-of-fit test passes (p > 0.10 for n < 30; p > 0.05 for n ≥ 30). Weibull scale (b) and shape (c) parameters, mean FRI, and sample size are annotated. #### Figure 7 — Continuous fire history ```{r fig7, eval = FALSE} char_plot_fire_history(out) ``` Three stacked panels show (from top to bottom): peak magnitude (integrated C~peak~ above threshold per fire event, pieces cm^-2^ peak^-1^); fire-return intervals through time as filled squares with the smoothed mean FRI curve (black line) and bootstrapped 95% CI ribbon (grey); and lowess-smoothed fire frequency (fires per 1000 yr). #### Figure 8 — Between-zone CHAR comparisons ```{r fig8, eval = FALSE} char_plot_zones(out) ``` The left panel shows empirical cumulative distribution functions of raw CHAR within each zone, with pairwise two-sample Kolmogorov-Smirnov test p-values annotated. The right panel shows box plots (10th, 25th, 50th, 75th, 90th percentiles) of raw CHAR by zone. #### Saving all figures to PDF `out_dir` is a required argument when `save = TRUE`; the example below writes to a temporary directory so it can be run on any system, but you would normally substitute a path of your choosing (for example, `"Results"` inside your project folder). ```{r save-figs, eval = FALSE} char_plot_all(out, save = TRUE, out_dir = tempdir()) # Saves to tempdir(): # CO_03_CHAR_analysis.pdf # CO_05_cumulative_peaks.pdf # CO_06_FRI_distributions.pdf # CO_07_continuous_fire_hx.pdf # CO_08_zone_comparisons.pdf ``` > **Note:** Figures 9 (threshold sensitivity detail) and 10 (multi-site comparisons) from the MATLAB v2.0 interface are not implemented in this R package. All core analytical outputs are available through Figures 1–8 above. --- ### Writing results to CSV `char_write_results()` writes the 33-column output matrix to a CSV file whose column names and format match the MATLAB `charResults` output exactly. `out_dir` is required (no default); substitute a path of your choosing for the temporary directory used here. ```{r write, eval = FALSE} char_write_results(out$char_results, site = out$site, out_dir = tempdir()) # Writes: /CO_charResults.csv ``` The output CSV contains one row per interpolated time step and 33 columns covering all analytical outputs from C~interp~ through to per-zone Weibull confidence intervals. Column headers match the MATLAB reference file exactly to facilitate direct numerical comparison. --- ## Key parameters The parameter file (`*_charParams.csv`) controls all aspects of the analysis. The most commonly adjusted parameters are: | Parameter | Default | Description | |-----------|---------|-------------| | `yrInterp` | 15 | Resampling resolution (yr). Set to 0 for automatic (median raw resolution). | | `yr` | 500 | Smoothing window width (yr) for C~background~ estimation. | | `threshType` | 2 | Threshold type: 1 = global, 2 = local (sliding window). | | `threshMethod` | 3 | Noise distribution: 2 = Gaussian, 3 = Gaussian mixture model. | | `threshValues` | 0.95, 0.99, 0.999, 0.99 | Percentile thresholds; the final value defines the working threshold. | | `minCountP` | 0.05 | Alpha level for the minimum-count significance screen. | | `peakFrequ` | 1000 | Window width (yr) for smoothed fire frequency and FRI statistics. | | `zoneDiv` | record bounds | Zone boundaries (yr BP) for stratified FRI and Weibull analysis. | --- ## Comparison with MATLAB v2.0 *CharAnalysis* v2.0.0 (R) is a direct translation of *CharAnalysis* v2.0 (MATLAB). Outputs are quantitatively equivalent on all validated reference datasets. The table below summarises results across the four validation datasets; full details are in `inst/z_Validation_report_R_vs_MATLAB_V_2.0.md`. | Dataset | Site | Smoothing | charBkg max\|diff\| | Peaks R v2.0.0 | Peaks MATLAB v2.0 | |---------|------|-----------|-------------------|---------|-------------| | CO | Code Lake, AK | Method 1 (lowess) | ~5 × 10^-6^ | 39 | 48 | | CH10 | Chickaree Lake, CO | Method 2 (robust lowess) | 0.267 | 59 | 50 | | SI17 | Silver Lake, CO | Method 2 (robust lowess) | 0.130 | 25 | 31 | | RA07 | Raven Lake, AK | Method 2 (robust lowess) | < 0.001 | 15 | 17 | Two sources of numerical difference are documented: **1. Robust lowess background (smoothing method 2)** — MATLAB's Curve Fitting Toolbox `smooth(..., 'rlowess')` and the R `char_lowess()` port produce slightly different C~background~ series. For gap-free records (RA07) the difference is negligible (< 0.001). For records with NaN gaps (CH10) the difference is larger (≤ 0.267) because the two implementations handle gap positions differently inside the bisquare robustness iteration. Smoothing method 1 (plain lowess) is not affected and agrees to within floating-point noise on all datasets. **2. Gaussian mixture model (GMM) peak counts** — The R package ports the MATLAB `GaussianMixture.m` EM algorithm directly. Floating-point arithmetic accumulates differently across languages during EM iterations, causing the two implementations to reach slightly different threshold values in some local windows. Peak counts differ by 10–20% across datasets, with the direction varying (R sometimes higher, sometimes lower). All threshold and peak differences are downstream consequences of this single source; interpolation and peak-magnitude outputs agree to within numerical precision. **Weibull confidence intervals** — Bootstrap CIs use random resampling and will differ between R and MATLAB runs regardless of any other differences. Weibull point estimates (scale *b*, shape *c*) agree within a few percent on datasets where peak counts are similar. --- ## Citation If you use *CharAnalysis* in published research, please cite Higuera et al. (2009), the first study to apply the core analytical tools implemented in the program. If you used *CharAnalysis* v2.0 (MATLAB or R v2.0.0) specifically, please also cite the software: > Higuera, P.E., L.B. Brubaker, P.M. Anderson, F.S. Hu, and T.A. Brown. 2009. Vegetation mediated the impacts of postglacial climate change on fire regimes in the south-central Brooks Range, Alaska. *Ecological Monographs* 79:201–219. https://doi.org/10.1890/07-2019.1 > Higuera, P.E. 2026. *CharAnalysis*: Diagnostic and analytical tools for peak analysis in sediment-charcoal records (Version 2.0). Zenodo. https://doi.org/10.5281/zenodo.19304064 --- ## References Higuera, P.E., L.B. Brubaker, P.M. Anderson, F.S. Hu, and T.A. Brown. 2009. Vegetation mediated the impacts of postglacial climate change on fire regimes in the south-central Brooks Range, Alaska. *Ecological Monographs* 79:201–219. https://doi.org/10.1890/07-2019.1