LACHESIS - Real-time inference of evolutionary dynamics during tumor initiation based on whole genome sequencing data

11/10/2025

Abstract

Understanding tumor evolution, particularly the temporal order of driver mutations, is increasingly important for improving diagnosis and treatment strategies. A recent study by Körber et al. (Nature Genetics, 2023) identified a bimodal distribution in the timing of the most recent common ancestor (MRCA) in neuroblastomas. This early or late MRCA distinction strongly correlated with clinical outcome, suggesting its potential as a prognostic biomarker. To facilitate the application of these findings, we developed LACHESIS, a package to estimate MRCA timing using the molecular clock of somatic single nucleotide variant (SSNV) accumulation. LACHESIS is based on the statistical model introduced by Körber et al. and provides tools for analyzing tumor evolution. This vignette details the functionality of the package and demonstrates its application through typical workflows, enabling users to integrate MRCA timing into their analyses. LACHESIS package version: 0.99.4

Installation

To install this package, start R and enter:

if (!require("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("LACHESIS")

For older versions of R, please refer to the appropriate Bioconductor release.

Citation

If you use LACHESIS in published research, please cite:

Körber V, Stainczyk SA, Kurilov R, Henrich KO, Hero B, Brors B, Westermann F, Höfer T. (2023) Neuroblastoma arises in early fetal development and its evolutionary duration predicts outcome. Nat Genet., 55:619-630. 10.1038/s41588-023-01332-y

Input Data

LACHESIS takes sample names, somatic single nucleotide variants and allele-specific copy number information as input. Though not strictly required, you may, ideally, also provide an estimate for the tumor cell purity and ploidy. Sample name, paths to respective variant files, and optionally further specifications can either be specified by vectors or in a tab-delimited sample-specification file. The package can be run on a single sample or on a cohort. A detailed log file will be generated, capturing the date, time, and provided input parameters.

Copy Number Variation (CNV) File

The CNV file is a tab-delimited file that contains segmented copy number information for your sample of interest. The CNV file requires columns for the chromosome number, start and end position of the segment, and either the total copy number or the number of A- and B-alleles. We encourage you to provide allele-specific copy number information; if this information is not provided, LACHESIS will assume that a given total copy arose by the minimal number of genomic changes (e.g., whole genome duplication, resulting in a 2:2 configuration for a total copy number of 4 or single-chromosomal missegregation, resulting in a 2:1 configuration for a copy number of 3), which may yield wrong results in some cases.

Single Nucleotide Variant (SNV) File

The SNV file should be in VCF format (Variant Call Format), a standardized text file format with a header section (lines starting with # and ##) and a data section where each row represents a variant. We recommend using only the variants that have passed the filtering process.

Package Output

LACHESIS utilizes whole genome sequencing data to determine the mutational timing of a tumor’s most recent common ancestor (MRCA). In addition, it will time the emergence of clonal (and thus early) copy number gains relative to the MRCA. If multiple chromosomal gains preceded the MRCA, LACHESIS tests whether they can be explained by a single aneuploidization event, in an earlier common ancestor (ECA) of the tumor, or whether they resulted from sequential chromosomal misseggregations. LACHESIS generates an output folder for each patient ID, which contains the following information:

Datatables (.txt)

SNV_timing_per_segment_ID.txt: purity, ploidy, and the estimated mean, lower, and upper bounds for the timing of the tumor’s MRCA and ECA (if detected)
SNV_timing_per_SNV_ID.txt: SNV context, clonality status and association with ECA/ MRCA.
MRCA_densities_ID.txt: per-segment mutation time, differentiated by minor and major copy number (A and B) For more details go to Estimating Mutation Densities at ECA/MRCA (MRCA)

Graphs (.pdf)

SNV_densities.pdf: histograms of normalized, mean single-copy and multi-copy mutation densities and timeline of early tumor evolution
VAF_histogram_strat.pdf: measured copy numbers along the genome and variant allele frequency (VAF) histograms of SNVs stratified by copy number and minor/major allele count
VAF_histogram: variant allele frequency histogram and density distribution of all SNVs
SNV_timing_per_SNV_ID.pdf: clonality status of SNVs associated with specific chromosomal gains

Additionally, if the function was applied to a cohort, the following files will be included:

SNV_densities_cohort.pdf: histograms and cumulative distributions (with 95% confidence intervals) of mean mutation densities per segment at MRCA and ECA
Driver_mutations_cohort.pdf: known driver SNVs and their evolutionary timing clonal (pre-CNV, post-CNV, unknown) or subclonal
lachesis_clin.par.pdf: correlation between SNV densities computed by LACHESIS and clinical parameters (age, survival,…)
survival_plots.pdf: Kaplan-Meier plots stratified by MRCA density if providing survival information

Structure

There are two ways to perform analyses using LACHESIS: either by utilizing the sub-functions individually or by calling the wrapper function. All of the functions presented in the main workflow are also invoked by the wrapper function. In addition, the wrapper function summarizes results across samples, allowing cohort analysis to be performed.

LACHESIS Main Workflow

Importing Data

Importing Copy Number Information (readCNV)

The readCNV function converts a user-specified BED file containing copy number information into a standardized format. It performs several quality checks on the input file and returns a clean, structured dataframe. Users can specify column identifiers for chromosomal positions and allele-specific copy number (A and B alleles) using either column names or indices. If no identifiers are provided, the function will attempt to identify the relevant columns using standard nomenclature (e.g., “chrom”, “start”, “end”, etc.).

Handling Missing Allele Information

If total copy number information is available but the number of A and B alleles is missing, the function estimates allele counts. To this end, the function assumes the molecular scenario with minimal step count to be the most likely one. For example, a total copy number of 4 is assumed to stem from whole genome duplication, corresponding to a 2:2 configuration; likewise a total copy number of 3 is assumed to stem from chromosomal missegregation, corresponding to a 2:1 configuration. Formally, the total copy number is divided by two, with one allele rounded up and the other rounded down.

Input

readCNV(cn.info = NULL, chr.col = NULL, start.col = NULL, end.col = NULL, A.col = NULL, B.col = NULL, tcn.col = NULL, merge.tolerance = 10^5, ignore.XY = TRUE, max.cn = 4, tumor.id = NULL)

Output

A data.table containing the following information:

Output	Definition
Chr	the chromosome number
Start	start position of the segment
End	end position of the segment
A	copy number of the major allele
B	copy number of the minor allele
TCN	the total copy number

Example using output from ACESeq as copy number information

aceseq_cn <- system.file("extdata", "ACESeq/NBE11_comb_pro_extra2.59_0.83.txt", package = "LACHESIS")
cn_data <- LACHESIS::readCNV(aceseq_cn)
head(cn_data, n = 1)

##       Chr  Start      End     A     B   TCN
##    <char>  <int>    <int> <num> <num> <num>
## 1:      1 840009 12839977     1     1     2

Example using output from ASCAT as copy number information

ascat_cn <- system.file("extdata", "ASCAT/S98.segments.txt", package = "LACHESIS")
cn_data <- LACHESIS::readCNV(ascat_cn)
head(cn_data, n = 1)

##       Chr   Start       End     A     B   TCN
##    <char>   <int>     <int> <num> <num> <num>
## 1:      1 1695590 244820741     1     1     2

Example using output from PURPLE as copy number information

purple_cn <- system.file("extdata", "PURPLE/NB-S-599-T.purple.cnv.somatic.tsv", package = "LACHESIS")
cn_data <- LACHESIS::readCNV(purple_cn)
head(cn_data, n = 1)

##       Chr Start       End     A     B   TCN
##    <char> <int>     <int> <num> <num> <num>
## 1:      1     1 248956422     1     1     2

Importing Variant Information (readVCF)

The readVCF function is used to import a VCF file and extract key genomic information such as read counts and VAFs. Supported tools to generate the VCF file are Mutect2, Strelka2, Sentieon and the DKFZ SNV Calling Workflow.

Input

readVCF(vcf = NULL, ignore.XY = TRUE, vcf.source = "strelka", min.vaf = 0.01, min.depth = 30, t.sample = NULL, info.af = "AF", info.dp = "DP", filter.value = "PASS")

Output

A data.table containing the following information:

Output	Definition
chrom	the chromosome
pos	the chromosomal coordinate of the SNV
ref	the reference nucleotide at this position
alt	the alternate nucleotide at this position
t_ref_count	the number of reads reporting the reference nucleotide
t_alt_count	the number of reads reporting the alternate nucleotide
t_depth	the read coverage at this position
t_vaf	the variant allele frequency of the SSNV

Example: SNV calls obtained with Mutect

mutect_vcf <- system.file("extdata", "mutect.somatic.vcf.gz", package = "LACHESIS")
m_data <- LACHESIS::readVCF(vcf = mutect_vcf, vcf.source = "mutect", filter.value = ".")
head(m_data, n = 1)

##     chrom      pos    ref    alt t_ref_count t_alt_count t_depth      t_vaf
##    <char>    <int> <char> <char>       <num>       <num>   <num>      <num>
## 1:      2 29467995      A      G          70           4      74 0.05405405

Example: SNV calls obtained with Strelka

strelka_vcf <- system.file("extdata", "strelka2.somatic.snvs.vcf.gz", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = strelka_vcf, vcf.source = "strelka", filter.value = "PASS")
head(s_data, n = 1)

##     chrom    pos    ref    alt t_ref_count t_alt_count t_depth t_vaf
##    <char>  <int> <char> <char>       <num>       <num>   <num> <num>
## 1:      1 746368      C      A          42          14      56  0.25

Example: SNV calls obtained with the dkfz SNV calling workflow

dkfz_vcf <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
d_data <- LACHESIS::readVCF(vcf = dkfz_vcf, vcf.source = "dkfz")
head(d_data, n = 1)

##     chrom     pos    ref    alt t_ref_count t_alt_count t_depth     t_vaf
##    <char>   <num> <char> <char>       <num>       <num>   <num>     <num>
## 1:      1 5474212      C      T          41          13      54 0.2407407

Plotting Variant Allele Frequencies (plotVAFdistr)

The plotVAFdistr function generates a VAF histogram of somatic SNVs. If showdensity is set to TRUE, an additional density plot is displayed.

Input

plotVAFdistr(vaf = NULL, vaf.interval = 0.05, t_sample = NULL, vaf.show.counts = FALSE, vaf.show.density = TRUE, vaf.col = "#34495e", vaf.border = "#bdc3c7", srtcounts = 45, output.file = NULL)

Example

strelka_vcf <- system.file("extdata", "strelka2.somatic.snvs.vcf.gz", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = strelka_vcf, vcf.source = "strelka")
head(s_data, n = 1)

##     chrom    pos    ref    alt t_ref_count t_alt_count t_depth t_vaf
##    <char>  <int> <char> <char>       <num>       <num>   <num> <num>
## 1:      1 746368      C      A          42          14      56  0.25

LACHESIS::plotVAFdistr(s_data)

Figure 1: VAF distribution of SSNVs

This VAF histogram provides a first overview of the tumor’s mutational landscape. The observed clonal (~0.3) and subclonal (~0.1) peaks align with expectations based on the tumor purity (0.83) and ploidy (2.59).

Assigning copy number states to single nucleotide variants (nbImport)

The function nbImport integrates CNV and SNV information into a single datatable. Each variant is assigned to its corresponding copy number segment and status.

Input

nbImport(cnv = NULL, snv = NULL, purity = NULL, ploidy = NULL)

Assignment of mutational signatures

LACHESIS can incorporate mutational signatures generated by either SigProfilerAssignment or MutationalPatterns. SigProfilerAssignment is a Python-based tool developed by the Alexandrov Lab, whereas MutationalPatterns is an R package available through Bioconductor. Both tools assign previously known mutational signatures to sequenced samples. If signature assignment has already been performed with SigProfilerAssignment, the outputfile, typically named “Decomposed_MutationType_Probabilities.txt”, can be provided via sig.file parameter. If sig.file is not specified and sig.assign = TRUE, the probabilities for the SBS signatures will be automatically computed using MutationalPatterns.

A global random seed (set.seed()) should be specified before running LACHESIS to create reproducable results.

Output

A data.table containing the following information:

Output	Definition
chrom	the chromosome
cn_start	start of the segment
cn_end	end of the segment
A	copy number of the major allele
B	copy number of the minor allele
TCN	the total copy number
snv_start	the chromosomal start coordinate of the SNV
ref	the reference nucleotide at this position
alt	the alternate nucleotide at this position
t_ref_count	the number of reads reporting the reference nucleotide
t_alt_count	the number of reads reporting the alternate nucleotide
t_depth	the read coverage at this position
t_vaf	the variant allele frequency of the SSNV
snv_end	the chromosomal end coordinate of the SNV

Example using all variants from vcf file

snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51)

Example using variants associated with specific SBS mutational signatures from vcf file

snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
sig.filepath <- system.file("extdata", "NBE15_Decomposed_MutationType_Probabilities.txt", package = "LACHESIS")
head(sig.filepath, n = 1)

## [1] "/tmp/RtmpQ5IPno/Rinst1673c73b893c/LACHESIS/extdata/NBE15_Decomposed_MutationType_Probabilities.txt"

nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51, sig.assign = TRUE, ID = "NBE15", sig.file = sig.filepath, sig.select = c("SBS1", "SBS5", "SBS40a", "SBS18"))
head(nb, n = 1)

##    MutationType  chrom  cn_start    cn_end     A     B   TCN snv_start    ref
##          <char> <char>     <int>     <int> <num> <num> <num>     <num> <char>
## 1:      A[C>A]A      1 104680006 119099997     2     0     2 105350598      G
##       alt t_ref_count t_alt_count t_depth     t_vaf   snv_end sequence_context
##    <char>       <num>       <num>   <num>     <num>     <num>           <char>
## 1:      T          28          19      47 0.4042553 105350598              ACA
##           SBS1  SBS2     SBS3  SBS4       SBS5  SBS6 SBS7a SBS7b SBS7c SBS7d
##          <num> <int>    <num> <int>      <num> <int> <int> <int> <int> <int>
## 1: 0.001524286     0 0.252638     0 0.03705652     0     0     0     0     0
##     SBS8  SBS9 SBS10a SBS10b SBS10c SBS10d SBS11 SBS12 SBS13 SBS14 SBS15 SBS16
##    <int> <int>  <int>  <int>  <int>  <int> <int> <int> <int> <int> <int> <int>
## 1:     0     0      0      0      0      0     0     0     0     0     0     0
##    SBS17a SBS17b     SBS18 SBS19 SBS20 SBS21 SBS22a SBS22b SBS23 SBS24 SBS25
##     <int>  <int>     <num> <int> <int> <int>  <int>  <int> <int> <int> <int>
## 1:      0      0 0.3112239     0     0     0      0      0     0     0     0
##    SBS26 SBS27 SBS28 SBS29 SBS30 SBS31 SBS32 SBS33 SBS34 SBS35 SBS36 SBS37
##    <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1:     0     0     0     0     0     0     0     0     0     0     0     0
##    SBS38 SBS39    SBS40a SBS40b SBS40c SBS41 SBS42 SBS43 SBS44 SBS45 SBS46
##    <int> <int>     <num>  <int>  <int> <int> <int> <int> <int> <int> <int>
## 1:     0     0 0.3975573      0      0     0     0     0     0     0     0
##    SBS47 SBS48 SBS49 SBS50 SBS51 SBS52 SBS53 SBS54 SBS55 SBS56 SBS57 SBS58
##    <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1:     0     0     0     0     0     0     0     0     0     0     0     0
##    SBS59 SBS60 SBS84 SBS85 SBS86 SBS87 SBS88 SBS89 SBS90 SBS91 SBS92 SBS93
##    <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1:     0     0     0     0     0     0     0     0     0     0     0     0
##    SBS94 SBS95 SBS96 SBS97 SBS98 SBS99 Signature Probability
##    <int> <int> <int> <int> <int> <int>    <char>       <num>
## 1:     0     0     0     0     0     0    SBS40a   0.3975573

nb.2 <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51, sig.assign = TRUE, ID = "NBE15", sig.select = c("SBS1", "SBS5", "SBS40", "SBS18"))
head(nb.2, n = 1)

##    MutationType  chrom  cn_start    cn_end     A     B   TCN snv_start    ref
##          <char> <char>     <int>     <int> <num> <num> <num>     <num> <char>
## 1:      A[C>A]A      1 198450024 249209945     2     2     4 230037652      G
##       alt t_ref_count t_alt_count t_depth     t_vaf   snv_end sequence_context
##    <char>       <num>       <num>   <num>     <num>     <num>           <char>
## 1:      T          66          28      94 0.2978723 230037652              ACA
##           SBS1         SBS2      SBS3       SBS4  SBS5         SBS6 SBS7a
##          <num>        <num>     <num>      <num> <num>        <num> <num>
## 1: 0.001247748 3.878031e-07 0.1328299 0.05499724     0 0.0008617779     0
##          SBS7b SBS7c SBS7d       SBS8  SBS9      SBS10a SBS10b SBS10c SBS10d
##          <num> <num> <num>      <num> <num>       <num>  <num>  <num>  <num>
## 1: 0.001411271     0     0 0.01327665     0 0.001012315      0      0      0
##    SBS11 SBS12 SBS13 SBS14        SBS15 SBS16      SBS17a SBS17b     SBS18
##    <num> <num> <num> <num>        <num> <num>       <num>  <num>     <num>
## 1:     0     0     0     0 7.034001e-05     0 0.000815158      0 0.2586182
##           SBS19 SBS20 SBS21 SBS22 SBS23      SBS24 SBS25 SBS26        SBS28
##           <num> <num> <num> <num> <num>      <num> <num> <num>        <num>
## 1: 0.0004193747     0     0     0     0 0.01135855     0     0 9.048954e-05
##         SBS29       SBS30 SBS31      SBS32 SBS33       SBS34 SBS35 SBS36
##         <num>       <num> <num>      <num> <num>       <num> <num> <num>
## 1: 0.06919884 0.002149741     0 0.01802011     0 0.002618774     0     0
##          SBS37       SBS38 SBS39     SBS40 SBS41 SBS42 SBS44 SBS84       SBS85
##          <num>       <num> <num>     <num> <num> <num> <num> <num>       <num>
## 1: 0.007408653 0.001324511     0 0.1929303     0     0     0     0 0.005897893
##          SBS86 SBS87 SBS88     SBS89 SBS90        SBS91 SBS92 SBS93 SBS94
##          <num> <num> <num>     <num> <num>        <num> <num> <num> <num>
## 1: 0.002183156     0     0 0.2210739     0 0.0001847195     0     0     0
##    Signature Probability
##       <char>       <num>
## 1:     SBS40   0.1929303

Plotting Imported Data (plotNB)

This function visualizes the results generated by nbImport. The top plot displays the inferred copy numbers along the genome. By default, the human genome build hg19 is used as reference. However, alternative genome builds such as hg18 or hg38 can be specified if required. The bottom plots show VAF histograms of SNVs, stratified by copy number and minor/major alleles, as well as the clonality status of each SNV.

Input

plotNB(nb = NULL, snvClonality = NULL, ref.build = "hg19", min.cn = 2, max.cn = 4, nb.col.abline = "gray70", nb.col.cn.2 = "#7f8c8d", nb.col.cn = "#16a085", nb.col.hist = "#34495e", nb.border = NA, nb.breaks = 100, samp.name = NULL, output.file = NULL, sig.show = FALSE)

Example using all variants from vcf file

snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51)
cl_muts <- LACHESIS::clonalMutationCounter(nb)
norm_muts <- LACHESIS::normalizeCounts(cl_muts)
mrca <- LACHESIS::MRCA(norm_muts)
snvClonality <- LACHESIS::estimateClonality(nbObj = nb, mrcaObj = mrca, ID = "NBE15", purity = 1)
LACHESIS::plotNB(nb = nb, snvClonality = snvClonality)

Figure 2: top plot - copy number profile along the genome, bottom plots - VAF distribution of SSNVs stratified by copy number and minor/major alleles

The copy number plot highlights multiple chromosomal aberrations across different copy number states, including a gain of chromosome 1q to four copies and a whole-chromosome gain of chromosome 5 to three copies. SNVs located on these segments are shown in the per-copy-number VAF histograms below. The dotted lines indicate the expected clonal VAFs for each copy number state. As we would expect, subclonal SNVs appear to the left of these lines, while clonal peaks align closely with the dotted lines.

Example using variants assosciated with specific SBS mutational signatures from vcf file

set.seed(123)
snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
sig.filepath <- system.file("extdata", "NBE15_Decomposed_MutationType_Probabilities.txt", package = "LACHESIS")
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51, sig.assign = TRUE, ID = "NBE15", sig.file = sig.filepath, sig.select = c("SBS1", "SBS5", "SBS40a", "SBS18"))
cl_muts <- LACHESIS::clonalMutationCounter(nb)
norm_muts <- LACHESIS::normalizeCounts(cl_muts)
mrca <- LACHESIS::MRCA(norm_muts)
snvClonality <- LACHESIS::estimateClonality(nbObj = nb, mrcaObj = mrca, ID = "NBE15", purity = 1)
LACHESIS::plotNB(nb = nb, snvClonality = snvClonality, sig.show = TRUE)

Figure 3: top plot - copy number profile along the genome, bottom plots - VAF distribution of SSNVs stratified by copy number and minor/major alleles

Figure 4: top plot - copy number profile along the genome, bottom plots - VAF distribution of SSNVs stratified by copy number and minor/major alleles

In addition to the VAF histograms stratified by clonality, here we show the same histograms stratified by mutational signature. In this case, SBS1, SBS5, SBS18, and SBS40a are observed—a typical constellation in untreated neuroblastoma. These signatures appear to be distributed relatively evenly and do not show any clear association with SNVs occurring before or after copy number gains or clonal expansion.

Processing Clonal Mutations

Counting Clonal Mutations (ClonalMutationalCounter)

This function counts the number of clonal mutations residing on a single or multiple copies per genomic segment. Segments of equal copy number and A:B conformation are chromosome-wise merged.

Input

clonalMutationCounter(nbObj = NULL, min.cn = 1, max.cn = 4, chromosomes = c(1:22))

Output

A data.table containing the following information:

Column name	Definition
chrom	the chromosome
TCN	the total copy number
A	copy number of the major allele
B	copy number of the minor allele
Seglength	the segment length
n_mut_A	the number of mutations per copy with a multiplicity of A
n_mut_B	the number of mutations per copy with a multiplicity of B
n_mut_total_clonal	the total number of clonal mutations

Example

snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51)
cl_muts <- LACHESIS::clonalMutationCounter(nb)
head(cl_muts, n = 1)

##     chrom   TCN     A     B Seglength  Start       End n_mut_A n_mut_B
##    <char> <num> <num> <num>     <int>  <int>     <int>   <num>   <num>
## 1:      1     2     2     0 115964914 840009 120529963       0       0
##    n_mut_total_clonal n_mut_total_subclonal n_mut_total n_mut_firstpeak
##                 <num>                 <num>       <int>           <num>
## 1:                 32                    40          72              32
##         p_sc      p_lc  p_ec   p_c
##        <num>     <num> <num> <num>
## 1: 0.5555556 0.4444444     0     0

Normalizing Clonal Mutation Counts (normalizeCounts)

This function translates the clonal mutation counts obtained with clonalMutationCounter into “molecular time”. The normalized counts correspond to the number of mutations accumulated between conception/gastrulation and MRCA/copy number gain on a single DNA copy.

Input

normalizeCounts(countObj)

Output

A data.table containing the following information:

Column name	Definition
chrom	the chromosome
TCN	the total copy number
A	copy number of the major allele
B	copy number of the minor allele
Seglength	the segment length
n_mut_A	the number of mutations per copy with a multiplicity of A
n_mut_B	the number of mutations per copy with a multiplicity of B
n_mut_total_clonal	the total number of mutations
density_total_mean	mean mutation densities (1/Mb) per copy
density_total_lower	lower limit of mutation densities (1/Mb) per copy according to a 95% confidence interval
density_total_upper	upper limit of mutation densities (1/Mb) per copy ccording to a 95% confidence interval
density_A_mean	mean mutation densities (1/Mb) per copy with a multiplicity of A
density_A_lower	lower limit of mutation densities (1/Mb) per copy with a multiplicity of A according to a 95% confidence interval
density_A_upper	upper limit of mutation densities (1/Mb) per copy with a multiplicity of B according to a 95% confidence interval
density_B_mean	mean mutation densities (1/Mb) per copy with a multiplicity of B
density_B_lower	lower limit of mutation densities (1/Mb) per copy with a multiplicity of B
density_B_upper	upper limit of mutation densities (1/Mb) per copy with a multiplicity of B

Example

snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51)
cl_muts <- LACHESIS::clonalMutationCounter(nb)
norm_muts <- LACHESIS::normalizeCounts(cl_muts)
head(norm_muts, n = 1)

##     chrom   TCN     A     B Seglength  Start       End n_mut_A n_mut_B
##    <char> <num> <num> <num>     <int>  <int>     <int>   <num>   <num>
## 1:      1     2     2     0 115964914 840009 120529963       0       0
##    n_mut_total_clonal n_mut_total_subclonal n_mut_total n_mut_firstpeak
##                 <num>                 <num>       <int>           <num>
## 1:                 16                    40          72              32
##         p_sc      p_lc  p_ec   p_c density_total_mean density_A_mean
##        <num>     <num> <num> <num>              <num>          <num>
## 1: 0.5555556 0.4444444     0     0          0.1379728              0
##    density_B_mean density_total_lower density_total_upper density_A_lower
##             <num>               <num>               <num>           <num>
## 1:              0          0.07886336           0.2240591               0
##    density_A_upper density_B_lower density_B_upper
##              <num>           <num>           <num>
## 1:      0.03181031               0      0.03181031

Inferring Clonal Evolution

Estimating Mutation Densities at ECA/MRCA (MRCA)

This function estimates mutation densities at MRCA and, if present, ECA from the normalized clonal mutation counts obtained with normalizeCounts. Optionally, the average false positive rate of clonal mutations (e.g. due to incomplete tissue sampling) and the standard deviation of the false positive rate can be defined (fp.mean, fp.sd).

Input Parameters

MRCA(normObj = NULL, min.seg.size = 10^7, fp.mean = 0, fp.sd = 0, excl.chr = NULL)

Output

SNV_timing_per_segment_ID.txt: purity, ploidy, and the estimated mean, lower, and upper bounds for the timing of the tumor’s MRCA and ECA (if detected).
MRCA_densities_ID.txt: per-segment mutation time, differentiated by minor and major copy number (A and B). For each segment,

Column Name	Definition
chrom	the chromosome
TCN	the total copy number
A	copy number of the major allele
B	copy number of the minor allele
Seglength	the segment length
n_mut_A	the normalized number of mutations per copy with a multiplicity of A
n_mut_B	the normalized number of mutations per copy with a multiplicity of B
n_mut_total_clonal	the normalized number of mutations per copy
density_total_mean	mean mutation densities (1/Mb) per copy
density_total_lower	lower limit of mutation densities (1/Mb) per copy according to a 95% confidence interval
density_total_upper	upper limit of mutation densities (1/Mb) per copy according to a 95% confidence interval
density_A_mean	mean mutation densities (1/Mb) per copy with a multiplicity of A
density_A_lower	lower limit of mutation densities (1/Mb) per copy with a multiplicity of A according to a 95% confidence interval
density_A_upper	upper limit of mutation densities (1/Mb) per copy with a multiplicity of B according to a 95% confidence interval
density_B_mean	mean mutation densities (1/Mb) per copy with a multiplicity of B
density_B_lower	lower limit of mutation densities (1/Mb) per copy with a multiplicity of B according to a 95% confidence interval
density_B_upper	upper limit of mutation densities (1/Mb) per copy with a multiplicity of B according to a 95% confidence interval
p_total_to_mrca	probability that the mutation density per copy agrees with the mean mutation density at the MRCA
p_A_to_mrca	probability that the mutation density per copy with a multiplicity of A agrees with the mean mutation density at the MRCA
p_B_to_mrca	probability that the mutation density per copy with a multiplicity of B agrees with the mean mutation density at the MRCA
p_adj_total_to_mrca	Holm-corrected probability that the mutation density per copy agrees with the mean mutation density at the MRCA
p_adj_A_to_mrca	Holm-corrected probability that the mutation density per copy with a multiplicity of A agrees with the mean mutation density at the MRCA
p_adj_B_to_mrca	Holm-corrected probability that the mutation density per copy with a multiplicity of B agrees with the mean mutation density at the MRCA
MRCA_qual	Quality control - PASS if `p_adj_to_mrca >= 0.01`
p_A_to_eca	probability that the mutation density per copy with a multiplicity of A agrees with the mean mutation density at the ECA
p_B_to_eca	probability that the mutation density per copy with a multiplicity of B agrees with the mean mutation density at the ECA
p_adj_A_to_eca	Holm-corrected probability that the mutation density per copy with a multiplicity of A agrees with the mean mutation density at the ECA
p_adj_B_to_eca	Holm-corrected probability that the mutation density per copy with a multiplicity of B agrees with the mean mutation density at the ECA
A_time	Time point at which the copy number change of the major allele occurred (can be `ECA`, `MRCA`, or, if `A=1`, NA)
B_time	Time point at which the copy number change of the major allele occurred (can be `ECA`, `MRCA`, or, if `B=1`, NA)

Example

snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51)
cl_muts <- LACHESIS::clonalMutationCounter(nb)
norm_muts <- LACHESIS::normalizeCounts(cl_muts)
mrca <- LACHESIS::MRCA(norm_muts)
head(mrca, n = 1)

##     chrom   TCN     A     B Seglength  Start       End n_mut_A n_mut_B
##    <char> <num> <num> <num>     <int>  <int>     <int>   <num>   <num>
## 1:      1     2     2     0 115964914 840009 120529963       0       0
##    n_mut_total_clonal n_mut_total_subclonal n_mut_total n_mut_firstpeak
##                 <num>                 <num>       <int>           <num>
## 1:                 16                    40          72              32
##         p_sc      p_lc  p_ec   p_c density_total_mean density_A_mean
##        <num>     <num> <num> <num>              <num>          <num>
## 1: 0.5555556 0.4444444     0     0          0.1379728              0
##    density_B_mean density_total_lower density_total_upper density_A_lower
##             <num>               <num>               <num>           <num>
## 1:              0          0.07886336           0.2240591               0
##    density_A_upper density_B_lower density_B_upper p_total_to_mrca  p_A_to_mrca
##              <num>           <num>           <num>           <num>        <num>
## 1:      0.03181031               0      0.03181031      0.02762894 9.327933e-12
##    p_B_to_mrca p_adj_total_to_mrca p_adj_A_to_mrca p_adj_B_to_mrca MRCA_qual
##         <lgcl>               <num>           <num>           <num>    <char>
## 1:          NA           0.6354657    3.078218e-10              NA      PASS
##    p_A_to_eca p_B_to_eca p_adj_A_to_eca p_adj_B_to_eca A_time B_time
##         <num>      <num>          <num>          <num> <char> <char>
## 1:  0.4999709         NA              1             NA    ECA   <NA>

Plotting Mutation Densities (plotMutationalDensities)

This function visualizes the results of the MRCA analysis. The top plot presents histograms of mean mutation densities, while the bottom plot illustrates a timeline of early tumor evolution. This timeline highlights mutation densities (mean values and 95% confidence intervals) associated with individual chromosomal gains. Additionally, the mutation densities at the ECA and MRCA are depicted. Optionally, the real-time estimation can be calculated based on a user-specified mutation rate (SNVs/day). Details on these rates are available in the literature. For neuroblastoma, a default value of 3.2 SNVs/ day is set according to calculations performed by Körber et al. (Nature 2023).

Input

plotMutationDensities(mrcaObj = NULL, samp.name = NULL, min.seg.size = 10^7, mut.col.zero = "#4FB12B", mut.col.multi = "#176A02", mut.border = NULL, mut.show.density = TRUE, mut.breaks = NULL, mut.xaxis = NULL, mut.show.realtime = FALSE, mut.snv.rate = 3.2, output.file = NULL)

Example

snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51)
cl_muts <- LACHESIS::clonalMutationCounter(nb)
norm_muts <- LACHESIS::normalizeCounts(cl_muts)
mrca <- LACHESIS::MRCA(norm_muts)
LACHESIS::plotMutationDensities(mrca)

*top plots* - Distribution of non-amplified and amplified clonal mutation densities across segments, *bottom plot* - Evolutionary timeline showing mutation densities at chromosomal gains (mean and 95% CI) and at the tumor's ECA and MRCA (mean and 95% CI of)

Figure 5: top plots - Distribution of non-amplified and amplified clonal mutation densities across segments, bottom plot - Evolutionary timeline showing mutation densities at chromosomal gains (mean and 95% CI) and at the tumor’s ECA and MRCA (mean and 95% CI of)

The single-copy SNVs place the MRCA at approximately 0.22 SNVs per Mb. The ECA, defined by multi-copy SNVs, is positioned before the MRCA, close to 0 SNVs per Mb and most of the chromosomal gains are asscoiated with it. This pattern is typical in high-risk neuroblastoma, where aneuploidy occurs during the first trimester, followed by the emergence of the MRCA around birth or during the first year of life.

Estimating Clonality per SNV (estimateClonality)

This function determines the clonality status of each SNV, classifying them as clonal or subclonal. For clonal variants, the function further infers whether the mutation arose before or after a copy number alteration (pre-CNV or post-CNV) if possible.

Input

estimateClonality(nbObj = NULL, mrcaObj = NULL, ID = NULL, purity = NULL)

Example

snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51)
cl_muts <- LACHESIS::clonalMutationCounter(nb)
norm_muts <- LACHESIS::normalizeCounts(cl_muts)
mrca <- LACHESIS::MRCA(norm_muts)
LACHESIS::estimateClonality(nbObj = nb, mrcaObj = mrca, ID = "NBE15", purity = 1)

## Key: <chrom, snv_start, ref, alt>
##        chrom snv_start    ref    alt Sample   TCN     A     B  cn_start
##       <char>     <int> <char> <char> <char> <num> <num> <num>     <int>
##    1:      1   5474212      C      T  NBE15     2     2     0    840009
##    2:      1   5490411      G      A  NBE15     2     2     0    840009
##    3:      1   5686838      A      G  NBE15     2     2     0    840009
##    4:      1   6151570      A      C  NBE15     2     2     0    840009
##    5:      1   6168077      T      G  NBE15     2     2     0    840009
##   ---                                                                  
## 2414:      9 134113112      G      A  NBE15     3     2     1 108920026
## 2415:      9 137314314      T      C  NBE15     3     2     1 108920026
## 2416:      9 138907717      G      A  NBE15     3     2     1 108920026
## 2417:      9 139407460      A      C  NBE15     3     2     1 108920026
## 2418:      9 140067532      G      A  NBE15     3     2     1 108920026
##          cn_end      t_vaf Signature A_time B_time Clonality known_driver_gene
##           <int>      <num>    <char> <char> <char>    <char>            <char>
##    1:  12839977 0.24074074      <NA>    ECA   <NA>        SC              <NA>
##    2:  12839977 0.09090909      <NA>    ECA   <NA>        SC              <NA>
##    3:  12839977 0.47222222      <NA>    ECA   <NA>   Postcnv              <NA>
##    4:  12839977 0.11538462      <NA>    ECA   <NA>        SC              <NA>
##    5:  12839977 0.15000000      <NA>    ECA   <NA>        SC              <NA>
##   ---                                                                         
## 2414: 141019791 0.08035714      <NA>   MRCA   <NA>        SC              <NA>
## 2415: 141019791 0.09523810      <NA>   MRCA   <NA>        SC              <NA>
## 2416: 141019791 0.31858407      <NA>   MRCA   <NA>         C              <NA>
## 2417: 141019791 0.05617978      <NA>   MRCA   <NA>        SC              <NA>
## 2418: 141019791 0.32183908      <NA>   MRCA   <NA>         C              <NA>

Plotting Clonality per SNV (plotClonality)

This function visualizes the output from estimateClonality, providing an overview of clonal and subclonal SNVs across the chromosomal segments. If a mutational signature file is supplied, the function further stratifies SNV clonality by SBS signature within each chromosome for detailed analysis.

Input

plotClonality(snvClonality = snvClonality, nb = NULL, sig.assign = FALSE, output.file = NULL)

Example

snvs <- system.file("extdata", "NBE15", "snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
s_data <- LACHESIS::readVCF(vcf = snvs, vcf.source = "dkfz")
aceseq_cn <- system.file("extdata", "NBE15", "NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS")
c_data <- LACHESIS::readCNV(aceseq_cn)
nb <- LACHESIS::nbImport(cnv = c_data, snv = s_data, purity = 1, ploidy = 2.51, ID = "NBE15")
cl_muts <- LACHESIS::clonalMutationCounter(nb)
norm_muts <- LACHESIS::normalizeCounts(cl_muts)
mrca <- LACHESIS::MRCA(norm_muts)
snvClonality <- LACHESIS::estimateClonality(nbObj = nb, mrcaObj = mrca, ID = "NBE15", purity = 1)
LACHESIS::plotClonality(snvClonality = snvClonality, nb = nb, sig.assign = FALSE)

This graph provides a first overview of whether SNVs occurred before or after copy number gains. A more detailed analysis of specific SNVs—such as known driver mutations—and their timing relative to the gains can be performed using the per-SNV table output. In this case, we observe a large number of SNVs occurring after the gain on chromosome 7, potentially triggered by this gain, whereas the small number of preceding mutations on chromosome 1 may provide insights into the initiation of the chromosome 1 gain.

LACHESIS Wrapper Function

This function calls all of the previously mentioned functions, and saves the results to the specified output directory. In addition, for every patient a datatable with mean, lower and upper bounds of mutation density at MRCA and ECA (if detected) is generated. These data tables will be combined into a single table, enabling comprehensive cohort analysis. Input can be specified as vectors or in a tab-delimited sample-specification file

Input

LACHESIS(
    input.files = NULL, ids = NULL, vcf.tumor.ids = NULL, cnv.files = NULL, snv.files = NULL,
    vcf.source = NULL, purity = NULL, ploidy = NULL, cnv.chr.col = NULL, cnv.start.col = NULL,
    cnv.end.col = NULL, cnv.A.col = NULL, cnv.B.col = NULL, cnv.tcn.col = NULL, age = NULL,
    OS.time = NULL, OS = NULL, EFS.time = NULL, EFS = NULL, output.dir = NULL,
    ignore.XY = TRUE, min.cn = 1, max.cn = 4, merge.tolerance = 10^5, min.vaf = 0.01,
    min.depth = 30, vcf.info.af = "AF", vcf.info.dp = "DP", min.seg.size = 10^7, fp.mean = 0,
    fp.sd = 0, excl.chr = NULL, ref_build = "hg19", ...
)

Output

A data.table containing the following information:

Column Name	Definition
Sample_ID	the tumor sample ID
MRCA_time_mean	lower and upper bounds of mutation density at MRCA as calculated in MRCA function
MRCA_time_lower/ MRCA_time_upper	mean mutation density at MRCA as calculated in MRCA function
ECA_time_mean	mean mutation density at ECA as calculated in MRCA function
ECA_time_lower/ ECA_time_upper	lower and upper bounds of mutation density at ECA as calculated in MRCA function
Ploidy	user-specified average copy number in the tumor sample
Purity	user-specified tumor cell content
Age	user-specified age at diagnosis
OS.time	user-specified overall survival time
OS	user-specified overall survival indicator variable
EFS.time	user-specified eventfree survival time
EFS	user-specified eventfree survival indicator variable

Example with vectors

strelka_vcf <- system.file("extdata", "strelka2.somatic.snvs.vcf.gz", package = "LACHESIS")
aceseq_cn <- system.file("extdata", "ACESeq/NBE11_comb_pro_extra2.59_0.83.txt", package = "LACHESIS")
lachesis <- LACHESIS::LACHESIS(ids = "NBE11", cnv.files = aceseq_cn, snv.files = strelka_vcf, vcf.source = "strelka", purity = 0.83, ploidy = 2.59)
head(lachesis, n = 3)

## Key: <Sample_ID, MRCA_time_mean, MRCA_time_lower, MRCA_time_upper, ECA_time_mean, ECA_time_lower, ECA_time_upper, Ploidy, Purity>
##    Sample_ID MRCA_time_mean MRCA_time_lower MRCA_time_upper ECA_time_mean
##       <char>          <num>           <num>           <num>         <num>
## 1:     NBE11      0.5206327       0.4839514       0.5706315    0.01397875
##    ECA_time_lower ECA_time_upper Ploidy Purity   Age OS.time    OS EFS.time
##             <num>          <num>  <num>  <num> <num>   <num> <num>    <num>
## 1:     0.01145084     0.01661722   2.59   0.83    NA      NA    NA       NA
##      EFS
##    <num>
## 1:    NA

Example with tab-delimited sample-specification file

lachesis_input <- system.file("extdata", "Sample_template.txt", package = "LACHESIS")

lachesis <- LACHESIS::LACHESIS(input.files = lachesis_input)
head(lachesis, n = 3)

## Key: <Sample_ID, MRCA_time_mean, MRCA_time_lower, MRCA_time_upper, ECA_time_mean, ECA_time_lower, ECA_time_upper, Ploidy, Purity, Age, OS.time, OS, EFS.time, EFS>
##    Sample_ID MRCA_time_mean MRCA_time_lower MRCA_time_upper ECA_time_mean
##       <char>          <num>           <num>           <num>         <num>
## 1:     NBE11    0.450628573     0.423534062       0.4810443   0.012440280
## 2:     NBE15    0.223850144     0.207142800       0.2423009   0.006135603
## 3:     NBE26    0.008700811     0.007140055       0.0110810           NaN
##    ECA_time_lower ECA_time_upper Ploidy Purity   Age OS.time    OS EFS.time
##             <num>          <num>  <num>  <num> <int>   <int> <int>    <int>
## 1:    0.007181583     0.01634677   2.59   0.83  4094    6706     0      240
## 2:    0.001887577     0.01127038   2.51   1.00  2143    2407     0      330
## 3:             NA             NA   3.25   0.88    50    5117     0     5117
##      EFS
##    <int>
## 1:     1
## 2:     1
## 3:     0

Example with multiple sample and data frame input

nbe11_vcf <- system.file("extdata", "NBE11/snvs_NBE11_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
nbe11_cn <- read.delim(system.file("extdata", "NBE11/NBE11_comb_pro_extra2.59_0.83.txt", package = "LACHESIS"), sep = "\t", header = TRUE)
nbe15_vcf <- system.file("extdata", "NBE15/snvs_NBE15_somatic_snvs_conf_8_to_10.vcf", package = "LACHESIS")
nbe15_cn <- read.delim(system.file("extdata", "NBE15/NBE15_comb_pro_extra2.51_1.txt", package = "LACHESIS"), sep = "\t", header = TRUE)

lachesis <- LACHESIS::LACHESIS(ids = c("NBE11", "NBE15"), cnv.files = list(nbe11_cn, nbe15_cn), snv.files = c(nbe11_vcf, nbe15_vcf), vcf.source = c("dkfz", "dkfz"), purity = c(0.83, 1), ploidy = c(2.59, 2.51), cnv.chr.col = c(1, 1), cnv.start.col = c(2, 2), cnv.end.col = c(3, 3), cnv.A.col = c(34, 34), cnv.B.col = c(35, 35), cnv.tcn.col = c(37, 37))
head(lachesis, n = 3)

## Key: <Sample_ID, MRCA_time_mean, MRCA_time_lower, MRCA_time_upper, ECA_time_mean, ECA_time_lower, ECA_time_upper, Ploidy, Purity>
##    Sample_ID MRCA_time_mean MRCA_time_lower MRCA_time_upper ECA_time_mean
##       <char>          <num>           <num>           <num>         <num>
## 1:     NBE11      0.4506286       0.4244405       0.4826348   0.012440280
## 2:     NBE15      0.2238501       0.2062892       0.2444149   0.006135603
##    ECA_time_lower ECA_time_upper Ploidy Purity   Age OS.time    OS EFS.time
##             <num>          <num>  <num>  <num> <num>   <num> <num>    <num>
## 1:    0.007209882     0.01662725   2.59   0.83    NA      NA    NA       NA
## 2:    0.002325113     0.01137357   2.51   1.00    NA      NA    NA       NA
##      EFS
##    <num>
## 1:    NA
## 2:    NA

Plotting Cohort Analysis (plotLACHESIS)

The function plotLACHESIS plots cumulative distributions of SSNV densities at ECA and MRCA across a cohort.

Input

plotLachesis(lachesis = NULL, lach.suppress.outliers = FALSE, lach.log.densities = FALSE, lach.col.multi = "#176A02", lach.border = NULL, binwidth = NULL, lach.col.zero = "#4FB12B", output.file = NULL)

Example

lachesis_input <- system.file("extdata", "Sample_template.txt", package = "LACHESIS")

lachesis <- LACHESIS::LACHESIS(input.files = lachesis_input)
LACHESIS::plotLachesis(lachesis)

Figure 6: Cumulative distribution of SSNV densities in ECA (dark green) and MRCA (light green)

This graph provides a cohort-level overview, illustrating two late-MRCA (SNVs per Mb > 0.05) and one early-MRCA sample. In the late-MRCA cases, an early ECA was detected, indicating a longer evolution rather than a later onset.

Plotting Clinical Correlations (plotClinicalCorrelations)

This function takes SNV densities as input and correlates them with clinical data specified by the user, such as age at diagnosis, survival data etc.

Input

plotClinicalCorrelations(lachesis = NULL, clin.par = "Age", clin.suppress.outliers = FALSE, clin.log.densities = FALSE, output.file = NULL)

Example

input.files <- system.file("extdata", "Sample_template.txt", package = "LACHESIS")
lachesis <- LACHESIS::LACHESIS(input.files = lachesis_input)

LACHESIS::plotClinicalCorrelations(lachesis)

Figure 7: Correlation between age at diagnosis and SNV densities at ECA and MRCA

The cohort size in this case is too small to draw reliable conclusions from these correlation plots. However, in larger neuroblastoma cohorts (Körber et al.), a correlation between patient age and MRCA timing has been observed.

Plotting Survival (plotSurvival)

The function plotSurvival plots correlation between SNV density timing at MRCA with Survival

Input

plotSurvival(lachesis = NULL, mrca.cutpoint = NULL, output.dir = NULL, surv.time = "OS.time", surv.event = "OS", surv.palette = c("dodgerblue", "dodgerblue4"), surv.time.breaks = NULL, surv.time.scale = 1, surv.title = "Survival probability", surv.ylab = "Survival")

Example

Figure 8: Correlation between Survival and SNV densities at ECA and MRCA

The timing of the MRCA serves as a robust predictor of event-free survival. Tumors with an early MRCA are linked to favorable overall survival outcomes, while late-MRCA tumors are more frequently associated with relapse.

Classifying Tumor Evolution (classifyLACHESIS)

The function classifyLACHESIS classifies a tumor’s start of clonal outgrowth during tumorigenesis as “early” or “late” (favorable/ unfavorable prognosis) depending on the mutation density at its MRCA.

Input

classifyLACHESIS(lachesis, mrca.cutpoint = NULL, output.dir = NULL, entity = "neuroblastoma", infer.cutpoint = FALSE, surv.time = "OS.time", surv.event = "OS")

Example

input.files <- system.file("extdata", "Sample_template.txt", package = "LACHESIS")
lachesis <- LACHESIS::LACHESIS(input.files = lachesis_input)

LACHESIS::classifyLACHESIS(lachesis, entity = "neuroblastoma")

## Key: <Sample_ID, MRCA_time_mean, MRCA_time_lower, MRCA_time_upper, ECA_time_mean, ECA_time_lower, ECA_time_upper, Ploidy, Purity, Age, OS.time, OS, EFS.time, EFS>
##    Sample_ID MRCA_time_mean MRCA_time_lower MRCA_time_upper ECA_time_mean
##       <char>          <num>           <num>           <num>         <num>
## 1:     NBE11    0.450628573     0.425641969      0.48222703   0.012440280
## 2:     NBE15    0.223850144     0.206502076      0.24534514   0.006135603
## 3:     NBE26    0.008700811     0.007168898      0.01095501           NaN
##    ECA_time_lower ECA_time_upper Ploidy Purity   Age OS.time    OS EFS.time
##             <num>          <num>  <num>  <num> <int>   <int> <int>    <int>
## 1:    0.007287815     0.01626279   2.59   0.83  4094    6706     0      240
## 2:    0.002424622     0.01115538   2.51   1.00  2143    2407     0      330
## 3:             NA             NA   3.25   0.88    50    5117     0     5117
##      EFS MRCA_timing
##    <int>      <fctr>
## 1:     1        late
## 2:     1        late
## 3:     0       early

Quality control/ Filtering

Hypermutants

To exclude hypermutant cases, the tumor mutational burden (TMB) should be below 10 mut/Mb (Campbell et al).

Ploidy and purity

A minimum tumor purity of 30% is recommended for accurate estimation of variant numbers. Additionally, some pipelines (e.g., AceSeq) may assign a default purity of 1 when it cannot be reliably estimated. Such cases should be excluded from analysis.

To ensure valid results, tumor ploidy should be below the max.cn parameter (e.g., readcnv). LACHESIS default value is max.cn = 4, therefore we recommend excluding samples with ploidy >4.

VAF distribution

The user should verify that the measured clonal peaks align with the expected variant allele frequencies (VAFs), indicated by the dashed lines in the stratified VAF histograms. If the observed VAFs do not match the expected patterns, an alternative purity and ploidy solution if provided by the CNV calling pipeline should be used. Otherwise the sample should not be considered for further analysis.

Positive Example

Figure 9: Clonal peaks (i.e. ->) correlate with expected VAFs (—-)

Negative Example

Figure 10: Clonal peaks cannot be clearly identified, a load of noise (i.e. ->) between the expected VAFs (—-)

Warning Messages

readCNV

see Importing Copy Number Information for details

warning("Removing x segments with start > end")

This warning indicates that there are rows inside the provided cnv file where the end position is less than the start position. Corresponding genomic segments will be removed from the data.

warning("Less than 10% of the genome with valid copy number information.")

The total length of all valid genomic segments in the cnv file is less than 300 million base pairs (3*10^8 bp). The function assumes a default human genome size of 3 × 10⁹ bp.

warning("No *x* identifier provided, assuming *y*")

The user should specify the column names or index for the cnv file (cnv.x.col/ x.col). If no identifiers are provided, the function will attempt to identify the relevant columns using standard nomenclature. Please verify that the expected data for x is located in column y.

warning("Allele information is not provided and will be assumed 1:1 in disomic regions, 2:1 in trisomic regions, 2:2 in tetrasomic regions, ... .")

This warning appears when no major allele column (cnv.A.col/ A.col) is specified or cannot be inferred using standard nomenclature. LACHESIS will assume that the total copy number arose through the minimal number of genomic changes. However, this assumption may lead to incorrect results in some cases. To ensure accuracy, we strongly recommend providing allele-specific copy number information.

nbImport

see Assigning copy number states to single nucleotide variants for details

warning("Removed x variants with no copy number overlaps")

After integrating overlapping snv and cnv data, x variants with empty cnv start column were removed.

MRCA

see Estimating Mutation Densities at ECA/MRCA for details

warning(sum(workObj$MRCA_qual == "FAIL"), " segments did not conform to the mutation density at MRCA.")

LACHESIS

see LACHESIS Wrapper Function for details

warning("No sample name provided for samples x; sample name was set to 1 - y")

This warning message specifies samples where no ID was provided, the total number of missing IDs (y) was calculated and IDs were specified as numeric values (1, 2, 3, …, y).

warning("No CNV/ SNV file provided for sample(s); sample(s) will be excluded")

The cnv/ snv file is missing for the specified sample, therefor it was excluded from the analysis. Please refer to Input Data for details on data requirements.

warning("No output directory specified. Per-sample output will be discarded.")

If the output directory (output.dir) was not specified, LACHESIS Output will not be displayed or saved.

warning("Insufficient data for sample ", x$ID)

This warning appears if the datatable generated at nbImport is empty.

warning("No column identifier  provided for samples x; column name will be inferred")

The parameter vcf.tumor.ids should be specified by the user, it will be passed to readVCF as t.sample, indicating the tumor sample name.

warning("Too few segments to estimate MRCA density for sample x.")

The datatable generated at normalizeCounts only has one row, no reliable estimation of MRCA density is possible.

plotLachesis

see Plotting Cohort Analysis for details

warning("Cannot produce summary statistics for a single case. Returning null.")

This warning messages indicates that the data for a single patient was specified to LACHESIS and therefore cumulative distributions of SSNV densities at ECA and MRCA across a cohort can not be plotted.

plotLachesis, plotClinicalCorrelations, plotSurvival

see Plotting Cohort Analysis, Plotting Clincal Correlations, Plotting Survival for details

warning("Removing x samples with missing MRCA density estimate.")

The column MRCA_time_mean of LACHESIS wrapper function output for the specified samples is empty; these samples will not be considered for further analysis.

warning("No sample with MRCA density estimate provided. Returning zero.")

LACHESIS wrapper function output is empty; no further analysis will be performed

plotSurvival

see Plotting Survival for details

warning("Removing x samples with missing survival time.")

The column named according to parameter surv.time of LACHESIS wrapper function output for specified samples is empty; these samples will not be considered for further analysis.

warning("No survival events in cohort Returning zero.")

The column named according to parameter surv.event of LACHESIS wrapper function is empty; no further analysis will be performed.

How To Get Help

In case of any questions, feel free to open an issue on Github.

Session Info

sessionInfo()

## R Under development (unstable) (2025-10-20 r88955)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bigmemory_4.6.4     Biobase_2.71.0      BiocGenerics_0.57.0
## [4] generics_0.1.4      knitr_1.50          BiocStyle_2.39.0   
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3                jsonlite_2.0.0                   
##   [3] magrittr_2.0.4                    magick_2.9.0                     
##   [5] farver_2.1.2                      rmarkdown_2.30                   
##   [7] BiocIO_1.21.0                     vctrs_0.6.5                      
##   [9] Cairo_1.7-0                       Rsamtools_2.27.0                 
##  [11] RCurl_1.98-1.17                   rstatix_0.7.3                    
##  [13] tinytex_0.57                      htmltools_0.5.8.1                
##  [15] S4Arrays_1.11.0                   curl_7.0.0                       
##  [17] broom_1.0.10                      SparseArray_1.11.1               
##  [19] Formula_1.2-5                     pracma_2.4.6                     
##  [21] sass_0.4.10                       bslib_0.9.0                      
##  [23] plyr_1.8.9                        zoo_1.8-14                       
##  [25] cachem_1.1.0                      uuid_1.2-1                       
##  [27] commonmark_2.0.0                  GenomicAlignments_1.47.0         
##  [29] lifecycle_1.0.4                   iterators_1.0.14                 
##  [31] pkgconfig_2.0.3                   Matrix_1.7-4                     
##  [33] R6_2.6.1                          fastmap_1.2.0                    
##  [35] MatrixGenerics_1.23.0             digest_0.6.37                    
##  [37] colorspace_2.1-2                  S4Vectors_0.49.0                 
##  [39] MutationalPatterns_3.21.0         GenomicRanges_1.63.0             
##  [41] ggpubr_0.6.2                      vegan_2.7-2                      
##  [43] labeling_0.4.3                    km.ci_0.5-6                      
##  [45] httr_1.4.7                        abind_1.4-8                      
##  [47] mgcv_1.9-4                        compiler_4.6.0                   
##  [49] rngtools_1.5.2                    withr_3.0.2                      
##  [51] doParallel_1.0.17                 S7_0.2.0                         
##  [53] backports_1.5.0                   BiocParallel_1.45.0              
##  [55] carData_3.0-5                     R.utils_2.13.0                   
##  [57] ggsignif_0.6.4                    MASS_7.3-65                      
##  [59] DelayedArray_0.37.0               rjson_0.2.23                     
##  [61] BSgenome.Hsapiens.UCSC.hg19_1.4.3 permute_0.9-8                    
##  [63] tools_4.6.0                       ape_5.8-1                        
##  [65] R.oo_1.27.1                       glue_1.8.0                       
##  [67] restfulr_0.0.16                   nlme_3.1-168                     
##  [69] gridtext_0.1.5                    grid_4.6.0                       
##  [71] gridBase_0.4-7                    cluster_2.1.8.1                  
##  [73] reshape2_1.4.4                    memuse_4.2-3                     
##  [75] gtable_0.3.6                      BSgenome_1.79.1                  
##  [77] KMsurv_0.1-6                      R.methodsS3_1.8.2                
##  [79] tidyr_1.3.1                       survminer_0.5.1                  
##  [81] pinfsc50_1.3.0                    data.table_1.17.8                
##  [83] xml2_1.4.1                        car_3.1-3                        
##  [85] XVector_0.51.0                    markdown_2.0                     
##  [87] foreach_1.5.2                     pillar_1.11.1                    
##  [89] stringr_1.6.0                     splines_4.6.0                    
##  [91] ggtext_0.1.2                      dplyr_1.1.4                      
##  [93] lattice_0.22-7                    survival_3.8-3                   
##  [95] rtracklayer_1.71.0                tidyselect_1.2.1                 
##  [97] registry_0.5-1                    Biostrings_2.79.2                
##  [99] bigmemory.sri_0.1.8               gridExtra_2.3                    
## [101] litedown_0.8                      bookdown_0.45                    
## [103] IRanges_2.45.0                    Seqinfo_1.1.0                    
## [105] SummarizedExperiment_1.41.0       stats4_4.6.0                     
## [107] xfun_0.54                         NMF_0.28                         
## [109] matrixStats_1.5.0                 UCSC.utils_1.7.0                 
## [111] stringi_1.8.7                     yaml_2.3.10                      
## [113] evaluate_1.0.5                    codetools_0.2-20                 
## [115] cigarillo_1.1.0                   tibble_3.3.0                     
## [117] BiocManager_1.30.26               cli_3.6.5                        
## [119] xtable_1.8-4                      jquerylib_0.1.4                  
## [121] survMisc_0.5.6                    GenomeInfoDb_1.47.0              
## [123] dichromat_2.0-0.1                 Rcpp_1.1.0                       
## [125] vcfR_1.15.0                       XML_3.99-0.20                    
## [127] parallel_4.6.0                    ggplot2_4.0.0                    
## [129] ggalluvial_0.12.5                 bitops_1.0-9                     
## [131] viridisLite_0.4.2                 scales_1.4.0                     
## [133] LACHESIS_0.99.4                   purrr_1.2.0                      
## [135] crayon_1.5.3                      rlang_1.1.6