xcms 3.16.1
Package: xcms
Authors: Johannes Rainer
Modified: 2021-10-26 14:51:41
Compiled: Sun Nov 21 08:13:29 2021
In a typical LC-MS-based metabolomics experiment compounds eluting from the
chromatography are first ionized before being measured by mass spectrometry
(MS). During the ionization different (multiple) ions can be generated from the
same compound which all will be measured by MS. In general, the resulting data
is then pre-processed to identify chromatographic peaks in the data and to group
these across samples in the correspondence analysis. The result are distinct
LC-MS features, characterized by their specific m/z and retention time
range. Different ions generated during ionization will be detected as different
features. Compounding aims now at grouping such features presumably
representing signal from the same originating compound to reduce data set
complexity (and to aid in subsequent annotation steps). General MS feature
grouping functionality if defined by the MsFeatures package with
additional functionality being implemented in the xcms package to enable the
compounding of LC-MS data.
This document provides a simple compounding workflow using xcms. Note that the
present functionality does not (yet) aggregate or combine the actual features
per values, but does only define the feature groups (one per compound).
We demonstrate the compounding (feature grouping) functionality on the simple
toy data set used also in the xcms package and provided through
the faahKO package. This data set consists of samples from 4 mice with
knock-out of the fatty acid amide hydrolase (FAAH) and 4 wild type
mice. Pre-processing of this data set is described in detail in the xcms
vignette of the xcms package. Below we load all required packages and the
result from this pre-processing updating also the location of the respective raw
data files on the current machine.
library(xcms)
library(faahKO)
library(MsFeatures)
data("xdata")
## Update the path to the files for the local system
dirname(xdata) <- c(rep(system.file("cdf", "KO", package = "faahKO"), 4),
                    rep(system.file("cdf", "WT", package = "faahKO"), 4))Before performing the feature grouping we inspect the result object. With
featureDefinitions we can extract the results from the correspondence
analysis.
featureDefinitions(xdata)## DataFrame with 225 rows and 11 columns
##           mzmed     mzmin     mzmax     rtmed     rtmin     rtmax    npeaks
##       <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## FT001     200.1     200.1     200.1   2901.63   2880.73   2922.53         2
## FT002     205.0     205.0     205.0   2789.39   2782.30   2795.36         8
## FT003     206.0     206.0     206.0   2788.73   2780.73   2792.86         7
## FT004     207.1     207.1     207.1   2718.12   2713.21   2726.70         7
## FT005     219.1     219.1     219.1   2518.82   2517.40   2520.81         3
## ...         ...       ...       ...       ...       ...       ...       ...
## FT221    591.30     591.3     591.3   3005.03   2992.87   3006.05         5
## FT222    592.15     592.1     592.3   3022.11   2981.91   3107.59         6
## FT223    594.20     594.2     594.2   3418.16   3359.10   3427.90         3
## FT224    595.25     595.2     595.3   3010.15   2992.87   3013.77         6
## FT225    596.20     596.2     596.2   2997.91   2992.87   3002.95         2
##              KO        WT            peakidx  ms_level
##       <numeric> <numeric>             <list> <integer>
## FT001         2         0  287, 679,1577,...         1
## FT002         4         4     47,272,542,...         1
## FT003         3         4     32,259,663,...         1
## FT004         4         3     19,249,525,...         1
## FT005         1         2  639, 788,1376,...         1
## ...         ...       ...                ...       ...
## FT221         2         3    349,684,880,...         1
## FT222         1         3     86,861,862,...         1
## FT223         1         2  604, 985,1543,...         1
## FT224         2         3     67,353,876,...         1
## FT225         0         2  866,1447,1643,...         1Each row in this data frame represents the definition of one feature, with its
average and range of m/z and retention time. Column "peakidx" provides the
index of each chromatographic peak which is assigned to the feature in the
chromPeaks matrix of the result object. The featureValues function allows to
extract feature values, i.e. a matrix with feature abundances, one row per
feature and columns representing the samples of the present data set.
Below we extract the feature values with and without filled-in peak data. Without the gap-filled data only abundances from detected chromatographic peaks are reported. In the gap-filled data, for samples in which no chromatographic peak for a feature was detected, all signal from the m/z - retention time range defined based on the detected chromatographic peaks was integrated.
head(featureValues(xdata, filled = FALSE))##        ko15.CDF  ko16.CDF  ko21.CDF  ko22.CDF  wt15.CDF  wt16.CDF  wt21.CDF
## FT001        NA  506848.9        NA  169955.6        NA        NA        NA
## FT002 1924712.0 1757151.0 1383416.7 1180288.2 2129885.1 1634342.0 1623589.2
## FT003  213659.3  289500.7        NA  178285.7  253825.6  241844.4  240606.0
## FT004  349011.5  451863.7  343897.8  208002.8  364609.8  360908.9        NA
## FT005        NA        NA        NA  107348.5  223951.8        NA        NA
## FT006  286221.4        NA  164009.0  149097.6  255697.7  311296.8  366441.5
##         wt22.CDF
## FT001         NA
## FT002 1354004.93
## FT003  185399.47
## FT004  221937.53
## FT005   84772.92
## FT006  271128.02head(featureValues(xdata, filled = TRUE))##        ko15.CDF   ko16.CDF   ko21.CDF  ko22.CDF  wt15.CDF  wt16.CDF  wt21.CDF
## FT001  159738.1  506848.88  113441.08  169955.6  216096.6  145509.7  230477.9
## FT002 1924712.0 1757150.96 1383416.72 1180288.2 2129885.1 1634342.0 1623589.2
## FT003  213659.3  289500.67  162897.19  178285.7  253825.6  241844.4  240606.0
## FT004  349011.5  451863.66  343897.76  208002.8  364609.8  360908.9  223322.5
## FT005  135978.5   25524.79   71530.84  107348.5  223951.8  134398.9  190203.8
## FT006  286221.4  289908.23  164008.97  149097.6  255697.7  311296.8  366441.5
##         wt22.CDF
## FT001  140551.30
## FT002 1354004.93
## FT003  185399.47
## FT004  221937.53
## FT005   84772.92
## FT006  271128.02In total 225 features have been defined in the present data set, many of which most likely represent signal from different ions (adducts or isotopes) of the same compound. The aim of the grouping functions of are now to define which features most likely come from the same original compound. The feature grouping functions base on the following assumptions/properties of LC-MS data:
The main method to perform the feature grouping is called groupFeatures which
takes an XCMSnExp object (result object from the xcms pre-processing) as
input as well as a parameter object to chose the grouping algorithm and specify
its settings. xcms provides and supports the following grouping approaches:
SimilarRtimeParam: perform an initial grouping based on similar retention
time.AbundanceSimilarityParam: perform a feature grouping based on correlation
of feature abundances (values) across samples.EicSimilarityParam: perform a feature grouping based on correlation of
EICs.Calling groupFeatures on an xcms result object will perform a feature
grouping assigning each feature in the data set to a feature group. These
feature groups are stored as an additional column called "feature_group" in
the featureDefinition data frame of the result object and can be accessed with
the featureGroups function. Any subsequent groupFeature call will
sub-group (refine) the identified feature groups further. It is thus possible
to use a single grouping approach, or to combine multiple of them to generate
the desired feature grouping. While the individual feature grouping algorithms
can be called in any order, it is advisable to use the EicSimilarityParam as
last refinement step, because it is the computationally most expensive one,
especially if applied to a result object without any pre-defined feature groups
or if the feature groups are very large. In the subsequent sections we will
apply the various feature grouping approaches subsequently.
Note also that we perform here a grouping of all defined features, but it would also be possible to just group a subset of interesting features (e.g. features found significant by a statistical analysis of the data set). This is described in the last section of this vignette.
The most intuitive and simple way to group features is based on their retention time. Before we perform this initial grouping we evaluate retention times and m/z of all features in the present data set.
plot(featureDefinitions(xdata)$rtmed, featureDefinitions(xdata)$mzmed,
     xlab = "retention time", ylab = "m/z", main = "features",
     col = "#00000080")
grid()
Figure 1: Plot of retention times and m/z for all features in the data set
Several features with about the same retention time (but different m/z) can be seen, especially at the beginning of the LC. We thus below group features within a retention time window of 10 seconds into feature groups.
xdata <- groupFeatures(xdata, param = SimilarRtimeParam(10))The results from the feature grouping can be accessed with the featureGroups
function. Below we determine the size of each of these feature groups (i.e. how
many features are grouped together).
table(featureGroups(xdata))## 
## FG.001 FG.002 FG.003 FG.004 FG.005 FG.006 FG.007 FG.008 FG.009 FG.010 FG.011 
##      3      3      3      3      2      4      5      6      4      2      5 
## FG.012 FG.013 FG.014 FG.015 FG.016 FG.017 FG.018 FG.019 FG.020 FG.021 FG.022 
##      3      4      3      5      3      3      5      3      3      3      3 
## FG.023 FG.024 FG.025 FG.026 FG.027 FG.028 FG.029 FG.030 FG.031 FG.032 FG.033 
##      3      3      6      3      3      3      3      2      3      3      4 
## FG.034 FG.035 FG.036 FG.037 FG.038 FG.039 FG.040 FG.041 FG.042 FG.043 FG.044 
##      3      2      2      3      2      2      4      2      2      2      3 
## FG.045 FG.046 FG.047 FG.048 FG.049 FG.050 FG.051 FG.052 FG.053 FG.054 FG.055 
##      4      2      3      3      3      2      2      3      4      2      3 
## FG.056 FG.057 FG.058 FG.059 FG.060 FG.061 FG.062 FG.063 FG.064 FG.065 FG.066 
##      2      2      2      3      2      3      2      2      2      3      2 
## FG.067 FG.068 FG.069 FG.070 FG.071 FG.072 FG.073 FG.074 FG.075 FG.076 FG.077 
##      2      3      2      2      2      3      2      2      1      1      1 
## FG.078 FG.079 FG.080 FG.081 FG.082 FG.083 FG.084 
##      1      1      1      1      1      1      1In addition we visualize these feature groups with the plotFeatureGroups
function which shows all features in the m/z - retention time space with grouped
features being connected with a line.
plotFeatureGroups(xdata)
grid()
Figure 2: Feature groups defined with a rt window of 10 seconds
Let’s assume we don’t agree with this feature grouping, also knowing that there
were quite large shifts in retention times between runs. We thus re-perform the
feature grouping based on similar retention time with a larger rt window. Prior
to the groupFeatures call we have however to drop the previously defined
feature groups as otherwise these would be simply refined (i.e. further
subgrouped).
## Remove previous feature grouping results to repeat the rtime-based
## feature grouping with different setting
featureDefinitions(xdata)$feature_group <- NULL
## Repeat the grouping
xdata <- groupFeatures(xdata, SimilarRtimeParam(20))
table(featureGroups(xdata))## 
## FG.001 FG.002 FG.003 FG.004 FG.005 FG.006 FG.007 FG.008 FG.009 FG.010 FG.011 
##      3      3      3      3      2      4      5      6      4      2      5 
## FG.012 FG.013 FG.014 FG.015 FG.016 FG.017 FG.018 FG.019 FG.020 FG.021 FG.022 
##      3      4      3      5      3      3      5      3      3      3      3 
## FG.023 FG.024 FG.025 FG.026 FG.027 FG.028 FG.029 FG.030 FG.031 FG.032 FG.033 
##      3      3      6      3      3      3      4      2      3      3      4 
## FG.034 FG.035 FG.036 FG.037 FG.038 FG.039 FG.040 FG.041 FG.042 FG.043 FG.044 
##      3      2      3      3      2      2      4      3      2      2      4 
## FG.045 FG.046 FG.047 FG.048 FG.049 FG.050 FG.051 FG.052 FG.053 FG.054 FG.055 
##      4      2      3      3      3      2      2      3      4      2      3 
## FG.056 FG.057 FG.058 FG.059 FG.060 FG.061 FG.062 FG.063 FG.064 FG.065 FG.066 
##      3      2      2      3      3      3      2      2      2      3      2 
## FG.067 FG.068 FG.069 FG.070 FG.071 FG.072 FG.073 FG.074 FG.075 FG.076 FG.077 
##      2      3      2      2      3      3      2      2      1      1      1plotFeatureGroups(xdata)
grid()
Figure 3: Feature groups defined with a rt window of 20 seconds
Grouping by similar retention time grouped the in total 225 features into 77 feature groups.
Assuming we are OK with the crude initial feature grouping from the previous
section, we can next refine the feature groups considering also the feature
abundances across samples. We can use the groupFeatures method with an
AbundanceSimilarityParam object. This approach performs a pairwise
correlation between the feature values (abundances; across samples) between all
features of a predefined feature group (such as defined in the previous
section). Features that have a correlation >= threshold are grouped
together. Feature grouping based on this approach works best for features with a
higher variability in their concentration across samples. Parameter subset
allows to restrict the analysis to a subset of samples and allows thus to
e.g. exclude QC sample pools from this correlation as these could artificially
increase the correlation. Other parameters are passed directly to the internal
featureValues call that extracts the feature values on which the correlation
should be performed.
Before performing the grouping we could also evaluate the correlation of features based on their (log2 transformed) abundances across samples with a heatmap.
library(pheatmap)
fvals <- log2(featureValues(xdata, filled = TRUE))
cormat <- cor(t(fvals), use = "pairwise.complete.obs")
ann <- data.frame(fgroup = featureGroups(xdata))
rownames(ann) <- rownames(cormat)
res <- pheatmap(cormat, annotation_row = ann, cluster_rows = TRUE,
                cluster_cols = TRUE)
Figure 4: Correlation of features based on feature abundances
Some large correlations can be observed for several groups of features, but many of them are not within the same feature group that were defined in the previous section (i.e. are not eluting at the same time).
Below we use the groupFeatures with the AbundanceSimilarityParam to group
features with a correlation higher than 0.7 including both detected and
filled-in signal. Whether filled-in or only detected signal should be used in
the correlation analysis should be evaluated from data set to data set. By
specifying transform = log2 we tell the function to log2 transform the
abundance prior to the correlation analysis. See the help page for
groupFeatures with AbundanceSimilarityParam in the xcms package for
details and options.
xdata <- groupFeatures(xdata, AbundanceSimilarityParam(threshold = 0.7,
                                                       transform = log2),
                       filled = TRUE)
table(featureGroups(xdata))## 
## FG.001.001 FG.001.002 FG.002.001 FG.002.002 FG.002.003 FG.003.001 FG.003.002 
##          2          1          1          1          1          2          1 
## FG.004.001 FG.005.001 FG.006.001 FG.006.002 FG.007.001 FG.007.002 FG.007.003 
##          3          2          3          1          2          1          1 
## FG.007.004 FG.008.001 FG.008.002 FG.008.003 FG.008.004 FG.009.001 FG.009.002 
##          1          3          1          1          1          3          1 
## FG.010.001 FG.011.001 FG.011.002 FG.011.003 FG.012.001 FG.013.001 FG.013.002 
##          2          2          2          1          3          2          1 
## FG.013.003 FG.014.001 FG.014.002 FG.015.001 FG.015.002 FG.015.003 FG.015.004 
##          1          2          1          2          1          1          1 
## FG.016.001 FG.016.002 FG.017.001 FG.017.002 FG.018.001 FG.018.002 FG.018.003 
##          2          1          2          1          2          1          1 
## FG.018.004 FG.019.001 FG.019.002 FG.019.003 FG.020.001 FG.020.002 FG.021.001 
##          1          1          1          1          2          1          1 
## FG.021.002 FG.021.003 FG.022.001 FG.022.002 FG.022.003 FG.023.001 FG.023.002 
##          1          1          1          1          1          2          1 
## FG.024.001 FG.024.002 FG.024.003 FG.025.001 FG.025.002 FG.025.003 FG.025.004 
##          1          1          1          2          1          1          1 
## FG.025.005 FG.026.001 FG.026.002 FG.026.003 FG.027.001 FG.027.002 FG.027.003 
##          1          1          1          1          1          1          1 
## FG.028.001 FG.028.002 FG.029.001 FG.029.002 FG.030.001 FG.030.002 FG.031.001 
##          2          1          2          2          1          1          3 
## FG.032.001 FG.032.002 FG.032.003 FG.033.001 FG.033.002 FG.033.003 FG.033.004 
##          1          1          1          1          1          1          1 
## FG.034.001 FG.034.002 FG.034.003 FG.035.001 FG.035.002 FG.036.001 FG.036.002 
##          1          1          1          1          1          2          1 
## FG.037.001 FG.037.002 FG.038.001 FG.039.001 FG.040.001 FG.040.002 FG.041.001 
##          2          1          2          2          2          2          1 
## FG.041.002 FG.041.003 FG.042.001 FG.042.002 FG.043.001 FG.044.001 FG.044.002 
##          1          1          1          1          2          2          1 
## FG.044.003 FG.045.001 FG.045.002 FG.045.003 FG.045.004 FG.046.001 FG.047.001 
##          1          1          1          1          1          2          1 
## FG.047.002 FG.047.003 FG.048.001 FG.048.002 FG.048.003 FG.049.001 FG.049.002 
##          1          1          1          1          1          2          1 
## FG.050.001 FG.050.002 FG.051.001 FG.052.001 FG.052.002 FG.053.001 FG.053.002 
##          1          1          2          2          1          2          1 
## FG.053.003 FG.054.001 FG.054.002 FG.055.001 FG.055.002 FG.055.003 FG.056.001 
##          1          1          1          1          1          1          1 
## FG.056.002 FG.056.003 FG.057.001 FG.058.001 FG.058.002 FG.059.001 FG.059.002 
##          1          1          2          1          1          1          1 
## FG.059.003 FG.060.001 FG.060.002 FG.061.001 FG.061.002 FG.062.001 FG.062.002 
##          1          2          1          2          1          1          1 
## FG.063.001 FG.064.001 FG.064.002 FG.065.001 FG.065.002 FG.066.001 FG.066.002 
##          2          1          1          2          1          1          1 
## FG.067.001 FG.067.002 FG.068.001 FG.069.001 FG.070.001 FG.070.002 FG.071.001 
##          1          1          3          2          1          1          2 
## FG.071.002 FG.072.001 FG.072.002 FG.073.001 FG.073.002 FG.074.001 FG.075.001 
##          1          2          1          1          1          2          1 
## FG.076.001 FG.077.001 
##          1          1Many of the larger retention time-based feature groups have been splitted into
two or more sub-groups based on the correlation of their feature abundances. We
evaluate this for one specific feature group "FG.040" by plotting their
pairwise correlation.
fts <- grep("FG.040", featureGroups(xdata))
pairs(t(fvals[fts, ]), gap = 0.1, main = "FG.040")
Figure 5: Pairwise correlation plot for all features initially grouped into the feature group FG.040
Indeed, correlations can be seen only between some of the features in this retention time feature group, e.g. between FT117 and FT120 and between FT195 and FT200. Note however that this abundance correlation suffers from relatively few samples (8 in total), and a relatively small variance in abundances across these samples.
After feature grouping by abundance correlation, the 225 features have been grouped into 170 feature groups.
The chromatographic peak shape of an ion of a compound should be highly similar
to the elution pattern of this compound. Thus, features from the same compound
are assumed to have similar peak shapes of their EICs within the same
sample. A grouping of features based on similarity of their EICs can be
performed with the groupFeatures and the EicSimilarityParam object. It is
advisable to perform the peak shape correlation only on a subset of samples
(because peak shape correlation is computationally intense and because
chromatographic peaks of low intensity features are notoriously noisy). The
EicSimilarityParam approach has thus the parameter n which allows to select
the number of top samples (ordered by total intensity of feature abundances per
feature group) on which the correlation should be performed. With an value of n = 3, the 3 samples with the highest signal for all features in that group will
be first identified for each feature group and then a pairwise similarity
calculation will be performed within each of these samples. The resulting
similarity score from these 3 samples will then be aggregated into a single
score by taking the 75% quantile across the 3 samples. This value is then
subsequently compared with the cut-off for similarity (parameter threshold)
and features with a score >= threshold are grouped into the same feature
group.
Below we group the features based on similarity of their EICs in the two samples
with the highest total signal for the respective feature groups. By default, a
Pearson correlation coefficient is used as similarity score but any
similarity/distance metric function could be used instead (parameter FUN of
the EicSimilarityParam - see the respective help page ?EicSimilarityParam
for details and options). We define as a threshold a correlation coefficient of
0.7.
xdata <- groupFeatures(xdata, EicSimilarityParam(threshold = 0.7, n = 2))This is the most computationally intense approach since it involves also loading the raw MS data to extract the ion chromatograms for each feature. The results of the grouping are shown below.
table(featureGroups(xdata))## 
## FG.001.001.001 FG.001.002.001 FG.002.001.001 FG.002.002.001 FG.002.003.001 
##              2              1              1              1              1 
## FG.003.001.001 FG.003.002.001 FG.004.001.001 FG.005.001.001 FG.006.001.001 
##              2              1              3              2              3 
## FG.006.002.001 FG.007.001.001 FG.007.002.001 FG.007.003.001 FG.007.004.001 
##              1              2              1              1              1 
## FG.008.001.001 FG.008.001.002 FG.008.002.001 FG.008.003.001 FG.008.004.001 
##              2              1              1              1              1 
## FG.009.001.001 FG.009.002.001 FG.010.001.001 FG.010.001.002 FG.011.001.001 
##              3              1              1              1              2 
## FG.011.002.001 FG.011.003.001 FG.012.001.001 FG.013.001.001 FG.013.002.001 
##              2              1              3              2              1 
## FG.013.003.001 FG.014.001.001 FG.014.002.001 FG.015.001.001 FG.015.002.001 
##              1              2              1              2              1 
## FG.015.003.001 FG.015.004.001 FG.016.001.001 FG.016.002.001 FG.017.001.001 
##              1              1              2              1              2 
## FG.017.002.001 FG.018.001.001 FG.018.002.001 FG.018.003.001 FG.018.004.001 
##              1              2              1              1              1 
## FG.019.001.001 FG.019.002.001 FG.019.003.001 FG.020.001.001 FG.020.001.002 
##              1              1              1              1              1 
## FG.020.002.001 FG.021.001.001 FG.021.002.001 FG.021.003.001 FG.022.001.001 
##              1              1              1              1              1 
## FG.022.002.001 FG.022.003.001 FG.023.001.001 FG.023.002.001 FG.024.001.001 
##              1              1              2              1              1 
## FG.024.002.001 FG.024.003.001 FG.025.001.001 FG.025.001.002 FG.025.002.001 
##              1              1              1              1              1 
## FG.025.003.001 FG.025.004.001 FG.025.005.001 FG.026.001.001 FG.026.002.001 
##              1              1              1              1              1 
## FG.026.003.001 FG.027.001.001 FG.027.002.001 FG.027.003.001 FG.028.001.001 
##              1              1              1              1              2 
## FG.028.002.001 FG.029.001.001 FG.029.002.001 FG.029.002.002 FG.030.001.001 
##              1              2              1              1              1 
## FG.030.002.001 FG.031.001.001 FG.032.001.001 FG.032.002.001 FG.032.003.001 
##              1              3              1              1              1 
## FG.033.001.001 FG.033.002.001 FG.033.003.001 FG.033.004.001 FG.034.001.001 
##              1              1              1              1              1 
## FG.034.002.001 FG.034.003.001 FG.035.001.001 FG.035.002.001 FG.036.001.001 
##              1              1              1              1              1 
## FG.036.001.002 FG.036.002.001 FG.037.001.001 FG.037.002.001 FG.038.001.001 
##              1              1              2              1              2 
## FG.039.001.001 FG.039.001.002 FG.040.001.001 FG.040.002.001 FG.041.001.001 
##              1              1              2              2              1 
## FG.041.002.001 FG.041.003.001 FG.042.001.001 FG.042.002.001 FG.043.001.001 
##              1              1              1              1              2 
## FG.044.001.001 FG.044.001.002 FG.044.002.001 FG.044.003.001 FG.045.001.001 
##              1              1              1              1              1 
## FG.045.002.001 FG.045.003.001 FG.045.004.001 FG.046.001.001 FG.047.001.001 
##              1              1              1              2              1 
## FG.047.002.001 FG.047.003.001 FG.048.001.001 FG.048.002.001 FG.048.003.001 
##              1              1              1              1              1 
## FG.049.001.001 FG.049.002.001 FG.050.001.001 FG.050.002.001 FG.051.001.001 
##              2              1              1              1              2 
## FG.052.001.001 FG.052.002.001 FG.053.001.001 FG.053.001.002 FG.053.002.001 
##              2              1              1              1              1 
## FG.053.003.001 FG.054.001.001 FG.054.002.001 FG.055.001.001 FG.055.002.001 
##              1              1              1              1              1 
## FG.055.003.001 FG.056.001.001 FG.056.002.001 FG.056.003.001 FG.057.001.001 
##              1              1              1              1              2 
## FG.058.001.001 FG.058.002.001 FG.059.001.001 FG.059.002.001 FG.059.003.001 
##              1              1              1              1              1 
## FG.060.001.001 FG.060.001.002 FG.060.002.001 FG.061.001.001 FG.061.002.001 
##              1              1              1              2              1 
## FG.062.001.001 FG.062.002.001 FG.063.001.001 FG.064.001.001 FG.064.002.001 
##              1              1              2              1              1 
## FG.065.001.001 FG.065.002.001 FG.066.001.001 FG.066.002.001 FG.067.001.001 
##              2              1              1              1              1 
## FG.067.002.001 FG.068.001.001 FG.068.001.002 FG.069.001.001 FG.070.001.001 
##              1              2              1              2              1 
## FG.070.002.001 FG.071.001.001 FG.071.001.002 FG.071.002.001 FG.072.001.001 
##              1              1              1              1              1 
## FG.072.001.002 FG.072.002.001 FG.073.001.001 FG.073.002.001 FG.074.001.001 
##              1              1              1              1              2 
## FG.075.001.001 FG.076.001.001 FG.077.001.001 
##              1              1              1In most cases, pre-defined feature groups (by the abundance correlation) were
not further subdivided. Below we evaluate some of the feature groups, starting
with FG.008.001 which was split into two different feature groups based on EIC
correlation. We first extract the EICs for all features from this initial
feature group. With n = 1 we specify to extract the EIC only from the sample
with the highest intensity.
fts <- grep("FG.008.001", featureGroups(xdata))
eics <- featureChromatograms(xdata, features = fts,
                             filled = TRUE, n = 1)Next we plot the EICs using a different color for each of the subgroups. With
peakType = "none" we disable the highlighting of the detected chromatographic
peaks.
cols <- c("#ff000080", "#00ff0080")
names(cols) <- unique(featureGroups(xdata)[fts])
par(mfrow = c(1, 2))
plotChromatogramsOverlay(eics, col = cols[featureGroups(xdata)[fts]],
                         lwd = 2, peakType = "none")
plotChromatogramsOverlay(normalize(eics),
                         col = cols[featureGroups(xdata)[fts]],
                         lwd = 2, peakType = "none")
Figure 6: EICs of features from feature group FG.008.001 in the same sample
Shown are the actual intensities (left) and intensities normalized to 1 (right). Features being part of the same feature group after grouping by EIC similarity are shown in the same color.
One of the features within the original feature group was separated from the other two because of a low similarity of their EICs. In fact, the feature’s EIC is shifted in retention time dimension and can thus not represent the signal from the same compound.
We evaluate next the sub-grouping in another feature group.
fts <- grep("FG.068.001", featureGroups(xdata))
eics <- featureChromatograms(xdata, features = fts,
                             filled = TRUE, n = 1)Next we plot the EICs using a different color for each of the subgroups.
cols <- c("#ff000080", "#00ff0080")
names(cols) <- unique(featureGroups(xdata)[fts])
par(mfrow = c(1, 2))
plotChromatogramsOverlay(eics, col = cols[featureGroups(xdata)[fts]],
                         lwd = 2, peakType = "none")
plotChromatogramsOverlay(normalize(eics),
                         col = cols[featureGroups(xdata)[fts]],
                         lwd = 2, peakType = "none")
Figure 7: EICs for features from feature group FG.068.001 in the same sample
Shown are the actual intensities (left) and intensities normalized to 1 (right). Features being part of the same feature group after grouping by EIC similarity are shown in the same color.
Based on the EIC correlation, the initial feature group FG.068.001 was grouped into two separate sub-groups.
The grouping based on EIC correlation on the pre-defined feature groups from the previous sections grouped the 225 features into 183 feature groups.
In the previous sections we were always considering all features from the data
set, but sometimes it could be desirable to just group a pre-defined set of
features, for example features found to be of particular interest in a certain
experiment (e.g. significant features). This can be easily achieved by assigning
the features of interest to a initial feature group, using NA as group ID
for all other features.
To illustrate this we reset all feature groups by setting them to NA and
assign our features of interest (in this example just 30 randomly selected
features) to an initial feature group "FG".
featureDefinitions(xdata)$feature_group <- NA_character_
set.seed(123)
fts_idx <- sample(1:nrow(featureDefinitions(xdata)), 30)
featureDefinitions(xdata)$feature_group[fts_idx] <- "FG"Any call to groupFeatures would now simply sub-group this set of 30
features. Any feature which has an NA in the "feature_group" column will be
ignored.
xdata <- groupFeatures(xdata, SimilarRtimeParam(diffRt = 20))
xdata <- groupFeatures(xdata, AbundanceSimilarityParam(threshold = 0.7))
table(featureGroups(xdata))## 
## FG.001.001 FG.001.002 FG.002.001 FG.003.001 FG.003.002 FG.004.001 FG.004.002 
##          2          2          4          2          1          1          1 
## FG.004.003 FG.005.001 FG.005.002 FG.006.001 FG.006.002 FG.007.001 FG.008.001 
##          1          1          1          1          1          2          1 
## FG.009.001 FG.010.001 FG.011.001 FG.012.001 FG.013.001 FG.014.001 FG.015.001 
##          1          1          1          1          1          1          1 
## FG.016.001 FG.017.001 
##          1          1sessionInfo()## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] pheatmap_1.0.12     faahKO_1.34.0       MsFeatures_1.2.0   
##  [4] xcms_3.16.1         MSnbase_2.20.2      ProtGenerics_1.26.0
##  [7] S4Vectors_0.32.3    mzR_2.28.0          Rcpp_1.0.7         
## [10] Biobase_2.54.0      BiocGenerics_0.40.0 BiocParallel_1.28.1
## [13] BiocStyle_2.22.0   
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-7                matrixStats_0.61.0         
##  [3] progress_1.2.2              doParallel_1.0.16          
##  [5] RColorBrewer_1.1-2          GenomeInfoDb_1.30.0        
##  [7] tools_4.1.2                 bslib_0.3.1                
##  [9] utf8_1.2.2                  R6_2.5.1                   
## [11] affyio_1.64.0               DBI_1.1.1                  
## [13] colorspace_2.0-2            prettyunits_1.1.1          
## [15] tidyselect_1.1.1            MassSpecWavelet_1.60.0     
## [17] compiler_4.1.2              preprocessCore_1.56.0      
## [19] DelayedArray_0.20.0         bookdown_0.24              
## [21] sass_0.4.0                  scales_1.1.1               
## [23] DEoptimR_1.0-9              robustbase_0.93-9          
## [25] affy_1.72.0                 stringr_1.4.0              
## [27] digest_0.6.28               rmarkdown_2.11             
## [29] XVector_0.34.0              pkgconfig_2.0.3            
## [31] htmltools_0.5.2             MatrixGenerics_1.6.0       
## [33] highr_0.9                   fastmap_1.1.0              
## [35] limma_3.50.0                rlang_0.4.12               
## [37] impute_1.68.0               farver_2.1.0               
## [39] jquerylib_0.1.4             generics_0.1.1             
## [41] jsonlite_1.7.2              mzID_1.32.0                
## [43] dplyr_1.0.7                 RCurl_1.98-1.5             
## [45] magrittr_2.0.1              GenomeInfoDbData_1.2.7     
## [47] MALDIquant_1.20             Matrix_1.3-4               
## [49] munsell_0.5.0               fansi_0.5.0                
## [51] MsCoreUtils_1.6.0           lifecycle_1.0.1            
## [53] vsn_3.62.0                  stringi_1.7.5              
## [55] yaml_2.2.1                  MASS_7.3-54                
## [57] SummarizedExperiment_1.24.0 zlibbioc_1.40.0            
## [59] plyr_1.8.6                  grid_4.1.2                 
## [61] parallel_4.1.2              crayon_1.4.2               
## [63] lattice_0.20-45             hms_1.1.1                  
## [65] magick_2.7.3                knitr_1.36                 
## [67] pillar_1.6.4                GenomicRanges_1.46.1       
## [69] codetools_0.2-18            XML_3.99-0.8               
## [71] glue_1.5.0                  evaluate_0.14              
## [73] pcaMethods_1.86.0           BiocManager_1.30.16        
## [75] vctrs_0.3.8                 foreach_1.5.1              
## [77] gtable_0.3.0                RANN_2.6.1                 
## [79] purrr_0.3.4                 clue_0.3-60                
## [81] assertthat_0.2.1            ggplot2_3.3.5              
## [83] xfun_0.28                   ncdf4_1.17.1               
## [85] tibble_3.1.6                iterators_1.0.13           
## [87] IRanges_2.28.0              cluster_2.1.2              
## [89] ellipsis_0.3.2