---
title: "Differential Network Analysis with multiDEGGs"
author: "Elisabetta Sciacca, Myles Lewis"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: false
    toc_depth: 2
    number_sections: false
vignette: >
  %\VignetteIndexEntry{1. Differential Network Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

The multiDEGGs package performs multi-omic differential network analysis by identifying differential interactions between molecular entities (genes, proteins, miRNAs, or other biomolecules) across the omic datasets provided.

For each omic dataset, a differential network is constructed, where links represent statistically significant differential interactions between entities. These networks are then integrated into a comprehensive visualization using distinct colors to distinguish interactions from different omic layers. This unified visualization allows interactive exploration of cross-omic patterns (e.g., differential interactions present at both transcript and protein level). For each link, users can access differential statistical significance metrics (p-values or adjusted p-values, calculated via robust or traditional linear regression with interaction term), and differential regression plots.

Beyond network visualization and exploration, multiDEGGs extends its utility into predictive modeling applications. The identified differential interactions can be leveraged as engineered features in machine learning pipelines, providing biologically meaningful predictors that capture relational information between molecular entities. The package includes specialized functions for nested cross-validation that ensure proper feature selection and engineering without data leakage, enabling the construction of robust and interpretable predictive models (see `vignette("2. Feature Selection", package = "multiDEGGs")` for details).
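To make the idea of a "differential interaction" more concrete, the following sketch fits a linear regression with an interaction term on simulated data, using base R only. It is an illustration of the statistical concept mentioned above, not the package's internal code; the variable names (`gene_A`, `gene_B`, `group`) and the simulated values are assumptions made for this example.

```r
# Illustrative sketch only: simulated data, not multiDEGGs internals.
set.seed(42)
n <- 60
group <- factor(rep(c("responder", "non_responder"), each = n / 2))
gene_A <- rnorm(n)

# gene_B follows gene_A in responders and is anti-correlated in non-responders
slope <- ifelse(group == "responder", 1, -1)
gene_B <- slope * gene_A + rnorm(n, sd = 0.5)

# The gene_A:group coefficient tests whether the slope differs between groups
fit <- lm(gene_B ~ gene_A * group)
coefs <- summary(fit)$coefficients

# "non_responder" is the reference level (alphabetical order of factor levels)
interaction_p <- coefs["gene_A:groupresponder", "Pr(>|t|)"]
interaction_p
```

A small p-value for the interaction coefficient indicates that the relationship between the two entities changes across groups, which is the kind of link a differential network is meant to capture.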
## Installation

Install from CRAN:
`install.packages("multiDEGGs")`

Install from GitHub:
`devtools::install_github("elisabettasciacca/multiDEGGs")`

## Quick start - Generate Differential Networks

If you are working with human data, you can start the differential analysis with multiDEGGs' built-in default reference network. Users working with other species or requiring custom interaction sets can provide their own biological reference network, as detailed in the last paragraph of this vignette.

Let's start by loading the package and sample data:

```{r load_data}
library(multiDEGGs)
data("synthetic_metadata")
data("synthetic_rnaseqData")
data("synthetic_proteomicData")
data("synthetic_OlinkData")
```

Generate Differential Networks:

```{r}
assayData_list <- list("RNAseq" = synthetic_rnaseqData,
                       "Proteomics" = synthetic_proteomicData,
                       "Olink" = synthetic_OlinkData)

deggs_object <- get_diffNetworks(assayData = assayData_list,
                                 metadata = synthetic_metadata,
                                 category_variable = "response",
                                 regression_method = "lm",
                                 padj_method = "bonferroni",
                                 verbose = FALSE,
                                 show_progressBar = FALSE,
                                 cores = 2)
```

### Key Parameters of `get_diffNetworks`

It's worth explaining some of the important parameters of `get_diffNetworks`:

* `assayData`: accepts either a single normalized matrix/data frame (for single omic differential analysis), or a list of matrices/data frames (for multi-omic scenarios). For multi-omic analysis, it's highly recommended to use a named list of data. If unnamed, sequential names (assayData1, assayData2, etc.) will be assigned to identify each matrix or data frame.
* `metadata`: can also be a named factor vector, with names matching the patient IDs in the column names of the assay data matrices/data frames. In that case, `category_variable` can remain unset (`NULL` by default).
* `category_subset`: this parameter can restrict the analysis to a certain subset of the categories available in the metadata/category vector.
* `regression_method`: set to `"lm"` by default because it is faster and highly recommended in machine learning scenarios, where the function may be called repeatedly. For basic differential analyses, `"rlm"` can also be used and may perform better in some cases.
* `percentile_vector`: by default, molecular targets (genes, proteins, etc.) whose expression level is below the 35th percentile of the entire data matrix are excluded from the analysis. This threshold can be modified by specifying the percentile vector that is internally used for the percolation analysis. For example, to remove only targets below the 25th percentile, set `percentile_vector = seq(0.25, 0.98, by = 0.05)`.
* `padj_method`: the default method is Bonferroni. Storey's q-values are often less conservative, but the `qvalue` package must be installed first.

**NOTE**: Not all patient IDs need to be present across datasets. Different numbers of samples per omic are acceptable. Only IDs that appear in the column names of the assay data will be included in the analysis. Missing IDs will be listed in a message similar to:
`The following samples IDs are missing in Proteomics: PT001, PT005, PT0030`

## Visualization

The `deggs_object` now contains the differential networks for each omic dataset in `assayData_list`. These networks can be integrated into a comprehensive visualization where different colors distinguish links from different omic layers.

```{r, eval=FALSE}
View_diffNetworks(deggs_object)
```

This visualization interface lets you:

1. Navigate the networks associated with each patient category
2. Filter by link significance
3. Search for specific genes inside the network
Thicker links correspond to more significant (smaller) p-values. The direction of the arrows shows the relationship direction reported in the literature; it is not derived from the data. Clicking on a link displays the differential regression plot for that pair.
**NOTE**: For multi-omic scenarios, the data from the first matrix in the list passed to `assayData` will be used for this boxplot.

## List All Differential Interactions

Outside of the interactive environment, the `get_multiOmics_diffNetworks()` function can be used to get a table of all differential interactions, ordered by p-value or adjusted p-value:

```{r, warning=FALSE}
get_multiOmics_diffNetworks(deggs_object, sig_threshold = 0.05)
```
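As a rough illustration of what a significance threshold on adjusted p-values means, the sketch below applies a Bonferroni correction to a handful of made-up p-values with base R's `p.adjust()`, then filters and orders them. The link names and p-values are hypothetical, and this is not a description of how the package computes its tables internally.

```r
# Hypothetical raw p-values for five links (illustrative values only)
raw_p <- c(link1 = 0.0004, link2 = 0.03, link3 = 0.008,
           link4 = 0.2, link5 = 0.001)

# Bonferroni multiplies each p-value by the number of tests (capped at 1)
adj_p <- p.adjust(raw_p, method = "bonferroni")

# Keep links below the threshold and order them by adjusted p-value
sig_threshold <- 0.05
sig_links <- sort(adj_p[adj_p < sig_threshold])
sig_links
```

Here `link2` has a raw p-value below 0.05 but no longer passes the threshold after adjustment, which is why the choice of `padj_method` can noticeably change the size of the resulting table.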
For single omic scenarios, use the `get_sig_deggs()` function:
```{r}
deggs_object_oneOmic <- get_diffNetworks(assayData = synthetic_rnaseqData,
                                         metadata = synthetic_metadata,
                                         category_variable = "response",
                                         regression_method = "lm",
                                         padj_method = "bonferroni",
                                         verbose = FALSE,
                                         show_progressBar = FALSE,
                                         cores = 2)

get_sig_deggs(deggs_object_oneOmic, sig_threshold = 0.05)
```

## Differential Regression Plots

To plot the differential regression fits outside of the interactive environment, use `plot_regressions()`, specifying the omic data to be used and the two targets:

```{r, fig.width = 4.5, fig.height = 4, eval=FALSE}
plot_regressions(deggs_object,
                 assayDataName = "RNAseq",
                 gene_A = "MTOR",
                 gene_B = "AKT2",
                 legend_position = "bottomright")
```

In single omic analyses, the `assayDataName` parameter can remain unset.

## Differential Network Analysis with More Than Two Groups

It's possible to compare differential interactions among more than two categorical groups. All steps described above stay the same; the dropdown menu of the interactive environment will show all available categories.
Regression plots and boxplots likewise show all categories.
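For intuition on how an interaction can be tested across more than two groups, the sketch below compares a model with and without the interaction term on simulated three-group data, using base R only. The group labels, slopes, and the use of a nested-model F-test are assumptions chosen for illustration; the package's own procedure for multi-group comparisons may differ.

```r
# Illustrative sketch only: simulated three-group data, not multiDEGGs internals.
set.seed(1)
n_per_group <- 40
group <- factor(rep(c("A", "B", "C"), each = n_per_group))
gene_A <- rnorm(3 * n_per_group)

# A different gene_A -> gene_B slope in each group
slopes <- c(A = 1, B = 0, C = -1)
gene_B <- unname(slopes[as.character(group)]) * gene_A +
  rnorm(3 * n_per_group, sd = 0.5)

# Compare nested models: does letting the slope vary by group improve the fit?
reduced <- lm(gene_B ~ gene_A + group)
full <- lm(gene_B ~ gene_A * group)
cmp <- anova(reduced, full)

# Single p-value for the overall interaction across the three groups
interaction_p <- cmp[["Pr(>F)"]][2]
interaction_p
```

With three or more categories the interaction involves several coefficients, so a single overall test like this one summarizes whether the relationship differs in at least one group.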