Our OmicsMLRepo project aims to improve the AI/ML-readiness of omics datasets available through Bioconductor. One of the main activities under this project is metadata harmonization (e.g., removing redundant information) and standardization (i.e., incorporating ontologies).
So far, we have released harmonized metadata for two Bioconductor data packages: curatedMetagenomicData, which contains human microbiome data, and cBioPortalData, which covers cancer genomics data. OmicsMLRepoR is a software package that lets users easily access the harmonized metadata and leverage ontologies in metadata searches.
The OmicsMLRepoR package provides three major functions:
1. Download the harmonized metadata
2. Browse the harmonized metadata using ontology
3. Manipulate the ‘shape’ of the harmonized metadata
suppressPackageStartupMessages({
    library(OmicsMLRepoR)
    library(dplyr)
    library(curatedMetagenomicData)
    library(cBioPortalData)
})
You can download the harmonized version of the metadata using the getMetadata
function. Currently, two options are available: cMD and cBioPortal.
cmd <- getMetadata("cMD")
cmd
#> # A tibble: 22,588 × 85
#> study_name subject_id sample_id curation_id target_condition
#> * <chr> <chr> <chr> <chr> <chr>
#> 1 AsnicarF_2017 MV_FEI1 MV_FEI1_t1Q14 AsnicarF_2017:MV_FEI… Intestinal Flora
#> 2 AsnicarF_2017 MV_FEI2 MV_FEI2_t1Q14 AsnicarF_2017:MV_FEI… Intestinal Flora
#> 3 AsnicarF_2017 MV_FEI3 MV_FEI3_t1Q14 AsnicarF_2017:MV_FEI… Intestinal Flora
#> 4 AsnicarF_2017 MV_FEI4 MV_FEI4_t1Q14 AsnicarF_2017:MV_FEI… Intestinal Flora
#> 5 AsnicarF_2017 MV_FEI4 MV_FEI4_t2Q15 AsnicarF_2017:MV_FEI… Intestinal Flora
#> 6 AsnicarF_2017 MV_FEI5 MV_FEI5_t1Q14 AsnicarF_2017:MV_FEI… Intestinal Flora
#> 7 AsnicarF_2017 MV_FEI5 MV_FEI5_t2Q14 AsnicarF_2017:MV_FEI… Intestinal Flora
#> 8 AsnicarF_2017 MV_FEI5 MV_FEI5_t3Q15 AsnicarF_2017:MV_FEI… Intestinal Flora
#> 9 AsnicarF_2017 MV_FEM1 MV_FEM1_t1Q14 AsnicarF_2017:MV_FEM… Intestinal Flora
#> 10 AsnicarF_2017 MV_FEM2 MV_FEM2_t1Q14 AsnicarF_2017:MV_FEM… Intestinal Flora
#> # ℹ 22,578 more rows
#> # ℹ 80 more variables: target_condition_ontology_term_id <chr>, control <chr>,
#> # control_ontology_term_id <chr>, country <chr>,
#> # country_ontology_term_id <chr>, body_site <chr>,
#> # body_site_ontology_term_id <chr>, body_site_details <chr>,
#> # body_site_details_ontology_term_id <chr>, age_group <chr>,
#> # age_group_ontology_term_id <chr>, age_max <dbl>, age_min <dbl>, …
cbio <- getMetadata("cBioPortal")
cbio
#> # A tibble: 189,439 × 86
#> studyId patientId sampleId curation_id sample_count age_at_death
#> * <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 acc_tcga TCGA-OR-A5J1 TCGA-OR-A5J1-01 acc_tcga:TCG… 1 NA
#> 2 acc_tcga TCGA-OR-A5J2 TCGA-OR-A5J2-01 acc_tcga:TCG… 1 NA
#> 3 acc_tcga TCGA-OR-A5J3 TCGA-OR-A5J3-01 acc_tcga:TCG… 1 NA
#> 4 acc_tcga TCGA-OR-A5J4 TCGA-OR-A5J4-01 acc_tcga:TCG… 1 NA
#> 5 acc_tcga TCGA-OR-A5J5 TCGA-OR-A5J5-01 acc_tcga:TCG… 1 NA
#> 6 acc_tcga TCGA-OR-A5J6 TCGA-OR-A5J6-01 acc_tcga:TCG… 1 NA
#> 7 acc_tcga TCGA-OR-A5J7 TCGA-OR-A5J7-01 acc_tcga:TCG… 1 NA
#> 8 acc_tcga TCGA-OR-A5J8 TCGA-OR-A5J8-01 acc_tcga:TCG… 1 NA
#> 9 acc_tcga TCGA-OR-A5J9 TCGA-OR-A5J9-01 acc_tcga:TCG… 1 NA
#> 10 acc_tcga TCGA-OR-A5JA TCGA-OR-A5JA-01 acc_tcga:TCG… 1 NA
#> # ℹ 189,429 more rows
#> # ℹ 80 more variables: age_at_death_max <dbl>, age_at_death_min <dbl>,
#> # age_at_diagnosis <dbl>, age_at_diagnosis_max <dbl>,
#> # age_at_diagnosis_min <dbl>, age_at_metastasis <dbl>,
#> # age_at_metastasis_max <dbl>, age_at_metastasis_min <dbl>,
#> # age_at_procurement <dbl>, age_at_procurement_max <dbl>,
#> # age_at_procurement_min <dbl>, age_group <chr>, …
Harmonized metadata can be easily searched with dplyr functions. To fully
leverage the ontologies incorporated in the harmonized metadata and provide a
more robust data browsing experience, the package provides the tree_filter
function. Note that tree_filter can be used on the attributes mapped to
ontology terms:
#> [1] "target_condition" "control"
#> [3] "country" "body_site"
#> [5] "body_site_details" "age_group"
#> [7] "ancestry_details" "ancestry"
#> [9] "disease_details" "disease"
#> [11] "disease_response_recist" "feces_phenotype"
#> [13] "hla" "neonatal_delivery_procedure"
#> [15] "neonatal_preterm_birth" "obgyn_menopause"
#> [17] "obgyn_pregnancy" "probing_pocket_depth"
#> [19] "sex" "smoker"
#> [21] "treatment"
Compared with a typical search of the original metadata from
curatedMetagenomicData, OmicsMLRepoR enables more robust data browsing,
including case-insensitive, synonym, and descendant searches.
Searching for the same information in the original, unharmonized metadata
(sampleMetadata) from the curatedMetagenomicData package is much less
robust:
## Information spread out in two different columns
nrow(sampleMetadata |> filter(study_condition == "CRC"))
#> [1] 701
nrow(sampleMetadata |> filter(disease == "CRC"))
#> [1] 625
## Case sensitive
nrow(sampleMetadata |> filter(study_condition == "CRC"))
#> [1] 701
nrow(sampleMetadata |> filter(study_condition == "crc"))
#> [1] 0
## Synonyms not covered
nrow(sampleMetadata |> filter(study_condition == "Colorectal Carcinoma"))
#> [1] 0
nrow(sampleMetadata |> filter(study_condition == "Colorectal Cancer"))
#> [1] 0
tree_filter is not case-sensitive.
nrow(cmd |> tree_filter(disease, "Colorectal Carcinoma"))
#> [1] 749
nrow(cmd |> tree_filter(disease, "colorectal carcinoma"))
#> [1] 749
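The case-insensitive behaviour shown above can be approximated on a toy table with base R. This is only a minimal sketch under simple assumptions, not how tree_filter is implemented: the data frame `toy`, the `;` delimiter, and the helper `match_ci()` are hypothetical and introduced purely for illustration.

```r
# Toy metadata: `disease` holds one or more terms separated by ";"
toy <- data.frame(
    sample_id = c("s1", "s2", "s3"),
    disease = c("Colorectal Carcinoma;Hypertension", "Healthy",
                "colorectal carcinoma"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when any ";"-separated term equals `query`,
# ignoring case and surrounding whitespace
match_ci <- function(values, query, delim = ";") {
    vapply(strsplit(values, delim, fixed = TRUE), function(terms) {
        any(tolower(trimws(terms)) == tolower(query))
    }, logical(1))
}

# The same rows are returned regardless of the capitalization of the query
hits_title <- toy[match_ci(toy$disease, "Colorectal Carcinoma"), ]
hits_lower <- toy[match_ci(toy$disease, "colorectal carcinoma"), ]
nrow(hits_title)
nrow(hits_lower)
```

Lower-casing both sides before comparing is the simplest way to get this behaviour; an exact `==` filter, as in the sampleMetadata examples, would miss the third row.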
tree_filter also searches the synonyms of the queried terms.
syn_res1 <- cmd |> tree_filter(disease, "CRC")
syn_res2 <- cmd |> tree_filter(disease, "Colorectal Cancer")
syn_res3 <- cmd |> tree_filter(disease, "Colorectal Carcinoma")
nrow(syn_res1)
#> [1] 701
nrow(syn_res2)
#> [1] 1060
nrow(syn_res3)
#> [1] 749
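Compare the unique disease values returned by each query; related labels can resolve to different ontology terms, so the results are not necessarily identical.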
unique(syn_res1$disease)
#> [1] "Colorectal Carcinoma;Hepatic Steatosis;Hypertension;Carcinoma"
#> [2] "Colorectal Carcinoma;Carcinoma"
#> [3] "Colorectal Carcinoma;Type 2 Diabetes Mellitus;Hepatic Steatosis;Hypertension;Carcinoma"
#> [4] "Colorectal Carcinoma;Hypertension;Carcinoma"
#> [5] "Colorectal Carcinoma;Hepatic Steatosis;Carcinoma"
#> [6] "Colorectal Carcinoma;Type 2 Diabetes Mellitus;Hypertension;Carcinoma"
#> [7] "Colorectal Carcinoma;Adenocarcinoma"
#> [8] "Colorectal Carcinoma"
#> [9] "Colorectal Carcinoma;Hypercholesterolemia;Adenocarcinoma"
#> [10] "Colorectal Carcinoma;Hypertension;Adenocarcinoma"
#> [11] "Colorectal Carcinoma;Hypercholesterolemia;Hypertension;Adenocarcinoma"
#> [12] "Colorectal Carcinoma;Metastatic Malignant Neoplasm;Adenocarcinoma"
#> [13] "Colorectal Carcinoma;Type 2 Diabetes Mellitus;Carcinoma"
unique(syn_res2$disease)
#> [1] "Colorectal Carcinoma;Hepatic Steatosis;Hypertension;Carcinoma"
#> [2] "Colorectal Carcinoma;Carcinoma"
#> [3] "Colorectal Carcinoma;Type 2 Diabetes Mellitus;Hepatic Steatosis;Hypertension;Carcinoma"
#> [4] "Colorectal Carcinoma;Hypertension;Carcinoma"
#> [5] "Colorectal Carcinoma;Hepatic Steatosis;Carcinoma"
#> [6] "Colorectal Carcinoma;Type 2 Diabetes Mellitus;Hypertension;Carcinoma"
#> [7] "Melanoma;Metastatic Malignant Neoplasm in the Lung"
#> [8] "Melanoma;Metastatic Malignant Neoplasm in the Lung;Metastatic Malignant Neoplasm in the Lymph Nodes"
#> [9] "Melanoma;Metastatic Malignant Neoplasm in the Lymph Nodes"
#> [10] "Melanoma;Metastatic Malignant Neoplasm in the Lung;Metastatic Malignant Neoplasm in the Liver"
#> [11] "Melanoma;Metastatic Squamous Cell Carcinoma"
#> [12] "Melanoma;Metastatic Malignant Neoplasm in the Lymph Nodes;Metastatic Malignant Neoplasm in the Bone"
#> [13] "Melanoma;Metastatic Malignant Neoplasm in the Liver"
#> [14] "Melanoma;Metastatic Malignant Neoplasm in the Lymph Nodes;Metastatic Squamous Cell Carcinoma"
#> [15] "Melanoma;Metastatic Malignant Neoplasm in the Lung;Metastatic Malignant Neoplasm in the Lymph Nodes;Metastatic Squamous Cell Carcinoma"
#> [16] "Melanoma;Metastatic Squamous Cell Carcinoma;Metastatic Malignant Neoplasm in the Adrenal Gland"
#> [17] "Melanoma;Metastatic Malignant Neoplasm in the Lung;Metastatic Malignant Neoplasm in the Liver;Metastatic Malignant Neoplasm in the Lymph Nodes"
#> [18] "Melanoma;Metastatic Malignant Neoplasm in the Bone"
#> [19] "Melanoma;Metastatic Malignant Neoplasm in the Lung;Metastatic Malignant Neoplasm in the Adrenal Gland"
#> [20] "Melanoma;Metastatic Malignant Neoplasm in the Lung;Metastatic Malignant Neoplasm in the Liver;Metastatic Malignant Neoplasm in the Bone"
#> [21] "Melanoma;Metastatic Malignant Neoplasm"
#> [22] "Colorectal Carcinoma;Adenocarcinoma"
#> [23] "Colorectal Carcinoma"
#> [24] "Melanoma"
#> [25] "Melanoma;Colitis"
#> [26] "Melanoma;Metastatic Malignant Neoplasm;Melanoma Surgery"
#> [27] "Metastatic Malignant Neoplasm"
#> [28] "Colorectal Carcinoma;Hypercholesterolemia;Adenocarcinoma"
#> [29] "Colorectal Carcinoma;Hypertension;Adenocarcinoma"
#> [30] "Hypertension;Metastatic Malignant Neoplasm"
#> [31] "Adenoma;Metastatic Malignant Neoplasm"
#> [32] "Colorectal Carcinoma;Hypercholesterolemia;Hypertension;Adenocarcinoma"
#> [33] "Adenoma;Hypercholesterolemia;Metastatic Malignant Neoplasm"
#> [34] "Colorectal Carcinoma;Metastatic Malignant Neoplasm;Adenocarcinoma"
#> [35] "Carcinoma"
#> [36] "Colorectal Carcinoma;Type 2 Diabetes Mellitus;Carcinoma"
unique(syn_res3$disease)
#> [1] "Colorectal Carcinoma;Hepatic Steatosis;Hypertension;Carcinoma"
#> [2] "Colorectal Carcinoma;Carcinoma"
#> [3] "Colorectal Carcinoma;Type 2 Diabetes Mellitus;Hepatic Steatosis;Hypertension;Carcinoma"
#> [4] "Colorectal Carcinoma;Hypertension;Carcinoma"
#> [5] "Colorectal Carcinoma;Hepatic Steatosis;Carcinoma"
#> [6] "Colorectal Carcinoma;Type 2 Diabetes Mellitus;Hypertension;Carcinoma"
#> [7] "Melanoma;Metastatic Squamous Cell Carcinoma"
#> [8] "Melanoma;Metastatic Malignant Neoplasm in the Lymph Nodes;Metastatic Squamous Cell Carcinoma"
#> [9] "Melanoma;Metastatic Malignant Neoplasm in the Lung;Metastatic Malignant Neoplasm in the Lymph Nodes;Metastatic Squamous Cell Carcinoma"
#> [10] "Melanoma;Metastatic Squamous Cell Carcinoma;Metastatic Malignant Neoplasm in the Adrenal Gland"
#> [11] "Colorectal Carcinoma;Adenocarcinoma"
#> [12] "Colorectal Carcinoma"
#> [13] "Colorectal Carcinoma;Hypercholesterolemia;Adenocarcinoma"
#> [14] "Colorectal Carcinoma;Hypertension;Adenocarcinoma"
#> [15] "Colorectal Carcinoma;Hypercholesterolemia;Hypertension;Adenocarcinoma"
#> [16] "Colorectal Carcinoma;Metastatic Malignant Neoplasm;Adenocarcinoma"
#> [17] "Carcinoma"
#> [18] "Colorectal Carcinoma;Type 2 Diabetes Mellitus;Carcinoma"
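The idea behind synonym search can be sketched with a small lookup table that maps each synonym to one preferred label. This is an illustrative sketch only: the `synonyms` table and the `resolve_term()` helper are hypothetical and are not part of OmicsMLRepoR. A real ontology is less tidy, and related labels may resolve to different terms, which is consistent with the differing row counts above.

```r
# Hypothetical synonym table: lower-cased synonym -> preferred label
synonyms <- c(
    "crc" = "Colorectal Carcinoma",
    "colorectal cancer" = "Colorectal Carcinoma",
    "colorectal carcinoma" = "Colorectal Carcinoma"
)

# Resolve a query to its preferred label; unknown queries are returned as-is
resolve_term <- function(query) {
    key <- tolower(query)
    if (key %in% names(synonyms)) unname(synonyms[key]) else query
}

queries <- c("CRC", "Colorectal Cancer", "Colorectal Carcinoma")
resolved <- vapply(queries, resolve_term, character(1), USE.NAMES = FALSE)
resolved
```

In this toy table all three queries collapse to the same label, so a filter applied after `resolve_term()` would treat them as the same search.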
tree_filter also searches all the descendants of the queried term.
onto_res <- cmd |> tree_filter(disease, "Intestinal Disorder")
unique(onto_res$disease)
#> [1] "Parkinson's Disease"
#> [2] "Schizophrenia"
#> [3] "Type 2 Diabetes Mellitus;Schizophrenia"
#> [4] "Crohn Disease;Schizophrenia"
#> [5] "Rheumatoid Arthritis;Ankylosing Spondylitis"
#> [6] "Alzheimer's Disease;Allergic Rhinitis"
#> [7] "Alzheimer's Disease"
#> [8] "Alzheimer's Disease;Allergic Rhinitis;Asthma"
#> [9] "Alzheimer's Disease;Asthma"
#> [10] "Allergic Rhinitis"
#> [11] "Allergic Rhinitis;Asthma"
#> [12] "Gestational Diabetes;Preeclampsia"
#> [13] "Gestational Diabetes"
#> [14] "Chorioamnionitis"
#> [15] "Preeclampsia"
#> [16] "Diarrhea;Cholera"
#> [17] "Colorectal Carcinoma;Hepatic Steatosis;Hypertension;Carcinoma"
#> [18] "Adenoma;Hepatic Steatosis;Hypertension"
#> [19] "Type 2 Diabetes Mellitus;Hepatic Steatosis;Hypertension"
#> [20] "Adenoma;Hepatic Steatosis"
#> [21] "Colorectal Carcinoma;Carcinoma"
#> [22] "Adenoma"
#> [23] "Colorectal Carcinoma;Type 2 Diabetes Mellitus;Hepatic Steatosis;Hypertension;Carcinoma"
#> [24] "Colorectal Carcinoma;Hypertension;Carcinoma"
#> [25] "Type 2 Diabetes Mellitus;Hypertension"
#> [26] "Type 2 Diabetes Mellitus;Adenoma"
#> [27] "Type 2 Diabetes Mellitus;Adenoma;Hypertension"
#> [28] "Adenoma;Hypertension"
#> [29] "Colorectal Carcinoma;Hepatic Steatosis;Carcinoma"
#> [30] "Colorectal Carcinoma;Type 2 Diabetes Mellitus;Hypertension;Carcinoma"
#> [31] "Type 2 Diabetes Mellitus;Hepatic Steatosis"
#> [32] "Type 2 Diabetes Mellitus;Adenoma;Hepatic Steatosis;Hypertension"
#> [33] "Type 2 Diabetes Mellitus;Adenoma;Hepatic Steatosis"
#> [34] "Melanoma;Metastatic Malignant Neoplasm in the Lung"
#> [35] "Melanoma;Metastatic Malignant Neoplasm in the Lung;Metastatic Malignant Neoplasm in the Lymph Nodes"
#> [36] "Melanoma;Metastatic Malignant Neoplasm in the Lymph Nodes"
#> [37] "Melanoma;Metastatic Malignant Neoplasm in the Lung;Metastatic Malignant Neoplasm in the Liver"
#> [38] "Melanoma;Metastatic Squamous Cell Carcinoma"
#> [39] "Melanoma;Metastatic Malignant Neoplasm in the Lymph Nodes;Metastatic Malignant Neoplasm in the Bone"
#> [40] "Melanoma;Metastatic Malignant Neoplasm in the Liver"
#> [41] "Melanoma;Metastatic Malignant Neoplasm in the Lymph Nodes;Metastatic Squamous Cell Carcinoma"
#> [42] "Melanoma;Metastatic Malignant Neoplasm in the Lung;Metastatic Malignant Neoplasm in the Lymph Nodes;Metastatic Squamous Cell Carcinoma"
#> [43] "Melanoma;Metastatic Squamous Cell Carcinoma;Metastatic Malignant Neoplasm in the Adrenal Gland"
#> [44] "Melanoma;Metastatic Malignant Neoplasm in the Lung;Metastatic Malignant Neoplasm in the Liver;Metastatic Malignant Neoplasm in the Lymph Nodes"
#> [45] "Melanoma;Metastatic Malignant Neoplasm in the Bone"
#> [46] "Melanoma;Metastatic Malignant Neoplasm in the Lung;Metastatic Malignant Neoplasm in the Adrenal Gland"
#> [47] "Melanoma;Metastatic Malignant Neoplasm in the Lung;Metastatic Malignant Neoplasm in the Liver;Metastatic Malignant Neoplasm in the Bone"
#> [48] "Healthy;Periodontitis"
#> [49] "Type 2 Diabetes Mellitus;Mucositis"
#> [50] "Type 2 Diabetes Mellitus"
#> [51] "Mucositis"
#> [52] "Mucositis;Periodontitis"
#> [53] "Periodontal Disorder;Periodontitis"
#> [54] "Type 2 Diabetes Mellitus;Periodontal Disorder"
#> [55] "Periodontal Disorder"
#> [56] "Type 2 Diabetes Mellitus;Periodontal Disorder;Periodontitis"
#> [57] "Type 2 Diabetes Mellitus;Periodontitis"
#> [58] "Melanoma;Metastatic Malignant Neoplasm"
#> [59] "Colorectal Carcinoma;Adenocarcinoma"
#> [60] "Inflammatory Bowel Disease;Crohn Disease"
#> [61] "Inflammatory Bowel Disease;Ulcerative Colitis"
#> [62] "Colorectal Carcinoma"
#> [63] "Type 1 Diabetes Mellitus"
#> [64] "Cytomegaloviral Infection;Celiac Disease;Gestational Diabetes"
#> [65] "Type 1 Diabetes Mellitus;Celiac Disease;Irritable Bowel Syndrome"
#> [66] "Hepatitis"
#> [67] "Glucose Intolerance"
#> [68] "Glucose Intolerance;Upper Respiratory Tract Infection"
#> [69] "Upper Respiratory Tract Infection"
#> [70] "Type 2 Diabetes Mellitus;Upper Respiratory Tract Infection"
#> [71] "Clostridium difficile Infection"
#> [72] "Clostridium difficile Infection;Fecal Microbiota Transplantation"
#> [73] "Bacterial Infection"
#> [74] "Bacterial Infection;Fecal Microbiota Transplantation"
#> [75] "Inflammatory Bowel Disease"
#> [76] "Inflammatory Bowel Disease;Fecal Microbiota Transplantation"
#> [77] "Atherosclerosis"
#> [78] "Healthy;Type 1 Diabetes Mellitus"
#> [79] "Melanoma"
#> [80] "Melanoma;Colitis"
#> [81] "Inflammatory Bowel Disease;Anorectal Fistula;Crohn Disease"
#> [82] "metabolic syndrome"
#> [83] "metabolic syndrome;Fecal Microbiota Transplantation"
#> [84] "Escherichia coli Infection"
#> [85] "Cirrhosis"
#> [86] "Glucose Intolerance;Metabolic Syndrome"
#> [87] "Metabolic Syndrome"
#> [88] "Coronary Artery Disease"
#> [89] "Coronary Artery Disease;Type 2 Diabetes Mellitus"
#> [90] "Type 2 Diabetes Mellitus;Heart Failure"
#> [91] "Heart Failure;Type 2 Diabetes Mellitus"
#> [92] "Heart Failure;Coronary Artery Disease"
#> [93] "Heart Failure;Coronary Artery Disease;Type 2 Diabetes Mellitus"
#> [94] "Chronic Fatigue Syndrome"
#> [95] "Melanoma;Metastatic Malignant Neoplasm;Melanoma Surgery"
#> [96] "Cirrhosis;Hepatitis"
#> [97] "Ascites;Cirrhosis;Hepatitis"
#> [98] "Ascites;Cirrhosis;Parasite"
#> [99] "Ascites;Cirrhosis;Hepatitis;Parasite"
#> [100] "Ascites;Cirrhosis"
#> [101] "Hepatitis;Cirrhosis;Ascites"
#> [102] "Ascites;Cirrhosis;Hepatolenticular Degeneration"
#> [103] "Ascites;Cirrhosis;Hepatitis;Hepatolenticular Degeneration"
#> [104] "Cirrhosis;Ascites"
#> [105] "Periodontitis"
#> [106] "Periodontitis;Periodontal Scaling and Root Planing"
#> [107] "Psoriasis"
#> [108] "Arthritis;Psoriasis"
#> [109] "Arthritis"
#> [110] "Metastatic Malignant Neoplasm"
#> [111] "Colorectal Carcinoma;Hypercholesterolemia;Adenocarcinoma"
#> [112] "Adenoma;Hypercholesterolemia"
#> [113] "Colorectal Carcinoma;Hypertension;Adenocarcinoma"
#> [114] "Hypertension;Metastatic Malignant Neoplasm"
#> [115] "Adenoma;Metastatic Malignant Neoplasm"
#> [116] "Colorectal Carcinoma;Hypercholesterolemia;Hypertension;Adenocarcinoma"
#> [117] "Adenoma;Hypercholesterolemia;Metastatic Malignant Neoplasm"
#> [118] "Colorectal Carcinoma;Metastatic Malignant Neoplasm;Adenocarcinoma"
#> [119] "bronchitis"
#> [120] "Otitis Media"
#> [121] "Tonsillitis"
#> [122] "Stomatitis"
#> [123] "Gastroenteritis"
#> [124] "Salmonellosis"
#> [125] "Skin Infection"
#> [126] "Pneumonia"
#> [127] "Sepsis"
#> [128] "Cystitis"
#> [129] "Pyelonephritis"
#> [130] "Inflammatory Bowel Disease;Colitis"
#> [131] "Abdominal Hernia"
#> [132] "Cellulitis"
#> [133] "Osteoarthritis"
#> [134] "Clostridium difficile Infection;Pneumonia"
#> [135] "Clostridium difficile Infection;Cellulitis"
#> [136] "Clostridium difficile Infection;Osteoarthritis"
#> [137] "Clostridium difficile Infection;Ureteric Stone"
#> [138] "Diabetes Mellitus"
#> [139] "Migraine;Asthma"
#> [140] "Asthma"
#> [141] "Migraine"
#> [142] "Migraine;Diabetes Mellitus"
#> [143] "Carcinoma"
#> [144] "Polyps"
#> [145] "Behcet Syndrome"
#> [146] "Colorectal Carcinoma;Type 2 Diabetes Mellitus;Carcinoma"
#> [147] "Adenoma;Small Intestinal Adenoma"
#> [148] "Adenoma;Colorectal Adenoma"
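Descendant search can be pictured as walking an ontology tree from the queried term downward and matching any term found along the way. The sketch below uses a made-up parent-to-children map; the hierarchy, the `children` list, and `get_descendants()` are assumptions for illustration and do not reflect the ontology or code used by tree_filter.

```r
# Hypothetical parent -> children map standing in for an ontology
children <- list(
    "Intestinal Disorder" = c("Inflammatory Bowel Disease",
                              "Colorectal Carcinoma"),
    "Inflammatory Bowel Disease" = c("Crohn Disease", "Ulcerative Colitis")
)

# Collect a term together with all of its descendants (depth-first)
get_descendants <- function(term) {
    kids <- children[[term]]
    if (is.null(kids)) return(term)
    c(term, unlist(lapply(kids, get_descendants)))
}

terms <- get_descendants("Intestinal Disorder")

# Toy metadata with ";"-separated disease values
toy <- data.frame(
    sample_id = c("s1", "s2", "s3"),
    disease = c("Crohn Disease", "Asthma",
                "Colorectal Carcinoma;Hypertension"),
    stringsAsFactors = FALSE
)

# Keep rows where any listed disease is the query term or a descendant
keep <- vapply(strsplit(toy$disease, ";", fixed = TRUE),
               function(x) any(x %in% terms), logical(1))
toy[keep, ]
```

Querying the parent term here retrieves rows annotated only with its children, which is the behaviour an exact string filter cannot provide.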
tree_filter also accepts several query terms at once. For example, you can search for any row whose disease value is related to either “migraine” or “diabetes.”
res_or <- cmd %>% tree_filter(disease, c("migraine", "diabetes"), "OR")
You can also change the default “OR” argument to “AND” or “NOT” to change the filtering logic. “AND” returns rows whose disease value is related to both “migraine” and “diabetes,” and “NOT” returns rows whose disease value is related to neither “migraine” nor “diabetes.”
res_and <- cmd %>% tree_filter(disease, c("migraine", "diabetes"), "AND")
res_not <- cmd %>% tree_filter(disease, c("migraine", "diabetes"), "NOT")
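The three logical modes can be illustrated on a toy table with plain logical vectors. This is a sketch under stated assumptions rather than the package's implementation: the `toy` data and the `has_term()` helper are hypothetical, and a case-insensitive substring match stands in for the ontology lookup.

```r
# Toy metadata with ";"-separated disease values
toy <- data.frame(
    sample_id = c("s1", "s2", "s3", "s4"),
    disease = c("Migraine", "Diabetes Mellitus",
                "Migraine;Diabetes Mellitus", "Asthma"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive substring match as a stand-in
# for an ontology-aware lookup
has_term <- function(values, query) {
    grepl(query, values, ignore.case = TRUE)
}

is_migraine <- has_term(toy$disease, "migraine")
is_diabetes <- has_term(toy$disease, "diabetes")

toy_or  <- toy[is_migraine | is_diabetes, ]     # related to either term
toy_and <- toy[is_migraine & is_diabetes, ]     # related to both terms
toy_not <- toy[!(is_migraine | is_diabetes), ]  # related to neither term

toy_or$sample_id
toy_and$sample_id
toy_not$sample_id
```

Note that “NOT” is the complement of “OR” in this sketch, so the two results together cover every row exactly once.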
You can combine tree_filter and dplyr functions. For example, to keep
all rows with a disease value related to either “migraine” or “diabetes”
and an age_years value under 30:
res_or_below30 <- cmd %>%
    filter(age_years < 30) %>%
    tree_filter(disease, c("migraine", "diabetes"))
Some metadata columns (e.g., biomarker) contain multiple, similar
attributes separated by a specific delimiter (i.e., <;>). Our
harmonization uses this structure because these attributes are related
pieces of information that are often looked up together.
cmd_biomarker <- cmd %>%
    filter(!is.na(biomarker)) %>%
    select(curation_id, biomarker)
wtb <- getWideMetaTb(cmd_biomarker, "biomarker")
head(wtb)
#> # A tibble: 6 × 36
#> curation_id `Adiponectin_in_ug/mL` Alanine_Aminotransfe…¹ `Albumin_in_g/dL`
#> <chr> <chr> <chr> <chr>
#> 1 ChengpingW_20… <NA> 23 57
#> 2 ChengpingW_20… <NA> 13 49.4
#> 3 ChengpingW_20… <NA> 19 50.8
#> 4 ChengpingW_20… <NA> 29 52
#> 5 ChengpingW_20… <NA> 14 47.1
#> 6 ChengpingW_20… <NA> 5 46.1
#> # ℹ abbreviated name: ¹`Alanine_Aminotransferase_in_U/L`
#> # ℹ 32 more variables: `Aspartate_Aminotransferase_in_U/L` <chr>,
#> # `Autoantibody_titer_positive_(finding)` <chr>, `C-peptide_in_ng/ml` <chr>,
#> # `Cholesterol_in_mg/dL` <chr>, `Creatine_in_umol/L` <chr>,
#> # `Creatinine_in_umol/L` <chr>, `Diastolic_Blood_Pressure_in_mm/Hg` <chr>,
#> # `Direct_Bilirubin_in_umol/L` <chr>,
#> # `Erythrocyte_Sedimentation_Rate_in_mm/hr` <chr>, …
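To make the wide reshaping concrete, the sketch below spreads a delimited column into one column per attribute using base R. The encoding is an assumption made for illustration: each cell is taken to hold `name:value` pairs joined by `<;>`, and the attribute names `albumin` and `alt` are invented. The actual storage format and the internals of getWideMetaTb may differ.

```r
# Toy table: ASSUMED encoding of "name:value" pairs joined by "<;>"
toy <- data.frame(
    curation_id = c("s1", "s2"),
    biomarker = c("albumin:57<;>alt:23", "albumin:49.4"),
    stringsAsFactors = FALSE
)

# Split each cell into its "name:value" pairs
pairs <- strsplit(toy$biomarker, "<;>", fixed = TRUE)

# Every attribute name that occurs anywhere in the column
keys <- unique(sub(":.*$", "", unlist(pairs)))

# Build one column per attribute; rows lacking an attribute get NA
wide <- data.frame(curation_id = toy$curation_id, stringsAsFactors = FALSE)
for (k in keys) {
    wide[[k]] <- vapply(pairs, function(p) {
        hit <- p[sub(":.*$", "", p) == k]
        if (length(hit) == 0) NA_character_ else sub("^[^:]*:", "", hit[1])
    }, character(1))
}
wide
```

Values stay as character strings here, matching the `<chr>` columns in the output above; converting them to numbers would be a separate step.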
ltb <- getLongMetaTb(cmd, targetCol = "target_condition")
dim(cmd)
#> [1] 22588 85
dim(ltb)
#> [1] 44096 85
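The growth in row count with an unchanged column count is what one would expect if each delimited value is moved onto its own row. A minimal base R sketch of that idea follows; the `toy` table and the `;` delimiter are assumptions for illustration, and getLongMetaTb itself may handle additional columns and delimiters differently.

```r
# Toy table: the second column may hold several ";"-separated values
toy <- data.frame(
    curation_id = c("study1:s1", "study1:s2"),
    target_condition = c("Asthma;Migraine", "Asthma"),
    stringsAsFactors = FALSE
)

# Split each cell, then repeat the id once per resulting value
parts <- strsplit(toy$target_condition, ";", fixed = TRUE)
long <- data.frame(
    curation_id = rep(toy$curation_id, lengths(parts)),
    target_condition = unlist(parts),
    stringsAsFactors = FALSE
)

dim(toy)
dim(long)
```

The long form makes exact filters such as `target_condition == "Migraine"` straightforward, at the cost of repeating the identifier across rows.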
cmd_dat <- cmd %>%
    tree_filter(col = "disease", "Type 2 Diabetes Mellitus") %>%
    filter(sex == "Female") %>%
    filter(age_group == "Elderly") %>%
    returnSamples("relative_abundance", rownames = "short")
cbio_sub <- cbio %>%
    getLongMetaTb("treatment_name", "<;>") %>%
    filter(treatment_name == "Fluorouracil") %>%
    filter(age_at_diagnosis > 50) %>%
    filter(sex == "Female") %>%
    getShortMetaTb(idCols = "curation_id", targetCol = "treatment_name")
dim(cbio_sub)
#> [1] 55 86
studies <- unique(cbio_sub$studyId)
studies
#> [1] "aml_ohsu_2018" "aml_ohsu_2022" "egc_msk_2017"
A simple for loop can collect samples from multiple studies.
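The looping pattern can be sketched on a toy table as follows. The table `toy_sub` and its study and sample identifiers are invented for illustration, and the step that would download the actual data for each study is deliberately left out because it is not shown here; only the grouping of sample IDs by study is demonstrated.

```r
# Toy stand-in for a filtered metadata table such as cbio_sub
toy_sub <- data.frame(
    studyId = c("study_a", "study_a", "study_b"),
    sampleId = c("a-01", "a-02", "b-01"),
    stringsAsFactors = FALSE
)

toy_studies <- unique(toy_sub$studyId)

# Collect the sample IDs belonging to each study
samples_by_study <- list()
for (study in toy_studies) {
    samples_by_study[[study]] <- toy_sub$sampleId[toy_sub$studyId == study]
}
samples_by_study
```

Each list element can then be used to subset the data object retrieved for the corresponding study.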
sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] cBioPortalData_2.22.3 MultiAssayExperiment_1.36.1
#> [3] AnVIL_1.22.5 AnVILBase_1.4.0
#> [5] curatedMetagenomicData_3.18.0 TreeSummarizedExperiment_2.18.0
#> [7] Biostrings_2.78.0 XVector_0.50.0
#> [9] SingleCellExperiment_1.32.0 SummarizedExperiment_1.40.0
#> [11] Biobase_2.70.0 GenomicRanges_1.62.1
#> [13] Seqinfo_1.0.0 IRanges_2.44.0
#> [15] S4Vectors_0.48.0 BiocGenerics_0.56.0
#> [17] generics_0.1.4 MatrixGenerics_1.22.0
#> [19] matrixStats_1.5.0 dplyr_1.2.0
#> [21] OmicsMLRepoR_1.4.6 BiocStyle_2.38.0
#>
#> loaded via a namespace (and not attached):
#> [1] ggtext_0.1.2 fs_1.6.7
#> [3] bitops_1.0-9 DirichletMultinomial_1.52.0
#> [5] lubridate_1.9.5 httr_1.4.8
#> [7] RColorBrewer_1.1-3 GenomicDataCommons_1.34.1
#> [9] tools_4.5.2 utf8_1.2.6
#> [11] R6_2.6.1 DT_0.34.0
#> [13] vegan_2.7-3 lazyeval_0.2.2
#> [15] mgcv_1.9-4 permute_0.9-10
#> [17] withr_3.0.2 TCGAutils_1.30.2
#> [19] gridExtra_2.3 cli_3.6.5
#> [21] textshaping_1.0.5 formatR_1.14
#> [23] sandwich_3.1-1 slam_0.1-55
#> [25] sass_0.4.10 mvtnorm_1.3-6
#> [27] S7_0.2.1 GCPtools_1.0.0
#> [29] readr_2.2.0 rapiclient_0.1.8
#> [31] Rsamtools_2.26.0 systemfonts_1.3.2
#> [33] yulab.utils_0.2.4 dichromat_2.0-0.1
#> [35] scater_1.38.1 decontam_1.30.0
#> [37] parallelly_1.46.1 readxl_1.4.5
#> [39] fillpattern_1.0.3 RSQLite_2.4.6
#> [41] BiocIO_1.20.0 visNetwork_2.1.4
#> [43] vroom_1.7.0 rbiom_2.2.1
#> [45] Matrix_1.7-4 futile.logger_1.4.9
#> [47] ggbeeswarm_0.7.3 DECIPHER_3.6.0
#> [49] abind_1.4-8 lifecycle_1.0.5
#> [51] multcomp_1.4-30 yaml_2.3.12
#> [53] RaggedExperiment_1.34.0 SparseArray_1.10.9
#> [55] BiocFileCache_3.0.0 grid_4.5.2
#> [57] blob_1.3.0 promises_1.5.0
#> [59] ExperimentHub_3.0.0 crayon_1.5.3
#> [61] miniUI_0.1.2 lattice_0.22-9
#> [63] beachmat_2.26.0 chromote_0.5.1
#> [65] cigarillo_1.0.0 GenomicFeatures_1.62.0
#> [67] KEGGREST_1.50.0 pillar_1.11.1
#> [69] knitr_1.51 rjson_0.2.23
#> [71] estimability_1.5.1 codetools_0.2-20
#> [73] glue_1.8.0 data.table_1.18.2.1
#> [75] vctrs_0.7.1 png_0.1-9
#> [77] treeio_1.34.0 cellranger_1.1.0
#> [79] gtable_0.3.6 cachem_1.1.0
#> [81] xfun_0.57 S4Arrays_1.10.1
#> [83] mime_0.13 coda_0.19-4.1
#> [85] survival_3.8-6 RTCGAToolbox_2.40.0
#> [87] DiagrammeR_1.0.11 bluster_1.20.0
#> [89] TH.data_1.1-5 nlme_3.1-168
#> [91] bit64_4.6.0-1 filelock_1.0.3
#> [93] GenomeInfoDb_1.46.2 data.tree_1.2.0
#> [95] bslib_0.10.0 irlba_2.3.7
#> [97] vipor_0.4.7 otel_0.2.0
#> [99] DBI_1.3.0 processx_3.8.6
#> [101] tidyselect_1.2.1 emmeans_2.0.2
#> [103] bit_4.6.0 compiler_4.5.2
#> [105] curl_7.0.0 rvest_1.0.5
#> [107] httr2_1.2.2 BiocNeighbors_2.4.0
#> [109] xml2_1.5.2 DelayedArray_0.36.0
#> [111] rtracklayer_1.70.1 bookdown_0.46
#> [113] scales_1.4.0 rappdirs_0.3.4
#> [115] stringr_1.6.0 digest_0.6.39
#> [117] rmarkdown_2.30 htmltools_0.5.9
#> [119] pkgconfig_2.0.3 sparseMatrixStats_1.22.0
#> [121] dbplyr_2.5.2 fastmap_1.2.0
#> [123] rlang_1.1.7 htmlwidgets_1.6.4
#> [125] UCSC.utils_1.6.1 shiny_1.13.0
#> [127] DelayedMatrixStats_1.32.0 farver_2.1.2
#> [129] jquerylib_0.1.4 zoo_1.8-15
#> [131] jsonlite_2.0.0 BiocParallel_1.44.0
#> [133] BiocSingular_1.26.1 RCurl_1.98-1.17
#> [135] magrittr_2.0.4 scuttle_1.20.0
#> [137] patchwork_1.3.2 Rcpp_1.1.1
#> [139] ape_5.8-1 ggnewscale_0.5.2
#> [141] viridis_0.6.5 stringi_1.8.7
#> [143] RJSONIO_2.0.0 MASS_7.3-65
#> [145] AnnotationHub_4.0.0 plyr_1.8.9
#> [147] parallel_4.5.2 ggrepel_0.9.8
#> [149] splines_4.5.2 gridtext_0.1.6
#> [151] hms_1.1.4 ps_1.9.1
#> [153] igraph_2.2.2 reshape2_1.4.5
#> [155] ScaledMatrix_1.18.0 futile.options_1.0.1
#> [157] XML_3.99-0.23 BiocVersion_3.22.0
#> [159] evaluate_1.0.5 lambda.r_1.2.4
#> [161] BiocManager_1.30.27 tzdb_0.5.0
#> [163] httpuv_1.6.17 rols_3.6.1
#> [165] tidyr_1.3.2 purrr_1.2.1
#> [167] ggplot2_4.0.2 BiocBaseUtils_1.12.0
#> [169] rsvd_1.0.5 xtable_1.8-8
#> [171] restfulr_0.0.16 tidytree_0.4.7
#> [173] later_1.4.8 viridisLite_0.4.3
#> [175] ragg_1.5.1 tibble_3.3.1
#> [177] websocket_1.4.4 GenomicAlignments_1.46.0
#> [179] memoise_2.0.1 beeswarm_0.4.0
#> [181] AnnotationDbi_1.72.0 cluster_2.1.8.2
#> [183] timechange_0.4.0 mia_1.18.0