---
title: "Pipeline integration (targets / drake)"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Pipeline integration (targets / drake)}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")

# Skip evaluation of all chunks on CRAN's auto-check farm to fit the
# 10-minute build budget. Locally, on CI, and under devtools::check(),
# NOT_CRAN=true and all chunks evaluate normally. The vignette source
# (which CRAN users see in browseVignettes() / vignette()) is unchanged.
NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true")
knitr::opts_chunk$set(eval = NOT_CRAN)
```

# Pipeline integration

`vennDiagramLab` is library-first and tidyverse-friendly. The `broom`-compatible
methods on `RegionResult` make it straightforward to plug a result into
`targets` / `drake` workflows, or into any pipeline that expects tidy data.

```{r load}
library(vennDiagramLab)
result <- analyze(load_sample("dataset_real_cancer_drivers_4"))
```

## broom methods

Three methods convert a `RegionResult` to a tibble, each at a different level
of aggregation:

* `tidy(result)`: one row per set pair, all five pairwise metrics
* `glance(result)`: one row, headline numbers
* `augment(result)`: one row per item, set-membership flags + region label

```{r broom}
broom::glance(result)
head(broom::tidy(result))
head(broom::augment(result))
```

## Combining with dplyr

To keep only the highly significant pairs, ordered by Jaccard index:

```{r dplyr, eval = NOT_CRAN && requireNamespace("dplyr", quietly = TRUE)}
broom::tidy(result) |>
  dplyr::filter(highly_significant) |>
  dplyr::arrange(dplyr::desc(jaccard)) |>
  dplyr::select(set_a, set_b, intersection, jaccard, p_adjusted)
```

Or to count items per region:

```{r dplyr-augment, eval = NOT_CRAN && requireNamespace("dplyr", quietly = TRUE)}
broom::augment(result) |>
  dplyr::count(region_label, sort = TRUE)
```

## targets pipeline (sketch)

A simple `_targets.R` file:

```{r targets-pipeline, eval = FALSE}
library(targets)

# Make the vennDiagramLab functions available to every target.
tar_option_set(packages = "vennDiagramLab")

list(
  # Load the sample dataset and run the analysis once.
  tar_target(ds, load_sample("dataset_real_cancer_drivers_4")),
  tar_target(result, analyze(ds)),
  # Tidy outputs: pairwise statistics and per-item membership.
  tar_target(stats_df, broom::tidy(result)),
  tar_target(genes_df, broom::augment(result)),
  # Render the diagram, then track the written file as a file target.
  tar_target(venn_svg, render_venn_svg(result)),
  tar_target(
    venn_path,
    {
      writeLines(venn_svg, "venn.svg")
      "venn.svg"
    },
    format = "file"
  )
)
```

Run with `targets::tar_make()`. Each target is cached independently, so
changing only a downstream step (for example, the sort order in a report)
does not re-run the analysis.

## Caching tip

`statistics(result)` recomputes on every call (S4 has no lazy-property
equivalent). If you need it many times, compute it once and reuse the object:

```{r cache}
stats <- statistics(result)
str(stats@jaccard, max.level = 1)
```

Inside a `targets` pipeline, this is a non-issue because
`tar_target(stats, statistics(result))` caches it for you.

## What's next

* `vignette("v05_statistics_deep_dive")`: what the metrics in
  `broom::tidy()` actually mean.
* `vignette("v07_pdf_reports")`: turning a result into a PDF artifact for a
  pipeline.