---
title: "Bootstrap strategies for bigPLSR"
shorttitle: "Bootstrap strategies for bigPLSR"
author:
- name: "Frédéric Bertrand"
  affiliation:
  - Cedric, Cnam, Paris
  email: frederic.bertrand@lecnam.net
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Bootstrap strategies for bigPLSR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup_ops, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/bootstrap-",
  fig.width = 7,
  fig.height = 5,
  dpi = 150,
  message = FALSE,
  warning = FALSE
)
LOCAL <- identical(Sys.getenv("LOCAL"), "TRUE")
set.seed(2025)
```

## Introduction

`bigPLSR` now provides two complementary bootstrap procedures:

* **(X, Y) bootstrap** refits the full regression model on resampled pairs.
* **(X, T) bootstrap** keeps the latent components of the original fit and
  resamples the score structure, delivering fast updates of the regression
  coefficients.

Both approaches expose percentile and BCa confidence intervals, numerical
summaries and plotting helpers. We rely on a small multivariate example to
illustrate the workflow.

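To make the distinction between the two procedures concrete, here is a minimal base R sketch of both resampling schemes with percentile intervals. It is an illustration under stated assumptions, not the `bigPLSR` implementation: the latent weights come from an SVD of the centred predictors as a stand-in for PLS weights, the response is univariate for brevity, and the names `coef_from_scores`, `boot_pairs` and `boot_scores` are hypothetical helpers introduced only for this example.

```r
# Illustrative sketch only: base R stand-ins, not bigPLSR internals.
set.seed(1)
n <- 100; p <- 6
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] + 0.4 * X[, 2] - 0.6 * X[, 3] + rnorm(n, sd = 0.5)
R <- 200   # number of bootstrap replicates
k <- 3     # number of latent components kept

# Assumption: principal directions of centred X stand in for PLS weights.
Xc <- scale(X, center = TRUE, scale = FALSE)
W  <- svd(Xc)$v[, seq_len(k), drop = FALSE]   # p x k weights
Tm <- Xc %*% W                                # n x k scores

# Least-squares fit of y on the scores, mapped back to coefficients on X.
coef_from_scores <- function(scores, y, W) {
  g <- qr.solve(cbind(1, scores), y)[-1]      # drop the intercept
  drop(W %*% g)
}

# (X, Y)-style: resample rows of (X, y) and recompute the weights each time.
boot_pairs <- replicate(R, {
  idx <- sample.int(n, replace = TRUE)
  Xb  <- scale(X[idx, , drop = FALSE], center = TRUE, scale = FALSE)
  Wb  <- svd(Xb)$v[, seq_len(k), drop = FALSE]
  coef_from_scores(Xb %*% Wb, y[idx], Wb)
})

# (X, T)-style: keep W fixed and resample rows of (T, y) only.
boot_scores <- replicate(R, {
  idx <- sample.int(n, replace = TRUE)
  coef_from_scores(Tm[idx, , drop = FALSE], y[idx], W)
})

# Percentile intervals: one column per predictor, rows are 2.5% and 97.5%.
ci_pairs  <- apply(boot_pairs,  1, quantile, probs = c(0.025, 0.975))
ci_scores <- apply(boot_scores, 1, quantile, probs = c(0.025, 0.975))
```

The second loop skips the decomposition entirely, which is where the speed advantage of a score-based bootstrap comes from in this sketch; the price is that variability in the weights themselves is not reflected in `ci_scores`.
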
```{r data}
library(bigPLSR)
n <- 100; p <- 6; m <- 2
X <- matrix(rnorm(n * p), n, p)
eta1 <- X[, 1] + 0.4 * X[, 2] - 0.6 * X[, 3]
eta2 <- -0.5 * X[, 2] + 0.7 * X[, 4] + 0.5 * X[, 5]
Y <- cbind(eta1, eta2) + matrix(rnorm(n * m, sd = 0.5), n, m)
```

## Baseline fit

```{r fit, eval=LOCAL, cache=TRUE}
fit <- pls_fit(X, Y, ncomp = 3, scores = "r")
```

## (X, Y) bootstrap

```{r boot-xy, eval=LOCAL, cache=TRUE}
boot_xy <- pls_bootstrap(X, Y, ncomp = 3, R = 50, type = "xy",
                         parallel = "none", return_scores = TRUE)
head(summarise_pls_bootstrap(boot_xy))
```

A quick visual inspection of the coefficient distributions:

```{r boot-xy-plot, fig.height=4.5, eval=LOCAL, cache=TRUE}
plot_pls_bootstrap_coefficients(boot_xy, variables = colnames(X))
```

## (X, T) bootstrap

The conditional bootstrap operates on the latent score representation extracted
from the baseline fit.

```{r boot-xt, eval=LOCAL, cache=TRUE}
boot_xt <- pls_bootstrap(X, Y, ncomp = 3, R = 50, type = "xt",
                         parallel = "none", return_scores = TRUE)
head(summarise_pls_bootstrap(boot_xt))
```

```{r boot-xt-plot, fig.height=4.5, eval=LOCAL, cache=TRUE}
plot_pls_bootstrap_coefficients(boot_xt, responses = colnames(Y))
```

## Exploring bootstrap scores

When `return_scores = TRUE`, the bootstrap result stores the score matrices for
each replicate. This allows for custom diagnostics such as the dispersion of the
first two latent variables:

```{r boot-score-summary, eval=LOCAL, cache=TRUE}
score_mats <- boot_xt$score_samples
score_means <- sapply(score_mats, function(M) colMeans(M)[1:2])
apply(score_means, 1, summary)
```

You can feed individual score matrices into `plot_pls_individuals()` to overlay
confidence ellipses obtained from the bootstrap draws.

## Parallel execution

Both bootstrap flavours honour the `parallel = "future"` option.
Configure your preferred plan before calling the helper:

```{r boot-parallel, eval=FALSE}
future::plan(future::multisession, workers = 2)
boot_xy_parallel <- pls_bootstrap(X, Y, ncomp = 3, R = 100, type = "xy",
                                  parallel = "future")
future::plan(future::sequential)
```

## Conclusion

Use the two bootstrap strategies to quantify the uncertainty of your PLS models.
The (X, Y) variant mirrors the classic non-parametric bootstrap while the (X, T)
option keeps the latent structure fixed for computational efficiency. The
supplied summaries and plotting helpers provide starting points for more
elaborate diagnostic workflows.