---
title: "Bootstrap strategies for bigPLSR"
shorttitle: "Bootstrap strategies for bigPLSR"
author:
- name: "Frédéric Bertrand"
  affiliation:
  - Cedric, Cnam, Paris
  email: frederic.bertrand@lecnam.net
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Bootstrap strategies for bigPLSR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup_ops, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/bootstrap-",
  fig.width = 7,
  fig.height = 5,
  dpi = 150,
  message = FALSE,
  warning = FALSE
)

LOCAL <- identical(Sys.getenv("LOCAL"), "TRUE")
set.seed(2025)
```


## Introduction

`bigPLSR` now provides two complementary bootstrap procedures:

* **(X, Y) bootstrap** refits the full regression model on resampled pairs.
* **(X, T) bootstrap** keeps the latent components of the original fit and
  resamples the score structure, delivering fast updates of the regression
  coefficients.

Both approaches expose percentile and bias-corrected and accelerated (BCa)
confidence intervals, numerical summaries, and plotting helpers.
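
To make the two interval types concrete, the chunk below is a minimal base R
sketch, independent of `bigPLSR`, that builds percentile and BCa intervals for
the slope of a simple regression from an (x, y) pairs bootstrap. The simulated
data, the `slope()` helper and every variable name are assumptions made for
illustration, and the package's own implementation may differ in its details.

```{r sketch-intervals, eval=FALSE}
# Illustrative only: base R, independent of bigPLSR.
set.seed(1)
n_obs <- 60
x <- rnorm(n_obs)
y <- 1 + 0.8 * x + rnorm(n_obs, sd = 0.5)

# Statistic of interest: the slope of y on x for a given set of rows.
slope <- function(idx) cov(x[idx], y[idx]) / var(x[idx])
theta_hat <- slope(seq_len(n_obs))

# Pairs bootstrap: resample rows with replacement and recompute the slope.
n_boot <- 2000
theta_boot <- replicate(n_boot, slope(sample.int(n_obs, replace = TRUE)))

alpha <- 0.05
# Percentile interval: empirical quantiles of the replicates.
ci_percentile <- quantile(theta_boot, c(alpha / 2, 1 - alpha / 2),
                          names = FALSE)

# BCa interval: adjust the quantile levels with a bias correction (z0)
# and an acceleration constant (a) estimated from the jackknife.
z0 <- qnorm(mean(theta_boot < theta_hat))
theta_jack <- vapply(seq_len(n_obs),
                     function(i) slope(seq_len(n_obs)[-i]),
                     numeric(1))
d <- mean(theta_jack) - theta_jack
a <- sum(d^3) / (6 * sum(d^2)^1.5)
z <- qnorm(c(alpha / 2, 1 - alpha / 2))
levels_bca <- pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
ci_bca <- quantile(theta_boot, levels_bca, names = FALSE)

rbind(percentile = ci_percentile, BCa = ci_bca)
```

The two intervals use the same replicates; BCa only changes which quantiles are
read off, so it costs one extra jackknife pass over the data.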

We rely on a small multivariate example to illustrate the workflow.

```{r data}
library(bigPLSR)
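n <- 100; p <- 6; m <- 2
X <- matrix(rnorm(n * p), n, p)
# Name the predictors so that colnames(X) can be used to select variables later.
colnames(X) <- paste0("x", seq_len(p))
# Two linear predictors that share X[, 2], plus Gaussian noise.
eta1 <- X[, 1] + 0.4 * X[, 2] - 0.6 * X[, 3]
eta2 <- -0.5 * X[, 2] + 0.7 * X[, 4] + 0.5 * X[, 5]
# cbind() carries the names eta1 and eta2 over to the columns of Y.
Y <- cbind(eta1, eta2) + matrix(rnorm(n * m, sd = 0.5), n, m)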
```

## Baseline fit

```{r fit, eval=LOCAL, cache=TRUE}
fit <- pls_fit(X, Y, ncomp = 3, scores = "r")
```

## (X, Y) bootstrap

```{r boot-xy, eval=LOCAL, cache=TRUE}
boot_xy <- pls_bootstrap(X, Y, ncomp = 3, R = 50, type = "xy",
                         parallel = "none", return_scores = TRUE)
head(summarise_pls_bootstrap(boot_xy))
```

A quick visual inspection of the coefficient distributions:

```{r boot-xy-plot, fig.height=4.5, eval=LOCAL, cache=TRUE}
plot_pls_bootstrap_coefficients(boot_xy, variables = colnames(X))
```

## (X, T) bootstrap

The (X, T) bootstrap is a conditional scheme: it operates on the latent score
representation extracted from the baseline fit instead of refitting the full
model for each replicate.

```{r boot-xt, eval=LOCAL, cache=TRUE}
boot_xt <- pls_bootstrap(X, Y, ncomp = 3, R = 50, type = "xt",
                         parallel = "none", return_scores = TRUE)
head(summarise_pls_bootstrap(boot_xt))
```

```{r boot-xt-plot, fig.height=4.5, eval=LOCAL, cache=TRUE}
plot_pls_bootstrap_coefficients(boot_xt, responses = colnames(Y))
```
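
The idea of conditioning on the latent components can be sketched in base R.
The chunk below is not the `bigPLSR` implementation: it assumes a single
response, extracts the components once with a hand-rolled NIPALS loop, and then
resamples rows of the response and the scores while the weights stay fixed. All
names (`X_demo`, `y_demo`, `R_rot`, `coef_from_scores()`) are hypothetical.

```{r sketch-conditional, eval=FALSE}
# Illustrative only: hand-rolled single-response PLS, not bigPLSR internals.
set.seed(2)
n_obs <- 80; n_var <- 5; n_comp <- 2
X_demo <- matrix(rnorm(n_obs * n_var), n_obs, n_var)
y_demo <- drop(X_demo %*% c(1, 0.5, -0.5, 0, 0)) + rnorm(n_obs, sd = 0.5)

# Centre the data, then extract the components once with NIPALS (PLS1).
Xc <- scale(X_demo, center = TRUE, scale = FALSE)
yc <- y_demo - mean(y_demo)
W <- P <- matrix(0, n_var, n_comp)
Xa <- Xc
for (a in seq_len(n_comp)) {
  w <- crossprod(Xa, yc)
  w <- w / sqrt(sum(w^2))                  # weight vector
  t_a <- Xa %*% w                          # score vector
  p_a <- crossprod(Xa, t_a) / sum(t_a^2)   # loading vector
  Xa <- Xa - t_a %*% t(p_a)                # deflate X
  W[, a] <- w
  P[, a] <- p_a
}
R_rot <- W %*% solve(crossprod(P, W))      # maps centred X to scores
T_scores <- Xc %*% R_rot

# Regress y on the selected score rows (with an intercept), then map the
# score coefficients back to the scale of the original predictors.
coef_from_scores <- function(idx) {
  Tb <- T_scores[idx, , drop = FALSE]
  q <- lm.fit(cbind(1, Tb), yc[idx])$coefficients[-1]
  drop(R_rot %*% q)
}
beta_hat <- coef_from_scores(seq_len(n_obs))

# Conditional bootstrap: resample (y, T) rows; weights are never re-estimated.
n_boot <- 500
beta_boot <- replicate(n_boot,
                       coef_from_scores(sample.int(n_obs, replace = TRUE)))

# Percentile intervals, one row per predictor.
t(apply(beta_boot, 1, quantile, probs = c(0.025, 0.975)))
```

Each replicate only solves a regression on `n_comp` columns, which is why this
scheme is cheaper than refitting. The trade-off is that the variability of the
weights themselves is not reflected in the resulting intervals.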

## Exploring bootstrap scores

When `return_scores = TRUE`, the bootstrap result stores the score matrix of
each replicate. This enables custom diagnostics, such as summarising how the
mean score on the first two latent variables varies across replicates:

```{r boot-score-summary, eval=LOCAL, cache=TRUE}
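score_mats <- boot_xt$score_samples
# Mean score on the first two components, one column per replicate.
score_means <- sapply(score_mats, function(M) colMeans(M)[1:2])
# Distribution of those means across replicates, one column per component.
apply(score_means, 1, summary)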
```

You can feed individual score matrices into `plot_pls_individuals()` to overlay
confidence ellipses obtained from the bootstrap draws.
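
For readers who want to see what such an ellipse involves, here is a minimal
base R sketch that draws a 95% ellipse around a cloud of two-dimensional score
draws under a bivariate normal approximation. It uses simulated points and does
not call `plot_pls_individuals()`; the object names are hypothetical and the
package may construct its ellipses differently.

```{r sketch-ellipse, eval=FALSE}
# Illustrative only: base R geometry on simulated two-column scores.
set.seed(3)
scores_demo <- cbind(rnorm(200), rnorm(200, sd = 0.5))
scores_demo[, 2] <- scores_demo[, 2] + 0.6 * scores_demo[, 1]

centre <- colMeans(scores_demo)
S <- cov(scores_demo)

# Points on the 95% ellipse: scale the unit circle by the chi-square radius
# and the Cholesky factor of the covariance, then shift to the centre.
angles <- seq(0, 2 * pi, length.out = 200)
circle <- rbind(cos(angles), sin(angles))
radius <- sqrt(qchisq(0.95, df = 2))
ellipse <- t(centre + radius * t(chol(S)) %*% circle)

plot(scores_demo, pch = 16, col = "grey60", asp = 1,
     xlab = "Component 1", ylab = "Component 2")
lines(ellipse, lwd = 2)
```

Every point of `ellipse` lies at the same Mahalanobis distance from `centre`,
which is what makes the contour a normal-theory confidence region.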

## Parallel execution

Both bootstrap flavours honour the `parallel = "future"` option. Configure your
preferred `future` plan before calling `pls_bootstrap()`, and restore the
sequential plan once the replicates have been computed:

```{r boot-parallel, eval=FALSE}
future::plan(future::multisession, workers = 2)
boot_xy_parallel <- pls_bootstrap(X, Y, ncomp = 3, R = 100, type = "xy",
                                  parallel = "future")
future::plan(future::sequential)
```
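
The same plan mechanism works for hand-written resampling loops. The following
sketch assumes the `future` and `future.apply` packages are installed and
distributes an ordinary least-squares pairs bootstrap over two workers; it says
nothing about how `pls_bootstrap()` dispatches its replicates internally, and
`one_replicate()` is a hypothetical helper.

```{r sketch-future, eval=FALSE}
# Illustrative only: requires the 'future' and 'future.apply' packages.
library(future.apply)

set.seed(4)
n_obs <- 100
X_demo <- matrix(rnorm(n_obs * 3), n_obs, 3)
y_demo <- drop(X_demo %*% c(1, -0.5, 0.25)) + rnorm(n_obs, sd = 0.5)

# One replicate: resample rows and refit by least squares.
one_replicate <- function(b) {
  idx <- sample.int(n_obs, replace = TRUE)
  lm.fit(cbind(1, X_demo[idx, ]), y_demo[idx])$coefficients
}

future::plan(future::multisession, workers = 2)
# future.seed = TRUE gives each replicate its own parallel-safe RNG stream.
coef_boot <- future_sapply(seq_len(200), one_replicate, future.seed = TRUE)
future::plan(future::sequential)

dim(coef_boot)  # one row per coefficient, one column per replicate
```

Setting `future.seed = TRUE` matters for bootstrap work: without it, workers
may draw from overlapping random streams and the replicates are no longer
statistically sound.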

## Conclusion

Use the two bootstrap strategies to quantify the uncertainty of your PLS models.
The (X, Y) variant mirrors the classic non-parametric bootstrap while the (X, T)
option keeps the latent structure fixed for computational efficiency. The
supplied summaries and plotting helpers provide starting points for more
elaborate diagnostic workflows.