bigPLSR provides fast, scalable Partial Least Squares (PLS) with two execution backends:
backend = "arma"): in-memory
Armadillo/BLAS for speed.backend = "bigmem"):
chunked streaming over bigmemory::big.matrix for large
data.Both PLS1 (single response) and PLS2 (multi-response) are supported. PLS2 uses SIMPLS on cross-products in both backends for numerical parity.
Recent updates bring additional solvers and tooling:
future
ecosystem.The package is set up to be CRAN-friendly: the optional CBLAS fast path is off by default.
Support for parallel computation and GPU is being developed.
This website and these examples were created by F. Bertrand and M. Maumy.
You can install the released version of bigPLSR from CRAN with:
install.packages("bigPLSR")You can install the development version of bigPLSR from github with:
devtools::install_github("fbertran/bigPLSR")library(bigPLSR)
set.seed(1)
n <- 200; p <- 50
X <- matrix(rnorm(n*p), n, p)
y <- X[,1]*2 - X[,2] + rnorm(n)
# Dense PLS1 (fast)
fit <- pls_fit(X, y, ncomp = 3, backend = "arma", scores = "r")
str(list(
coef=dim(fit$coefficients),
scores=dim(fit$scores),
ncomp=fit$ncomp
))options_val_before <- options("bigmemory.allow.dimnames")
options(bigmemory.allow.dimnames=TRUE)
bmX <- bigmemory::as.big.matrix(X)
bmy <- bigmemory::as.big.matrix(matrix(y, n, 1))
tmp=tempdir()
if(file.exists(paste(tmp,"scores.desc",sep="/"))){unlink(paste(tmp,"scores.desc",sep="/"))}
if(file.exists(paste(tmp,"scores.bin",sep="/"))){unlink(paste(tmp,"scores.bin",sep="/"))}
sink <- bigmemory::filebacked.big.matrix(
nrow=n, ncol=3, type="double",
backingfile="scores.bin",
backingpath=tmp,
descriptorfile="scores.desc"
)
fit_b <- pls_fit(
bmX, bmy, ncomp=3, backend="bigmem", scores="big",
scores_target="existing", scores_bm=sink,
scores_colnames = c("t1","t2","t3"),
return_scores_descriptor = TRUE
)
fit_b$scores_descriptor # big.matrix.descriptor
options(bigmemory.allow.dimnames=options_val_before)set.seed(2)
m <- 3
B <- matrix(rnorm(p*m), p, m)
Y <- scale(X, scale = FALSE) %*% B + matrix(rnorm(n*m, sd = 0.1), n, m)
# Dense PLS2 – SIMPLS on cross-products (parity with bigmem)
fit2 <- pls_fit(X, Y, ncomp = 2, backend = "arma", mode = "pls2", scores = "none")
str(list(coef=dim(fit2$coefficients), ncomp=fit2$ncomp))pls_fit(
X, y, ncomp,
tol = 1e-8,
backend = c("auto", "arma", "bigmem"),
scores = c("none", "r", "big"),
chunk_size = 10000L,
scores_name = "scores",
mode = c("auto","pls1","pls2"),
scores_target = c("auto","new","existing"),
scores_bm = NULL,
scores_backingfile = NULL,
scores_backingpath = NULL,
scores_descriptorfile = NULL,
scores_colnames = NULL,
return_scores_descriptor = FALSE
)Auto selection
backend = "auto" → "bigmem" when
X is a big.matrix (or descriptor), else
"arma".mode = "auto" → "pls1" when y
is one column, else "pls2".Return values
coefficients (p),
intercept (scalar), x_weights,
x_loadings, y_loadings, scores
(optional), x_means, y_mean,
ncomp.coefficients (p×m),
intercept (length m), x_weights (p×ncomp),
x_loadings (p×ncomp), y_loadings (m×ncomp),
scores (optional), x_means,
y_means, ncomp.backend = "arma")X, Y, build
XtX = XcᵀXc, XtY = XcᵀYc.T = Xc %*% W.backend = "bigmem")big.matrix with preallocated
buffers.XtX += BᵀB, XtY += BᵀY) + the same
SIMPLS solver for parity; optional score streaming:
T = (X − μ) %*% W.Both paths enforce symmetry (
0.5*(M+Mᵀ)) before eigen and use a small ridge onXtXfor stability.
scores = "none" – don’t compute scores.scores = "r" – return an in-memory matrix.scores = "big" – write to a big.matrix:
scores_target = "existing" +
scores_bm (big.matrix or descriptor),
orscores_backingfile (+ optional
scores_backingpath,
scores_descriptorfile).scores_colnames – set column names of the scores.return_scores_descriptor = TRUE – adds
fit$scores_descriptor when scores is a
big.matrix.For tight parity tests, force 1 BLAS thread and fix RNG:
set.seed(1)
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
RhpcBLASctl::blas_set_num_threads(1L)
} else {
# Use env vars before BLAS loads in the session
Sys.setenv(
OMP_NUM_THREADS="1",
OPENBLAS_NUM_THREADS="1",
MKL_NUM_THREADS="1",
VECLIB_MAXIMUM_THREADS="1",
BLIS_NUM_THREADS="1"
)
}10000L. On Apple
Silicon, internal default is larger (e.g., 16384) when
chunk_size == 0. Tune per dataset for best GEMM
throughput.scores="big",
streaming avoids holding T fully in RAM.Default: OFF (CRAN-safe).
An optional in-place accumulation (true beta = 1
CBLAS dgemm) is available and
guarded by compile-time checks. When not available or
not enabled, the package falls back automatically to
the portable Armadillo path.
Enable locally (Unix/macOS):
R CMD INSTALL . --configure-vars="PKG_CPPFLAGS='-DBIGPLSR_USE_CBLAS'"In src/Makevars, link to the same BLAS/LAPACK
that R uses:
PKG_LIBS += $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)
<vecLib/cblas.h>; on
Linux/others: <cblas.h>.-lopenblas or
-framework Accelerate; use R’s variables.Windows: leave the macro off unless you’ve explicitly provided CBLAS headers/libs.
tempdir().If you use bigPLSR in academic work, please cite this package and the relevant PLS method used.