Let \(X \in \mathbb{R}^{n\times p}\) and \(Y \in \mathbb{R}^{n\times m}\). We assume column-centered data unless stated otherwise. PLS extracts latent scores \(T=[t_1,\dots,t_A]\), together with loadings and weights, so that the covariance between \(X\) and \(Y\) along each score \(t_a\) is maximized, subject to orthogonality constraints across components.
For kernel methods, let \(\phi(\cdot)\) be an implicit feature map and define the Gram matrix \(K_X = \Phi_X \Phi_X^\top\) where \((K_X)_{ij} = k(x_i, x_j)\). The centering operator \(H = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^\top\) yields a centered Gram \(\tilde K_X = H K_X H\).
The package implements several complementary extraction schemes. The following sections summarise the core loops and their building blocks.
algorithm = "rkhs_xy" )algorithm = "kf_pls")Common kernels:
\[ \begin{aligned} \text{Linear:}\quad& k(x,z) = x^\top z \\ \text{RBF:}\quad& k(x,z) = \exp(-\gamma \|x-z\|^2) \\ \text{Polynomial:}\quad& k(x,z) = (\gamma\,x^\top z + c_0)^{d} \\ \text{Sigmoid:}\quad& k(x,z) = \tanh(\gamma\,x^\top z + c_0). \end{aligned} \]
Given \(K\in\mathbb{R}^{n\times n}\), the centered version is:
\[ \tilde K = H K H, \quad H = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top. \]
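For reference, the centered Gram can also be computed without forming \(H\), using only column, row, and grand means (the same identity the prediction formulas below rely on); a minimal R sketch:

```r
# Centering without forming H: Kc = K - 1 kbar^T - kbar 1^T + kbarbar
center_gram <- function(K) {
  n <- nrow(K)
  kbar    <- colMeans(K)     # column means (= row means for symmetric K)
  kbarbar <- mean(K)         # grand mean
  K - matrix(kbar, n, n, byrow = TRUE) - matrix(rowMeans(K), n, n) + kbarbar
}
```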
We operate in the dual. Consider \(K_X\) and \(K_{XY} = K_X Y\). At step \(a\), we extract a dual direction \(\alpha_a\) so that the score \(t_a = \tilde K_X \alpha_a\) maximizes covariance with \(Y\), subject to orthogonality in the RKHS metric:
\[ \max_{\alpha} \ \mathrm{cov}(t, Y) \quad \text{s.t.}\ \ t=\tilde K_X \alpha,\ \ t^\top t = 1,\ \ t^\top t_b = 0,\ b<a. \]
A SIMPLS-style recursion in the dual:
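The recursion itself is implemented in C++ (see the dispatch map at the end of this document). For intuition, here is a minimal dual loop in R with NIPALS-style iterations and score-space deflation; it is an illustration, not the package routine:

```r
# Illustrative dual KPLS loop (NIPALS-style iterations, score-space deflation),
# operating on a centered Gram Kc and centered responses Y.
dual_kpls_sketch <- function(Kc, Y, ncomp, tol = 1e-10, maxit = 100L) {
  n  <- nrow(Kc)
  Tm <- matrix(0, n, ncomp)          # X-scores t_a
  U  <- matrix(0, n, ncomp)          # Y-scores u_a
  Kd <- Kc                           # deflated Gram
  Yd <- as.matrix(Y)                 # deflated responses
  for (a in seq_len(ncomp)) {
    u <- Yd[, 1]
    for (it in seq_len(maxit)) {
      t <- Kd %*% u
      t <- t / sqrt(sum(t^2))        # score from the dual direction
      q <- crossprod(Yd, t)          # Y-loadings
      u_new <- Yd %*% q
      u_new <- u_new / sqrt(sum(u_new^2))
      if (sum((u_new - u)^2) < tol) { u <- u_new; break }
      u <- u_new
    }
    Tm[, a] <- t; U[, a] <- u
    P  <- diag(n) - tcrossprod(t)    # project out the extracted score
    Kd <- P %*% Kd %*% P
    Yd <- Yd - t %*% crossprod(t, Yd)
  }
  list(scores = Tm, y_scores = U)
}
```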
Prediction uses the dual coefficients; for a new \(x_\star\), \(k_\star = [k(x_\star, x_i)]_{i=1}^n\) and \(t_\star = \tilde k_\star^\top \alpha\). When \(Y\) is multivariate, apply steps component-wise with a shared \(t_a\).
In bigPLSR:
- Dense path: algorithm="rkhs" builds \(\tilde K_X\) (or an approximation) and runs dual SIMPLS deflation.
- Big-matrix path: block-streamed Gram computations avoid materializing the \(n\times n\) Gram.
We avoid forming \(\tilde K_X\) explicitly by accumulating blocks. Row-wise, the block \(G_B = X_B X^\top\) (for row blocks \(B\)) gives the corresponding rows of \(K_X\); column-wise, \(K_X = X X^\top = \sum_{J} X_{[:,J]} X_{[:,J]}^\top\) sums the contributions of column blocks \(J\) (useful for tall-skinny \(X\)).
Row-chunked (\(XX^\top\)):
1. For row blocks \(B \subset \{1,\ldots,n\}\): compute \(G_B = X_B X^\top\).
2. Use \(G_B\) (after centering) on the fly inside matrix-vector products \((\tilde K v)\), without storing the full \(K\).
Column-chunked:
1. Partition the feature dimension \(p\) into blocks \(J\).
2. For each block \(J\): \(G_J = X_{[:,J]} X_{[:,J]}^\top\).
3. Use \(G_J\) to update the \(K v\) accumulators and to refresh the deflation quantities (\(t, q\)).
Memory:
- Row-chunked: \(O(n \cdot \text{chunk\_rows})\).
- Column-chunked: \(O(n \cdot \text{chunk\_cols})\).

Pick based on data layout and cache friendliness.
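As an illustration of the column-chunked path, the sketch below accumulates \((K v)\) from column blocks without forming the \(n \times n\) Gram; it operates on an in-memory matrix and leaves out centering and the big.matrix plumbing:

```r
# Accumulate (K v) with K = X X^T from column blocks, never forming the Gram;
# this sketch works on a dense matrix and omits centering and big.matrix I/O.
kv_column_chunked <- function(X, v, chunk_cols = 256L) {
  p   <- ncol(X)
  acc <- numeric(nrow(X))
  for (start in seq(1L, p, by = chunk_cols)) {
    J  <- start:min(start + chunk_cols - 1L, p)
    XJ <- X[, J, drop = FALSE]
    acc <- acc + XJ %*% crossprod(XJ, v)   # add X_J (X_J^T v)
  }
  drop(acc)
}
```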
Nyström (rank \(r\))
Sample a subset \(S\) of size \(r\), form \(K_{SS}\) and \(K_{NS}\). Define the sketch \(Z = K_{NS} K_{SS}^{-1/2}\), so \(K \approx Z Z^\top\). Center \(Z\) by subtracting its column means, then run linear PLS on \(Z\).
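A hedged sketch of the Nyström route for an RBF kernel; the helper names are illustrative, not package functions:

```r
# Hypothetical Nystroem feature construction for an RBF kernel; 'nystroem_features'
# and 'rbf' are illustrative helpers, not bigPLSR functions.
nystroem_features <- function(X, r = 500L, gamma = 0.5) {
  rbf <- function(A, B) {
    d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
    exp(-gamma * d2)
  }
  S   <- sample(nrow(X), r)                      # landmark subset
  Kns <- rbf(X, X[S, , drop = FALSE])            # n x r cross-kernel K_NS
  Kss <- Kns[S, , drop = FALSE]                  # r x r landmark kernel K_SS
  e   <- eigen(Kss, symmetric = TRUE)
  Wmh <- e$vectors %*% diag(1 / sqrt(pmax(e$values, 1e-10))) %*% t(e$vectors)
  Z   <- Kns %*% Wmh                             # sketch: K ~ Z Z^T
  scale(Z, center = TRUE, scale = FALSE)         # column-center before linear PLS
}
```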
RFF (RBF kernels)
Draw \(\{\omega_\ell\}_{\ell=1}^r \sim \mathcal{N}(0, 2\gamma I)\) and \(b_\ell \sim \mathcal{U}[0, 2\pi]\). Define features \(\varphi_\ell(x) = \sqrt{\tfrac{2}{r}}\cos(\omega_\ell^\top x + b_\ell)\), so \(k(x,z) \approx \varphi(x)^\top \varphi(z)\). Run linear PLS on \(\varphi(X)\).
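A minimal sketch of this random-feature construction (the helper name is illustrative):

```r
# Random Fourier features approximating the RBF kernel k(x, z) = exp(-gamma ||x - z||^2);
# 'rff_features' is an illustrative helper, not a bigPLSR function.
rff_features <- function(X, r = 500L, gamma = 0.5) {
  p     <- ncol(X)
  Omega <- matrix(rnorm(p * r, sd = sqrt(2 * gamma)), p, r)  # omega_l ~ N(0, 2 gamma I)
  b     <- runif(r, 0, 2 * pi)                               # b_l ~ U[0, 2 pi]
  sqrt(2 / r) * cos(sweep(X %*% Omega, 2, b, "+"))           # phi(X), n x r
}
```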
We first compute KPLS scores \(T\in\mathbb{R}^{n\times a}\) from \(X\) vs labels \(y\in\{0,1\}\), then run logistic regression in latent space via IRLS:
Minimize \(\ell(\beta, \beta_0) = -\sum_i \{ y_i\eta_i - \log(1+\exp\eta_i)\}\) with \(\eta = \beta_0\mathbf{1} + T \beta\). IRLS step at iteration \(k\):
\[ W = \mathrm{diag}(p^{(k)}(1-p^{(k)})),\quad z = \eta^{(k)} + W^{-1}(y - p^{(k)}),\quad \begin{bmatrix}\beta_0\\ \beta\end{bmatrix} = (X_\eta^\top W X_\eta + \lambda I)^{-1} X_\eta^\top W z, \]
where \(X_\eta = [\mathbf{1}, T]\) and \(p^{(k)} = \sigma(\eta^{(k)})\). Optionally alternate: recompute KPLS scores with current residuals and re-run a few IRLS steps. Class weights \(w_c\) can be injected by scaling rows of \(W\).
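A compact sketch of the IRLS step in latent space, assuming scores Tm from a previous KPLS fit and a small ridge lambda (names illustrative):

```r
# IRLS in the latent space: Tm holds KPLS scores (n x a), y is binary in {0, 1},
# lambda is a small ridge; all names are illustrative.
irls_latent <- function(Tm, y, lambda = 1e-6, maxit = 25L, tol = 1e-8) {
  Xe   <- cbind(1, Tm)                            # X_eta = [1, T]
  beta <- numeric(ncol(Xe))                       # (beta_0, beta)
  for (k in seq_len(maxit)) {
    eta <- drop(Xe %*% beta)
    p   <- 1 / (1 + exp(-eta))                    # p = sigma(eta)
    w   <- pmax(p * (1 - p), 1e-10)               # IRLS weights (guarded)
    z   <- eta + (y - p) / w                      # working response
    XtW <- t(Xe * w)                              # X_eta^T W (class weights could scale w here)
    beta_new <- solve(XtW %*% Xe + lambda * diag(ncol(Xe)), XtW %*% z)
    if (sum((beta_new - beta)^2) < tol) { beta <- beta_new; break }
    beta <- beta_new
  }
  drop(beta)
}
```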
In bigPLSR, algorithm="klogitpls" computes \(T\) (dense or streamed) and then fits IRLS in latent space.
Promote sparsity in dual or primal weights. In dual form, constrain \(\alpha_a\) by \(\ell_1\) (or group) penalty:
\[ \max_{\alpha}\ \mathrm{cov}(\tilde K \alpha, Y) - \lambda\|\alpha\|_1 \quad\text{s.t.}\quad (\tilde K\alpha)^\top (\tilde K\alpha) = 1,\ t_a^\top t_b=0\ (b<a). \]
A practical approach uses proximal gradient or coordinate descent on a smooth surrogate of covariance, with periodic orthogonalization of the resulting score vectors in the \(\tilde K\) metric. Early stop by explained covariance. (The current release provides the scaffolding API.)
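Because only the scaffolding API ships in the current release, the following is purely illustrative: a single proximal-gradient step on a smooth covariance surrogate with \(\ell_1\) soft-thresholding, followed by rescaling to unit score norm:

```r
# One proximal-gradient step on a smooth covariance surrogate with an l1 penalty,
# then rescaling so that t^T t = 1; purely illustrative, not the scaffolding API.
soft_threshold <- function(v, lambda) sign(v) * pmax(abs(v) - lambda, 0)

sparse_dual_step <- function(alpha, Kc, y, lambda, step = 1e-3) {
  t_cur <- Kc %*% alpha
  grad  <- -Kc %*% (y * drop(crossprod(y, t_cur)))    # gradient of -(y^T t)^2 / 2 w.r.t. alpha
  alpha_new <- soft_threshold(alpha - step * grad, step * lambda)
  t_new <- Kc %*% alpha_new
  alpha_new / sqrt(drop(crossprod(t_new)) + 1e-12)    # rescale so the score has unit norm
}
```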
Let \(K_X\) and \(K_Y\) be centered Grams for \(X\) and \(Y\) (with small ridge \(\lambda_X,\lambda_Y\) for stability). The cross-covariance operator is \(A = K_X (K_Y + \lambda_Y I) K_X\). Dual SIMPLS extracts latent directions via the dominant eigenspace of \(A\) with orthogonalization under the \(K_X\) inner product.
Prediction returns dual coefficients \(\alpha\) for \(X\) and \(\beta\) for \(Y\).
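A conceptual dense-mode sketch of this extraction, assuming already-centered Grams and a simplified deflation (not the wired C++ routine):

```r
# Conceptual dense extraction for the X-Y RKHS scheme: dominant eigenvector of
# A = Kx (Ky + lambda_y I) Kx, score formation, and a simplified score-space
# deflation; a sketch, not the routine dispatched by algorithm = "rkhs_xy".
rkhs_xy_sketch <- function(Kx, Ky, ncomp, lambda_y = 1e-6) {
  n      <- nrow(Kx)
  scores <- matrix(0, n, ncomp)
  Kx_d   <- Kx
  for (a in seq_len(ncomp)) {
    A     <- Kx_d %*% (Ky + lambda_y * diag(n)) %*% Kx_d
    alpha <- eigen(A, symmetric = TRUE)$vectors[, 1]   # dominant dual direction
    t     <- Kx_d %*% alpha
    t     <- t / sqrt(sum(t^2))
    scores[, a] <- t
    P    <- diag(n) - tcrossprod(t)                    # deflate in the extracted direction
    Kx_d <- P %*% Kx_d %*% P
  }
  scores
}
```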
In bigPLSR, algorithm="rkhs_xy" wires this in dense mode; a streamed variant can be built by block Gram accumulations on \(K_X\) and \(K_Y\).
KF-PLS maintains a state that tracks latent parameters over incoming mini-batches. Let the state be \(s = \{w, p, q, b\}\) for the current component, with state transition \(s_{k+1} = s_k + \epsilon_k\) (random walk) and “measurement” formed from the current block cross-covariances (\(\widehat{X^\top t}, \widehat{Y^\top t}\)). The Kalman update:
\[ \begin{aligned} &\text{Predict: } \hat s_{k|k-1} = \hat s_{k-1},\ \ P_{k|k-1}=P_{k-1}+Q \\ &\text{Innovation: } \nu_k = z_k - H_k \hat s_{k|k-1},\ \ S_k = H_k P_{k|k-1} H_k^\top + R \\ &\text{Gain: } K_k = P_{k|k-1} H_k^\top S_k^{-1} \\ &\text{Update: } \hat s_k = \hat s_{k|k-1} + K_k \nu_k,\ \ P_k = (I - K_k H_k) P_{k|k-1}. \end{aligned} \]
After convergence (or a patience stop), form \(t = X w\), normalize, and proceed to the next component with a deflation compatible with the chosen SIMPLS/NIPALS scheme.
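A literal transcription of the predict/update cycle above, assuming the streaming layer supplies the measurement, its map, and the noise covariances:

```r
# One predict/update cycle of the filter above; z is the block "measurement",
# H its linear map, Q and R the process and measurement noise covariances
# (all arguments are assumed to come from the streaming layer).
kalman_update <- function(s, P, z, H, Q, R) {
  # Predict (random-walk state transition)
  s_pred <- s
  P_pred <- P + Q
  # Innovation and gain
  nu <- z - H %*% s_pred
  S  <- H %*% P_pred %*% t(H) + R
  K  <- P_pred %*% t(H) %*% solve(S)
  # Update
  list(s = drop(s_pred + K %*% nu),
       P = (diag(nrow(P)) - K %*% H) %*% P_pred)
}
```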
In bigPLSR, algorithm="kf_pls" reuses the existing chunked \(T\)-streaming kernel and updates the state per block.
library(bigPLSR)
# Dense RKHS PLS with Nyström of rank 500 (rbf kernel)
fit_rkhs <- pls_fit(X, Y, ncomp = 5,
backend = "arma",
algorithm = "rkhs",
kernel = "rbf", gamma = 0.5,
approx = "nystrom", approx_rank = 500,
scores = "r")
# Bigmemory, kernel logistic PLS (streamed scores + IRLS)
fit_klog <- pls_fit(bmX, bmy, ncomp = 4,
backend = "bigmem",
algorithm = "klogitpls",
kernel = "rbf", gamma = 1.0,
chunk_size = 16384L,
scores = "r")
# Sparse KPLS (dense scaffold)
fit_sk <- pls_fit(X, Y, ncomp = 5,
backend = "arma",
algorithm = "sparse_kpls")Let \(X\in\mathbb{R}^{n\times p}\) be training inputs and \(Y\in\mathbb{R}^{n\times m}\) the responses. With kernel \(k(\cdot,\cdot)\) and training Gram \(K\), the centered Gram is \[ K_c = K - \mathbf{1}_n \bar{k}^\top - \bar{k}\,\mathbf{1}_n^\top + \bar{\bar{k}}, \quad \bar{k} = \frac{1}{n}\sum_{i=1}^n K_{i\cdot},\quad \bar{\bar{k}} = \frac{1}{n^2}\sum_{i,j}K_{ij}. \] KPLS on \(K_c\) yields dual coefficients \(A\in\mathbb{R}^{n\times m}\).
For new inputs \(X_\star\), the cross-kernel \(K_\star \in \mathbb{R}^{n_\star\times n}\) has \((K_\star)_{i,j} = k(x^\star_i, x_j)\). The centered cross-Gram is \[ K_{\star,c} = K_\star - \mathbf{1}_{n_\star}\bar{k}^\top - \bar{k}_\star\mathbf{1}_n^\top + \bar{\bar{k}}, \quad \bar{k}_\star = \frac{1}{n}K_\star\mathbf{1}_n. \] Predictions follow \[ \widehat{Y}_\star = K_{\star,c}\,A + \mathbf{1}_{n_\star}\,\mu_Y^\top, \] where \(\mu_Y\) is the vector of training response means.
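A direct R transcription of these two formulas, using the stored quantities listed just below (argument names illustrative):

```r
# Center the cross-kernel with stored training statistics, then apply the dual
# coefficients. Kstar: n_star x n cross-kernel, kbar: training column means,
# kbarbar: grand mean, A: dual coefficients, muY: training response means.
kpls_predict_sketch <- function(Kstar, kbar, kbarbar, A, muY) {
  kbar_star <- rowMeans(Kstar)                                # (1/n) Kstar 1_n
  Kc_star <- Kstar -
    matrix(kbar, nrow(Kstar), length(kbar), byrow = TRUE) -   # 1 kbar^T
    matrix(kbar_star, nrow(Kstar), ncol(Kstar)) +             # kbar_star 1^T
    kbarbar
  sweep(Kc_star %*% A, 2, muY, "+")                           # add back y means
}
```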
In bigPLSR, these are stored as: dual_coef (\(A\)), k_colmeans (\(\bar{k}\)), k_mean (\(\bar{\bar{k}}\)), y_means (\(\mu_Y\)), and X_ref (dense training inputs). The RKHS branch of predict.big_plsr() uses the same formula.
pls_fit(algorithm="simpls", backend="arma")
└─ .Call("_bigPLSR_cpp_simpls_from_cross")
pls_fit(algorithm="simpls", backend="bigmem")
├─ .Call("_bigPLSR_cpp_bigmem_cross")
├─ .Call("_bigPLSR_cpp_simpls_from_cross")
└─ .Call("_bigPLSR_cpp_stream_scores_given_W")
pls_fit(algorithm="nipals", backend="arma")
└─ cpp_dense_plsr_nipals()
pls_fit(algorithm="nipals", backend="bigmem")
└─ big_plsr_stream_fit_nipals()
pls_fit(algorithm="kernelpls"/"widekernelpls")
└─ .kernel_pls_core() (R)
pls_fit(algorithm="rkhs", backend="arma")
└─ .Call("_bigPLSR_cpp_kpls_rkhs_dense")
pls_fit(algorithm="rkhs", backend="bigmem")
└─ .Call("_bigPLSR_cpp_kpls_rkhs_bigmem")
pls_fit(algorithm="klogitpls", backend="arma")
└─ .Call("_bigPLSR_cpp_klogit_pls_dense")
pls_fit(algorithm="klogitpls", backend="bigmem")
└─ .Call("_bigPLSR_cpp_klogit_pls_bigmem")
pls_fit(algorithm="sparse_kpls")
└─ .Call("_bigPLSR_cpp_sparse_kpls_dense")
pls_fit(algorithm="rkhs_xy")
└─ .Call("_bigPLSR_cpp_rkhs_xy_dense")
pls_fit(algorithm="kf_pls")
└─ .Call("_bigPLSR_cpp_kf_pls_stream")