We choose between the XtX route (SIMPLS), the XXt route (wide-kernel PLS), and NIPALS based on the problem shape \((n, p)\) and a RAM budget.
# In pls_fit(), after arg parsing:
if (identical(algo_in, "auto")) {
  algo_in <- .choose_algorithm_auto(backend, X, y, ncomp)
}
.mem_bytes <- function() {
  # Memory budget in bytes; defaults to 8 GB, configurable via
  # options(bigPLSR.mem_budget_gb = ...).
  gb <- getOption("bigPLSR.mem_budget_gb", 8)
  as.numeric(gb) * (1024^3)
}
.dims_of <- function(X) {
  # Works for both big.matrix objects and ordinary matrices/vectors.
  if (inherits(X, "big.matrix")) c(nrow(X), ncol(X)) else c(NROW(X), NCOL(X))
}
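For example, .dims_of() reports the same dimensions for a dense matrix and a bigmemory big.matrix (the bigmemory call is only a sketch and assumes the package is installed):

.dims_of(matrix(0, nrow = 100, ncol = 25))   # c(100, 25)
if (requireNamespace("bigmemory", quietly = TRUE)) {
  Xb <- bigmemory::big.matrix(nrow = 100, ncol = 25, type = "double")
  .dims_of(Xb)                               # c(100, 25)
}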
.choose_algorithm_auto <- function(backend, X, y, ncomp) {
  is_big_local <- inherits(X, "big.matrix") || inherits(X, "big.matrix.descriptor")
  dims <- .dims_of(X); n <- as.integer(dims[1]); p <- as.integer(dims[2])
  B <- .mem_bytes()
  bytes <- 8                                        # size of one double
  need_XtX <- bytes * as.double(p) * as.double(p)   # bytes for the p x p cross-product
  need_XXt <- bytes * as.double(n) * as.double(n)   # bytes for the n x n cross-product
  can_XtX <- need_XtX <= B
  can_XXt <- need_XXt <= B
  shape_XtX <- (p <= 4L * n)                        # p not wildly larger than n
  shape_XXt <- (n <= 4L * p)                        # n not wildly larger than p
  if (can_XtX && shape_XtX) {
    "simpls"          # XtX route
  } else if (can_XXt && shape_XXt) {
    "widekernelpls"   # XXt ("kernel") route
  } else {
    "nipals"          # streaming fallback
  }
}

Users can override the memory budget:
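For example (the 32 GB value is purely illustrative):

options(bigPLSR.mem_budget_gb = 32)   # allow cross-products of up to ~32 GB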
bigPLSR::pls_fit() can automatically choose an algorithm based on problem shape and a user-configurable memory budget:

- SIMPLS (the XtX route) when the p × p cross-product fits in memory.
- Wide-kernel PLS (the XXt route) when XtX does not fit but XXt (n × n) does.
- NIPALS when neither XtX nor XXt comfortably fits.

This selection only applies when algorithm = "auto" (the default). Any explicit algorithm = value overrides the decision.
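For instance, to bypass the heuristic entirely (a sketch, with X and y as in the examples further below):

fit <- pls_fit(X, y, ncomp = 10, algorithm = "nipals")   # always the streaming route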
Both cross-product routes materialise a symmetric matrix (p × p or n × n). XtX is efficient when p is moderate; using XXt is efficient for “wide” problems (p ≫ n) but is still bounded by n^2 memory. NIPALS streams through the data (including big.matrix inputs) with fixed working memory; it is the safe fallback when memory is tight.

Let the memory budget be B bytes (defaults to 8 GB, configurable via options(bigPLSR.mem_budget_gb = ...)). With doubles (8 bytes), we estimate the size of each symmetric matrix as:

need_XtX = 8 * p^2
need_XXt = 8 * n^2

Then:
- can_XtX && shape_XtX: choose "simpls" (the XtX route).
- else, if can_XXt && shape_XXt: choose "widekernelpls" (the XXt, a.k.a. "kernel", route).
- else: choose "nipals" (the streaming fallback).

Changing the budget does not change R's actual memory limit; it only controls the selection.
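A worked check of the size estimates, using the shapes from the wide example below (assuming the default 8 GB budget):

n <- 200; p <- 4000
8 * p^2 / 1024^2   # need_XtX: ~122 MiB, fits the budget, but p > 4 * n
8 * n^2 / 1024^2   # need_XXt: ~0.31 MiB, fits easily, and n <= 4 * p
# shape_XtX is FALSE and shape_XXt is TRUE, so the "widekernelpls" route wins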
For tight numerical parity in tests, pass the algorithm explicitly rather than relying on "auto". Tall case:
library(bigPLSR)
n <- 2e3; p <- 5e2
X <- matrix(rnorm(n*p), n, p)
y <- X[,1] - 0.5*X[,2] + rnorm(n)
# Auto will likely pick SIMPLS (XtX) here
fit <- pls_fit(X, y, ncomp = 10, algorithm = "auto")
fit$algorithm # "simpls"Wide case:
n <- 200; p <- 4000
X <- matrix(rnorm(n*p), n, p)
y <- rnorm(n)
# With p >> n, auto skips the p x p route and picks the XXt kernel
# (or NIPALS if even the n x n matrix will not fit in the budget)
options(bigPLSR.mem_budget_gb = 2) # small budget
fit <- pls_fit(X, y, ncomp = 5, algorithm = "auto")
fit$algorithm # "kernelpls" or "nipals" depending on n^2 vs budgetBig-matrix streaming:
For column blocks \(J\), \[ K = \sum_{J} X_{[:,J]} X_{[:,J]}^\top,\qquad (Kv) \leftarrow (Kv) + X_{[:,J]} \big(X_{[:,J]}^\top v\big). \]
For row blocks \(B\), \[ K_{[B,:]} = X_B X^\top,\qquad (Kv)_{[B]} = X_B \big(X^\top v\big),\quad \text{where } X^\top v = \sum_{B} X_B^\top v_{[B]} \text{ is accumulated in a first pass}. \]
Center on the fly: \(H K H v = K v - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top K v - \tfrac{1}{n}K\mathbf{1}\mathbf{1}^\top v + \tfrac{1}{n^2}\mathbf{1}\mathbf{1}^\top K \mathbf{1}\,\mathbf{1}^\top v\), where \(H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top\). Maintain the needed aggregates (\(K\mathbf{1}\) and the scalars \(\mathbf{1}^\top K v\), \(\mathbf{1}^\top v\)) once per pass.
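A minimal plain-R sketch of the column-block accumulation with on-the-fly centering. crossprod_Kv_blocked and block_cols are illustrative names, not the package's internals, and a real implementation would read the blocks from a big.matrix rather than slicing a dense X:

crossprod_Kv_blocked <- function(X, v, block_cols = 512L) {
  n  <- nrow(X); p <- ncol(X)
  Kv <- numeric(n)                     # accumulates K v with K = X X^T
  K1 <- numeric(n)                     # accumulates K %*% 1 for the centering terms
  for (start in seq(1L, p, by = block_cols)) {
    J  <- start:min(start + block_cols - 1L, p)
    XJ <- X[, J, drop = FALSE]
    Kv <- Kv + XJ %*% crossprod(XJ, v)   # (Kv) <- (Kv) + X[,J] (X[,J]^T v)
    K1 <- K1 + XJ %*% colSums(XJ)        # K 1  <- K 1  + X[,J] (X[,J]^T 1)
  }
  Kv <- drop(Kv); K1 <- drop(K1)
  # H K H v = Kv - (1/n) 1 (1^T K v) - (1/n) (K 1)(1^T v) + (1/n^2) 1 (1^T K 1)(1^T v)
  sKv <- sum(Kv); sv <- sum(v); sK1 <- sum(K1)
  Kv - sKv / n - K1 * sv / n + sK1 * sv / n^2
}

# Sanity check against explicit centering on a small dense matrix:
X  <- matrix(rnorm(300 * 40), 300, 40)
v  <- rnorm(300)
Xc <- scale(X, center = TRUE, scale = FALSE)
all.equal(drop(Xc %*% crossprod(Xc, v)),
          crossprod_Kv_blocked(X, v, block_cols = 7L))   # TRUE (up to tolerance)

Accumulating K1 in the same pass means the centering terms cost no extra sweep over X.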