bigPLSR, PLS Regression Models with Big Matrices

Frédéric Bertrand and Myriam Maumy

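bigPLSR provides fast, scalable Partial Least Squares (PLS) regression with two execution backends: a dense in-memory path (backend = "arma") and a big-matrix path (backend = "bigmem") that works on bigmemory big.matrix objects.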

Both PLS1 (single response) and PLS2 (multi-response) are supported. PLS2 uses SIMPLS on cross-products in both backends for numerical parity.

Recent updates bring additional solvers and tooling.

The package is set up to be CRAN-friendly: the optional CBLAS fast path is off by default.

Support for parallel computation and GPU is being developed.

This website and these examples were created by F. Bertrand and M. Maumy.

Installation

You can install the released version of bigPLSR from CRAN with:

install.packages("bigPLSR")

You can install the development version of bigPLSR from GitHub with:

devtools::install_github("fbertran/bigPLSR")

Quick start

library(bigPLSR)

set.seed(1)
n <- 200; p <- 50
X <- matrix(rnorm(n*p), n, p)
y <- X[,1]*2 - X[,2] + rnorm(n)

# Dense PLS1 (fast)
fit <- pls_fit(X, y, ncomp = 3, backend = "arma", scores = "r")
str(list(
  coef=dim(fit$coefficients),
  scores=dim(fit$scores),
  ncomp=fit$ncomp
))

Big-matrix PLS1 with file-backed scores

options_val_before <- options("bigmemory.allow.dimnames")
options(bigmemory.allow.dimnames=TRUE)

bmX <- bigmemory::as.big.matrix(X)
bmy <- bigmemory::as.big.matrix(matrix(y, n, 1))

tmp <- tempdir()
# Remove stale backing files from a previous run (unlink() silently skips missing files)
unlink(file.path(tmp, c("scores.desc", "scores.bin")))
sink <- bigmemory::filebacked.big.matrix(
  nrow=n, ncol=3, type="double",
  backingfile="scores.bin",
  backingpath=tmp,
  descriptorfile="scores.desc"
)

fit_b <- pls_fit(
  bmX, bmy, ncomp=3, backend="bigmem", scores="big",
  scores_target="existing", scores_bm=sink,
  scores_colnames = c("t1","t2","t3"),
  return_scores_descriptor = TRUE
)

fit_b$scores_descriptor  # big.matrix.descriptor
options(options_val_before)  # options() returned a named list, so pass it back as-is to restore
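A descriptor is useful because it lets the same file-backed matrix be reattached later without copying the data. The sketch below shows that round trip with plain bigmemory calls only. It assumes that fit_b$scores_descriptor behaves like a standard bigmemory descriptor, as the comment above suggests; the file names demo.bin and demo.desc are made up for illustration.

```r
# Round trip: create a file-backed matrix, describe it, reattach it.
# Uses only bigmemory; the demo file names are hypothetical.
if (requireNamespace("bigmemory", quietly = TRUE)) {
  demo_dir <- tempfile("scores_demo_")
  dir.create(demo_dir)

  bm <- bigmemory::filebacked.big.matrix(
    nrow = 4, ncol = 2, type = "double",
    backingfile = "demo.bin",
    backingpath = demo_dir,
    descriptorfile = "demo.desc"
  )
  bm[, 1] <- c(1, 2, 3, 4)

  # describe() returns a descriptor object for the matrix
  desc <- bigmemory::describe(bm)

  # attach.big.matrix() maps the same backing file again
  bm2 <- bigmemory::attach.big.matrix(desc)
  reattached <- bm2[, 1]
  print(reattached)
}
```

Under the assumption stated above, the same attach.big.matrix() call applied to fit_b$scores_descriptor would give access to the scores written by pls_fit().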

PLS2 (multi-response)

set.seed(2)
m <- 3
B <- matrix(rnorm(p*m), p, m)
Y <- scale(X, scale = FALSE) %*% B + matrix(rnorm(n*m, sd = 0.1), n, m)

# Dense PLS2 – SIMPLS on cross-products (parity with bigmem)
fit2 <- pls_fit(X, Y, ncomp = 2, backend = "arma", mode = "pls2", scores = "none")
str(list(coef=dim(fit2$coefficients), ncomp=fit2$ncomp))
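
To make "SIMPLS on cross-products" concrete, here is a minimal base-R sketch of the textbook SIMPLS recursion (de Jong, 1993) that touches the data only through X'X and X'Y. It is an illustration under stated assumptions, not bigPLSR's internal code: the function name simpls_xprod and its return fields are hypothetical, and the package may differ in normalisation, deflation details, and intercept handling.

```r
# Textbook SIMPLS driven only by the cross-products X'X and X'Y.
# Illustrative sketch; not bigPLSR's implementation.
simpls_xprod <- function(X, Y, ncomp) {
  X <- scale(X, scale = FALSE)                 # centre columns
  Y <- scale(as.matrix(Y), scale = FALSE)
  XtX <- crossprod(X)                          # p x p
  XtY <- crossprod(X, Y)                       # p x m
  p <- ncol(X); m <- ncol(Y)
  R <- matrix(0, p, ncomp)                     # weights: scores are X %*% R
  Q <- matrix(0, m, ncomp)                     # Y loadings
  V <- matrix(0, p, ncomp)                     # orthonormal basis of X loadings
  S <- XtY                                     # deflated copy of X'Y
  for (a in seq_len(ncomp)) {
    r <- svd(S, nu = 1, nv = 0)$u[, 1]         # dominant left singular vector
    r <- r / sqrt(drop(crossprod(r, XtX %*% r)))  # scale so the score has unit norm
    R[, a] <- r
    Q[, a] <- crossprod(XtY, r)                # Y't expressed through X'Y
    v <- XtX %*% r                             # X loading X't
    if (a > 1) {
      Vprev <- V[, seq_len(a - 1), drop = FALSE]
      v <- v - Vprev %*% crossprod(Vprev, v)   # orthogonalise against earlier loadings
    }
    v <- v / sqrt(sum(v^2))
    V[, a] <- v
    S <- S - v %*% crossprod(v, S)             # deflate the cross-product
  }
  list(coefficients = R %*% t(Q), weights = R, y_loadings = Q)
}

# Sanity check: with all p components, PLS reproduces ordinary least squares.
set.seed(2)
n <- 60; p <- 4; m <- 2
X <- matrix(rnorm(n * p), n, p)
Y <- X %*% matrix(rnorm(p * m), p, m) + matrix(rnorm(n * m, sd = 0.1), n, m)
fit_full <- simpls_xprod(X, Y, ncomp = p)
ols <- coef(lm(Y ~ X))[-1, , drop = FALSE]     # drop the intercept row
print(max(abs(fit_full$coefficients - ols)))   # should be close to zero
```

Because only the p x p and p x m cross-products enter the loop, the same recursion can run whether those products were formed in memory or accumulated chunk by chunk from a big.matrix.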

API

pls_fit(
  X, y, ncomp,
  tol = 1e-8,
  backend = c("auto", "arma", "bigmem"),
  scores  = c("none", "r", "big"),
  chunk_size = 10000L,
  scores_name = "scores",
  mode = c("auto","pls1","pls2"),
  scores_target = c("auto","new","existing"),
  scores_bm = NULL,
  scores_backingfile = NULL,
  scores_backingpath = NULL,
  scores_descriptorfile = NULL,
  scores_colnames = NULL,
  return_scores_descriptor = FALSE
)

Auto selection

Return values


Backends & algorithms

Dense path (backend = "arma")

Big-matrix path (backend = "bigmem")

Both paths enforce symmetry, 0.5*(M+Mᵀ), before the eigendecomposition and add a small ridge to XtX for stability.
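
The effect of these two safeguards can be seen in a few lines of base R. The sketch below builds a deliberately singular XtX, symmetrises it, and adds a ridge before calling eigen(). The ridge size (1e-8 times the mean diagonal) is an assumption chosen for illustration; the value the package actually uses is not stated here.

```r
# Symmetrise a cross-product and add a small ridge before eigen().
# The ridge scaling below is an assumed value for illustration only.
set.seed(1)
X <- matrix(rnorm(200 * 5), 200, 5)
X <- cbind(X, X[, 1] + X[, 2])        # exact collinearity makes X'X singular
Xc <- scale(X, scale = FALSE)

XtX <- crossprod(Xc)
M <- 0.5 * (XtX + t(XtX))             # remove any floating-point asymmetry

ridge <- 1e-8 * mean(diag(M))         # assumed: small relative to the diagonal
M_ridge <- M + diag(ridge, ncol(M))

ev <- eigen(M_ridge, symmetric = TRUE)
print(min(ev$values) > 0)             # the ridge lifts the zero eigenvalue
```

Without the ridge, the smallest eigenvalue of the collinear XtX is zero up to rounding and can come out slightly negative, which is what destabilises later solves.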


Scores, sinks, and descriptors


Determinism (tests & reproducibility)

For tight parity tests, force 1 BLAS thread and fix RNG:

set.seed(1)
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  RhpcBLASctl::blas_set_num_threads(1L)
} else {
  # Use env vars before BLAS loads in the session
  Sys.setenv(
    OMP_NUM_THREADS="1",
    OPENBLAS_NUM_THREADS="1",
    MKL_NUM_THREADS="1",
    VECLIB_MAXIMUM_THREADS="1",
    BLIS_NUM_THREADS="1"
  )
}
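
With threads and the RNG pinned, a parity check between the two backends might look like the sketch below. It reuses the pls_fit() arguments shown in the examples above and assumes both fits return a coefficients element with the same layout; max_abs_diff is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: largest absolute element-wise difference.
max_abs_diff <- function(a, b) max(abs(as.numeric(a) - as.numeric(b)))

if (requireNamespace("bigPLSR", quietly = TRUE) &&
    requireNamespace("bigmemory", quietly = TRUE)) {
  set.seed(1)
  n <- 200; p <- 50
  X <- matrix(rnorm(n * p), n, p)
  y <- X[, 1] * 2 - X[, 2] + rnorm(n)

  fit_dense <- bigPLSR::pls_fit(X, y, ncomp = 3,
                                backend = "arma", scores = "none")
  fit_big <- bigPLSR::pls_fit(
    bigmemory::as.big.matrix(X),
    bigmemory::as.big.matrix(matrix(y, n, 1)),
    ncomp = 3, backend = "bigmem", scores = "none"
  )

  # Assumes both objects store coefficients in the same order and shape.
  print(max_abs_diff(fit_dense$coefficients, fit_big$coefficients))
}
```

In a test suite the printed value would typically be compared against a tolerance chosen for the data at hand rather than against exact zero.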

Performance tuning


Optional CBLAS fast path (in-place GEMM)

Default: OFF (CRAN-safe).
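An optional in-place accumulation path is available and guarded by compile-time checks. It calls CBLAS dgemm with beta = 1, so each product is added directly into the existing output matrix instead of going through a temporary. When the path is unavailable or not enabled, the package falls back automatically to the portable Armadillo path.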

Enable locally (Unix/macOS):

R CMD INSTALL . --configure-vars="PKG_CPPFLAGS='-DBIGPLSR_USE_CBLAS'"

In src/Makevars, link to the same BLAS/LAPACK that R uses:

PKG_LIBS += $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)

Windows: leave the macro off unless you’ve explicitly provided CBLAS headers/libs.


Development


Citation

If you use bigPLSR in academic work, please cite this package and the relevant PLS method used.