---
title: "Getting Started with plsRglm"
author:
  - "Frederic Bertrand"
  - "Myriam Maumy-Bertrand"
date: "March 2026"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Getting Started with plsRglm}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  fig.width = 7,
  fig.height = 5,
  dpi = 150
)
set.seed(123)
library(plsRglm)
```

`plsRglm` provides partial least squares regression for linear and generalized linear models, repeated k-fold cross-validation, bootstrap utilities, and support for incomplete predictor matrices. This vignette is the practical starting point for the current package API. The companion vignette `vignette("plsRglm", package = "plsRglm")` keeps the longer historical case studies and algorithmic notes.

# Core Fitting Workflows

`plsR()` is the dedicated interface for ordinary PLS regression. `plsRglm()` extends the same ideas to generalized linear and ordinal models, and can also fit `modele = "pls"` through the shared interface.

## Linear PLS with matrix and formula interfaces

```{r linear-pls}
data(Cornell)
XCornell <- Cornell[, 1:7]
yCornell <- Cornell$Y

pls_fit_matrix <- plsR(yCornell, XCornell, nt = 3, verbose = FALSE)
pls_fit_formula <- plsR(Y ~ ., data = Cornell, nt = 3, pvals.expli = TRUE, verbose = FALSE)

pls_fit_formula$InfCrit
coef(pls_fit_formula)
```

The fitted model stores the extracted components (`tt`), the loadings (`pp`), the coefficients on the original predictors (`Coeffs`), and information-criterion summaries (`InfCrit`).
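These slots can be read off the fitted object directly. A minimal sketch using the `pls_fit_formula` object fitted above (the slot names come from the list above; the printed dimensions depend on the data and on `nt`):

```{r inspect-components}
# Component scores: one row per observation, one column per extracted component
dim(pls_fit_formula$tt)

# Loadings: one row per original predictor, one column per component
dim(pls_fit_formula$pp)

# Coefficients expressed on the scale of the original predictors
pls_fit_formula$Coeffs
```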
## Generalized PLS models

```{r glm-fits}
data(aze_compl)

logit_fit <- plsRglm(y ~ ., data = aze_compl, nt = 3, modele = "pls-glm-logistic", verbose = FALSE)
logit_fit$InfCrit
head(predict(logit_fit, type = "response"))

family_fit <- plsRglm(
  Y ~ ., data = Cornell, nt = 2,
  modele = "pls-glm-family", family = gaussian(link = "log"),
  verbose = FALSE
)
family_fit$family$family
family_fit$family$link
```

`plsRglm()` supports predefined model shortcuts together with a custom-family entry point:

```{r supported-modes, eval = FALSE}
plsRglm(Y ~ ., data = Cornell, nt = 3, modele = "pls")
plsRglm(Y ~ ., data = Cornell, nt = 3, modele = "pls-glm-gaussian")
plsRglm(Y ~ ., data = Cornell, nt = 3, modele = "pls-glm-inverse.gaussian")
plsRglm(y ~ ., data = aze_compl, nt = 3, modele = "pls-glm-logistic")
data(pine)
plsRglm(round(x11) ~ ., data = pine, nt = 3, modele = "pls-glm-poisson")
plsRglm(x11 ~ ., data = pine, nt = 3, modele = "pls-glm-Gamma")
plsRglm(Quality ~ ., data = bordeaux, nt = 2, modele = "pls-glm-polr")
plsRglm(
  Y ~ ., data = Cornell, nt = 3,
  modele = "pls-glm-family", family = gaussian(link = "log")
)
```

Ordinal responses are handled through `modele = "pls-glm-polr"`. As with `MASS::polr()`, the response should be an ordered factor:

```{r polr-fit}
data(bordeaux)
bordeaux$Quality <- factor(bordeaux$Quality, ordered = TRUE)

polr_fit <- plsRglm(Quality ~ ., data = bordeaux, nt = 2, modele = "pls-glm-polr", verbose = FALSE)
head(predict(polr_fit, type = "class"))
```

# Cross-Validation and Model Choice

Use `cv.plsR()` for ordinary PLS regression and `cv.plsRglm()` for generalized models. Both provide repeated k-fold cross-validation and integrate with `summary()` and `cvtable()`.
```{r linear-cv}
cv_pls <- cv.plsR(Y ~ ., data = Cornell, nt = 3, K = 4, NK = 2, verbose = FALSE)
cv_pls_summary <- cvtable(summary(cv_pls))
cv_pls_summary
plot(cv_pls_summary)
```

```{r glm-cv}
cv_logit <- cv.plsRglm(
  y ~ ., data = aze_compl, nt = 3, K = 4, NK = 2,
  modele = "pls-glm-logistic", verbose = FALSE
)
cv_logit_summary <- cvtable(summary(cv_logit, MClassed = TRUE))
cv_logit_summary
plot(cv_logit_summary)
```

For generalized models, `summary(..., MClassed = TRUE)` exposes misclassification information when it is relevant.

# Prediction and Missing Data

Incomplete predictor matrices are a core package feature, both during fitting and during prediction.

```{r prediction-missing}
data(pine)
data(pine_sup)
data(pineNAX21)

pred_fit <- plsRglm(
  x11 ~ ., data = pine, nt = 3,
  modele = "pls-glm-family", family = gaussian(),
  verbose = FALSE
)

pine_sup_small <- pine_sup[1:3, 1:10]
pine_sup_small[1, 1] <- NA

predict(pred_fit, newdata = pine_sup_small, type = "response", methodNA = "missingdata")
predict(pred_fit, newdata = pine_sup_small, type = "scores", methodNA = "missingdata")

missing_train_fit <- plsR(x11 ~ ., data = pineNAX21, nt = 3, verbose = FALSE)
missing_train_fit$na.miss.X
```

When `newdata` contains incomplete rows, `methodNA = "missingdata"` applies the missing-data scoring rule to every prediction row, while `methodNA = "adaptative"` switches automatically between the complete-row and incomplete-row formulas.

# Bootstrap Utilities

`bootpls()` and `bootplsglm()` wrap the `boot` package for PLS and PLS-GLM models. The default resampling schemes differ:

- `bootpls()` defaults to `(y, X)` resampling with `typeboot = "plsmodel"`.
- `bootplsglm()` defaults to `(y, T)` resampling with `typeboot = "fmodel_np"`.

For a lightweight vignette render, the examples below use a small number of resamples and request non-BCa confidence intervals.
```{r bootstrap}
boot_pls <- bootpls(pls_fit_formula, R = 20, verbose = FALSE)
dim(boot_pls$t)
confints.bootpls(boot_pls, indices = 2:4, typeBCa = FALSE)

boot_logit <- bootplsglm(logit_fit, R = 20, verbose = FALSE)
dim(boot_logit$t)
confints.bootpls(boot_logit, indices = 1:4, typeBCa = FALSE)
```

The plotting helpers `boxplots.bootpls()` and `plots.confints.bootpls()` can be applied directly to these bootstrap objects when a graphical summary is helpful.

# Further Reading

- Use `vignette("plsRglm", package = "plsRglm")` for the historical applications and algorithmic notes.
- Use the function help pages for lower-level weighted constructors such as `PLS_lm_wvc()` and `PLS_glm_wvc()`.
- Use `?cv.plsRglm`, `?bootplsglm`, and `?predict.plsRglmmodel` for the full argument reference.

```{r session-information}
sessionInfo()
```