--- title: "Simple global models" author: - name: Leonardo Ramirez-Lopez email: ramirez.lopez.leo@gmail.com date: today bibliography: resemble.bib csl: elsevier-harvard.csl format: html: toc: true toc-depth: 3 number-sections: true toc-location: left code-overflow: wrap smooth-scroll: true html-math-method: mathjax knitr: opts_chunk: class-output: "ansi" vignette: > %\VignetteIndexEntry{5 Simple global models} %\VignetteEncoding{UTF-8} %\VignetteEngine{quarto::html} --- ```{r} #| echo: false options(cli.num_colors = 1) Sys.setenv(OMP_NUM_THREADS = 2) ``` :::: {.columns} ::: {.column width="70%"} > *It is vain to do with more what can be done with less* -- William of Ockham ::: ::: {.column width="30%"} ::: :::: ```{r} #| message: false #| echo: false library(resemble) ``` # Introduction The `model()` function provides a simple interface for building global calibration models using partial least squares (PLS) or Gaussian process regression (GPR). Unlike `mbl()`, which builds local models for each prediction, `model()` fits a single model to the entire reference set. # Fitting methods Regression methods are specified via constructor functions. The available methods are: | Constructor | Method | Description | Reference | |-------------|--------|-------------|-----------| | `fit_pls(method = "pls")` | PLS | Non-linear iterative partial least squares (NIPALS) | @wold1975soft | | `fit_pls(method = "simpls")` | SIMPLS | Straightforward implementation of PLS | @de1993simpls | | `fit_pls(method = "mpls")` | Modified PLS | Uses correlation instead of covariance for weights | @shenk1991populations | | `fit_wapls(method = "mpls")` | Weighted average PLS | Predictions are a weighted average of multiple models from multiple PLS factors. __NOT YET AVAIABLE FOR THE__ `model()` __FUNCTION__ | @shenk1997investigation | | `fit_gpr()` | GPR | Gaussian process regression with dot product kernel | @rasmussen2006gaussian | ```{r} #| label: fit-methods # PLS with 15 components fit_pls(ncomp = 15) # SIMPLS with scaling fit_pls(ncomp = 15, method = "simpls", scale = TRUE) # mPLS with scaling fit_pls(ncomp = 15, method = "mpls") # GPR with default noise variance fit_gpr() ``` # Example ## Data preparation ```{r} #| label: data-prep #| message: false library(prospectr) wavs <- as.numeric(colnames(NIRsoil$spc)) # Preprocess: detrend + first derivative NIRsoil$spc_pr <- savitzkyGolay( detrend(NIRsoil$spc, wav = wavs), m = 1, p = 1, w = 7 ) # Split data train_x <- NIRsoil$spc_pr[NIRsoil$train == 1, ] train_y <- NIRsoil$Ciso[NIRsoil$train == 1] test_x <- NIRsoil$spc_pr[NIRsoil$train == 0, ] test_y <- NIRsoil$Ciso[NIRsoil$train == 0] # Remove missing values ok_train <- !is.na(train_y) ok_test <- !is.na(test_y) train_x <- train_x[ok_train, ] train_y <- train_y[ok_train] test_x <- test_x[ok_test, ] test_y <- test_y[ok_test] ``` ## Fitting a PLS model ```{r} #| label: fit-pls set.seed(1124) # guarantee same CV splits for all methods pls_mod <- model( Xr = train_x, Yr = train_y, fit_method = fit_pls(ncomp = 12, method = "mpls", scale = TRUE), control = model_control(validation_type = "lgo", number = 10) ) pls_mod ``` ## Fitting a GPR model ```{r} #| label: fit-gpr set.seed(1124) # guarantee same CV splits for all methods gpr_mod <- model( Xr = train_x, Yr = train_y, fit_method = fit_gpr(noise_variance = 0.5), control = model_control(validation_type = "lgo", number = 10) ) gpr_mod ``` ## Prediction ```{r} #| label: predict # Predict using optimal number of components (from CV) pls_pred <- predict(pls_mod, newdata = test_x) # Predict with GPR gpr_pred <- predict(gpr_mod, newdata = test_x) # Compare predictions data.frame( observed = test_y, pls = pls_pred[, which(pls_mod$cv_results$optimal)], gpr = as.vector(gpr_pred) ) |> head(10) ``` ## Validation statistics ```{r} #| label: validation # Function to compute stats eval_pred <- function(obs, pred) { data.frame( rmse = sqrt(mean((obs - pred)^2)), r2 = cor(obs, pred)^2 ) } ``` ```{r} ## PLS evaluation eval_pred(test_y, pls_pred[, which(pls_mod$cv_results$optimal)]) ``` ```{r} ## GPR evaluation eval_pred(test_y, gpr_pred) ``` # References {-}