--- title: "Benchmarking bigPLSR against external PLS implementations" shorttitle: "Benchmarking bigPLSR" author: - name: "Frédéric Bertrand" affiliation: - Cedric, Cnam, Paris email: frederic.bertrand@lecnam.net date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true vignette: > %\VignetteIndexEntry{Benchmarking bigPLSR against external PLS implementations} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "figures/benchmark-short-", fig.width = 6, fig.height = 4, dpi = 150, message = FALSE, warning = FALSE ) LOCAL <- identical(Sys.getenv("LOCAL"), "TRUE") ``` ## Overview This vignette documents how the `bigPLSR` implementations compare to external partial least squares (PLS) libraries in terms of runtime and memory use, using the pre computed dataset `external_pls_benchmarks`. The goals are: * to summarise the benchmark design, * to visualise runtime and memory behaviour for comparable problem sizes, * to provide short comments that can be reused in papers or reports. ## Benchmark design The dataset `external_pls_benchmarks` is a data frame that contains benchmark results for both PLS1 and PLS2 problems and several algorithms. ```{r, eval=LOCAL, cache=TRUE} library(bigPLSR) library(ggplot2) library(dplyr) library(tidyr) data("external_pls_benchmarks", package = "bigPLSR") str(external_pls_benchmarks) ``` The main columns are: * `task`: `"pls1"` or `"pls2"`, * `algorithm`: one of `"simpls"`, `"nipals"`, `"kernelpls"`, `"widekernelpls"`, * `package`: implementation provider (for example `"bigPLSR"`, `"pls"`, `"mixOmics"`), * `median_time_s`: median runtime in seconds reported by `bench::mark`, * `itr_per_sec`: iterations per second, * `mem_alloc_bytes`: memory allocated in bytes, * `n`, `p`, `q`: number of observations, predictors and responses, * `ncomp`: number of components, * `notes`: optional free text description. For most plots in this vignette we focus on configurations that are directly comparable, namely fixed `task`, `n`, `p`, `q` and `ncomp`. ## Helper summaries We start with a compact summary that reports the best implementation for each configuration in terms of runtime and memory. ```{r, eval=LOCAL, cache=TRUE} summ_best <- external_pls_benchmarks %>% group_by(task, n, p, q, ncomp) %>% mutate( rank_time = rank(median_time_s, ties.method = "min"), rank_mem = rank(mem_alloc_bytes, ties.method = "min") ) %>% ungroup() best_time <- summ_best %>% filter(rank_time == 1L) %>% count(task, package, algorithm, name = "n_best_time") best_mem <- summ_best %>% filter(rank_mem == 1L) %>% count(task, package, algorithm, name = "n_best_mem") best_time best_mem ``` These two tables indicate in how many configurations a given combination `package + algorithm` comes out as the fastest or the most memory efficient. ## Example: PLS1, fixed size, varying components In order to avoid mixing problem sizes, we select a single PLS1 configuration and plot runtime and memory as functions of the number of components. You can adjust the filters below to match the sizes of interest for your work. 
## Example: PLS1, fixed size, varying components

In order to avoid mixing problem sizes, we select a single PLS1 configuration and plot runtime and memory as functions of the number of components. The code below keeps the most frequent `(n, p, q)` combination among the PLS1 results; you can adjust the filters to match the sizes of interest for your work.

```{r, eval=LOCAL, cache=TRUE}
# Most frequent (n, p, q) combination among the PLS1 results
example_pls1_size <- external_pls_benchmarks %>%
  filter(task == "pls1") %>%
  count(n, p, q, sort = TRUE) %>%
  slice(1L) %>%
  select(n, p, q)

# All PLS1 results for that configuration
example_pls1 <- external_pls_benchmarks %>%
  filter(task == "pls1") %>%
  semi_join(example_pls1_size, by = c("n", "p", "q"))
```

Runtime comparison for this fixed size:

```{r, eval=LOCAL, cache=TRUE}
ggplot(example_pls1,
       aes(x = ncomp, y = median_time_s,
           colour = package, linetype = algorithm)) +
  geom_line() +
  geom_point() +
  scale_y_log10() +
  labs(
    x = "Number of components",
    y = "Median runtime (seconds, log scale)",
    title = "PLS1 benchmark, fixed (n, p, q)",
    subtitle = "Comparison across packages and algorithms"
  ) +
  theme_minimal()
```

Memory use for the same configuration:

```{r, eval=LOCAL, cache=TRUE}
ggplot(example_pls1,
       aes(x = ncomp, y = mem_alloc_bytes / 1024^2,
           colour = package, linetype = algorithm)) +
  geom_line() +
  geom_point() +
  labs(
    x = "Number of components",
    y = "Memory allocated (MiB)",
    title = "PLS1 benchmark, fixed (n, p, q)"
  ) +
  theme_minimal()
```

These figures can be exported as SVG (for example by setting the chunk option `dev = "svg"`), so they can be included directly in LaTeX or HTML documents.

## Example: PLS2, fixed size, varying components

We repeat the same idea for a PLS2 setting.

```{r, eval=LOCAL, cache=TRUE}
# Most frequent (n, p, q) combination among the PLS2 results
example_pls2_size <- external_pls_benchmarks %>%
  filter(task == "pls2") %>%
  count(n, p, q, sort = TRUE) %>%
  slice(1L) %>%
  select(n, p, q)

# All PLS2 results for that configuration
example_pls2 <- external_pls_benchmarks %>%
  filter(task == "pls2") %>%
  semi_join(example_pls2_size, by = c("n", "p", "q"))
```

```{r, eval=LOCAL, cache=TRUE}
ggplot(example_pls2,
       aes(x = ncomp, y = median_time_s,
           colour = package, linetype = algorithm)) +
  geom_line() +
  geom_point() +
  scale_y_log10() +
  labs(
    x = "Number of components",
    y = "Median runtime (seconds, log scale)",
    title = "PLS2 benchmark, fixed (n, p, q)",
    subtitle = "Comparison across packages and algorithms"
  ) +
  theme_minimal()
```

```{r, eval=LOCAL, cache=TRUE}
ggplot(example_pls2,
       aes(x = ncomp, y = mem_alloc_bytes / 1024^2,
           colour = package, linetype = algorithm)) +
  geom_line() +
  geom_point() +
  labs(
    x = "Number of components",
    y = "Memory allocated (MiB)",
    title = "PLS2 benchmark, fixed (n, p, q)"
  ) +
  theme_minimal()
```

## Short commentary

From these plots and the summary tables you can usually observe the following patterns.

* On small to moderate PLS1 problems, the dense `bigPLSR` SIMPLS backend is typically close to the SIMPLS implementation of the `pls` package in terms of runtime, while offering more explicit control over memory.
* On larger PLS1 and PLS2 configurations, the big-memory streaming backends trade a small runtime penalty for a bounded memory footprint that does not depend on the number of observations.
* Kernel-based algorithms tend to react more strongly to increases in `n` or `ncomp`, because the underlying Gram matrices scale quadratically in `n`.

Because the benchmarks are stored as a regular data frame, you can easily produce additional figures adapted to your application areas, for example by fixing `n` and `q` and varying `p` (see the sketch below), or by comparing only one or two algorithms at a time.
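As a concrete illustration of that last point, the sketch below fixes `n`, `q` and `ncomp` to their most frequent PLS1 values in the dataset and plots the median runtime against `p`. The object names `scaling_size` and `scaling_p` are introduced here for convenience only, and the log scale on the `p` axis is a choice that you may want to drop if the benchmark grid covers only a narrow range of `p`.

```{r, eval=LOCAL, cache=TRUE}
# Most frequent (n, q, ncomp) combination among the PLS1 results
scaling_size <- external_pls_benchmarks %>%
  filter(task == "pls1") %>%
  count(n, q, ncomp, sort = TRUE) %>%
  slice(1L) %>%
  select(n, q, ncomp)

# All PLS1 results for that (n, q, ncomp), with p varying
scaling_p <- external_pls_benchmarks %>%
  filter(task == "pls1") %>%
  semi_join(scaling_size, by = c("n", "q", "ncomp"))

ggplot(scaling_p,
       aes(x = p, y = median_time_s,
           colour = package, linetype = algorithm)) +
  geom_line() +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  labs(
    x = "Number of predictors p (log scale)",
    y = "Median runtime (seconds, log scale)",
    title = "PLS1 benchmark, runtime as a function of p"
  ) +
  theme_minimal()
```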