Type: | Package |
Title: | Principal Component Analysis for 'bigmemory' Matrices |
Version: | 0.9.0 |
Date: | 2025-10-02 |
Author: | Frederic Bertrand [aut, cre] |
Maintainer: | Frederic Bertrand <frederic.bertrand@lecnam.net> |
Description: | High performance principal component analysis routines that operate directly on 'bigmemory::big.matrix' objects. The package avoids materialising large matrices in memory by streaming data through 'BLAS' and 'LAPACK' kernels and provides helpers to derive scores, loadings, correlations, and contribution diagnostics, including utilities that stream results into 'bigmemory'-backed matrices for file-based workflows. Additional interfaces expose 'scalable' singular value decomposition, robust PCA, and robust SVD algorithms so that users can explore large matrices while tempering the influence of outliers. 'Scalable' principal component analysis is also implemented, Elgamal, Yabandeh, Aboulnaga, Mustafa, and Hefeeda (2015) <doi:10.1145/2723372.2751520>. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Depends: | R (≥ 3.5.0) |
Imports: | Rcpp (≥ 1.0.12), methods, withr |
Suggests: | knitr, rmarkdown, bigmemory, ggplot2, testthat (≥ 3.0.0) |
LinkingTo: | Rcpp, bigmemory, BH |
Encoding: | UTF-8 |
VignetteBuilder: | knitr |
LazyLoad: | yes |
NeedsCompilation: | yes |
URL: | https://fbertran.github.io/bigPCAcpp/, https://github.com/fbertran/bigPCAcpp/ |
BugReports: | https://github.com/fbertran/bigPCAcpp/issues/ |
RoxygenNote: | 7.3.3 |
Config/testthat/edition: | 3 |
Packaged: | 2025-10-14 21:24:55 UTC; bertran7 |
Repository: | CRAN |
Date/Publication: | 2025-10-20 19:20:07 UTC |
bigPCAcpp: Principal Component Analysis for bigmemory Matrices
Description
The bigPCAcpp package provides high-performance principal component analysis
routines that work directly with bigmemory::big.matrix
objects. Data are
streamed through BLAS and LAPACK kernels so large, file-backed matrices can be
analysed without materialising dense copies in R. Companion helpers compute
scores, loadings, correlations, and contributions, including streaming
variants that write results to bigmemory::big.matrix
destinations used by
file-based pipelines.
Author(s)
Maintainer: Frederic Bertrand frederic.bertrand@lecnam.net
See Also
pca_bigmatrix()
, pca_stream_bigmatrix()
Examples
library(bigmemory)
mat <- as.big.matrix(matrix(rnorm(20), nrow = 5))
result <- pca_bigmatrix(mat)
result$sdev
Benchmark timings for bigPCAcpp methods
Description
A dataset summarising wall-clock performance for the main PCA entry points
in bigPCAcpp across matrices of increasing size. The benchmarks compare
the classical block-based decomposition, the streaming variant that writes
rotations as it progresses, the scalable stochastic PCA implementation, and
the base R stats::prcomp()
routine for reference.
Format
A data frame with 360 rows and 14 columns:
- dataset
Human-readable size label ("small", "medium", "large", "xlarge").
- rows
Number of rows in the simulated matrix.
- cols
Number of columns in the simulated matrix.
- ncomp
Number of components requested.
- method
Computation strategy ("classical", "streaming", "scalable", or "prcomp").
- replicate
Replication index for repeated runs.
- user_time
User CPU time in seconds returned by
base::system.time()
.- system_time
System CPU time in seconds.
- elapsed
Elapsed (wall-clock) time in seconds.
- success
Logical flag indicating whether the run completed without errors.
- backend
Name of the backend reported by the result object when the computation succeeded.
- iterations
Recorded iteration count for iterative methods when available (otherwise
NA
).- converged
Logical convergence flag for iterative methods when available.
- error
Error message captured for failed runs (otherwise
NA
).
Source
Generated by scripts/run_benchmark.R
using randomly simulated
in-memory matrices (no file-backed storage).
Examples
data(benchmark_results)
BigPCA result objects
Description
Results returned by pca_bigmatrix()
, pca_stream_bigmatrix()
, and
pca_robust()
inherit from the bigpca
class. The objects store the
component standard deviations, rotation/loadings, and optional scores while
recording which computational backend produced them. Standard S3 generics
such as summary()
and plot()
are implemented for convenience.
bigpca
objects are lists produced by pca_bigmatrix()
,
pca_stream_bigmatrix()
, pca_robust()
, and related helpers. They mirror
the structure of base R's prcomp()
outputs while tracking additional
metadata for large-scale and streaming computations.
#' @seealso pca_bigmatrix()
, pca_stream_bigmatrix()
, pca_robust()
,
pca_plot_scree()
, pca_plot_scores()
, pca_plot_contributions()
,
pca_plot_correlation_circle()
, and pca_plot_biplot()
.
Components
sdev
Numeric vector of component standard deviations.
rotation
Numeric matrix whose columns contain the variable loadings (principal axes).
center
,scale
Optional numeric vectors describing the centring and scaling applied to each variable when fitting the model.
scores
Optional numeric matrix of principal component scores when computed alongside the decomposition.
column_sd
Numeric vector of marginal standard deviations for each input variable.
eigenvalues
Numeric vector of eigenvalues associated with the retained components.
explained_variance
,cumulative_variance
Numeric vectors summarising the fraction of variance explained by individual components and the corresponding cumulative totals.
covariance
Sample covariance matrix used to derive the components.
nobs
Number of observations used in the decomposition.
The class also records the computation backend via
attr(x, "backend")
, enabling downstream methods to adjust their
behaviour for streamed or robust results.
See Also
pca_bigmatrix()
, pca_stream_bigmatrix()
, summary.bigpca()
,
print.summary.bigpca()
, plot.bigpca()
Internal constructors and S3 methods for bigPCAcpp results
Description
These helpers provide a light-weight S3 layer around the PCA outputs so
users can interact with them through familiar generics such as
summary()
and plot()
.
Usage
new_bigpca_result(result, backend)
Principal component analysis for bigmemory::big.matrix
inputs
Description
Perform principal component analysis (PCA) directly on a
bigmemory::big.matrix
without copying the data into R memory. The
exported helpers mirror the structure of base R's prcomp()
while avoiding
the need to materialise large matrices.
Usage
resolve_big_pointer(x, arg, allow_null = FALSE)
pca_scores_bigmatrix(
xpMat,
rotation,
center,
scale,
ncomp = -1L,
block_size = 1024L
)
pca_variable_loadings(rotation, sdev)
pca_variable_correlations(rotation, sdev, column_sd, scale = NULL)
pca_variable_contributions(loadings)
pca_individual_contributions(scores, sdev, total_weight = NA_real_)
pca_individual_cos2(scores)
pca_variable_cos2(correlations)
## S3 method for class 'bigpca'
summary(object, ...)
## S3 method for class 'summary.bigpca'
print(x, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'bigpca'
plot(
x,
y,
type = c("scree", "contributions", "correlation_circle", "biplot"),
max_components = 25L,
component = 1L,
top_n = 20L,
components = c(1L, 2L),
data = NULL,
draw = TRUE,
...
)
Arguments
x |
A |
arg |
Character string naming the argument being validated. Used to construct informative error messages. |
allow_null |
Logical flag indicating whether |
xpMat |
Either a |
rotation |
A rotation matrix such as the |
center |
For |
scale |
Optional numeric vector of scaling factors returned by
|
ncomp |
Number of components to retain. Use a non-positive value to keep all components returned by the decomposition. |
block_size |
Number of rows to process per block when streaming data through BLAS kernels. Larger values improve throughput at the cost of additional memory. |
sdev |
A numeric vector of component standard deviations, typically the
|
column_sd |
A numeric vector with the marginal standard deviation of
each original variable. When |
loadings |
A numeric matrix such as the result of
|
scores |
For |
total_weight |
Optional positive scalar giving the effective number of
observations to use when computing contributions. Defaults to the number of
rows in |
correlations |
For |
object |
A |
... |
Additional arguments passed to plotting helpers. |
digits |
Number of significant digits to display when printing importance metrics. |
y |
Currently unused. |
type |
The plot to draw. Options include "scree" (variance explained), "contributions" (top contributing variables), "correlation_circle" (variable correlations with selected components), and "biplot" (joint display of scores and loadings). |
max_components |
Maximum number of components to display in scree plots. |
component |
Component index to highlight when drawing contribution plots. |
top_n |
Number of variables to display in contribution plots. |
components |
Length-two integer vector selecting the components for correlation circle and biplot views. |
data |
Optional data source (matrix, data frame, |
draw |
Logical; if |
Value
For pca_bigmatrix()
, a bigpca
object mirroring a prcomp
result
with elements sdev
, rotation
, optional center
and scale
vectors,
column_sd
, eigenvalues
, explained_variance
, cumulative_variance
, and
the sample covariance matrix. The object participates in S3 generics such as
summary()
and plot()
.
A numeric matrix of scores with rows corresponding to observations and columns to retained components.
A numeric matrix containing variable loadings for each component.
A numeric matrix of correlations between variables and components.
A numeric matrix where each entry represents the contribution of a variable to a component.
For summary.bigpca()
, a summary.bigpca
object containing
component importance measures.
Functions
-
pca_scores_bigmatrix()
: Project observations into principal component space while streaming from abig.matrix
. -
pca_variable_loadings()
: Compute variable loadings (covariances between original variables and components). -
pca_variable_correlations()
: Compute variable-component correlations given column standard deviations. -
pca_variable_contributions()
: Derive the relative contribution of each variable to the retained components. -
pca_individual_contributions()
: Compute the relative contribution of individual observations to each component. -
pca_individual_cos2()
: Compute squared cosine values measuring the quality of representation for individual observations. -
pca_variable_cos2()
: Compute squared cosine values measuring the quality of representation for variables. -
summary(bigpca)
: Summarise the component importance metrics for abigpca
result. -
print(summary.bigpca)
: Print the component importance summary produced bysummary.bigpca()
. -
plot(bigpca)
: Visualise PCA diagnostics such as scree, correlation circle, contribution, and biplot displays.
See Also
bigpca, pca_scores_bigmatrix()
, pca_variable_loadings()
,
pca_variable_correlations()
, pca_variable_contributions()
, and the
streaming variants pca_stream_bigmatrix()
and companions.
Examples
set.seed(123)
mat <- bigmemory::as.big.matrix(matrix(rnorm(40), nrow = 10))
pca <- pca_bigmatrix(mat, center = TRUE, scale = TRUE, ncomp = 3)
scores <- pca_scores_bigmatrix(mat, pca$rotation, pca$center, pca$scale, ncomp = 3)
loadings <- pca_variable_loadings(pca$rotation, pca$sdev)
correlations <- pca_variable_correlations(pca$rotation, pca$sdev, pca$column_sd, pca$scale)
contributions <- pca_variable_contributions(loadings)
list(scores = scores, loadings = loadings, correlations = correlations,
contributions = contributions)
PCA biplot helper
Description
Combines principal component scores and variable loadings in a single
scatter plot. The helper accepts both standard matrices and
bigmemory::big.matrix
inputs, extracting only the requested component
columns. When draw = TRUE
, the function scales the loadings to match the
score ranges, draws optional axes, overlays loading arrows, and labels
observations when requested.
Usage
pca_plot_biplot(
scores,
loadings,
components = c(1L, 2L),
draw = TRUE,
draw_axes = TRUE,
draw_arrows = TRUE,
label_points = FALSE,
...
)
Arguments
scores |
Matrix or |
loadings |
Matrix or |
components |
Integer vector of length two selecting the components to display. |
draw |
Logical; set to |
draw_axes |
Logical; when |
draw_arrows |
Logical; when |
label_points |
Logical; when |
... |
Additional graphical parameters passed to |
Value
A list containing the selected components
, extracted scores
,
original loadings
, scaled loadings (loadings_scaled
), and the applied
scale_factor
. The list is returned invisibly. When draw = TRUE
, a biplot
is produced using base graphics.
Plot variable contributions
Description
Highlights the variables that contribute most to a selected principal
component. The helper works with dense matrices returned by
pca_variable_contributions()
as well as with bigmemory::big.matrix
objects via sampling.
Usage
pca_plot_contributions(
contributions,
component = 1L,
top_n = 20L,
draw = TRUE,
...
)
Arguments
contributions |
Contribution matrix where rows correspond to variables and columns to components. |
component |
Integer index of the component to visualise. |
top_n |
Number of variables with the largest absolute contribution to include in the bar plot. |
draw |
Logical; set to |
... |
Additional arguments passed to |
Value
A data frame with the variables and their contributions is returned
invisibly. When draw = TRUE
, a bar plot of the top variables is produced.
Plot a PCA correlation circle
Description
Visualises the correlation between each variable and a pair of principal components. The variables are projected onto the unit circle, where points near the perimeter indicate strong correlation with the selected components.
Usage
pca_plot_correlation_circle(
correlations,
components = c(1L, 2L),
labels = NULL,
draw = TRUE,
...
)
Arguments
correlations |
Matrix or |
components |
Length-two integer vector specifying the principal components to display. |
labels |
Optional character vector specifying the labels to display for
each variable. When |
draw |
Logical; set to |
... |
Additional graphical parameters passed to |
Value
A data frame with variable
, PCx
, and PCy
columns representing the
projected correlations, where PCx
/PCy
correspond to the requested
component indices. The data frame is returned invisibly.
Plot sampled PCA scores
Description
Streams a subset of observations through the PCA rotation and plots their scores on the requested components. Sampling keeps the drawn subset small so graphics remain interpretable even when the source big matrix contains millions of rows.
Usage
pca_plot_scores(
x,
rotation,
center = numeric(),
scale = numeric(),
components = c(1L, 2L),
max_points = 5000L,
sample = c("uniform", "head"),
seed = NULL,
draw = TRUE,
...
)
Arguments
x |
Either a |
rotation |
A rotation matrix such as |
center |
Optional centering vector. Use |
scale |
Optional scaling vector. Use |
components |
Length-two integer vector selecting the principal components to display. |
max_points |
Maximum number of observations to sample for the plot. |
sample |
Strategy for selecting rows. |
seed |
Optional seed to make the sampling reproducible. |
draw |
Logical; set to |
... |
Additional graphical parameters forwarded to |
Value
A list containing indices
(the sampled row indices) and scores
(the corresponding score matrix) is returned invisibly. When draw = TRUE
a scatter plot is produced.
Scree plot for principal component importance
Description
Displays the proportion of variance explained by the leading principal components. The function caps the number of displayed components to keep the visualization legible on very high-dimensional problems.
Usage
pca_plot_scree(
pca_result,
max_components = 25L,
cumulative = TRUE,
draw = TRUE,
...
)
Arguments
pca_result |
A list created by |
max_components |
Maximum number of components to display. Defaults to 25 or the available number of components, whichever is smaller. |
cumulative |
Logical flag indicating whether to overlay the cumulative explained variance line. |
draw |
Logical; set to |
... |
Additional parameters passed to |
Value
A list with component
, explained
, and cumulative
vectors is
returned invisibly. When draw = TRUE
, the function produces a scree plot
using base graphics.
Plot PCA diagnostics for big data workflows
Description
These helpers visualise the results returned by pca_bigmatrix()
and its
companions without requiring users to materialise dense intermediate
structures. Each plotting function optionally samples the inputs so the
default output remains responsive even when the underlying big matrix spans
millions of observations.
See Also
pca_bigmatrix()
, pca_variable_loadings()
,
pca_variable_contributions()
Robust principal component analysis
Description
Compute principal component analysis (PCA) using robust measures of location and scale so that extreme observations have a reduced influence on the resulting components. The implementation centres each variable by its median and, when requested, scales by the median absolute deviation (MAD) before performing an iteratively reweighted singular value decomposition that down-weights observations with unusually large reconstruction errors.
Usage
pca_robust(x, center = TRUE, scale = FALSE, ncomp = NULL)
Arguments
x |
A numeric matrix, data frame, or an object coercible to a numeric matrix. Missing values are not supported. |
center |
Logical; should variables be centred by their median before applying PCA? |
scale |
Logical; when |
ncomp |
Number of components to retain. Use |
Value
A bigpca
object mirroring the structure of
pca_bigmatrix()
with robust estimates of location, scale, and
variance metrics.
Examples
set.seed(42)
x <- matrix(rnorm(50), nrow = 10)
x[1, 1] <- 25 # outlier
robust <- pca_robust(x, ncomp = 2)
robust$sdev
Scalable principal component analysis via streaming power iterations
Description
Implements the scalable PCA (sPCA) procedure of Elgamal et al. (2015), which uses block power iterations to approximate the leading principal components while streaming the data in manageable chunks. The algorithm only requires matrix-vector products, allowing large matrices to be processed without materialising the full cross-product in memory.
Implements the scalable PCA (sPCA) procedure of Elgamal et al. (2015), which uses block power iterations to approximate the leading principal components while streaming the data in manageable chunks. The algorithm only requires matrix-vector products, allowing large matrices to be processed without materialising the full cross-product in memory.
Implements the scalable PCA (sPCA) procedure of Elgamal et al. (2015), which uses block power iterations to approximate the leading principal components while streaming the data in manageable chunks. The algorithm only requires matrix-vector products, allowing large matrices to be processed without materialising the full cross-product in memory.
Usage
pca_spca(
x,
ncomp = NULL,
center = TRUE,
scale = FALSE,
block_size = 2048L,
max_iter = 50L,
tol = 1e-04,
seed = NULL,
return_scores = FALSE,
verbose = FALSE
)
pca_spca(
x,
ncomp = NULL,
center = TRUE,
scale = FALSE,
block_size = 2048L,
max_iter = 50L,
tol = 1e-04,
seed = NULL,
return_scores = FALSE,
verbose = FALSE
)
pca_spca_R(
x,
ncomp = NULL,
center = TRUE,
scale = FALSE,
block_size = 2048L,
max_iter = 50L,
tol = 1e-04,
seed = NULL,
return_scores = FALSE,
verbose = FALSE
)
pca_spca(
x,
ncomp = NULL,
center = TRUE,
scale = FALSE,
block_size = 2048L,
max_iter = 50L,
tol = 1e-04,
seed = NULL,
return_scores = FALSE,
verbose = FALSE
)
pca_spca_R(
x,
ncomp = NULL,
center = TRUE,
scale = FALSE,
block_size = 2048L,
max_iter = 50L,
tol = 1e-04,
seed = NULL,
return_scores = FALSE,
verbose = FALSE
)
Arguments
x |
A numeric matrix, data frame, |
ncomp |
Number of principal components to retain. Use |
center |
Logical; should column means be subtracted before performing PCA? |
scale |
Logical; when |
block_size |
Number of rows to stream per block when computing column statistics and matrix-vector products. |
max_iter |
Maximum number of block power iterations. |
tol |
Convergence tolerance applied to the Frobenius norm of the difference between successive subspace projectors. |
seed |
Optional integer seed used to initialise the random starting basis. |
return_scores |
Logical; when |
verbose |
Logical; when |
Value
A bigpca
object containing the approximate PCA solution with the
same structure as pca_bigmatrix()
. The result includes component standard
deviations, rotation/loadings, optional scores, column statistics, and
variance summaries. Additional metadata is stored in
attr(result, "iterations")
(number of iterations performed),
attr(result, "tolerance")
(requested tolerance), and
attr(result, "converged")
(logical convergence flag).
A bigpca
object containing the approximate PCA solution with the
same structure as pca_bigmatrix()
. The result includes component standard
deviations, rotation/loadings, optional scores, column statistics, and
variance summaries. Additional metadata is stored in
attr(result, "iterations")
(number of iterations performed),
attr(result, "tolerance")
(requested tolerance), and
attr(result, "converged")
(logical convergence flag).
A bigpca
object containing the approximate PCA solution with the
same structure as pca_bigmatrix()
. The result includes component standard
deviations, rotation/loadings, optional scores, column statistics, and
variance summaries. Additional metadata is stored in
attr(result, "iterations")
(number of iterations performed),
attr(result, "tolerance")
(requested tolerance), and
attr(result, "converged")
(logical convergence flag).
References
Tarek Elgamal, Maysam Yabandeh, Ashraf Aboulnaga, Waleed Mustafa, and Mohamed Hefeeda (2015). sPCA: Scalable Principal Component Analysis for Big Data on Distributed Platforms. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. doi:10.1145/2723372.2751520.
Tarek Elgamal, Maysam Yabandeh, Ashraf Aboulnaga, Waleed Mustafa, and Mohamed Hefeeda (2015). sPCA: Scalable Principal Component Analysis for Big Data on Distributed Platforms. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. doi:10.1145/2723372.2751520.
Tarek Elgamal, Maysam Yabandeh, Ashraf Aboulnaga, Waleed Mustafa, and Mohamed Hefeeda (2015). sPCA: Scalable Principal Component Analysis for Big Data on Distributed Platforms. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. doi:10.1145/2723372.2751520.
Streaming big.matrix PCA helpers
Description
Variants of the PCA helpers that stream results directly into
bigmemory::big.matrix
objects, enabling file-backed workflows without
materialising dense R matrices.
Usage
pca_spca_stream_bigmatrix(
xpMat,
xpRotation = NULL,
center = TRUE,
scale = FALSE,
ncomp = -1L,
block_size = 2048L,
max_iter = 50L,
tol = 1e-04,
seed = NULL,
return_scores = FALSE,
verbose = FALSE
)
pca_scores_stream_bigmatrix(
xpMat,
xpDest,
rotation,
center,
scale,
ncomp = -1L,
block_size = 1024L
)
pca_variable_loadings_stream_bigmatrix(xpRotation, sdev, xpDest)
pca_variable_correlations_stream_bigmatrix(
xpRotation,
sdev,
column_sd,
scale = NULL,
xpDest
)
pca_variable_contributions_stream_bigmatrix(xpLoadings, xpDest)
Arguments
xpMat |
Either a |
xpRotation |
For |
center |
For |
scale |
Optional numeric vector of scaling factors returned by
|
ncomp |
Number of components to retain. Use a non-positive value to keep all components returned by the decomposition. |
block_size |
Number of rows to process per block when streaming data through BLAS kernels. Larger values improve throughput at the cost of additional memory. |
max_iter |
Maximum number of block power iterations. |
tol |
Convergence tolerance applied to the Frobenius norm of the difference between successive subspace projectors. |
seed |
Optional integer seed used to initialise the random starting basis. |
return_scores |
Logical; when |
verbose |
Logical; when |
xpDest |
Either a |
rotation |
A rotation matrix such as the |
sdev |
A numeric vector of component standard deviations, typically the
|
column_sd |
A numeric vector of variable standard deviations used to scale the correlations when the PCA was performed on unscaled data. |
xpLoadings |
For |
Value
For pca_stream_bigmatrix()
, the same bigpca
object as
pca_bigmatrix()
with the
addition of a rotation_stream_bigmatrix
element referencing the populated
big.matrix
when xpRotation
is supplied. For
pca_spca_stream_bigmatrix()
, the same scalable PCA structure as
pca_spca()
with the optional pointer populated when provided.
The external pointer supplied in xpDest
, invisibly.
Functions
-
pca_scores_stream_bigmatrix()
: Stream PCA scores into a destination big.matrix. -
pca_variable_loadings_stream_bigmatrix()
: Populate big.matrix objects with derived variable diagnostics. -
pca_variable_correlations_stream_bigmatrix()
: Stream variable correlations into a destination big.matrix. -
pca_variable_contributions_stream_bigmatrix()
: Stream variable contributions into a destination big.matrix.
Examples
set.seed(456)
mat <- bigmemory::as.big.matrix(matrix(rnorm(30), nrow = 6))
ncomp <- 2
rotation_store <- bigmemory::big.matrix(ncol(mat), ncomp, type = "double")
pca_stream <- pca_stream_bigmatrix(mat, xpRotation = rotation_store, ncomp = ncomp)
score_store <- bigmemory::big.matrix(nrow(mat), ncomp, type = "double")
pca_scores_stream_bigmatrix(
mat,
score_store,
pca_stream$rotation,
pca_stream$center,
pca_stream$scale,
ncomp = ncomp
)
loadings_store <- bigmemory::big.matrix(ncol(mat), ncomp, type = "double")
pca_variable_loadings_stream_bigmatrix(
pca_stream$rotation_stream_bigmatrix,
pca_stream$sdev,
loadings_store
)
correlation_store <- bigmemory::big.matrix(ncol(mat), ncomp, type = "double")
pca_variable_correlations_stream_bigmatrix(
pca_stream$rotation_stream_bigmatrix,
pca_stream$sdev,
pca_stream$column_sd,
pca_stream$scale,
correlation_store
)
contribution_store <- bigmemory::big.matrix(ncol(mat), ncomp, type = "double")
pca_variable_contributions_stream_bigmatrix(
loadings_store,
contribution_store
)
Supplementary individual diagnostics
Description
Compute principal component scores and quality metrics for supplementary individuals (rows) projected into an existing PCA solution.
Usage
pca_supplementary_individuals(
data,
rotation,
sdev,
center = NULL,
scale = NULL,
total_weight = NA_real_
)
Arguments
data |
Matrix-like object whose rows correspond to supplementary individuals and columns to the original variables. |
rotation |
Rotation matrix from the PCA model (e.g. the |
sdev |
Numeric vector of component standard deviations associated with
|
center |
Optional numeric vector giving the centring applied to each variable when fitting the PCA. Defaults to zero centring. |
scale |
Optional numeric vector describing the scaling applied to each
variable when fitting the PCA. When |
total_weight |
Optional positive scalar passed to
|
Value
A list with elements scores
, contributions
, and cos2
.
Supplementary variable diagnostics
Description
Compute loadings, correlations, contributions, and cos^2 values for supplementary variables (columns) given component scores for the active individuals.
Usage
pca_supplementary_variables(data, scores, sdev, center = NULL)
Arguments
data |
Matrix-like object whose columns correspond to supplementary variables measured on the active individuals. |
scores |
Numeric matrix of component scores for the active individuals. |
sdev |
Numeric vector of component standard deviations associated with
|
center |
Optional numeric vector specifying the centring to apply to each
supplementary variable. When |
Value
A list with elements loadings
, correlations
, contributions
, and
cos2
.
Prepare iteratively reweighted singular value decomposition
Description
Internal helper used by pca_robust()
to compute a singular value
decomposition that is less sensitive to individual rows with extreme values.
The routine alternates between computing the SVD of a row-weighted matrix and
updating the weights via a Huber-type scheme based on the reconstruction
residuals.
Usage
prepare_svd_robust_input(x, ncomp, max_iter, tol, huber_k)
Arguments
x |
Numeric matrix for which the decomposition should be computed. |
ncomp |
Number of leading components to retain. |
max_iter |
Maximum number of reweighting iterations. |
tol |
Convergence tolerance applied to successive changes in the row weights and singular values. |
huber_k |
Tuning constant controlling the aggressiveness of the Huber weight function. Larger values down-weight fewer observations. |
Value
A list containing x, n, p, ncomp, max_iter, tol and huber_k.
Singular value decomposition for bigmemory::big.matrix
inputs
Description
Compute the singular value decomposition (SVD) of a
bigmemory::big.matrix
without materialising it as a base R matrix.
Blocks of rows are streamed through BLAS before LAPACK is invoked so that
even moderately large matrices can be decomposed efficiently.
Usage
svd_bigmatrix(
xpMat,
nu = -1L,
nv = -1L,
block_size = 1024L,
method = c("dgesdd", "dgesvd")
)
Arguments
xpMat |
Either a |
nu |
Number of left singular vectors to return. Use a negative value to
request the default of |
nv |
Number of right singular vectors to return. Use a negative value to
request the default of |
block_size |
Number of rows to process per block when streaming data into BLAS kernels. Larger values can improve throughput at the cost of additional temporary memory. |
method |
LAPACK backend used to compute the decomposition. The default
uses the divide-and-conquer routine |
Value
A list with components u
, d
, and v
analogous to base R's
svd()
output. When nu
or nv
are zero the corresponding matrix has
zero columns.
Examples
set.seed(42)
mat <- bigmemory::as.big.matrix(matrix(rnorm(20), nrow = 5))
svd_res <- svd_bigmatrix(mat, nu = 2, nv = 2)
svd_res$d
Robust singular value decomposition (C++ backend)
Description
Compute the iteratively reweighted SVD using the high-performance C++
implementation. The interface mirrors svd_robust_R()
while delegating the
heavy lifting to compiled code.
Usage
svd_robust(
x,
ncomp,
max_iter = 25L,
tol = sqrt(.Machine$double.eps),
huber_k = 1.345
)
Arguments
x |
Numeric matrix for which the decomposition should be computed. |
ncomp |
Number of leading components to retain. |
max_iter |
Maximum number of reweighting iterations. |
tol |
Convergence tolerance applied to successive changes in the row weights and singular values. |
huber_k |
Tuning constant controlling the aggressiveness of the Huber weight function. Larger values down-weight fewer observations. |
Value
A list containing the left and right singular vectors (u
and v
),
the singular values (d
), the final row weights (weights
), and the number
of iterations required for convergence (iterations
).
Iteratively reweighted singular value decomposition
Description
Internal helper used by pca_robust()
to compute a singular value
decomposition that is less sensitive to individual rows with extreme values.
The routine alternates between computing the SVD of a row-weighted matrix and
updating the weights via a Huber-type scheme based on the reconstruction
residuals.
Usage
svd_robust_R(
x,
ncomp,
max_iter = 25L,
tol = sqrt(.Machine$double.eps),
huber_k = 1.345
)
Arguments
x |
Numeric matrix for which the decomposition should be computed. |
ncomp |
Number of leading components to retain. |
max_iter |
Maximum number of reweighting iterations. |
tol |
Convergence tolerance applied to successive changes in the row weights and singular values. |
huber_k |
Tuning constant controlling the aggressiveness of the Huber weight function. Larger values down-weight fewer observations. |
Value
A list containing the left and right singular vectors (u
and v
),
the singular values (d
), the final row weights (weights
), and the number
of iterations required for convergence (iterations
). The structure mirrors
base R's base::svd()
output with additional metadata.