| Title: | Competitive Adaptive Reweighted Sampling (CARS) Algorithm |
| Version: | 0.5.0 |
| Maintainer: | Md. Ashraful Haque <ashrafulhaque664@gmail.com> |
| Description: | Implements the Competitive Adaptive Reweighted Sampling (CARS) algorithm for variable selection from high-dimensional datasets using Partial Least Squares (PLS) regression models. CARS iteratively applies Monte Carlo sub-sampling and exponential variable elimination to identify the most informative variables/features, choosing the subset with the minimal cross-validated RMSE. The implementation is inspired by the work of Li et al. (2009) <doi:10.1016/j.aca.2009.06.046>. The algorithm is widely applied in near-infrared (NIR), mid-infrared (MIR), and hyperspectral chemometrics, among other areas. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 4.1.0) |
| Imports: | ggplot2, pls, rlang, stats, utils |
| URL: | https://github.com/mah-iasri/carsAlgo |
| BugReports: | https://github.com/mah-iasri/carsAlgo/issues |
| NeedsCompilation: | no |
| Packaged: | 2026-04-10 13:44:44 UTC; Ashraful |
| Author: | Md. Ashraful Haque [aut, cre], Avijit Ghosh [aut], Sayantani Karmakar [aut], Harsh Sachan [aut], Shalini Kumari [aut] |
| Repository: | CRAN |
| Date/Publication: | 2026-04-16 10:12:11 UTC |
Adaptive Reweighted Sampling (Internal)
Description
Samples num_features variable indices without replacement, with
probabilities proportional to the absolute values of PLS regression
coefficients. Falls back to uniform probabilities if weights are all
zero, non-finite, or contain NA.
Usage
.adaptive_reweighted_sampling(weights, num_features)
Arguments
weights |
Numeric vector of PLS regression coefficients. |
num_features |
Integer. Number of variable indices to select. |
Value
Integer vector of selected indices (positions within weights).
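The sampling rule above can be sketched in a few lines of base R. This is a minimal illustration of the documented behaviour, not the package's internal code; the name ars_sketch is hypothetical, and capping the draw at the number of non-zero weights is an added assumption, because sample() without replacement cannot draw zero-probability items.

```r
# Hypothetical sketch of adaptive reweighted sampling (not the package source)
ars_sketch <- function(weights, num_features) {
  w <- abs(weights)
  # Documented fallback: uniform probabilities if weights are NA, non-finite or all zero
  if (anyNA(w) || any(!is.finite(w)) || all(w == 0)) {
    w <- rep(1, length(w))
  }
  # Assumption: never request more indices than there are positive weights
  size <- min(num_features, sum(w > 0))
  # Draw without replacement, probability proportional to |coefficient|
  sample(seq_along(w), size = size, replace = FALSE, prob = w / sum(w))
}
```

Variables with zero coefficients are never drawn, so larger coefficients "compete" more successfully for the limited number of slots.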
Exponential Decreasing Schedule (Internal)
Description
Computes the number of variables to retain at CARS iteration k using
the schedule \left\lceil \frac{n_{feat}}{2^{b}} \cdot e^{-k/b} \right\rceil.
Usage
.exponential_decreasing_function(k, n_feat, b = 2)
Arguments
k |
Integer. Current iteration index (0-based). |
n_feat |
Integer. Total number of features at the start of CARS. |
b |
Numeric. Decay rate parameter. Default 2. |
Value
Integer. Number of features to retain at this iteration.
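The schedule can be evaluated directly to see how fast the active set shrinks. The helper name exp_schedule_sketch is hypothetical; it simply transcribes the formula above with the same default b = 2.

```r
# Hypothetical transcription of the documented schedule:
# ceiling((n_feat / 2^b) * exp(-k / b)), with k a 0-based iteration index
exp_schedule_sketch <- function(k, n_feat, b = 2) {
  as.integer(ceiling((n_feat / 2^b) * exp(-k / b)))
}

# Retained counts for 200 features over the first few iterations
sapply(0:4, exp_schedule_sketch, n_feat = 200)
```

With 200 features and b = 2 the first iteration keeps 50 variables (200 / 4), and the count then decays exponentially, reaching 7 at k = 4.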
Determine Safe Number of PLS Components (Internal)
Description
Returns the largest safe number of PLS latent components given the
current sample size and feature count, capped at max_components.
Ensures the value is at least 1 to avoid degenerate PLS models.
Usage
.get_optimal_components(n_samples, n_features, max_components)
Arguments
n_samples |
Integer. Number of available (calibration) samples. |
n_features |
Integer. Number of features in the current subset. |
max_components |
Integer. Hard upper bound supplied by the user. |
Value
Integer. Number of PLS components to use.
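The exact "safe" rule is not spelled out above. A common choice, assumed here and not confirmed by the package, is to use no more components than there are features and no more than one fewer than the number of samples, then apply the user cap and a floor of 1. The function name safe_ncomp is hypothetical.

```r
# Hypothetical rule (assumption): ncomp <= n_features and ncomp <= n_samples - 1,
# capped at max_components and never below 1
safe_ncomp <- function(n_samples, n_features, max_components) {
  as.integer(max(1, min(max_components, n_features, n_samples - 1)))
}
```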
RMSECV via Manual k-Fold Cross-Validation (Internal)
Description
Fits a PLS model using manual k-fold cross-validation and returns the
Root Mean Square Error of Cross-Validation (RMSECV). Uses
plsr with the "kernelpls" algorithm.
Returns Inf if all folds fail or produce no predictions.
Usage
.pls_rmsecv(X_sel, y, ncomp, n_folds, seed = NULL)
Arguments
X_sel |
Numeric matrix. Feature matrix for all samples (n_samples x selected features). |
y |
Numeric vector. Response variable (length n_samples). |
ncomp |
Integer. Number of PLS latent components to use. |
n_folds |
Integer. Number of cross-validation folds. |
seed |
Integer or NULL (the default). |
Value
Numeric scalar. RMSECV value; Inf if computation fails.
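A manual k-fold loop of this kind can be sketched as follows. Assumptions: folds are assigned at random, the RMSE is pooled over all out-of-fold predictions (rather than averaged per fold), and the model is passed in as a fit_predict function so the harness can be tried without the pls package. The helper names kfold_rmsecv and make_pls_fit_predict are hypothetical; the PLS variant uses pls::plsr with method = "kernelpls" as described above.

```r
# Hypothetical sketch of manual k-fold RMSECV (not the package's internal code).
# fit_predict(X_train, y_train, X_test) must return numeric predictions for X_test.
kfold_rmsecv <- function(X, y, n_folds, fit_predict, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # note: resets the global RNG stream
  n <- nrow(X)
  # Assign each sample to a fold at random, as evenly as possible
  folds <- sample(rep_len(seq_len(n_folds), n))
  preds <- rep(NA_real_, n)
  for (f in seq_len(n_folds)) {
    test <- which(folds == f)
    train <- which(folds != f)
    p <- tryCatch(
      fit_predict(X[train, , drop = FALSE], y[train], X[test, , drop = FALSE]),
      error = function(e) NULL  # a failed fold contributes no predictions
    )
    if (!is.null(p)) preds[test] <- as.numeric(p)
  }
  ok <- !is.na(preds)
  if (!any(ok)) return(Inf)  # documented behaviour: Inf when nothing was predicted
  sqrt(mean((y[ok] - preds[ok])^2))
}

# PLS predictor built on pls::plsr with the "kernelpls" algorithm (requires 'pls')
make_pls_fit_predict <- function(ncomp) {
  function(X_train, y_train, X_test) {
    train_df <- data.frame(y = y_train, X = I(X_train))
    fit <- pls::plsr(y ~ X, ncomp = ncomp, data = train_df, method = "kernelpls")
    as.numeric(predict(fit, newdata = data.frame(X = I(X_test)), ncomp = ncomp))
  }
}
```

For example, kfold_rmsecv(X_sel, y, 5, make_pls_fit_predict(3), seed = 42) would mirror the call shape of .pls_rmsecv, assuming pls is installed.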
Monte Carlo Runs for One CARS Iteration (Internal)
Description
Executes all N Monte Carlo sub-sampling runs for a single CARS iteration.
In each run:
A random 80% subset of the samples is drawn.
A PLS model is fitted and regression coefficients are extracted.
Feature indices are selected via Adaptive Reweighted Sampling (ARS).
The selected subset is evaluated via k-fold RMSECV on the full data.
Returns the feature subset and RMSECV of the best-performing run, or
NULL if every run fails.
Usage
.run_monte_carlo(
X,
y,
current_features,
n_select,
max_components,
iteration,
N,
cv_folds,
random_state
)
Arguments
X |
Numeric matrix. Full predictor matrix (n_samples x n_features). |
y |
Numeric vector. Response variable (length n_samples). |
current_features |
Integer vector. Active feature indices (1-based) for this iteration. |
n_select |
Integer. Target number of features to select. |
max_components |
Integer. PLS component cap (from user). |
iteration |
Integer. Current CARS iteration index (used for deterministic per-run seed offsets). |
N |
Integer. Number of Monte Carlo runs to execute. |
cv_folds |
Integer. Number of cross-validation folds. |
random_state |
Integer. Base random seed, taken from the random_state setting of the CARSAlgorithm object. |
Value
A named list with two elements:
features: Integer vector of selected feature indices from the best run.
rmsecv: Numeric. RMSECV of the best run.
Returns NULL if no run succeeds.
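The per-iteration loop described above can be sketched as below. This is an illustration under stated assumptions, not the package source: the model fit and the scoring step are passed in as coef_fun and score_fun (hypothetical names) so the loop can be exercised without PLS, the per-run seed offsets based on iteration are omitted in favour of a single optional seed, and a run whose score is not finite is treated as failed.

```r
# Hypothetical sketch of one CARS iteration's Monte Carlo runs.
# coef_fun(X, y): one coefficient per column of X; score_fun(X, y): lower is better.
monte_carlo_iteration <- function(X, y, current_features, n_select, N,
                                  coef_fun, score_fun, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(X)
  best <- NULL
  for (run in seq_len(N)) {
    res <- tryCatch({
      # 1. Random 80% calibration subset of the samples
      cal <- sample(n, size = floor(0.8 * n))
      # 2. Fit on the subset, restricted to the active features
      b <- coef_fun(X[cal, current_features, drop = FALSE], y[cal])
      # 3. Adaptive reweighted sampling with probabilities proportional to |b|
      w <- abs(b)
      if (anyNA(w) || any(!is.finite(w)) || all(w == 0)) w <- rep(1, length(w))
      size <- min(n_select, sum(w > 0))
      picked <- current_features[sample(length(w), size, prob = w)]
      # 4. Score the chosen subset on the full data
      list(features = picked, rmsecv = score_fun(X[, picked, drop = FALSE], y))
    }, error = function(e) NULL)
    # Keep the run with the lowest finite score
    if (!is.null(res) && is.finite(res$rmsecv) &&
        (is.null(best) || res$rmsecv < best$rmsecv)) {
      best <- res
    }
  }
  best  # NULL if every run failed
}
```

Passing the model in as functions keeps the Monte Carlo bookkeeping separate from the PLS details; in the package itself both roles are played by PLS fits.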
Create an object of the CARS algorithm
Description
The CARSAlgorithm() function creates a configuration object for the
Competitive Adaptive Reweighted Sampling (CARS) algorithm. Pass this object
to fit.CARSAlgorithm to run variable selection on a high-dimensional dataset.
Usage
CARSAlgorithm(max_iter = 100, N = 50, cv_folds = 5, random_state = 42)
Arguments
max_iter |
Maximum number of CARS iterations. Default 100. |
N |
Number of Monte Carlo sub-sampling runs per iteration. Default 50. |
cv_folds |
Number of folds for k-fold cross-validation. Default 5. |
random_state |
Integer seed for reproducibility. Default 42. |
Value
An object of class "CARSAlgorithm": a named list of
hyperparameters to be passed to fit.CARSAlgorithm.
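A configuration object of this kind is an ordinary S3 list. The constructor below is a hypothetical sketch (new_cars_config is not a package function) that mirrors the documented defaults; the integer coercion and the input checks are assumptions added for illustration.

```r
# Hypothetical constructor mirroring the documented defaults (not the package source)
new_cars_config <- function(max_iter = 100, N = 50, cv_folds = 5, random_state = 42) {
  # Assumed sanity checks on the hyperparameters
  stopifnot(max_iter >= 1, N >= 1, cv_folds >= 2)
  structure(
    list(max_iter = as.integer(max_iter), N = as.integer(N),
         cv_folds = as.integer(cv_folds), random_state = as.integer(random_state)),
    class = "CARSAlgorithm"  # class attribute drives S3 dispatch for fit() and print()
  )
}
```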
See Also
Examples
cars_obj <- CARSAlgorithm(max_iter = 20, N = 30, cv_folds = 5)
cars_obj
Fit a Model Object to Data
Description
Generic function for fitting model objects to data. Methods are
dispatched based on the class of cars_obj.
Usage
fit(cars_obj, ...)
Arguments
cars_obj |
A model configuration object (e.g., a CARSAlgorithm object). |
... |
Additional arguments passed to the specific method. |
Value
Depends on the method. See fit.CARSAlgorithm.
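S3 dispatch on the first argument works as sketched below. The toy_model class and both methods are hypothetical and exist only to show how fit() routes a call to fit.<class>(); in the package, a CARSAlgorithm object is routed to fit.CARSAlgorithm in the same way.

```r
# Generic: dispatches on the class of cars_obj
fit <- function(cars_obj, ...) UseMethod("fit")

# Fallback for classes without a method (hypothetical)
fit.default <- function(cars_obj, ...) {
  stop("no fit() method for class ", paste(class(cars_obj), collapse = "/"))
}

# Toy method for a hypothetical class: index of the column most correlated with y
fit.toy_model <- function(cars_obj, X, y, ...) {
  which.max(abs(cor(X, y)))
}
```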
See Also
Fit a CARS Object to a High-Dimensional Dataset
Description
Applies the CARS algorithm to a high-dimensional data matrix X and
response vector y, iteratively selecting the optimal variable subset
via Monte Carlo sub-sampling, PLS regression, and adaptive reweighted sampling.
Usage
## S3 method for class 'CARSAlgorithm'
fit(cars_obj, X, y, max_components = 10L, plot = TRUE, plot_path = NULL, ...)
Arguments
cars_obj |
A CARSAlgorithm object created by CARSAlgorithm(). |
X |
Numeric matrix of predictors (n_samples x n_features). |
y |
Numeric response vector of length n_samples. |
max_components |
Integer cap on PLS latent components. Default 10. |
plot |
Logical. Whether to display and save the RMSECV curve. Default TRUE. |
plot_path |
File path for saving the RMSECV plot. Default NULL. |
... |
Currently unused. |
Details
This function iteratively:
Sub-samples the calibration set (Monte Carlo, N runs per iteration).
Fits a PLS model and extracts regression coefficients.
Selects variables by Adaptive Reweighted Sampling (ARS), with probabilities proportional to absolute coefficient magnitude.
Evaluates the subset via k-fold cross-validation (RMSECV).
Retains the best subset and repeats with an exponentially shrinking variable set.
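The iteration described above can be sketched as an outer driver that applies the exponential schedule, hands each iteration to a Monte Carlo step, and records the histories returned in the Value list. This is a hypothetical sketch, not the package source: run_iteration stands in for the Monte Carlo step, and stopping once a single feature remains is an assumed rule.

```r
# Hypothetical outer CARS loop (not the package source).
# run_iteration(current, n_select, k) returns list(features, rmsecv) or NULL.
cars_outer_loop <- function(n_feat, max_iter, run_iteration, b = 2) {
  current <- seq_len(n_feat)
  best <- list(features = integer(0), rmsecv = Inf)
  rmsecv_history <- numeric(0)
  num_features_history <- integer(0)
  for (k in seq_len(max_iter) - 1L) {
    # Documented schedule: ceiling((n_feat / 2^b) * exp(-k / b))
    n_select <- ceiling((n_feat / 2^b) * exp(-k / b))
    res <- run_iteration(current, n_select, k)
    if (is.null(res)) break  # every Monte Carlo run failed
    # This iteration's best subset becomes the active set for the next one
    current <- res$features
    rmsecv_history <- c(rmsecv_history, res$rmsecv)
    num_features_history <- c(num_features_history, length(current))
    if (res$rmsecv < best$rmsecv) best <- res
    # Assumed stopping rule: nothing left to eliminate
    if (length(current) <= 1) break
  }
  list(best_features = sort(best$features), best_rmsecv = best$rmsecv,
       rmsecv_history = rmsecv_history,
       num_features_history = num_features_history)
}
```

Note that the best subset overall is tracked separately from the shrinking active set, so a later, smaller subset only replaces it when its RMSECV is lower.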
Value
A named list with:
best_features: Sorted 1-based column indices of the selected features.
best_rmsecv: Lowest RMSECV achieved across all iterations.
rmsecv_history: Numeric vector of the best RMSECV per iteration.
num_features_history: Integer vector of the feature count per iteration.
plot: A ggplot2 object of the RMSECV curve.
See Also
Examples
set.seed(1)
X <- matrix(rnorm(100 * 200), nrow = 100)
y <- X[, 5] * 2 + X[, 50] * -1.5 + rnorm(100, sd = 0.5)
cars_obj <- CARSAlgorithm(max_iter = 15, N = 30, cv_folds = 5)
result <- fit(cars_obj, X, y, max_components = 8)
cat("Best RMSECV :", result$best_rmsecv, "\n")
cat("Selected features:", result$best_features, "\n")
Print method for CARSAlgorithm objects
Description
Print method for CARSAlgorithm objects
Usage
## S3 method for class 'CARSAlgorithm'
print(x, ...)
Arguments
x |
A CARSAlgorithm object. |
... |
Ignored. |
Value
No return value; called for side effects.
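A print method for such an object can be sketched as below. The output layout is hypothetical and not the package's actual format; returning invisible(x) is the usual R convention for print methods and is an assumption here, since the page above states no return value.

```r
# Hypothetical print method for a CARSAlgorithm configuration list
print.CARSAlgorithm <- function(x, ...) {
  cat("CARSAlgorithm configuration\n")
  # One aligned line per hyperparameter
  for (nm in names(x)) cat(sprintf("  %-12s: %s\n", nm, format(x[[nm]])))
  invisible(x)
}
```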