| Title: | Probabilistic Efficiency Analysis Using Explainable Artificial Intelligence |
| Version: | 0.1.0 |
| Description: | Provides a probabilistic framework that integrates Data Envelopment Analysis (DEA) (Banker et al., 1984) <doi:10.1287/mnsc.30.9.1078> with machine learning classifiers (Kuhn, 2008) <doi:10.18637/jss.v028.i05> to estimate both the (in)efficiency status and the probability of efficiency for decision-making units. The approach trains predictive models on DEA-derived efficiency labels (Charnes et al., 1985) <doi:10.1016/0304-4076(85)90133-2>, enabling explainable artificial intelligence (XAI) workflows with global and local interpretability tools, including permutation importance (Molnar et al., 2018) <doi:10.21105/joss.00786>, Shapley value explanations (Strumbelj & Kononenko, 2014) <doi:10.1007/s10115-013-0679-x>, and sensitivity analysis (Cortez, 2011) https://CRAN.R-project.org/package=rminer. The framework also supports probability-threshold peer selection and counterfactual improvement recommendations for benchmarking and policy evaluation. The probabilistic efficiency framework is detailed in González-Moyano et al. (2025) "Probability-based Technical Efficiency Analysis through Machine Learning", in review for publication. |
| License: | GPL-3 |
| URL: | https://github.com/rgonzalezmoyano/PEAXAI |
| BugReports: | https://github.com/rgonzalezmoyano/PEAXAI/issues |
| Encoding: | UTF-8 |
| Language: | en |
| RoxygenNote: | 7.3.2 |
| Depends: | R (≥ 3.5) |
| Imports: | Benchmarking, caret, deaR, dplyr, fastshap, iml, PRROC, pROC, rminer, stats |
| Suggests: | ggplot2, knitr, rmarkdown, nnet |
| VignetteBuilder: | knitr |
| LazyData: | false |
| ByteCompile: | true |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2025-11-27 01:08:27 UTC; Ricardo |
| Author: | Ricardo González Moyano |
| Maintainer: | Ricardo González Moyano <ricardo.gonzalezm@umh.es> |
| Repository: | CRAN |
| Date/Publication: | 2025-12-02 14:50:07 UTC |
Training Classification Models to Estimate Efficiency
Description
Trains one or multiple classification algorithms to identify Pareto-efficient decision-making units (DMUs). It jointly searches model hyperparameters and the class-balancing level (synthetic samples via SMOTE) using k-fold cross-validation or a train/validation/test split, selecting the configuration that maximizes the specified metric(s). Returns, for each technique, the best fitted model together with training summaries, performance metrics, and the selected balancing level.
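Because efficient DMUs are usually the minority class, the default metric "Balanced_Accuracy" is less forgiving than plain accuracy. The sketch below shows the textbook definition (the mean of sensitivity and specificity) on hypothetical labels. The helper name balanced_accuracy() is an assumption made for illustration, and the package's internal computation may differ.

```r
# Textbook balanced accuracy: mean of sensitivity and specificity.
# `balanced_accuracy()` is a hypothetical helper, not a PEAXAI function.
balanced_accuracy <- function(observed, predicted, positive = "efficient") {
  is_pos      <- observed == positive
  sensitivity <- mean(predicted[is_pos] == positive)   # true positive rate
  specificity <- mean(predicted[!is_pos] != positive)  # true negative rate
  (sensitivity + specificity) / 2
}

lv <- c("not_efficient", "efficient")
# Hypothetical labels: 2 efficient DMUs out of 8.
observed  <- factor(c("efficient", "efficient", rep("not_efficient", 6)),
                    levels = lv)
predicted <- factor(c("efficient", "not_efficient",
                      rep("not_efficient", 5), "efficient"),
                    levels = lv)

accuracy <- mean(observed == predicted)            # 6 of 8 correct
bal_acc  <- balanced_accuracy(observed, predicted) # (1/2 + 5/6) / 2
c(accuracy = accuracy, balanced_accuracy = bal_acc)
```

Here plain accuracy is 0.75 while balanced accuracy is about 0.667, because half of the efficient units were missed.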
Usage
PEAXAI_fitting(
data,
x,
y,
RTS = "vrs",
imbalance_rate = NULL,
trControl,
methods,
metric_priority = "Balanced_Accuracy",
hold_out = NULL,
seed = NULL,
verbose = TRUE
)
Arguments
data |
A |
x |
Integer vector with column indices of input variables in |
y |
Integer vector with column indices of output variables in |
RTS |
Text string or number defining the underlying DEA technology /
returns-to-scale assumption (default:
|
imbalance_rate |
Optional target(s) for class balance via SMOTE. If |
trControl |
A |
methods |
A |
metric_priority |
A |
hold_out |
Numeric proportion in (0,1) for validation split (default |
seed |
Integer. Seed for reproducibility. |
verbose |
Logical; if |
Value
A "PEAXAI" object (a list) containing the best technique, the best fitted models with their performance, and the results by fold.
Examples
data("firms", package = "PEAXAI")
data <- subset(
firms,
autonomous_community == "Comunidad Valenciana"
)
trControl <- list(
method = "cv",
number = 3
)
# glm method
methods <- list(
"glm" = list(
weights = "dinamic"
)
)
models <- PEAXAI_fitting(
data = data,
x = c(1:4),
y = 5,
RTS = "vrs",
imbalance_rate = NULL,
methods = methods,
trControl = trControl,
metric_priority = c("Balanced_Accuracy", "ROC_AUC"),
seed = 1,
verbose = FALSE
)
Global feature importance for efficiency classifiers
Description
Computes global feature importance for a fitted classification model that separates Pareto-efficient DMUs, using one of three XAI backends:
- "SA": Sensitivity Analysis via rminer.
- "SHAP": Model-agnostic SHAP approximations via fastshap.
- "PI": Permutation Importance via iml.
You can evaluate the model on either the training domain (background = "train")
or the real-world domain (background = "real") and compute importance on a
chosen target set ("train" or "real"). Importances are
returned normalized to sum to 1.
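To make the idea of normalized global importance concrete, here is a minimal base-R sketch of permutation importance. It does not use the iml backend: the simulated data, the Brier-style loss, and the clamping of negative values are assumptions chosen for illustration only.

```r
# Permutation importance by hand (base R only; not the iml implementation).
set.seed(1)
n  <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# Hypothetical label: driven strongly by x1, weakly by x2, not at all by x3.
p <- plogis(2 * df$x1 + 0.5 * df$x2)
df$class_efficiency <- factor(
  ifelse(runif(n) < p, "efficient", "not_efficient"),
  levels = c("not_efficient", "efficient")
)

fit <- glm(class_efficiency ~ x1 + x2 + x3, data = df, family = binomial())

# Loss: mean squared error between the 0/1 label and the predicted probability.
brier <- function(model, newdata) {
  obs  <- as.numeric(newdata$class_efficiency == "efficient")
  pred <- predict(model, newdata = newdata, type = "response")
  mean((obs - pred)^2)
}

base_loss  <- brier(fit, df)
predictors <- c("x1", "x2", "x3")
n_rep      <- 5

# Importance of a variable = average increase in loss after shuffling it.
raw_imp <- sapply(predictors, function(v) {
  losses <- replicate(n_rep, {
    shuffled      <- df
    shuffled[[v]] <- sample(shuffled[[v]])
    brier(fit, shuffled)
  })
  mean(losses) - base_loss
})

# Clamp small negative values (noise) at zero, then normalize to sum to 1.
raw_imp <- pmax(raw_imp, 0)
imp     <- raw_imp / sum(raw_imp)
imp
```

The resulting vector is named by predictor and sums to 1, which is the shape that a weighted distance or a directional vector can consume directly.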
Usage
PEAXAI_global_importance(
data,
x,
y,
final_model,
background = "train",
target = "train",
importance_method
)
Arguments
data |
A |
x |
Integer or character vector with the columns used as inputs (predictors). |
y |
Integer or character vector with the columns used as outputs (targets used
to define |
final_model |
A fitted model. If it is a base- |
background |
Character, |
target |
Character, |
importance_method |
A named list (or data.frame-like) with the backend and its args:
|
Details
Internally, the function builds background/target sets with xai_prepare_sets().
For glm models, the positive class is assumed to be the second level
("efficient") and probabilities are extracted with type = "response".
For other models (e.g., caret), predict(type = "prob")[, "efficient"] is used.
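The note on the positive class matters because a binomial glm models the probability of the second factor level. The following sketch, on simulated data with a hypothetical labelling rule, fits the same model under both level orders and shows that the two sets of probabilities are complements.

```r
# A binomial glm models P(second factor level); level order therefore matters.
set.seed(42)
n  <- 150
df <- data.frame(x1 = runif(n, 1, 10), y1 = runif(n, 1, 10))
# Hypothetical labelling rule, used only to obtain two classes.
score <- df$y1 - df$x1 + rnorm(n, sd = 3)
label <- ifelse(score > 0, "efficient", "not_efficient")

# "efficient" as second level: type = "response" returns P(efficient).
df$class_efficiency <- factor(label, levels = c("not_efficient", "efficient"))
fit <- glm(class_efficiency ~ x1 + y1, data = df, family = binomial())
prob_efficient <- predict(fit, newdata = df, type = "response")

# Reversed levels: the same call now returns P(not_efficient).
df_rev <- df
df_rev$class_efficiency <- factor(label,
                                  levels = c("efficient", "not_efficient"))
fit_rev <- glm(class_efficiency ~ x1 + y1, data = df_rev, family = binomial())
prob_not_efficient <- predict(fit_rev, newdata = df_rev, type = "response")

# The two fits describe the same model, so the probabilities add up to 1.
max(abs(prob_efficient + prob_not_efficient - 1))
```

Keeping the levels as c("not_efficient", "efficient") is what makes type = "response" return the probability of efficiency.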
Value
A named numeric vector (or 1-row data.frame) of normalized importances, with names matching the predictor columns; the values sum to 1.
See Also
explain, FeatureImp,
Importance
Examples
data("firms", package = "PEAXAI")
data <- subset(
firms,
autonomous_community == "Comunidad Valenciana"
)
x <- 1:4
y <- 5
RTS <- "vrs"
imbalance_rate <- NULL
trControl <- list(
method = "cv",
number = 3
)
# glm method
methods <- list(
"glm" = list(
weights = "dinamic"
)
)
metric_priority <- c("Balanced_Accuracy", "ROC_AUC")
models <- PEAXAI_fitting(
data = data, x = x, y = y, RTS = RTS,
imbalance_rate = imbalance_rate,
methods = methods,
trControl = trControl,
metric_priority = metric_priority,
seed = 1,
verbose = FALSE
)
final_model <- models[["best_model_fit"]][["glm"]]
imp <- PEAXAI_global_importance(
data = data, x = x, y = y,
final_model = final_model,
background = "real", target = "real",
importance_method = list(name = "PI", n.repetitions = 5)
)
head(imp)
Identify Benchmark Peers Based on Estimated Efficiency Probabilities
Description
Identifies peer units (i.e., reference benchmarks) for each decision-making unit (DMU) based on predicted probabilities of technical efficiency. Given a fitted classification model that estimates the probability of being efficient, the function selects, for each DMU, its nearest efficient peer according to Euclidean or weighted distances. Multiple efficiency thresholds can be specified to assess different levels of benchmarking stringency.
Usage
PEAXAI_peer(
data,
x,
y,
final_model,
efficiency_thresholds,
weighted = FALSE,
relative_importance = NULL
)
Arguments
data |
A |
x |
Integer vector indicating the column indices of input variables in |
y |
Integer vector indicating the column indices of output variables in |
final_model |
A fitted classification model used to estimate efficiency probabilities. Supported classes: |
efficiency_thresholds |
Numeric vector indicating the minimum probability values required to consider a DMU as efficient. |
weighted |
Logical. If |
relative_importance |
Optional named numeric vector indicating the relative importance of each input/output variable (used when |
Details
This function enables probabilistic peer identification under uncertainty, supporting flexible definitions of efficiency based on thresholds over estimated probabilities.
When weighted = TRUE, variable weights (e.g., derived from feature importance) modulate the peer selection process, allowing for context-aware benchmarking.
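The peer-selection logic can be sketched in a few lines of base R. This is an illustration under stated assumptions rather than the package's implementation: nearest_peer() is a hypothetical helper, the probabilities are made up, and distances are computed on raw (unscaled) variables.

```r
# Hypothetical DMUs (two inputs, one output) with made-up efficiency probabilities.
dmus <- data.frame(
  x1 = c(2, 4, 6, 8, 3),
  x2 = c(5, 3, 7, 2, 6),
  y1 = c(10, 9, 12, 8, 7)
)
prob_efficient <- c(0.92, 0.40, 0.81, 0.55, 0.97)

# For each DMU, return the row index of the closest unit whose probability
# meets the threshold. A unit that meets the threshold is its own peer.
nearest_peer <- function(vars, prob, threshold, weights = NULL) {
  vars <- as.matrix(vars)
  if (is.null(weights)) weights <- rep(1, ncol(vars))
  peers <- which(prob >= threshold)
  if (length(peers) == 0) return(rep(NA_integer_, nrow(vars)))
  sapply(seq_len(nrow(vars)), function(i) {
    diffs <- sweep(vars[peers, , drop = FALSE], 2, vars[i, ])  # peer - DMU i
    d     <- sqrt(rowSums(sweep(diffs^2, 2, weights, `*`)))    # weighted Euclidean
    peers[which.min(d)]
  })
}

thresholds <- c(0.75, 0.95)
unweighted <- setNames(
  lapply(thresholds, function(thr) nearest_peer(dmus, prob_efficient, thr)),
  as.character(thresholds)
)

# Weighted variant: the weights stand in for a normalized importance vector.
w        <- c(x1 = 0.7, x2 = 0.2, y1 = 0.1)
weighted <- nearest_peer(dmus, prob_efficient, 0.75, weights = w)

unweighted
weighted
```

At threshold 0.75 the fourth DMU is matched to unit 5 without weights but to unit 3 once x1 dominates the distance, which shows how importance weights can change the benchmark.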
Value
A named list of matrices. Each element corresponds to an efficiency threshold and contains, for each DMU, the index of the closest efficient peer.
If weighted = FALSE, the list contains unweighted peers. If weighted = TRUE, the list contains weighted peers.
Examples
data("firms", package = "PEAXAI")
data <- subset(
firms,
autonomous_community == "Comunidad Valenciana"
)
x <- 1:4
y <- 5
RTS <- "vrs"
imbalance_rate <- NULL
trControl <- list(
method = "cv",
number = 3
)
# glm method
methods <- list(
"glm" = list(
weights = "dinamic"
)
)
metric_priority <- c("Balanced_Accuracy", "ROC_AUC")
models <- PEAXAI_fitting(
data = data, x = x, y = y, RTS = RTS,
imbalance_rate = imbalance_rate,
methods = methods,
trControl = trControl,
metric_priority = metric_priority,
verbose = FALSE,
seed = 1
)
final_model <- models[["best_model_fit"]][["glm"]]
relative_importance <- PEAXAI_global_importance(
data = data, x = x, y = y,
final_model = final_model,
background = "real", target = "real",
importance_method = list(name = "PI", n.repetitions = 5)
)
efficiency_thresholds <- seq(0.75, 0.95, 0.1)
directional_vector <- list(relative_importance = relative_importance,
scope = "global", baseline = "mean")
targets <- PEAXAI_targets(data = data, x = x, y = y, final_model = final_model,
efficiency_thresholds = efficiency_thresholds, directional_vector = directional_vector,
n_expand = 0.5, n_grid = 50, max_y = 2, min_x = 1)
peers <- PEAXAI_peer(data = data, x = x, y = y, final_model = final_model,
efficiency_thresholds = efficiency_thresholds, weighted = FALSE)
Generate Efficiency Rankings Based on Probabilistic Classification
Description
Produces efficiency rankings of decision-making units (DMUs) according to the probabilities estimated by a fitted classification model. Two ranking modes are supported:
- "predicted": ranks DMUs solely by their predicted probability of being efficient.
- "attainable": ranks DMUs hierarchically according to: (1) the attainable (target) efficiency probability, (2) the size of the improvement parameter \beta (smaller is better), and (3) the predicted efficiency probability (higher is better).
This makes it possible to integrate both predictive and counterfactual (attainable) information into the efficiency ranking.
Usage
PEAXAI_ranking(
data,
x,
y,
final_model,
efficiency_thresholds,
targets = NULL,
rank_basis
)
Arguments
data |
A |
x |
Integer vector specifying the column indices of input variables in |
y |
Integer vector specifying the column indices of output variables in |
final_model |
A fitted classification model used to estimate efficiency probabilities. Supported types are:
|
efficiency_thresholds |
Numeric vector defining one or more efficiency probability thresholds to determine the attainable frontier or peer set. |
targets |
A named list containing, for each efficiency threshold, the corresponding
attainable targets and estimated |
rank_basis |
Character string specifying the ranking criterion. Options are:
|
Details
The attainable-based ranking combines predictive efficiency with the modeled potential
for improvement (\beta) and the probability of reaching a target frontier level.
This approach yields a more nuanced and interpretable prioritization of DMUs, reflecting
both their current and achievable performance under the estimated model.
When rank_basis = "attainable", ties in attainable probability are broken first
by the magnitude of \beta (ascending), and then by the predicted probability
(descending).
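The hierarchical tie-breaking rule maps directly onto base R's order(). The sketch below uses made-up values, and the column names (prob_attainable, beta, prob_predicted) are assumptions rather than the names used in the package's output.

```r
# Hypothetical ranking inputs for five DMUs at one efficiency threshold.
ranking_input <- data.frame(
  dmu             = c("A", "B", "C", "D", "E"),
  prob_attainable = c(0.85, 0.95, 0.85, 0.95, 0.85),
  beta            = c(0.10, 0.30, 0.10, 0.05, 0.02),
  prob_predicted  = c(0.60, 0.70, 0.72, 0.90, 0.40)
)

# Sort keys, in priority order:
#   1. attainable probability, descending
#   2. beta, ascending (a smaller improvement effort ranks higher)
#   3. predicted probability, descending
ord <- with(ranking_input, order(-prob_attainable, beta, -prob_predicted))

ranked      <- ranking_input[ord, ]
ranked$rank <- seq_len(nrow(ranked))
ranked
```

D and B lead because they reach the higher attainable probability, with D first thanks to its smaller beta. Among the remaining units, E has the smallest beta, and C precedes A only through its higher predicted probability.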
Value
If rank_basis = "predicted": a data.frame sorted by predicted efficiency probability.
If rank_basis = "attainable": a named list of data.frames, one per efficiency threshold, each sorted according to the hierarchical ranking scheme described above.
Examples
data("firms", package = "PEAXAI")
data <- subset(
firms,
autonomous_community == "Comunidad Valenciana"
)
x <- 1:4
y <- 5
RTS <- "vrs"
imbalance_rate <- NULL
trControl <- list(
method = "cv",
number = 3
)
# glm method
methods <- list(
"glm" = list(
weights = "dinamic"
)
)
metric_priority <- c("Balanced_Accuracy", "ROC_AUC")
models <- PEAXAI_fitting(
data = data, x = x, y = y, RTS = RTS,
imbalance_rate = imbalance_rate,
methods = methods,
trControl = trControl,
metric_priority = metric_priority,
verbose = FALSE,
seed = 1
)
final_model <- models[["best_model_fit"]][["glm"]]
relative_importance <- PEAXAI_global_importance(
data = data, x = x, y = y,
final_model = final_model,
background = "real", target = "real",
importance_method = list(name = "PI", n.repetitions = 5)
)
efficiency_thresholds <- seq(0.75, 0.95, 0.1)
directional_vector <- list(relative_importance = relative_importance,
scope = "global", baseline = "mean")
targets <- PEAXAI_targets(data = data, x = x, y = y, final_model = final_model,
efficiency_thresholds = efficiency_thresholds, directional_vector = directional_vector,
n_expand = 0.5, n_grid = 50, max_y = 2, min_x = 1)
ranking <- PEAXAI_ranking(data = data, x = x, y = y,
final_model = final_model, rank_basis = "predicted")
Projection-Based Efficiency Targets
Description
Computes efficiency projections for each observation based on a trained
classifier from caret that provides class probabilities via
predict(type = "prob"). For each probability threshold, the function
finds the direction and magnitude of change in input–output space required
for a unit to reach a specified efficiency level, following a directional
distance approach.
Usage
PEAXAI_targets(
data,
x,
y,
final_model,
efficiency_thresholds,
directional_vector,
n_expand,
n_grid,
max_y = 2,
min_x = 1
)
Arguments
data |
A |
x |
A numeric vector indicating the column indexes of input variables in |
y |
A numeric vector indicating the column indexes of output variables in |
final_model |
A fitted caret model of class |
efficiency_thresholds |
A numeric vector of probability levels in (0,1)
that define the efficiency classes (e.g., |
directional_vector |
A
|
n_expand |
Numeric. Number of expansion steps used to enlarge the initial
search range for |
n_grid |
Integer. Number of grid points evaluated during each iteration
to refine the cutoff value of |
max_y |
Numeric. Upper-limit multiplier for output expansion in the search procedure (default = 2). |
min_x |
Numeric. Lower-limit multiplier for input contraction in the search procedure (default = 1). |
Details
For each observation and for each probability level in efficiency_thresholds,
the function searches for the smallest directional distance \beta such that
the predicted probability of belonging to the efficient class reaches the target.
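A single-pass grid search illustrates the idea of finding the smallest \beta that reaches a target probability. Everything here is a simplifying assumption: prob_fun() is a hypothetical closed-form probability that stands in for the fitted classifier, find_beta() is not a package function, and the package's iterative refinement is not reproduced.

```r
# Hypothetical stand-in for the classifier's P(efficient | x, y).
prob_fun <- function(x, y) plogis(-1 + 0.8 * y - 0.5 * x)

# Move along the direction (-gx, +gy) and return the first grid value of beta
# whose predicted probability reaches the target.
find_beta <- function(x, y, gx, gy, target, beta_max, n_grid, prob_fun) {
  betas <- seq(0, beta_max, length.out = n_grid)
  probs <- vapply(betas,
                  function(b) prob_fun(x - b * gx, y + b * gy),
                  numeric(1))
  hit <- which(probs >= target)
  if (length(hit) == 0) {
    return(c(beta = NA_real_, probability = NA_real_))  # target not reachable
  }
  c(beta = betas[hit[1]], probability = probs[hit[1]])
}

thresholds <- seq(0.75, 0.95, 0.1)

# One DMU with input 4 and output 3, projected along a unit direction.
res <- t(sapply(thresholds, function(thr) {
  find_beta(x = 4, y = 3, gx = 1, gy = 1, target = thr,
            beta_max = 3, n_grid = 50, prob_fun = prob_fun)
}))
rownames(res) <- as.character(thresholds)
res
```

Each row gives the grid value of \beta and the probability reached there. A finer grid, or a second pass around the first hit, would tighten the estimate.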
Value
A named list with one element per threshold. Each element contains:
- data: A data.frame of projected input–output values at that threshold.
- beta: A two-column data.frame with the optimal \beta and the corresponding predicted probability.
See Also
find_beta_maxmin for initializing search bounds;
train for model training.
Examples
data("firms", package = "PEAXAI")
data <- subset(
firms,
autonomous_community == "Comunidad Valenciana"
)
x <- 1:4
y <- 5
RTS <- "vrs"
imbalance_rate <- NULL
trControl <- list(
method = "cv",
number = 3
)
# glm method
methods <- list(
"glm" = list(
weights = "dinamic"
)
)
metric_priority <- c("Balanced_Accuracy", "ROC_AUC")
models <- PEAXAI_fitting(
data = data, x = x, y = y, RTS = RTS,
imbalance_rate = imbalance_rate,
methods = methods,
trControl = trControl,
metric_priority = metric_priority,
verbose = FALSE,
seed = 1
)
final_model <- models[["best_model_fit"]][["glm"]]
relative_importance <- PEAXAI_global_importance(
data = data, x = x, y = y,
final_model = final_model,
background = "real", target = "real",
importance_method = list(name = "PI", n.repetitions = 5)
)
efficiency_thresholds <- seq(0.75, 0.95, 0.1)
directional_vector <- list(relative_importance = relative_importance,
scope = "global", baseline = "mean")
targets <- PEAXAI_targets(data = data, x = x, y = y, final_model = final_model,
efficiency_thresholds = efficiency_thresholds, directional_vector = directional_vector,
n_expand = 0.5, n_grid = 50, max_y = 2, min_x = 1)
Create New SMOTE Units to Balance Data combinations of m + s
Description
This function creates new DMUs to address data imbalances. If the majority class is efficient, it generates new inefficient DMUs by worsening the observed units. Conversely, if the majority class is inefficient, it projects inefficient DMUs to the frontier. Finally, a random selection is performed to keep a proportion of 0.65 for the majority class and 0.35 for the minority class.
Usage
SMOTE_data(data, x, y, RTS = "vrs", balance_data, seed)
Arguments
data |
A |
x |
Column indexes of the input variables in the |
y |
Column indexes of the output variables in the |
RTS |
Text string or number defining the underlying DEA technology /
returns-to-scale assumption (default:
|
balance_data |
Indicates the level of efficient units to achieve and the number of efficient and inefficient units. |
seed |
Integer. Seed for reproducibility. |
Value
It returns a data.frame with the newly created set of DMUs incorporated.
Create New SMOTE Units to Balance Data combinations of m + s
Description
This function creates new DMUs to address data imbalances. If the majority class is efficient, it generates new inefficient DMUs by worsening the observed units. Conversely, if the majority class is inefficient, it projects inefficient DMUs to the frontier. Finally, a random selection is performed to keep a proportion of 0.65 for the majority class and 0.35 for the minority class.
Usage
convex_facets(data, x, y, RTS = "vrs", balance_data = NULL)
Arguments
data |
A |
x |
Column indexes of the input variables in the |
y |
Column indexes of the output variables in the |
RTS |
Text string or number defining the underlying DEA technology /
returns-to-scale assumption (default:
|
balance_data |
A numeric vector indicating the different levels of balance required (e.g., c(0.1, 0.45, 0.6)). |
Value
It returns a data.frame with the newly created set of DMUs incorporated.
Simulated efficiency dataset (100 DMUs)
Description
Dataset with 100 simulated decision-making units (DMUs) used to illustrate the basic workflow of PEAXAI in a simple single-input/single-output setting.
Usage
data(data)
Format
A data.frame with 100 rows and 3 columns:
- x1
Input of the DMU (e.g., resource use, cost or effort).
- y
Observed output, potentially affected by technical inefficiency.
- yD
Deterministic (theoretical) output on the efficient frontier.
Details
Each DMU uses one input x1 to produce an output y. The variable
yD represents the theoretical output on the deterministic frontier,
that is, the output level that would be observed in the absence of
technical inefficiency.
The dataset is purely simulated and is intended for examples and vignettes.
It contains 100 DMUs with heterogeneous input levels and corresponding
output levels. The observed output y can be interpreted as
y <= yD, where the gap between yD and y reflects
technical inefficiency (plus possible noise, depending on how the data
were generated).
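A dataset with the same structure can be simulated in a few lines. This is not the authors' generator: the frontier 3 * sqrt(x1) and the half-normal inefficiency term are assumptions chosen only so that y <= yD holds by construction.

```r
# Simulate 100 DMUs with one input, a deterministic frontier, and
# multiplicative one-sided inefficiency (illustrative assumptions throughout).
set.seed(123)
n  <- 100
x1 <- runif(n, min = 1, max = 10)   # input levels
yD <- 3 * sqrt(x1)                  # hypothetical frontier output
u  <- abs(rnorm(n, sd = 0.3))       # inefficiency term, u >= 0
y  <- yD * exp(-u)                  # observed output never exceeds yD

sim <- data.frame(x1 = x1, y = y, yD = yD)
sim$efficiency <- sim$y / sim$yD    # output-oriented score in (0, 1]
summary(sim$efficiency)
```

The object is called sim rather than data to avoid masking the packaged dataset.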
Source
Simulated data generated by the authors for illustrative purposes.
Examples
data(data)
str(data)
summary(data)
if (requireNamespace("ggplot2", quietly = TRUE)) {
ggplot2::ggplot(data, ggplot2::aes(x = x1)) +
ggplot2::geom_point(ggplot2::aes(y = y), alpha = 0.6) +
ggplot2::geom_line(ggplot2::aes(y = yD), color = "red") +
ggplot2::labs(
x = "Input x1",
y = "Output",
title = "Simulated DMUs and theoretical frontier"
) +
ggplot2::theme_minimal()
}
Search Range for Directional Efficiency Parameter (\beta)
Description
Estimates, for each observation, the minimum and maximum feasible values of the
directional distance parameter \beta used in projection-based efficiency
analysis. This function is an internal step of PEAXAI_targets,
providing the initial search bounds for the iterative determination of efficiency targets.
Usage
find_beta_maxmin(
data,
x,
y,
final_model,
efficiency_thresholds,
n_expand,
vector_gx,
vector_gy,
max_y,
min_x
)
Arguments
data |
A |
x |
A numeric vector with the column indexes of input variables in |
y |
A numeric vector with the column indexes of output variables in |
final_model |
A fitted caret model of class |
efficiency_thresholds |
A numeric vector of probability levels in (0,1).
Its minimum and maximum values delimit the target interval used to bracket |
n_expand |
Integer. Increment step size applied to |
vector_gx |
A numeric vector or |
vector_gy |
A numeric vector or |
max_y |
Numeric. Upper-limit multiplier for output expansion relative to observed maxima. |
min_x |
Numeric. Lower-limit multiplier for input contraction relative to observed minima. |
Details
For each DMU, the function expands outputs and contracts inputs along the specified
direction until the predicted probability of efficiency (from final_model)
reaches the maximum in efficiency_thresholds or feasible domain limits.
The resulting interval [\beta_{\min}, \beta_{\max}] is then used by
PEAXAI_targets to refine projections via grid search.
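One plausible reading of this bracketing step is sketched below. It is an interpretation, not the package's code: bracket_beta(), the fixed step, the scalar limits x_floor and y_cap, and the stand-in prob_fun() are all assumptions.

```r
# Hypothetical stand-in for the classifier's P(efficient | x, y).
prob_fun <- function(x, y) plogis(-1 + 0.8 * y - 0.5 * x)

# Step beta outward until the highest threshold is reached or the next step
# would leave the feasible domain. `min` is the last stepped beta still below
# the lowest threshold; `max` is where the expansion stopped.
bracket_beta <- function(x, y, gx, gy, thresholds, step, x_floor, y_cap,
                         prob_fun, max_iter = 1000L) {
  lo_target <- min(thresholds)
  hi_target <- max(thresholds)
  beta      <- 0
  beta_min  <- 0
  for (i in seq_len(max_iter)) {
    p <- prob_fun(x - beta * gx, y + beta * gy)
    if (p < lo_target) beta_min <- beta   # still below the lowest target
    if (p >= hi_target) break             # highest target reached
    nxt <- beta + step
    # Stop when the next step would violate the input floor or output cap.
    if (any(x - nxt * gx < x_floor) || any(y + nxt * gy > y_cap)) break
    beta <- nxt
  }
  c(min = beta_min, max = beta)
}

thresholds <- seq(0.75, 0.95, 0.1)

# Loose limits: the expansion ends once the 0.95 target is reached.
free <- bracket_beta(4, 3, gx = 1, gy = 1, thresholds, step = 0.5,
                     x_floor = 1, y_cap = 10, prob_fun = prob_fun)
# Tighter input floor: the expansion stops early at the domain limit.
capped <- bracket_beta(4, 3, gx = 1, gy = 1, thresholds, step = 0.5,
                       x_floor = 2, y_cap = 10, prob_fun = prob_fun)
rbind(free = free, capped = capped)
```

In the first call the interval is [1, 3]. In the second it shrinks to [1, 2] because the input cannot be contracted below the floor.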
Value
A data.frame with two numeric columns:
- min
Minimum feasible value of \beta for each observation.
- max
Maximum feasible value of \beta for each observation.
See Also
PEAXAI_targets (efficiency projections based on \beta);
train (model training with class probabilities).
Spanish Food Industry Firms Dataset
Description
Dataset containing information on food industry companies located in Spain, used to illustrate efficiency analysis within the PEAXAI package. The dataset reflects the institutional and market heterogeneity that shapes firm-level efficiency across Spain’s 17 autonomous communities.
Usage
data(firms)
Format
A data.frame with 917 rows and 6 columns:
- total_assets
Total assets (millions of euros).
- employees
Number of employees.
- fixed_assets
Tangible fixed assets (millions of euros).
- personnel_expenses
Personnel expenses (millions of euros).
- operating_income
Operating income (millions of euros).
- autonomous_community
Autonomous community where the firm operates.
Details
The dataset includes 917 food industry firms with more than 50 employees, collected from the SABI database for the year 2023. Each observation corresponds to a single company. Variables reflect both operational and financial dimensions relevant for productivity and efficiency assessment.
The output variable is:
- operating_income: Operating income (in millions of euros), measuring revenues generated from core business activities.
The input variables are:
- total_assets: Total assets (millions of euros), representing resources employed.
- employees: Number of employees, indicating workforce size (only firms with more than 50 workers are included).
- fixed_assets: Tangible fixed assets (millions of euros), such as buildings and machinery.
- personnel_expenses: Personnel expenses (millions of euros), including salaries, benefits, and training.
The variable autonomous_community identifies the territorial location of each firm within Spain.
The sample displays substantial dispersion across variables, encompassing both small and large firms. This heterogeneity affects measures of central tendency—mean and median values differ considerably—thus providing a realistic challenge for efficiency and explainability analyses.
Source
SABI (Sistema de Análisis de Balances Ibéricos) database, 2023. Firms with more than 50 employees in the Spanish food industry.
Examples
data(firms)
str(firms)
summary(firms)
if (requireNamespace("ggplot2", quietly = TRUE)) {
ggplot2::ggplot(firms, ggplot2::aes(x = employees, y = operating_income)) +
ggplot2::geom_point(alpha = 0.6) +
ggplot2::labs(
x = "Number of employees",
y = "Operating income (millions of euros)",
title = "Spanish Food Industry Firms (2023)"
) +
ggplot2::theme_minimal() +
ggplot2::theme(
plot.title = ggplot2::element_text(face = "bold"),
axis.line = ggplot2::element_line(color = "black"),
axis.ticks = ggplot2::element_line(color = "black"),
panel.grid.minor = ggplot2::element_blank()
)
}
Create New SMOTE Units to Balance Data combinations of m + s
Description
This function creates new DMUs to address data imbalances. If the majority class is efficient, it generates new inefficient DMUs by worsening the observed units. Conversely, if the majority class is inefficient, it projects inefficient DMUs to the frontier. Finally, a random selection is performed to keep a proportion of 0.65 for the majority class and 0.35 for the minority class.
Usage
get_SMOTE_DMUs(data, facets, x, y, RTS = "vrs", balance_data = NULL, seed)
Arguments
data |
A |
facets |
A |
x |
Column indexes of the input variables in the |
y |
Column indexes of the output variables in the |
RTS |
Text string or number defining the underlying DEA technology /
returns-to-scale assumption (default:
|
balance_data |
A numeric vector indicating the different levels of balance required (e.g., c(0.1, 0.45, 0.6)). |
seed |
Integer. Seed for reproducibility. |
Value
A list where each element corresponds to a balance level, containing a single data.frame
with the real and synthetic DMUs, correctly labeled.
Data preprocessing and efficiency labeling with Additive DEA
Description
Labels each DMU (Decision Making Unit) as efficient or not using the
Additive DEA model, optionally after basic data preprocessing. The resulting
factor class_efficiency has levels c("not_efficient","efficient"),
where "efficient" is the positive class for downstream modeling.
Usage
label_efficiency(data, REF = data, x, y, RTS = "vrs")
Arguments
data |
A |
REF |
Optional reference set of inputs that defines the technology
(defaults to the columns indicated by |
x |
Integer vector with column indices of input variables in |
y |
Integer vector with column indices of output variables in |
RTS |
Character or integer specifying the DEA technology / returns-to-scale
assumption (default:
|
Details
Internally relies on dea.add to compute Additive DEA
scores and derive the binary efficiency label.
Value
A data.frame equal to data (retaining all input x and
output y columns) plus a new factor column class_efficiency
with levels c("not_efficient","efficient").
See Also
Examples
# Example (assuming columns 1:2 are inputs and 3 is output):
# out <- label_efficiency(data = df, x = 1:2, y = 3, RTS = "vrs")
# table(out$class_efficiency)
Prepare Data and Handle Errors
Description
This function arranges the data in the required format and displays some error messages.
Usage
preprocessing(data, x, y)
Arguments
data |
A |
x |
Column indexes of input variables in |
y |
Column indexes of output variables in |
Value
It returns a matrix in the required format and displays some error messages.
Training a Classification Machine Learning Model
Description
This function trains a set of models and selects the best hyperparameters for each of them.
Usage
train_PEAXAI(data, method, parameters, trControl, metric_priority, seed)
Arguments
data |
A |
method |
Parameters for controlling the training process (from the |
parameters |
A |
trControl |
A |
metric_priority |
A |
seed |
Integer. Seed for reproducibility. |
Value
It returns a list with the chosen model.
Prepare Training and Target Datasets from a caret Model
Description
Extracts and formats the training and/or target datasets from a machine learning model trained with caret::train,
allowing for distinction between using the full training data or only the original subset used for modeling.
It standardizes the class column to be named "class_efficiency" and positions it as the last column.
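The column standardization described here can be sketched in base R. The helper standardize_class_column() is hypothetical. The example assumes the response arrives in a column named .outcome, which is how caret usually stores it in a train object's trainingData, but the actual handling inside xai_prepare_sets() may differ.

```r
# Rename the response column to "class_efficiency", fix its level order,
# and move it to the last position. Hypothetical helper, for illustration.
standardize_class_column <- function(df, class_col,
                                     levels_order = c("not_efficient",
                                                      "efficient")) {
  stopifnot(class_col %in% names(df))
  names(df)[names(df) == class_col] <- "class_efficiency"
  df$class_efficiency <- factor(df$class_efficiency, levels = levels_order)
  other_cols <- setdiff(names(df), "class_efficiency")
  df[, c(other_cols, "class_efficiency"), drop = FALSE]
}

# Made-up training data with the response stored first.
training <- data.frame(
  .outcome = c("efficient", "not_efficient", "efficient"),
  x1       = c(1.2, 3.4, 2.2),
  y1       = c(5, 4, 6)
)

prepared <- standardize_class_column(training, class_col = ".outcome")
str(prepared)
```

After the call the predictors come first and class_efficiency is a factor whose second level is "efficient".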
Usage
xai_prepare_sets(
data,
x,
y,
final_model,
background,
target,
type,
threshold,
levels_order
)
Arguments
data |
A |
x |
Not currently used. Reserved for future input variable selection. |
y |
Not currently used. Reserved for future output variable specification. |
final_model |
A trained model object of class |
background |
A character string, either |
target |
A character string, either |
type |
Not currently used. Reserved for future prediction types. |
threshold |
Not currently used. Reserved for future thresholding logic. |
levels_order |
A character vector specifying the levels of the response factor, typically |
Value
A list with two elements:
- train_data
A data.frame representing the background dataset, with the class column renamed to "class_efficiency" and positioned last.
- target_data
A data.frame representing the target dataset, formatted in the same way.