A minimalistic library specifically designed to make the estimation of Machine Learning (ML) techniques as easy and accessible as possible, particularly within the framework of the Knowledge Discovery in Databases (KDD) process in data mining. The package provides all the essential tools needed to efficiently structure and execute each stage of a predictive or classification modeling workflow, aligning closely with the fundamental steps of the KDD methodology: from data selection and preparation, through model building and tuning, to the interpretation and evaluation of results using Sensitivity Analysis.

The ‘MLwrap’ workflow is organized into four core steps: `preprocessing()`, `build_model()`, `fine_tuning()`, and `sensitivity_analysis()`. These correspond, respectively, to data preparation and transformation, model construction, hyperparameter optimization, and sensitivity analysis. The user can access comprehensive model evaluation results, including fit assessment metrics, plots, predictions, and performance diagnostics, for ML models implemented through Neural Networks, Support Vector Machines, Random Forest, and XGBoost algorithms. By streamlining these phases, ‘MLwrap’ aims to simplify the implementation of ML techniques, allowing analysts and data scientists to focus on extracting actionable insights and meaningful patterns from large datasets, in line with the objectives of the KDD process.
You can install the development version of MLwrap from GitHub with:
``` r
# install.packages("pak")
pak::pak("JMartinezGarcia/MLwrap")
```
This is a basic example which shows you how to solve a common problem:
``` r
library(MLwrap)
#> 
#> *****************************************************************************
#> 
#>     [MLwrap ASCII art banner]
#> 
#> *****************************************************************************
#> 
#> MLwrap v0.1.0: **Start simple, scale smart**
#> 

## basic example code

formula_reg <- "psych_well ~ age + gender + socioec_status + emot_intel + depression"

analysis_object <- preprocessing(sim_data, formula_reg, task = "regression") |>
  build_model(model_name = "Random Forest",
              hyperparameters = list(trees = 150)) |>
  fine_tuning(tuner = "Bayesian Optimization", metrics = "rmse") |>
  sensitivity_analysis(methods = c("PFI", "SHAP"),
                       metric = "rsq")
#> ! No improvement for 5 iterations; returning current results.
```
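Because the native pipe `|>` simply forwards the analysis object as the first argument of the next call, the same workflow can also be written one step at a time, which makes it easier to inspect the object between stages. The sketch below is just an unrolled form of the pipeline above and uses no calls beyond those already shown:

``` r
# Unrolled form of the piped workflow: each MLwrap step takes the analysis
# object as its first argument and returns an updated version of it.
analysis_object <- preprocessing(sim_data, formula_reg, task = "regression")

analysis_object <- build_model(analysis_object,
                               model_name = "Random Forest",
                               hyperparameters = list(trees = 150))

analysis_object <- fine_tuning(analysis_object,
                               tuner = "Bayesian Optimization",
                               metrics = "rmse")

analysis_object <- sensitivity_analysis(analysis_object,
                                        methods = c("PFI", "SHAP"),
                                        metric = "rsq")
```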
### Tuning Results
``` r
analysis_object |>
  plot_tuning_results()
```
### Evaluation Plots
``` r
analysis_object |>
  plot_residuals_distribution() |>
  plot_scatter_residuals()
```
### Sensitivity Analysis
``` r
analysis_object |>
  plot_pfi() |>
  plot_shap()
```
``` r
table_pfi <- table_pfi_results(analysis_object)

show(table_pfi)
#> $PFI
#> # A tibble: 8 × 3
#>   Feature               Importance   StDev
#>   <chr>                      <dbl>   <dbl>
#> 1 depression               0.760   0.0344 
#> 2 emot_intel               0.239   0.0248 
#> 3 age                      0.0593  0.00665
#> 4 socioec_status_Low       0.0169  0.00247
#> 5 gender_Female            0.0125  0.00224
#> 6 socioec_status_Medium    0.0118  0.00244
#> 7 gender_Male              0.0114  0.00244
#> 8 socioec_status_High      0.00967 0.00269
```
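Since the printed object is a list holding a tibble of PFI scores (`$PFI`, with `Feature`, `Importance`, and `StDev` columns), the results can be handed straight to standard tidyverse tooling. Below is a minimal sketch using ggplot2; ggplot2 is an assumption here, not a package used elsewhere in this example:

``` r
library(ggplot2)

# Plot the permutation feature importance scores returned above, with
# +/- one standard deviation error bars (assumes table_pfi$PFI is the
# tibble shown in the printed output).
pfi <- table_pfi$PFI

ggplot(pfi, aes(x = reorder(Feature, Importance), y = Importance)) +
  geom_col() +
  geom_errorbar(aes(ymin = Importance - StDev, ymax = Importance + StDev),
                width = 0.2) +
  coord_flip() +
  labs(x = NULL, y = "Permutation Feature Importance")
```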