---
title: "Getting Started with SQIpro: Comprehensive Soil Quality Index"
author: "Your Name"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Getting Started with SQIpro}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse   = TRUE,
  comment    = "#>",
  fig.width  = 7,
  fig.height = 5,
  warning    = FALSE,
  message    = FALSE
)
library(SQIpro)
```

# Introduction

Soil quality is defined as *"the capacity of a specific kind of soil to
function within natural or managed ecosystem boundaries, to sustain plant and
animal productivity, maintain or enhance water and air quality, and support
human health and habitation"* [@doran1994].

The **Soil Quality Index (SQI)** provides a single numeric score (0–1) that
integrates multiple soil physical, chemical, and biological indicators.
`SQIpro` implements six established methods for computing SQI, a flexible
variable scoring framework, and publication-quality visualisation tools, all
in a single CRAN-compliant R package.

## Key Concepts

| Term | Definition |
|------|-----------|
| **Minimum Data Set (MDS)** | The smallest subset of soil variables that adequately characterises soil quality [@andrews2004] |
| **Scoring function** | A transformation that converts raw variable values to a 0–1 score |
| **SQI** | Weighted or unweighted combination of variable scores |
| **Optimum** | The value or range of a variable associated with best soil function |

---

# Installation

```{r install, eval=FALSE}
# From CRAN (once published)
install.packages("SQIpro")

# Development version from GitHub
# install.packages("remotes")
remotes::install_github("yourname/SQIpro")
```

---

# The Example Dataset

`SQIpro` ships with `soil_data`, a hypothetical dataset of 100 soil samples
across 5 land-use systems and 2 depths, containing 12 soil indicators.

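
For readers preparing their own data, the layout described above appears to be one row per sample, with the grouping columns followed by one numeric column per indicator. The sketch below builds a synthetic stand-in with the same overall shape (100 rows, 5 land uses, 2 depths) using base R only. The land-use and depth labels, the three indicator columns, their value ranges, and the balanced design are illustrative assumptions; none of them are taken from `soil_data`.

```r
# Hypothetical stand-in shaped like the example dataset.
# All labels and value ranges below are invented for illustration.
set.seed(42)

land_use <- c("Forest", "Grassland", "Cropland", "Orchard", "Fallow")  # assumed labels
depth    <- c("0-15", "15-30")                                        # assumed labels

# 5 land uses x 2 depths x 10 replicates = 100 rows (balanced by assumption)
design <- expand.grid(
  rep     = 1:10,
  Depth   = depth,
  LandUse = land_use,
  stringsAsFactors = FALSE
)
n <- nrow(design)

my_soil <- data.frame(
  LandUse = design$LandUse,
  Depth   = design$Depth,
  pH      = runif(n, 5.0, 8.5),   # arbitrary range
  OC      = runif(n, 0.2, 2.5),   # arbitrary range, %
  BD      = runif(n, 1.1, 1.7)    # arbitrary range, g/cm3
)

dim(my_soil)
table(my_soil$LandUse, my_soil$Depth)
```

A data frame built this way only mimics the structure; real analyses should start from measured values.
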
```{r data}
data(soil_data)
dim(soil_data)
head(soil_data)
```

```{r data-summary}
table(soil_data$LandUse, soil_data$Depth)
```

---

# Step 1: Validate Your Data

Always validate before analysis:

```{r validate}
result <- validate_data(
  soil_data,
  group_cols = c("LandUse", "Depth")
)
result$n_per_group
```

---

# Step 2: Define Variable Configuration

This is the most important step. Each variable receives a scoring type:

- **`"more"`**: higher is better (e.g., organic carbon, microbial biomass)
- **`"less"`**: lower is better (e.g., bulk density, electrical conductivity)
- **`"opt"`**: a specific optimum range (e.g., pH 6.0–7.0)
- **`"trap"`**: trapezoidal with explicit zero boundaries

```{r config}
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "PMN",
               "Clay", "WHC", "DEH", "AP", "TN"),
  type     = c("opt", "less", "less", "more", "more", "more",
               "opt", "more", "more", "more", "more"),
  opt_low  = c(6.0, NA, NA, NA, NA, NA, 20, NA, NA, NA, NA),
  opt_high = c(7.0, NA, NA, NA, NA, NA, 35, NA, NA, NA, NA),
  description = c(
    "Soil pH (H2O 1:2.5)",
    "Electrical Conductivity (dS/m)",
    "Bulk Density (g/cm3)",
    "Organic Carbon (%)",
    "Microbial Biomass Carbon (mg/kg)",
    "Potentially Mineralizable N (mg/kg)",
    "Clay content (%)",
    "Water Holding Capacity (%)",
    "Dehydrogenase Activity (ug TPF/g/day)",
    "Available Phosphorus (mg/kg)",
    "Total Nitrogen (%)"
  )
)
print(cfg)
```

### Verify scoring curves

Before proceeding, always visualise your scoring curves to confirm biological
plausibility:

```{r scoring-curves, fig.height=7}
plot_scoring_curves(soil_data, cfg,
                    group_cols = c("LandUse", "Depth"))
```

---

# Step 3: Score All Variables

```{r score}
scored <- score_all(soil_data, cfg,
                    group_cols = c("LandUse", "Depth"))
head(scored[, c("LandUse", "Depth", "pH", "OC", "MBC")])
```

---

# Step 4: Select the Minimum Data Set (MDS)

```{r mds}
mds <- select_mds(scored,
                  group_cols     = c("LandUse", "Depth"),
                  load_threshold = 0.6)
mds$mds_vars
```

---

# Step 5: Compute SQI Using All Six Methods

## Linear Scoring

Equal-weight additive index [@doran1994]:

```{r linear}
res_lin <- sqi_linear(scored, cfg,
                      group_cols = c("LandUse", "Depth"),
                      mds_vars   = mds$mds_vars)
print(res_lin)
```

## Regression-Based

Variables weighted by stepwise regression coefficients [@masto2008]:

```{r regression}
res_reg <- sqi_regression(scored, cfg,
                          dep_var    = "OC",
                          group_cols = c("LandUse", "Depth"),
                          mds_vars   = mds$mds_vars)
print(res_reg)
```

## PCA-Based

Variables weighted by variance explained [@andrews2004; @bastida2008]:

```{r pca}
res_pca <- sqi_pca(scored, cfg,
                   group_cols = c("LandUse", "Depth"),
                   mds        = mds)
print(res_pca)
```

## Fuzzy Logic

Arithmetic or geometric mean of fuzzy membership scores:

```{r fuzzy}
res_fuz <- sqi_fuzzy(scored, cfg,
                     group_cols = c("LandUse", "Depth"),
                     mds_vars   = mds$mds_vars,
                     operator   = "mean")
print(res_fuz)
```

## Entropy Weighting

Objective weights derived from Shannon entropy [@shannon1948]:

```{r entropy}
res_ent <- sqi_entropy(scored, cfg,
                       group_cols = c("LandUse", "Depth"),
                       mds_vars   = mds$mds_vars)
print(res_ent)
cat("\nEntropy weights:\n")
print(attr(res_ent, "entropy_weights"))
```

## TOPSIS

Multi-criteria ranking by ideal solution distance [@hwang1981]:

```{r topsis}
res_top <- sqi_topsis(scored, cfg,
                      group_cols = c("LandUse", "Depth"),
                      mds_vars   = mds$mds_vars)
print(res_top)
```

## Compare All Methods

```{r compare}
comparison <- sqi_compare(scored, cfg,
                          group_cols = c("LandUse", "Depth"),
                          dep_var    = "OC",
                          mds        = mds)
print(comparison)
```

---

# Step 6: Visualisation

## Score Heatmap

```{r heatmap}
plot_scores(scored, cfg,
            group_cols = c("LandUse", "Depth"),
            group_by   = "LandUse",
            facet_by   = "Depth")
```

## SQI Bar Chart

```{r bar}
plot_sqi(res_lin,
         sqi_col   = "SQI_linear",
         group_col = "LandUse",
         fill_col  = "Depth")
```

## Radar Profile

```{r radar, fig.height=6, eval=requireNamespace("fmsb", quietly=TRUE)}
# Requires the 'fmsb' package: install.packages("fmsb")
plot_radar(scored, cfg,
           group_col  = "LandUse",
           group_cols = c("LandUse", "Depth"))
```

## PCA Biplot

```{r biplot}
plot_pca_biplot(mds, scored, group_col = "LandUse")
```

---

# Step 7: Statistical Analysis

## ANOVA

```{r anova}
# Compute per-observation index
scored$SQI_obs <- rowMeans(scored[, mds$mds_vars], na.rm = TRUE)

aov_res <- sqi_anova(scored,
                     sqi_col   = "SQI_obs",
                     group_col = "LandUse")
cat("ANOVA significant:", aov_res$significant, "\n")
print(aov_res$compact_letters)
```

## Sensitivity Analysis

```{r sensitivity}
sa <- sqi_sensitivity(scored, cfg,
                      group_cols = c("LandUse", "Depth"),
                      method     = "linear",
                      mds_vars   = mds$mds_vars)
print(sa)
```

## Sensitivity Tornado Chart

```{r tornado}
plot_sensitivity(sa)
```

---

# Recommended Workflow Diagram

```
raw data
   │
   ▼
validate_data()        ← check structure, missing values, groups
   │
   ▼
make_config()          ← define scoring type per variable
   │
   ▼
plot_scoring_curves()  ← verify biological plausibility
   │
   ▼
score_all()            ← transform all variables to 0–1
   │
   ▼
select_mds()           ← PCA-based minimum data set selection
   │
   ├──► sqi_linear()
   ├──► sqi_regression()
   ├──► sqi_pca()
   ├──► sqi_fuzzy()
   ├──► sqi_entropy()
   └──► sqi_topsis()
   │
   ▼
sqi_compare()          ← unified results table + ranking
   │
   ▼
Visualisation          ← plot_sqi(), plot_scores(), plot_radar()
   │
   ▼
Statistics             ← sqi_anova(), sqi_sensitivity()
```

---

# References

```{r bib, echo=FALSE}
```