---
title: "Cross-system classification: WRB 2022, SiBCS 5, USDA Soil Taxonomy"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Cross-system classification: WRB 2022, SiBCS 5, USDA Soil Taxonomy}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(soilKey)
```

`soilKey` ships three independent classification keys -- WRB 2022 (Module 1), SiBCS 5th edition (Module 6), and USDA Soil Taxonomy 13th edition (Module 5). Every key consumes the same `PedonRecord`, so a profile can be classified through all three in a single pass. This vignette demonstrates the alignment on canonical fixtures and shows where the systems agree, disagree, and complement each other.

# 1. The same Ferralsol through three keys

The canonical Ferralsol fixture is a clay-rich, low-CEC, low-BS Brazilian profile.

```{r classify-three}
pr <- make_ferralsol_canonical()

# Run the same PedonRecord through each key.
w <- classify_wrb2022(pr, on_missing = "silent")
s <- classify_sibcs(pr, on_missing = "silent")
u <- classify_usda(pr, on_missing = "silent")

data.frame(
  System = c("WRB 2022", "SiBCS 5", "USDA"),
  Class  = c(w$rsg_or_order, s$rsg_or_order, u$rsg_or_order),
  Full   = c(w$name, s$name, u$name)
)
```

The three systems converge on the same conceptual unit:

* **WRB**: Ferralsol with the canonical Chapter 6 qualifiers.
* **SiBCS**: Latossolo Vermelho (red Latossolo).
* **USDA**: Oxisol.

This three-way alignment is the textbook correspondence: WRB Ferralsol ↔ SiBCS Latossolo ↔ USDA Oxisol.

# 2. Cross-system table on the canonical fixtures

A subset of fixtures across the three systems:

```{r cross-table}
fxs <- list(
  Ferralsol  = make_ferralsol_canonical(),
  Acrisol    = make_acrisol_canonical(),
  Lixisol    = make_lixisol_canonical(),
  Luvisol    = make_luvisol_canonical(),
  Nitisol    = make_nitisol_canonical(),
  Vertisol   = make_vertisol_canonical(),
  Andosol    = make_andosol_canonical(),
  Histosol   = make_histosol_canonical(),
  Podzol     = make_podzol_canonical(),
  Cambisol   = make_cambisol_canonical(),
  Gleysol    = make_gleysol_canonical(),
  Plinthosol = make_plinthosol_canonical()
)

# One row per fixture, one column per classification system.
tab <- do.call(rbind, lapply(names(fxs), function(nm) {
  pr <- fxs[[nm]]
  data.frame(
    Fixture = nm,
    WRB     = classify_wrb2022(pr, on_missing = "silent")$rsg_or_order,
    SiBCS   = classify_sibcs(pr, on_missing = "silent")$rsg_or_order,
    USDA    = classify_usda(pr, on_missing = "silent")$rsg_or_order
  )
}))

knitr::kable(tab)
```

The table reproduces the canonical correspondences:

| WRB       | SiBCS                 | USDA              |
|-----------|-----------------------|-------------------|
| Ferralsol | Latossolo             | Oxisol            |
| Acrisol   | Argissolo             | Ultisol           |
| Lixisol   | Argissolo             | Alfisol           |
| Luvisol   | Argissolo             | Alfisol           |
| Nitisol   | Nitossolo             | Alfisol / Ultisol |
| Vertisol  | Vertissolo            | Vertisol          |
| Andosol   | Cambissolo / specific | Andisol           |
| Histosol  | Organossolo           | Histosol          |
| Podzol    | Espodossolo           | Spodosol          |

# 3. Where the systems diverge

The same profile can land in different "RSGs" because each system uses a slightly different gating criterion. The most important divergences:

**Argic horizon chemistry**: SiBCS lumps Acrisol/Lixisol/Alisol/Luvisol under *Argissolos*, while WRB splits them on two axes: clay activity (Acrisol/Lixisol = low CEC per kg clay, Alisol/Luvisol = high CEC per kg clay) and base status (Acrisol/Alisol = low BS with high exchangeable Al, Lixisol/Luvisol = high BS). The USDA equivalent split is Ultisol (low BS) vs Alfisol (high BS).

**Andic vs cambic priority**: A volcanic ash soil with a weak Bw can land in WRB Andosol but in SiBCS Cambissolo if the andic criteria narrowly fail. USDA Andisol uses the same andic criterion as WRB.

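Of the divergences in this section, the argic split is the easiest to make concrete. The chunk below is a minimal sketch in base R, not part of `soilKey`: the function name `argic_rsg_sketch()` is hypothetical, the 24 cmol(+)/kg clay boundary follows the usual WRB low-activity-clay limit, and the 50 % base-saturation cut-off is a simplifying assumption standing in for the full WRB base-status criterion.

```{r argic-split-sketch}
# Illustrative only: a simplified two-axis split of argic soils.
# `argic_rsg_sketch` is a hypothetical helper, not a soilKey function.
argic_rsg_sketch <- function(cec_clay, base_sat,
                             cec_threshold = 24,   # cmol(+)/kg clay (assumed)
                             bs_threshold  = 50) { # percent (simplified assumption)
  high_activity <- cec_clay >= cec_threshold
  high_bs       <- base_sat >= bs_threshold
  ifelse(high_activity,
         ifelse(high_bs, "Luvisol", "Alisol"),
         ifelse(high_bs, "Lixisol", "Acrisol"))
}

# Four made-up argic profiles, one per quadrant.
profiles <- data.frame(
  cec_clay = c(12, 12, 40, 40),
  base_sat = c(30, 70, 30, 70)
)
profiles$WRB   <- argic_rsg_sketch(profiles$cec_clay, profiles$base_sat)
profiles$SiBCS <- "Argissolos"  # the lumping described in this vignette
print(profiles)
```

The four quadrants map to four WRB groups but a single SiBCS order, which is why a WRB/SiBCS comparison on argic soils is many-to-one rather than a disagreement.
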

**Plinthic / petric variants**: WRB Plinthosols, USDA Plinthudults / Plinthohumults, SiBCS Plintossolos -- all rely on the same plinthite criterion but apply a different gating order in the key.

# 4. Recovering the qualifier-level correspondence

For the same profile, each system provides additional discriminators:

```{r ferralsol-three-detail}
pr <- make_ferralsol_canonical()

w <- classify_wrb2022(pr, on_missing = "silent")
s <- classify_sibcs(pr, on_missing = "silent")
u <- classify_usda(pr, on_missing = "silent")

cat("WRB principal qualifiers:    ", paste(w$qualifiers$principal, collapse = ", "), "\n")
cat("WRB supplementary qualifiers:", paste(w$qualifiers$supplementary, collapse = ", "), "\n")
# `$rsg_or_order` holds only the first categorical level, so the full
# `$name` is printed here to expose the lower levels.
cat("SiBCS name (incl. subordem): ", s$name, "\n")
cat("USDA name:                   ", u$name, "\n")
```

The WRB qualifier ladder *(Geric, Ferric, Rhodic, Chromic + Clayic, Humic, Dystric, Ochric, Rubic)* is the most expressive: it captures CEC, iron, colour, texture, organic carbon, and base saturation in one parenthesised string. SiBCS achieves the same through its 2nd-categorical-level *subordem* names (e.g. Latossolos Vermelhos, Distroférricos), which are encoded separately. USDA's information density is concentrated in the great group / subgroup level (currently scaffolded for v1.0).

# 5. Validating the SiBCS ↔ WRB alignment

`soilKey` runs the SiBCS key on the same canonical fixtures used for WRB.
The fixture-level correspondence is asserted by the test suite:

```{r sibcs-mapping}
# Expected SiBCS order for each WRB-named canonical fixture.
sibcs_expectations <- c(
  Ferralsol  = "Latossolos",
  Acrisol    = "Argissolos",
  Lixisol    = "Argissolos",
  Luvisol    = "Argissolos",
  Nitisol    = "Nitossolos",
  Vertisol   = "Vertissolos",
  Andosol    = "Cambissolos",  # Cambissolo Háplico Tb (Andic-leaning)
  Histosol   = "Organossolos",
  Podzol     = "Espodossolos",
  Plinthosol = "Plintossolos"
)

# Look up each fixture constructor by name and classify its output.
actual <- vapply(names(sibcs_expectations), function(nm) {
  fx <- get(paste0("make_", tolower(nm), "_canonical"))()
  classify_sibcs(fx, on_missing = "silent")$rsg_or_order
}, character(1))

data.frame(
  fixture  = names(sibcs_expectations),
  expected = unname(sibcs_expectations),
  actual   = actual,
  match    = actual == sibcs_expectations
)
```

# 6. Use cases for cross-system classification

* **Brazilian field surveys** -- producers and extension services use SiBCS, while international literature uses WRB. The same `PedonRecord` resolved through both keys gives the bilingual name without re-entering the data.
* **Global benchmarks** -- WoSIS profiles carry WRB names; some legacy datasets use Soil Taxonomy. The cross-system table makes both corpora analysable side by side.
* **Concept stress-testing** -- when WRB and SiBCS disagree on the same profile, the cause is almost always a single threshold (CEC/clay, BS, andic). Inspecting the disagreement is a fast way to find data-entry errors or to identify ambiguous profiles that deserve a closer look.

The next vignette (`v04_vlm_extraction`) shows how the `PedonRecord` itself can be assembled from PDFs and field photos via vision-language extraction, so the cross-system pass can run on freshly described profiles without manual data entry.
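
As a rough illustration of the concept stress-testing use case in section 6, the chunk below flags profiles whose SiBCS order differs from the order expected for their WRB group. It is a base-R sketch under stated assumptions: the `results` data frame is hand-typed and hypothetical (it is not output from `soilKey`), and the lookup simply reuses a few of the correspondences from section 5.

```{r stress-test-sketch}
# Expected WRB -> SiBCS correspondence (subset of the section 5 mapping).
expected_sibcs <- c(
  Ferralsol = "Latossolos",
  Acrisol   = "Argissolos",
  Nitisol   = "Nitossolos",
  Vertisol  = "Vertissolos"
)

# Hypothetical classification results; P3 is a deliberate mismatch.
results <- data.frame(
  profile = c("P1", "P2", "P3", "P4"),
  WRB     = c("Ferralsol", "Acrisol", "Ferralsol", "Vertisol"),
  SiBCS   = c("Latossolos", "Argissolos", "Argissolos", "Vertissolos"),
  stringsAsFactors = FALSE
)

# Compare each SiBCS result with the order expected for its WRB group.
results$expected <- unname(expected_sibcs[results$WRB])
results$agrees   <- results$SiBCS == results$expected

# Profiles worth a closer look.
flagged <- results[!results$agrees, ]
print(flagged)
```

In this made-up example only P3 is flagged: a Ferralsol/Argissolos pairing points at the CEC/clay or textural-gradient data for that profile as the first thing to re-check.
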