---
title: "TSQCA Tutorial (English)"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{TSQCA Tutorial (English)}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

Threshold-Sweep QCA (TS-QCA) is a framework for systematically exploring how different threshold settings affect the results of crisp-set Qualitative Comparative Analysis (QCA).

In crisp-set QCA, the researcher must choose thresholds to binarize:

- the outcome **Y**, and
- the conditions **X**.

Small changes in these thresholds may lead to substantial differences in truth tables and minimized solutions.
TS-QCA provides:

- a **systematic** way to vary thresholds,
- a **transparent** way to evaluate sensitivity,
- and a **reproducible** workflow for robustness assessment.

The `TSQCA` package implements four sweep methods:

| Method | What varies | What stays fixed | Purpose |
|--------|-------------|------------------|----------|
| **CTS–QCA** | One X threshold | Y + other Xs | Evaluate influence of a single condition |
| **MCTS–QCA** | Multiple X thresholds | Y | Explore combinations of X thresholds |
| **OTS–QCA** | Y threshold | All Xs | Assess robustness to Y calibration |
| **DTS–QCA** | X and Y thresholds | None | Full 2D sensitivity analysis |

> **Scope:** This package focuses on **sufficiency analysis**: identifying condition combinations that are sufficient for an outcome. Necessity analysis (whether a condition is required for an outcome) involves different logical structures and evaluation metrics, and is planned for future versions.

---

# Data Requirements

TS-QCA assumes:

- TS-QCA handles any numeric data (continuous, interval, or ordinal scales). Pre-calibration is NOT required because raw scores (e.g., 1–85 scale) are binarized directly at specified thresholds.
- Ordinal data can be analyzed directly, enabling analyses that are difficult with fsQCA (which requires continuous membership values).
- Thresholds can be any numeric value (integer or real).
- Missing values should be handled before running sweeps.

Example dataset structure:

```{r}
library(TSQCA)
data("sample_data")
dat <- sample_data
str(dat)
```

Define outcome and conditions:

```{r}
outcome <- "Y"
conditions <- c("X1", "X2", "X3")
```

---

# Working with Mixed Data Types

## Handling Binary and Continuous Variables

In real-world social science research, datasets often contain **both binary variables** (e.g., gender, yes/no responses) **and continuous variables** (e.g., sales, satisfaction scores). When using TSQCA with such mixed data, special attention is required.

### Key Principles

1. **Do NOT sweep binary variables**: they are already binarized (0/1)
2. **Use threshold = 1 for binary variables** to preserve their original values
3. **Explicitly specify thresholds** for each variable in `sweep_list`

### Why Threshold = 1 for Binary Variables?

The internal `qca_bin()` function uses the rule `x >= thr` for binarization:

- If `x = 0`: `0 >= 1` → FALSE → **0** (preserved)
- If `x = 1`: `1 >= 1` → TRUE → **1** (preserved)

This ensures that binary variables remain unchanged during the binarization process.

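
The rule can be tried out in base R without the package. The sketch below is only an illustration: `binarize_sketch()` is a hypothetical stand-in written for this tutorial, assumed to mirror the documented `x >= thr` rule, and is not the source of the internal `qca_bin()` function. The example vectors are made up.

```r
# Hypothetical stand-in for the binarization step.
# Assumption: it mirrors the documented rule `x >= thr`; it is NOT the
# package's actual `qca_bin()` implementation.
binarize_sketch <- function(x, thr) {
  as.integer(x >= thr)
}

gender  <- c(0, 1, 1, 0)  # already binary (made-up values)
quality <- c(3, 6, 7, 9)  # raw 0-10 score (made-up values)

bin_gender_ok  <- binarize_sketch(gender, 1)   # threshold 1 keeps 0/1 as-is
bin_gender_bad <- binarize_sketch(gender, 6)   # threshold 6 maps everything to 0
bin_quality    <- binarize_sketch(quality, 7)  # scores of 7 or more become 1
```

With threshold 1 the binary vector comes back unchanged, while a threshold of 6 collapses it to all zeros, which is why binary conditions should be given the single threshold 1 rather than a sweep range.
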

### Practical Example: Mixed Data

Suppose your dataset has:

- **X1**: Gender (0 = Male, 1 = Female), binary variable
- **X2**: Product Quality Score (0-10), continuous variable
- **X3**: Store Atmosphere Score (0-10), continuous variable

When using `ctSweepM()`:

```{r, eval=FALSE}
# CORRECT: Specify threshold explicitly for each variable
sweep_list <- list(
  X1 = 1,    # Binary variable: use threshold 1
  X2 = 6:8,  # Continuous: sweep thresholds
  X3 = 6:8   # Continuous: sweep thresholds
)

res_mixed <- ctSweepM(
  dat = dat,
  outcome = "Y",
  conditions = c("X1", "X2", "X3"),
  sweep_list = sweep_list,
  thrY = 7,
  dir.exp = c(1, 1, 1)
)
```

This explores 1 × 3 × 3 = **9 threshold combinations**, treating X1 as a fixed binary condition while sweeping X2 and X3.

### Common Mistake

```{r, eval=FALSE}
# WRONG: Using sweep range for binary variables
sweep_list <- list(
  X1 = 6:8,  # All values become 0 (since 0 < 6 and 1 < 6)
  X2 = 6:8,
  X3 = 6:8
)
```

If you accidentally specify `X1 = 6:8`, both 0 and 1 will fail the `>= 6` condition, making all X1 values become 0. This destroys the information in your binary variable.

### Best Practice

Always examine your data structure before setting up threshold sweeps:

```{r, eval=FALSE}
# Check variable ranges
summary(dat[, c("X1", "X2", "X3")])

# Identify binary variables (only 0 and 1)
sapply(dat[, c("X1", "X2", "X3")], function(x) {
  unique_vals <- sort(unique(x))
  if (length(unique_vals) == 2 && all(unique_vals == c(0, 1))) {
    "Binary (use threshold = 1)"
  } else {
    paste("Continuous (range:", min(x), "-", max(x), ")")
  }
})
```

---

# CTS–QCA: Single-Condition Sweep (`ctSweepS`)

CTS–QCA varies the threshold for **one X condition**, keeping the others fixed.


```{r, error=TRUE}
sweep_var <- "X3"   # Condition (X) whose threshold is swept
sweep_range <- 6:9  # Candidate threshold values to evaluate
thrY <- 7           # Outcome (Y) threshold (fixed)
thrX_default <- 7   # Threshold for other X conditions (fixed)

res_cts <- ctSweepS(
  dat = dat,
  outcome = "Y",
  conditions = c("X1", "X2", "X3"),
  sweep_var = sweep_var,
  sweep_range = sweep_range,
  thrY = thrY,
  thrX_default = thrX_default,
  dir.exp = c(1, 1, 1),
  return_details = TRUE
)

summary(res_cts)
```

---

# MCTS–QCA: Multiple X Sweep (`ctSweepM`)

MCTS–QCA evaluates all combinations of thresholds for multiple X conditions.

```{r, error=TRUE}
# Create a sweep list specifying thresholds for each condition
sweep_list <- list(
  X1 = 6:7,
  X2 = 6:7,
  X3 = 6:7
)

res_mcts <- ctSweepM(
  dat = dat,
  outcome = "Y",
  conditions = c("X1", "X2", "X3"),
  sweep_list = sweep_list,
  thrY = 7,
  dir.exp = c(1, 1, 1),
  return_details = TRUE
)

summary(res_mcts)
```

---

# OTS–QCA: Outcome Sweep (`otSweep`)

OTS–QCA varies only the threshold of **Y**, keeping X thresholds fixed.

```{r}
res_ots <- otSweep(
  dat = dat,
  outcome = "Y",
  conditions = c("X1", "X2", "X3"),
  sweep_range = 6:8,
  thrX = c(X1 = 7, X2 = 7, X3 = 7),
  dir.exp = c(1, 1, 1),
  return_details = TRUE
)

summary(res_ots)
```

---

# DTS–QCA: Two-Dimensional Sweep (`dtSweep`)

DTS–QCA varies both **X thresholds** and **Y thresholds**, creating a full 2D grid.

```{r}
sweep_list_dts_X <- list(
  X1 = 6:7,
  X2 = 6:7,
  X3 = 6:7
)

sweep_range_dts_Y <- 6:7

res_dts <- dtSweep(
  dat = dat,
  outcome = "Y",
  conditions = c("X1", "X2", "X3"),
  sweep_list_X = sweep_list_dts_X,
  sweep_range_Y = sweep_range_dts_Y,
  dir.exp = c(1, 1, 1),
  return_details = TRUE
)

summary(res_dts)
```

---

# Understanding the Output

Each sweep result contains:

- threshold values tested,
- minimized solution expression,
- solution consistency (`inclS`),
- solution coverage (`covS`).

General guidance:

- **Consistency ≥ 0.8** is typically required for sufficiency.
- **Coverage** indicates empirical relevance; higher is better.
- A solution sensitive to small threshold changes suggests low robustness.

---

# Handling Multiple Solutions (New in v0.2.0)

## Why Multiple Solutions Matter

When QCA minimization produces multiple equivalent intermediate solutions, researchers face a methodological challenge: which solution should be reported?

Traditional approaches often report only the first solution (M1), but this may miss important causal heterogeneity. TSQCA v0.2.0 addresses this by:

1. **Detecting** when multiple solutions exist
2. **Extracting** essential prime implicants (terms common to all solutions)
3. **Identifying** selective prime implicants and unique terms

## Essential vs. Selective Prime Implicants

Distinguishing essential from selective prime implicants is important for robust QCA interpretation:

| Type | Definition | Interpretation |
|------|------------|----------------|
| **Essential prime implicants** | Present in ALL solutions | Robust findings; core causal factors |
| **Selective prime implicants** | Present in SOME but not all solutions | Context-dependent; may vary across cases |
| **Unique terms** | Present in only ONE specific solution | Solution-specific; least robust |

## Using `extract_mode`

The `extract_mode` parameter controls how solutions are extracted:

### Mode: "first" (Default)

```{r, eval=FALSE}
# Returns only the first solution (M1)
# Backward compatible with v0.1.x
result <- otSweep(
  dat = dat,
  outcome = "Y",
  conditions = c("X1", "X2", "X3"),
  sweep_range = 6:8,
  thrX = c(X1 = 7, X2 = 7, X3 = 7),
  extract_mode = "first"  # Default
)
```

### Mode: "all"

```{r, eval=FALSE}
# Returns all solutions concatenated
# Useful for seeing all equivalent solutions
result_all <- otSweep(
  dat = dat,
  outcome = "Y",
  conditions = c("X1", "X2", "X3"),
  sweep_range = 6:8,
  thrX = c(X1 = 7, X2 = 7, X3 = 7),
  extract_mode = "all"
)

# Output includes n_solutions column
head(result_all$summary)
# expression column shows: "M1: A*B + C; M2: A*B + D; M3: ..."
```

### Mode: "essential"

```{r, eval=FALSE}
# Returns essential prime implicants (terms common to all solutions)
# Best for identifying robust findings
result_essential <- otSweep(
  dat = dat,
  outcome = "Y",
  conditions = c("X1", "X2", "X3"),
  sweep_range = 6:8,
  thrX = c(X1 = 7, X2 = 7, X3 = 7),
  extract_mode = "essential"
)

# Output includes:
# - expression: essential prime implicants
# - selective_terms: terms in some but not all solutions
# - unique_terms: solution-specific terms
# - n_solutions: number of equivalent solutions
```

## Practical Example

Consider a scenario where QCA produces three equivalent solutions:

- M1: `A*B + C → Y`
- M2: `A*B + D → Y`
- M3: `A*B + E → Y`

The analysis reveals:

| Component | Terms | Interpretation |
|-----------|-------|----------------|
| Essential (EPI) | `A*B` | Present in all three solutions; robust finding |
| Selective (SPI) | `C, D, E` | Each appears in one solution; context-dependent |
| Unique (M1) | `C` | Only in solution 1 |
| Unique (M2) | `D` | Only in solution 2 |
| Unique (M3) | `E` | Only in solution 3 |

**Recommendation**: Report the essential prime implicants (`A*B → Y`) as your main finding, and discuss the selective prime implicants as alternative pathways in your discussion section.

---

# Generating Reports (New in v0.2.0)

## Overview

The `generate_report()` function creates comprehensive markdown reports from your analysis results. This automates the documentation process and ensures reproducibility.


## Report Formats

### Full Report

Contains comprehensive analysis details:

- Analysis settings (for reproducibility)
- Summary table across all thresholds
- Per-threshold detailed results
- All solution formulas
- Essential and selective prime implicants
- Fit measures (consistency, coverage, PRI)
- Configuration charts (New in v0.5.0)

```{r, eval=FALSE}
# Generate full report
generate_report(res_ots, "my_analysis_full.md", dat = dat, format = "full")
```

### Simple Report

Designed for journal manuscripts:

- Condensed format
- Essential information only
- Ready for supplementary materials

```{r, eval=FALSE}
# Generate simple report
generate_report(res_ots, "my_analysis_simple.md", dat = dat, format = "simple")
```

## Using Reports Effectively

### For Research Papers

1. Run analysis with `return_details = TRUE` (default in v0.2.0)
2. Generate a **simple report** for the main manuscript
3. Generate a **full report** for supplementary materials

```{r, eval=FALSE}
# Complete workflow
result <- otSweep(
  dat = mydata,
  outcome = "Y",
  conditions = c("X1", "X2", "X3"),
  sweep_range = 6:8,
  thrX = c(X1 = 7, X2 = 7, X3 = 7)
)

# For main text
generate_report(result, "manuscript_results.md", dat = mydata, format = "simple")

# For supplementary materials
generate_report(result, "supplementary_full.md", dat = mydata, format = "full")
```

### Accessing Analysis Parameters

All analysis parameters are stored in `result$params` for reproducibility:

```{r, eval=FALSE}
# View stored parameters
result$params

# Includes:
# - outcome, conditions: variable names
# - thrX, thrY: threshold values
# - incl.cut, n.cut, pri.cut: QCA parameters
# - dir.exp, include: minimization settings
```

---

# Best Practices

## Start Small, Then Expand

When using sweep functions, the number of QCA analyses grows quickly.
A systematic approach prevents wasted computation:

### Step 1: Single Value Test

```{r, eval=FALSE}
# Test with a single threshold first
result <- otSweep(
  dat = dat,
  outcome = "Y",
  conditions = c("X1", "X2", "X3"),
  sweep_range = 7,  # Single value
  thrX = c(X1 = 7, X2 = 7, X3 = 7)
)
```

### Step 2: Small Range

```{r, eval=FALSE}
# Expand to a small range
result <- otSweep(
  dat = dat,
  outcome = "Y",
  conditions = c("X1", "X2", "X3"),
  sweep_range = 6:7,  # Small range
  thrX = c(X1 = 7, X2 = 7, X3 = 7)
)
```

### Step 3: Full Analysis

```{r, eval=FALSE}
# Run full analysis only after confirming it works
result <- otSweep(
  dat = dat,
  outcome = "Y",
  conditions = c("X1", "X2", "X3"),
  sweep_range = 5:9,  # Full range
  thrX = c(X1 = 7, X2 = 7, X3 = 7)
)
```

## Computational Complexity

Understanding computational complexity helps plan your analysis:

| Function | Complexity | Example | Analyses |
|----------|------------|---------|----------|
| `otSweep()` | O(n) | 5 Y thresholds | 5 |
| `ctSweepS()` | O(n) | 5 X thresholds | 5 |
| `ctSweepM()` | O(m^k) | 3 thresholds × 3 conditions | 27 |
| `dtSweep()` | O(n × m^k) | 3 Y × 3^3 X | 81 |

### Managing Large Sweeps

For `dtSweep()` and `ctSweepM()`, reduce conditions first:

```{r, eval=FALSE}
# Manageable: 2 × 2 × 2 = 8 combinations
sweep_list <- list(X1 = 6:7, X2 = 6:7, X3 = 6:7)

# Caution: 5 × 5 × 5 = 125 combinations
sweep_list <- list(X1 = 5:9, X2 = 5:9, X3 = 5:9)

# Avoid: 5 × 5 × 5 × 5 × 5 = 3125 combinations!
sweep_list <- list(X1 = 5:9, X2 = 5:9, X3 = 5:9, X4 = 5:9, X5 = 5:9)
```

## Interpreting Results

### When Solutions Are Stable

If the same solution appears across multiple thresholds, your findings are robust:

```
thrY | expression    | inclS | covS
-----|---------------|-------|------
6    | A*B + C → Y   | 0.85  | 0.72
7    | A*B + C → Y   | 0.88  | 0.68
8    | A*B + C → Y   | 0.91  | 0.65
```

**Interpretation**: The solution `A*B + C` is robust across threshold variations.

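
One simple way to check for this kind of stability is to count the distinct expressions in a sweep summary. The sketch below uses a hand-built data frame that copies the illustrative table above; the column names (`thrY`, `expression`, `inclS`, `covS`) follow that illustration and are an assumption about the shape of a real summary, so adapt them to the object you actually get back.

```r
# Hypothetical summary table copying the stable example above.
# Assumption: a real sweep summary has comparable columns.
sweep_summary <- data.frame(
  thrY       = c(6, 7, 8),
  expression = c("A*B + C", "A*B + C", "A*B + C"),
  inclS      = c(0.85, 0.88, 0.91),
  covS       = c(0.72, 0.68, 0.65),
  stringsAsFactors = FALSE
)

# A single distinct expression across all thresholds indicates stability.
n_distinct_solutions <- length(unique(sweep_summary$expression))
is_stable <- n_distinct_solutions == 1

# Check every threshold against the conventional 0.8 consistency benchmark.
all_consistent <- all(sweep_summary$inclS >= 0.8)
```

Comparing expressions as strings is deliberately crude: two logically identical solutions written with terms in a different order would count as different, so treat the result as a first screen rather than a formal test.
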

### When Solutions Change

If solutions vary significantly, investigate the threshold sensitivity:

```
thrY | expression    | inclS | covS
-----|---------------|-------|------
6    | A*B + C → Y   | 0.82  | 0.75
7    | A*B → Y       | 0.89  | 0.62
8    | B*D → Y       | 0.93  | 0.45
```

**Interpretation**: Results are threshold-sensitive. Consider reporting the most theoretically justified threshold with appropriate caveats.

---

# Negated Outcome Analysis (New in v0.3.0)

## Why Analyze Negated Outcomes?

In QCA, researchers typically analyze conditions sufficient for the **presence** of an outcome (Y = 1). However, understanding conditions for the **absence** of an outcome (~Y) can provide equally valuable insights:

- What conditions lead to project **failure** (not success)?
- What factors contribute to customer **dissatisfaction** (not satisfaction)?
- What explains policy **non-adoption** (not adoption)?

## Using the `~` Notation

TSQCA v0.3.0 supports the QCA package's tilde (`~`) notation for negated outcomes:

```{r, eval=FALSE}
# Analyze conditions for Y >= threshold (standard)
result_Y <- otSweep(
  dat = dat,
  outcome = "Y",
  conditions = c("X1", "X2", "X3"),
  sweep_range = 6:8,
  thrX = c(X1 = 7, X2 = 7, X3 = 7)
)

# Analyze conditions for Y < threshold (negated)
result_negY <- otSweep(
  dat = dat,
  outcome = "~Y",
  conditions = c("X1", "X2", "X3"),
  sweep_range = 6:8,
  thrX = c(X1 = 7, X2 = 7, X3 = 7)
)
```

## Interpreting Negated Results

When using `~Y`, the solution shows conditions sufficient for the **absence** of the outcome:

| Analysis | Solution Example | Interpretation |
|----------|------------------|----------------|
| `outcome = "Y"` | `X1*X2 + X3` | High X1 AND high X2, OR high X3 → High Y |
| `outcome = "~Y"` | `~X1*~X3 + ~X2*~X3` | Low X1 AND low X3, OR low X2 AND low X3 → Low Y |

**Note**: Negated conditions (`~X1`) in the solution mean the **absence** of that condition (below threshold).

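
The relationship between `Y` and `~Y` can be illustrated in base R. This is a sketch under an assumption: that the outcome is binarized with `y >= thr` and that `~Y` is simply its complement (`y < thr`), as the comments in the example above suggest. The variable names and values are made up and are not part of the package.

```r
# Assumption: the outcome is binarized with `y >= thr`, and `~Y` is the
# complement of that binary outcome. Names and values are illustrative only.
y_raw <- c(4, 6, 7, 9)
thrY  <- 7

y_present <- as.integer(y_raw >= thrY)  # membership in Y
y_absent  <- 1L - y_present             # membership in ~Y

# The complement equals binarizing with `y < thr` directly.
same_as_below <- identical(y_absent, as.integer(y_raw < thrY))
```

Under this reading, every case belongs to exactly one of `Y` and `~Y` at a given threshold, so moving `thrY` shifts cases from one analysis to the other.
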

## All Sweep Functions Support Negation

```{r, eval=FALSE}
# ctSweepS with negated outcome
result <- ctSweepS(
  dat = dat,
  outcome = "~Y",
  conditions = c("X1", "X2", "X3"),
  sweep_var = "X3",
  sweep_range = 6:8,
  thrY = 7,
  thrX_default = 7
)

# ctSweepM with negated outcome
result <- ctSweepM(
  dat = dat,
  outcome = "~Y",
  conditions = c("X1", "X2"),
  sweep_list = list(X1 = 6:7, X2 = 6:7),
  thrY = 7
)

# dtSweep with negated outcome
result <- dtSweep(
  dat = dat,
  outcome = "~Y",
  conditions = c("X1", "X2"),
  sweep_list_X = list(X1 = 6:7, X2 = 7),
  sweep_range_Y = 6:8
)
```

## Checking Negation in Results

The `params` object stores whether negation was used:

```{r, eval=FALSE}
result <- otSweep(dat = dat, outcome = "~Y", ...)

# Check if negated
result$params$negate_outcome
# [1] TRUE

result$params$outcome
# [1] "~Y"
```

---

# Configuration Charts (New in v0.5.0)

TSQCA can generate Fiss-style configuration charts (Table 5 format) commonly used in QCA publications.

## Automatic Inclusion in Reports

Configuration charts are now automatically included in reports generated by `generate_report()`:

```{r eval=FALSE}
# Generate report with configuration charts (default)
generate_report(result, "my_report.md", dat = dat, format = "full")

# Disable charts if needed
generate_report(result, "my_report.md", dat = dat, include_chart = FALSE)

# Use LaTeX symbols for academic papers
generate_report(result, "my_report.md", dat = dat, chart_symbol_set = "latex")
```

## Standalone Chart Functions

You can also generate configuration charts directly:

```{r}
# From path strings
paths <- c("A*B*~C", "A*D", "B*E")
chart <- config_chart_from_paths(paths)
cat(chart)
```

### Symbol Sets

Three symbol sets are available:

```{r}
# ASCII (maximum compatibility)
cat(config_chart_from_paths(paths, symbol_set = "ascii"))
```

```{r eval=FALSE}
# LaTeX (for PDF/academic papers)
cat(config_chart_from_paths(paths, symbol_set = "latex"))
# Output: $\bullet$ for presence, $\otimes$ for absence
```

### Multiple Solutions

When you have multiple equivalent solutions:

```{r}
solutions <- list(
  c("A*B", "C*D"),
  c("A*B", "C*E")
)
chart <- config_chart_multi_solutions(solutions)
cat(chart)
```

---

# Conclusion

TSQCA provides a structured and reproducible way to evaluate how threshold choices influence QCA results.

Using CTS, MCTS, OTS, and DTS sweeps, researchers can:

- assess robustness,
- identify stable causal patterns,
- detect threshold-sensitive relationships,
- and strengthen QCA validity.

**New in v0.5.0:**

- Configuration charts automatically included in reports
- New parameters: `include_chart`, `chart_symbol_set`
- Standalone chart functions: `config_chart_from_paths()`, `config_chart_multi_solutions()`

**New in v0.3.0:**

- QCA-compatible argument names (`outcome`, `conditions`)
- Negated outcome support (`~Y` notation)
- Backward compatibility with deprecation warnings

**New in v0.2.0:**

- Detect and analyze multiple equivalent solutions
- Extract essential prime implicants for robust findings
- Generate comprehensive reports automatically
- Access stored parameters for reproducibility

## References

For more information on TS-QCA methodology, see:

- Ragin, C. C. (2008). *Redesigning Social Inquiry: Fuzzy Sets and Beyond*. University of Chicago Press. DOI: [10.7208/chicago/9780226702797.001.0001](https://doi.org/10.7208/chicago/9780226702797.001.0001)
- Duşa, A. (2019). *QCA with R: A Comprehensive Resource*. Springer. DOI: [10.1007/978-3-319-75668-4](https://doi.org/10.1007/978-3-319-75668-4)
- Oana, I.-E., & Schneider, C. Q. (2024). A Robustness Test Protocol for Applied QCA: Theory and R Software Application. *Sociological Methods & Research*, 53(1), 57–88. DOI: [10.1177/00491241211036158](https://doi.org/10.1177/00491241211036158)

# Session Info

```{r}
sessionInfo()
```