The PTSDdiag package provides tools for analyzing and
optimizing PTSD diagnostic criteria using PCL-5 (PTSD Checklist for
DSM-5) data. Post-Traumatic Stress Disorder (PTSD) diagnosis
traditionally requires a complex evaluation of multiple symptom criteria
across different clusters. This package aims to simplify this process
while maintaining diagnostic accuracy.
This vignette demonstrates how to install the package, prepare and score PCL-5 data, identify optimized six-symptom diagnostic combinations, and validate them on independent data.
This package is currently only hosted on GitHub. Installation
requires the devtools package to be installed and
loaded.
# Install devtools if not already installed
if (!require("devtools")) install.packages("devtools")
# Install PTSDdiag
library("devtools")
devtools::install_github("WeidmannL/PTSDdiag")
Once PTSDdiag is installed, it can be loaded in the
usual way.
library("PTSDdiag")
Load additional packages:
library(psych) # For reliability analysis
This package includes a simulated dataset that mirrors hypothetical PCL-5 assessments. It contains 5,000 simulated patient responses, each rating 20 PTSD symptoms according to DSM-5 criteria.
Rating Scale
Each symptom is rated on a 5-point scale: 0 = Not at all, 1 = A little bit, 2 = Moderately, 3 = Quite a bit, 4 = Extremely.
Symptom Clusters
The PCL-5 organizes symptoms into four distinct clusters according to DSM-5 criteria: Intrusion (Criterion B, symptoms 1-5), Avoidance (Criterion C, symptoms 6-7), Negative alterations in cognitions and mood (Criterion D, symptoms 8-14), and Alterations in arousal and reactivity (Criterion E, symptoms 15-20).
Data Format Requirements
Input data must be a data frame with one row per respondent and 20 columns of symptom ratings on the 0-4 scale.
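As a minimal sketch, a compliant input could be constructed by hand (hypothetical data; the column names here are arbitrary, since they are standardized in a later step):

```r
# Toy example: 5 respondents x 20 symptom columns, each scored 0-4
set.seed(1)
toy_data <- as.data.frame(
  matrix(sample(0:4, 5 * 20, replace = TRUE), nrow = 5, ncol = 20)
)
names(toy_data) <- paste0("S", 1:20)
```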
Let’s load the included sample data:
# Load the sample data
data("simulated_ptsd")
and take a look at the first few rows of the sample data:
# Display first few rows
head(simulated_ptsd)
#> S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15 S16 S17 S18 S19 S20
#> 1 2 1 4 1 4 3 3 3 4 4 4 4 4 4 4 4 3 3 4 3
#> 2 3 3 4 3 3 2 3 3 3 3 4 2 4 4 3 4 3 3 3 4
#> 3 0 1 1 2 2 2 3 0 3 2 3 3 2 1 3 1 2 2 1 3
#> 4 3 3 3 3 3 3 4 1 4 4 2 3 4 2 2 2 4 3 3 3
#> 5 2 3 3 2 3 1 2 1 4 3 2 3 3 4 3 3 4 4 3 3
#> 6 1 1 2 1 2 2 1 0 2 1 2 3 3 3 1 2 1 0 3 3
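Before any processing, a quick structural check can confirm that the data match the description above (5,000 respondents, 20 symptoms, all ratings on the 0-4 scale):

```r
# Number of respondents and symptom columns (should be 5000 x 20)
dim(simulated_ptsd)
# All ratings should fall within the 0-4 PCL-5 scale
range(simulated_ptsd)
```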
The first step is to standardize column names for consistent analysis. Before standardization, columns might have various names:
# Example of potential input formats
names(simulated_ptsd)
#> [1] "S1" "S2" "S3" "S4" "S5" "S6" "S7" "S8" "S9" "S10" "S11" "S12"
#> [13] "S13" "S14" "S15" "S16" "S17" "S18" "S19" "S20"
After standardization, columns will be named systematically:
# Rename columns to standard format (symptom_1 through symptom_20)
simulated_ptsd_renamed <- rename_ptsd_columns(simulated_ptsd)
# Show new names
names(simulated_ptsd_renamed)
#> [1] "symptom_1" "symptom_2" "symptom_3" "symptom_4" "symptom_5"
#> [6] "symptom_6" "symptom_7" "symptom_8" "symptom_9" "symptom_10"
#> [11] "symptom_11" "symptom_12" "symptom_13" "symptom_14" "symptom_15"
#> [16] "symptom_16" "symptom_17" "symptom_18" "symptom_19" "symptom_20"
We’ll now process the data through several steps to calculate scores and determine diagnoses:
# Step 1: Calculate total scores (range 0-80)
simulated_ptsd_total <- calculate_ptsd_total(simulated_ptsd_renamed)
# Step 2: Apply DSM-5 diagnostic criteria and determine PTSD diagnoses
simulated_ptsd_total_diagnosed <- create_ptsd_diagnosis_nonbinarized(simulated_ptsd_total)
# Step 3: Generate summary statistics
summary_stats <- summarize_ptsd(simulated_ptsd_total_diagnosed)
print(summary_stats)
#> mean_total sd_total n_diagnosed
#> 1 57.772 12.36218 4710
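The counts above can be turned into a diagnosis rate directly; with 4,710 diagnosed cases among 5,000 respondents, roughly 94% of the simulated sample meets the DSM-5 criteria:

```r
# Proportion of simulated respondents meeting the DSM-5 criteria
summary_stats$n_diagnosed / nrow(simulated_ptsd_total_diagnosed)
# 4710 / 5000 = 0.942
```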
The summary statistics provide the mean total score (mean_total), its standard deviation (sd_total), and the number of diagnosed cases (n_diagnosed).
Cronbach’s alpha is calculated to assess the internal consistency of the PCL-5:
cronbach <- psych::alpha(subset(simulated_ptsd_total_diagnosed, select = (-total)))
print(cronbach$total)
#> raw_alpha std.alpha G6(smc) average_r S/N ase mean
#> 0.912677 0.9156217 0.9212097 0.340688 10.85138 0.001758793 2.795905
#> sd median_r
#> 0.5937591 0.3318068
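Beyond the overall coefficient, the same object reports how reliability would change if single items were dropped (this assumes the standard return structure of psych::alpha, which includes an alpha.drop component):

```r
# Reliability estimates with each item removed; a value clearly above
# the overall alpha would flag an item that weakens internal consistency
head(cronbach$alpha.drop)
```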
Now we come to the actual analysis. Current PTSD diagnosis requires evaluating 20 symptoms across four clusters with complex rules. Our goal is to identify simplified diagnostic criteria that require fewer symptoms while preserving diagnostic accuracy.
We would like to identify the three six-symptom combinations that best represent the group of PTSD patients compared to the original DSM-5 criteria. We determine these optimal six-symptom combinations under two different structural approaches (a hierarchical approach, requiring at least one symptom from each cluster, and a non-hierarchical approach, ignoring cluster membership).
First, let’s identify the three optimal six-symptom combinations in which at least four symptoms must be present for a diagnosis and at least one symptom from each DSM-5 criterion cluster must be included. This approach maintains the hierarchical structure of PTSD diagnosis. As a reminder, these are the symptom clusters in the PCL-5: Intrusion (symptoms 1-5), Avoidance (symptoms 6-7), Negative alterations in cognitions and mood (symptoms 8-14), and Alterations in arousal and reactivity (symptoms 15-20).
The definition of the “optimal combination” is controlled by the score_by argument. Optimization can be based on either the number of false negatives (newly_nondiagnosed) or the total number of misclassified cases (false_cases).
In our example, we want to miss as few diagnoses as possible compared to the original DSM-5 criteria, so we want to minimize the false negative cases (newly_nondiagnosed).
# Find best combinations with hierarchical approach, minimizing false negatives
best_combinations_hierarchical <- analyze_best_six_symptoms_four_required_clusters(
simulated_ptsd_renamed,
score_by = "newly_nondiagnosed"
)
Understanding the Output
The function returns three key elements: best_symptoms, diagnosis_comparison, and summary. Let’s take a look at each.
best_combinations_hierarchical$best_symptoms
# Shows true/false values for original vs. new criteria
head(best_combinations_hierarchical$diagnosis_comparison, 10)
best_combinations_hierarchical$summary
The summary table reports the diagnostic accuracy metrics (such as sensitivity) for each candidate combination relative to the original DSM-5 criteria.
Now we do the same for the non-hierarchical approach. We want to find the three optimal six-symptom combinations, of which at least four symptoms must be present for the diagnosis, regardless of cluster membership.
Here too, the definition of the “optimal combination” is controlled by the score_by argument. Optimization again can be based on either the number of false negatives (newly_nondiagnosed) or the total number of misclassified cases (false_cases).
In our example, we want to miss as few diagnoses as possible compared to the original DSM-5 criteria, so we want to minimize the false negative cases (newly_nondiagnosed) in the non-hierarchical approach as well.
# Find best combinations with non-hierarchical approach, minimizing false negatives
best_combinations_nonhierarchical <- analyze_best_six_symptoms_four_required(
simulated_ptsd_renamed,
score_by = "newly_nondiagnosed"
)
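If total misclassifications matter more than missed diagnoses, the alternative optimization criterion can be selected instead; only the score_by value changes (a sketch, output not shown):

```r
# Same non-hierarchical analysis, but minimizing total misclassified
# cases instead of false negatives
best_combinations_false_cases <- analyze_best_six_symptoms_four_required(
  simulated_ptsd_renamed,
  score_by = "false_cases"
)
best_combinations_false_cases$summary
```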
Understanding the Output
Again, let’s take a look at the output:
best_combinations_nonhierarchical$best_symptoms
# Shows true/false values for original vs. new criteria
head(best_combinations_nonhierarchical$diagnosis_comparison, 10)
best_combinations_nonhierarchical$summary
After identifying optimal symptom combinations, it’s crucial to
validate their performance on independent data. The
PTSDdiag package provides two validation approaches:
Holdout Validation and Cross-Validation.
Holdout Validation splits the data into training and test sets, allowing us to evaluate how well the identified symptom combinations generalize to new data.
# Perform holdout validation with 70/30 split
validation_results <- holdout_validation(
simulated_ptsd_renamed,
train_ratio = 0.7,
score_by = "newly_nondiagnosed",
seed = 123
)
Results
The Holdout Validation results show how the symptom combinations perform on data they weren’t trained on, providing a more realistic estimate of their diagnostic accuracy.
Examining Results for Non-Hierarchical Model:
# Best combinations identified on training data
validation_results$without_clusters$best_combinations
# Performance summary on test data
validation_results$without_clusters$summary
Examining Results for Hierarchical Model:
# Best combinations identified on training data (with cluster representation)
validation_results$with_clusters$best_combinations
# Performance summary on test data
validation_results$with_clusters$summary
For a more robust assessment, k-fold Cross-Validation tests the stability of symptom combinations across multiple data splits.
# Perform 5-fold cross-validation
cv_results <- cross_validation(
simulated_ptsd_renamed,
k = 5,
score_by = "newly_nondiagnosed",
seed = 123
)
Results by Fold
The function provides detailed results for each fold.
Examining Results for Non-Hierarchical Model:
# Summary statistics for each fold (non-hierarchical model)
cv_results$without_clusters$summary_by_fold
Examining Results for Hierarchical Model:
# Summary statistics for each fold (hierarchical model)
cv_results$with_clusters$summary_by_fold
Stable Combinations Across Folds
If certain symptom combinations appear in multiple folds, the function calculates their average performance.
Examining Results for Non-Hierarchical Model:
# Check for combinations that appeared in multiple folds (non-hierarchical)
if (!is.null(cv_results$without_clusters$combinations_summary)) {
print("Stable combinations in non-hierarchical model:")
cv_results$without_clusters$combinations_summary
} else {
print("No combinations appeared in multiple folds for the non-hierarchical model")
}
Examining Results for Hierarchical Model:
# Check for combinations that appeared in multiple folds (hierarchical)
if (!is.null(cv_results$with_clusters$combinations_summary)) {
print("Stable combinations in hierarchical model:")
cv_results$with_clusters$combinations_summary
} else {
print("No combinations appeared in multiple folds for the hierarchical model")
}
Interpreting Cross-Validation Results
Cross-Validation helps identify symptom combinations that perform consistently across different subsets of the data rather than fitting one particular split. Key metrics to examine are the per-fold diagnostic accuracy (e.g., sensitivity) and the number of newly non-diagnosed cases.
Both validation methods serve complementary purposes. Holdout Validation provides a single, realistic estimate of performance on data the combinations were not trained on; Cross-Validation assesses how stable the identified combinations are across multiple data splits.
# Example: Compare sensitivity from both methods
# Note: Results will vary based on random splits
# Holdout Validation sensitivity (first combination, non-hierarchical)
holdout_summary <- validation_results$without_clusters$summary$x$data
if (nrow(holdout_summary) > 1) {
holdout_sensitivity <- holdout_summary[2, "Sensitivity"]
print(paste("Holdout sensitivity for first combination:",
round(holdout_sensitivity, 3)))
}
# Cross-Validation average sensitivity (if stable combinations exist)
if (!is.null(cv_results$without_clusters$combinations_summary)) {
cv_summary <- cv_results$without_clusters$combinations_summary$x$data
if (nrow(cv_summary) > 0) {
cv_sensitivity <- cv_summary[1, "Sensitivity"]
print(paste("Cross-validation average sensitivity:",
round(cv_sensitivity, 3)))
}
}
When validating PTSD diagnostic models:
Set a seed (e.g., seed = 123) so that data splits are reproducible.
Use score_by = "newly_nondiagnosed" to minimize missed diagnoses.
Use score_by = "false_cases" to minimize total misclassifications.
With the PTSDdiag package, PCL-5 data can be processed
and analyzed efficiently. It makes it possible to identify reduced optimal symptom
combinations for PTSD diagnosis and to compare different diagnostic
approaches by generating detailed diagnostic accuracy metrics.