--- title: "Getting Started with the grmtree Package" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started with the grmtree Package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5 ) ``` # Introduction The `grmtree` package implements recursive partitioning for Graded Response Models (GRM), allowing researchers to test for differential item functioning (DIF) and identify heterogeneous subgroups in their data based on item response patterns and covariates. This vignette describes the implementation of the tree-based graded response model (GRMTree) to test for DIF on the sample Medical Outcomes Study Social Support Survey (MOS-SS). This vignette demonstrates: * Data preparation and exploration * Unidimensionality checking * GRM fitting * GRMTree construction and interpretation # Install and Load required packages To implement the tree-based GRM (GRMTree), you will install the following packages if not previously installed. ```{r, eval=F} ## Install packages from CRAN repository install.packages(c("dplyr", "grmtree", "mirt", "psych")) ``` Once installed, load the packages as follows: ```{r, message=FALSE, warning=FALSE} library(dplyr) # For data manipulation library(grmtree) # For tree-based GRM DIF Test library(mirt) # For traditional GRM library(psych) # For psychometrics ``` # Import and explore the data The data set used in this demonstration is a test/sample data for the package. **Now, let's load this tidy data set into R.** ```{r, message=FALSE} ## Load the data data("grmtree_data", package = "grmtree") ## Take a glimpse at the data glimpse(grmtree_data) ``` # IRT Assumptions: Check for unidimensionality assumption There are various ways to check for the unidimensionality assumption. Some including using exploratory factor analysis (through parallel analysis), scree plot of the eigen values, and contrasts from principal component analysis. Here, we will use the scree plot of eigen values. Reeve, et al. suggested that a ratio of first-to-second eigenvalues greater than four is evidence of unidimensionality. ```{r, warning=FALSE} ## Create the response data (the 8 MOS-SS items) response_data <- grmtree_data %>% dplyr::select(MOS_Listen:MOS_Understand) %>% mutate_at(vars(starts_with("MOS")), as.ordered) ## Create response as outcomes response_data$resp <- data.matrix(response_data[, 1:8]) ``` ```{r, eval=FALSE} ## Calculate the polychoric correlation of items polycorr_mos <- polychoric(response_data$resp, global=FALSE)$rho ## Return the eigen values polycorr_eigen_mos <- eigen(polycorr_mos)$values round(polycorr_eigen_mos, 3) ## Ratio of first and second eigen value mos_ratio <- round(round(polycorr_eigen_mos, 3)[1]/round(polycorr_eigen_mos, 3)[2],3) ## Print the result mos_ratio cat("The ratio of the first-to-second eigen value is", mos_ratio, "which is >4 suggesting that the unidimensionality assumption is satisfied", "\n") ``` Now, let's create the screen plot. ```{r, eval=FALSE} ## Scree plot of eigen values plot(1:length(polycorr_eigen_mos), polycorr_eigen_mos, type = "b", pch = 20, xlab = "", ylab = "Eigen values") ``` **Note:** The `type = "b"` argument means we desire that the plot creates both points and lines, Based on the scree plot, we assume that the MOS-SS emotional domain is unidimensional. This implies that one dominant latent trait is being measured and that this trait is the driving force for the responses observed for each item in the measure. Other ways to check for unidimensionality of the MOS-SS emotional domain are shown below ```{r, eval=FALSE} ## Perform EFA with 1 factors efa_mos <- fa(response_data$resp, nfactors = 1, rotate = "varimax", fm = "mle") ## Print the results summary(efa_mos) print(efa_mos, sort=TRUE) efa_mos$loadings ## Visualizing factor structure using fa.diagram fa.diagram(efa_mos, main = "EFA Factor Structure") ## Scree plot to visualize the number of factors fa.parallel(response_data$resp, fa = "fa") ``` # Fit the graded response model (GRM) Having established unidimensionality, let's create the GRM. ```{r} ## Create GRM mos_grm <- mirt(data = response_data$resp, model = 1, itemtype = "graded", SE = TRUE, method = "EM", verbose = FALSE) ``` **Note:** The argument `model = 1` means that default is 1, indicating that a unidimensional model will be fitted ## Fit Indices for the GRM ```{r} ## Get the coefficients mos_coef <- coef(mos_grm, IRTpars = T, simplify = TRUE) mos_coef ## Get the residuals residuals(mos_grm, type = "Q3") ## Compute the M2 model fit statistic M2(mos_grm, type = "C2") ## Infit and outfit statistics mirt::itemfit(mos_grm, c('S_X2', 'infit'), method = 'EAP') ## Could also use method = 'ML' ``` ## Visualization: Visualize the results for all the items ```{r, eval = FALSE} ## Plot the category response curves plot(mos_grm, type = 'trace') ## Plot the test information plot(mos_grm, type = 'info') ``` # Fit the tree-based graded response model (GRMTree) Based on multiple fit indices including the Tucker–Lewis index (TLI) and Comparative fit index (CFI), we conclude that GRM has a good fit to the data. Now, let's create the GRNTree using covariates to test for DIF. To implement the GRMTree, it follows this four-step process. ```{r grmtree-process, echo=FALSE, out.width="75%", fig.align='center', fig.cap="GRMTree implementation process showing the four-step recursive partitioning approach for detecting differential item functioning"} knitr::include_graphics("GRMTree.png") ``` Before we proceed, let's put the variables in their proper formats: * All variables to be included in the model except `age` and `BMI` will be converted to factor since they are categorical variables and the MOS-SS emotional domain items will be converted to ordinal variables as required for the GRMTree. ```{r, warning=FALSE} ## Prepare the data resp.data <- grmtree_data %>% mutate_at(vars(starts_with("MOS")), as.ordered) %>% mutate_at(vars(c(sex, residency, depressed, Education, job, smoker, multimorbidity)), as.factor) ## Explore the data head(resp.data) ## Check the structure of the data glimpse(resp.data) ## Create response as outcomes resp.data$resp <- data.matrix(resp.data[, 1:8]) ## GRMTree control parameters with Benjamini-Hochberg grm_control <- grmtree.control( minbucket = 350, p_adjust = "BH", alpha = 0.05) ## Fit the GRMTree model mos_grmtree <- grmtree(resp ~ sex + age + bmi + Education + depressed + residency + job + multimorbidity + smoker, data = resp.data, control = grm_control) ``` **Note:** Unlike previous tree-based item response theory models, currently, the `grmtree` package implemented other post-hoc multiple adjustments methods and also the Bonferroni correction. The implemented post-hoc multiple adjustments methods include, "bonferroni" (Bonferroni), "holm" (Holm-Bonferroni), "BH" (Benjamini-Hochberg), "BY" (Benjamini-Yekutieli), "hochberg" (Hochberg), and "hommel" (Hommel). See `?grmtree` and `?grmtree.control` for more information on other control arguments. Also see below an example of using other multiple comparison methods with the control parameters in the function. ```{r, eval=FALSE} ## Bonferroni correction tree_bonf <- grmtree(response ~ covariate1 + covariate2, data = df, control = grmtree.control(p_adjust = "bonferroni")) ## Hommel tree_bh <- grmtree(response ~ covariate1 + covariate2, data = df, control = grmtree.control(p_adjust = "hommel")) ## Holm-Bonferroni tree_holm <- grmtree(response ~ covariate1 + covariate2, data = df, control = grmtree.control(p_adjust = "holm")) ``` ## Print the tree We can print the tree using the `print()` function. Note that using `print(mos_grmtree)` will print the tree with the item parameters. The code below also includes a way to print a simplified version of the tree showing only the nodes. ```{r} ## Print the tree model print(mos_grmtree) print(mos_grmtree, FUN = function(x) " *") ``` ## Plot the GRMTree Now, we can plot the tree using the `plot()` function. We have implemented several plotting options including plot the distribution of the factor scores in each node overlayed by a normal curve. See `?plot` for options of plots available. ```{r} ## Create the regions plot (by default) plot(mos_grmtree) ## This also creates a regions plot plot(mos_grmtree, type = "regions", tnex = 2L) ## Create the histogram plot of the factor scores plot(mos_grmtree, type = "histogram", tnex = 2L) ## Create the profile plot with different options plot(mos_grmtree, type = "profile", tnex = 2L, what = "threshold") plot(mos_grmtree, type = "profile", tnex = 2L, what = "discrimination") plot(mos_grmtree, type = "profile", tnex = 2L, what = "item") ``` **Note:** The `tnex = 2L` argument is a numeric value giving the terminal node extension in relation to the inner nodes. You may also rename a covariate in the plot. Let's assume we want to rename the variable that splits at the root node from `age` to `Age`. ```{r} ## Return the names of the covariates to know the position of age names(mos_grmtree$data) ## Rename the age to Age (Uncomment the code below and change to the correct name) #names(mos_grmtree$data)[3] <- "Age" ## Create the regions plot (by default) plot(mos_grmtree) ``` More than one subgroup (terminal node) was identified, suggesting the presence of sample heterogeneity with respect to DIF on the MOS-SS emotional domain items. Specifically, the GRMTree identified three distinct subgroups defined by interactions among age, sex and smoking status as splitting variables. Subgroup 1 consists of female patients who are ≤69 years old, subgroup 2 consists of males who are ≤69 years, and subgroup 3 consists of patients who are > 69 years. In the figure above, the larger the sections of the bars for each response category, the more likely patients are to endorse that category. For a more detailed example on the interpretation of these types of plot, see the article below. * Arimoro, O.I., Josephson, C.B., James, M.T. et al. Screening for depression in patients with epilepsy: same questions but different meaning to different patients. Qual Life Res 33, 3409–3419 (2024). https://doi.org/10.1007/s11136-024-03782-1. ## Extract item parameters for terminal nodes ```{r} ## Extract and print the threshold parameters thresholds <- threshpar_grmtree(mos_grmtree) print(thresholds) ## Extract and print the discrimination parameters discriminations <- discrpar_grmtree(mos_grmtree) print(discriminations) ## Extract and print the item parameters itemspars <- itempar_grmtree(mos_grmtree) print(itemspars) ``` **Note:** The `itempar_grmtree` extracts both discrimination parameters and average threshold parameters for each item from all terminal nodes of a graded response model tree. See `?itempar_grmtree` for more information. # Conclusion This vignette demonstrated basic usage of `grmtree` for identifying DIF in polytomous items. Key steps included checking dimensionality, fitting the GRM, and interpreting the resulting tree.