phantSEM

Introduction

This vignette describes the use of the functions SA_step1 SA_step2 and SA_step3. The goal of these functions are to conduct a sensitivity analysis with phantom variables. Phantom variables are variables that you did not observe, but that you can specify in a structural equation model by specifying the variance and covariances of the phantom variable with other variables. Creating a single phantom variable is fairly straightforward in the R package lavaan. There are scenarios where a researcher may wish to see how the results from an analysis might differ if they had observed additional variables. For instance, a researcher may want to assess whether a certain regression coefficient would be statistically significant if they had controlled for another variable in their analysis. Because the researcher does not know how this phantom variable covaries with the variables they did observe, it would be useful to try different values for the covariances in a sensitivity analysis. The phantSEM package is built for researchers to test different values for the covariances between their phantom variables and observed variables.

In this vignette, a full example is illustrated using the functions of phantSEM.

Cross-sectional Mediation

The example that we will use to illustrate the use of phantSEM is one in which we have fit a cross-sectional mediation model and would like to see if our estimates of the mediated effect would hold if we had collected an additional wave of data. To start, we will load the packages we need.


library(phantSEM)
library(lavaan)
#> Warning: package 'lavaan' was built under R version 4.3.1
library(tidyr)
#> Warning: package 'tidyr' was built under R version 4.3.1
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.3.1
#> Warning: package 'ggplot2' was built under R version 4.3.1
#> Warning: package 'tibble' was built under R version 4.3.1
#> Warning: package 'readr' was built under R version 4.3.1
#> Warning: package 'purrr' was built under R version 4.3.1
#> Warning: package 'dplyr' was built under R version 4.3.1
#> Warning: package 'stringr' was built under R version 4.3.1
#> Warning: package 'forcats' was built under R version 4.3.1
#> Warning: package 'lubridate' was built under R version 4.3.1

The example we use is based on a memory study described by MacKinnon et al., (2018). Participants in the study were given a list of words and were randomly assigned to either make mental images of the words, or repeat the words. The individuals were then asked to recall as many words from the list as they could. Therefore, the predictor X was the condition (1=imagery, 0=repetition), the mediator M was the extent to which they used mental imagery and the outcome Y was the number of words they recalled.

We first define a 3 x 3 cross-sectional covariance matrix

memory_CrossSectional <- matrix(c(
  0.2509804, 0.9511983, 0.4257081,
  0.9511983, 8.8661765, 2.6609477,
  0.4257081, 2.6609477, 10.8592048
), nrow = 3, byrow = T)
colnames(memory_CrossSectional) <- c("X", "M2", "Y2")

Next, we fit a single-mediator model to the covariance matrix. The model is defined using lavaan syntax.

Observed_Model <- "
M2 ~ X
Y2 ~ M2+X
"

fit_obs <- sem(model = Observed_Model, sample.cov = memory_CrossSectional, sample.nobs = 138)

We now define the model with phantom variables for M and Y using lavaan syntax. It is very important to add labels to the parameters that will be of interest for the sensitivity analysis. For this example, the a-path, b-path and c’-path may be of interest.

Phantom_Model <- "
M2 ~ M1 + Y1 + a*X
Y2 ~ M1 + Y1 + b*M2 + cp*X
"

we now have the necessary parts for the first step of the sensitivity analysis.

SA_step1

The purpose of this function is to determine which variables are the phantom variables and provide the covariance parameters involving the phantom variables so that the user can enter them into the next step.

Step1 <- SA_step1(
  lavoutput = fit_obs,
  mod_obs = Observed_Model,
  mod_phant = Phantom_Model
)
#> Here are the phantom covariance matrix parameters (copy and paste and add values/names for step2):
#> 
#> 
#> phantom_assignment <-("CovM1M2"= ,
#> "CovM1X"= ,
#> "CovM1Y2"= ,
#> "CovY1M1"= ,
#> "CovY1M2"= ,
#> "CovY1X"= ,
#> "CovY1Y2"= ,
#> "VarM1"= ,
#> "VarY1"= )
#> 
#>  Choose the names of the phantom covariances that you want to fix to single values and put in a vector. These will be used for the fixed_names argument in the SA_step2 function.  The phantom covariance parameters that you want to vary should be put in a list and used as the test_names argument.
#> Here are the observed covariance matrix parameters:
#> CovXM2,CovXY2,CovY2M2
#> Choose which values you want to use for your fixed parameters and put their names in a vector (fixed_values). Make sure the order is the same for both vectors.

This function prints the names of the phantom covariance parameters which will be used in the next function. Those parameters are: “CovM1M2”,“CovM1X”,“CovM1Y2”,“CovY1M1”,“CovY1M2”,“CovY1X”,“CovY1Y2”,“VarM1”,“VarY1”. The function also provides the names of the observed covariance parameters for the user’s reference.

SA_step2

The second step of the sensitivity analysis is to create covariance matrices that the phantom model will be fit to. To do this, the user must provide information about how the phantom variables covary with the observed variables. For this example, the user may wish to test different values for “CovM1M2”, “CovY1Y2”, “CovY1M2” and “CovM1Y2” and fix the other phantom covariances to single values. The easiest way to create the arguments for SA_step2 is to copy and paste the list of parameter names from SA_step1 and remove the names of the parameters they want to vary.

fixed_names <- c(
  "CovM1X",
  "CovY1M1",
  "CovY1X",
  "VarM1",
  "VarY1"
)

Now that the phantom parameters that will be fixed to single values have been put into a vector, the user must define a vector of the same length that provides the values that these parameters will be fixed at. For this example, random assignment would suggest that CovM1X and CovY1X would be zero and the variances for the phantom variables can be set at 1. That leaves CovY1M1, which can be fixed to be equal to CovY2M2

fixed_values <- c(0, "CovY2M2", 0, 1, 1)

The next arguments that the user has to define are the parameters that will be varied. This is done by creating a list of the names of the parameters. If we wanted to vary all four of these parameters, we could write this argument as follows:

test_names <- list("CovM1M2", "CovY1Y2", "CovY1M2", "CovM1Y2")

However, suppose that we would like to allow CovM1M2 and CovY1Y2 to be equal and CovY1M2 and CovM1Y2 to be equal. we can choose that option by writing the test_names list as follows:

test_names <- list(c("CovM1M2", "CovY1Y2"), c("CovY1M2", "CovM1Y2"))

The next argument tells the function a sequence of values to use for the parameters. The user could provide a vector of values, or a sequence using seq().

test_values <- list(seq(0, .6, .1), seq(-.6, .6, .1))

Now we are ready to use SA_step2

# Step2 <- SA_step2(fixed_names=fixed_names,
#                  fixed_values=fixed_values,
#                  test_names=test_names,
#                  test_values=test_values,
#                  step1=Step1)

phantom_assignment <- list(
  "CovM1X" = 0,
  "CovY1M1" = "CovY2M2",
  "CovY1X" = 0,
  "VarM1" = 1,
  "VarY1" = 1,
  "CovM1M2" = seq(0, .6, .1),
  "CovY1Y2" = "CovM1M2",
  "CovY1M2" = seq(-.6, .6, .1),
  "CovM1Y2" = "CovY1M2"
)



Step2 <- SA_step2(
  phantom_assignment = phantom_assignment,
  step1 = Step1
)

SA_step2 creates a covariance matrix for each combination of the parameter test values provided. The function returns a list of five objects that are then passed to SA_step3, they are the lavaan syntax for the phantom variable model (“mod_phant”), a vector of variances (“var_phant”), a data frame of the combinations of phantom covariances (“combos”), a list of correlation matrices (“correlation_matrix_list”) and a list of covariance matrices (“covariance_matrix_list”).

names(Step2)
#> [1] "mod_phant"               "var_phant"              
#> [3] "combos"                  "correlation_matrix_list"
#> [5] "covariance_matrix_list"

The third function is SA_step3, which is where the sensitivity analysis is run. This function can take a great amount of time depending on how many different phantom parameter combinations the user has entered. For this function, there are only two arguments. The first is for the output of SA_step2 and the second is the sample size from the observed data.

Step3 <- SA_step3(
  step2 = Step2,
  n = 138
)

The function returns a list of four objects. The first object is a list of the lavaan output from each model, the second object is the list of covariance matrices used for each model, the third object is a list of correlation matrices, and the fourth object is the data frame of phantom covariance combinations.

At this point, the sensitivity analysis is complete, but the results are not in the most user-friendly form. The package contains a function to help to organize the results so that they can be examined or plotted. This function is called ghost_par_ests and it is used to pull out parameter estimatesfor a particular parameter (which must be named) from each phantom covariance combination. It has three arguments: step3 is the results from SA_step3, parameter_label is the label used in the lavaan code to identify the parameter of interest, and remove_NA is a logical object indicating whether the results returned by the function should include rows in which the model did not produce estimates.

Let’s see what the b-estimates would be for our example. The output from this function is a data frame that has the values of the phantom covariances that were varied as well as the lavaan output for the chosen parameter.

b_ests <- ghost_par_ests(
  step3 = Step3,
  parameter_label = "b",
  remove_NA = FALSE
)

a_ests <- ghost_par_ests(
  step3 = Step3,
  parameter_label = "a",
  remove_NA = FALSE
)

b_ests$aest <- a_ests$est
b_ests$abest <- b_ests$est * b_ests$aest
b_ests <- cbind(Step2[["combos"]], b_ests)
b_ests$`CovM1M2,CovY1Y2` <- round(b_ests$`CovM1M2,CovY1Y2`, 2)
b_ests$`CovY1M2,CovM1Y2` <- round(b_ests$`CovY1M2,CovM1Y2`, 2)

head(b_ests)
#>           CovM1M2,CovY1Y2 CovY1M2,CovM1Y2 Var.3 rep  lhs   op  rhs label
#> reptemp               0.0            -0.6  -0.3   1   Y2    ~   M2     b
#> reptemp.1             0.1            -0.6  -0.3   2 <NA> <NA> <NA>  <NA>
#> reptemp.2             0.2            -0.6  -0.3   3 <NA> <NA> <NA>  <NA>
#> reptemp.3             0.3            -0.6  -0.3   4 <NA> <NA> <NA>  <NA>
#> reptemp.4             0.4            -0.6  -0.3   5 <NA> <NA> <NA>  <NA>
#> reptemp.5             0.5            -0.6  -0.3   6 <NA> <NA> <NA>  <NA>
#>                est        se        z pvalue  ci.lower ci.upper     aest
#> reptemp   1.146209 0.1187104 9.655506      0 0.9135409 1.378877 3.789931
#> reptemp.1       NA        NA       NA     NA        NA       NA       NA
#> reptemp.2       NA        NA       NA     NA        NA       NA       NA
#> reptemp.3       NA        NA       NA     NA        NA       NA       NA
#> reptemp.4       NA        NA       NA     NA        NA       NA       NA
#> reptemp.5       NA        NA       NA     NA        NA       NA       NA
#>              abest
#> reptemp   4.344053
#> reptemp.1       NA
#> reptemp.2       NA
#> reptemp.3       NA
#> reptemp.4       NA
#> reptemp.5       NA

We can then plot the results using ggplot2 (plotting function to be developed later).

library(ggplot2)
ggplot(b_ests, aes(x = `CovM1M2,CovY1Y2`, y = est)) +
  geom_point(size = 2) +
  scale_shape_manual() + # scale_fill_manual(values = c("black", "lightgray")) +
  scale_color_manual(values = c("grey", "black")) +
  facet_grid(~ b_ests$`CovY1M2,CovM1Y2`) +
  theme_bw() +
  scale_fill_grey() +
  theme(
    axis.title.y = element_text(size = 18),
    axis.text.x = element_text(size = 12),
    axis.title.x = element_text(size = 20),
    axis.text = element_text(size = 15),
    strip.text.y = element_text(size = 15),
    strip.text.x = element_text(size = 20),
    legend.text = element_text(size = 15),
    legend.title = element_text(size = 18),
    legend.position = "bottom",
    panel.spacing.x = unit(2, "mm")
  )

Another option is to use the lookup table that is part of the package. The function SA_lookup can be used to do this. Note that for this version, the SA_lookup function only provides standardized estimates as it uses the correlation matrix only.

cov2cor(memory_CrossSectional)
#>              X        M2        Y2
#> [1,] 1.0000000 0.6376509 0.2578654
#> [2,] 0.6376509 1.0000000 0.2711872
#> [3,] 0.2578654 0.2711872 1.0000000
lookup_results <- SA_lookup(CorXM = .64, CorXY = .26, CorMY = .27)
lookup_results_plot <- lookup_results %>% filter(round(corr, 1) == .3 & (stabr > -.1 & stabr < .7))

ggplot(lookup_results_plot, aes(x = stabr, y = est_ab)) +
  geom_point(size = 2) +
  scale_shape_manual() + # scale_fill_manual(values = c("black", "lightgray")) +
  scale_color_manual(values = c("grey", "black")) +
  facet_grid(~ lookup_results_plot$clr) +
  theme_bw() +
  scale_fill_grey() +
  theme(
    axis.title.y = element_text(size = 18),
    axis.text.x = element_text(size = 12),
    axis.title.x = element_text(size = 20),
    axis.text = element_text(size = 15),
    strip.text.y = element_text(size = 15),
    strip.text.x = element_text(size = 20),
    legend.text = element_text(size = 15),
    legend.title = element_text(size = 18),
    legend.position = "bottom",
    panel.spacing.x = unit(2, "mm")
  )

Comparing this result to the earlier result, the pattern of the results is similar, but the scale of the y-axis is different. This is because these results are standardized.