---
title: "Bootstrap Confidence Interval for Standardized Solution in lavaan"
author: "Shu Fai Cheung"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Bootstrap Confidence Interval for Standardized Solution in lavaan}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


# Introduction

This document introduces the function
`standardizedSolution_boot_ci()`,
and related helpers, from the package
[`semhelpinghands`](https://sfcheung.github.io/semhelpinghands/).


# What `standardizedSolution_boot_ci()` Does

In `lavaan`, even with `se = "bootstrap"`,
the confidence intervals in the standardized
solution are *not* bootstrap confidence
intervals. This is a problem when researchers
want to form bootstrap confidence intervals
for parameters such as
a *standardized* indirect effect.[^notboot]

[^notboot]: In `lavaan`, if bootstrapping is requested, the standard errors and
confidence intervals in the standardized solutions are computed by delta method
using the variance-covariance matrix of the bootstrap estimates. The intervals
are symmetric about the point estimates and are not the bootstrap
percentile confidence intervals users expect when bootstrapping is conducted.

The function `standardizedSolution_boot_ci()`
addresses this problem.
It accepts a `lavaan::lavaan-class` object fitted
with `se = "bootstrap"` (or `se = "boot"`) and forms the percentile
confidence intervals based on the bootstrap estimates stored in the object.

# Data and Model

A mediation model example modified from the
official `lavaan` website is used
(https://lavaan.ugent.be/tutorial/mediation.html).


```r
library(lavaan)
set.seed(1234)
n <- 100
# X drawn from a Chi-square distribution with df = 2
X <- (rchisq(n, df = 2) - 2) / sqrt(2 * 2)
M <- .40 * X + sqrt(1 - .40^2) * rnorm(n)
Y <- .30 * M + sqrt(1 - .30^2) * rnorm(n)
Data <- data.frame(X = X,
                   Y = Y,
                   M = M)
model <-
"
# direct effect
  Y ~ c*X
# mediator
  M ~ a*X
  Y ~ b*M
# indirect effect (a*b)
  ab := a*b
# total effect
  total := c + (a*b)
"
```

This model is fitted with `se = "bootstrap"`
and 5000 replication. (Change `ncpus`
to a value appropriate for the system
running it.)


```r
fit <- sem(model,
           data = Data,
           se = "bootstrap",
           bootstrap = 5000,
           parallel = "snow",
           ncpus = 4,
           iseed = 1234)
```

(Note that having a warning for some
bootstrap runs is normal. The failed
runs will not be used in forming the
confidence intervals.)

This is the standardized solution with delta-method confidence intervals.


```r
standardizedSolution(fit)
#>     lhs op     rhs label est.std    se      z pvalue ci.lower ci.upper
#> 1     Y  ~       X     c   0.054 0.118  0.461  0.645   -0.176    0.285
#> 2     M  ~       X     a   0.370 0.098  3.768  0.000    0.178    0.563
#> 3     Y  ~       M     b   0.255 0.097  2.622  0.009    0.064    0.446
#> 4     Y ~~       Y         0.922 0.055 16.653  0.000    0.813    1.030
#> 5     M ~~       M         0.863 0.073 11.866  0.000    0.720    1.006
#> 6     X ~~       X         1.000 0.000     NA     NA    1.000    1.000
#> 7    ab :=     a*b    ab   0.094 0.045  2.093  0.036    0.006    0.183
#> 8 total := c+(a*b) total   0.149 0.108  1.375  0.169   -0.063    0.361
```

# Bootstrap Percentile CIs for Standardized Solution

To form bootstrap percentile confidence intervals for the standardized
solution, simply use `standardizedSolution_boot_ci()` instead of
`lavaan::standardizedSolution()`:


```r
library(semhelpinghands)
ci_boot <- standardizedSolution_boot_ci(fit)
ci_boot
#>     lhs op     rhs label est.std    se      z pvalue ci.lower ci.upper
#> 1     Y  ~       X     c   0.054 0.118  0.461  0.645   -0.176    0.285
#> 2     M  ~       X     a   0.370 0.098  3.768  0.000    0.178    0.563
#> 3     Y  ~       M     b   0.255 0.097  2.622  0.009    0.064    0.446
#> 4     Y ~~       Y         0.922 0.055 16.653  0.000    0.813    1.030
#> 5     M ~~       M         0.863 0.073 11.866  0.000    0.720    1.006
#> 6     X ~~       X         1.000 0.000     NA     NA    1.000    1.000
#> 7    ab :=     a*b    ab   0.094 0.045  2.093  0.036    0.006    0.183
#> 8 total := c+(a*b) total   0.149 0.108  1.375  0.169   -0.063    0.361
#>   boot.ci.lower boot.ci.upper boot.se
#> 1        -0.171         0.286   0.117
#> 2         0.144         0.537   0.101
#> 3         0.061         0.443   0.097
#> 4         0.766         0.986   0.058
#> 5         0.712         0.979   0.070
#> 6            NA            NA      NA
#> 7         0.016         0.202   0.047
#> 8        -0.048         0.362   0.106
```

The bootstrap percentile confidence intervals are appended to the right
of the original output of `lavaan::standardizedSolution()`,
in columns `boot.ci.lower` and `boot.ci.upper`. The standard
errors based on the bootstrap estimates (the standard deviation
of the estimates) are listed on the column `boot.se`.


As expected, the bootstrap percentile
confidence interval of the indirect
effect, `ab`, is [0.016, 0.202],
wider than the
delta-method confidence interval,
[0.006, 0.183], and
is shifted to the right.

# Print in a Friendly Format

The print-method of the output of
`standardizedSolution_boot_ci()` supports
printing the results in a text Format
similar to the summary of `lavaan`
output. Call `print()` directly and
add `output = "text"`:


```r
print(ci_boot,
      output = "text")
#> 
#> Standardized Estimates Only
#> 
#>   Standard errors                            Bootstrap
#>   Confidence interval                        Bootstrap
#>   Confidence Level                               95.0%
#>   Standardization Type                         std.all
#>   Number of requested bootstrap draws             5000
#>   Number of successful bootstrap draws            5000
#> 
#> Regressions:
#>                Standardized  Std.Err ci.lower ci.upper
#>   Y ~                                                 
#>     X          (c)    0.054    0.117   -0.171    0.286
#>   M ~                                                 
#>     X          (a)    0.370    0.101    0.144    0.537
#>   Y ~                                                 
#>     M          (b)    0.255    0.097    0.061    0.443
#> 
#> Variances:
#>                Standardized  Std.Err ci.lower ci.upper
#>    .Y                 0.922    0.058    0.766    0.986
#>    .M                 0.863    0.070    0.712    0.979
#> 
#> Defined Parameters:
#>                Standardized  Std.Err ci.lower ci.upper
#>     ab                0.094    0.047    0.016    0.202
#>     total             0.149    0.106   -0.048    0.362
```

Note that it will replace the results
of *unstandardized* solution by those
from the *standardized* solution.

To print both the unstandardized and
standardized results in the text-format,
add `standardized_only = FALSE`
when calling `print()`.

# Note

The function `standardizedSolution_boot_ci()` takes some time to run because
it retrieves the estimates of the unstandardized solution in each bootstrap
sample and computes the estimates in the standardized solution. Therefore,
if 5,000 bootstrap samples are requested, this process is repeated 5,000 times.
Nevertheless, it is still much faster than fitting the model 5,000 times again.

# Background

This function was originally proposed in an [issue at GitHub](https://github.com/simsem/semTools/issues/101#issue-1021974657),
inspired by a discussion at the [Google group for lavaan](https://groups.google.com/g/lavaan/c/qQBXSz5cd0o/m/R8YT5HxNAgAJ).
It is not a versatile function and used some "tricks" to
do the work. A more reliable way is to use function like
`lavaan::bootstrapLavaan()`. Nevertheless, this simple function
is good enough for the cases I encountered in my work.