---
title: "Get Started"
author: "Shu Fai Cheung"
date: "2024-06-01"
output:
  rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Get Started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---



# Introduction

The package [`semlbci`](https://sfcheung.github.io/semlbci/)
([Cheung & Pesigan, 2023](https://doi.org/10.1080/10705511.2023.2183860))
includes functions for finding the
likelihood-based confidence intervals (LBCIs)
of parameters in the output of a structural
equation modeling (SEM) function.
Currently, it supports the output from
`lavaan::lavaan()` and its wrappers,
such as `lavaan::sem()` and `lavaan::cfa()`.

The latest stable version can be installed from GitHub:

```r
remotes::install_github("sfcheung/semlbci")
```

Further information about `semlbci` can
be found in [Cheung and Pesigan (2023)](https://doi.org/10.1080/10705511.2023.2183860).

# Fit a Model to a Dataset

The package has a dataset, `simple_med`,
with three variables, `x`, `m`, and `y`.
Let us fit a simple mediation model to
this dataset.


``` r
library(semlbci)
data(simple_med)
dat <- simple_med
head(dat)
#>            x          m         y
#> 1 -0.3447375   7.284273 -5.636897
#> 2 -0.3658919  -5.449121 -4.525402
#> 3 -0.8294968  -7.016254 -7.823257
#> 4 -0.3389654   4.367018  1.563098
#> 5 -0.9628162  -4.015469 -7.288511
#> 6 -1.0749302 -11.538140 -4.153572
```

``` r
library(lavaan)
mod <-
"
m ~ a*x
y ~ b*m
ab := a * b
"
# We set fixed.x = FALSE because we will also find the LBCIs for
# standardized solution
fit <- sem(mod, simple_med, fixed.x = FALSE)
```

To illustrate how to find the LBCIs for
user-defined parameters, we labelled the
`m ~ x` path by `a`, the `y ~ m` path
by `b`, and defined the indirect effect,
`ab`, by `a * b`.

This is the summary:


``` r
summary(fit, standardized = TRUE)
#> lavaan 0.6.17 ended normally after 1 iteration
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                         5
#> 
#>   Number of observations                           200
#> 
#> Model Test User Model:
#>                                                       
#>   Test statistic                                10.549
#>   Degrees of freedom                                 1
#>   P-value (Chi-square)                           0.001
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Regressions:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>   m ~                                                                   
#>     x          (a)    1.676    0.431    3.891    0.000    1.676    0.265
#>   y ~                                                                   
#>     m          (b)    0.535    0.073    7.300    0.000    0.535    0.459
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>    .m                34.710    3.471   10.000    0.000   34.710    0.930
#>    .y                40.119    4.012   10.000    0.000   40.119    0.790
#>     x                 0.935    0.094   10.000    0.000    0.935    1.000
#> 
#> Defined Parameters:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>     ab                0.897    0.261    3.434    0.001    0.897    0.122
```

# Examples

The main function to find the LBCIs for
free parameters is `semlbci()`. This
should be the only function used by
normal users. We will first illustrate
its usage by some examples, and then
present other technical details in the
following section.

## Find the LBCI for Selected Free Parameters

All free parameters can be specified in
`lavaan` style. For example, the path
from `m` to `y` is denoted by `"y ~ m"`,
and the covariance or correlation
between `x` and `m` (not in the example)
is denoted by `"x ~~ m"` (order does
not matter).


``` r
out <- semlbci(sem_out = fit,
               pars = c("y ~ m",
                        "m ~ x"))
```

Since version 0.10.4.25, `lavaan`
model syntax operators
can be used to represent all parameters
of the same type: `"~"` for regression
paths, `"~~"` for variances and
covariances, `"=~"` for factor
loadings, and `":="` for all user-defined
parameters. For example, the following
call and the one above find LBCIs for
the same set of parameters:


``` r
out <- semlbci(sem_out = fit,
               pars = c("~"))
```


The output is the parameter table of
the fitted `lavaan` object, with two
columns added, `lbci_lb` and `lbci_ub`,
the likelihood-based lower bounds and
upper bounds, respectively.


``` r
out
#> 
#> Results:
#>   id lhs op rhs label   est lbci_lb lbci_ub    lb    ub cl_lb cl_ub
#> 1  1   m  ~   x     a 1.676   0.828   2.525 0.832 2.520 0.950 0.950
#> 2  2   y  ~   m     b 0.535   0.391   0.679 0.391 0.679 0.950 0.950
#> 
#> Annotation:
#> * lbci_lb, lbci_ub: The lower and upper likelihood-based bounds.
#> * est: The point estimates from the original lavaan output.
#> * lb, ub: The original lower and upper bounds, extracted from the
#>     original lavaan output. Usually Wald CIs for free parameters and
#>     delta method CIs for user-defined parameters
#> * cl_lb, cl_ub: One minus the p-values of chi-square difference tests
#>     at the bounds. Should be close to the requested level of
#>     confidence, e.g., .95 for 95% confidence intervals.
#> 
#> Call:
#> semlbci(sem_out = fit, pars = c("~"))
```

In this example, the point estimate of
the unstandardized coefficient from `x`
to `m` is
1.676,
and the LBCI is
0.828
to
2.525.

### Default Parameters

By default, factor loadings, covariances
(except for error covariances), and
regression paths are included in the
search. Therefore, `pars` can be omitted,
although the search will take time to
run for a big model. In this case,
it is advised to enable parallel processing
by add `parallel = TRUE` and set `ncpus`
to the number of processes to run
(these arguments are explained later):


``` r
out <- semlbci(sem_out = fit,
               parallel = TRUE,
               ncpus = 6)
print(out,
      annotation = FALSE)
#> 
#> Results:
#>   id lhs op rhs label   est lbci_lb lbci_ub    lb    ub cl_lb cl_ub
#> 1  1   m  ~   x     a 1.676   0.828   2.525 0.832 2.520 0.950 0.950
#> 2  2   y  ~   m     b 0.535   0.391   0.679 0.391 0.679 0.950 0.950
#> 6  6  ab := a*b    ab 0.897   0.427   1.464 0.385 1.409 0.950 0.950
```

### Customizing the Printout

For users familiar with the column names,
the annotation can be disabled by calling `print()`
and add `annotation = FALSE`:


``` r
print(out, annotation = FALSE)
#> 
#> Results:
#>   id lhs op rhs label   est lbci_lb lbci_ub    lb    ub cl_lb cl_ub
#> 1  1   m  ~   x     a 1.676   0.828   2.525 0.832 2.520 0.950 0.950
#> 2  2   y  ~   m     b 0.535   0.391   0.679 0.391 0.679 0.950 0.950
#> 6  6  ab := a*b    ab 0.897   0.427   1.464 0.385 1.409 0.950 0.950
```

The results can also be printed in a `lavaan`-like
format by calling `print()`, setting `sem_out`
to the original fit object (`fit` in this example),
and add  `output = "lavaan"`:


``` r
print(out,
      sem_out = fit,
      output = "lavaan")
#> Likelihood-Based CI Notes:
#> 
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#>                       bounds.
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Regressions:
#>                    Estimate  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>   m ~                                                                   
#>     x          (a)    1.676    0.431    3.891    0.000    0.828    2.525
#>   y ~                                                                   
#>     m          (b)    0.535    0.073    7.300    0.000    0.391    0.679
#> 
#> Defined Parameters:
#>                    Estimate  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>     ab                0.897    0.261    3.434    0.001    0.427    1.464
```

By default, the original confidence intervals
will not be printed. See the
help page of `print.semlbci()` for other
options available.

## Find the LBCI for a User-Defined Parameter

To find the LBCI for a user-defined
parameter, use `label :=`, where `label`
is the label used in the model
specification. The definition of this
parameter can be omitted. The content
after `:=` will be ignored by `semlbci()`.


``` r
out <- semlbci(sem_out = fit,
               pars = c("ab := "))
print(out,
      sem_out = fit,
      output = "lavaan")
#> Likelihood-Based CI Notes:
#> 
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#>                       bounds.
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Defined Parameters:
#>                    Estimate  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>     ab                0.897    0.261    3.434    0.001    0.427    1.464
```

In this example, the point estimate of
the indirect effect is
0.897,
and the LBCI is
0.427
to
1.464.

(Note: In some examples, we added
`annotation = FALSE` to suppress the
annotation in the printout to minimize the
length of this vignette.)

## Find the LBCI for the Parameters in the Standardized Metric

By the default, the unstandardized
solution is used by `semlbci()`. If the
LBCIs for the standardized solution
solution are needed,
set `standardized = TRUE`.


``` r
out <- semlbci(sem_out = fit,
               pars = c("y ~ m",
                        "m ~ x"),
               standardized = TRUE)
```

This one also works:


``` r
out <- semlbci(sem_out = fit,
               pars = "~",
               standardized = TRUE)
```

This is the printout, in `lavaan`-style:


``` r
print(out,
      sem_out = fit,
      output = "lavaan")
#> Likelihood-Based CI Notes:
#> 
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#>                       bounds.
#> 
#> Standardized Estimates Only
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Regressions:
#>                Standardized  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>   m ~                                                                   
#>     x          (a)    0.265    0.066    4.035    0.000    0.133    0.389
#>   y ~                                                                   
#>     m          (b)    0.459    0.056    8.215    0.000    0.342    0.561
```

If LBCIs are for the standardized solution
and `output` set to `"lavaan"` when
printing the results, the parameter
estimates, standard errors, *z*-values,
and *p*-values are those from the
standardized solution.

The LBCIs for standardized user-defined
parameters can be requested similarly.


``` r
out <- semlbci(sem_out = fit,
               pars = c("ab :="),
               standardized = TRUE)
print(out,
      sem_out = fit,
      output = "lavaan")
#> Likelihood-Based CI Notes:
#> 
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#>                       bounds.
#> 
#> Standardized Estimates Only
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Defined Parameters:
#>                Standardized  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>     ab                0.122    0.034    3.538    0.000    0.059    0.194
```


``` r
out <- semlbci(sem_out = fit,
               standardized = TRUE)
print(out,
      sem_out = fit,
      output = "lavaan")
#> Likelihood-Based CI Notes:
#> 
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#>                       bounds.
#> 
#> Standardized Estimates Only
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Regressions:
#>                Standardized  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>   m ~                                                                   
#>     x          (a)    0.265    0.066    4.035    0.000    0.133    0.389
#>   y ~                                                                   
#>     m          (b)    0.459    0.056    8.215    0.000    0.342    0.561
#> 
#> Defined Parameters:
#>                Standardized  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>     ab                0.122    0.034    3.538    0.000    0.059    0.194
```


# Basic Arguments in `semlbci()`

## `sem_out` and `pars`: The Fit Object and the Parameters

The only required argument for
`semlbci()` is `sem_out`, the fit
object from `lavaan::lavaan()` or its
wrappers (e.g., `lavann::cfa()` and
`lavaan::sem()`). By default, `semlbci()`
will find the LBCIs for all free
parameters (except for variances and
error variances) and user-defined
parameters, which can take a long time
for a model with many parameters.
Moreover, LBCI is usually used when
Wald-type confidence interval may not
be suitable, for example, forming
the confidence interval for an indirect
effect or a parameter in the
standardized solution. These parameters
may have sampling distributions that
are asymmetric or otherwise
substantially nonnormal due to bounded
parameter spaces or other reasons.

Therefore, it is recommended to call
`semlbci()` without specifying any
parameters. If the time to run is long,
then call `semlbci()` only for selected
parameters. The argument `pars`
should be a model syntax or a vector of
strings which specifies the parameters
for which LBCIs will be formed
(detailed below).

If time is not a concern, for example,
when users want to explore the LBCIs
for all free and user-defined parameters
in a final model, then `pars` can be
omitted to request the LBCIs for all
free parameters (except for variances
and covariances) and user-defined
parameters (if any) in a model.

## `ciperc`: The Level of Confidence

By default, the 95% LBCIs for the
unstandardized solution will be formed.
To change the level of confidence, set
the argument `ciperc` to the desired
coverage probability, e.g., .95 for 95%,
.90 for 90%.

## `standardized`: Whether Standardized Solution Is Used

By default, the LBCIs for the
unstandardized solution will be formed.
If the LBCIs for the standardized
solution are desired, set
`standardized = TRUE`. Note that for
some models it can be much slower to
find the LBCIs for the standardized
solution than for the unstandardized
solution.

## `parallel` and `ncpus`

The search for the bounds needs to be
done separately for each bound and this
can take a long time for a model with
many parameters and/or with equality
constraints. Therefore, parallel
processing should always be enabled by
setting `parallel` to `TRUE` and `ncpus`
to a number smaller than the number of
available cores. For example, without
parallel processing, the following
search took about 28 seconds on
Intel i7-8700:


``` r
data(HolzingerSwineford1939)
mod_test <-
'
visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9
'
fit_cfa <- cfa(model = mod_test,
               data = HolzingerSwineford1939)
semlbci(fit_cfa)
```

With parallel processing enabled and
using 6 cores, it took about 20 seconds.


``` r
semlbci(fit_cfa,
        parallel = TRUE,
        ncpus = 6)
```

The speed difference can be much greater
for a model with many parameters and
some equality constraints.

Enabling parallel processing also has
the added benefit of showing the
progress in real time.

## `try_k_more_times` and `semlbci_out`

For some models and some parameters,
the search may be difficult. By
default, `semlbci()` will try two
more times, successively changing some
settings internally. If still failed
in forming the LBCI, users can try
to set `try_k_more_times` to a larger
number slightly larger than 2 (the
default value) to see whether it can
help forming the LBCI. This can be done
without forming other LBCIs again if
the output of `semlbci()` is passed
to the new call using `semlbci_out`.

For example, assume that some LBCIs
could not be found in the first run:


``` r
lbci_some_failed <- semlbci(fit_cfa)
```

We can call `semlbci()` again,
increasing `try_k_more_times` to 5,
and set `semlbci_out` to
`lbci_some_failed`.


``` r
lbci_try_again <- semlbci(fit_cfa,
                          try_k_more_times = 5,
                          semlbci_out = lbci_some_failed)
```

It will only form LBCIs for parameters
failed in the first one. The output,
`lbci_try_again`, will have the original
LBCIs plus the new ones, if the search
succeeds.

## Other Arguments

For detailed documentation of other
arguments, please refer to the help
page of `semlbci()`. Advanced users
who want to tweak the optimization
options can check the help pages of
`ci_bound_wn_i()` and `ci_i_one()`,

# Additional Features

## Multiple-Group Models

`semlbci()` supports multiple-group
models. For example, this is a
two-group confirmatory factor analysis
model with equality constraints:


``` r
data(HolzingerSwineford1939)
mod_cfa <-
'
visual  =~ x1 + v(lambda2, lambda2)*x2 + v(lambda3, lambda3)*x3
textual =~ x4 + v(lambda5, lambda5)*x5 + v(lambda6, lambda6)*x6
speed   =~ x7 + v(lambda8, lambda8)*x8 + v(lambda9, lambda9)*x9
'
fit_cfa <- cfa(model = mod_cfa,
               data = HolzingerSwineford1939,
               group = "school")
```

The factor correlations between group
are not constrained to be equal.


``` r
parameterEstimates(fit_cfa)[c(22, 23, 58, 59), ]
#>       lhs op     rhs block group label   est    se     z pvalue ci.lower
#> 22 visual ~~ textual     1     1       0.416 0.097 4.271  0.000    0.225
#> 23 visual ~~   speed     1     1       0.169 0.064 2.643  0.008    0.044
#> 58 visual ~~ textual     2     2       0.437 0.099 4.423  0.000    0.243
#> 59 visual ~~   speed     2     2       0.314 0.079 3.958  0.000    0.158
#>    ci.upper
#> 22    0.606
#> 23    0.294
#> 58    0.631
#> 59    0.469
```

This is the LBCI for covariance between
visual ability and textual ability:


``` r
fcov <- semlbci(fit_cfa,
                pars = c("visual ~~ textual"))
print(fcov,
      sem_out = fit_cfa,
      output = "lavaan")
#> Likelihood-Based CI Notes:
#> 
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#>                       bounds.
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> 
#> Group 1 [Pasteur]:
#> 
#> Covariances:
#>                    Estimate  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>   visual ~~                                                             
#>     textual           0.416    0.097    4.271    0.000    0.221    0.654
#> 
#> 
#> Group 2 [Grant-White]:
#> 
#> Covariances:
#>                    Estimate  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>   visual ~~                                                             
#>     textual           0.437    0.099    4.423    0.000    0.263    0.663
```

This is the LBCI for correlation
between visual ability and textual
ability:


``` r
fcor <- semlbci(fit_cfa,
                pars = c("visual ~~ textual"),
                standardized = TRUE)
print(fcor,
      sem_out = fit_cfa,
      output = "lavaan")
#> Likelihood-Based CI Notes:
#> 
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#>                       bounds.
#> 
#> Standardized Estimates Only
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> 
#> Group 1 [Pasteur]:
#> 
#> Covariances:
#>                Standardized  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>   visual ~~                                                             
#>     textual           0.485    0.087    5.601    0.000    0.291    0.640
#> 
#> 
#> Group 2 [Grant-White]:
#> 
#> Covariances:
#>                Standardized  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>   visual ~~                                                             
#>     textual           0.540    0.086    6.317    0.000    0.357    0.692
```

Note that the example above can take more
than one minute to one if parallel
processing is not enabled.

## Robust LBCI

`semlbci()` also supports the robust
LBCI proposed by Falk (2018). To form
robust LBCI, the model must be fitted
with robust test statistics requested
(e.g., `estimator = "MLR"`). To
request robust LBCIs, add
`robust = "satorra.2000"` when
calling `semlbci()`.

We use the simple mediation model as
an example:


``` r
fit_robust <- sem(mod, simple_med,
                  fixed.x = FALSE,
                  estimator = "MLR")
fit_lbci_ab_robust <- semlbci(fit_robust,
                              pars = "ab := ",
                              robust = "satorra.2000")
print(fit_lbci_ab_robust,
      sem_out = fit_robust,
      output = "lavaan")
#> Likelihood-Based CI Notes:
#> 
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#>                       bounds.
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Sandwich
#>   Information bread                           Observed
#>   Observed information based on                Hessian
#> 
#> Defined Parameters:
#>                    Estimate  Std.Err  z-value  P(>|z|) lb.lower lb.upper
#>     ab                0.897    0.305    2.941    0.003    0.353    1.571
```

## Latent Level Parameters

`semlbci()` support forming the LBCIs
for most free parameters. Not
illustrated above but LBCIs can be
formed for path coefficients between
latent variables and also user-defined
parameters based on latent-level
parameters, such as an indirect effect
from one latent variable to another.

## More Examples

More examples can be found
in the "examples" folders in
[the OSF page](https://osf.io/b9a2p/files/osfstorage)
for this package.

# Limitations

The following is a summary of the
limitations of `semlbci()`. Please
refer to `check_sem_out()` for the full
list of limitations. This function is
called by `semlbci()` to check the
`sem_out` object, and will raise
warnings or errors as appropriate.

## Estimators

The function `semlbci()` currently
supports `lavaan::lavaan()` results
estimated by maximum likelihood (`ML`),
full information maximum likelihood for
missing data (`fiml`), and their robust
variants (e.g., `MLM`).

## Models

This package currently supports single
and multiple group models with
continuous variables. It *may* work for
a model with ordered variables but this
is not officially tested.

## Methods

The current and preferred method is the
one proposed by Wu and Neale (2012),
illustrated by Pek and Wu (2015).
The current implementation in
`semlbci()` does not check whether a
parameter is near its boundary. The
more advanced methods by Pritikin,
Rappaport, and Neale (2017) will be
considered in future development.

# Technical Details

A detailed presentation of the internal
workflow of `semlbci()` can be found
in the `vignette("technical_workflow", package = "semlbci")`.
Users interested in calling the lowest
level function, `ci_bound_wn_i()`,
can see some illustrative examples
in `vignette("technical_searching_one_bound", package = "semlbci")`.

# References

Cheung, S. F., & Pesigan, I. J. A. (2023). *semlbci*: An R package for forming likelihood-based confidence intervals for parameter estimates, correlations, indirect effects, and other derived parameters. *Structural Equation Modeling: A Multidisciplinary Journal*. *30*(6), 985--999.
https://doi.org/10.1080/10705511.2023.2183860

Falk, C. F. (2018). Are robust standard errors the best approach for interval estimation with nonnormal data in structural equation
modeling? *Structural Equation Modeling: A Multidisciplinary Journal, 25*(2), 244-266. https://doi.org/10.1080/10705511.2017.1367254

Pek, J., & Wu, H. (2015). Profile likelihood-based confidence intervals and regions for structural equation models. *Psychometrika, 80*(4), 1123--1145. https://doi.org/10.1007/s11336-015-9461-1

Pritikin, J. N., Rappaport, L. M., & Neale, M. C. (2017). Likelihood-based confidence intervals for a parameter with an upper or lower bound. *Structural Equation Modeling: A Multidisciplinary Journal, 24*(3), 395-401. https://doi.org/10.1080/10705511.2016.1275969

Wu, H., & Neale, M. C. (2012). Adjusted confidence intervals for a bounded parameter. *Behavior Genetics, 42*(6), 886--898. https://doi.org/10.1007/s10519-012-9560-z