--- title: 'Get started' description: > A vignette introducing the INLAvaan package for Bayesian latent variable modeling using INLA. vignette: > %\VignetteIndexEntry{Introduction to INLAvaan} %\VignetteEngine{quarto::html} %\VignetteEncoding{UTF-8} knitr: opts_chunk: collapse: true comment: '#>' bibliography: refs.bib toc: true --- ```{r} #| label: setup #| include: false library(INLAvaan) library(lavaan) library(dplyr) ``` ## Introduction SEMs are ubiquitous in the social sciences, psychology, ecology, and other fields. The INLAvaan package provides a user-friendly interface for fitting Bayesian SEMs using Integrated Nested Laplace Approximations [INLA, @rue2009approximate]. This vignette will guide you through the basics of using INLAvaan to fit a simple SEM. Before we begin make sure you have installed the INLAvaan package from GitHub by running the commands below. ```{r} #| label: install-instructions #| eval: false # Run this if you have not yet installed INLAvaan install.packages("pak") pak::pak("haziqj/INLAvaan") # Load all libraries for this example library(INLAvaan) library(tidyverse) library(lavaan) ``` ## Motivating example To motivate the use of SEMs, consider the introductory example in @song2012basic: *Does poorer glycemic control lead to greater severity of kidney disease?* We observe three indicators of glycemic control ($y_1$, $y_2$, $y_3$) and three indicators of kidney disease severity ($y_4$, $y_5$, $y_6$). ![](glycemic-table.png){fig-align="center" width="75%"} Rather than fitting separate regression models for each indicator, SEM allows us to model the relationship between the latent constructs themselves, providing a clearer and more coherent representation of the underlying processes. The hypothesised SEM is illustrated by the figure below: ![](https://haziqj.ml/sembias-gradsem/index_files/figure-revealjs/unnamed-chunk-4-1.png){fig-align="center" width="75%"} ## Data For the two-factor SEM we described above, it is easy to simulate some data using the `{lavaan}` package to do so. The code is below: ```{r} #| label: simulate-data pop_mod <- " eta1 =~ 1*y1 + 0.8*y2 + 0.6*y3 eta2 =~ 1*y4 + 0.8*y5 + 0.6*y6 eta2 ~ 0.3*eta1 # Variances y1 ~~ 0.5*y1 y2 ~~ 0.5*y2 y3 ~~ 0.5*y3 y4 ~~ 0.5*y4 y5 ~~ 0.5*y5 y6 ~~ 0.5*y6 eta1 ~~ 1*eta1 eta2 ~~ 1*eta2 " set.seed(123) dat <- lavaan::simulateData(pop_mod, sample.nobs = 1000) glimpse(dat) ``` From the code above, note the true values of the parameters, including the factor loadings $\Lambda$, regression coefficient $\beta$ between the two latent variables, as well as the residual and latent variances $\Theta$ and $\Psi$ respectively. ## Model fit Now that we have simulated some data, we can fit the SEM using INLAvaan. The model syntax is similar to that of `{lavaan}`, making it easy to specify the model. For further details on the model syntax, refer to the [lavaan website](https://lavaan.ugent.be/tutorial/syntax1.html). `{INLAvaan}` provides mirror functions for the main model fitting functions in `{lavaan}`: - `acfa()` mirrors `lavaan::cfa()` for confirmatory factor analysis; and - `asem()` mirrors `lavaan::sem()` for structural equation models; - `agrowth()` mirrors `lavaan::growth()` for latent growth curve models. The code to fit the SEM model is below: ```{r} #| label: fit-model mod <- " eta1 =~ y1 + y2 + y3 eta2 =~ y4 + y5 + y6 eta2 ~ eta1 " fit <- asem(mod, dat) ``` `{INLAvaan}` computes an approximation to the posterior density by way of a Laplace approximation [@tierney1989fully]. The joint mode and the Hessian needs to be computed, which gives a Gaussian distribution for the joint posterior of the parameters. The default method for optimisation is `stats::nlminb()`, but other optimisers can be used by specifying `optim_method = "ucminf"` for the `{ucminf}` package or `optim_method = "optim"` to call the `stats::optim()` function with method `"BFGS"`. From this, marginal posterior distributions for each parameter can be obtained by one of several ways, including 1) Skew normal fitting [`marginal_method = "skewnorm"`, the default method, see @chiuchiolo2023joint]; 2) Two-piece asymmetric Gaussian fitting [`marginal_method = "asymgaus"`, see @martins2013bayesian]; 3) Direct marginalisation of the joint Gaussian posterior (`marginal_method = "marggaus"`); and 4) Sampling from the joint Gaussian posterior (`marginal_method = "sampling"`). Once the marginal posterior distributions have been obtained, we can further use these to compute any derived quantities of interest via copula sampling. The posterior predictive p-values [@gelman1996posterior] and Deviance Information Criterion [DIC, @spiegelhalter2002bayesian] are computed this way. Often, the posterior sampling takes longer than the model fitting itself, so the number of samples can be controlled via the `nsamp` argument (default is `nsamp = 1000`) or can be skipped altoghether (`test = "none"`). ## Methods The resulting object is of class `INLAvaan`, a subclass of `lavaan` objects. ```{r} #| label: results-str str(fit, 1) fit ``` As a result, most of the methods that work for `lavaan` objects will also work for `INLAvaan` objects. The most common ones are probably `coef()` and `summary()`. ```{r} #| label: results-coef-summary # Inspect coefficients coef(fit) # Summary of results summary(fit) ``` It's possible to request posterior medians and modes in the summary output by specifying `postmedian = TRUE` or `postmode = TRUE` in the `summary()` function. ### Predictions Predicted values for the latent variables can be obtained using the `predict()` function. This is done by sampling from the posterior distributions of the latent variables given the observed data. In the future, this function will also allow for out-of-sample predictions and also to retrieve predictions for observed variables. ```{r} #| label: results-pred eta_preds <- predict(fit, nsamp = 100) length(eta_preds) head(eta_preds[[1]]) ``` This is an S3 object with a summary method that provides posterior means and credible intervals for the latent variables. Alternatively, the user is welcome to perform their own summary statistics on the list of posterior samples returned by `predict()`. ```{r} #| label: results-pred-summary summ_eta <- summary(eta_preds) str(summ_eta) head(summ_eta$Mean) ``` ### Plot A simple plot method is provided to view the marginal posterior distributions of the parameters. The vertical lines indicate the posterior mode. ```{r} #| label: results-plot plot(fit) ``` ## Model comparison In addition to several global fit indices (i.e. PPP, DIC), it is possible to compare models by way of Bayes factors using the `compare()` function. This function takes two `INLAvaan` objects and computes the Bayes factor using the Laplace approximations to the marginal likelihoods. ```{r} mod2 <- " # A model with uncorrelated factors eta1 =~ y1 + y2 + y3 eta2 =~ y4 + y5 + y6 eta1 ~~ 0*eta2 " fit2 <- asem(mod2, dat) compare(fit, fit2) ``` As a note, there have been several criticisms of the use of Bayes factors for model comparison, particularly in the context of SEMs [@tendeiro2019review; @schad2023workflow]. The `{blavaan}` package is able to implement [WAICs and LOOs](https://ecmerkle.github.io/blavaan/articles/model_comparison.html) as alternative model comparison metrics, and these will hopefully also be implemented in future versions of `{INLAvaan}`. ## Setting priors The `{INLAvaan}` package uses the same prior specification syntax as `{blavaan}` [@merkle2018blavaan;@merkle2021efficient], as detailed [here](https://ecmerkle.github.io/blavaan/articles/prior.html). Essentially, there are two ways to set priors for model parameters: 1) Globally for all parameters of a certain type (e.g., all factor loadings, all regression coefficients, etc.); and 2) Individually for specific parameters in the model syntax. The default global priors are derived from `{blavaan}`: ```{r} #| label: priors-default blavaan::dpriors() ``` Note that, `{INLAvaan}` uses the SRS decomposition on variance matrices, and consequently places priors on **correlations** instead of *covariances*. If, instead we wished to set global priors, say a Gamma distribution on **variances** instead of **standard deviations** (default), then we would do the following: ```{r} #| label: priors-global DP <- blavaan::dpriors(theta = "gamma(1,1)", psi = "gamma(1,1)") DP ## fit <- asem(mod, dat, dpriors = DP) # not run ``` To set individual priors for specific parameters, we can do so in the model syntax itself. For instance, to set a normal prior with mean 1 and standard deviation 3 for the factor loading of `y3` on `eta1`, and a normal prior with mean 0 and standard deviation 0.5 for the regression coefficient from `eta1` to `eta2`, we would specify the model as follows: ```{r} mod <- " eta1 =~ y1 + y2 + prior('normal(1,3)')*y3 eta2 =~ y4 + y5 + y6 eta2 ~ prior('normal(0,.5)')*eta1 " ## fit <- asem(mod, dat) # not run ``` ## Dependency on R-INLA Dependency on R-INLA has been temporarily removed for tThe current version of `{INLAvaan}` (v0.2-0). For a wide class LVMs and SEMs where the latent variables are unstructured and independent, the current implementation is sufficient. However, future versions of `{INLAvaan}` will re-introduce dependency on R-INLA to allow for more complex latent structures, such as spatial and temporal dependencies. ## References