Semi-Confirmatory Structural Equation Modeling

Po-Hsien Huang

In this example, we will show how to use lslx to conduct semi-confirmatory structural equation modeling. The example uses data PoliticalDemocracy in the package lavaan. Hence, lavaan must be installed.

Model Specification

In the following specification, x1 - x3 and y1 - y8 is assumed to be measurements of 3 latent factors: ind60, dem60, and dem65.

model_sem <- "fix(1) * x1 + x2 + x3      <=: ind60
              fix(1) * y1 + y2 + y3 + y4 <=: dem60
              fix(1) * y5 + y6 + y7 + y8 <=: dem65
              dem60 <= ind60
              dem65 <= ind60 + dem60"

The operator <=: means that the RHS latent factors is defined by the LHS observed variables. In particular, the loadings are freely estimated. In this model, ind60 is measured by x1 - x3, dem60 is mainly measured by y1 - y4, and dem65 is mainly measured by y5 - y8. The operator <= means that the regression coefficients from the RHS variables to the LHS variables are freely estimated. In this model, dem60 is influenced by ind60 and dem65 is influenced by dem60 and ind60. Details of model syntax can be found in the section of Model Syntax via ?lslx.

Object Initialization

lslx is written as an R6 class. Everytime we conduct analysis with lslx, an lslx object must be initialized. The following code initializes an lslx object named lslx_sem.

library(lslx)
lslx_sem <- lslx$new(model = model_sem,
                    sample_cov = cov(lavaan::PoliticalDemocracy),
                    sample_size = nrow(lavaan::PoliticalDemocracy))
An 'lslx' R6 class is initialized via 'sample_cov' argument. 
  Response Variables: x1 x2 x3 y1 y2 y3 y4 y5 y6 y7 y8 
  Latent Factors: ind60 dem60 dem65 

Here, lslx is the object generator for lslx object and $new() is the build-in method of lslx to generate a new lslx object. The initialization of lslx requires users to specify a model for model specification (argument model) and a sample moments to be fitted (argument sample_cov and sample_size). The sample moment must contain all the observed variables specified in the given model.

Model Respecification

After an lslx object is initialized, model can be respecified by $free_coefficient(), $fix_coefficient(), and $penalize_coefficient() methods. The following code sets y1<->y5, y2<->y4, y2<->y6, y3<->y7, y4<->y8, and y6<->y8 as penalized parameters.

lslx_sem$penalize_coefficient(name = c("y1<->y5",
                                       "y2<->y4",
                                       "y2<->y6",
                                       "y3<->y7",
                                       "y4<->y8",
                                       "y6<->y8"))
The relation y5<->y1 under g is set as PENALIZED with starting value = 0. 
The relation y4<->y2 under g is set as PENALIZED with starting value = 0. 
The relation y6<->y2 under g is set as PENALIZED with starting value = 0. 
The relation y7<->y3 under g is set as PENALIZED with starting value = 0. 
The relation y8<->y4 under g is set as PENALIZED with starting value = 0. 
The relation y8<->y6 under g is set as PENALIZED with starting value = 0. 

To see more methods for respecifying model, please check the section of Set-Related Method via ?lslx.

Model Fitting

After an lslx object is initialized, method $fit_lasso() can be used to fit the specified model into the given data with LASSO penalty function.

lslx_sem$fit_lasso()
CONGRATS: Algorithm converges under EVERY specified penalty level.
  Specified Tolerance for Convergence: 0.001 
  Specified Maximal Number of Iterations: 100 

The $fit_lasso() requires users to specify the considered penalty levels (argument lambda_grid). In this example, the lambda grid is automatically initialized by default. Note that MCP with delta = Inf is equivalent to the LASSO penalty. All the fitting result will be stored in the fitting field of lslx_sem.

Model Summarizing

Unlike traditional SEM analysis, lslx fit the model into data under all the penalty levels considered. To summarize the fitting result, a selector to determine an optimal penalty level must be specified. Available selectors can be found in the section of Penalty Level Selection via ?lslx. The following code summarize the fitting result under the penalty level selected by adjusted Bayesian information criterion (ABIC).

lslx_sem$summarize(selector = "abic")
General Information                                                            
   number of observations                                 75
   number of complete observations                        75
   number of missing patterns                           none
   number of groups                                        1
   number of responses                                    11
   number of factors                                       3
   number of free coefficients                            36
   number of penalized coefficients                        6

Numerical Conditions                                                            
   selected lambda                                     0.058
   selected delta                                       none
   selected step                                        none
   objective value                                     0.806
   objective gradient absolute maximum                 0.001
   objective Hessian convexity                         0.027
   number of iterations                               15.000
   loss value                                          0.593
   number of non-zero coefficients                    42.000
   degrees of freedom                                 35.000
   robust degrees of freedom                             NaN
   scaling factor                                        NaN

Fit Indices                                                            
   root mean square error of approximation (rmsea)     0.060
   comparative fit index (cfi)                         0.986
   non-normed fit index (nnfi)                         0.978
   standardized root mean of residual (srmr)           0.049

Likelihood Ratio Test
                    statistic         df    p-value
   unadjusted          44.468     35.000      0.131
   mean-adjusted          -          -          -  

Root Mean Square Error of Approximation Test
                     estimate      lower      upper
   unadjusted           0.060      0.000      0.116
   mean-adjusted          NaN        NaN        NaN

Coefficient Test (Std.Error = "observed_information")
  Factor Loading
                  type  estimate  std.error  z-value  P(>|z|)  lower  upper
    x1<-ind60    fixed     1.000        -        -        -      -      -  
    x2<-ind60     free     2.181      0.139   15.679    0.000  1.908  2.454
    x3<-ind60     free     1.819      0.152   11.945    0.000  1.520  2.117
    y1<-dem60    fixed     1.000        -        -        -      -      -  
    y2<-dem60     free     1.318      0.181    7.278    0.000  0.963  1.673
    y3<-dem60     free     1.062      0.148    7.155    0.000  0.771  1.353
    y4<-dem60     free     1.302      0.149    8.763    0.000  1.011  1.593
    y5<-dem65    fixed     1.000        -        -        -      -      -  
    y6<-dem65     free     1.223      0.169    7.250    0.000  0.892  1.553
    y7<-dem65     free     1.293      0.159    8.123    0.000  0.981  1.605
    y8<-dem65     free     1.301      0.162    8.023    0.000  0.983  1.619

  Regression
                  type  estimate  std.error  z-value  P(>|z|)  lower  upper
 dem60<-ind60     free     1.463      0.392    3.737    0.000  0.696  2.230
 dem65<-ind60     free     0.537      0.225    2.385    0.017  0.096  0.978
 dem65<-dem60     free     0.845      0.099    8.508    0.000  0.650  1.040

  Covariance
                  type  estimate  std.error  z-value  P(>|z|)  lower  upper
      y5<->y1      pen     0.534      0.331    1.614    0.107 -0.114  1.183
      y4<->y2      pen     0.576      0.604    0.952    0.341 -0.609  1.760
      y6<->y2      pen     1.311      0.607    2.160    0.031  0.122  2.501
      y7<->y3      pen     0.300      0.590    0.508    0.611 -0.857  1.457
      y8<->y4      pen     0.077      0.423    0.182    0.856 -0.753  0.906
      y8<->y6      pen     0.896      0.489    1.832    0.067 -0.063  1.855

  Variance
                  type  estimate  std.error  z-value  P(>|z|)  lower  upper
ind60<->ind60     free     0.454      0.088    5.169    0.000  0.282  0.627
dem60<->dem60     free     3.931      0.929    4.232    0.000  2.110  5.752
dem65<->dem65     free     0.158      0.213    0.743    0.458 -0.259  0.575
      x1<->x1     free     0.083      0.020    4.137    0.000  0.044  0.122
      x2<->x2     free     0.121      0.071    1.702    0.089 -0.018  0.260
      x3<->x3     free     0.473      0.090    5.232    0.000  0.296  0.651
      y1<->y1     free     1.947      0.425    4.582    0.000  1.114  2.779
      y2<->y2     free     6.337      1.085    5.842    0.000  4.211  8.463
      y3<->y3     free     5.134      0.933    5.500    0.000  3.305  6.964
      y4<->y4     free     2.805      0.649    4.324    0.000  1.534  4.077
      y5<->y5     free     2.373      0.454    5.231    0.000  1.484  3.263
      y6<->y6     free     4.327      0.727    5.950    0.000  2.902  5.752
      y7<->y7     free     3.393      0.681    4.980    0.000  2.058  4.729
      y8<->y8     free     2.925      0.603    4.853    0.000  1.744  4.107

  Intercept
                  type  estimate  std.error  z-value  P(>|z|)  lower  upper
        x1<-1     free     0.000      0.085    0.000    1.000 -0.166  0.166
        x2<-1     free     0.000      0.174    0.000    1.000 -0.342  0.342
        x3<-1     free     0.000      0.162    0.000    1.000 -0.318  0.318
        y1<-1     free     0.000      0.302    0.000    1.000 -0.592  0.592
        y2<-1     free     0.000      0.445    0.000    1.000 -0.872  0.872
        y3<-1     free     0.000      0.377    0.000    1.000 -0.739  0.739
        y4<-1     free     0.000      0.385    0.000    1.000 -0.755  0.755
        y5<-1     free     0.000      0.300    0.000    1.000 -0.589  0.589
        y6<-1     free     0.000      0.381    0.000    1.000 -0.747  0.747
        y7<-1     free     0.000      0.379    0.000    1.000 -0.742  0.742
        y8<-1     free     0.000      0.372    0.000    1.000 -0.729  0.729

In this example, we can see that the PL estimate under the selected penalty level doesn’t contain any zero value, which indicates that all of the covariance of measurements are relevant. The $summarize() method also shows the result of significance tests for the coefficients. In lslx, the default standard errors are calculated based on sandwich formula whenever raw data is available. In this example, because raw data is not used for lslx object initialization, standard error is calculated by using observed Fisher information matrix. It may not be valid when the model is misspecified and the data are not normal. Also, it is generally invalid after choosing a penalty level.

Objects Extraction

In lslx, many quantities related to SEM can be extracted by extract-related method. For example, the coefficient estimate and its asymptotic variance can be obtained by

lslx_sem$extract_coefficient(selector = "abic", type = "effective")
        x1<-1/g         x2<-1/g         x3<-1/g         y1<-1/g         y2<-1/g         y3<-1/g 
         0.0000          0.0000          0.0000          0.0000          0.0000          0.0000 
        y4<-1/g         y5<-1/g         y6<-1/g         y7<-1/g         y8<-1/g  dem60<-ind60/g 
         0.0000          0.0000          0.0000          0.0000          0.0000          1.4631 
 dem65<-ind60/g  dem65<-dem60/g     x2<-ind60/g     x3<-ind60/g     y2<-dem60/g     y3<-dem60/g 
         0.5368          0.8452          2.1809          1.8186          1.3182          1.0620 
    y4<-dem60/g     y6<-dem65/g     y7<-dem65/g     y8<-dem65/g ind60<->ind60/g dem60<->dem60/g 
         1.3018          1.2229          1.2933          1.3011          0.4544          3.9313 
dem65<->dem65/g       x1<->x1/g       x2<->x2/g       x3<->x3/g       y1<->y1/g       y5<->y1/g 
         0.1579          0.0828          0.1209          0.4732          1.9467          0.5342 
      y2<->y2/g       y4<->y2/g       y6<->y2/g       y3<->y3/g       y7<->y3/g       y4<->y4/g 
         6.3369          0.5756          1.3112          5.1343          0.2998          2.8054 
      y8<->y4/g       y5<->y5/g       y6<->y6/g       y8<->y6/g       y7<->y7/g       y8<->y8/g 
         0.0769          2.3734          4.3267          0.8960          3.3933          2.9252 
diag(lslx_sem$extract_coefficient_acov(selector = "abic", type = "effective"))
        x1<-1/g         x2<-1/g         x3<-1/g         y1<-1/g         y2<-1/g         y3<-1/g 
        0.00716         0.03043         0.02635         0.09134         0.19811         0.14220 
        y4<-1/g         y5<-1/g         y6<-1/g         y7<-1/g         y8<-1/g  dem60<-ind60/g 
        0.14822         0.09025         0.14533         0.14327         0.13822         0.15327 
 dem65<-ind60/g  dem65<-dem60/g     x2<-ind60/g     x3<-ind60/g     y2<-dem60/g     y3<-dem60/g 
        0.05066         0.00987         0.01935         0.02318         0.03280         0.02203 
    y4<-dem60/g     y6<-dem65/g     y7<-dem65/g     y8<-dem65/g ind60<->ind60/g dem60<->dem60/g 
        0.02207         0.02845         0.02535         0.02630         0.00773         0.86307 
dem65<->dem65/g       x1<->x1/g       x2<->x2/g       x3<->x3/g       y1<->y1/g       y5<->y1/g 
        0.04524         0.00040         0.00504         0.00818         0.18047         0.10952 
      y2<->y2/g       y4<->y2/g       y6<->y2/g       y3<->y3/g       y7<->y3/g       y4<->y4/g 
        1.17651         0.36540         0.36838         0.87136         0.34834         0.42092 
      y8<->y4/g       y5<->y5/g       y6<->y6/g       y8<->y6/g       y7<->y7/g       y8<->y8/g 
        0.17914         0.20582         0.52871         0.23928         0.46422         0.36338 

Here, the type argument is used to specify which types of parameters are used to calculate related quantities. type = "effective" indicates that only freely estimated and penalized non-zero parameters are used. By default, type = "all"