To give a bit of background, I learned structural equation modeling with LISREL. LISREL (LInear Structural RELations) is not only software to conduct SEM; it's also a specific type of notation, developed by Jöreskog, Keesling, and Wiley, that breaks a structural model into matrices. Conducting SEM with LISREL involves typing up a complex control file, defining each matrix by its Greek name from the notation system - Lambda x's and y's to describe factor loadings for exogenous and endogenous variables, respectively; phi to describe the variances and covariances of exogenous latent variables; and so on.
Matrices are an essential part of the math behind statistical analysis, and they're also a valuable shorthand for describing complex relationships. When LISREL software was first developed in the 1970s, it made sense to require users to specify each matrix: that was how the huge amount of data to be processed for a structural model got broken down. Even then, this kind of analysis took time and a lot of computing power.
Today, I'm running on my tiny MacBook Air something that once required a mainframe. I don't want to downplay the importance of this matrix approach, because it's still what's going on on the back end, but when computational power comes cheap and easy, it feels needlessly complicated or, at the very least, outdated to require that each matrix be specified to conduct the analysis. And it creates a high intellectual bar for anyone who wants to learn SEM. Is it nice to know that the gamma matrix describes causal paths from exogenous to endogenous variables? Sure. But is it essential to conduct and understand SEM? No, I don't think so.
If you're like me, when you design a structural model to describe your data, you draw your variables, some measured directly (observed), some assessed indirectly by combinations of measured variables (latent), and then draw arrows to show how they relate to each other. To use the LISREL approach, I would have to take what I drew and convert it to matrices before I could conduct my analysis, and I would constantly have to double-check which matrix referred to what. Wouldn't it be nice to go straight from model drawing to analysis?
My favorite way to conduct SEM is with an R package called lavaan. Rather than specifying matrices, lavaan has you convert the components of your model into equations, which is a much more direct translation of a model drawing. A factor is defined by name, followed by the observed variables assessing that factor, as follows:
Factor =~ var1 + var2 + ... + lastvar
It's structured like a linear equation because, at the basic level, that's exactly what it is. You can conduct a factor analysis with one or more factors, and factors can be correlated or uncorrelated (also referred to as orthogonal). Hold that thought for now - later on, I'll talk about how orthogonal models can be used for hypothesis testing.
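The same equation-style syntax also covers relationships between factors: lavaan uses the ~~ operator for covariances. Just to make "correlated" versus "orthogonal" concrete, here's a minimal sketch with hypothetical factor names (you won't need these lines below, since cfa() correlates factors by default and has an orthogonal argument):

FactorA ~~ FactorB     # estimate the covariance between the two factors
FactorA ~~ 0*FactorB   # fix the covariance to zero, making the factors orthogonal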
Using my Facebook dataset, I could test many factor models. Three of the measures included in the study assess one topic without subscales, so those could be tested with a single-factor model. For simplicity, let's start with one of those. The Diener Satisfaction with Life Scale is a 5-item measure that assesses how satisfied one is with his/her life overall. We begin by specifying a single-factor model. We can name the factor and the overall model whatever we want, but we need to use the actual variable names when referring to the observed variables used to assess the factor:
Facebook<-read.delim(file="small_facebook_set.txt", header=TRUE)
SWL_Model<-'SWL =~ LS1 + LS2 + LS3 + LS4 + LS5'
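If you want to confirm the data read in correctly before fitting anything, a quick look at the dimensions and the life satisfaction items doesn't hurt. This is just base R; the item names come from the model above:

dim(Facebook)                                           # should show 257 rows
head(Facebook[, c("LS1", "LS2", "LS3", "LS4", "LS5")])  # peek at the five observed variables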
Here's what this model looks like drawn out: a single latent variable, SWL, with arrows pointing to each of the five observed variables, LS1 through LS5.
This object we created is then used in lavaan to fit our model. You'll want to load lavaan (install if necessary) before proceeding to the next step:
install.packages("lavaan")
library(lavaan)
Now, let's fit our model and check our output:
SWL_Fit<-cfa(SWL_Model, data=Facebook)
summary(SWL_Fit)
## lavaan (0.5-23.1097) converged normally after 24 iterations
##
##   Number of observations                           257
##
##   Estimator                                         ML
##   Minimum Function Test Statistic               26.760
##   Degrees of freedom                                 5
##   P-value (Chi-square)                           0.000
##
## Parameter Estimates:
##
##   Information                                 Expected
##   Standard Errors                             Standard
##
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   SWL =~
##     LS1               1.000
##     LS2               0.974    0.071   13.625    0.000
##     LS3               0.969    0.065   14.909    0.000
##     LS4               0.855    0.071   12.038    0.000
##     LS5               0.790    0.085    9.290    0.000
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .LS1               0.984    0.111    8.869    0.000
##    .LS2               0.890    0.102    8.742    0.000
##    .LS3               0.493    0.073    6.744    0.000
##    .LS4               1.124    0.115    9.793    0.000
##    .LS5               2.092    0.197   10.641    0.000
##     SWL               1.678    0.229    7.332    0.000
Seriously, that's all it takes to conduct SEM with lavaan. We'll dig into interpretation later - for now, note that all observed variables load significantly onto the latent variable SWL - but two things you will want to add when examining the summary output are the standardized solution and the fit measures:
summary(SWL_Fit, standardized=TRUE, fit.measures=TRUE)
## lavaan (0.5-23.1097) converged normally after 24 iterations
##
##   Number of observations                           257
##
##   Estimator                                         ML
##   Minimum Function Test Statistic               26.760
##   Degrees of freedom                                 5
##   P-value (Chi-square)                           0.000
##
## Model test baseline model:
##
##   Minimum Function Test Statistic              635.988
##   Degrees of freedom                                10
##   P-value                                        0.000
##
## User model versus baseline model:
##
##   Comparative Fit Index (CFI)                    0.965
##   Tucker-Lewis Index (TLI)                       0.930
##
## Loglikelihood and Information Criteria:
##
##   Loglikelihood user model (H0)              -2111.647
##   Loglikelihood unrestricted model (H1)      -2098.267
##
##   Number of free parameters                        10
##   Akaike (AIC)                                4243.294
##   Bayesian (BIC)                              4278.785
##   Sample-size adjusted Bayesian (BIC)         4247.082
##
## Root Mean Square Error of Approximation:
##
##   RMSEA                                          0.130
##   90 Percent Confidence Interval          0.084  0.181
##   P-value RMSEA <= 0.05                          0.003
##
## Standardized Root Mean Square Residual:
##
##   SRMR                                           0.040
##
## Parameter Estimates:
##
##   Information                                 Expected
##   Standard Errors                             Standard
##
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   SWL =~
##     LS1               1.000                               1.295    0.794
##     LS2               0.974    0.071   13.625    0.000    1.262    0.801
##     LS3               0.969    0.065   14.909    0.000    1.256    0.873
##     LS4               0.855    0.071   12.038    0.000    1.107    0.722
##     LS5               0.790    0.085    9.290    0.000    1.023    0.578
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .LS1               0.984    0.111    8.869    0.000    0.984    0.370
##    .LS2               0.890    0.102    8.742    0.000    0.890    0.359
##    .LS3               0.493    0.073    6.744    0.000    0.493    0.238
##    .LS4               1.124    0.115    9.793    0.000    1.124    0.478
##    .LS5               2.092    0.197   10.641    0.000    2.092    0.666
##     SWL               1.678    0.229    7.332    0.000    1.000    1.000
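If you'd rather pull out specific pieces instead of scanning the full summary, lavaan also has functions for extracting just the numbers you plan to report. A small sketch (the abbreviations passed to fitMeasures are the names lavaan uses for its fit indices):

fitMeasures(SWL_Fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))  # just the fit indices
parameterEstimates(SWL_Fit, standardized=TRUE)  # loadings and variances as a data frame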
Next, let's go through how you would specify a multiple-factor model. The Ruminative Response Scale has 3 subscales: Depression-Related Rumination, Reflecting, and Brooding. So we can specify a 3-factor model, enclosing all 3 equations, one for each factor, within the quote marks. We'll then fit this model the same way we fit our Satisfaction with Life model above. By default, the 3 factors are allowed to correlate:
RRS_Model<- '
Depression =~ Rum1 + Rum2 + Rum3 + Rum4 + Rum6 + Rum8 + Rum9 + Rum14 + Rum17 + Rum18 + Rum19 + Rum22
Reflecting =~ Rum7 + Rum11 + Rum12 + Rum20 + Rum21
Brooding =~ Rum5 + Rum10 + Rum13 + Rum15 + Rum16
'
RRS_Fit<-cfa(RRS_Model, data=Facebook)
summary(RRS_Fit)
## lavaan (0.5-23.1097) converged normally after 40 iterations
##
##   Number of observations                           257
##
##   Estimator                                         ML
##   Minimum Function Test Statistic              600.311
##   Degrees of freedom                               206
##   P-value (Chi-square)                           0.000
##
## Parameter Estimates:
##
##   Information                                 Expected
##   Standard Errors                             Standard
##
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Depression =~
##     Rum1              1.000
##     Rum2              0.867    0.124    6.965    0.000
##     Rum3              0.840    0.124    6.797    0.000
##     Rum4              0.976    0.126    7.732    0.000
##     Rum6              1.167    0.140    8.357    0.000
##     Rum8              1.147    0.141    8.132    0.000
##     Rum9              1.095    0.136    8.061    0.000
##     Rum14             1.191    0.135    8.845    0.000
##     Rum17             1.261    0.141    8.965    0.000
##     Rum18             1.265    0.142    8.887    0.000
##     Rum19             1.216    0.135    8.992    0.000
##     Rum22             1.257    0.142    8.870    0.000
##   Reflecting =~
##     Rum7              1.000
##     Rum11             0.906    0.089   10.138    0.000
##     Rum12             0.549    0.083    6.603    0.000
##     Rum20             1.073    0.090   11.862    0.000
##     Rum21             0.871    0.088    9.929    0.000
##   Brooding =~
##     Rum5              1.000
##     Rum10             1.092    0.133    8.216    0.000
##     Rum13             0.708    0.104    6.823    0.000
##     Rum15             1.230    0.143    8.617    0.000
##     Rum16             1.338    0.145    9.213    0.000
##
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Depression ~~
##     Reflecting        0.400    0.061    6.577    0.000
##     Brooding          0.373    0.060    6.187    0.000
##   Reflecting ~~
##     Brooding          0.419    0.068    6.203    0.000
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Rum1              0.687    0.063   10.828    0.000
##    .Rum2              0.796    0.072   11.007    0.000
##    .Rum3              0.809    0.073   11.033    0.000
##    .Rum4              0.694    0.064   10.857    0.000
##    .Rum6              0.712    0.067   10.668    0.000
##    .Rum8              0.778    0.072   10.746    0.000
##    .Rum9              0.736    0.068   10.768    0.000
##    .Rum14             0.556    0.053   10.442    0.000
##    .Rum17             0.576    0.056   10.370    0.000
##    .Rum18             0.611    0.059   10.418    0.000
##    .Rum19             0.526    0.051   10.352    0.000
##    .Rum22             0.609    0.058   10.428    0.000
##    .Rum7              0.616    0.067    9.200    0.000
##    .Rum11             0.674    0.069    9.746    0.000
##    .Rum12             0.876    0.080   10.894    0.000
##    .Rum20             0.438    0.056    7.861    0.000
##    .Rum21             0.673    0.068    9.867    0.000
##    .Rum5              0.955    0.090   10.657    0.000
##    .Rum10             0.663    0.065   10.154    0.000
##    .Rum13             0.626    0.058   10.819    0.000
##    .Rum15             0.627    0.064    9.731    0.000
##    .Rum16             0.417    0.050    8.368    0.000
##     Depression        0.360    0.072    4.987    0.000
##     Reflecting        0.708    0.111    6.408    0.000
##     Brooding          0.455    0.096    4.715    0.000
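The Covariances section is in the metric of the latent variables, so those values aren't correlations. If you want the correlations among the 3 factors, one option is lavaan's lavInspect function; a quick sketch:

lavInspect(RRS_Fit, "cor.lv")  # correlation matrix of the latent variables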
If we wanted our 3 factors to be uncorrelated, we would update the RRS_Fit statement as follows:
RRS_Fit<-cfa(RRS_Model, data=Facebook, orthogonal=TRUE)
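Because the orthogonal model is simply the correlated model with the factor covariances fixed to zero, the two fits can be compared with a chi-square difference test - a preview of the hypothesis testing mentioned earlier. A quick sketch, assuming you store the orthogonal fit under its own (hypothetical) name instead of overwriting RRS_Fit:

RRS_Orthogonal_Fit<-cfa(RRS_Model, data=Facebook, orthogonal=TRUE)
anova(RRS_Fit, RRS_Orthogonal_Fit)  # chi-square difference test of the nested models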
One more thing, just for fun - instead of having 3 correlated factors, we might believe that our 3 factors in turn assess a single, higher-order factor of Rumination. In fact, this type of model makes more sense for this measure: all 22 items can be summed to obtain an overall Rumination score, and combinations of items are summed to get the 3 subscale scores. If we wanted to test this type of model, we would use the following:
RRS_HO_Model<- '
Depression =~ Rum1 + Rum2 + Rum3 + Rum4 + Rum6 + Rum8 + Rum9 + Rum14 + Rum17 + Rum18 + Rum19 + Rum22
Reflecting =~ Rum7 + Rum11 + Rum12 + Rum20 + Rum21
Brooding =~ Rum5 + Rum10 + Rum13 + Rum15 + Rum16
Rumination =~ Depression + Reflecting + Brooding
'
RRS_HO_Fit<-cfa(RRS_HO_Model, data=Facebook)
summary(RRS_HO_Fit)
## lavaan (0.5-23.1097) converged normally after 33 iterations
##
##   Number of observations                           257
##
##   Estimator                                         ML
##   Minimum Function Test Statistic              600.311
##   Degrees of freedom                               206
##   P-value (Chi-square)                           0.000
##
## Parameter Estimates:
##
##   Information                                 Expected
##   Standard Errors                             Standard
##
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   Depression =~
##     Rum1              1.000
##     Rum2              0.867    0.124    6.965    0.000
##     Rum3              0.840    0.124    6.797    0.000
##     Rum4              0.976    0.126    7.732    0.000
##     Rum6              1.167    0.140    8.357    0.000
##     Rum8              1.147    0.141    8.132    0.000
##     Rum9              1.095    0.136    8.061    0.000
##     Rum14             1.191    0.135    8.845    0.000
##     Rum17             1.261    0.141    8.965    0.000
##     Rum18             1.265    0.142    8.887    0.000
##     Rum19             1.216    0.135    8.992    0.000
##     Rum22             1.257    0.142    8.870    0.000
##   Reflecting =~
##     Rum7              1.000
##     Rum11             0.906    0.089   10.138    0.000
##     Rum12             0.549    0.083    6.603    0.000
##     Rum20             1.073    0.090   11.862    0.000
##     Rum21             0.871    0.088    9.929    0.000
##   Brooding =~
##     Rum5              1.000
##     Rum10             1.092    0.133    8.216    0.000
##     Rum13             0.708    0.104    6.823    0.000
##     Rum15             1.230    0.143    8.617    0.000
##     Rum16             1.338    0.145    9.213    0.000
##   Rumination =~
##     Depression        1.000
##     Reflecting        1.123    0.144    7.780    0.000
##     Brooding          1.047    0.148    7.067    0.000
##
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .Rum1              0.687    0.063   10.828    0.000
##    .Rum2              0.796    0.072   11.007    0.000
##    .Rum3              0.809    0.073   11.033    0.000
##    .Rum4              0.694    0.064   10.857    0.000
##    .Rum6              0.712    0.067   10.668    0.000
##    .Rum8              0.778    0.072   10.746    0.000
##    .Rum9              0.736    0.068   10.768    0.000
##    .Rum14             0.556    0.053   10.442    0.000
##    .Rum17             0.576    0.056   10.370    0.000
##    .Rum18             0.611    0.059   10.418    0.000
##    .Rum19             0.526    0.051   10.352    0.000
##    .Rum22             0.609    0.058   10.428    0.000
##    .Rum7              0.616    0.067    9.200    0.000
##    .Rum11             0.674    0.069    9.746    0.000
##    .Rum12             0.876    0.080   10.893    0.000
##    .Rum20             0.438    0.056    7.861    0.000
##    .Rum21             0.673    0.068    9.867    0.000
##    .Rum5              0.955    0.090   10.657    0.000
##    .Rum10             0.663    0.065   10.154    0.000
##    .Rum13             0.626    0.058   10.819    0.000
##    .Rum15             0.627    0.064    9.731    0.000
##    .Rum16             0.417    0.050    8.368    0.000
##     Depression        0.003    0.017    0.204    0.838
##     Reflecting        0.259    0.051    5.072    0.000
##     Brooding          0.064    0.026    2.446    0.014
##     Rumination        0.356    0.073    4.858    0.000
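Just as with the single-factor model, you can add the standardized solution and fit measures to the summary when you want more detail on how the higher-order model holds up:

summary(RRS_HO_Fit, standardized=TRUE, fit.measures=TRUE)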
That's all for now! Check back for future posts to better understand output and how to assess and improve your models.