Deeply Trivial: Statistics Sunday: Using semPlot

Today's post will be mostly demonstration, but I'll build on some of the things I covered in yesterday's semPlot post. This month, I've blogged about two types of structural equation models (SEM): confirmatory factor analysis and latent variable path analysis. Using the models from those posts, I'll show how to diagram them in semPlot and how to change the appearance of the plots to make them presentation-ready.

Facebook<-read.delim(file="small_facebook_set.txt", header=TRUE)

First up, confirmatory factor analysis. In that post, I tested models with the Satisfaction with Life Scale (SWLS) and the Ruminative Response Scale (RRS). In fact, I gave a sneak preview of semPlot with the SWLS model.

library(lavaan)
# One-factor CFA: the five SWLS items load on a single factor
SWL_Model<-'SWL =~ LS1 + LS2 + LS3 + LS4 + LS5'

SWL_Fit<-cfa(SWL_Model, data=Facebook)
library(semPlot)
semPaths(SWL_Fit)

This diagram is fine for quickly showing what the model looks like, but we'd want to tweak it before using it in a presentation or publication. First, I used very abbreviated variable names, so semPlot has no trouble displaying them in full. But I might want more descriptive names for my variables, and I'd rather not rename the variables or rewrite my model to get them.

labels<-c("Ideal Life","Excellent","Satisfied","Important","Change","SWL\nScale")
semPaths(SWL_Fit,nodeLabels=labels,sizeMan=10)

I've selected a keyword (or two) from each SWLS item and used it as that item's label. For instance, the text of item 1 is, "In most ways, my life is close to ideal." When creating your labels object, put the y-variables in the order in which they appear in the equation(s), followed by the x-variables, again in the order in which they appear. Observed variables come before latent ones, which is why the factor label comes last; the \n splits that label across two lines.
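Because nodeLabels is matched to the nodes purely by position, it is easy to get the order wrong. A minimal sketch, assuming the node order is the observed items followed by the factor, keeps each label attached to its variable name and looks the labels up in node order. The helper label_nodes() is hypothetical, not part of semPlot or lavaan. In practice the node names could be pulled from the fit with lavaan's lavNames(SWL_Fit, "ov") and lavNames(SWL_Fit, "lv"), but it is worth checking that result against a plain semPaths() call before relying on it.

```r
# Hypothetical helper: return a display label for each node name,
# falling back to the raw name when the lookup has no entry for it.
label_nodes <- function(node_names, lookup) {
  out <- unname(lookup[node_names])     # NA where a name has no label
  no_label <- is.na(out)
  out[no_label] <- node_names[no_label] # keep the original name instead
  out
}

# Assumed node order for the SWLS model: observed items first, then the factor.
swl_nodes <- c("LS1", "LS2", "LS3", "LS4", "LS5", "SWL")
swl_lookup <- c(LS1 = "Ideal Life", LS2 = "Excellent", LS3 = "Satisfied",
                LS4 = "Important", LS5 = "Change", SWL = "SWL\nScale")

labels <- label_nodes(swl_nodes, swl_lookup)
# semPaths(SWL_Fit, nodeLabels = labels, sizeMan = 10)
```

Keying labels by name means that reordering the items in the model only requires updating the node-order vector, not retyping every label.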

We probably want to add a title and our parameter estimates. I'll use standardized estimates (what="std"), since these are a bit easier to interpret. But there are a few other changes I'd like to make. semPlot automatically fades edges based on the size of the parameter estimates, so larger estimates are darker than smaller ones. It also scales the width of paths by size, including for error variances, so a larger error means a thicker, darker line. Fortunately, I can turn these features off (the fade and esize arguments below). I can also change the size of the arrowheads (asize). (BTW, Man refers to manifest (or observed) variables, and Lat refers to latent variables (or factors), so sizeMan and sizeLat change the width of the observed and latent variable nodes, respectively.)

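# Standardized estimates as edge labels, black edges, no fading,
# and custom node, edge, and arrowhead sizes
semPaths(SWL_Fit, what="std", edge.label.cex=0.75, edge.color="black",
         nodeLabels=labels, sizeMan=10, sizeLat=10, fade=FALSE, esize=2, asize=2)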
title("Diener Satisfaction with Life Scale CFA", line=3)

Feel free to play around with the numbers I've selected to see how it affects the final look.

But the Satisfaction with Life Scale is a short measure, and this is a small, simple model. What happens if I throw a much larger model at semPlot? Full labels for the observed variables might overwhelm a large model, so I may simply want to use item numbers only and label only the factors.
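Typing item numbers by hand for a 22-item scale invites mistakes. As a sketch, assuming every RRS item is named Rum followed by its item number (as in this dataset), the numeric labels can be derived from the variable names with base R's sub(). The item vector simply restates the order in which the items appear in RRS_Model.

```r
# Observed RRS items, in the order they appear in RRS_Model
rrs_items <- c("Rum1", "Rum2", "Rum3", "Rum4", "Rum6", "Rum8", "Rum9", "Rum14",
               "Rum17", "Rum18", "Rum19", "Rum22",            # Depression
               "Rum7", "Rum11", "Rum12", "Rum20", "Rum21",    # Reflecting
               "Rum5", "Rum10", "Rum13", "Rum15", "Rum16")    # Brooding
rrs_factors <- c("Depression", "Reflecting", "Brooding")

# Strip the "Rum" prefix so each observed node shows only its item number,
# then append the factor names unchanged (latent nodes come last).
rrslabels <- c(sub("^Rum", "", rrs_items), rrs_factors)
```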

RRS_Model<- '
  Depression =~ Rum1 + Rum2 + Rum3 + Rum4 + Rum6 + Rum8 + 
    Rum9 + Rum14 + Rum17 + Rum18 + Rum19 + Rum22
  Reflecting =~ Rum7 + Rum11 + Rum12 + Rum20 + Rum21
  Brooding =~ Rum5 + Rum10 + Rum13 + Rum15 + Rum16
'
RRS_Fit<-cfa(RRS_Model, data=Facebook)
rrslabels<-c(1:4,6,8,9,14,17:19,22,7,11,12,20,21,5,10,13,15,16,"Depression",
"Reflecting","Brooding")
RRS<-semPaths(RRS_Fit,what="par",whatLabels="hide",nodeLabels=rrslabels, sizeLat=12,
              sizeMan=4.5,edge.label.cex=0.75, edge.color="black", asize=2)

Adding estimates to this diagram would probably make it difficult to read, so personally, I would also create a table with the actual parameter estimates and use the diagram for display purposes only. Instead of printing estimates, I allowed fading, so that stronger relationships are darker than weaker ones. This is done by asking semPaths to use the parameter estimates (what="par") but then hiding those labels (whatLabels="hide").
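For that companion table, lavaan's standardizedSolution() returns one row per parameter, with factor loadings marked by op == "=~" and columns including lhs, rhs, est.std, and pvalue. A minimal sketch, using a hypothetical helper named loading_table(), keeps only the loading rows and rounds them for display; the rounding choices are arbitrary.

```r
# Hypothetical helper: keep the factor loadings ("=~" rows) from a lavaan
# standardizedSolution() data frame and tidy them into a display table.
loading_table <- function(std, digits = 2) {
  loads <- std[std$op == "=~", c("lhs", "rhs", "est.std", "pvalue")]
  names(loads) <- c("Factor", "Item", "Std. loading", "p")
  loads[["Std. loading"]] <- round(loads[["Std. loading"]], digits)
  loads[["p"]] <- round(loads[["p"]], 3)
  rownames(loads) <- NULL
  loads
}

# Usage with the fitted model from above (requires lavaan):
# loading_table(standardizedSolution(RRS_Fit))
```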

What about even more complex models, like a latent variable path model? In that post, I tested a structural regression with rumination and depression.

Rum3_Dep<-'
Depression =~ Dep1 + Dep2 + Dep3 + Dep4 + Dep5 + Dep6 + Dep7 + Dep8 +
              Dep9 + Dep10 + Dep11 + Dep12 + Dep13 + Dep14 + Dep15 + Dep16
DRR =~ Rum1 + Rum2 + Rum3 + Rum4 + Rum6 + Rum8 + Rum9 + Rum14 + Rum17 + Rum18 + 
              Rum19 + Rum22
Reflecting =~ Rum7 + Rum11 + Rum12 + Rum20 + Rum21
Brooding =~ Rum5 + Rum10 + Rum13 + Rum15 + Rum16
Depression ~ DRR + Reflecting + Brooding
'
RD3<-sem(Rum3_Dep, data=Facebook)
semPaths(RD3)

For this type of model, I tend to prefer a different rotation (rotation=2), with the x-variables on the left and the y-variables on the right. I also want to customize my labels. Remember, y goes before x, but observed goes before latent, so the order is: y-observed, x-observed, y-latent, x-latent. I may also use LISREL-style errors (style="lisrel"), which are simple arrows rather than curved double-pointed arrows, and change the appearance of the covariances among the exogenous latent variables (curvePivot=TRUE) so they don't get as lost.
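The ordering rule can also be applied programmatically. A sketch, assuming the node order described above (y-observed, x-observed, y-latent, x-latent) and the Dep/Rum item naming used in this dataset: observed nodes keep only the digits of their names, and the DRR factor gets a longer display name while the other factor names stay as they are.

```r
# Assumed node order: y-observed, x-observed, y-latent, x-latent
dep_items <- paste0("Dep", 1:16)                                # y-observed
rum_items <- paste0("Rum", c(1:4, 6, 8, 9, 14, 17:19, 22,      # DRR
                             7, 11, 12, 20, 21,                 # Reflecting
                             5, 10, 13, 15, 16))                # Brooding
latents <- c("Depression", "DRR", "Reflecting", "Brooding")     # y-latent, then x-latent

# Observed nodes: drop every non-digit character, leaving the item number
obs_labels <- gsub("[^0-9]", "", c(dep_items, rum_items))

# Latent nodes: spell out DRR, keep the other factor names
lat_labels <- ifelse(latents == "DRR", "Dep-Related", latents)

rrsdlabels <- c(obs_labels, lat_labels)
```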

rrsdlabels<-c(1:16,1:4,6,8,9,14,17:19,22,7,11,12,20,21,5,10,13,15,16,"Depression",
"Dep-Related","Reflecting","Brooding")
semPaths(RD3, rotation=2,nodeLabels=rrsdlabels,sizeMan=3,
style="lisrel",curvePivot=TRUE, edge.color="black")
title("Rumination and Depression Structural Regression Model")

If all paths are significant, it's okay not to display parameter estimates or fading. I used fading for the measurement model, but I could just as easily have added text indicating that everything is significant. Pretend for the sake of argument that this is true for this model. We can then add that descriptive text to the model drawing.
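Rather than simply assuming it, the claim could be checked before the note goes on the plot. A sketch with a hypothetical helper, all_significant(), that reads the pvalue column of lavaan's parameterEstimates() output. Fixed parameters, such as marker loadings fixed to 1, have no p-value and are skipped; the 0.05 cutoff and the set of parameter types checked are assumptions you can change.

```r
# Hypothetical helper: TRUE only if every freely estimated parameter of the
# selected types (loadings, regressions, (co)variances) has p < alpha.
all_significant <- function(pe, alpha = 0.05, ops = c("=~", "~", "~~")) {
  p <- pe$pvalue[pe$op %in% ops]
  p <- p[!is.na(p)]            # fixed parameters have no p-value
  length(p) > 0 && all(p < alpha)
}

# Usage (requires lavaan and the fitted RD3 model):
# if (all_significant(parameterEstimates(RD3))) {
#   text(0, -.9, "All paths and variances significant, p<0.05")
# }
```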

semPaths(RD3, rotation=2,nodeLabels=rrsdlabels,sizeMan=3,style="lisrel",
curvePivot=TRUE, edge.color="black")
title("Rumination and Depression Structural Regression Model")
text(0,-.9,"All paths and variances significant, p<0.05")

Either approach is fine; it really depends on what information you want to communicate. Do you want to demonstrate which items best measure the underlying construct? Or do you simply want to show that all items significantly contribute to the measurement of the construct? The same goes for latent variable path analysis (LVPA): do you want to show which variables are the strongest predictors, or just that all variables are significant predictors? In this particular case, not all paths are significant: brooding and reflecting do not significantly predict depression. So I may want a different approach that highlights this fact.

semPaths(RD3, rotation=2,nodeLabels=rrsdlabels,sizeMan=3,style="lisrel",
curvePivot=TRUE, edge.color="black", what="par",whatLabels="hide")
title("Rumination and Depression Structural Regression Model")

With fading on, we see that Depression-Related Rumination is the strongest predictor of Depression. Reflecting is the weakest predictor; both brooding and reflecting are non-significant, and their paths are very light.
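To back up the visual impression with numbers, the structural paths can be pulled from lavaan's standardizedSolution() output and sorted by absolute size. A sketch, with structural_paths() as a hypothetical helper name and 0.05 as an assumed significance cutoff:

```r
# Hypothetical helper: list the regression ("~") paths from a lavaan
# standardizedSolution() table, largest absolute estimate first,
# flagging which ones have p < alpha.
structural_paths <- function(std, alpha = 0.05) {
  paths <- std[std$op == "~", c("lhs", "rhs", "est.std", "pvalue")]
  paths$significant <- !is.na(paths$pvalue) & paths$pvalue < alpha
  paths <- paths[order(abs(paths$est.std), decreasing = TRUE), ]
  rownames(paths) <- NULL
  paths
}

# Usage (requires lavaan and the fitted RD3 model):
# structural_paths(standardizedSolution(RD3))
```

Sorting by absolute value keeps a strong negative path near the top instead of burying it below weak positive ones.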

Back to A to Z posts tomorrow, where I'll talk about a new data structure: tibbles!

Sunday, April 22, 2018
