Monday, February 26, 2018

Statistics Sunday (Late Edition): Exogenous vs. Endogenous Variables

I had a busy (but great) Sunday and completely spaced on writing my Statistics Sunday post! But then again, I could just say I posted on Sunday within a certain margin of error. (You've probably heard the one about the three statisticians trying to hit a target. The first hits to the left of center, the second to the right. The third yells, "We got it!")

I'm planning to write more posts on one of my favorite statistical techniques (or sets of techniques): structural equation modeling (SEM). Today, I'm going to write about two terms frequently used in SEM: exogenous and endogenous variables.

(Note: these terms are used in other contexts as well. My goal is to discuss how they're used in SEM specifically, as a setup for future SEM posts.)

Whenever you put together a structural equation model, you're hypothesizing paths between variables. A path from one variable to another means that the first influences, or causes, the second. In a measurement model, where observed variables are used to reflect an underlying (latent) construct, the path from the construct to each observed variable signifies that the construct influences, or causes, the values of that observed variable.
Created with the semPlot package using a lavaan dataset - look for a future blog post on this topic!
In a path model, a path means the same thing: one variable causes the other. The difference is that the variables connected by the paths are usually all the same kind of variable (all observed or all latent).
Created with the semPlot package using a lavaan dataset - look for a future blog post on this topic!
For instance, in the figure immediately above, Ind (short for Industrialization) causes both D60 and D65 (measures of the democratization of nations in 1960 and 1965). D60, in turn, also causes D65. All 3 are latent variables, with observed variables used to measure them. Let's ignore those observed (square) variables for now and just look at the 3 latent variables in the circles. Exogenous is the term for variables that cause other variables but are not caused by any other variables. Endogenous refers to variables that are caused by other variables. So in this model, Ind is the only exogenous variable: it is caused by none of the other variables in the model and causes 2 of them. Both D60 and D65 are endogenous: D60 is caused by 1 variable (Ind), and D65 is caused by 2 (Ind and D60).
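To make the counting concrete, here is a minimal Python sketch. It assumes the three structural paths in the figure are written out as (cause, effect) pairs; the `PATHS` list and the `classify` function are hypothetical, made up for illustration, and are not part of lavaan or semPlot. A variable is labeled exogenous when no path points into it and endogenous otherwise, and the measurement paths to the observed variables are left out, just as in the discussion above.

```python
from collections import defaultdict

# Hypothetical encoding of the structural paths in the figure above.
# Each (cause, effect) pair is one arrow between latent variables;
# the measurement paths to the observed (square) variables are left out.
PATHS = [
    ("Ind", "D60"),
    ("Ind", "D65"),
    ("D60", "D65"),
]


def classify(paths):
    """Return {variable: (label, n_caused_by, n_causes)} for a list of paths."""
    caused_by = defaultdict(set)  # variable -> variables with a path into it
    causes = defaultdict(set)     # variable -> variables it has a path into
    order = []                    # keep first-seen order for readable output
    for cause, effect in paths:
        causes[cause].add(effect)
        caused_by[effect].add(cause)
        for v in (cause, effect):
            if v not in order:
                order.append(v)

    result = {}
    for v in order:
        # Exogenous: nothing in the model has a path into this variable.
        label = "endogenous" if caused_by[v] else "exogenous"
        result[v] = (label, len(caused_by[v]), len(causes[v]))
    return result


for variable, (label, n_in, n_out) in classify(PATHS).items():
    print(f"{variable}: {label}, caused by {n_in}, causes {n_out}")
# Ind: exogenous, caused by 0, causes 2
# D60: endogenous, caused by 1, causes 1
# D65: endogenous, caused by 2, causes 0
```

The printed counts match the ones in the paragraph above: Ind is caused by nothing in the model, while D60 and D65 each have at least one path coming in.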

You may be wondering what we would call a variable that is caused by 1 or more variables and, in turn, causes 1 or more other variables. In this terminology, we would still call it endogenous, but we might also use another term: mediator. D60 in the model above is an example: it is caused by Ind and, in turn, causes D65.
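The same idea can be sketched in a few lines of Python, again assuming the paths are written as hypothetical (cause, effect) pairs (`PATHS` and `find_mediators` are illustrative names, not part of any SEM package). A variable counts as a mediator in this structural sense when it has at least one path coming in and at least one going out.

```python
# Hypothetical encoding of the industrialization/democracy paths:
# each (cause, effect) pair is one arrow between latent variables.
PATHS = [("Ind", "D60"), ("Ind", "D65"), ("D60", "D65")]


def find_mediators(paths):
    """Return variables that are caused by something and also cause something."""
    has_incoming = {effect for _, effect in paths}  # endogenous variables
    has_outgoing = {cause for cause, _ in paths}    # variables that cause others
    # A mediator sits in the middle of a chain, so it needs both.
    return sorted(has_incoming & has_outgoing)


print(find_mediators(PATHS))  # ['D60']
```

This only reads the structure of the diagram. Whether the indirect effect of Ind on D65 through D60 actually holds up is something the fitted model, not the picture, has to tell you.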

Stay tuned for more SEM posts, where we'll start digging into the figures above and showing how it all works! I'm also gearing up for Blogging A to Z; look for a theme reveal on March 19. Spoiler alert: it will be stats-related.

1 comment:

  1. Given their use in other contexts, these terms make sense. Exogenous means coming from outside; endogenous means coming from within. Endogenous variables have their variance explained from within the model, i.e., by the exogenous variables. Exogenous variables would have to have their variance explained by other, unmodeled sources.