Sunday, March 4, 2018

Statistics Sunday: Introduction to Factor Analysis

One of the first assumptions of any measurement model is that all items included in a measure are assessing the same thing. In measurement, we refer to this thing being measured as a construct - also known as a latent variable or factor.

The measurement model, a 3-factor model, from last week's Statistics Sunday post
One method of checking that the items are assessing this factor (or these factors) is factor analysis. This statistical approach is one way of seeing whether items "hang together" such that they appear to measure the same thing. It does this by looking at the covariance of the items - the variance items share with one another. Covariance is basically a combination of correlation and variance: the covariance between two items is their correlation rescaled by the items' standard deviations.
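To make that concrete, here's a minimal Python sketch (using NumPy, with made-up ratings from five people) showing that the covariance between two items is just their correlation rescaled by the items' standard deviations:

```python
import numpy as np

# Toy responses from 5 people on two 1-to-5 rating items
# (made-up numbers, just to illustrate the relationship)
item1 = np.array([1, 2, 3, 4, 5], dtype=float)
item2 = np.array([2, 2, 3, 5, 4], dtype=float)

cov = np.cov(item1, item2)[0, 1]        # covariance of the two items
corr = np.corrcoef(item1, item2)[0, 1]  # their correlation
sd1, sd2 = item1.std(ddof=1), item2.std(ddof=1)

# Covariance = correlation * SD of item 1 * SD of item 2
print(cov, corr * sd1 * sd2)            # the two values match
```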

Factor analysis can be conducted in two ways - exploratory, a data-driven approach that lets the data suggest which items group together into factors, and confirmatory, where the person conducting the analysis specifies in advance which items measure which factors. Data-driven approaches are tricky, of course: you don't know what portion of the covariance is systematic and what portion is error. Exploratory factor analysis (EFA) doesn't care.

This is why, for the most part, I prefer confirmatory factor analysis (CFA), where you specify, based on theory and some educated guesses, which items fall under which factor. As you might have guessed, you can conduct a factor analysis (exploratory or confirmatory) with any number of factors - you would specify your analysis as single-factor, two-factor, and so on. In fact, you can conduct EFA by specifying the number of factors the program should extract. But keep in mind that the program will try to find something that fits: if you tell it there should be 2 factors, it will try to make that work, even if the data suggest a different number of factors.
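Here's a quick sketch of what that looks like in Python, using scikit-learn's FactorAnalysis on simulated data (the data are invented purely for illustration - 6 items actually driven by 2 factors). Notice that we tell the program exactly how many factors to extract, and it obliges:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(seed=42)

# Simulate 200 respondents on 6 items driven by 2 underlying factors
# (purely illustrative - in practice you'd load your own item responses)
factors = rng.normal(size=(200, 2))
true_loadings = np.array([[0.8, 0.0], [0.7, 0.1], [0.9, 0.0],
                          [0.0, 0.8], [0.1, 0.7], [0.0, 0.9]])
items = factors @ true_loadings.T + rng.normal(scale=0.5, size=(200, 6))

# Tell the program to extract exactly 2 factors - it will oblige,
# whether or not 2 is actually the right number for your data
efa = FactorAnalysis(n_components=2)
efa.fit(items)
print(efa.components_)  # estimated loadings, one row per factor
```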

Whether you're conducting EFA or CFA, the factors are underlying theoretical constructs (also known as latent variables). These factors are assessed by observed variables, which might be self-reported ratings on a scale of 1 to 5, judges' ratings of people's performances, or even whether the individual answered a knowledge item correctly. Basically, you can include whatever items you think hang together, on whatever scales they're measured. In the measurement model above, observed variables are represented by squares.

While the observed variables have a scale, the factors do not - they are theoretical and not measured directly. Therefore, analysis programs will ask you to select (or will select on their own, by convention) one of the items connected to each factor to give that factor a scale. Most programs will, by default, fix the first item's loading at 1, which puts the factor on the same scale as that item. The remaining factor loadings - measures of the strength of the relationship between an observed variable and its factor - will be calibrated to the scale of the fixed item. In the model above, factors are represented by circles. The double-headed arrows between the three factors represent correlations among those factors, and the arrows pointing back at an individual factor or observed variable represent error - variance not explained by the model.
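If you're curious what specifying a CFA looks like in code, here's a hedged sketch using semopy, a third-party Python SEM package whose model syntax mimics R's lavaan. The factor and item names, and the "responses.csv" file, are hypothetical placeholders - substitute your own:

```python
import pandas as pd
import semopy  # third-party SEM package; install with `pip install semopy`

# lavaan-style model description: three factors, each measured by its items
# (factor and item names here are made-up placeholders)
desc = """
Factor1 =~ item1 + item2 + item3
Factor2 =~ item4 + item5 + item6
Factor3 =~ item7 + item8 + item9
"""

data = pd.read_csv("responses.csv")  # assumed file, one column per item

# Like lavaan, semopy fixes the first loading per factor at 1 by default
# to give each factor a scale (worth verifying for your version)
model = semopy.Model(desc)
model.fit(data)
print(model.inspect())  # estimated loadings, factor covariances, and errors
```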

When conducting a factor analysis, you can (ahem, should) request standardized loadings, which, like correlations, range from -1 to +1, with values closer to 1 (positive or negative) reflecting stronger relationships. Basically, the closer the loading's absolute value is to 1, the better that item measures your factor.
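As a small illustration (with made-up loadings, and using 0.40 as a common rule of thumb for a "weak" loading - a convention, not a hard cutoff), here's how you might flag items that don't measure any factor well:

```python
import pandas as pd

# Standardized loadings: rows = items, one column per factor
# (values invented for illustration)
loadings = pd.DataFrame(
    {"Factor1": [0.82, 0.75, 0.31], "Factor2": [0.05, 0.10, 0.28]},
    index=["item1", "item2", "item3"],
)

# An item is "weak" if its strongest loading is below 0.40 in absolute value
weak = loadings.abs().max(axis=1) < 0.40
print(loadings[weak])  # here, item3 doesn't load well on either factor
```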

In addition to factor loadings, which tell you how well each item maps onto its factor, there are overall measures that tell you how well your model fits the data. More on this information, and how to interpret it, later!

In other news, I'm looking at my previous blog posts and putting together some better labels, to help you access information more easily. Look for those better tags, with a related blog post, soon!
