Tuesday, April 25, 2017

U is for Univariate

During the spring of my third year of grad school (10 years ago now!), around the time I was finishing my master's, I was on the job market. We were required to have an internship related to teaching and/or research (either 1,000 hours in one of those areas or 500 hours in each; I did the latter). I was applying for jobs as a data analyst at various non-profits, and I remember one interview in which a person asked me what statistics I knew. I told her what courses I'd taken. We moved on with the interview. A little further into the interview, she asked again what statistics I knew. I was really at a loss for how to answer. I had already told her what courses I had taken in statistics. I asked if she needed me to list each and every statistical analysis I knew how to do. She nodded, so I started rattling off, "Z-test, t-test, chi-square, regression, etc." She stopped me, and didn't bring it up again.

I did not get that job.

I remember heading home after that interview and wondering how I should have responded. Sure, some areas I know like structural equation modeling or meta-analysis have nice neat titles, and I had said those first. But how to describe the many other statistics I know, that are taught in beginning or advanced statistics classes? And then it hit me - statistics can really be divided into two types: univariate and multivariate. So now, when people ask what statistics I know, I tell them univariate and multivariate statistics, including... blah blah blah. So far, everyone is happy with that response.

This is kind of a non-answer. Or rather, it doesn't tell you any more than listing what classes I took, but people seem satisfied with it. These classifications simply refer to the number of variables an analysis involves. Univariate statistics use only one variable.

This obviously includes descriptive statistics like mean and standard deviation, or frequencies. There are other statistics that describe the shape of a distribution of scores (which tell you how much a distribution deviates from normality) that would also be considered univariate. But univariate statistics can also include some inferential statistics, provided you're only using one variable.
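To make that concrete, here is a minimal Python sketch of univariate descriptive statistics. The scores are made up for illustration, and the `describe` function is a hypothetical helper, not something from a particular library. The shape statistics use the simple moment-based formulas; statistical packages often apply small-sample corrections, so their numbers may differ slightly.

```python
from collections import Counter
import math

# Hypothetical scores on a single variable (made-up data for illustration).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

def describe(values):
    """Descriptive statistics for one variable."""
    n = len(values)
    mean = sum(values) / n
    # Central moments (dividing by n), used for the shape statistics.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "sd": math.sqrt(m2 * n / (n - 1)),    # sample SD (n - 1 denominator)
        "frequencies": Counter(values),       # how often each score occurs
        "skewness": m3 / m2 ** 1.5,           # 0 for a symmetric distribution
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }

summary = describe(scores)
print(summary)
```

Everything in `summary` is computed from the one list of scores, which is what makes these statistics univariate.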

For instance, you can examine whether the observed frequencies match a hypothesized distribution for a single categorical variable; this type of analysis is called a goodness-of-fit chi-square. (There's another type of chi-square that uses two categorical variables; basically chi-square is like an analysis of variance for categorical data. The variance you're examining is how much the proportions of categories deviate from what is expected.)
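As a rough sketch of the goodness-of-fit idea, the Python below compares made-up counts for three categories against a hypothesized equal split. The function name `chi_square_gof` and the data are assumptions for illustration. Rather than computing a p-value, the example compares the statistic to 5.991, the standard .05 critical value for chi-square with 2 degrees of freedom.

```python
def chi_square_gof(observed, expected_props):
    """Goodness-of-fit chi-square for one categorical variable.

    observed: counts per category
    expected_props: hypothesized proportion per category (should sum to 1)
    """
    total = sum(observed)
    expected = [p * total for p in expected_props]
    # Sum of squared deviations from expectation, scaled by expectation.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df

# Hypothetical data: 60 people choosing among three options.
observed = [30, 20, 10]
hypothesized = [1 / 3, 1 / 3, 1 / 3]  # assumed: all options equally likely

chi2, df = chi_square_gof(observed, hypothesized)
# 5.991 is the .05 critical value for chi-square with df = 2.
reject_null = chi2 > 5.991
print(chi2, df, reject_null)
```

With these counts, each category is expected to hold 20 people, so the statistic works out to (100 + 0 + 100) / 20 = 10, which exceeds the critical value.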

You can also test whether a single sample significantly differs from the population value with what is called a one-sample t-test. This works when the population value is known (such as with standardized tests that are normed to have a known population value). For instance, I might have an intervention that is supposed to turn children into geniuses, and I could compare their average cognitive ability (the currently accepted term for "intelligence") score to the population mean and standard deviation (which are often set at 100 and 15 respectively for many cognitive ability tests).
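Here is a minimal sketch of the one-sample t-test in Python, using only the standard library. The five scores are invented, and `one_sample_t` is a hypothetical helper. It computes the t statistic from the sample's own standard deviation and compares it to 2.776, the two-tailed .05 critical value for 4 degrees of freedom.

```python
import math
import statistics

def one_sample_t(values, population_mean):
    """t statistic and degrees of freedom for a one-sample t-test."""
    n = len(values)
    sample_mean = statistics.mean(values)
    sample_sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - population_mean) / standard_error
    return t, n - 1

# Hypothetical cognitive ability scores for five children after the intervention.
scores = [100, 105, 110, 115, 120]

t, df = one_sample_t(scores, population_mean=100)
# 2.776 is the two-tailed .05 critical value for t with df = 4.
significant = abs(t) > 2.776
print(t, df, significant)
```

Only one variable (the test score) goes into the calculation; the population mean of 100 is a fixed number, not a second variable.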

The t-test I demonstrated yesterday, on the other hand, is called an independent samples t-test. Since we have two variables - a grouping variable and a test score variable - this analysis is considered multivariate.

I do know people who will debate you on the meaning of the terms, and insist that a test is univariate unless there is more than 1 independent variable and/or more than 1 dependent variable. And I've heard people talk about univariate, bivariate, and multivariate. So there's probably a bit of a gray area here. Anyone who tells you there are no debates over minutiae in statistics is a liar.
