## Friday, April 28, 2017

### X is for X (Independent or Predictor Variables)

April A to Z is an interesting time. These last three posts (today, tomorrow, and Sunday) are on topics I would probably teach first (or at least close to the beginning) if I were teaching a course on statistics. But since we have to go in alphabetical order - that's part of the fun and challenge - I'm finally getting to some of the basic concepts of statistics.

As with necessary and sufficient conditions, independent and dependent variables are often difficult topics for people. I remember doing well in my 6th grade science fair mostly (I think) because I was able to correctly state my independent and dependent variables. I can't take credit for that - my mom, one of the most logical people I know, taught me the difference.

In research language, the independent variable is the causal variable. It's what you think causes your outcome. In an experiment, where you can control what happens to your participants, it is the variable you manipulate to see how it affects the outcome.

In statistics, the term can be used a bit more broadly. Your x, used to signify the independent variable, is what you think affects your outcome; you conduct statistics to measure and understand that effect. Some statisticians will refer to their x as the independent variable whether they can manipulate it or not. Others will reserve "independent" only for manipulated variables, and will use the broader term, "predictor variable," to refer to variables they think affect the outcome but that they can't necessarily manipulate. And in some analyses, like correlation, we will often arbitrarily define one variable as x, since the equation for correlation uses the symbols x and y.
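To see why the choice of x is arbitrary for correlation, here is a minimal sketch that computes Pearson's r by hand and then swaps the two variables. The numbers are made up purely for illustration (they are not data from any real study), and the names `pearson_r`, `cups_of_coffee`, and `test_score` are my own.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

# Made-up illustrative numbers, not real data
cups_of_coffee = [0, 1, 1, 2, 3, 4]
test_score = [70, 74, 72, 80, 83, 85]

r_xy = pearson_r(cups_of_coffee, test_score)  # coffee treated as x
r_yx = pearson_r(test_score, cups_of_coffee)  # test score treated as x

print(r_xy, r_yx)
```

Both calls give the same value, because the formula treats the two variables symmetrically. Which one you call x is a labeling decision, not a statistical one.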

In our caffeine study, our x variable is caffeine. Specifically, it is a two-level variable - the experimental group received caffeine and the control group did not. I don't have to use just two levels - I could have more if I'd like. I've set it up as an experiment, where I can directly control the independent variable.

But I could just as easily set it up as a non-experiment. For example, I could have people come into the lab and have them sit in a waiting room with a coffee maker. While they're waiting for the study to start, I could encourage them to have a cup (or more) of coffee. I could then measure the amount of caffeine they had (based on the number of cups of coffee they consumed) and see if that affects test score. Now, you'd have stronger evidence for the effect of caffeine if you did an experiment, but caffeine is still your x variable (statistically, at the very least) even in this example.
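Here is a rough sketch of how x looks in each of the two designs. All of the numbers are invented for illustration, and the variable and function names are hypothetical. In the experiment, x is a 0/1 code for group; in the waiting-room version, x is the measured number of cups. The same least-squares slope can summarize both, and with a 0/1 x the slope is simply the difference between the two group means.

```python
def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """Least-squares slope from regressing y on x."""
    mean_x, mean_y = mean(x), mean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    return cross / ss_x

# Experiment: x is manipulated (0 = control, 1 = caffeine). Made-up scores.
caffeine_group = [0, 0, 0, 0, 1, 1, 1, 1]
score_experiment = [70, 72, 68, 74, 78, 80, 76, 82]

control = [s for g, s in zip(caffeine_group, score_experiment) if g == 0]
treated = [s for g, s in zip(caffeine_group, score_experiment) if g == 1]
mean_difference = mean(treated) - mean(control)
slope_experiment = slope(caffeine_group, score_experiment)

# Non-experiment: x is measured (cups of coffee people chose to drink). Made-up scores.
cups_of_coffee = [0, 1, 1, 2, 3, 4]
score_observed = [70, 74, 72, 80, 83, 85]
slope_observed = slope(cups_of_coffee, score_observed)

print(mean_difference, slope_experiment, slope_observed)
```

The arithmetic doesn't know whether x was manipulated or merely measured. That difference lives in the study design, which is why the experiment gives stronger evidence about cause even though caffeine is x in both cases.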