## Sunday, August 20, 2017

### Statistics Sunday: Odds Ratios

As you may recall, there are different types of variables. Some variables are continuous. But some are categories, and some of those categories consist of two levels, or what we call a dichotomy. A coin flip is one example: we have two outcomes, heads and tails.

Sometimes we need to study - that is, understand what causes or contributes to - two-level outcomes: fracture or no fracture, malignant or benign, present or absent. Some of these variables are non-ordered categories (like heads or tails) while others can be thought of as two-level ordinal outcomes (alive or dead is one example where one outcome is clearly better than the other, but it wouldn't be considered continuous).

While the descriptive statistics for continuous variables would be a mean (and standard deviation) or median, the descriptive statistics for a dichotomous variable would be frequencies and proportions (percentages in decimal form). These proportions could be considered probabilities of a particular outcome. For instance, if you flip a coin enough times, your proportions of heads and tails would both be close to 0.5, meaning the probability of flipping heads, for instance, is 0.5.
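To see the "flip a coin enough times" idea in action, here is a minimal simulation sketch. The function name `proportion_heads` and the fixed seed are my own choices for illustration, and the exact proportions you get will depend on the random number generator.

```python
import random


def proportion_heads(n_flips, seed=0):
    """Simulate n_flips fair coin flips and return the proportion of heads."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


# The proportion should drift toward 0.5 as the number of flips grows.
for n in (10, 1_000, 100_000):
    print(n, proportion_heads(n))
```

With only 10 flips the proportion can land far from 0.5; with 100,000 it should sit very close to it.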

On the other hand, you might want to compare the probability of one outcome with the probability of the other, using what we call odds. Odds are usually expressed as two whole numbers. So if half of your coin flips will be heads (1/2) and half of your coin flips will be tails (1/2), we would express those odds as 1 to 1 (or 1:1). Basically, a probability compares one outcome to all possible outcomes, while odds compare one outcome to the other: the odds are the probability of the outcome divided by the probability of its alternative. They tell you similar things, just in different numerical forms.
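As a quick sketch of how probabilities and odds convert back and forth, the snippet below uses Python's `fractions` module so the results stay as whole-number ratios. The helper names `odds_from_probability` and `probability_from_odds` are hypothetical, chosen just for this example.

```python
from fractions import Fraction


def odds_from_probability(p):
    """Odds in favor of an outcome with probability p: p / (1 - p)."""
    p = Fraction(p)
    if not 0 <= p < 1:
        raise ValueError("p must be at least 0 and less than 1")
    return p / (1 - p)


def probability_from_odds(odds):
    """Probability of an outcome given the odds in its favor: odds / (1 + odds)."""
    odds = Fraction(odds)
    return odds / (1 + odds)


coin = odds_from_probability(Fraction(1, 2))
print(f"{coin.numerator}:{coin.denominator}")  # 1:1

one_in_four = odds_from_probability(Fraction(1, 4))
print(f"{one_in_four.numerator}:{one_in_four.denominator}")  # 1:3
```

Passing `Fraction` values rather than floats keeps the output readable; a float like 0.1 is not stored exactly, so its odds would come out as an unwieldy ratio.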

Now, what if you want to understand the relationship between two dichotomous variables? The chi-square test is one way that you could do that. But this test just tells us whether the combination of these two dichotomous variables shows a different relationship than you would expect by chance alone. Also, the chi-square statistic grows with sample size, so with a large enough sample even a tiny relationship will come out statistically significant, and you might have a significant effect that doesn't have any practical importance.
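Here is a rough sketch of that sample-size issue. It computes the Pearson chi-square statistic for a 2 x 2 table by hand (no continuity correction), with cell counts a and b in the first row and c and d in the second. The counts are made up for illustration, and the function name `chi_square_2x2` is my own.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for a 2 x 2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


CRITICAL_VALUE = 3.841  # chi-square cutoff for p < .05 with 1 degree of freedom

small = (55, 45, 45, 55)               # hypothetical counts, n = 200
large = tuple(10 * x for x in small)   # same proportions, n = 2000

for label, table in (("n = 200", small), ("n = 2000", large)):
    stat = chi_square_2x2(*table)
    verdict = "significant" if stat > CRITICAL_VALUE else "not significant"
    print(label, round(stat, 2), verdict)
```

The proportions in the two tables are identical, yet the statistic goes from 2.0 to 20.0 just because there are ten times as many cases.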

If you want to understand the strength of a relationship, you need an effect size. One effect size for describing the relationship between two dichotomous variables - one that has some very important applications I'll delve into later - is the odds ratio. An odds ratio tells you how much higher the odds of an outcome are at one level of a dichotomous variable than at the other level: it's the odds of the outcome at one level divided by the odds of the outcome at the other level.

Let's use a practical example. I have a cancer drug I want to test and see if it will cause people to go into remission. I randomly assign people to take my drug or a placebo, and at the end of the study, I run tests to see if their cancer is in remission or not. That means I have two variables (group and outcome) each with two levels (drug or placebo, in remission or not in remission). At the end of the study, I create a 2 x 2 table of frequencies - what's called a 2 x 2 contingency table:

| Group | In Remission | Not In Remission |
|---|---|---|
| Drug | a | b |
| Placebo | c | d |

Each cell would normally have a frequency, but I've instead given the labels for each cell that would be used for the formula (a-d). The formula for the odds ratio is (a*d)/(b*c).<sup>1</sup> If I fill in my table with fake values:

| Group | In Remission | Not In Remission |
|---|---|---|
| Drug | 125 | 375 |
| Placebo | 30 | 470 |

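and fill in those values for the formula - (125*470)/(375*30) - I get an odds ratio of 5.2. What this means is that the odds of being in remission at the end of the study are 5.2 times higher for people who took my drug than for people who took a placebo. (Note that this is a statement about odds, not probabilities: 25% of the drug group and 6% of the placebo group are in remission, which is not a 5.2-fold difference in probability.)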
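The arithmetic for the fake values above can be checked with a few lines of Python. This is only a sketch: the function name `odds_ratio` is my own, and it assumes the table is laid out with the cells a, b, c, d in the same positions as the labeled table.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a*d)/(b*c) for a 2 x 2 table with rows (a, b) and (c, d)."""
    if b == 0 or c == 0:
        raise ZeroDivisionError("odds ratio is undefined when b or c is 0")
    return (a * d) / (b * c)


# Odds of remission within each group, using the fake values from the table.
drug_odds = 125 / 375      # remission vs. no remission, drug group
placebo_odds = 30 / 470    # remission vs. no remission, placebo group
print(round(drug_odds, 3), round(placebo_odds, 3))

# Dividing one set of odds by the other gives the same answer as the formula.
print(round(drug_odds / placebo_odds, 1))
print(round(odds_ratio(125, 375, 30, 470), 1))  # 5.2
```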

The main problem with the odds ratio is that you can't compute a meaningful one if any of your cells has a frequency of 0. The resulting odds ratio will either be 0 (if the 0 cell ends up in the numerator) or undefined (if the 0 cell ends up in the denominator). If you encounter this problem but still need to compute an odds ratio, a common approach is to add 0.5 to all 4 cells.
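A minimal sketch of that add-0.5 fix is below. The function name `corrected_odds_ratio`, the `correction` parameter, and the example counts are all made up for illustration. I've also assumed the correction is applied only when a zero cell is actually present, which is one reasonable reading of the approach.

```python
def corrected_odds_ratio(a, b, c, d, correction=0.5):
    """Odds ratio that adds `correction` to all four cells if any cell is 0."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)


# Hypothetical table with an empty cell: (10 * 5) / (0 * 5) would divide by zero.
print(corrected_odds_ratio(10, 0, 5, 5))

# Tables without a zero cell are left alone.
print(round(corrected_odds_ratio(125, 375, 30, 470), 1))
```

With the correction, the first table becomes 10.5, 0.5, 5.5, and 5.5, so the odds ratio is (10.5*5.5)/(0.5*5.5), or 21.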

As I said previously, the odds ratio - and a specific transformation of it - has some very important applications, especially in the work I do as a psychometrician. Look for a post or two on that later!

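<sup>1</sup> Technically, the formula is the odds of remission in the drug group (a/b) divided by the odds of remission in the placebo group (c/d), but dividing by a fraction is the same as multiplying by its reciprocal, resulting in (a*d) divided by (b*c). You get the same value if you instead divide a/c by b/d.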