Sometimes we need to study (that is, understand what causes or contributes to) two-level outcomes: fracture or no fracture, malignant or benign, present or absent. Some of these variables are unordered categories (like heads or tails), while others can be thought of as two-level ordinal outcomes. Alive or dead is one example: one outcome is clearly better than the other, but the variable wouldn't be considered continuous.
While the descriptive statistics for a continuous variable would be a mean (and standard deviation) or a median, the descriptive statistics for a dichotomous variable are frequencies and proportions (percentages in decimal form). These proportions can be treated as probabilities of a particular outcome. For instance, if you flip a fair coin enough times, the proportions of heads and tails will both be close to 0.5, meaning the probability of flipping heads is 0.5.
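A quick simulation illustrates the idea: the proportion of heads in a run of simulated fair coin flips settles near 0.5 as the number of flips grows. This is only a minimal sketch; `proportion_heads` is a hypothetical helper name, and Python's `random` module stands in for a real coin.

```python
import random


def proportion_heads(n_flips: int, seed: int = 0) -> float:
    """Simulate n_flips fair coin flips and return the proportion that land heads."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


# Small runs wander around; large runs land close to 0.5.
for n in (10, 100, 100_000):
    print(n, proportion_heads(n))
```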
On the other hand, you might want to compare the probability of one outcome against the probability of the other, using what we call odds. Odds are usually expressed as two whole numbers. If half of your coin flips will be heads (1/2) and half will be tails (1/2), we would express those odds as 1 to 1 (or 1:1). Basically, probabilities are decimals, while odds put the probability of one outcome against the probability of the other (here, 1/2 against 1/2, which reduces to 1:1). They tell you similar things, just in different numerical forms.
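Here is a small sketch of converting between the two forms, assuming the probability is given as an exact fraction. The odds are the probability that the outcome happens divided by the probability that it doesn't; `odds_from_probability` and `probability_from_odds` are hypothetical helper names.

```python
from fractions import Fraction


def odds_from_probability(p: Fraction) -> tuple[int, int]:
    """Return the odds for an outcome with probability p as a whole-number pair (for, against)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    ratio = p / (1 - p)  # probability it happens / probability it doesn't
    return ratio.numerator, ratio.denominator  # Fraction keeps this in lowest terms


def probability_from_odds(for_: int, against: int) -> Fraction:
    """Convert odds of for_:against back into a probability."""
    return Fraction(for_, for_ + against)


print(odds_from_probability(Fraction(1, 2)))  # fair coin: (1, 1), i.e. 1:1
print(odds_from_probability(Fraction(1, 4)))  # (1, 3), i.e. 1:3
print(probability_from_odds(1, 3))            # 1/4
```

Because `Fraction` always reduces to lowest terms, odds such as 2:6 come back as 1:3.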
Now, what if you want to understand the relationship between two dichotomous variables? The chi-square test is one way to do that. But this test only tells you whether the combination of these two dichotomous variables shows a different pattern than you would expect by chance alone. Also, chi-square becomes significant more easily as the sample size grows, so with a large sample you might find a statistically significant effect that has no practical importance.
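To see the sample-size point concretely, the sketch below runs a Pearson chi-square test (without a continuity correction) on two made-up tables with identical proportions, 52% versus 48%, where the second table is simply the first multiplied by 100. `chi_square_2x2` is a hypothetical helper, not a library function; for 1 degree of freedom the p-value can be computed from the standard library's `math.erfc`.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the 2 x 2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail chi-square probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Same proportions, two sample sizes (hypothetical counts).
small = chi_square_2x2(52, 48, 48, 52)          # n = 200
large = chi_square_2x2(5200, 4800, 4800, 5200)  # n = 20,000
print(small)  # not significant
print(large)  # highly significant
```

The statistic grows in direct proportion to the sample size (0.32 versus 32 here) even though the relationship itself hasn't changed at all.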
If you want to understand the strength of a relationship, you need an effect size. One effect size for describing the relationship between two dichotomous variables (one that has some very important applications I'll delve into later) is the odds ratio. An odds ratio tells you how much greater the odds of an outcome are at one level of a dichotomous variable than at the other level: it's the odds of the outcome in one group divided by the odds of the outcome in the other group.
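A minimal sketch of that definition follows. It assumes you already know the probability of the outcome in each group; `odds` and `odds_ratio_from_probabilities` are hypothetical helper names, and the probabilities are made up.

```python
def odds(p: float) -> float:
    """Odds of an outcome that occurs with probability p."""
    return p / (1 - p)


def odds_ratio_from_probabilities(p_group1: float, p_group2: float) -> float:
    """Odds of the outcome in group 1 divided by the odds of the outcome in group 2."""
    return odds(p_group1) / odds(p_group2)


# Hypothetical example: the outcome occurs for 50% of group 1 and 20% of group 2.
print(odds(0.5))  # 1.0, i.e. 1:1
print(odds(0.2))  # about 0.25, i.e. 1:4
print(odds_ratio_from_probabilities(0.5, 0.2))  # about 4
```

An odds ratio of 1 means the odds are the same in both groups. Here it is about 4, even though the probability in group 1 (0.5) is only 2.5 times the probability in group 2 (0.2). The odds ratio compares odds, not probabilities.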
Let's use a practical example. I have a cancer drug I want to test to see whether it causes people to go into remission. I randomly assign people to take my drug or a placebo, and at the end of the study, I run tests to see whether their cancer is in remission. That means I have two variables (group and outcome), each with two levels (drug or placebo; in remission or not in remission). At the end of the study, I create a 2 x 2 table of frequencies, what's called a 2 x 2 contingency table:
| Group | In Remission | Not In Remission |
| --- | --- | --- |
| Drug | a | b |
| Placebo | c | d |
Each cell would normally hold a frequency, but I've instead given the label each cell gets in the formula (a through d). The formula for the odds ratio is (a*d)/(b*c).^{1} If I fill in my table with hypothetical values:
| Group | In Remission | Not In Remission |
| --- | --- | --- |
| Drug | 125 | 375 |
| Placebo | 30 | 470 |
and plug those values into the formula, (125*470)/(375*30), I get an odds ratio of about 5.2. This means that the odds of being in remission at the end of the study were 5.2 times higher for people who took my drug than for people who took a placebo.
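The arithmetic can be checked with a short sketch. It computes the odds ratio in its definitional form (odds of remission on the drug divided by odds of remission on placebo) and confirms that this matches the cross-multiplied formula. `odds_ratio` is a hypothetical helper, and `Fraction` is used so the two forms can be compared exactly.

```python
from fractions import Fraction


def odds_ratio(a: int, b: int, c: int, d: int) -> Fraction:
    """Odds ratio for the 2 x 2 table

                  outcome   no outcome
        group 1      a          b
        group 2      c          d
    """
    odds_group1 = Fraction(a, b)  # e.g. odds of remission on the drug
    odds_group2 = Fraction(c, d)  # e.g. odds of remission on placebo
    return odds_group1 / odds_group2


# The hypothetical counts from the table above.
a, b, c, d = 125, 375, 30, 470
or_value = odds_ratio(a, b, c, d)
assert or_value == Fraction(a * d, b * c)  # same as the cross-multiplied (a*d)/(b*c)
print(or_value, float(or_value))  # 47/9, about 5.22

# Proportions in remission, for comparison.
print(a / (a + b), c / (c + d))  # 0.25 0.06
```

The proportions in remission are 0.25 and 0.06, a ratio of about 4.2 rather than 5.2. This is why an odds ratio should be read as a statement about odds, not as "times more likely."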
The main problem with the odds ratio is that you can't compute a meaningful one if any of your cells has a frequency of 0. The resulting odds ratio will be either 0 (if the 0 cell ends up in the numerator) or undefined (if the 0 cell ends up in the denominator). If you encounter this problem but still need to compute an odds ratio, a common approach is to add 0.5 to all 4 cells.
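Here is a minimal sketch of that adjustment. It assumes the 0.5 is added to every cell only when at least one cell is 0; some analysts apply it regardless, so treat that choice as an assumption. `odds_ratio_with_correction` is a hypothetical helper name, and the table is made up.

```python
def odds_ratio_with_correction(a: float, b: float, c: float, d: float,
                               correction: float = 0.5) -> float:
    """Odds ratio (a*d)/(b*c), adding `correction` to all four cells if any cell is 0."""
    if 0 in (a, b, c, d):
        a, b, c, d = (cell + correction for cell in (a, b, c, d))
    return (a * d) / (b * c)


# Hypothetical table with an empty cell: c = 0 would otherwise put 0 in the denominator.
print(odds_ratio_with_correction(10, 5, 0, 8))  # (10.5 * 8.5) / (5.5 * 0.5)

# Tables without a 0 cell are left unchanged.
print(odds_ratio_with_correction(125, 375, 30, 470))
```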
As I said previously, the odds ratio (and a specific transformation of it) has some very important applications, especially in the work I do as a psychometrician. Look for a post or two on that later!
^{1}Technically, the formula is the odds of remission in the drug group (a/b) divided by the odds of remission in the placebo group (c/d), but you can cross-multiply the fractions, resulting in (a*d) divided by (b*c).