Sunday, September 3, 2017

Statistics Sunday: Everyone Loves a Log (Odds Ratio)

A couple of weeks ago, I introduced the concept of the odds ratio: the odds of one outcome relative to another. Odds ratios are often used to present and understand dichotomous outcome data, and researchers using logistic regression - which, like linear regression, uses one or more variables to predict an outcome, but unlike linear regression, predicts a dichotomous (not continuous) outcome - will often present results in terms of odds ratios. Odds ratios also show up a lot in news stories because they're a bit easier for us to understand: e.g., people who do X are twice as likely to have this outcome as people who do Y. We're naive statisticians with a rudimentary understanding of gambling, so we have some grasp of odds.
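If you're curious where those odds ratios come from in practice, here's a minimal sketch in R - the data and variable names are made up purely for illustration:

# Hypothetical data: does a drug predict remission?
set.seed(123)
drug <- rbinom(200, 1, 0.5)
remission <- rbinom(200, 1, ifelse(drug == 1, 0.6, 0.4))

# Logistic regression: predicting a dichotomous outcome
model <- glm(remission ~ drug, family = binomial)

# The coefficients come out as log odds; exponentiate them to get odds ratios
exp(coef(model))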

The thing about odds ratios - and this issue becomes more pronounced when you're working with a bunch of them - is that their distribution is not symmetrical, which creates some very interesting results when we look at the inverse odds. For instance, X being twice as likely as Y (odds ratio = 2.0) makes sense. Y being half as likely as X (odds ratio = 0.5) might not make as much sense. But they're the same relationship. And it gets trickier with other odds ratios. Because an outcome that's equally likely in both groups produces an odds ratio of 1.0, a more likely outcome A will give an odds ratio greater than 1, and a less likely outcome A will give the inverse: a fraction between 0 and 1. The upper bound of an odds ratio > 1.0 is infinite (∞), but the lower bound of an odds ratio < 1.0 is 0, approached asymptotically as the inverse of that infinity (1/∞).
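You can see that asymmetry with a couple of lines of R:

or_xy <- 2.0        # X is twice as likely as Y
or_yx <- 1 / or_xy  # the same comparison, flipped: Y relative to X
or_yx               # 0.5

# Equivalent relationships, but very different distances from the null value of 1.0:
or_xy - 1           #  1.0 above 1
or_yx - 1           # -0.5 below 1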

Most people I know who work with odds ratios regularly will - instead of presenting a fractional odds ratio - simply switch the order of the variables in the analysis. But what if you're working with a bunch of odds ratios around an outcome, and you know that some will be greater than 1.0 and some will be less than 1.0?

It's not an unusual situation. A logistic regression may have multiple predictors, some of which will have a negative coefficient, meaning a less likely outcome A. And if you want to do a meta-analysis on something with a binary outcome, your effect size will be the odds ratio. Some of the analyses you would run in a meta-analysis - such as a special type of regression called meta-regression - won't work so well with a variable that has an asymmetric distribution. Meta-regression, which is similar to linear regression but weights the outcomes (typically by each study's precision), assumes a continuous, linear outcome.

But have no fear! There's a solution: the log odds ratio. Here are the basic equations you need:

log odds ratio = ln(odds ratio)

odds ratio = e^(log odds ratio)
It's a really easy correction. You're simply doing a log-transform of your odds ratio - more specifically, you're taking the natural log of your odds ratio.

You can do this in Excel with =LN(oddsratio). And many statistics programs are capable of log-transformations.

In SPSS, the syntax is very similar to Excel: COMPUTE log_oddsratio = LN(oddsratio). (Or, if you prefer to use the GUI, go to the Transform menu and click Compute Variable. You can type that text directly in the box, or find the LN function in the Arithmetic function group.)

The syntax in R is simply: dataframe$log_oddsratio <- log(dataframe$oddsratio). Note that R's log() function computes the natural log by default.

And if you're working in SQL to interact with a relational database, natural log is a built-in mathematical function, usually LOG or LN, depending on which vendor you're using. (For instance, I just wrapped up an online course where I learned PostgreSQL, for which the syntax is LN.)
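And speaking of meta-analysis: if you're working in R, the metafor package will do this transformation for you. Here's a quick sketch, assuming you have metafor installed (the cell counts are invented for illustration):

library(metafor)

# Hypothetical 2x2 counts from three studies:
# ai/bi = in remission/not, drug group; ci/di = the same, placebo group
studies <- data.frame(ai = c(12, 30, 9), bi = c(88, 70, 41),
                      ci = c(5, 22, 15), di = c(95, 78, 35))

# measure = "OR" returns the natural log of the odds ratio (yi) plus its variance (vi)
studies <- escalc(measure = "OR", ai = ai, bi = bi, ci = ci, di = di, data = studies)
studies$yi  # log odds ratios, ready for meta-regression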

Because this is probably better seen than described, I've done the following. I referred back to the 2x2 contingency table from the odds ratio post and decided to test out some different frequencies. That table has 4 cells, so I simplified things a bit: I fixed the placebo group at 50/50 odds of being in remission, so those two cells are both 250. The only thing I changed was the drug group, where I tested values from 1 in the 'in remission' cell and 499 in the 'not in remission' cell all the way up to 499 in remission and 1 not. The odds ratios for those combinations ranged from 0.002 to 499.0. (Note: these extremes would be exceedingly rare - it's very unlikely you'll see an odds ratio over 10, let alone in the 100s. This is purely for demonstration purposes.) When you graph it, it looks like this:
[Chart: the odds ratios for each tested combination, ranging from 0.002 to 499.0]
When I took the natural log of that array, I got a perfectly symmetrical (though not completely linear - but close enough for the typical range of log odds ratios) set of values running from -6.21 to +6.21:
[Chart: the log odds ratios for the same combinations, symmetrical around 0, ranging from -6.21 to +6.21]
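If you'd rather do this demonstration in R than Excel, here's a short sketch mirroring the setup described above:

# Placebo group fixed at 250 in remission, 250 not
placebo_odds <- 250 / 250

# Drug group: 1 through 499 in remission, out of 500
drug_remission <- 1:499
drug_odds <- drug_remission / (500 - drug_remission)

odds_ratio <- drug_odds / placebo_odds
range(odds_ratio)      # 0.002 to 499.0 - wildly asymmetric around 1.0

log_odds_ratio <- log(odds_ratio)
range(log_odds_ratio)  # -6.21 to +6.21 - symmetric around 0

# Plot both to recreate the two charts
plot(drug_remission, odds_ratio, type = "l")
plot(drug_remission, log_odds_ratio, type = "l")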
The natural log gets rid of the inverse property of odds ratios less than 1.0, and sets the bounds from -∞ to +∞. Now you can analyze those values and, if you want to present summary statistics as odds ratios instead of log odds ratios, you can just convert them back. To do that, you raise e - the base of the natural logarithm - to the power of the log odds ratio.

The syntax is EXP and the number you're converting:

Excel: =EXP(log_oddsratio)

SPSS: COMPUTE oddsratio = EXP(log_oddsratio) or select Exp from the Arithmetic functions in the Transform->Compute Variable dialog box

R: dataframe$oddsratio <- exp(dataframe$log_oddsratio)

SQL: EXP(log_oddsratio)
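A quick sanity check in R (though the same round trip works in any of these tools): log-transform a value and convert it back, and you get exactly what you started with.

or <- 2.0
log_or <- log(or)  # 0.6931
exp(log_or)        # 2.0 again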

So you can play too: here's the Excel file containing the raw data. I've left the functions in, as well as the two charts, so you can change the numbers if you'd like to play around.

Meta-analysis isn't the only analysis that uses (log) odds ratios. The Rasch measurement model (the psychometric approach I use) is built on log odds ratios. That's part of the magic behind its ability to turn ordinal scales into interval scales of measurement. (More on that later.)

BTW, for anyone wondering why I named this post as I did: Hopefully I'm not the only one who remembers the great Slinky parody seen on Ren & Stimpy. (In fact, when I typed "Ren and Stimpy" into Google, it auto-completed with "log," so clearly I'm not.)

For review: log odds ratio = ln(odds ratio), and odds ratio = e^(log odds ratio).