Wednesday, July 19, 2017

Statistical Sins: Misleading Graphs

As you're probably aware, the Senate healthcare bill does not have the support it needs to pass. Reading stories about the bill, and about the history of the Republican effort to repeal the Affordable Care Act, I'm reminded of this great (terrible) graph that was actually presented in Congressional testimony back in 2015. It was meant to show that Planned Parenthood has increased abortion services and decreased preventive services. But because preventive services still far outpace abortions, whoever made the chart altered it:


In fact, when I decided to start the Statistical Sins series, this graph was on my mind. The chart should look more like this:


So what initially looked like a dramatic increase in abortion services is actually a nearly flat line.

The thing is, misleading graph-making happens a lot. It's not unusual for people to raise the minimum value on the y-axis to zoom in on a very small trend and make it look more dramatic. In Fooled by Randomness, a book I read recently, author Nassim Nicholas Taleb shows graphs of market trends that have been cropped to make small blips look like major leaps and falls in stock prices.

The problem is that some people will purposefully edit graphs to prove a point, and reminders about proper graph-making won't stop anyone who intends to mislead. But there are certainly things the rest of us can do to avoid making unintentionally misleading graphs.

First, you should know something about the type of data you're working with. When you're working with ratio values (variables that have a meaningful 0 - that is, 0 means the absence of something), you should use 0 as the minimum value on the y-axis (the vertical axis on the side). So things like money, healthcare services, and so on should have 0 as the minimum value.
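To see how much the baseline matters, here is a minimal Python sketch that computes what share of the plot's height a change takes up under two different y-axis minimums. The function name `visual_rise`, the 280,000 cutoff, and the two endpoint values are my own assumptions for illustration; the endpoints were picked to match the 37,250 increase in abortion services discussed later in this post.

```python
def visual_rise(start, end, y_min, y_max):
    """Return the share of the plot's height taken up by the change from start to end."""
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")
    return (end - start) / (y_max - y_min)


# Assumed endpoint values, chosen to match the 37,250 increase.
start, end = 289_750, 327_000

# Ratio data plotted honestly: the y-axis starts at 0.
honest = visual_rise(start, end, y_min=0, y_max=end)

# Same data with the y-axis minimum raised to a hypothetical 280,000.
truncated = visual_rise(start, end, y_min=280_000, y_max=end)

print(f"zero baseline:      {honest:.0%} of the plot height")
print(f"truncated baseline: {truncated:.0%} of the plot height")
```

With these assumed numbers, the same change fills roughly 11% of the plot when the axis starts at 0 and roughly 79% when the axis is truncated. Nothing about the data changed; only the axis did.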

If you're working with interval data (variables with equal spacing between values but no meaningful 0), like temperature in Fahrenheit or Celsius, you should choose minimum and maximum values that make sense for the scale. Each scale will be different, so there will be some judgment calls. When in doubt, talk to a stats person.
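The two rules above can be sketched as a small helper that picks y-axis limits based on whether the variable has a true zero. This is only one reasonable default, not a standard: the function name `y_limits` and the 10% padding are assumptions of mine, and for interval data the padding is exactly the kind of judgment call that deserves a second opinion.

```python
def y_limits(values, has_true_zero, padding=0.1):
    """Suggest (y_min, y_max) for a chart.

    has_true_zero=True  -> ratio data: anchor the axis at 0.
    has_true_zero=False -> interval data: pad around the observed range.
    The padding fraction is an arbitrary illustrative default.
    """
    lo, hi = min(values), max(values)
    if has_true_zero and lo >= 0:
        # Ratio data: start at 0 and leave a little headroom on top.
        top = hi * (1 + padding) if hi > 0 else 1.0
        return 0.0, top
    # Interval data (or ratio data with negative values): pad the range.
    span = hi - lo
    if span == 0:
        span = abs(hi) or 1.0  # avoid a zero-height axis
    return lo - padding * span, hi + padding * span


# Counts of services (ratio data): the axis is anchored at 0.
print(y_limits([289_750, 327_000], has_true_zero=True))

# Temperatures in Fahrenheit (interval data): the axis hugs the range.
print(y_limits([68, 75, 71], has_true_zero=False))
```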

It should go without saying that the placement of points on the chart should reflect the actual value of each data point, and you should always use consistent, evenly spaced scales on the axes. I've seen unbelievably wrong graphs on news programs, and from chatting with people in the industry, I've learned that they often use stock graph images and just change the values. They're not being purposefully misleading; they just don't think about the fact that graphs are supposed to reflect real numbers. I can't even begin to say how wrong that is. You can create simple graphs in Excel. Don't reuse graphs. Ever.

And don't label points if you didn't actually measure or include them. If you look at the first chart above, you'll see each year from 2006 to 2013 marked, with a very clear linear trend on both variables (abortion services and preventive services). But I highly doubt the data were that clean. More likely, the chart only includes 2006 and 2013 data, with a line connecting them. This too is misleading, not just because the intermediate values are made up, but because it removes the variance. You can't see how much these values bounce around from year to year. So an increase of 37,250 instances of abortion services might be meaningful (if these values don't usually vary) or might be just normal variation. The same goes for the drop in preventive services. It appears Vox came to the same conclusion, because they only label 2006 and 2013 on the x-axis.
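Here is a rough Python sketch of why a line drawn between two endpoints hides the variance. It fills in 2007 through 2012 by linear interpolation, which is my guess at what a chart like this effectively does, not something I know about how it was built. The endpoint values are assumptions chosen to match the 37,250 increase, and `interpolate` is a hypothetical helper.

```python
from statistics import pvariance


def interpolate(year0, value0, year1, value1):
    """Fill in every year between two measured endpoints with a straight line."""
    step = (value1 - value0) / (year1 - year0)
    return {year: value0 + step * (year - year0) for year in range(year0, year1 + 1)}


# Assumed endpoints; only these two years would be real measurements.
filled = interpolate(2006, 289_750, 2013, 327_000)

values = list(filled.values())
# Year-over-year changes along the drawn line.
changes = [later - earlier for earlier, later in zip(values, values[1:])]

print(f"{len(filled)} labeled years, 2 measured")
print(f"variance of year-over-year change: {pvariance(changes):.6f}")
```

Every year-over-year change comes out the same, so the variance is zero apart from floating-point rounding. Real yearly counts would almost certainly bounce around, and that bounce is what you need in order to judge whether 37,250 is a big change.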

I'm thinking it might not be a bad idea to write a statistics post on visualizing data. Look for that some Sunday!
