Monday, November 13, 2017

Statistics Sunday: What is Bootstrapping?

Last week, I posted about the concept of randomness, which is a key concept throughout statistics. For instance, as I may have mentioned before, many statistical tests assume that the cases used to generate the statistics were randomly sampled from the population of interest. That, of course, rarely happens in practice, but it is a key assumption of what we call parametric tests - tests that compare to an assumed population distribution.

The reason for this focus on random sampling goes back to the nature of probability. Every case in the population of interest has a chance of being selected - an equal chance in simple random sampling, and unequal but still predictable chances when more complex sampling methods are used, like stratified random sampling. It's true that you could, by chance, draw a bunch of really extreme cases. But there are usually fewer cases in the extremes.

If you look at the normal distribution, for instance, there are so many more cases in the middle that you have a much higher chance of drawing cases that fall close to the middle. This means that, while your random sample may not have as much variance as the population of interest, your measures of central tendency should be pretty close to the underlying population values.

So we have a population, and we draw a random sample from it, hoping that probability will work in our favor and give us a sample data distribution that resembles that population distribution.

But what if we wanted to add one more step, to really give probability a chance (pun intended) to work for us? Just as cases that are typical of the population are more likely to end up in our sample, cases that are typical of our sample distribution are more likely to end up in a sample of the sample (which we'll call a subsample for brevity's sake). And if we repeatedly drew subsamples and plotted the results, we could generate a distribution that gets a little closer to the underlying population distribution. Of course, we're limited by the size of our sample, in that our subsamples can't exceed that size, but we can bypass that by random sampling with replacement. That means that after pulling out a case and making a note of it, we put it back into the mix, so it could get drawn again. This gives us a theoretically limitless sample from which to draw.
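To make the difference between sampling with and without replacement concrete, here is a minimal sketch in Python using only the standard library. The scores in `sample` are made-up values for illustration, not data from any study.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical observed sample of 10 scores (made-up values).
sample = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]

# Without replacement: a subsample can never be larger than the sample,
# and no case can be drawn more than once.
without_replacement = random.sample(sample, k=len(sample))

# With replacement: each case goes back into the mix after it is drawn,
# so it can come up again and we can draw more cases than we observed.
with_replacement = random.choices(sample, k=25)

print(sorted(without_replacement))
print(with_replacement)
```

The first draw is just a reshuffling of the original 10 cases, while the second contains 25 cases, with some of the original values necessarily repeated.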

That's how bootstrapping works. Bootstrapping is a method of generating unbiased (though it's more accurate to say less biased) estimates. Those estimates could be things like variance or other descriptive statistics, or they could be used in inferential statistical analyses. Bootstrapping means that you use random sampling with replacement to estimate values. Frequently, it means treating your observed data as a sort of population, and repeatedly drawing large samples with replacement from that data. In our Facebook use study, we used bootstrapping to test our hypothesis.
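As a rough illustration of that idea, the sketch below bootstraps the median of a small dataset. The data values, the helper name `bootstrap_statistic`, the choice of 2000 resamples, and the convention of drawing each resample at the same size as the observed data are all assumptions made for this example.

```python
import random
import statistics


def bootstrap_statistic(data, stat, n_resamples=2000, seed=0):
    """Compute `stat` on n_resamples resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(n_resamples)]


# Hypothetical observed data (made-up values).
data = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10, 22, 8, 14, 15, 13]

boot_medians = sorted(bootstrap_statistic(data, statistics.median))

# The spread of the bootstrap estimates approximates the standard error.
standard_error = statistics.stdev(boot_medians)

# Percentile interval: the middle 95% of the bootstrap estimates.
lower = boot_medians[int(0.025 * len(boot_medians))]
upper = boot_medians[int(0.975 * len(boot_medians)) - 1]

print(statistics.median(data), standard_error, (lower, upper))
```

Swapping `statistics.median` for another function, such as `statistics.variance`, gives a bootstrap distribution for that statistic instead.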

To summarize, we measured Facebook use among college students, and also gave them measures of rumination (tendency to fixate on negative feelings), and subjective well-being (life satisfaction, depression, and physical symptoms of ill health). We hypothesized that rumination mediated the effect of Facebook use on well-being. Put in plain language, we believed using Facebook made you more likely to ruminate, which in turn resulted in lower well-being.

The competing hypothesis is that people who already tend to ruminate use Facebook as an outlet for rumination, resulting in lower well-being. In this alternative hypothesis, Facebook use is the mediator, not rumination.

Testing mediation means testing for an indirect effect. That is, the independent variable (Facebook use) affects the dependent variable (well-being) indirectly through the mediator (rumination). We used bootstrapping to estimate these indirect effects; we drew 5000 random samples, with replacement, from our data to generate our estimates. Just as we're more likely to draw cases typical of our sample (which are hopefully typical of our population), we're more likely to draw samples that (hopefully) have the typical effect of our population. The resulting indirect effects we get from bootstrapping won't be the same as those from a simple analysis of our observed data. We're using probability to reduce bias in our estimates.
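For readers who want to see the mechanics, here is a minimal sketch of a bootstrapped indirect effect. It is not the study's data or software: the data are simulated, the effect sizes (0.8, -0.6, 0.1) and the sample size of 100 are arbitrary assumptions, and the function names are hypothetical. The indirect effect is computed as the product a*b, where a is the slope of the mediator on the independent variable and b is the slope of the dependent variable on the mediator, controlling for the independent variable.

```python
import random


def indirect_effect(rows):
    """Estimate a*b from (x, m, y) rows using ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _, _ in rows) / n
    mean_m = sum(m for _, m, _ in rows) / n
    mean_y = sum(y for _, _, y in rows) / n
    sxx = smm = sxm = sxy = smy = 0.0
    for x, m, y in rows:
        dx, dm, dy = x - mean_x, m - mean_m, y - mean_y
        sxx += dx * dx
        smm += dm * dm
        sxm += dx * dm
        sxy += dx * dy
        smy += dm * dy
    a = sxm / sxx  # slope of m regressed on x
    # slope of m when y is regressed on both m and x
    b = (smy * sxx - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_interval(rows, n_resamples=5000, seed=1):
    """95% percentile interval for a*b from resamples with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = sorted(
        indirect_effect(rng.choices(rows, k=n)) for _ in range(n_resamples)
    )
    return (estimates[int(0.025 * n_resamples)],
            estimates[int(0.975 * n_resamples) - 1])


# Simulated data (assumption): x stands in for Facebook use, m for
# rumination, y for well-being. The true indirect effect is 0.8 * -0.6.
gen = random.Random(2017)
rows = []
for _ in range(100):
    x = gen.gauss(0, 1)
    m = 0.8 * x + gen.gauss(0, 1)
    y = -0.6 * m + 0.1 * x + gen.gauss(0, 1)
    rows.append((x, m, y))

point = indirect_effect(rows)
lower, upper = bootstrap_interval(rows)
print(point, (lower, upper))
```

Each resample draws whole cases (all three scores for a person together), so the relationships among the variables are preserved within every resample. If the resulting interval excludes zero, that is usually read as evidence of an indirect effect under the assumed model.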

And what did we find in our Facebook study? We found stronger support for our hypothesis than the alternative. That is, we had stronger evidence that Facebook use leads to rumination than the alternative that rumination leads to Facebook use. If you're interested in finding out more, you can read the article here.

1 comment:

  1. Hi Sara, interesting blog post!

    One thing I am wondering: I am pretty sure that the strength of indirect effects across different specifications is not informative with respect to causality and cannot help us decide between different models. Felix Thoemmes wrote a paper about that: Reversing Arrows in Mediation Models Does Not Distinguish Plausible Models.

    Thus, I think it's not valid to say that you find stronger support for hypothesis A over B. You can only say that if we assume that hypothesis A is true, the estimated indirect effect would be larger. But the estimated indirect effect is only meaningful if we assume that the underlying mediation model (including the flow of causality) was valid to begin with! So no way to "bootstrap" causality from data alone.

    Leaving that aside, there are of course plausible alternative models (Facebook usage <- well-being -> rumination) which you also cannot possibly distinguish based on cross-sectional data alone. The only thing you can do is assume that one model is true based on e.g. theoretical considerations.