Friday, September 1, 2017

Great Minds in Statistics: Jerzy Neyman's Confidence Intervals

Wednesday, August 30th was the 80th anniversary of the publication of Jerzy Neyman's article, Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability. The classical theory refers, in this case, to Neyman's work with E. Pearson on null hypothesis significance testing and the concepts of Type I and Type II error. But this paper was groundbreaking, not just in connecting to these concepts, but in telling us how we, as statisticians and scientists, should deal with uncertainty in the presentation of our results.

Jerzy Neyman, photographed while he was at UC Berkeley.
By Konrad Jacobs, Erlangen, Copyright is MFO - Mathematisches Forschungsinstitut Oberwolfach, CC BY-SA 2.0 de

All of this starts with the basic assumption that we are trying to estimate a population parameter, which is unknown. In theoretical work, such as Neyman's paper, this value is often represented as θ (theta). We use a sample to attempt to estimate theta; we can call that estimate T. If we're estimating a mean, we have one measure of precision already included in our analysis - our standard deviation can be thought of as a measure of precision of the estimate, in that it expresses the typical variation we see in scores. And in fact, as Neyman notes in his article, prior to his introduction of confidence intervals, people would often present estimates as plus or minus a standard deviation. But, Neyman states, it probably makes more sense conceptually to use a multiple of the standard deviation, if you want to express an interval with a high likelihood of containing the actual population value.

Why? Because the standard deviation tells us the typical spread of scores (the individual units that make up the mean) but that doesn't tell us the typical (expected) spread of means. Confidence intervals allow you to do that, not just for means but for a variety of aggregate statistics.
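To make the difference between the spread of scores and the spread of means concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration and not something from Neyman's paper: the normal population with mean 100 and standard deviation 15, the sample size of 25, and the number of repeated samples.

```python
import random
import statistics

random.seed(1937)  # fixed seed so the sketch is repeatable

# Hypothetical population of scores (illustrative values only).
POP_MEAN, POP_SD = 100.0, 15.0
SAMPLE_SIZE = 25
NUM_SAMPLES = 5000


def draw_sample(n):
    """Draw n scores from the assumed normal population."""
    return [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]


# Spread of individual scores within a single sample.
one_sample = draw_sample(SAMPLE_SIZE)
sd_of_scores = statistics.stdev(one_sample)

# Spread of the sample means across many repeated samples.
sample_means = [statistics.mean(draw_sample(SAMPLE_SIZE)) for _ in range(NUM_SAMPLES)]
sd_of_means = statistics.stdev(sample_means)

# For independent draws, the means should vary by about sigma / sqrt(n).
expected_sd_of_means = POP_SD / SAMPLE_SIZE ** 0.5

print(f"SD of scores in one sample: {sd_of_scores:.2f}")
print(f"SD of sample means:         {sd_of_means:.2f}")
print(f"sigma / sqrt(n):            {expected_sd_of_means:.2f}")
```

The means cluster much more tightly than the individual scores do, which is why the standard deviation of the scores alone does not tell you how far off a sample mean is likely to be.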

But I'm perhaps getting ahead of myself. I dug into Neyman's paper to try to summarize it for you. I've only read Neyman's work summarized in the past. In Fisher, Neyman, and the Creation of Classical Statistics, I've read some of his early correspondence with E. Pearson, when Neyman was still learning English. So I wasn't completely sure what to expect when I read his confidence interval article from 1937. I highly recommend reading it, as Neyman excellently summarizes complicated mathematical concepts in plain language. His work is highly approachable and he uses lots of examples to help drive home his points.

Neyman argued that, unless we have access to the full population we are studying, and are capable of measuring each individual in that population, there will be probabilities associated with our work; both the estimation process and the estimate itself should be expressed in probability. In fact, his classical approach to statistics includes probability in the estimation process, through the use of significance testing. He acknowledges that there are many different approaches to estimating values, and that while some are more right than others, none are likely to get you the exact population value, θ. They will all be estimates within a certain margin of error. Confidence intervals communicate that margin of error.

That is, he essentially says there is disagreement on the process of estimating population values from samples, and disagreement on the use of different estimation techniques (such as maximum likelihood, developed by R.A. Fisher). Though some approaches may be superior - and some of Neyman's footnotes feel very directed at Fisher - we are still trying to estimate an unknown parameter, so there is really no way to prove one is superior. But we can perhaps identify an interval surrounding the true population value.

That interval - the confidence interval - will be based on probability - the confidence coefficient - which is greater than 0 and less than 1. The usual convention is 0.95 (95%), though he uses a variety of confidence coefficients in his paper and doesn't really settle on one as the gold standard. The 95% convention came later, probably because of its connection to the probability we use in significance testing, where the convention for alpha is 0.05.

Without knowing the precise shape of the distribution of population values, we would instead use values with "intuitive" (his word) appeal, such as, for instance, the normal distribution (either the standard normal distribution or the t-distribution). He offers a variety of equations for different scenarios, but this one - the one that works with the normal distribution, which came at the end of the paper - is probably the approach most statistics students are familiar with. That is, we can use the values associated with different proportions of the curve around the mean to generate our confidence interval. We use these values from the z or t distribution (Neyman recommends t) as the multiple for the standard deviation. The exact procedure for confidence intervals varies depending on what type of estimate you're working with. For instance, some confidence intervals use standard error instead. But the basic procedure of choosing a probability and combining it with results of your analysis and values from a known distribution remains.
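Here is a minimal sketch of that basic procedure for a mean, using the version most students learn: the estimate plus or minus a t value times the standard error of the mean. It is not a reproduction of Neyman's own equations. The measurements are made-up numbers, the function name is hypothetical, and the critical value 2.262 is the standard table value for a two-sided 95% interval with 9 degrees of freedom.

```python
import math
import statistics


def mean_confidence_interval(sample, critical_value):
    """Return (lower, upper) bounds: mean +/- critical_value * standard error."""
    n = len(sample)
    estimate = statistics.mean(sample)
    # Standard error of the mean: sample SD divided by sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    margin = critical_value * standard_error
    return estimate - margin, estimate + margin


# Made-up measurements, for illustration only (n = 10, so df = 9).
measurements = [4.8, 5.1, 5.0, 4.9, 5.2, 5.3, 4.7, 5.0, 5.1, 4.9]

# Two-sided 95% critical value of the t distribution with 9 degrees of freedom.
T_CRITICAL_95_DF9 = 2.262

lower, upper = mean_confidence_interval(measurements, T_CRITICAL_95_DF9)
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

Swapping in a different critical value (say, for a 90% or 99% confidence coefficient, or a different sample size) changes the width of the interval but not the procedure.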

As the size of the sample used to estimate the population value increases, the sampling error (the difference between the estimate and the actual value) tends to shrink toward 0. So estimates based on larger samples are more likely to be close to the true population value, and confidence intervals generated from those estimates will be narrower while still being likely to contain the actual value.
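A quick sketch shows how fast the interval narrows. To keep the arithmetic simple it assumes the population standard deviation is known (15, an arbitrary illustrative value) and uses the normal (z) multiplier for 95%, so the half-width of the interval is just z times sigma over the square root of n.

```python
import math
from statistics import NormalDist

SIGMA = 15.0  # assumed, known population standard deviation (illustrative)
Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% multiplier, about 1.96


def half_width(n, sigma=SIGMA, z=Z_95):
    """Half-width of a z-based interval for a mean from a sample of size n."""
    return z * sigma / math.sqrt(n)


sample_sizes = [10, 100, 1000, 10000]
half_widths = [half_width(n) for n in sample_sizes]

for n, width in zip(sample_sizes, half_widths):
    print(f"n = {n:>5}: estimate +/- {width:.2f}")
```

Because of the square root, you need 100 times as many observations to make the interval 10 times narrower.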

How likely? We don't actually know. As Neyman points out continuously in his paper, the probability that a range actually contains the population value is 0 or 1. There is no in between when it comes to real probability; it's either there or it isn't. But when we generate a confidence interval around our estimate, we don't know if it truly contains the actual value or not. So we draw upon the law of large numbers, that over time, with repeated estimates of the population value, we'll have the real population value in our ranges a certain proportion of the time, with that proportion equal to the confidence coefficient we choose.

Say we always use 95% confidence intervals in a certain area of study. (That's the convention, anyway.) With repeated research (conducted in an unbiased way), we'll have the real population value in our range much of the time, with the actual percentage of the time approaching 95% as the number of studies approaches infinity. As has been shown repeatedly, chance is lumpy, and a 95% chance of something doesn't mean it will happen exactly 95% of the time, just like you won't have a perfect 50% heads in your coin flips.
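You can watch this long-run behavior in a small simulation. The sketch below assumes a normal population with a true mean of 50 and a standard deviation of 10 (arbitrary values we only know because we made them up), runs many imaginary "studies" of 10 observations each, builds a 95% t-based interval in every study, and counts how often the interval contains the true mean. The function names are hypothetical, and 2.262 is the standard table value of t for 9 degrees of freedom.

```python
import math
import random
import statistics

random.seed(1937)  # fixed seed so the sketch is repeatable

# Hypothetical "true" population, known only because this is a simulation.
TRUE_MEAN, TRUE_SD = 50.0, 10.0
SAMPLE_SIZE = 10
T_CRITICAL_95_DF9 = 2.262  # two-sided 95% t value, 9 degrees of freedom


def interval_covers_truth():
    """Run one simulated study; report whether its 95% CI contains TRUE_MEAN."""
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    estimate = statistics.mean(sample)
    margin = T_CRITICAL_95_DF9 * statistics.stdev(sample) / math.sqrt(SAMPLE_SIZE)
    return estimate - margin <= TRUE_MEAN <= estimate + margin


def coverage(num_studies):
    """Proportion of simulated studies whose interval contains the true mean."""
    hits = sum(interval_covers_truth() for _ in range(num_studies))
    return hits / num_studies


coverage_by_count = {k: coverage(k) for k in (20, 200, 5000)}
for num_studies, proportion in coverage_by_count.items():
    print(f"{num_studies:>5} studies: {proportion:.1%} of intervals contain the true mean")
```

Any single interval either contains 50 or it doesn't. With only a handful of studies the observed proportion can wander away from 95%, and it settles near 95% only as the number of studies grows.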

A glance at Neyman's reference section shows many of the greats of statistics: Fisher, Hotelling, Kolmogorov, Lévy, Markoff... Many names you'll hear again and again in these GMIS posts.
