I recently finished reading Inference and Disputed Authorship: The Federalist by Frederick Mosteller and David L. Wallace. This book, which details the study Mosteller and Wallace conducted to determine (statistically) who wrote the disputed Federalist papers, was highly recommended as a good primer not only on authorship studies but also on Bayesian inference.
As is often the case, this book frequently used a term I see in many statistical texts: likelihood. I breezed over this word multiple times (as I usually do) before I finally stopped and really considered it. (Which I do now, usually before saying, "Ooo, would this make a good blog post?")
Likelihood is a term used often in statistics, but I realized I wasn't completely clear on this concept, or rather, how it differed from related concepts, like odds or probability. If it meant the same thing as, say, probability, why use this other term? And why use it in seemingly specific ways?
It turns out that these terms are, in fact, different from each other, but they reflect related concepts. As with many statistics concepts (like degrees of freedom - see posts here and here), there are simple ways to describe these concepts, and more difficult ways.
First, the simple way. Probability deals with the chance that something will happen; it is stated before the event occurs. When I flip a fair coin, I know the probability of heads is 50%, and I can use what I know about probability to determine the chance of a certain string of events. Likelihood deals with something that has already happened and supports inferences about what produced it. So if I flip a coin 20 times and get heads each time, you might want to discuss the likelihood that I'm using a fair coin: how well the "fair coin" explanation holds up, judged by how probable 20 straight heads would be if the coin really were fair. (Maybe I'm using a double-headed coin, for instance.)
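To make the contrast concrete, here is a small Python sketch of the coin example. It assumes the flips are independent and that a "double-headed" coin lands heads with probability 1; the function name `prob_all_heads` is just an illustrative helper. The same calculation serves both purposes: before flipping, it gives the probability of 20 heads; after the 20 heads have happened, it gives the likelihood of each hypothesis about the coin.

```python
# Probability vs. likelihood for the coin example.
# Assumptions (illustrative): flips are independent, and a
# "double-headed" coin lands heads with probability 1.

def prob_all_heads(p_heads: float, n_flips: int) -> float:
    """Probability of getting heads on all n_flips when P(heads) = p_heads."""
    return p_heads ** n_flips

n = 20

# Probability (looking forward): before flipping a fair coin,
# what is the chance of 20 heads in a row?
print(f"P(20 heads | fair coin) = {prob_all_heads(0.5, n):.3g}")

# Likelihood (looking back): the 20 heads already happened, so the data are fixed.
# Each hypothesis about the coin is scored by the probability it gave those data.
hypotheses = {"fair coin": 0.5, "double-headed coin": 1.0}
likelihoods = {name: prob_all_heads(p, n) for name, p in hypotheses.items()}
for name, value in likelihoods.items():
    print(f"L({name} | 20 heads) = {value:.3g}")

# How many times better the double-headed hypothesis explains the data.
ratio = likelihoods["double-headed coin"] / likelihoods["fair coin"]
print(f"likelihood ratio = {ratio:,.0f}")
```

Neither likelihood is the probability that the coin is fair. Turning likelihoods into that kind of statement requires a prior belief about the coin, which is where Bayesian inference, the approach Mosteller and Wallace used, comes in.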
Now, the more complex way. Likelihood is specifically related to our use of data to infer underlying truths. Remember that when we conduct statistical analyses, we're often using sample data to estimate underlying population values (parameters). We may never know what those parameters actually are, because we can't measure everyone in the population, or because of measurement error, or for any number of other reasons. We can only estimate those values.
We know that sample data can be biased in a number of ways, and we have different corrections to help us turn those sample values (statistics) into estimates of population values (parameters). We want the population values we estimate to make sense given our sample data. We're never going to get sample data that exactly match population values, so there will be margins of error, but we want our sample data to have a high chance of occurring if our estimated population value is correct. This concept, the chance of observing our sample data given that a particular population value is correct, is likelihood. More precisely, it is the probability of the observed data treated as a function of the parameter, with the data held fixed.
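As a rough illustration, the sketch below treats the probability of heads, p, as the unknown population parameter and computes L(p | data) = P(data | p) using the binomial formula. The sample of 14 heads in 20 flips is hypothetical, invented for this example, and `binomial_likelihood` is an illustrative helper name. The data never change; only the candidate parameter value does.

```python
from math import comb

def binomial_likelihood(p: float, heads: int, flips: int) -> float:
    """Likelihood of parameter p given `heads` successes in `flips` trials:
    the binomial probability of the observed data, viewed as a function of p."""
    return comb(flips, heads) * p ** heads * (1 - p) ** (flips - heads)

# Hypothetical sample (not from any real study): 14 heads in 20 flips.
heads, flips = 14, 20

# Hold the data fixed and ask how probable they are under each candidate value of p.
candidates = [0.3, 0.5, 0.7, 0.9]
for p in candidates:
    value = binomial_likelihood(p, heads, flips)
    print(f"L(p = {p} | {heads}/{flips} heads) = {value:.4f}")
```

Running this shows that p = 0.7 makes the observed sample considerably more probable than p = 0.5 does. Note that these values are not a probability distribution over p: they need not sum to 1 across candidates, because each one is a probability of the data, not of the parameter.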
In the coming weeks, check back for a post discussing an application of this concept of likelihood: maximum likelihood estimation!