Friday, October 14, 2016

Batting Statistics with Bayes

One of my favorite blogs, Variance Explained, has spent a lot of time talking about a particular type of statistics known as Bayesian statistics. Bayesian statistics approaches probability differently than traditional (frequentist) statistics, and can take into account what was already known before the new data came in, for example from previous studies. That existing knowledge is expressed as a "prior." In fact, there's a great book about Bayesian statistics I've been working through, known colloquially as "the puppy book" because there are dogs on the cover (including a corgi! Yes, this may have been what initially attracted me to the book).


A couple of days ago, the blog's author published a new post on Bayesian hierarchical modeling, which he explains using baseball statistics. As always, he provides all the code he used to run his analyses. In his post, he tests an assumption held by many: that left-handed batters outperform right-handed batters. Using historical batting data, he tests this assumption and builds the result into his "prior," which he can then use to decide which of two hypothetical batters (one left-handed and one right-handed, each with the same success rate over 100 at-bats) should be hired:
"One interesting feature is that while the ratio of righties to lefties is about 9-to-1 in the general population, in professional baseball it is only 2-to-1. Managers like to hire left-handed batters- in itself, this is some evidence of a left-handed advantage!

"According to our beta-binomial regression, there is indeed a statistically significant advantage to being left-handed, with lefties hitting about 1% more often. This may seem like a small effect, but over the course of multiple games it could certainly make a difference. In contrast, there’s apparently no detectable advantage to being able to bat with both hands. (This surprised me- does anyone know a reason this might be?)"
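To make the two-batter comparison concrete, here is a minimal Python sketch of how a handedness-specific prior changes the estimate for two players with identical records. The prior parameters and the 30-hits-in-100-at-bats record are made up for illustration (chosen so the prior means differ by about 1%); they are not the values fitted in the Variance Explained post, which uses R and a beta-binomial regression rather than this simplified update.

```python
# Hypothetical Beta(alpha, beta) priors, one per batting hand.
# These numbers are illustrative assumptions, NOT fitted values from the post.
PRIORS = {
    "left": (79.0, 221.0),   # prior mean 79/300, about 0.263
    "right": (76.0, 224.0),  # prior mean 76/300, about 0.253
}


def posterior_mean(hits, at_bats, alpha0, beta0):
    """Posterior mean batting average for a Beta prior and binomial data."""
    return (alpha0 + hits) / (alpha0 + beta0 + at_bats)


# Both hypothetical batters have the same observed record.
hits, at_bats = 30, 100

estimates = {
    hand: posterior_mean(hits, at_bats, alpha0, beta0)
    for hand, (alpha0, beta0) in PRIORS.items()
}

for hand, estimate in estimates.items():
    print(f"{hand}-handed batter: estimated average {estimate:.4f}")
```

With only 100 at-bats, the prior still carries a lot of weight, so the left-handed batter ends up with the slightly higher estimate even though the two records are identical. With thousands of at-bats, the data would dominate and the gap would shrink.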
He can also use the historical data to look at how the effect of handedness has changed over time. As he says, "It’s absurd to expect that players in the 1880s would have the same ranges of batting averages as players today, and we should take that into account in our estimates." He also noticed some interesting historical trends:
"Well, there’s certainly a trend over time, but there’s nothing linear about it: batting averages have both risen and fallen across time. If you’re interested in baseball history and not just Bayesian statistics, you may notice that this graph marks the “power struggle” between offense and defense:
  • The rise in the 1920s and 1930s marks the end of the dead-ball era, where hitting, especially home runs, became a more important part of the game
  • The batting average “cools off” as pitchers adjust their technique, especially when the range of the strike zone was increased in 1961
  • Batting average rose again in the 1970s thanks to the designated hitter rule, where pitchers (in one of the two main leagues) were no longer required to bat
  • It looks like batting averages may again be drifting downward"
He also discovered that the gap between left-handed and right-handed batting averages is closing, so that today, there is almost no difference. So if you were a baseball scout in 1915, you would do best to choose the left-handed batter from the two hypothetical choices. Today, handedness doesn't give you any additional information.

For his next post, he'll be exploring multimodal distributions: distributions with more than one "peak" (a locally most frequent value), as seen when the batting averages of pitchers and non-pitchers are plotted together.
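As a rough illustration of what "more than one peak" means, the sketch below mixes two beta distributions, one standing in for pitchers and one for non-pitchers, and counts the local maxima of the combined density. The component parameters and the 50/50 mixing weight are assumptions picked so the two groups are clearly separated; they are not estimates from real batting data.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def mixture_pdf(x, weight=0.5):
    """Two-component mixture; all parameters are hypothetical."""
    pitchers = beta_pdf(x, 20, 140)      # centered near 0.12
    non_pitchers = beta_pdf(x, 80, 220)  # centered near 0.27
    return weight * pitchers + (1 - weight) * non_pitchers


# Evaluate the density on a grid of batting averages strictly inside (0, 1).
xs = [i / 1000 for i in range(1, 1000)]
density = [mixture_pdf(x) for x in xs]

# A grid point is a peak if it is strictly higher than both of its neighbors.
peaks = [
    xs[i]
    for i in range(1, len(xs) - 1)
    if density[i] > density[i - 1] and density[i] > density[i + 1]
]

print("Peaks found near batting averages:", peaks)
```

A single beta prior fitted to all players at once would have to sit somewhere between the two peaks, which is why a mixed population like this calls for a different approach.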
