By David Roher
“Happy families are all alike; every unhappy family is unhappy in its own way.”
So begins Leo Tolstoy’s 1878 masterpiece, Anna Karenina, an engrossing novel about late 19th century statistical analysis in baseball. Or about Russian aristocratic society. I’m not sure; I read it pretty quickly.
I only had to pay attention for the first sentence, since only the quote above is pertinent. Its meaning is essentially that in order for a family to be happy, a lot of different things need to work out, while only one of those things needs to fail in order for a family to be unhappy. Scholars and authors like Jared Diamond have extended this idea beyond families: the “Anna Karenina principle” is that a deficiency in any one factor can cause the whole system to fail. An important corollary is that if the system does indeed work, then it inherently possesses all of those factors.
After the jump, I’ll try to use this principle to answer a perplexing question: why do Major League hitters who strike out more perform better?
One of the main points espoused by sabermetricians in the early days was that high-strikeout hitters could still be very valuable, and that the ability to avoid striking out was highly overrated. But it is nevertheless bizarre to me that, year after year, strikeout rate fails to correlate inversely with offensive production.
Maybe this isn’t as weird to you as it is to me. Think about it this way: strikeouts are a subset of all outs, and a hitter’s overall out rate is just another way to express on-base percentage (it’s 1 minus OBP). OBP is very highly correlated with production: in 2009, 82% of the variance among qualified players in wOBA, a great measure of overall hitting production, could be explained by OBP. Furthermore, strikeouts are integral to evaluating pitching: 46% of the variance among qualified pitchers in FIP, a great measure of overall pitching skill, can be explained by strikeout rate.
So given all that, wouldn’t you expect strikeout rate to be at least a little predictive for hitters? Well, it is – in the wrong direction. In 2009, strikeout rate among qualified hitters explained 5% of the variance in wOBA – a small but statistically significant figure (P = .001), and the coefficient was positive. In other words, the more a qualified hitter struck out in 2009, the better a hitter he tended to be.
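For a sense of scale, it can help to turn those R^2 figures into correlations. Assuming each one comes from a one-variable linear regression (which the text implies but does not state), R^2 is simply the square of the correlation coefficient, so a rough conversion looks like this. The function name is made up for illustration.

```python
import math

def implied_correlation(r_squared):
    """Magnitude of Pearson's r implied by R^2 in a one-predictor linear regression."""
    return math.sqrt(r_squared)

# R^2 figures quoted in the text for 2009 qualified players.
quoted = {
    "OBP -> wOBA (hitters)": 0.82,
    "K rate -> FIP (pitchers)": 0.46,
    "K rate -> wOBA (hitters)": 0.05,
}

for label, r2 in quoted.items():
    print(f"{label}: R^2 = {r2:.2f}, |r| = {implied_correlation(r2):.2f}")
```

Under that assumption, the 5% figure for hitters corresponds to a correlation of roughly 0.22: modest, but not nothing.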
I’ve thought for a while about how this could be. When I learned about the Anna Karenina principle last semester, it hit me. Here’s how that quote applies to baseball and athletics in general: I believe that in order for an athlete to reach an elite level, there are some skills and attributes that he or she must possess. If they don’t possess a certain trait, then they can’t make it. I see this on a daily basis on our rowing team: different rowers have different strengths and weaknesses, but because of the selection bias it takes to assemble an elite team, they all have a lot in common: every rower in our first varsity lightweight boat last year was between 5’10″ and 6’2″, for instance. That’s an example of the Anna Karenina principle at work: a rower could have the technique, strength, commitment, bizarre willingness to wake up early in the morning as a college student, etc. necessary to row well, but if he were my height, he’d have too much trouble getting the necessary length through the water on each stroke, and couldn’t compete on an elite level. And if he were too tall, he would not be able to make the weight limit with a body type conducive to rowing well. (Great lightweight rowers can be out of that specific height range, but likely not by more than a couple inches on either end.)
Even in baseball, where there are different positions and different approaches to hitting that enable a wider possible skill set than that of crew, this is still the case. Case in point: every single qualified pitcher in 2009, no matter what type of pitcher he was, threw a fastball that averaged at least 84.7 miles per hour with some semblance of control. All but 5 pitchers averaged over 88 miles per hour. Think about what percent of pitchers in the world can do that – it’s far from 100%.
Here’s my hypothesis for strikeout rate: in order for a hitter to be good enough to get at least 3.1 plate appearances per team game in Major League Baseball (the cutoff for a “qualified” hitter), he has to possess some skill that prevents him from being unduly penalized by strikeouts. The way to test this hypothesis is to look at the relationship between production and strikeout rate in minor league ball. If we start to see a change from a positive correlation to no correlation to, finally, negative correlation as we move down the ranks, there’s strong evidence that the Anna Karenina principle is at work: hitters who are too vulnerable to the strikeout will get weeded out and thus won’t move up any further than that level (or, they’ll acquire the skill).
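One way to see how a skill floor could produce that pattern is a toy simulation. Everything in it is an assumption made for illustration, not the data behind this post: each simulated hitter gets independent "power" and "contact" skills, strikeouts rise with power and fall with contact, production rises with both, and a level is modeled as a minimum contact skill a hitter needs in order to stick. The weights (0.5, 1.0) and the cutoffs are arbitrary.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_players(n, seed=0):
    """Hypothetical hitters as (contact, strikeout_rate, production) tuples."""
    rng = random.Random(seed)
    players = []
    for _ in range(n):
        power = rng.gauss(0, 1)
        contact = rng.gauss(0, 1)
        # Assumed: power hitters whiff more, contact hitters whiff less.
        strikeout_rate = 0.5 * power - contact + rng.gauss(0, 0.5)
        # Assumed: both skills add to production.
        production = power + contact + rng.gauss(0, 0.5)
        players.append((contact, strikeout_rate, production))
    return players

def correlation_at_level(players, min_contact):
    """Correlation of strikeout rate with production among hitters above the floor."""
    survivors = [p for p in players if p[0] >= min_contact]
    ks = [p[1] for p in survivors]
    ws = [p[2] for p in survivors]
    return pearson(ks, ws), len(survivors)

players = simulate_players(20000)
# Hypothetical levels: the contact floor rises as hitters move up the ladder.
levels = [("low", -math.inf), ("middle", -0.5), ("high", 1.0)]
results = {name: correlation_at_level(players, floor) for name, floor in levels}

for name, (r, n) in results.items():
    print(f"{name:>6}: n = {n:5d}, corr(K rate, production) = {r:+.2f}")
```

With these made-up parameters, the correlation should move from clearly negative with no floor, to near zero at the middle floor, to clearly positive at the highest one. Once everyone left has enough contact skill, most of the remaining variation in strikeouts reflects power. That is only one mechanism that could fit the hypothesis, not a demonstration that it is the real one.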
R^2 measures how much of the variation in wOBA is explained by strikeout rate. P-value is a measure of significance; if it is below .05, the result is conventionally considered statistically significant. Coeff. (+/-) is the direction of the relationship: if more strikeouts go with more production, it’s positive; if they go with less, it’s negative.
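For anyone who wants to reproduce those three columns for a single level, here is a minimal sketch using only the standard library. The strikeout-rate and wOBA values are invented placeholders, not FanGraphs data. The p-value comes from a permutation test, chosen because it needs no statistics package; it is not necessarily how the figures in this post were computed.

```python
import random

def regress(xs, ys):
    """Ordinary least squares of ys on xs; returns (slope, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Share of random re-pairings whose R^2 is at least as large as the observed one."""
    rng = random.Random(seed)
    _, observed = regress(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        _, r2 = regress(xs, shuffled)
        if r2 >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)

# Invented placeholder data: strikeout rate and wOBA for twelve hitters.
k_rate = [0.10, 0.12, 0.14, 0.15, 0.17, 0.18, 0.20, 0.22, 0.24, 0.25, 0.27, 0.30]
woba = [0.320, 0.335, 0.318, 0.342, 0.330, 0.351,
        0.338, 0.360, 0.345, 0.372, 0.355, 0.380]

slope, r_squared = regress(k_rate, woba)
p_value = permutation_p_value(k_rate, woba)
sign = "+" if slope > 0 else "-"  # a zero slope would also print "-"

print(f"R^2 = {r_squared:.2f}, P = {p_value:.4f}, Coeff. = {sign}")
```

Running the same three steps once per level, from Rookie ball up to the Majors, would give one row per level.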
As you can see, strikeouts are positively correlated with offensive production in the Majors and AAA ball. In the middle three levels, there is no significant relationship. But in Short-Season A ball and Rookie ball, there is a negative correlation.
My findings support the existence of the Anna Karenina principle in this case. Players who strike out more are less likely to succeed in the lowest levels, and there is a smooth transition from the Majors to Rookie ball. If I were to examine college and high school ball, I think we’d see the negative relationship continue to strengthen.
I think it’s a compelling explanation for why players who strike out a lot are still able to succeed. It might also offer a window into how hitting and pitching change as players move up the ladder: it looks like the “weeding out” is mostly happening between A- and AA ball. I’d be curious to see if we could find the same thing through scouting.
Can anyone else think of a case in sports where the Anna Karenina principle (or just simple selection/survivor bias) explains something similar?
(All data from (where else?) FanGraphs.)