By David Roher
Pitchers and catchers! I don’t need much of an excuse to start talking about baseball, but I probably need one to assume that people might listen. So I’m taking the opportunity presented on Wednesday by baseball’s first day of the year to unveil some research.
Let me pose a question: Team A and Team B are of equal strength and play a game against each other. Team A is very offensively oriented, while Team B is all about pitching and defense. You know nothing about the result of the game, except for the fact that it was very high-scoring. Does this make Team A more likely to win? Team B? Neither?
This comes up a lot when broadcasters and writers try to frame a game as a “pitchers’ duel” or a “slugfest.” When a team wins 2-1, everyone says that they outpitched the other team. When they lose 10-9, they were outhit. I’ve always wondered if this is just a convenient interpretation. Is hitting prowess really more important in a high-scoring game? Do teams perform better when they’re in their own run-scoring environment, like it’s some kind of home-field advantage?
My first problem was determining what constituted a “run environment.” The obvious answer is the total number of runs scored in each game. This is the definition of run environment when sabermetricians determine how many runs are worth one win. For instance, in 2009, 9.22 runs was worth about a win, because that was the average number of runs scored in a game.
However, I wasn’t sure that this definition captured what I wanted. I think most fans would call a 10-0 game high-scoring, while they’d call a 6-4 game about average. My solution was to use the denominator of Pythagorean Expectation: (Runs Scored^1.85 + Runs Allowed^1.85)^(1/1.85), with a small adjustment to scale the result to real-life totals. This calculation considers more lopsided games to be more high-scoring.
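To make the definition concrete, here is a minimal Python sketch of the unscaled calculation. The function name `run_environment` is my own label, and the "small adjustment" for scaling is not spelled out in the post, so it is left out here rather than guessed at.

```python
PYTHAG_EXPONENT = 1.85  # exponent quoted in the post


def run_environment(runs_scored, runs_allowed, exponent=PYTHAG_EXPONENT):
    """Unscaled run environment: (RS^x + RA^x)^(1/x).

    The post's scaling adjustment is not specified, so it is omitted
    (assumption: this is only the raw Pythagorean denominator).
    """
    total = runs_scored ** exponent + runs_allowed ** exponent
    return total ** (1.0 / exponent)


# Both games have 10 total runs, but the blowout rates as higher-scoring.
blowout = run_environment(10, 0)
close_game = run_environment(6, 4)
print(f"10-0 game: {blowout:.2f}")
print(f"6-4 game:  {close_game:.2f}")
```

With this version, a 10-0 game comes out at 10 while a 6-4 game comes out at roughly 7.4, which matches the intuition that the lopsided game feels more high-scoring.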
With that out of the way, I went over to Retrosheet and obtained a log of every game in Major League history, all the way back to 1871. Because the question deals with run environment in the scope of individual games, there’s no real need to adjust for time period: I was able to use 139 years of baseball history.
For each game, I calculated the run environment according to the definition above. I then found each team’s average run environment. These three stats are the components of the “Comfort Zone”: a measure of the degree to which one team’s average run environment is closer to the game-specific run environment than the other team’s average. The formula is as follows:
Z is the Comfort Zone Rating, and RE stands for run environment. A Z greater than 1 indicates that the team is playing in a more familiar run environment than its opponent, and a Z less than 1 indicates a less familiar one. When Z is exactly 1, the two teams are equally familiar.
With this rating calculated for about 195,000 games, I set out to determine whether it had any effect. I started simple: a raw count of how many times the team with the higher Comfort Zone won the game.
Team with higher Comfort Zone wins: 98152/194919, 50.4%
According to these results, there is possibly something small to the Comfort Zone: teams with the more familiar run environment win slightly more often than they lose. I’ll address the statistical significance of it later. But there might be a confounding variable here: home-field advantage. A team’s park influences both the game-specific run environment and the average run environment of the home team. Thus, a team is more likely to play in a familiar environment at home. While this may well be a key part of home-field advantage, we can’t know for sure from this alone.
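As a rough way to gauge how far the raw count (98152 of 194919) sits from a coin flip, here is a sketch of a one-sample proportion test using the normal approximation. This is not the significance analysis promised for later; it assumes every game is an independent trial, and the helper name `proportion_z_test` is hypothetical.

```python
import math


def proportion_z_test(wins, games, null_rate=0.5):
    """Two-sided z-test of an observed win rate against null_rate.

    Assumes independent games and uses the normal approximation to
    the binomial, which is reasonable for a sample this large.
    """
    observed = wins / games
    standard_error = math.sqrt(null_rate * (1 - null_rate) / games)
    z = (observed - null_rate) / standard_error
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z_score, p_value = proportion_z_test(98152, 194919)
print(f"z = {z_score:.2f}, two-sided p = {p_value:.4f}")
```

A z-score of a little over 3 suggests the raw edge is unlikely to be pure chance, but it says nothing about whether the edge belongs to the Comfort Zone or to the home-field confound just described.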
One way to correct for this is to further split the calculation into home/away situations.
Home Team wins: 106580/194919, 54.7%
Away Team wins: 88339/194919, 45.3%
Home Team has Comfort Zone advantage, wins: 55074/100335, 54.9%
Away Team has Comfort Zone advantage, wins: 43078/94584, 45.5%
There’s still something: home and away teams appear to be more or less equally boosted. But there’s one more thing we need to correct for: the teams’ overall quality.
Team with better record wins: 115128/194919, 59.1%
Team with worse record wins: 79791/194919, 40.9%
Better record has Comfort Zone advantage, wins: 58422/99486, 58.7%
Worse record has Comfort Zone advantage, wins: 38727/95433, 40.6%
And that’s the problem: when controlling for overall record, the effect is actually reversed. Comfort Zone is not useful as a predictor in a logistic regression model when controlling for home-field advantage and the teams’ records. I can’t say that a familiarity with the run environment has any effect whatsoever on the outcome of the game.
…Yet. In the next installment, I’ll make an adjustment that might help find something meaningful here. For now, though, all run environments appear to be the same.
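For readers who want to check the splits above, this short Python sketch recomputes each conditional win rate and compares it with its baseline. The counts are copied from the post; the dictionary layout and variable names are my own, and the baselines all use 194919 games, the total to which each pair of win counts sums.

```python
TOTAL_GAMES = 194919

# (wins with Comfort Zone advantage, games with advantage, baseline wins)
splits = {
    "home team": (55074, 100335, 106580),
    "away team": (43078, 94584, 88339),
    "better record": (58422, 99486, 115128),
    "worse record": (38727, 95433, 79791),
}

# Difference between the win rate with a Comfort Zone advantage and the
# overall win rate for that kind of team, in percentage points.
shifts = {}
for label, (cz_wins, cz_games, base_wins) in splits.items():
    with_advantage = cz_wins / cz_games
    baseline = base_wins / TOTAL_GAMES
    shifts[label] = 100 * (with_advantage - baseline)
    print(f"{label:>13}: {100 * with_advantage:.1f}% vs "
          f"{100 * baseline:.1f}% baseline ({shifts[label]:+.2f} pts)")
```

The home and away splits shift up by a fraction of a point, while both record-based splits shift down by a similar amount, which is the reversal described above.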