By David Roher
Update: An astute commenter below pointed out a key mathematical error that I made. The corrected values in the table are in italics.
It seems that every game in a short series gets classified as either a “must-win” game or a “swing” game. The latter label usually applies whenever a series is tied, be it 1-1, 2-2, or even 0-0, while the former applies whenever one team must win to still have a good shot at winning the series. (The deciding Game 5 or 7, of course, fits both of these descriptions and thus sends sportswriters around the country into fits of joy.)
Tonight’s Game 3 of the World Series features the Yankees and Phillies tied at one game apiece, making it a “swing” game. There’s no doubt that this game will matter a great deal in deciding the eventual winner, but how much more than any other game? It’s the World Series. Every game is important, and eventually we just run out of ways to say how important each one is.
That’s not to say it couldn’t be more important, though. Game 3 might indeed be the pivot point of the series. It could also be that the concept is overblown. I’ll try to figure out the truth mathematically, after the jump.
This is how I’m going to frame the definition of the swing game: a series between two teams has already ended. You know nothing at all about it except its maximum length (i.e., best of 7) and that each team had a 50% chance of winning any given game in the series. Some guy in an inflatable sumo wrestler costume (it’s Halloween, after all) comes up to you and makes you guess the winner of the series. He’ll give you one additional piece of information: you give him a game number, and he’ll tell you who won that game (or that no one did, if it wasn’t played). Which game result should you ask for to maximize your chances of picking the right team?
From there, the easiest way to measure the importance of a single game is to find the average probability that the team winning that game goes on to win the entire series. The problem with this definition is that the answer is obvious – it has to be Game 7, since the winner of that game is guaranteed to win it all. That’s not necessarily what we’re looking for here: if the series gets to the final game, then obviously that game will be the most important. But words like “swing” and “pivotal” imply that we’re more interested in the path that the series takes. In the hypothetical example above, if we ask about a game that might not have been played, we run the risk of getting no information out of it at all.
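The quantity in that definition, the chance that a team holding a given record goes on to win the series, can be computed exactly by conditioning on the next game. Here is a minimal Python sketch, assuming the post's 50/50 games; the names `series_win_prob`, `WINS_NEEDED`, and `P_GAME` are illustrative and not taken from the original analysis.

```python
from fractions import Fraction
from functools import lru_cache

WINS_NEEDED = 4          # best-of-7: the first team to 4 wins takes the series
P_GAME = Fraction(1, 2)  # the post's assumption: either team wins any game 50% of the time


@lru_cache(maxsize=None)
def series_win_prob(wins: int, losses: int) -> Fraction:
    """Chance that a team holding this record goes on to win the series."""
    if wins == WINS_NEEDED:
        return Fraction(1)
    if losses == WINS_NEEDED:
        return Fraction(0)
    # Condition on the next game: win it with P_GAME, lose it otherwise.
    return (P_GAME * series_win_prob(wins + 1, losses)
            + (1 - P_GAME) * series_win_prob(wins, losses + 1))


print(series_win_prob(1, 0))  # just won Game 1 (1-0): 21/32
print(series_win_prob(4, 3))  # just won Game 7 (4-3): 1
```

The Game 1 winner's 21/32 (65.625%) falls well short of the Game 7 winner's certainty, which is why this measure on its own always points to Game 7.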
One way to fix this is to multiply that probability by the probability that the game is played in the first place. Then we add half the probability that the game is NOT played, to model the fact that we still have a 50% chance of guessing correctly with no additional information. The probability of a series lasting at least x games is easy to find with binomial distributions. Here are those values for a 7-game series with equal teams:
At least 1-4 games: 100%
At least 5 games: 87.5% (chance that a team has won either 1, 2, or 3 games out of 4)
At least 6 games: 62.5% (team has won either 2 or 3 games through 5)
7 games: 31.25% (team has won exactly 3 games out of 6)
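As a check on the list above, here is a minimal Python sketch (again assuming 50/50 games) that recomputes these probabilities with `math.comb`. Game k is played exactly when neither team has reached 4 wins in the first k - 1 games, so it counts the binomial outcomes in which both teams have at most 3 wins. The function name `prob_reaches_game` is hypothetical.

```python
from fractions import Fraction
from math import comb

WINS_NEEDED = 4                    # best-of-7
MAX_GAMES = 2 * WINS_NEEDED - 1


def prob_reaches_game(k: int) -> Fraction:
    """P(Game k is played), i.e. the series is still undecided after k - 1 games."""
    played = k - 1
    # One team's win total w must leave both teams short of 4 wins:
    # w <= 3 and played - w <= 3.
    lo = max(0, played - (WINS_NEEDED - 1))
    hi = min(played, WINS_NEEDED - 1)
    ways = sum(comb(played, w) for w in range(lo, hi + 1))
    return Fraction(ways, 2 ** played)


for k in range(1, MAX_GAMES + 1):
    print(f"Game {k}: {float(prob_reaches_game(k)):.2%}")
```

This reproduces the list: 100% for Games 1-4, then 87.50%, 62.50%, and 31.25%.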
We can use these flat values for Games 1 and 7, since the series is guaranteed to be tied going into those games. Games 2 and 6 present a problem: in Game 6, for instance, the winning team could have been either up or down 3-2. If they were up, their chance of winning the series would be 100% after the game. If they were down, it would be only 50%. This is easily corrected, though – just take the average. Games 3, 4, and 5 present this problem and one more: each can be entered from two different series situations. Going into Game 3, for example, the series can stand at either 1-1 or 2-0. We take another average here, but this one is weighted by the probability of each situation going into the game.
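The averaging procedure just described can be sketched in Python as follows. This is a minimal sketch under the post's 50/50 assumption, not the author's actual spreadsheet: for each Game k it enumerates every series situation going in (e.g. 2-0 and 1-1 before Game 3), weights each by its binomial probability, averages the series-win chances of the two possible Game k winners, and then adds half credit for the chance that Game k is never played. The names `series_win_prob` and `p_correct` are illustrative.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

WINS_NEEDED = 4            # best-of-7
MAX_GAMES = 2 * WINS_NEEDED - 1
HALF = Fraction(1, 2)      # the post's assumption: every game is a 50/50 coin flip


@lru_cache(maxsize=None)
def series_win_prob(wins: int, losses: int) -> Fraction:
    """Chance that a team holding this record goes on to win the series."""
    if wins == WINS_NEEDED:
        return Fraction(1)
    if losses == WINS_NEEDED:
        return Fraction(0)
    return HALF * series_win_prob(wins + 1, losses) + HALF * series_win_prob(wins, losses + 1)


def p_correct(k: int) -> Fraction:
    """P(picking the series winner when told only who won Game k)."""
    before = k - 1              # games played before Game k
    p_played = Fraction(0)      # P(Game k happens at all)
    weighted = Fraction(0)      # sum of P(situation) * winner's series-win chance
    for a_wins in range(before + 1):
        b_wins = before - a_wins
        if max(a_wins, b_wins) >= WINS_NEEDED:
            continue            # series already over, so there is no Game k
        p_situation = Fraction(comb(before, a_wins), 2 ** before)
        # Either team is equally likely to take Game k: average the two cases.
        winner_p = (HALF * series_win_prob(a_wins + 1, b_wins)
                    + HALF * series_win_prob(b_wins + 1, a_wins))
        p_played += p_situation
        weighted += p_situation * winner_p
    # If Game k never happens, we are left with a coin-flip guess.
    return weighted + HALF * (1 - p_played)


for k in range(1, MAX_GAMES + 1):
    print(f"Game {k}: {p_correct(k)} = {float(p_correct(k)):.5f}")
```

Every line of output comes to 21/32 = 0.65625, matching the result that follows.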
Now that we have those numbers, we can figure out how important a single game is. I have the complete results split by situation below, but here’s the main result:
Games 1, 2, 3, 4, 5, 6, and 7: 65.625% chance that we’ll get the winner of the series by basing our answer on the outcome.
Games 1-4 being equal makes some sense. But the idea that all 7 games come out the same is really weird when you consider the disparate math. For example, in Game 4, we get the result by taking the weighted average of the series win probability for the Game 4 winner after having been up 3-0 (12.5% of the time), up 2-1 (37.5%), down 1-2 (37.5%), or down 0-3 (12.5%). Game 7 is modeled by taking the probability that it occurs and adding half the probability that it does not occur…and each result is exactly .65625.
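A worked version of those calculations, with Game 5 (the game the update corrected) included. In P_4, the weights 1/8, 3/8, 3/8, 1/8 are the binomial chances that the Game 4 winner's record going in was 3-0, 2-1, 1-2, or 0-3, and 1, 7/8, 1/2, 1/8 are that team's series-win chances after moving to 4-0, 3-1, 2-2, or 1-3; Game 4 is always played, so no half-credit term is needed. In P_5, Game 5 is played 87.5% of the time; given that it is, the series stands at 3-1 (for either team) with probability 4/7, where the winner's series chance is the average of 1 and 1/4, or 0.625, and at 2-2 with probability 3/7, where the winner's chance is 0.75.

```latex
\begin{aligned}
P_4 &= \tfrac18(1) + \tfrac38\left(\tfrac78\right) + \tfrac38\left(\tfrac12\right) + \tfrac18\left(\tfrac18\right)
     = \tfrac{8 + 21 + 12 + 1}{64} = \tfrac{42}{64} = 0.65625,\\
P_5 &= 0.875\left[\tfrac47(0.625) + \tfrac37(0.75)\right] + \tfrac12(1 - 0.875)
     = 0.59375 + 0.0625 = 0.65625,\\
P_7 &= 0.3125(1) + \tfrac12(1 - 0.3125) = 0.3125 + 0.34375 = 0.65625.
\end{aligned}
```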
Even that would make sense if they were all the same, I suppose. But I can’t figure out for the life of me why Game 5 would be different. (Update: It’s not; I just messed up the math before. See Matt Agard’s comments for details.) I’m interested in trying this for a best-of-5 series, trying to generalize for all lengths, and also answering the “swing” question in a different way (including more knowledge of what’s going on in the series, for example).
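One way to try other series lengths, sketched below under the same 50/50 assumption, is brute force: list every equally likely string of coin flips for the maximum number of games, cut each one off when a team clinches, and score the guess for each game as 1 (right), 0 (wrong), or 1/2 (game never played). The helper `p_correct_by_game` is hypothetical and is a different technique from the weighted averages above. It also hints at why the games tie: if every game were played out, the series winner would simply be whoever won the majority; a game that wasn't actually played is an independent coin flip that matches that winner half the time (the same 50% credit the model gives), and by symmetry every game agrees with the majority equally often.

```python
from fractions import Fraction
from itertools import product


def p_correct_by_game(wins_needed: int) -> list[Fraction]:
    """Brute-force P(correct guess) for each game of a best-of-(2 * wins_needed - 1) series."""
    max_games = 2 * wins_needed - 1
    totals = [Fraction(0)] * max_games
    # Every sequence of max_games fair coin flips is equally likely (1 = team A wins).
    for flips in product((0, 1), repeat=max_games):
        a_wins = 0
        games_played = max_games
        for i, flip in enumerate(flips):
            a_wins += flip
            b_wins = i + 1 - a_wins
            if wins_needed in (a_wins, b_wins):
                games_played = i + 1      # series clinched; later flips never happen
                break
        winner = 1 if a_wins == wins_needed else 0
        for i in range(max_games):
            if i < games_played:
                totals[i] += 1 if flips[i] == winner else 0
            else:
                totals[i] += Fraction(1, 2)   # no result to go on: coin-flip guess
    return [t / 2 ** max_games for t in totals]


for wins_needed in (3, 4):  # best-of-5, best-of-7
    probs = p_correct_by_game(wins_needed)
    print(f"best of {2 * wins_needed - 1}:", ", ".join(str(p) for p in probs))
```

Running it gives 21/32 for every game of a best-of-7 and 11/16 (68.75%) for every game of a best-of-5.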
|Situation||Winner’s P||P of Game||P of Situation||P Correct