By Chris Bruce and John Ezekowitz
The poll ranking system in college football is essentially a popularity contest: sportswriters and coaches vote by ranking the teams, and the results are compiled into the AP and Coaches Polls, respectively. You would expect these rankings to be fairly unpredictable. They are subject to the perceptions and vicissitudes of human voters who are invested in the games, and those perceptions can change wildly from week to week depending on which way media coverage has been leaning or who is perceived to have “momentum.” Despite this, we are still able to predict much of the movement in the rankings based only on the numerical results of the games.
To look at the predictability of the polls, we collected all NCAA FBS game data and AP poll results from 2003 through the current week, a total of over 8,000 games and 3,000 rankings. We then ran a regression on a variety of factors that could potentially affect the polling outcome, using only the games of teams that were ranked in that week or the following week (games of teams unranked in both weeks were excluded). We then iterated to determine which variables added value in predicting the next week’s poll points. In addition to a set of dummies for the week of the season, the variables ultimately used in the regression are listed below:
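As a rough illustration of the sample selection and week dummies described above, here is a minimal Python sketch. The field names (`rank_now`, `rank_next`, `opp_rank`, `points_now`, and so on), the 15-week season length, and the example records are all assumptions made for illustration, not the actual data layout, and the regressors shown are only the ones mentioned in this post.

```python
from typing import Dict, List, Optional

N_WEEKS = 15  # assumed number of poll weeks in a season (illustrative)


def in_sample(rank_now: Optional[int], rank_next: Optional[int]) -> bool:
    """Keep a team-week only if the team is ranked this week or next week."""
    return rank_now is not None or rank_next is not None


def week_dummies(week: int, n_weeks: int = N_WEEKS) -> List[float]:
    """One 0/1 indicator per week; week 1 is dropped as the baseline."""
    return [1.0 if week == w else 0.0 for w in range(2, n_weeks + 1)]


def design_row(rec: Dict) -> List[float]:
    """Turn one team-week record into a row of regressors."""
    rank, opp = rec["rank_now"], rec["opp_rank"]
    base = [
        1.0,                                             # intercept
        float(rec["points_now"]),                        # current poll points
        1.0 if rec["won"] else 0.0,                      # won this week
        1.0 if rec["bye"] else 0.0,                      # had a bye
        1.0 if rank is None else 0.0,                    # unranked in the current poll
        1.0 if rank is not None and rank <= 5 else 0.0,  # in the top 5
        1.0 if opp is not None else 0.0,                 # played a ranked team
        1.0 if opp is not None and opp <= 5 else 0.0,    # played a top 5 team
    ]
    return base + week_dummies(rec["week"])


# Made-up team-weeks, purely to exercise the functions.
records = [
    {"week": 3, "rank_now": 4, "rank_next": 2, "points_now": 1350,
     "won": True, "bye": False, "opp_rank": 12},
    {"week": 3, "rank_now": None, "rank_next": None, "points_now": 0,
     "won": True, "bye": False, "opp_rank": None},
    {"week": 5, "rank_now": None, "rank_next": 24, "points_now": 40,
     "won": True, "bye": False, "opp_rank": 3},
]

# Drop team-weeks where the team is unranked both this week and next week.
sample = [r for r in records if in_sample(r["rank_now"], r["rank_next"])]
X = [design_row(r) for r in sample]
```

Each row of `X` could then be paired with the following week's poll points and passed to any least-squares routine. Dropping one week as the baseline avoids perfect collinearity between the dummies and the intercept.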
The above variables all added significant predictive value to the regression, with p-values below 0.05. The impact column in the above table describes the effect of a positive movement in that variable on the predicted points in the next week’s poll. For example, winning or playing a ranked team increases expected points in the poll, while being unranked the previous week decreases expected points. Some interesting and not necessarily intuitive things come out of the regression. All else equal, being ranked higher in the poll (lower rank number, more poll points) means you are more likely to drop in points the next week: you have less room to move up with a win and plenty of room to move down with a loss. A bye also has a significant effect on predicted points: a team with a bye will rarely get skipped over in the rankings, but will often move up when higher-ranked teams lose. Additionally, if you are in the top 5 or playing a team in the top 5, you are more likely to increase your poll points: if you’re in the top 5 you are simply more likely to win, and if you are playing a top 5 team you won’t get penalized as much for a loss.
As for the results, the regression predicts poll points with an R² of 55%. Additionally, in out-of-sample testing the top 10 positions, those that matter most for BCS bowl positions at the end of the year, were predicted more accurately than those at the bottom of the poll. This makes sense given the greater parity at the lower levels and the greater noise from teams moving in and out of the poll more often there. One of the most difficult things in creating this regression was accounting for the penalty that voters give to a team that loses. When trying to predict poll results, we see that voters give teams a modest increase for wins but a disproportionate penalty for losses, regardless of the opponent or position in the poll. This manifests itself in the model not penalizing teams that lose at the top enough. We will see how far Oklahoma and Wisconsin fall this week.
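For readers who want to see how the two accuracy measures in this paragraph could be computed, here is a minimal Python sketch: R² as the share of variance in actual poll points explained by the predictions, and mean absolute error split between the top 10 and the rest of the poll. The function names, the row layout, and the numbers are hypothetical and chosen only to show the arithmetic.

```python
from typing import List, Tuple


def r_squared(actual: List[float], predicted: List[float]) -> float:
    """1 - (residual sum of squares / total sum of squares)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_abs_error(pairs: List[Tuple[float, float]]) -> float:
    """Average absolute gap between actual and predicted points."""
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


def error_by_band(rows: List[Tuple[int, float, float]],
                  cutoff: int = 10) -> Tuple[float, float]:
    """rows are (actual_rank, actual_points, predicted_points).

    Returns (error for ranks 1..cutoff, error for ranks below cutoff).
    Assumes both bands contain at least one team.
    """
    top = [(a, p) for rank, a, p in rows if rank <= cutoff]
    rest = [(a, p) for rank, a, p in rows if rank > cutoff]
    return mean_abs_error(top), mean_abs_error(rest)


# Made-up out-of-sample rows: (actual rank, actual points, predicted points).
rows = [(1, 1500.0, 1480.0), (10, 800.0, 830.0),
        (11, 700.0, 760.0), (25, 100.0, 20.0)]
top_err, rest_err = error_by_band(rows)
```

A smaller error in the first band than in the second would correspond to the pattern described above, where the top of the poll is easier to predict than the bottom.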
So without further ado, this is what we predict the AP Poll will look like when it comes out later today:
Update: Here is the actual poll. Our predictions did well, accurately predicting the exact ranking of 11 teams. That is not counting minor mis-orderings, like the Wisconsin-Kansas State-Oklahoma triad. We struggled a bit at the bottom of the poll, as we suspected.
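The "exact ranking" count above can be reproduced by ordering teams on predicted points, ordering them on actual points, and counting the positions where the same team appears in both lists. The sketch below assumes both polls are available as team-to-points dictionaries; the team names and point totals are placeholders, not the real poll.

```python
from typing import Dict, List


def ranking(points: Dict[str, float], top_n: int = 25) -> List[str]:
    """Order teams by points, highest first; ties are broken by name."""
    return sorted(points, key=lambda team: (-points[team], team))[:top_n]


def exact_matches(predicted: Dict[str, float],
                  actual: Dict[str, float],
                  top_n: int = 25) -> int:
    """Count poll positions where predicted and actual team agree."""
    pred_order = ranking(predicted, top_n)
    act_order = ranking(actual, top_n)
    return sum(1 for p, a in zip(pred_order, act_order) if p == a)


# Placeholder teams and points, for illustration only.
predicted = {"Team A": 1500, "Team B": 1400, "Team C": 1300, "Team D": 1200}
actual = {"Team A": 1490, "Team B": 1350, "Team D": 1310, "Team C": 1290}

n_exact = exact_matches(predicted, actual)
```

In this toy example Teams C and D swap places, so only the first two positions count as exact. This is the same kind of minor mis-ordering that the exact-match count leaves out.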
Finally, the correlation between the predicted points for each team and the actual points was an amazing 0.985.
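The correlation quoted here is the usual Pearson coefficient between predicted and actual points. A minimal sketch of that calculation in plain Python follows; the point totals are made up and will not reproduce the 0.985 figure.

```python
import math
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up predicted and actual poll points for four teams.
predicted_points = [1500.0, 1400.0, 1300.0, 1200.0]
actual_points = [1490.0, 1350.0, 1310.0, 1290.0]

r = pearson(predicted_points, actual_points)
```

Pearson correlation is unaffected by a constant offset or rescaling of either series, so it measures how closely predicted and actual points move together rather than how close they are in absolute terms.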