The United States of Football

By Andrew Mooney

In 2013, it seems safe to say that by almost any measure (viewership, ticket sales, merchandising) America considers football to be its national pastime. Gone are the days when youngsters snuck into the Polo Grounds or Fenway Park to catch the matinee performance of their favorite ballplayers; now is the era of Friday Night Lights and Madden.

Of course, only the thinnest slice of the kids who grow up dreaming of gridiron glory actually achieve it at the highest professional level: over a million U.S. teenagers play high school football, and only 1,696 roster spots exist in the NFL. Yet despite this minuscule success rate, nearly 1,700 success stories make a large enough sample for some analysis. In this case, I’ll look at how that NFL talent is distributed across the country.

I examined the birth states of all active NFL players from the records kept on pro-football-reference.com and combined them with state population estimates from 2012 U.S. Census data. PFR listed quite a few players’ birthplaces as “Unknown”; for the purposes of this post, I assumed that this group’s birthplaces were distributed similarly to those in the data I was able to gather. Armed with this dataset, I determined the most prolific football talent-producing states, both in absolute terms and per capita.
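A minimal sketch of the two rankings, assuming the “Unknown” players are spread across states in proportion to each state’s known count. The state names, player counts, and populations below are hypothetical placeholders, not the PFR or Census figures.

```python
# Hypothetical inputs: placeholder numbers, NOT the article's PFR/Census data.
known_players = {"State A": 120, "State B": 30, "State C": 8}
population = {"State A": 25_000_000, "State B": 4_500_000, "State C": 1_300_000}
unknown_players = 20  # players whose birthplace is listed as "Unknown"


def allocate_unknowns(known, unknown):
    """Spread unknown-birthplace players across states in proportion to known counts."""
    total_known = sum(known.values())
    return {state: n + unknown * n / total_known for state, n in known.items()}


def players_per_million(players, pop):
    """Active players per million residents for each state."""
    return {state: players[state] / pop[state] * 1_000_000 for state in players}


adjusted = allocate_unknowns(known_players, unknown_players)
rates = players_per_million(adjusted, population)

by_absolute = sorted(adjusted, key=adjusted.get, reverse=True)
by_per_capita = sorted(rates, key=rates.get, reverse=True)

print("Absolute:", by_absolute)
print("Per capita:", by_per_capita)
```

Because proportional allocation multiplies every state’s count by the same factor, it raises the per capita rates but leaves the order of both rankings unchanged. In this toy data the largest state leads in absolute terms but finishes last per capita.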

The first state north of the Mason-Dixon line to appear on the per capita list is Nebraska, at number eight. Though the state has fewer than two million people, 14 Nebraskans are currently on NFL rosters.

It’s not surprising that the South, the holy land of football, leads the way in NFL players per capita, with seven of the top ten states. But the top individual states aren’t the ones you might expect. The triangle of Louisiana, Mississippi, and Alabama makes up three of the top seven states, all of which rank ahead of traditional football sanctuaries Texas and Florida.

footballmap.png

Of course, these latter two are penalized by this method of measurement for their sheer size. In absolute numbers, Texas and Florida rank second and third in number of active pros with 130 and 107, respectively. The only other state to surmount the century mark for active players is the behemoth of this list, California, which boasts 143 current NFL players. Though it has by far the largest population of any state in the country (nearly 50 percent larger than second place Texas), California still comes in around the middle of the pack (18th) in the per capita rankings, a testament to the football prowess of the Golden State.

Bringing up the rear is New England. Despite a near-dynastic pro franchise in close proximity, the Northeast produces almost no native NFL players of its own. Maine, New Hampshire, and Rhode Island have a combined two players currently in the NFL. Massachusetts, the home state of the Patriots, has a slightly less embarrassing total of seven active players, and Connecticut, a state of a little over three million people, has a respectable 11. Still, just two in a million people from all of Patriots territory have reached the highest level of the sport.

footballmap1.png

Posted in Uncategorized | 8 Comments

At Which Positions Have NFL Rookies Had the Most Success?

By Alex Koenig

The NFL draft is an exercise in patience. Unlike in the NBA, it is rare for highly touted NFL rookies to make a significant impact at key positions. Sure, a player like Cam Newton comes along every now and then, but by and large draft picks are made under the reasonable assumption that the transition from the college to the pro game takes a while.

Nevertheless, talking heads and coaches alike regularly claim that players can “step in right away and make an impact.” Often these statements are made because of a player’s supernatural athletic ability or because he’s run a pro-style offense for the last five years, both pretty reasonable assumptions. But it got us thinking: is there any basis for these claims? Are there any patterns in terms of what types of players at what positions are able to step in and regularly prove effective?

It’s important to clarify how we measure quality in a rookie season. If we just look at the recipients of the AP Offensive and Defensive Rookie of the Year awards, we get a skewed perception. Since the Offensive Rookie of the Year award was created in 1967, it’s been given to six quarterbacks, eight wide receivers and 30 running backs. It’s possible, but unlikely, that no offensive lineman has ever been worthy of the award. On the defensive side, six cornerbacks, six defensive tackles, eight defensive ends, 21 linebackers, and just two safeties have been so honored.
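A quick tally of these counts shows how lopsided the awards are. The sketch below computes each position’s share of the winners listed above. The dictionaries cover only the positions the text enumerates.

```python
# Rookie of the Year winners by position since 1967, as counted in the text above.
oroy = {"Quarterback": 6, "Wide Receiver": 8, "Running Back": 30}
droy = {"Cornerback": 6, "Defensive Tackle": 6, "Defensive End": 8,
        "Linebacker": 21, "Safety": 2}


def shares(counts):
    """Fraction of the listed awards won by each position."""
    total = sum(counts.values())
    return {position: n / total for position, n in counts.items()}


oroy_share = shares(oroy)
droy_share = shares(droy)
print(f"Running backs: {oroy_share['Running Back']:.0%} of Offensive ROY awards")
print(f"Linebackers: {droy_share['Linebacker']:.0%} of Defensive ROY awards")
```

Running backs take 30 of 44 offensive awards (68 percent). Linebackers take 21 of 43 defensive awards (49 percent).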

But that’s just compared to other rookies; what about compared to the rest of the league? Another thing to look at is Pro Bowl appearances. Since Pro Bowl spots are allotted by position and not limited to rookies, we would expect that, if rookie impact were indeed the same across positions, success would be distributed more evenly.

Since 2000, 29 rookies have played in the Pro Bowl. Only six positions had multiple representatives in this time period. Running backs, which make up 68 percent of all Offensive Rookie of the Year awards, only had two – Chris Johnson in 2008 and Adrian Peterson in 2007. Meanwhile, the offensive line had four representatives, including three at the all-important position of left tackle. Perhaps the award system should be revised to reflect actual impact, but that’s a debate for a different time.

On the defensive side, linebackers had seven representatives (four on the outside, three on the inside), perhaps justifying their disproportionate share of Defensive ROY awards and establishing linebacker as the most readily impactful position on the defense. The most common position for rookies on Pro Bowl rosters, however, is return man: rookies have claimed eight of the 24 available spots since 2000.

But these awards just represent the best the league has to offer each year. A better statistic for measuring cumulative impact – independent of season – is Approximate Value (AV), from Pro Football Reference. The methodology is explained here, but AV is a way of trying to equalize players’ contributions across positions.

First, let’s limit the sample to first- and second-round picks. No matter what coaches and media members say, no one really expects an immediate impact from later-round picks.

Since the AFL-NFL merger in 1970, the rookie season that measured best in terms of AV belonged to cornerback/return man Patrick Peterson of the Arizona Cardinals in 2011. Peterson’s four punt return touchdowns and two interceptions gave him an AV of 25. For reference, the highest AV total Peyton Manning ever accumulated was 21, in 2004.

The distribution of the top 100 rookie seasons since 1970 in terms of AV is as follows:

| Position | Seasons in top 100 | Best season (AV) – year |
|---|---|---|
| Running Back | 40 | Edgerrin James (21) – 1999 |
| Linebacker | 16 | Lawrence Taylor (17) – 1981 |
| Wide Receiver | 13 | Randy Moss (17) – 1998 |
| Quarterback | 9 | Cam Newton (19) – 2011 |
| Cornerback | 5 | Patrick Peterson (25) – 2011 |
| Safety | 5 | Ronnie Lott (18) – 1981 |
| Left Tackle | 5 | Ryan Clady (13) – 2008 |
| Defensive End | 3 | Jevon Kearse (16) – 1999 |
| Tight End | 3 | Charle Young (13) – 1973 |
| Defensive Tackle | 1 | Ndamukong Suh (15) – 2010 |
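One way to sketch the tally behind this table, assuming a list of rookie seasons that records position, year, draft round, and AV. The `RookieSeason` record and the sample rows are hypothetical. The real AV values come from Pro Football Reference.

```python
from collections import Counter
from typing import NamedTuple


class RookieSeason(NamedTuple):
    """Hypothetical record for one rookie season (fields assumed, not PFR's schema)."""
    player: str
    position: str
    year: int
    draft_round: int
    av: int


def top_rookie_distribution(seasons, n=100, max_round=2, since=1970):
    """Count positions among the top-n rookie seasons by AV and find each position's best."""
    eligible = [s for s in seasons if s.draft_round <= max_round and s.year >= since]
    top = sorted(eligible, key=lambda s: s.av, reverse=True)[:n]
    counts = Counter(s.position for s in top)
    best = {}
    for s in top:  # sorted by AV, so the first season seen per position is its best
        best.setdefault(s.position, s)
    return counts, best


seasons = [  # made-up rows for illustration only
    RookieSeason("Player A", "RB", 1999, 1, 21),
    RookieSeason("Player B", "LB", 1981, 1, 17),
    RookieSeason("Player C", "RB", 2003, 2, 12),
    RookieSeason("Player D", "QB", 2011, 3, 19),  # excluded: third-round pick
    RookieSeason("Player E", "WR", 1968, 1, 15),  # excluded: before 1970
]
counts, best = top_rookie_distribution(seasons, n=3)
print(counts)
print({pos: s.player for pos, s in best.items()})
```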

What is it about running backs and linebackers that makes them so much more effective as rookies than players at other positions?

There’s probably a pretty simple answer: both positions rely heavily on instincts and reaction time. Unlike a quarterback, an offensive lineman, or a wide receiver, a running back does not see his playbook expand much in the transition to the NFL. Certainly adjustments need to be made (new blocking assignments, an increased role catching passes out of the backfield, etc.), but by and large running backs rely on good blocking up front, an ability to quickly assess a situation and react to it, and their own physical gifts. Yes, NFL defenses will be better prepared and better conditioned than most they faced in college, but key attributes for running backs, such as speed and agility, tend to peak early in their careers.

Similarly, the linebacker position relies on an ability to react and hit the hole. The factors that go into creating that hole may be more complex, but the reaction time is still the same. Whether it’s diagnosing a blocking scheme or adjusting to play action, linebackers must rely on their instincts more than almost any other position, and while experience will certainly fine-tune those instincts, for the elite young linebacker they are already there.

Posted in Uncategorized | 8 Comments

Lakers Resurgence: New Team or Regression to the Mean?

By Julian Ryan

The Lakers have risen from the grave. Seemingly destined for much of the season to hand their lottery pick over to the Suns, the team has turned its season around and is now furiously battling it out with the Jazz for the final playoff spot in the West. (Editor’s Note: This article was written before Kobe’s Achilles injury, hence the more optimistic tone. The Lakers should still make the playoffs, but their outlook once there is pretty bleak without their star player.)

LA started 17-25, a far cry from what was expected of a superstar-filled team, but has since rallied, going 25-12. Kobe Bryant has turned back the clock with a series of stellar performances, including a season-high 47 points against Portland on Wednesday, and was named Western Conference Player of the Month for February. The turnaround is all the more remarkable given the injuries to starters Metta World Peace and Steve Nash.

Is this improvement entirely the result of a team of All-Stars finally coming together and clicking as a unit? If the Lakers had objectively improved in the second half of the season, one would expect their average point differential to be improving as well. Point differential strips out some of the luck involved in winning and losing close games, so over time it is a good indicator of true performance and a useful complement to simple winning percentage.
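A minimal sketch of how the two series in the graph can be built from game results, assuming a list of final margins (team score minus opponent score) in game order. The margins below are made up for illustration. They are not the Lakers’ actual results.

```python
from itertools import accumulate


def running_series(margins):
    """Winning percentage and average point differential after each game.

    margins: final margin of each game in order (team score minus opponent score).
    NBA games cannot end in a tie, so a positive margin is a win.
    """
    wins = accumulate(1 if m > 0 else 0 for m in margins)
    total_margin = accumulate(margins)
    return [
        (w / games, t / games)
        for games, (w, t) in enumerate(zip(wins, total_margin), start=1)
    ]


# Made-up margins for illustration only.
for game, (win_pct, avg_diff) in enumerate(running_series([5, -10, 2, -3]), start=1):
    print(f"game {game}: win% {win_pct:.3f}, avg +/- {avg_diff:+.2f}")
```

Plotting the two values on separate axes gives the kind of chart described next. A stretch of narrow wins can lift the first series while a few blowout losses drag down the second.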

The graph below charts the Lakers’ winning percentage and point differential over the course of the season. The blue line shows changes in winning percentage by game and corresponds to the y-axis on the right, while the red line shows changes in point differential by game and corresponds to the y-axis on the left.

[Graph: Lakers winning percentage (blue, right axis) and average point differential (red, left axis) by game, 2012-13 season]

Looking at the Lakers’ average point differential (+/-) at each point of the season, there does not seem to be a discernible increase during their recent run: their winning percentage has been rising while their +/- has declined. It appears that, to some extent, the Lakers’ luck has turned and they are simply regressing to the mean. They were 3-7 in their first ten games decided by five points or fewer, and have gone 9-2 in such games since. Their rising winning percentage more likely reflects the team’s true quality this year, no longer masked by close losses, than any actual improvement in the team.
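The close-game swing is simple arithmetic. The sketch below makes it explicit using the two records quoted above.

```python
from fractions import Fraction


def win_pct(wins, losses):
    """Exact winning percentage as a fraction."""
    return Fraction(wins, wins + losses)


early = win_pct(3, 7)            # first ten games decided by five points or fewer
later = win_pct(9, 2)            # such games since then
overall = win_pct(3 + 9, 7 + 2)  # all close games so far

for label, pct in [("first ten", early), ("since", later), ("overall", overall)]:
    print(f"{label}: {float(pct):.3f}")
```

This prints .300 for the first ten close games, .818 since, and .571 (12-9) overall.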

None of this takes away from the team’s achievements. The big three of Bryant, Howard and Gasol have all improved their games and deserve credit for dragging the Lakers back into contention. However, the data suggest that regression to the mean, through better luck in close games, may also be at play.

Posted in NBA Basketball | Tagged | 6 Comments

Rethinking the Greatest Performances in Masters History

By Andrew Mooney

Cue Jim Nantz and the tinkling piano: the Masters is officially under way. The field of 93 will attempt to duplicate the legendary green jacket-winning performances of years past, turned in by a “who’s who” of golfing royalty: Tiger Woods, Jack Nicklaus, Arnold Palmer, Phil Mickelson, Seve Ballesteros.

Though the competitors always play the same course, Augusta National has undergone many significant changes since it hosted the first Masters tournament in 1934. As a result, it’s difficult to compare performances at the Masters across the years. Simply tabulating raw scores is not the most accurate way to do it; in different years, the field faces different temperatures, winds, and moisture, not to mention the alterations to the course itself and the equipment in players’ bags.

In a piece written in 2011, Grantland’s Bill Barnwell proposed another method for evaluating golf scores using a statistical measure called a Z-score. Barnwell’s argument was that a golfer’s actual raw score was less important than how that score compared to the rest of the field.

Take two victories, by Golfer A and Golfer B, each of whom shoots 11-under-par while the runner-up shoots 6-under-par. They look equal, but the rest of the field’s performance matters. Let’s say the third-place finisher in Golfer A’s tournament shoots 5-under-par, while the third-placed duffer in Golfer B’s tourney shoots 1-over-par. Golfer B has clearly outperformed the rest of the field by a wider margin than Golfer A, but raw margin of victory fails to capture that detail.

A Z-score measures how many standard deviations a particular observation lies from the mean of all the observations. Applied to golf scores, a Z-score tells us how far a player’s score was from the field average while also accounting for how spread out the field’s scores were. Barnwell applied this method to the four major tournaments and recorded the top 20 performances since 1960, three of which came at the Masters; you can read his list here.
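In symbols, for a single year’s field of n scores x_1 through x_n, the Z-score of player i’s score is given below. Neither Barnwell’s piece nor this post, as excerpted, says whether the population or the sample standard deviation was used. The population form is assumed here.

$$
z_i = \frac{x_i - \bar{x}}{\sigma}, \qquad
\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{j=1}^{n} \left(x_j - \bar{x}\right)^2}
$$

A winning score far below the mean of a field whose other scores are bunched together earns a large negative z_i.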

As the first round kicks off today, I decided to examine the greatest performances in Masters history in more depth. I gathered the 72-hole scores for every Masters tournament, pulling the data from golfobserver.com all the way back to 1934. I then converted each year’s raw scores into Z-scores so I would have a uniform standard for comparison across years. Since a low score in golf is good, a more negative Z-score reflects a better performance. The 20 best scores are in the table below.

masters.png

Many people point to Tiger Woods’ 1997 Masters as the most impressive 72 holes Augusta has ever seen, significant not only for his otherworldly display of golf, but for the fact that it came at age 21, it was his first major win, and it was the first time a non-white player had won the tournament.

Though Woods’ score of 270 remains a tournament record, his performance was slightly less dominant relative to the rest of the field than Jack Nicklaus’ 271 in 1965. Woods beat second-place Tom Kite by 12 strokes in 1997, three more than Nicklaus’ margin over Arnold Palmer. But only eight other players finished under par in the 1965 tournament, compared to 15 in 1997. Nicklaus’ lower Z-score suggests that his 17-under came in a more difficult overall course environment, and thus showed more dominance than Tiger’s 18-under.
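To see how the same winning score can earn different Z-scores, here is a minimal sketch of the per-year conversion. It assumes each year’s results are available as (player, score) pairs and uses the population standard deviation. The two fields are hypothetical, built to mirror the Golfer A and Golfer B example. Scores are relative to par, which shifts every score in a year by the same amount and leaves the Z-scores unchanged.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standardize one year's scores: (score - field mean) / field standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation (an assumption)
    return [(s - mu) / sigma for s in scores]


def best_performances(results_by_year, top_n=20):
    """Rank every player-year by Z-score; more negative means more dominant."""
    rows = []
    for year, results in results_by_year.items():
        zs = z_scores([score for _, score in results])
        rows.extend((z, year, player) for (player, _), z in zip(results, zs))
    rows.sort()  # ascending, so the most negative (best) Z-scores come first
    return rows[:top_n]


# Hypothetical fields (scores to par): the top three match the Golfer A/B example.
fields = {
    "A": [("Golfer A", -11), ("P2", -6), ("P3", -5), ("P4", -4),
          ("P5", -3), ("P6", -2), ("P7", 0), ("P8", 1)],
    "B": [("Golfer B", -11), ("P2", -6), ("P3", 1), ("P4", 1),
          ("P5", 2), ("P6", 2), ("P7", 3), ("P8", 3)],
}
for z, year, player in best_performances(fields, top_n=2):
    print(f"{player} (tournament {year}): z = {z:.3f}")
```

With these made-up fields, Golfer B’s identical 11-under earns the more negative Z-score, because the rest of B’s field finished farther back.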

A couple of other takeaways from this list:

  • Woods, Nicklaus, Raymond Floyd, and Nick Faldo are the only golfers to appear on this list more than once.
  • In 2005, Chris DiMarco submitted one of the best-ever Masters performances and finished second, falling to Woods in a playoff, much to the relief of the Nike marketing department.
  • Doug Ford and Tommy Aaron share the worst to-par score on this list, five-under, but in Ford’s case only two other players in the tournament finished under par.
  • In gathering this data, I was also able to uncover the worst performances in Masters history (for players that made the 36-hole cut). The most shameful 72-hole score at Augusta came in 1940, when Chick Evans posted a 43-over, shooting 82-84-86-79 for a total of 331 (Z-score: 3.627). In second-to-last place is the aforementioned Tommy Aaron’s 2000 Masters, which came 27 years after he donned the green jacket. A score of 25-over can be forgiven for a man of 63, but it still yielded a ghastly Z-score of 3.346.
Posted in Uncategorized | 2 Comments

Cinderella or Fairytale: March Madness v. the FA Cup

By Julian Ryan

In America, March is the month of the Cinderella story for even the most casual of college basketball followers. Millions of fans will turn on their televisions in the hope of watching the next Butler make a run at winning it all.

England has its own equivalent of the Tournament: the FA Cup. There, teams that progress further than they reasonably should are not called Cinderellas but rather ‘fairytale teams’. While Americans will staunchly defend the unpredictability of their fabled Dance, which competition is more likely to produce the David who smites Goliath? Which is greater: the madness of the tourney, or the magic of the cup?

To be clear, I am not trying to determine which competition has more upsets. Since talent in professional English soccer is much more heavily concentrated among the top teams than in college basketball, March Madness almost certainly has more underdogs winning individual matchups. However, the FA Cup could still produce more fairytale teams, since bad teams can sneak their way into the latter stages. The question is which tournament has the greater probability of producing a magical underdog story in any given year.

The FA Cup’s structure is what generates its crazy unpredictability. The FA Cup runs parallel to the regular season of league play and features every professional team in the country along with a large number of non-professional teams. A long series of qualifying rounds lets the weaker teams knock each other out early in the season, and the nation only starts paying attention when the “third round” is played in the first week of January. The third round is when the twenty Premier League teams join the forty-four lower-league teams that have made it that far. From then on, a single-elimination format proceeds, and at this point the FA Cup begins to resemble the NCAA basketball tournament.

Unlike the Big Dance, however, there is no seeding in the FA Cup. Teams are drawn out of a hat at each round and which team is home or away is also completely random. This breeds chaos, as in each round good teams knock each other out and bad teams keep each other in the competition. And so the fairytale story is born. Soccer also has far fewer scoring plays than basketball, which adds to the randomness and increases the possibility of an upset.

To determine which effect is greater (no seeding and fewer scoring plays, or greater parity and amateur players), we need a ranking system for comparing Cinderellas from each tournament. Because the FA Cup runs parallel to league play and does not substantially interfere with it, the best ranking system in English soccer is end-of-season league position, which is a fairly good measure of a team’s overall quality over the season. So that every team has a rank, we can treat the leagues as one big table rather than several divisions. For example, under this system a team that finished tenth in the Championship has a ranking of thirtieth, behind the twenty Premier League teams and the nine Championship teams above it.
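A minimal sketch of this combined ranking, assuming the four-division Football League pyramid of the period (20 Premier League clubs, then three divisions of 24). The post does not say how non-league clubs were ranked, so they are not covered here.

```python
# English league pyramid, top to bottom, with the number of clubs per division.
# The names are the post-2004 ones; earlier seasons used First/Second/Third Division.
DIVISIONS = [
    ("Premier League", 20),
    ("Championship", 24),
    ("League One", 24),
    ("League Two", 24),
]


def overall_rank(division, position):
    """Rank a club across the whole pyramid from its final league position."""
    clubs_above = 0
    for name, size in DIVISIONS:
        if name == division:
            if not 1 <= position <= size:
                raise ValueError(f"{division} has only {size} clubs")
            return clubs_above + position
        clubs_above += size
    raise ValueError(f"unknown division: {division}")


print(overall_rank("Championship", 10))  # 20 Premier League clubs + 10 = 30
```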

We now need something similar for college basketball. Using regular-season record as the ranking would be nonsensical, since it does not account for strength of schedule at all. A reasonable analogue is a team’s RPI ranking from before the tournament. If we used end-of-season RPI, Cinderella teams would receive a boosted ranking from having just won several games against top opposition; taking rankings from before the tournament eliminates this bias. As our own John Ezekowitz has noted, RPI is not that good a predictor of tourney success compared to the Kenpom rankings or John’s own Survival system, but pre-tournament Kenpom rankings were not available, so RPI will have to suffice.

So we now have an equivalency: a team finishing tenth in the Championship making the semi-finals is as big a Cinderella as the team ranked thirtieth in the pre-tournament RPI rankings making the final four. We are now in a position to compare the relative fairytale status of the two competitions.

| 2001-2012 | Average Final Four / semi-final rank | Average Elite Eight / quarter-final rank | Standard deviation, Final Four / semi-finalists | Standard deviation, Elite Eight / quarter-finalists |
|---|---|---|---|---|
| NCAA Tournament | 9.79 | 11.29 | 9.93 | 11.48 |
| FA Cup | 10.30 | 14.11 | 11.71 | 11.81 |

The table shows that the average rank of teams reaching the latter stages of the NCAA tournament is better than that of teams reaching the corresponding FA Cup rounds. The standard deviation is also lower for NCAA teams, which suggests that more poorly ranked teams go on runs in the FA Cup.

These numbers are only a rough guide, though. Let us define three categories of fairytale team and see which is more likely in each competition.

The first group is the “Regular Cinderella”: a team from outside the top 20 making it all the way to the Final Four. Calling such teams ‘regular’ is not meant to diminish their accomplishments, which are anything but ordinary; it simply reflects the rough standard a fan would expect a team to meet before calling it a fairytale story. Recent examples include VCU and Butler in the 2011 NCAA tournament, or West Brom, Barnsley and Cardiff City in the crazy 2008 FA Cup.

The second group is the “Cinderella Cut Short”: a team from outside the top 20 making it to the Elite Eight. This aims to capture teams that might have gone further but ran into an eventual champion or similar opponent in that round. Examples include Florida in 2012 and Davidson in 2008, or Leicester’s run in 2012 and Reading’s runs in 2010 and 2011 in the FA Cup.

The final group is the “Ultimate Cinderella”: a team from outside the top 40 making it to the Elite Eight or further. In the past 12 years of the tournament, only Temple in 2001, Missouri in 2002 and VCU in 2011 (all the more impressive because VCU won again to progress even further) have accomplished this feat. Over the same period in the FA Cup, Tranmere did it twice, in 2001 and 2004, and Wycombe reached the semi-finals in 2001.

 

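| Category | Probability of Cinderella: NCAA Tournament | Probability of Cinderella: FA Cup | Average rank of Cinderella: NCAA Tournament | Average rank of Cinderella: FA Cup |
|---|---|---|---|---|
| Regular Cinderella | 12.5% | 16.7% | 31.2 | 32.1 |
| Cinderella Cut Short | 17.7% | 22.9% | 32.7 | 31.6 |
| Ultimate Cinderella | 3.1% | 3.1% | 50 | 51 |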

From the data, the FA Cup appears more likely to produce fairytales. Both competitions produce Cinderellas of roughly equal rank, but the FA Cup produces more of them; only the “Ultimate Cinderella” is equally likely in both.
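The probabilities in the table are consistent with a simple reading: qualifying teams divided by the Final Four (semi-final) or Elite Eight (quarter-final) places over the 12 seasons, which comes to 48 and 96 places. The counts below are back-calculated from those percentages. The three Ultimate Cinderellas per competition match the teams named earlier. The other counts are an inference, not the author’s raw data.

```python
SEASONS = 12
FINAL_FOUR_PLACES = 4 * SEASONS    # 48 Final Four / FA Cup semi-final places
ELITE_EIGHT_PLACES = 8 * SEASONS   # 96 Elite Eight / FA Cup quarter-final places

# category: (NCAA count, FA Cup count, places); counts inferred from the table
CINDERELLAS = {
    "Regular Cinderella": (6, 8, FINAL_FOUR_PLACES),
    "Cinderella Cut Short": (17, 22, ELITE_EIGHT_PLACES),
    "Ultimate Cinderella": (3, 3, ELITE_EIGHT_PLACES),
}


def cinderella_rates(rows):
    """Share of late-round places taken by Cinderellas in each competition."""
    return {cat: (ncaa / places, fa / places) for cat, (ncaa, fa, places) in rows.items()}


for category, (ncaa, fa) in cinderella_rates(CINDERELLAS).items():
    print(f"{category}: NCAA {ncaa:.1%}, FA Cup {fa:.1%}")
```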

So in comparing these two tournaments, it appears that the FA Cup’s lack of seeding matters more than college basketball’s greater parity in producing wild runs. To illustrate, take this year as an example. Wichita State had to beat both Gonzaga and Ohio State (and a good Pitt team) to make its Final Four dream a reality. Meanwhile, Millwall has beaten Preston North End, Aston Villa, Luton Town and Blackburn to reach the semi-finals. Of those teams, only Villa are in the Premier League, and they currently sit in the relegation zone in 18th.

If one of your friends makes a comment about the craziness of the Big Dance in advance of tonight’s final, you might want to ask them if they’ve heard of the FA Cup.

Posted in NCAA Basketball, Soccer | Tagged , , , | Leave a comment

Links We Like: Week of April 1

The week’s best in sports (and sort-of sports) analysis:

Chris Boyle gives a primer on Fenwick, an advanced hockey stat, and shows how central it is to success in both the playoffs and the regular season.

Heading into the Final Four, Nate Silver gives Louisville the best chance of raising the trophy in Atlanta.

Dave Cameron at FanGraphs examines whether Cuban pitchers give the Miami Marlins an attendance boost.

At ESPN Insider, Kevin Pelton looks at how Brittney Griner compares physically to NBA and WNBA players.

Deadspin breaks down the epidemiology of the Now That’s What I Call Music! series.

 

Posted in Uncategorized | Leave a comment

Links We Like: 3/29/13

It’s been a while since our last set of links, but we’re back in force to bring you the best reads from around the web for your weekend reading.

– Jeff Sullivan analyzes Justin Verlander’s $180 million contract at FanGraphs
– What does March Madness fandom look like in a map? Michael Bailey’s got the answers.
– Chase Stuart expands on his draft analysis and examines the time value of draft picks.
– Harvard’s own Kirk Goldsberry shows LeBron’s evolution over the past few seasons.
– To get a view of what professional NBA analytics looks like, check out Zach Lowe’s great piece.
– Muthu Alagappan re-examines the 10 positions in the NBA.

Posted in March Madness, MLB Baseball, NBA Basketball, NCAA Basketball, NFL Football, Weekly Links | Tagged | 1 Comment

So Far, 2013 Among History’s Maddest Marches

By Andrew Mooney

An already unpredictable season of college basketball got a little bit wackier last weekend. After defeating Georgetown, 15-seed Florida Gulf Coast has received the majority of the Cinderella-centric media coverage, and rightly so, but let’s not forget about the other two double-digit seeds in the Sweet Sixteen: 12-seed Oregon and 13-seed La Salle.

The NCAA tournament hasn’t lacked for madness in recent years; this is the fourth consecutive year at least three double-digit seeds have survived into the second weekend. 2012 saw the 13-seed Ohio Bobcats advance to the round of 16. The year before featured a matchup between a No. 8 (Butler) and a No. 11 (VCU) in the Final Four, and Cornell nearly took out top-seeded Kentucky in 2010.

But, with the first ever 15-seed in the Sweet Sixteen this year, is it safe to say 2013 has been the craziest of the bunch? I attempted to quantify just how wild the first weekend of each tournament has been since the field expanded to 64 teams in 1985 to see how this one compares to tournaments past.

To start, I summed the seeds of all the teams that made up each year’s Sweet Sixteen, then normalized those sums into an index from 0 to 100, with 0 representing the chalkiest possible Sweet Sixteen (the 1- through 4-seeds in all four regions) and 100 representing the “maddest” Sweet Sixteen we’ve seen so far: 1986, when the average remaining team’s seed was 5.56. The Madness ratings for each year are graphed below.

[Graph: Madness rating of each year’s Sweet Sixteen since 1985]

By this measure, 2013 has indeed been a particularly mad year—only three other tournaments (1986, 1990, and 2000) rank ahead of it. In 1990, only one two-seed made the Sweet Sixteen, and in 2000, two one-seeds (Arizona and Stanford) bowed out early to a pair of eight-seeds (Wisconsin and North Carolina). In addition to the aforementioned double-digit seeds still alive in this year’s tournament, No. 9 Wichita State eliminated No. 1 Gonzaga and now squares off with La Salle, ensuring the presence of at least one big underdog in the Elite Eight.
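A minimal sketch of the Madness index, assuming a straight linear rescaling between the two anchors. The chalk anchor is 4 × (1 + 2 + 3 + 4) = 40. The 1986 anchor of 89 is inferred from the quoted average seed: 89 / 16 = 5.5625, which rounds to 5.56.

```python
CHALK_SUM = 4 * (1 + 2 + 3 + 4)  # 40: seeds 1-4 survive in all four regions
MADDEST_SUM = 89                 # 1986 (inferred): 89 / 16 = 5.5625 average seed


def madness_index(seeds, chalk=CHALK_SUM, maddest=MADDEST_SUM):
    """Linearly rescale the sum of Sweet Sixteen seeds from 0 (chalk) to 100 (1986)."""
    if len(seeds) != 16:
        raise ValueError("expected the seeds of exactly 16 teams")
    return 100 * (sum(seeds) - chalk) / (maddest - chalk)


print(madness_index([1, 2, 3, 4] * 4))  # 0.0, the chalkiest possible field
```

Because the upper anchor is simply the maddest year observed so far, a future Sweet Sixteen whose seeds sum to more than 89 would score above 100 under this scheme.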

In examining the graph, there doesn’t seem to be much of a chronological pattern to the Madness. The recent stretch of craziness was preceded by the most boring year in history: in 2009, only one team seeded No. 6 or worse reached the round of sixteen. And who could forget the snoozefest that was 1989, when every No. 1 and No. 2 seed survived its first two contests?

This March, however, the cause of the little guy is being abundantly supported. For another week, we can revel in the improbable and urge Dunk City or La Salle deeper into the tournament. If just for images like this, let’s hope these folks keep dancin’.

Posted in Uncategorized | 8 Comments

Blackhawks vs. Heat: Which Streak is More Unlikely?

By William Marks

As of this writing, the Miami Heat have won 22 consecutive games. Meanwhile, over in the NHL, the Chicago Blackhawks earned at least one point in 24 consecutive games from January 19th through March 6th. Although the two streaks are about the same length, the Blackhawks’ run appears more impressive, since at the time they had earned a point in every game they had played this season. (Teams earn two points for a win and one point for an overtime loss.)

To determine which streak was actually harder to pull off, I looked at the money lines for each team’s games over the course of its streak and converted each money line into an implied probability of winning. Assuming each game is an independent event, I calculated the probability of the Heat winning all 22 games (according to Vegas odds) to be 0.2344174%. Then, treating each game in which the Blackhawks earned a point as a win, I found the probability of their streak to be 0.0000597915%.

By the odds, the Blackhawks’ streak looks far more unlikely in hindsight: the Heat’s run was roughly 3,900 times as likely. Given the relative randomness of hockey, the Heat would have to extend their current streak by a significant margin to approach a streak as unlikely as the Blackhawks’.
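A minimal sketch of the calculation described above, assuming standard American money lines. The post does not say whether the bookmaker’s margin (vig) was removed before multiplying. This version leaves it in, which slightly inflates each implied probability. The lines below are made up. They are not the actual Heat or Blackhawks odds.

```python
import math


def implied_probability(money_line):
    """Convert an American money line into an implied win probability.

    -200 means risk 200 to win 100; +150 means risk 100 to win 150.
    The bookmaker's margin (vig) is not removed here.
    """
    if money_line < 0:
        return -money_line / (-money_line + 100)
    return 100 / (money_line + 100)


def streak_probability(money_lines):
    """Probability of winning every game, assuming games are independent."""
    return math.prod(implied_probability(ml) for ml in money_lines)


# Hypothetical lines for a three-game stretch (not the actual Heat or Blackhawks odds).
lines = [-250, -150, 120]
print(f"{streak_probability(lines):.4%}")
```

Applied to 22 or 24 real lines, the product shrinks quickly even when every game is individually favored. That is why both streak probabilities come out well below one percent.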

Screen shot 2013-03-21 at 12.45.44 AM

Posted in NBA Basketball, NHL Hockey | Tagged , , | 3 Comments

Survival of the Fittest: Predicting the 2013 NCAA Tournament

The goal of every team in the NCAA tournament is to survive and advance. And, if you want to win your March Madness pool, your goal should be to predict which teams will do just that.

Most prediction systems view the NCAA tournament as an extension of the regular season. While that may be the best way to pick the most games in the tournament correctly, I do not believe it is the way to predict the most important games correctly. Correctly selecting a team to make the Championship Game can more than make up for a relatively poor first round.

That is why, building on Ken Pomeroy’s great work, I have spent the past two years publishing a model of the NCAA tournament based on Survival Analysis. Academic researchers use Survival Analysis to determine whether new pharmaceutical drugs or treatments are effective. I co-opted the framework to try to discover something truly important: the path to bragging rights over your friends.

Posted in NCAA Basketball | 43 Comments