Deflate-gate. Whether you care about sports or not, a lot of people have a lot to say about the recent New England Patriots scandal over deflated balls in the Conference Championship. Personally, I follow football very little, preferring to devote my sports-watching time to the amazing sport of Ultimate Frisbee. But since Frisbee scandals rarely make their way to the front page, I have instead taken the time to scrape some NFL data from the web and run my own analysis on whether the New England Patriots were in fact benefiting from improperly inflated footballs. Now I have to say, I'm not trying to make ANY claims as to whether the Patriots broke the rules of the National Football League. The goal of this post is simply to teach some data analysis and have a little fun with interesting data.
In a recent article, which I will be following for some of my analysis, the Patriots' plays per fumble were put under a microscope before and after the 2007 rules change which allowed away teams to supply their own footballs to games. In order to analyze this, we first have to get NFL data. Let's take data from 2000-2014 and see what trends we can find. Here I use Python to scrape the data from the web (you can also download it, but I wanted to walk through how to scrape data and have the code available so others can validate it). All of my code is available here, including the R code which does the analysis.
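To give a feel for the scraping step, here is a minimal sketch using only Python's standard library. It is not the scraper from my repository: the `TableParser` class, the `sample_html` snippet, and the team names and numbers in it are all placeholders made up for illustration, standing in for whatever stats page you actually download.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Placeholder HTML standing in for a downloaded stats page.
# Team names and numbers are made up, NOT real NFL data.
sample_html = """
<table>
  <tr><th>Team</th><th>Year</th><th>Plays</th><th>Fumbles</th></tr>
  <tr><td>Team A</td><td>2006</td><td>1000</td><td>25</td></tr>
  <tr><td>Team B</td><td>2006</td><td>1100</td><td>20</td></tr>
</table>
"""

parser = TableParser()
parser.feed(sample_html)
header, *records = parser.rows

# Plays per fumble for each (team, year) pair.
plays_per_fumble = {
    (team, int(year)): int(plays) / int(fumbles)
    for team, year, plays, fumbles in records
}
print(plays_per_fumble)
```

In a real run you would replace `sample_html` with the text of a fetched page (for example via `urllib.request.urlopen`) and adjust the column handling to match that page's table layout.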
A powerful method of determining changes in trends over time is what is referred to as a difference in differences estimation. In this case, it refers to taking the difference among the Patriots in plays per fumble before and after 2007 and then subtracting the difference among the rest of the league before and after 2007. Yes, that seems confusing so let's make it simple. The Pats averaged 42.96 plays per fumble before the rules change in 2007. After, they averaged 72.4. The rest of the league was at 42.8 before and 48.1 after. So, the difference in differences comes out to (72.4 - 42.96) - (48.1 - 42.8), or about 24.1 with these rounded figures (the regression puts it at 24.2). Measured against the Patriots' pre-2007 average, that is a jump of more than 50% in plays per fumble! Let's view a graph of what that looks like.
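The arithmetic is small enough to check by hand, but here it is as a short Python sketch. The four averages are the rounded figures quoted above; the `diff_in_diff` helper is just a name I made up for this illustration, and measuring the relative jump against the Patriots' pre-2007 average is my assumption about the right baseline.

```python
# Plays per fumble, rounded, as quoted in the text.
patriots_before, patriots_after = 42.96, 72.4
league_before, league_after = 42.8, 48.1


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    return (treated_after - treated_before) - (control_after - control_before)


did = diff_in_diff(patriots_before, patriots_after, league_before, league_after)

# Assumption: express the jump relative to the Patriots' pre-2007 average.
relative_jump = did / patriots_before

print(f"Patriots change: {patriots_after - patriots_before:.2f}")
print(f"League change:   {league_after - league_before:.2f}")
print(f"Diff-in-diff:    {did:.2f}")
print(f"Relative jump:   {relative_jump:.0%}")
```

With the rounded inputs this comes out at 24.14, a hair under the 24.2 the regression reports, which is what you would expect if the regression was run on unrounded season-level data.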
For the statisticians out there, here's a print out of the associated regression:
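If you want to see how a regression like this is set up, here is a minimal Python sketch (my actual analysis is in R, so treat this as an illustration rather than the original code). The eight rows are invented so that the four group averages match the rounded figures quoted earlier; the `team_after` interaction term is the difference in differences estimate. It prints coefficients only, with no standard errors or p-values.

```python
import numpy as np

# Hypothetical team-season rows: (plays_per_fumble, is_patriots, is_after_2007).
# The values are invented so the four group means match the rounded figures
# quoted in the text; they are NOT real season data.
rows = [
    (40.96, 1, 0), (44.96, 1, 0),  # Patriots, before: mean 42.96
    (70.4, 1, 1), (74.4, 1, 1),    # Patriots, after: mean 72.4
    (41.8, 0, 0), (43.8, 0, 0),    # rest of league, before: mean 42.8
    (47.1, 0, 1), (49.1, 0, 1),    # rest of league, after: mean 48.1
]
y = np.array([r[0] for r in rows])
team = np.array([r[1] for r in rows], dtype=float)
after = np.array([r[2] for r in rows], dtype=float)

# Design matrix: intercept, team dummy, after dummy, and their interaction.
X = np.column_stack([np.ones_like(y), team, after, team * after])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

names = ["intercept", "team", "after", "team_after"]
for name, value in zip(names, coef):
    print(f"{name:>10}: {value:6.2f}")
```

Because the model has one coefficient per group, the `team_after` coefficient equals the hand-computed difference in differences of the group means. Getting significance levels takes a proper regression package run on the real season-level data.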
As we can see, this result comes back as significant at a very high level (estimate 24.2, p < .01). What that means is that if the Patriots were really no different from the rest of the league, we would expect to see a jump this large by chance less than 1% of the time (about 0.47%, which is where a 99.526% confidence figure comes from). So, maybe the Pats ARE actually cheaters... Hold up, Colts fans and just about everybody else. There is another team that has similar numbers when this style of regression is run. If I run the numbers for all of the teams in the league and only print out those with high significance, I also find that the Atlanta Falcons have a suspiciously large jump in plays per fumble (refer to "team_after").
So maybe it's a case of multiple cheaters! We should just expel the lot of them and teach these kids to behave, right?! But let's do a little inspection into the Falcons before we rush in to judge their ridiculously outrageous statistics. While the Patriots had the same type of QB for the entire period from 2000-2014, the Falcons moved on from Mike Vick right around 2007, coincidentally, and Matt Ryan took over soon after (2008). Mr. Vick happens to be a large outlier in the data. I mean, the guy had 16 fumbles in 2004. That's one per game! So, let's go ahead and re-run the numbers accounting for that. All I'm going to do is replace Vick's fumbles with Matt Ryan's yearly average (4.4), and here is the new output:
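The Vick adjustment is a simple substitution, sketched below in Python. Only the 16 fumbles in 2004 and the 4.4 average come from the text; the other seasons' numbers and the `replace_outlier_qb` helper are placeholders I made up to show the mechanics.

```python
RYAN_AVG_FUMBLES = 4.4  # Matt Ryan's yearly average, as quoted in the text

# Hypothetical Falcons QB fumble counts by season. Only the 2004 figure (16)
# comes from the text; the other numbers are placeholders for illustration.
falcons_qb_fumbles = {
    2003: {"qb": "Vick", "fumbles": 9},
    2004: {"qb": "Vick", "fumbles": 16},
    2005: {"qb": "Vick", "fumbles": 11},
    2008: {"qb": "Ryan", "fumbles": 6},
}


def replace_outlier_qb(seasons, qb_name, replacement):
    """Return a copy with the named QB's fumbles swapped for a fixed value."""
    return {
        year: {**row, "fumbles": replacement} if row["qb"] == qb_name else dict(row)
        for year, row in seasons.items()
    }


adjusted = replace_outlier_qb(falcons_qb_fumbles, "Vick", RYAN_AVG_FUMBLES)
for year, row in sorted(adjusted.items()):
    print(year, row["qb"], row["fumbles"])
```

The helper returns a new dictionary rather than editing in place, so the original numbers stay around for comparing the regression before and after the adjustment.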
Wow! The results almost completely disappear now. Once again, we are back to thinking that there is a lone shooter. But skeptics might say that it's unfair to compare the Patriots to the entirety of the rest of the league, which is what this analysis currently does. That's a very astute point, skeptics. Instead, let's use a technique called "synthetic controls". Sounds confusing, but in reality it's just taking a weighted average of the teams which are closest to the Pats leading up to the 2007 rules change (we will call that "synthetic Patriots") and then comparing what that team does to what the Patriots do. Below is a print out of teams and how much weight they get in the creation of the "synthetic Patriots".
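To make the weighting idea concrete, here is a rough Python sketch of how such weights can be found: pick non-negative weights that sum to 1 so that the weighted average of the other teams tracks the target team before the rules change. This is not the routine my R code uses; the Frank-Wolfe optimizer below is a simple stand-in I chose, and the `donors` numbers and `true_w` weights are made up for a sanity check.

```python
import numpy as np


def synthetic_weights(donors, target, n_iter=5000):
    """Find non-negative weights summing to 1 so that donors @ w tracks target.

    donors: (n_seasons, n_teams) pre-period values for candidate teams
    target: (n_seasons,) pre-period values for the team of interest
    Uses Frank-Wolfe steps, which keep the weights on the simplex.
    """
    n_teams = donors.shape[1]
    w = np.full(n_teams, 1.0 / n_teams)  # start from equal weights
    for _ in range(n_iter):
        residual = donors @ w - target
        grad = 2.0 * donors.T @ residual
        vertex = np.zeros(n_teams)
        vertex[np.argmin(grad)] = 1.0  # best single team to move toward
        direction = vertex - w
        step_fit = donors @ direction
        denom = step_fit @ step_fit
        if denom == 0.0:
            break
        # Exact line search for the squared-error loss, clipped to stay feasible.
        gamma = np.clip(-(residual @ step_fit) / denom, 0.0, 1.0)
        if gamma == 0.0:
            break
        w = w + gamma * direction
    return w


# Made-up pre-2007 plays per fumble for three hypothetical donor teams
# (rows are seasons, columns are teams). NOT real NFL data.
donors = np.array([
    [38.0, 52.0, 35.0],
    [45.0, 40.0, 37.0],
    [41.0, 48.0, 50.0],
    [50.0, 43.0, 42.0],
    [44.0, 55.0, 38.0],
    [39.0, 46.0, 49.0],
    [47.0, 41.0, 44.0],
])
true_w = np.array([0.5, 0.3, 0.2])  # weights used to build the fake target
target = donors @ true_w

w = synthetic_weights(donors, target)
print(np.round(w, 3))
```

Since the fake target is built from known weights, the recovered weights should land close to them. With real data the fit will not be perfect, and the weights simply describe the closest blend of teams available.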
So, here we can see that the Denver Broncos receive 47% of the weight and the Rams come in second at 24%. Now that we have a synthetically created team, let's see how they match up. Here is a graph of that:
Well, it looks like the Patriots are a little closer to the synthetic team than they were to the league as a whole. But they still outperform it in every year except 2013.
In conclusion, maybe the Patriots were fully aware of their actions and maybe they deserve some sort of punishment. I honestly don't care either way, but the data is interesting and I hope you could follow along and perhaps glean some information on how to go about your own sports research.

