Whilst American eyes were mainly turned towards the Super Bowl this weekend, the English Premier League produced one of the most extraordinary days of (association) football, or soccer, in domestic history. On Saturday, 41 goals were scored across just 8 matches, with Newcastle's spectacular four-goal comeback against Arsenal the more-than-worthy highlight.
Being both a football fan and a statistician, I couldn't resist doing a bit of data digging to see quite how remarkable the day's action was. I happen to have a dataset lying around with the result of every Premier League match from its inception in 1992 up to the end of the 2009/2010 season, and a quick look revealed that the average number of goals per match was 2.6. Every game on Saturday had more goals than this, and the day's overall average was just over 5 (41 goals in 8 matches, or about 5.1 per game), so we can already see how unusual an afternoon it was.
Data are no fun without a graph, so let's have a look at the distribution of total goals:
This gives us another insight into how unusual Saturday was. Two games saw 8 goals, and a quick look at this figure shows that such matches are pretty uncommon. On average, the Premier League sees a game with 8 or more goals once every 120 matches. With 380 matches in a season, we might expect only about 3 such games in an entire season, so 2 in a single day is remarkable.
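To put the "once every 120 matches" figure in context, here is a small sketch of the arithmetic. It assumes a 380-match season (20 teams, the format since 1995/96) and, more questionably, that every match independently has the same 1-in-120 chance of producing 8 or more goals. Under those assumptions, the chance of two or more such games in one 8-match day follows from the binomial distribution.

```python
from math import comb

# Assumptions for illustration, not taken from the underlying dataset:
RATE_8_PLUS = 1 / 120              # empirical rate of 8+ goal matches quoted in the text
MATCHES_PER_SEASON = 20 * 38 // 2  # 20 teams, 38 games each, 2 teams per match = 380


def expected_per_season(rate: float, matches: int = MATCHES_PER_SEASON) -> float:
    """Expected number of 8+ goal matches in a season of `matches` games."""
    return rate * matches


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial probability of at least k 'successes' in n independent trials."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


season = expected_per_season(RATE_8_PLUS)
two_in_a_day = prob_at_least(2, 8, RATE_8_PLUS)  # Saturday had 8 matches
print(f"Expected 8+ goal games per season: {season:.1f}")
print(f"P(at least 2 of 8 matches have 8+ goals): {two_in_a_day:.4f}")
```

The season figure comes out at a little over 3, and the chance of two such games in one 8-match day is a little under 0.2%, which is one way of quantifying "let alone 2 in just one day".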
Can we go a bit further than this? How unlikely was Saturday as a whole? One or two freak high-scoring games is one thing, but 41 goals across the day's 8 matches is more remarkable still. This is a slightly harder question to answer, but a bit of statistical wizardry can give us an idea.
To do so, we need to find a model that fits the data, and I'm going to try the Poisson distribution. This is a probability distribution, meaning it gives us a formula we can use to calculate the probability that certain events will happen. What's remarkable about it is that, despite its simplicity, this one formula has been shown to describe all sorts of data accurately. For instance, one classic example of its use was made famous by Ladislaus Bortkiewicz in his 1898 book Das Gesetz der kleinen Zahlen (The Law of Small Numbers), where it was used to model the number of soldiers killed by horse kicks each year in the Prussian army.
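For reference, the Poisson formula says that if events happen at an average rate of λ, the probability of seeing exactly k of them is P(X = k) = λ^k e^(-λ) / k!. A minimal Python sketch follows; using the 2.6 goals-per-match average from earlier as λ is an assumption of this illustration, since the post doesn't say exactly how the model was fitted.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events under a Poisson distribution with mean lam."""
    return lam**k * exp(-lam) / factorial(k)


LAM = 2.6  # assumed: average goals per match, 1992/93 to 2009/10

# Probability of a match ending with 0, 1, ..., 8 total goals
for goals in range(9):
    print(f"P({goals} goals) = {poisson_pmf(goals, LAM):.3f}")
```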
So 19th-century horse kicks are one thing, but what about our football data? Using the formula for the Poisson distribution, we can calculate how many matches with each number of goals we would expect to see under this model. Will this entirely theoretical bit of maths translate into a good fit for our results? The following graph repeats the one above, with added bars showing what the Poisson distribution would expect us to see.
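The "expected" bars are just the Poisson probabilities multiplied by the total number of matches. Here is a sketch of that step. The post doesn't give the total, so `N_MATCHES` below is an assumption (three 22-team seasons of 462 matches plus fifteen 20-team seasons of 380); substitute the real count from your own data.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals under a Poisson distribution with mean lam."""
    return lam**k * exp(-lam) / factorial(k)


def expected_counts(n_matches: int, lam: float, max_goals: int = 10) -> list[float]:
    """Expected number of matches with 0, 1, ..., max_goals - 1 goals,
    plus a final entry pooling every match with max_goals or more."""
    counts = [n_matches * poisson_pmf(k, lam) for k in range(max_goals)]
    counts.append(n_matches - sum(counts))  # tail bin: max_goals or more
    return counts


N_MATCHES = 3 * 462 + 15 * 380  # assumed total for 1992/93 to 2009/10
LAM = 2.6                       # average goals per match quoted in the text

for goals, count in enumerate(expected_counts(N_MATCHES, LAM)):
    print(f"{goals:>2}{'+' if goals == 10 else ' '} goals: {count:7.1f} matches expected")
```

Pooling the tail into a single "10 or more" bin keeps the expected counts summing to the number of matches, which matters for the goodness-of-fit check below.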
Not a bad fit, I reckon (we can do a statistical test to back this up, but I think I've bored you with enough details by now). We have slightly too few 3-goal matches, and a few too many 0-0 draws, but otherwise it's pretty good.
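For anyone who isn't yet bored, the usual test here is Pearson's chi-square goodness-of-fit test, which sums (observed − expected)² / expected over the goal-count bins. Because λ was estimated from the same data, the statistic is compared against a chi-square distribution with (number of bins − 2) degrees of freedom; bins with small expected counts (a common rule of thumb is fewer than 5) are usually pooled first. The sketch below computes only the statistic; a p-value could then be obtained from `scipy.stats.chi2.sf(stat, df)`. The numbers in the usage lines are toy values, not the real Premier League counts.

```python
def chi_square_statistic(observed: list[float], expected: list[float]) -> float:
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over all bins."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_bins: int, n_fitted_params: int = 1) -> int:
    """Bins minus one, minus one more for each parameter estimated from the data
    (here just the Poisson mean)."""
    return n_bins - 1 - n_fitted_params


# Toy example only: hypothetical counts in three bins
toy_observed = [10, 20, 30]
toy_expected = [12, 18, 30]
stat = chi_square_statistic(toy_observed, toy_expected)
print(f"chi-square = {stat:.3f} on {degrees_of_freedom(len(toy_observed))} df")
```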
Moreover, now that the Poisson distribution looks like such a good fit for our data, we can easily calculate the probability of all sorts of goal-related events. These probabilities do need to be treated with some caution, since the model rests on assumptions (for example, that matches are independent of one another and that every match has the same average number of goals) which deserve more careful consideration than I've given them here, but it's good enough for an ad hoc approach.
So how crazy was last Saturday's goal-fest? Thanks to the Poisson distribution we can work it out: a pretty staggering 18,000 to 1. Interestingly, if you factor in Sunday's two 1-0 matches (which brings us up to a full round of 10 matches), the odds are still a pretty healthy 760 to 1. With 38 rounds in a season, that makes it a weekend you'd expect about once every 20 years, roughly the age of the Premier League itself.
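The post doesn't spell out how these odds were computed, but one natural approach is sketched below, assuming each match's goal total is an independent Poisson(2.6) variable. A handy property is that a sum of independent Poisson counts is itself Poisson, so the total for n matches is Poisson with mean n × 2.6, and we need the probability of that total reaching at least the observed number of goals. The 38 rounds per season is also an assumption (20 teams each playing 38 league games).

```python
from math import exp, factorial

LAM_PER_MATCH = 2.6     # average goals per match quoted earlier
ROUNDS_PER_SEASON = 38  # assumed: each of 20 teams plays 38 league games


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), i.e. 1 minus P(X <= k - 1)."""
    return 1 - sum(lam**i * exp(-lam) / factorial(i) for i in range(k))


def odds_against(goals: int, matches: int, lam: float = LAM_PER_MATCH) -> float:
    """Odds against seeing at least `goals` goals in `matches` independent matches.

    The total of independent Poisson counts is Poisson with mean matches * lam.
    """
    p = poisson_sf(goals, matches * lam)
    return (1 - p) / p


saturday = odds_against(41, 8)  # 41 goals in Saturday's 8 matches
weekend = odds_against(43, 10)  # adding Sunday's two 1-0 games
print(f"Saturday: about {saturday:,.0f} to 1")
print(f"Whole round: about {weekend:,.0f} to 1")
print(f"Seasons between such rounds: about {weekend / ROUNDS_PER_SEASON:.0f}")
```

With the rounded 2.6 average this sketch gives roughly 17,000 to 1 and 720 to 1, in the same ballpark as the figures above; the small gap is possibly down to the exact mean used in the original calculation.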
As a statistician I'm obliged to have a favourite probability distribution and, as you might have guessed by now, this is it. The Poisson distribution starts from an incredibly simple formula, but it can describe data as diverse as horse kicks, football matches or even murders. It may not be a perfect fit, but it certainly helps justify George Box's famous quote, "all models are wrong, but some are useful".