41 goals in a day?! How unusual was last Saturday's football?

Author: Michael Wallace

Manchester City's Carlos Tevez
Despite his hat trick, Carlos Tevez played in one of the day's lowest-scoring games.
Image by Alfonso Jimenez via Wiki Commons.

Whilst American eyes were mainly turned towards the Super Bowl this weekend, the English Premier League produced one of the most extraordinary days of (association) football, or soccer, in domestic history. On Saturday, 41 goals were scored across just 8 matches, with Newcastle's spectacular four-goal comeback against Arsenal the more-than-worthy highlight.

Being both a football fan and a statistician, I couldn't resist doing a bit of data digging to see quite how remarkable the day's action was. I happen to have a dataset lying around which has the results of every Premier League match since its inception in 1992 up until the end of the 2009/2010 season, and a quick frisk revealed that the average number of goals in a match was 2.6. Every game on Saturday had more goals than this, and the day's overall average was just over 5, so we can already start to see how unusual an afternoon it was.

Data are no fun without a graph, so let's have a look at the distribution of total goals:

This gives us another insight into how unusual Saturday was. Two games saw 8 goals, and a quick look at this figure shows that such matches are pretty uncommon. On average, the Premier League sees a game with 8 or more goals once every 120 matches, so we would expect only about 3 such games in an entire 380-match season, never mind 2 in just one day.

Can we go a bit further than this? How unlikely was Saturday itself? One or two freak high-scoring games is one thing, but 41 goals across all 8 of that day's matches is a bit more remarkable. This is a slightly harder question to answer, but we can use a bit of statistical wizardry to get an idea of the answer.

To do so, we need to try and find a model that fits the data, and I'm going to try using the Poisson distribution. This is a probability distribution, meaning it gives us a formula which we can use to calculate the probability that certain events will happen. What's remarkable about it is that despite its simplicity, one formula has been shown to accurately describe all sorts of data. For instance, one classic example of its use was made famous in a book by Ladislaus Bortkiewicz, where it was used to model the number of soldiers killed by horse kicks each year in the Prussian army.
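For reference, the formula in question is short enough to write down. If events occur at an average rate of λ per interval (for football, λ would be the average number of goals per match, about 2.6 in the data above), the probability of seeing exactly k events is given below. The symbols X, k and λ are the conventional ones rather than notation used elsewhere in this article.

```latex
\[
  P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
\]
```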

So 19th century horse kicks are one thing, but what about our football data? Using the formula for the Poisson distribution we can calculate how many matches with each number of goals we would expect to see under this model. Will we find that this entirely theoretical bit of maths will translate into a good fit for our results? The following is a repeat of the above graph, but with bars added for what we would expect to see under the Poisson distribution.

Observed and expected total goals

Not a bad fit, I reckon (we can do a statistical test to back this up, but I think I've bored you with enough details by now). We have slightly too few 3-goal matches, and a few too many 0-0 draws, but otherwise it's pretty good.
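The expected bars can be reproduced along these lines. This is a minimal sketch rather than the program behind the figure: it assumes the rounded mean of 2.6 goals per match quoted earlier, and a total of 7,086 matches, which is a tally of 18 seasons (three of 462 matches with 22 clubs, fifteen of 380 with 20 clubs) rather than a number taken from the dataset itself. The names poisson_pmf and expected_counts are made up for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the average is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def expected_counts(lam: float, n_matches: int, max_goals: int = 10) -> dict:
    """Expected number of matches with 0..max_goals total goals under a Poisson model."""
    return {k: n_matches * poisson_pmf(k, lam) for k in range(max_goals + 1)}


# Assumptions (not read from the original dataset):
MEAN_GOALS = 2.6                # rounded average goals per match from the article
N_MATCHES = 3 * 462 + 15 * 380  # tallied match count, 1992/93 to 2009/10

if __name__ == "__main__":
    for goals, count in expected_counts(MEAN_GOALS, N_MATCHES).items():
        print(f"{goals:>2} goals: about {count:.0f} matches expected")
```

Plotting these values next to the observed counts for each goal total would give a comparison like the one in the figure.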

Moreover, now that we've shown the Poisson distribution fits our data so well, we can easily calculate the probability of all sorts of goal-related events. These probabilities do need to be treated with some caution, since there are assumptions that require more careful consideration than I've given them here, but it's good enough for an ad hoc approach.

So how crazy was the goal-fest last Saturday? Thanks to the Poisson distribution we can work it out: the odds against it are a pretty staggering 18,000 to 1. Interestingly, if you factor in Sunday's two 1-0 matches (which brings us up to a full week's worth of football), the odds are still a pretty healthy 760 to 1. With 38 rounds of matches in a season, this makes it a weekend you'd expect about once every 20 years - roughly the age of the Premier League itself.
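The article does not spell out how these odds were computed. One plausible reading, sketched here under stated assumptions, is that each match's goal total is an independent Poisson count with the same mean, so the total over n matches is itself Poisson with a mean n times as large. The quantity of interest is then the chance of at least 41 goals in 8 matches, or at least 43 in 10 once Sunday's two 1-0 games are added. The mean of 2.6 is the rounded figure quoted earlier, so the output should land in the same ballpark as the odds above rather than match them exactly. The function names are illustrative.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the average is lam."""
    # Work on the log scale so that large k does not overflow.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def prob_at_least(total: int, n_matches: int, mean_goals: float) -> float:
    """P(at least `total` goals across n_matches independent Poisson matches)."""
    lam = n_matches * mean_goals  # a sum of independent Poissons is Poisson
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(total))


MEAN_GOALS = 2.6  # assumption: rounded average from the article

saturday = prob_at_least(41, 8, MEAN_GOALS)
weekend = prob_at_least(43, 10, MEAN_GOALS)  # Saturday plus two 1-0 matches

print(f"Saturday: about 1 in {1 / saturday:,.0f}")
print(f"Weekend:  about 1 in {1 / weekend:,.0f}")
```

Tail probabilities this small are sensitive to the mean, so using the unrounded average from the full dataset would shift the figures somewhat.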

As a statistician I'm obliged to have a favourite probability distribution and, as you might have guessed by now, this is it. The Poisson distribution starts from an incredibly simple formula, but it can describe data as diverse as horse kicks, football matches or even murders. It may not be a perfect fit, but it certainly helps justify the quote "all models are wrong, but some are useful".

Comments

gunner 4 lif3

The record may be broken this weekend, coyg

Ernest

Like the content of your article!


I have been doing some research on calculating fair odds by expected goals, poisson distribution and placing value bets lately and would like to recommend the following articles (for beginners like myself):

Michael

Great stuff! As a keen gambler on goal scoring markets I found this really interesting! I tend to bet on 7 goals or more and this was a great weekend! Looks like I may have to wait another 20 years for another weekend like this! Ha,ha!

Dennis

Hi! Thanks for the site, and sorry for my bad English.

You use only average_score in the last 6 matches? What about received goals? Not used? The Poisson distribution with average goals of 1.83 and 0.5 gives 17.8% for a 1:0 score, but yours is 22.56%. Why?

Best regards

> Now perform Poisson distribution 2 times.
> calc_poisson(1, 1.83);
> calc_poisson(0, 0.5);
> The results from the 2 calculations are multiplied and the result is the calculated percent for result 1-0.
> Iterate all major scores and we can calculate 1×2 probability.
> On the pictures above you can see that the result 1-0 is 22.56% to be final

Michael Wallace

Quote:

Excellent analysis - if you can be bothered have there been more goals recently - ie does the analysis stand if you just used data from (say) 2001?

A quick tweak of my analysis program has shown that there doesn't seem to be much evidence of an increase in goals more recently. Taking 2001 as a cut-off, for instance, produces near-identical results to those above.

Matthew

Excellent analysis - if you can be bothered have there been more goals recently - ie does the analysis stand if you just used data from (say) 2001?
