The BBC reported last week that "the number of people out of work fell by 50,000 to 2.53 million in the three months to August". That sounded like a small change in an estimate from a sample survey, so, wondering about the statistical significance of the finding, I looked it up on the website of the Office for National Statistics (ONS). Indeed, it says "there were 2.53 million unemployed people, down 50,000 from March to May 2012 and from a year earlier."
The headline did not contain any information about the sampling error associated with deriving this estimate from the Labour Force Survey (LFS), but in one of the underlying tables I found that the quarterly change was -50,000 ± 89,000. To my eyes that looks like a statistically insignificant change, from which I would infer that we can't be sure whether unemployment has gone up, down, or stayed the same. This wasn't the story reported by ONS, however, and there was no nuance to the debate at Prime Minister's Questions, where government and opposition alike quoted the 50,000 fall with absolute certainty.
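To see why the change looks insignificant, here is a back-of-the-envelope sketch in Python. It assumes the ±89,000 is the half-width of a symmetric 95% confidence interval under a normal approximation - a standard convention, but an assumption on my part, as the table does not spell it out.

```python
# Figures from the ONS Labour Force Survey release discussed above.
change = -50_000       # point estimate of the quarterly change in unemployment
half_width = 89_000    # stated sampling error, assumed to be a 95% half-width

# Recover the implied standard error from the 95% half-width
# (assumes the estimate is approximately normally distributed).
se = half_width / 1.96

# 95% confidence interval for the quarterly change
ci = (change - half_width, change + half_width)
print(ci)  # (-139000, 39000): the interval straddles zero

# z-statistic for a two-sided test of "no change" at the 5% level
z = abs(change) / se
print(z)   # roughly 1.1, well short of the 1.96 needed for significance
```

Because the interval contains zero (equivalently, the z-statistic falls short of 1.96), the conventional verdict is "not statistically significant at the 5% level" - which is what prompted my initial inference.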
Unemployment rate (aged 16+), seasonally adjusted. Source: Labour Force Survey - ONS.
Unemployment rates for the United Kingdom and the European Union, seasonally adjusted. Source: Labour Force Survey - ONS, Eurostat.
It turns out that the unemployment time series is a very noisy one: the quarterly change is rarely larger than the sampling error. This important fact didn't become clear to me until I engaged in an email conversation with the ONS labour market team, so I'm sure it wasn't obvious to our politicians either. It would be useful for such an important caveat to be communicated in the headlines of the report. It might make the headline harder to write, but it would caution users against becoming over-excited by small changes. It would also make an important contribution to raising the level of statistical literacy in public debate.
It is a favourite pastime of statisticians and the wider 'numbers community' to complain about statistical gaffes by politicians and the media. The BBC radio programme 'More or Less' is full of entertaining examples. But producers of statistics have a crucial role in improving the understanding of statistics, a role which is sometimes neglected. For various reasons - wanting to provide clear answers, or believing statistical ideas to be too complex to communicate - basic concepts such as the existence, source and size of sampling error are sometimes glossed over. Statisticians should be confident of their ground and be willing to communicate these ideas.
So why are ONS reporting a fall in unemployment which is statistically insignificant? Well, it turns out that my initial inference - we don't know what has happened to unemployment - was wrong. Probably. Here's my stab at explaining the ONS approach. The point estimate for the change in unemployment is -50,000, with a 95% confidence interval of -139,000 to 39,000. Because more of the confidence interval lies below zero than above it, we can infer that unemployment is more likely to have fallen than risen (you can check the official version of this explanation at http://www.statistics.gov.uk/hub/labour-market/people-in-work/employment/index.html - click 'Technical Data').
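That reasoning can be made concrete. If we treat the point estimate as normally distributed about the true change, with a standard error recovered from the 95% half-width, we can ask how much of the distribution sits below zero. This is my own sketch of the logic, not ONS's published calculation, and it rests on the same normality assumption as before.

```python
from statistics import NormalDist

change = -50_000        # point estimate of the quarterly change
se = 89_000 / 1.96      # standard error implied by the 95% half-width

# Probability that the true change is below zero, i.e. that
# unemployment fell, under the normal approximation.
p_fell = NormalDist(mu=change, sigma=se).cdf(0)
print(round(p_fell, 2))  # roughly 0.86
```

On this reading, "unemployment fell" is the best bet - around an 86% chance, if the assumptions hold - even though the fall is not significant at the conventional 5% level. The two statements are answering different questions.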
I'm not entirely convinced of this, which evidently reflects a gap in my understanding. In any case, the way that ONS interpret statistically insignificant changes in an important time series wasn't clear to me from their report. I reckon there are hundreds of users of the LFS and other statistics wondering what to do with statistically insignificant changes in their time series. I've always ignored such changes, and maybe I've been wrong to do so. Either way, by communicating their methods ONS have an opportunity to improve statistical understanding and the interpretation of official statistics - an opportunity which I hope they will take.