Big data in statistics came of age this week. The Shaw Prize is sometimes called the Nobel Prize of Asia, and it is almost as prestigious. Founded in 2003 by Hong Kong film mogul and philanthropist Run Run Shaw – now 105 years old, he retired last year – it is open to everyone and includes a $1 million award for breakthroughs in science and mathematics. This year one of the three prizes has gone to a statistician: David Donoho, a professor of statistics at Stanford University.
Donoho’s prize is recognition of his work on extracting more detailed analyses from large numerical data sets. Specifically, the citation noted his "profound contributions to modern mathematical statistics and in particular the development of optimal algorithms for statistical estimation in the presence of noise and of efficient techniques for sparse representation and recovery in large data sets."
What that actually means is that he has pioneered ways of extracting information from the huge amounts of data generated by all the sensors, image-takers, tweets and commercial transactions of the digital age. Statisticians used to deal with a few hundred, perhaps a few thousand, numbers at a time; now digital data routinely runs into billions of items. Finding patterns in them is a whole new enterprise – needles in haystacks are trivial in comparison – but it is one with huge practical rewards. Whole cities can be run more efficiently – see the Big Data special issue of Significance from August last year for more. To take one example, traffic lights can be coordinated with weather forecasts, traffic forecasts, images from traffic cameras, tweets telling how many people are intending to go to a football match or the beach, ticket sales telling how many are intending to go to a pop concert, and so on – and thus ease the jams.

Statisticians used to have to extract as much information as they could from limited data. Now the problem is to find the information lurking in incredible quantities of the stuff. That is what ‘extracting sparse information from large data sets’ means. This is the statistics of the future, and it is where Donoho has been a pioneer.
“He is one of the most well-known and influential mathematical statisticians alive today," said Tony Chan, president of the Hong Kong University of Science and Technology. A prime objective of Dr. Donoho's research is to apply mathematical and statistical tools to solve real-life problems. One area where his work has had huge beneficial effects is in medical imaging. "Imagine if you could go into an MRI machine, which usually takes about 30 minutes, and be there for a tenth of that time, but get the same result. That's what his algorithms help do." Chan described the process as akin to taking a low-resolution image on a digital camera – which itself contains millions of bits of information - and being able to reconstruct all the extra missing information to create a higher-resolution image.
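The idea behind those faster scans – reconstructing a full signal from far fewer measurements than there are unknowns, provided the signal is sparse – can be sketched in a few lines. The following is a toy illustration, not Donoho's actual MRI algorithms: it uses iterative soft thresholding (ISTA) to solve the l1-regularised recovery problem that underlies compressed sensing, with all sizes and parameters chosen purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A sparse "image": 200 coefficients, only 5 of them non-zero.
n, k, m = 200, 5, 60  # signal length, sparsity, number of measurements
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.normal(0, 5, k)

# Take far fewer random measurements than unknowns (m < n),
# mimicking a shortened scan.
A = rng.normal(size=(m, n)) / np.sqrt(m)
y = A @ x

def soft_threshold(v, t):
    """Shrink each entry towards zero by t; the core of ISTA."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Iterative soft thresholding: gradient step on the data misfit,
# then shrinkage to enforce sparsity.
step = 1.0 / np.linalg.norm(A, 2) ** 2  # safe step size
lam = 0.1                               # sparsity penalty
x_hat = np.zeros(n)
for _ in range(3000):
    x_hat = soft_threshold(x_hat + step * (A.T @ (y - A @ x_hat)), step * lam)

rel_error = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
print(rel_error)  # relative error of the reconstruction
```

Despite having only 60 equations for 200 unknowns, the reconstruction is close to the original – the missing information is recoverable precisely because the underlying signal is sparse.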
Similarly, modern global communication often involves voice signals passing through several networks as they are transmitted, acquiring interference as they go. Donoho uses statistics to develop algorithms that recover or reconstruct the original signal as closely as possible.
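One celebrated ingredient of that recovery work is soft thresholding with the Donoho–Johnstone "universal" threshold: shrink every coefficient below the level that pure noise is unlikely to exceed. The sketch below applies the rule in its simplest possible setting – a spiky signal plus noise, thresholded directly – as an illustration only; practical denoisers would first transform the signal into a basis (such as wavelets) where it is sparse.

```python
import numpy as np

rng = np.random.default_rng(1)

# A "clean" transmission: mostly silence, with a few strong spikes.
n = 1000
clean = np.zeros(n)
clean[rng.choice(n, 10, replace=False)] = rng.uniform(4, 8, 10)

# The networks add noise along the way.
sigma = 0.5
noisy = clean + rng.normal(0, sigma, n)

# Universal threshold: the level that Gaussian noise of this size
# is unlikely to exceed anywhere in a signal of length n.
threshold = sigma * np.sqrt(2 * np.log(n))

# Soft thresholding: kill everything below the threshold,
# shrink everything above it.
denoised = np.sign(noisy) * np.maximum(np.abs(noisy) - threshold, 0.0)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
print(mse_noisy, mse_denoised)  # the denoised error is much smaller
```

The shrinkage wipes out almost all of the noise while keeping the spikes, which is why this simple rule – backed by Donoho's theory showing it is near-optimal – became a workhorse of signal recovery.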
The internet age is turning into the age of Big Data. Learning to handle it will be one of the key areas of progress, and it will need a whole new kind of statistics. David Donoho is one of the pioneers.