The New Year is three days away. As December 31st approaches, newspapers and magazines are full of reminders of the events of the year that is passing. Statistics, too, contributed to the news, and to breakthroughs in science, in politics, and in human progress. Today, and every day this week, we bring you, two each day and in no particular order, ten of the most noteworthy stories of the year and the statistics that made them possible. Today: a new understanding of life in the oceans, and how computers can recognise what we are saying.
5. The Census of Marine Life. The first census of the world's seas came to an end in 2010. When the idea for the census was first being formulated, it had been 160 years since Edward Forbes, the pioneer of deep-sea research and later Professor of Natural History at Edinburgh, made his first studies of marine biology using dredges and nets during his voyage on HMS Beacon in the Mediterranean in 1841. There had been little progress in understanding marine biodiversity since. Less than 0.1% of the world's oceans had been sampled; 90% of the global biosphere was essentially unexplored. In launching a global collaboration of over 2,700 scientists from more than 80 countries ten years ago, the Marine Census hoped to make up for lost time. Simply coordinating the international research activity and data collection of more than 538 field explorations was a laudable feat. Facilitated by National and Regional Implementation Committees (NRICs), the international ensemble of investigative findings was synthesized, and preliminary conclusions were summarized in a special summer issue of PLoS ONE (and in a cover article in Significance magazine, December 2010 - click on 'related articles', above right).
Included in the census reports were estimates of the number of known, unknown and unknowable species for each taxon - fish, molluscs, cetaceans and the rest - and for each major world region. These statistics were based on a collation of multiple data sources, including established registries, published studies and expert surveys. A main conclusion, as might have been expected from the magnitude of the sampling problem being undertaken, was that even after 10 years of new research efforts and the discovery of more than 1,200 species, most species remain unknown. In the regions of greatest species richness - Australia, Japan, the Mediterranean deep sea, New Zealand, and South Africa - an estimated 25%–80% of species remained to be described. The researchers estimate that there may be 1 to 1.4 million marine species living on Earth, and that 70–80% of these are still undiscovered. The work continues. Looking ahead to a 2020 census of the seas, statisticians will have much work to do in accurately characterizing the uncertainties of estimates made from the sparse and disparate data of underwater life.
6. Death of Frederick Jelinek. For smartphone users who are not familiar with the work of Frederick Jelinek, open your speech-recognition Internet browser and say: FRED-ur-rik JEL-eh-nek. Among the first results you should find are Jelinek's faculty website at Johns Hopkins, a dedicated Wikipedia page describing the life and work of this pioneering computer scientist and statistician, and announcements in the New York Times of his death on September 14 of this year. The performance of your voiced search is in large part owing to Jelinek, who, after escaping Nazi rule in Czechoslovakia, settled with the surviving members of his family in New York, received three advanced degrees in electrical engineering at MIT, spent 21 years at IBM's Watson Research Center, and another 20 years at Johns Hopkins University researching computer speech-recognition technology.
The traditional approach had been to use linguistic models to make electronic devices, in effect, think like a human. Jelinek was a revolutionary in the field of voice recognition because he realized that statistical models could be used to teach machines how to understand human speech. Jelinek recognized that - although still a challenging problem - an easier solution would be to take large databases of real human speech and use probabilistic models, based on patterns from thousands of these examples, to extract meaning from sound. In the actual development of these ideas, Jelinek combined his expertise in computer science with Bayesian statistics; he used hidden Markov models with something called the Viterbi algorithm, which essentially traces out the most likely path that leads to a sequence of observed events - in this case, the most likely series of words that leads to the sequence of sounds heard by the system. In this way he built the foundation of a voice recognition system: a foundation based on statistics rather than any man-made dictionary of sounds and their meanings.
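The idea of tracing the most likely path through a sequence of observations can be made concrete with a small sketch. The following is a minimal Viterbi decoder over a toy hidden Markov model; the "words", "sounds" and probabilities are invented for illustration and are not Jelinek's actual models.

```python
# A minimal sketch of Viterbi decoding over a toy hidden Markov model.
# Hidden states stand in for candidate words; observations stand in for
# heard sounds. All names and probabilities here are invented.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely sequence of hidden states for `obs`."""
    # best[t][s] = (probability of the best path ending in state s
    #               at time t, predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        best.append({})
        for s in states:
            prob, prev = max(
                (best[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            best[t][s] = (prob, prev)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]

# Toy model: two hidden "words", each emitting one of two sounds.
states = ["word_a", "word_b"]
start_p = {"word_a": 0.6, "word_b": 0.4}
trans_p = {"word_a": {"word_a": 0.7, "word_b": 0.3},
           "word_b": {"word_a": 0.4, "word_b": 0.6}}
emit_p = {"word_a": {"s1": 0.9, "s2": 0.1},
          "word_b": {"s1": 0.2, "s2": 0.8}}

decoded = viterbi(["s1", "s2"], states, start_p, trans_p, emit_p)
```

Real speech systems work with many thousands of states and probabilities estimated from large corpora, but the dynamic-programming recursion - keep only the best path into each state at each step, then backtrack - is the same.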
Jelinek's innovations have made modern speech-to-text technologies possible; the applications are innumerable. Voice recognition devices automate transcription, allow the illiterate to write and pilots to fly by voice command, and give the rest of us a way to google with no hands. The speech-to-text challenge is just one type of translation problem. Jelinek showed that the same statistical strategies used in voice recognition technology could be useful in tackling a broader set of meaning-conversion problems, notably translations from one language to another. So, when with a single click of Google Translate you can learn how to say 'That is amazing' in Jelinek's native tongue, you can thank him for that too. To je úžasný!