Bradley Efron, Max H. Stein Professor of Statistics and Biostatistics at Stanford, is one of today's greatest statisticians. During his long career he has received many honours, served as President of the American Statistical Association and of the Institute of Mathematical Statistics, edited the Journal of the American Statistical Association, and is the founding editor of the Annals of Applied Statistics. Professor Efron is most widely known among statisticians for inventing the bootstrap. In recent years, however, he has been working on large-scale inference, the art of making statistical inferences from large (huge, actually) datasets. Now, after more than ten years working in this burgeoning area, Efron has produced a new presentation of the theory and practice of large-scale simultaneous inference, exposing its limitations and providing hints for new developments.

If you are not well acquainted with multiple testing, you might consider reading this article and/or listening to this presentation by Professor Efron as an introduction before tackling the book. The book begins with introductions to empirical Bayes inference and the James-Stein estimator (Chapter 1) and to large-scale Bayesian hypothesis testing (Chapter 2), followed by a review of frequentist simultaneous hypothesis testing (Chapter 3). Next, false discovery rate methodology is presented in the form of the more frequentist Benjamini-Hochberg control algorithm (Chapter 4) and its Bayesian alternative, local false discovery rates (Chapter 5). From that point on the book becomes more technical (especially Chapters 7 through 10) and deals with some interesting methodological questions and limitations: how to choose the null distribution and why the theoretical null may not be the best choice (Chapter 6), how to calculate estimation accuracy (Chapter 7) and correlations (Chapter 8). The focus then shifts to the analysis of sets of cases, also known as enrichment (Chapter 9), and to whether and when it is advisable to combine or separate cases in a single analysis (Chapter 10). The book ends with prediction, the estimation of effect size and, as a treat, a presentation of one of Efron's earliest works, on the ‘missing species problem’ and Shakespeare’s word knowledge (Chapter 11).

The book makes for good reading. The mathematical level of the material is not particularly high, and an undergraduate student with a second course in statistics could follow the text quite easily, even if some linear algebra is necessary to get through the more technical parts. Proofs are placed at the end of sections so as not to interrupt the flow of reading, whereas examples and exercises are used very well as devices to keep the reader focused and interested. Indeed, Efron makes good use of examples drawn from his own experience to illustrate theoretical results (although some readers might have trouble reading the small font in some of the charts). And I really liked his use of exercises, freely integrated in the text and not very difficult, as a sort of guided reading aid that does not let the reader's attention waver. The reader is asked to complete or prove some theoretical result, to ponder and discuss some theoretical or practical question, or to apply the theory to the example data, all of which helps to improve comprehension of the subject and build a good command of it.

In summary, I think this book is a pretty good gateway to the statistics of the future for tomorrow's Fishers and Neymans.
