Stephen M. Stigler - The Seven Pillars of Statistical Wisdom

Stephen M. Stigler 2016. The Seven Pillars of Statistical Wisdom. Kindle edition. Cambridge, Massachusetts: Harvard University Press, 240 pp.

This excellent little book about the history and development of the fundamental ideas of statistics is so historically thorough and so effortless in explaining fundamental concepts that I soon became curious about the author. Stephen M. Stigler is at the University of Chicago and his career has been an “Investigation of the history of the development of statistical methods …”. Many of the topics that make up his stated research interests are developed in the book.

The title is derived from Seven Pillars of Wisdom by T.E. Lawrence, who in turn took it from the Old Testament (Proverbs 9:1). Stigler’s seven pillars are:

Aggregation is the idea that some kind of statistical summary of cases, observations or other data is informative. Its use on real data is traced to efforts to summarise the variability of compass needle readings from around 1580-1630. Sumerian tablets include tables of crop yields much earlier (~3000 BCE) but they apparently didn’t summarise data. Nor, apparently, did the Egyptians, Greeks, Arabians or anyone else, even though Pythagoras’s followers had documented the arithmetic, geometric and harmonic means by about 280 BCE.

Information follows naturally. What is gained from increasing the number of observations? Stigler traces this to the Greeks, but their approach, typically, was philosophical (when does a gradual accumulation of sand grains become a “heap”?). The first recognisably statistical treatment seems to date from the late 1200s, when the standard for testing legitimate currency was documented as falling within acceptable limits of variability. By the early 1700s it was known that the variability of a total does not grow in proportion to the number of observations; it grows with the square root, so the precision of a mean improves only as the square root of the number of observations. During the period 1730-1830 publications by de Moivre, Laplace and Poisson developed the concepts, from inflection points on a binomial distribution to the Central Limit Theorem, that underlie what would become the standard deviation. (Much later Francis Galton independently developed the standard deviation.) This section also contains cautionary examples of apparent periodicities that are in fact spurious, an artefact of rounding errors and sampling bias in preparing the data. Thus “periodic” reversals of the Earth’s magnetic field and of extinctions are not what they seem.
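The square-root behaviour is easy to check numerically. The following is a minimal sketch, not taken from the book: it assumes a simple binomial model (n independent trials with success probability p), and the function names are my own. It shows that quadrupling the number of observations only doubles the spread of the count and only halves the spread of the observed proportion.

```python
import math

def sd_of_count(n, p=0.5):
    """Standard deviation of the number of successes in n binomial trials."""
    return math.sqrt(n * p * (1 - p))

def sd_of_proportion(n, p=0.5):
    """Standard deviation of the observed proportion of successes."""
    return math.sqrt(p * (1 - p) / n)

# Each step quadruples n: the count's spread doubles, the proportion's halves.
for n in (100, 400, 1600):
    print(n, sd_of_count(n), sd_of_proportion(n))
```

The binomial model is only an illustrative assumption; the same square-root scaling holds for the mean of any independent observations with finite variance.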

Likelihood is about the use of probability to estimate how much we should be surprised by a given observation and how we might go about making comparisons and tests of significance. Again, ideas began to advance in the early 1700s: in 1710 John Arbuthnot published on the probability of variations from an even ratio of male and female babies in birth data. In 1748 David Hume published on the extreme improbability of miracles, and the much greater probability that “miracles” were merely incorrect reports. It is very interesting to read here that Thomas Bayes (the Reverend Bayes), when he wrote his own more famous essay, was very likely responding directly to Hume in seeking to show that miracles were not as unlikely as Hume had said. Bayes’ essay was published posthumously in 1764. In 1827 Pierre Simon Laplace published a forerunner of what would become P-values in calculating the chance of detecting a tide in Paris. Apparently astronomer Simon Newcomb was the first, in 1860, to use what is now known as a Poisson point process (to calculate the probability of 6 stars occurring randomly in a 1° square of the sky).
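Arbuthnot's argument can be sketched as what would now be called a sign test. The figure of 82 consecutive years with more male than female christenings in London is my recollection of his 1710 paper, not something stated in this review, and the function name is hypothetical. Under the hypothesis that each year is equally likely to favour either sex, the chance of such a run is (1/2)^82.

```python
from fractions import Fraction
from math import comb

def sign_test_p_value(successes, trials):
    """One-sided P(X >= successes) for X ~ Binomial(trials, 1/2), as an exact fraction."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return Fraction(tail, 2 ** trials)

# Assumed data: 82 years out of 82 with more male than female births.
p = sign_test_p_value(82, 82)
print(p == Fraction(1, 2 ** 82))
print(float(p))  # vanishingly small: the "even chance" hypothesis is hard to sustain
```

Exact fractions are used so that no rounding enters the calculation until the final conversion to a float.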

Intercomparison is about the idea, dating from 1875 (Francis Galton) and developed more fully in the 1920s, that comparisons between sets of observations can be made solely on the basis of statistical variation in the data. The external context is not relevant. Other contributors include Karl Pearson, William Gosset (“Student”) on t-tests and R.A. Fisher on the more general case of multiple comparisons.

Regression has a long explanation about Francis Galton and his search for a biological or mathematical explanation for why the variability required by Charles Darwin’s theory of natural selection was not actually observed in successive generations. The explanation was mathematical, not biological: regression towards the mean (imperfect correlation between generations). In 1888 Galton also introduced the correlation coefficient. Modern recognition that “correlation does not imply causation” seems to be due to Karl Pearson but there is a version by philosopher George Berkeley in 1710. Stephen Stigler also credits Galton with the first use of Bayesian inference using posterior distributions. However it is not clear whether Galton was using Bayes’ work directly or (more likely?) based on the method of Laplace.
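Regression towards the mean can be illustrated with a small simulation. This is a sketch under my own assumptions, not Galton's data or method: parent and child traits are standardised, each is normally distributed, and the two are correlated with a hypothetical coefficient r = 0.5. The children of extreme parents are on average less extreme, yet the spread of the whole generation is unchanged.

```python
import random
import statistics

def simulate_generations(r=0.5, n=20000, seed=42):
    """Return (parent, child) pairs of standardised traits with correlation r."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        parent = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # Mixing weights keep the child's variance at 1 while fixing the correlation at r.
        child = r * parent + (1.0 - r * r) ** 0.5 * noise
        pairs.append((parent, child))
    return pairs

pairs = simulate_generations()

# Select parents well above average and compare them with their children.
extreme = [(p, c) for p, c in pairs if p > 1.5]
mean_parent = statistics.fmean([p for p, _ in extreme])
mean_child = statistics.fmean([c for _, c in extreme])

# Spread of each whole generation.
sd_parents = statistics.pstdev([p for p, _ in pairs])
sd_children = statistics.pstdev([c for _, c in pairs])

print(mean_parent, mean_child)   # children of extreme parents sit closer to 0
print(sd_parents, sd_children)   # both spreads stay close to 1
```

The imperfect correlation alone produces the effect; no biological mechanism is built into the simulation.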

Design is about designing experiments and how to collect observations, a slippery idea that, depending on how broadly it is interpreted, can be traced back at least to ideas in the Old Testament and in the writings of Arabic scholars. More interesting is the recognition that randomisation is necessary in making inferences. Stephen Stigler traces this idea to Charles S. Peirce in the 1880s.

Residual is traced to John Herschel in A Preliminary Discourse on the Study of Natural Philosophy (1831):

“… when the effects of all known causes are estimated with exactness and subducted, the residual facts are constantly appearing in the form of phenomena altogether new, and leading to the most important conclusions.”

It is clear from this history that most of these fundamental ideas in statistical analysis began their development in the 1700s and 1800s. However, I was left with the impression that Stephen Stigler thinks that not much of fundamental importance has been developed since the early work of R.A. Fisher in the 1920s. It would be interesting to read Stigler address that question explicitly, but he does not do so, at least not in this book.

Most of the references are original sources dating from the 1800s and earlier, and these are dominated by British and European publications.