[Update: Bloomberg View has a story on this same topic, with a couple of useful quotes:
- “Statistical significance in medicine and social science is expressed as a p value, which represents the odds that a result would occur by chance if there’s no effect from the diet pills or artichokes being tested.”
- “Science is a way of seeing the world more as it is, and less as we’d like it to be. Statistical techniques were invented by people who dreamed that the power of physics and chemistry might extend to a world of previously unpredictable phenomena, including human behavior. There may yet be something to it, once people work out the kinks.”
People who want to publish articles in scientific journals always want very small p-values attached to their findings, because a small p-value lends their results an air of authority. The trouble is that people figured out how to adjust their data until they get a small p-value, which allows them to get published, taken seriously, and even hired or tenured at a university, without necessarily contributing to the stock of useful knowledge. And that’s a problem.]
When I finally got around to taking college seriously (life begins at 30!) I had already decided that I wanted to study economics, and since I had heard that the key to studying economics was knowing lots of math, I tried to take math classes. I never made it past the second semester of calculus – math is, in its way, fascinating and wonderful, but I never had the fortitude to grind through problems for hours at a time. I adore philosophy and history and language, but numbers aren’t my bag. Nevertheless, I know a little bit about statistics, because you can’t study economics without knowing something about stats. And I do like data – I’m all about Noah Smith’s project to get empirical methods into the Econ 101 curriculum – but all the same, my command of statistical methods doesn’t go much past the average undergrad’s.
Despite my fair-to-middling skill with stats themselves, I do know something about the conversation around statistical methods. This is mostly because I spent a lot of time taking classes with, writing a Master’s thesis under, and generally hanging around Professor Stephen Ziliak at Roosevelt University. He contributed to the recent American Statistical Association Statement on P-Values, in which the association formally cautions against treating p-values as the standard for publication.
What has sometimes seemed odd to me is that p-values should be such a big deal. I know some statisticians, and they don’t seem all that hung up on p-values. In the world of statistics, they’re just another tool in the toolbox. There’s the Bayesian/Frequentist debate, and a friend of mine (who has a PhD in Stats) once told me that otherwise staid, buttoned-down statisticians can get positively apoplectic over the whole thing, although she doesn’t really feel one way or the other about it. Keynes wrote a whole book on probability early in his career that puts him in the Bayesian camp, I know, so I guess I’d fall on that side of the debate, but I’m not confident I could participate if the opportunity ever came up.
Anyways, the reason p-values are a big deal is because they’ve become a sort of unofficial standard in the scientific publishing world, and that leads into a rabbit hole of issues. The main thing to grasp is that science is basically how modern societies decide whether or not things are true – or at least, it’s supposed to be. Science has not stopped a great many people from believing fervently in creationism, UFOs, astrology, and all manner of other nonsense. But since Western society decided that religion no longer had a lock on the truth – an unavoidable conclusion after centuries of killing each other over religious differences – science has become the standard bearer of verity.
The good folks at Fivethirtyeight recently put up a really wonderful article called “Science Isn’t Broken” that features an interactive app that demonstrates how easy it is to adjust your data so that you get a very low p-value, which then ensures publication in a peer-reviewed (who’s reading this stuff anyways?) journal, which in turn you can put on your CV so that people will a) take you seriously as a scientist, and b) hire you, so that you can make a living. Capitalism, you’ve betrayed us again! But seriously, anyone who’s ever tried to argue about anything controversial in an internet forum knows (or should know) just how easy it is to find statistics or scientific “proof” of whatever point you care to make. This can become insanely frustrating very, very quickly. For example, the world’s most viewed online resource regarding climate change and global warming is devoted to disproving that humans have anything to do with it (which is to say, we should stop trying to do anything about it, and definitely stop legislating with regards to it). But telescope out a little further and a bigger question appears: if you can adjust the parameters of science to prove whatever you want, how are non-scientists supposed to trust the scientists? And that’s a pretty serious problem.
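The mechanics behind that interactive app are easy to reproduce. Here is a minimal sketch (my own, in plain Python with only the standard library; the function names and the number of "specifications" are invented for illustration): generate data with no true effect at all, run a difference-in-means test across many arbitrary specifications, and keep the smallest p-value. Given enough tries, pure noise will clear the 0.05 bar.

```python
import math
import random

def p_value_two_sided(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def z_test(sample_a, sample_b):
    """Large-sample z-test for a difference in means."""
    na, nb = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / na
    mean_b = sum(sample_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (nb - 1)
    z = (mean_a - mean_b) / math.sqrt(var_a / na + var_b / nb)
    return p_value_two_sided(z)

def p_hack(n_specifications=40, n=50, seed=0):
    """Try many 'outcome variables' that are all pure noise, and
    report only the smallest p-value found -- the forking paths."""
    rng = random.Random(seed)
    best = 1.0
    for _ in range(n_specifications):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]  # same distribution: no true effect
        best = min(best, z_test(a, b))
    return best

print(f"smallest p-value over 40 noise-only tests: {p_hack():.4f}")
```

Each individual test here is honest; the dishonesty is entirely in running forty of them and reporting one. That is the whole trick the Fivethirtyeight app lets you play on yourself.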
Psychology, for example, is currently undergoing a replication crisis. Scientific studies, it turns out, might just demonstrate the myriad biases of their designers – that is, they just tell us what we want to hear. I can tell you for sure this is a problem in economics – there are plenty of conservative business – I’m sorry, “free market” – oriented think tanks (like this one, and this one, and this one, and this one, and let’s not forget this one) that will pay you good money to write reports about how everything the government does constrains the economy, and the only thing to do is hand over all political power to technocrats and businessmen, democracy be damned.
In the world of biology and medicine, there is an epidemic of p-value hacking. A recently published paper in the Journal of the American Medical Association, which analyzed nearly 850,000 articles from the scholarly biomedical research literature published between 1990 and 2015, concluded: “almost all abstracts and articles with P values reported statistically significant results, and, in a subgroup analysis, few articles included confidence intervals, Bayes factors, or effect sizes. Rather than reporting isolated P values, articles should include effect sizes and uncertainty metrics.”
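That recommendation is easy to act on. Below is a sketch (again my own, plain Python, standard library only; the two-sample setup and the made-up measurements are illustrative, not from any real study) of reporting what the JAMA authors ask for: an effect size (Cohen’s d) and a 95% confidence interval alongside the p-value, so a reader sees the magnitude and uncertainty of the effect rather than just whether an arbitrary threshold was crossed.

```python
import math

def summarize_difference(sample_a, sample_b):
    """Report effect size and uncertainty, not just a p-value.

    Uses a large-sample z approximation for the p-value and a
    normal-based 95% confidence interval for the mean difference.
    """
    na, nb = len(sample_a), len(sample_b)
    mean_a, mean_b = sum(sample_a) / na, sum(sample_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (nb - 1)
    diff = mean_a - mean_b
    se = math.sqrt(var_a / na + var_b / nb)      # standard error of the difference
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    cohens_d = diff / pooled_sd                  # standardized effect size
    ci95 = (diff - 1.96 * se, diff + 1.96 * se)  # normal-approximation interval
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(diff / se) / math.sqrt(2))))
    return {"diff": diff, "cohens_d": cohens_d, "ci95": ci95, "p": p}

# Illustrative (invented) data: treatment vs. control measurements.
treatment = [5.1, 4.8, 5.6, 5.0, 5.3, 4.9, 5.4, 5.2]
control = [4.6, 4.9, 4.4, 4.7, 4.5, 4.8, 4.3, 4.6]
print(summarize_difference(treatment, control))
```

A tiny p-value attached to a trivially small effect, or to an interval so wide it spans everything, means very little; reporting all three numbers together makes that visible at a glance.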
So it would seem as if my dear Professor Ziliak, and his dear professor-turned-co-author Deirdre McCloskey, were really on to something when they published “The Standard Error of Regressions” back in 1996. That original paper was followed by further articles, including “The Cult of Statistical Significance” – which would be expanded into a book of the same name. Small wonder Ziliak ended up working with the ASA on their statement regarding p-values. Fivethirtyeight’s coverage of the statement’s publication even begins with one of Ziliak’s haiku!
Oddly, one of the more compelling parts of Ziliak’s research on p-values, at least for me, is how it led to the extremely interesting history of the Guinness brewing company’s involvement in the development of statistical methods. Back around 1900 or so, they were working hard to develop beer as a globally exported good. It used to be that beer was consumed in the same place it was produced, but then some Bavarian monks figured out that if you added hops in the brewing process, you could store the beer for longer periods of time. This led to the domestic cultivation of hops in Bavaria and parts of eastern Europe – the town of Budweis, for example – and later on to the development of beer as a commodity good. The original hopped Bavarian beer was (and still is) called lager, from the German word lagern, meaning “to store.” Before the use of hops in beer, people put all manner of stuff in their brew – fruit, spices, chicken, whatever – but afterwards beer became just water, barley, yeast, and hops.
The expansion of the British Empire in the 19th century meant the creation of export markets for beer, and brewers were keen to exploit this opportunity. The good folks over at Guinness started hiring all manner of university-trained scientists to figure out how to make a consistent product capable of maintaining quality over the course of long sea voyages (hence, India Pale Ale, which has lots of hops to help it keep). Among the eggheads working in the Guinness brewery was one William Gosset, who published articles in statistics journals under the name “Student” and invented the t-test along with much of the machinery of statistical significance and p-values. Ziliak, in his research, ended up discovering Gosset and his influence on the famous statistician R.A. Fisher, who included Gosset’s work in his formalization of statistical methodology. Personally, I find this sort of archaeological stuff totally fascinating. The world of biomedical publishing is now ruled by a method developed for the purpose of brewing beer!
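For the curious, the heart of Gosset’s contribution is compact: a statistic for asking whether a small sample’s mean is consistent with some target value, with the extra uncertainty of estimating the variance from the sample itself baked in. A sketch in plain Python (the brewery readings and the target are numbers I invented for illustration):

```python
import math

def one_sample_t(sample, mu0):
    """Gosset's one-sample t statistic: is the sample mean consistent with mu0?"""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return (mean - mu0) / (sd / math.sqrt(n))

# Hypothetical brewery scenario: five quality readings against a target of 133.
batches = [134.0, 131.5, 135.2, 132.8, 134.5]
t = one_sample_t(batches, 133.0)
# Two-sided 5% critical value for df = 4, from standard t tables.
print(abs(t) > 2.776)  # prints False: these batches are consistent with the target
```

The point of the t distribution, as opposed to the normal, is precisely that with only a handful of batches the estimated standard deviation is itself noisy, so the critical value (2.776 here, versus 1.96 for a large sample) has to be wider.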
For me, the whole story kind of underlines the pervasive influence of capitalism. There are a lot of unexpected turns in the story of the modern world – and it makes me eager to find out what happens next. While I was writing this post, I recalled a passage from Nietzsche’s Beyond Good and Evil where he’s talking about how, after Kant, who discovered that synthetic a priori judgements were possible by virtue of a faculty, all the young theologians went running off into the bushes looking for faculties. In the modern world, professors must all “Publish or Perish!” so off they go, looking for something to publish. And publishers need something to cut through all the noise, the sheer volume of submissions – and p-values are, I suppose, as good a standard as any. And so you get all these p-hacked papers! But what does Nietzsche tell us, regarding the unfortunate, naive theologians of the early 19th century?
One can do no greater wrong to the whole of this exuberant and enthusiastic movement, which was really youthfulness, however boldly it disguised itself in hoary and senile concepts, than to take it seriously, or worse, to treat it with moral indignation.