A core mission of JOTE is closing the gap between what is researched and what is published. In essence, we’re asking the very real question “when is research meaningful?” and coming out on the other end to say that any research that is meaningful deserves to be read, whether it yields a positive result or not. Implicit in this view, however, is the notion that we’re operating within the same binary we’re trying to fight (just choosing the other side): the good vs. the bad, the positive vs. the negative, the significant vs. the non-significant. This couldn’t be further from the truth!

As many researchers know, publication bias often involves screening for statistically significant results and ignoring the rest (also known as the file drawer problem). This mistaken belief that non-significant results are unimportant or not meaningful has elsewhere been referred to as the zero fallacy: if a result doesn’t have the three little asterisks, its effect is essentially zero (Kline, 2013, p. 100). One of the issues with this view is that it creates a dichotomy between significant and non-significant results; it purports that data are all-or-nothing: they either do or do not explain some phenomenon. This view is, of course, incorrect, because many effects can help explain something without being statistically significant.

However, just like non-significant results can be meaningful, statistically significant results can be absolutely meaningless! There are many reasons for this, but to name just a few:

  • Very large sample sizes will make almost any result statistically significant, even when the underlying effect is trivially small (Kline, 2013, p. 12). This is because as you collect more data, the standard error shrinks: the effect needed to cross the significance threshold gets smaller and smaller, while the test statistic for any given non-zero effect keeps growing. If someone has enough money to test lots of people, they will almost always get a statistically significant result (the first sketch after this list simulates exactly this).
  • Statistical significance is estimated through the use of p-values, which require very strict assumptions depending on the test they are used with. Many researchers don’t bother to check these assumptions in advance, which leaves the resulting p-values essentially uninterpretable from the perspective of significance testing (the second sketch below shows what checking two such assumptions can look like). Some of these assumptions, like the requirement that scores be measured without any error (they must be perfect), are so unrealistic that nearly every neuroscientific or psychological study is in violation of at least one of them.
  • Even if the assumptions of significance testing are met, p-values are fundamentally misunderstood. Many, many researchers consistently misunderstand what is actually being claimed when a p-value falls below a given threshold (for a rundown of some p-value myths, see Lambdin, 2012).
  • What’s more, even if the assumptions of significance testing are met and the p-values are correctly interpreted, a statistically significant result doesn’t tell us much about the data at all! In fact, all a statistically significant result says is that two numbers are probably different, given certain assumptions. But by how much do they differ, and why does it matter anyway (Ziliak & McCloskey, 2008)? If I were to say that taking a 200-ounce pill five times a day made your risk of cancer different from not doing so, you’d likely want to know more about what it did to your chances and whether such an inconvenient treatment was practically worth it. Significance testing alone cannot answer these questions (the first sketch below reports an effect size alongside the p-value for precisely this reason).
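
To make the first and last bullets concrete, here is a minimal simulation sketch. It is our illustration, not an analysis from any of the works cited, and it assumes Python with NumPy and SciPy; the sample size (200,000 per group) and the effect size (0.02 standard deviations) are made up for the example. Two groups differ by a practically negligible amount, yet with a large enough sample the t-test still returns a “significant” p-value, while the effect size stays tiny.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

n = 200_000          # hypothetical participants per group: a very large sample
true_effect = 0.02   # a practically negligible difference, in standard-deviation units

control = rng.normal(loc=0.0, scale=1.0, size=n)
treatment = rng.normal(loc=true_effect, scale=1.0, size=n)

# Independent-samples t-test: "are the two group means probably different?"
result = stats.ttest_ind(treatment, control)

# Cohen's d: "by how much do they differ, in pooled-standard-deviation units?"
pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
cohens_d = (treatment.mean() - control.mean()) / pooled_sd

print(f"p = {result.pvalue:.2g}")     # typically far below .05 at this sample size
print(f"Cohen's d = {cohens_d:.3f}")  # yet the effect itself remains tiny
```

The test is behaving exactly as designed; the point is that “statistically different” and “practically important” answer different questions (Ziliak & McCloskey, 2008).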

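The second bullet, about unchecked assumptions, can also be made concrete. The sketch below is again only an illustration, with made-up data and our own choice of checks; it shows two routine pre-checks one might run before an independent-samples t-test: a Shapiro-Wilk test for normality and a Levene test for equal variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)
group_a = rng.normal(loc=0.0, scale=1.0, size=300)  # hypothetical scores, group A
group_b = rng.normal(loc=0.1, scale=1.4, size=300)  # group B, with a noticeably larger spread

# Shapiro-Wilk: are the scores in each group roughly normally distributed?
normality_a = stats.shapiro(group_a).pvalue
normality_b = stats.shapiro(group_b).pvalue

# Levene: are the two groups' variances roughly equal?
equal_variances = stats.levene(group_a, group_b).pvalue

print(f"Shapiro-Wilk p-values: {normality_a:.3f}, {normality_b:.3f}")
print(f"Levene p-value: {equal_variances:.3f}")  # small values flag a violated assumption
```

Checks like these only cover the testable assumptions; the requirement that scores be measured without error, as the bullet notes, cannot be verified this way at all.
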
All of this is to say that our mission at JOTE is not only to spotlight results that fail to achieve statistical significance – because trial and error are essential to the scientific process – but also to scrutinize positive results and ask the broader question: “why does this result matter?” A healthy science requires a place where results are treated equally and the tyranny of the dichotomy between significant and non-significant results is lifted. JOTE is this place. 

References

Kline, R. B. (2013). Beyond significance testing: Statistics reform in the behavioral sciences. American Psychological Association.

Lambdin, C. (2012). Significance tests as sorcery: Science is empirical—significance tests are not. Theory & Psychology, 22, 67-90. doi: 10.1177/0959354311429854

Ziliak, S. T., & McCloskey, D. N. (2008). The cult of statistical significance: How the standard error costs us jobs, justice, and lives. University of Michigan Press.
