Sometime in the 2012-2014 range, as the reproducibility crisis was heating up in my particular corner of science, I fell in love with a new (for me) approach to research. I fell in love with the research question.
Before this time, most of my research had involved making directional predictions that were at least loosely based on an existing theoretical perspective, and the study was designed to confirm the prediction. I didn’t give much thought to whether, if the study did not come out as expected, it would help to falsify the prediction and associated theory. (The answer was usually no: When the study didn’t come out as expected, I questioned the quality of the study rather than the prediction. In contrast, when the study “worked,” I took it as evidence in support of the prediction. Kids at home: Don’t do this. It’s called motivated reasoning, and it’s very human but not particularly objective or useful for advancing science.)
But at some point, I began to realize that science, for me, was much more fun when I asked a question that would be interesting regardless of how the results turned out, rather than making a single directional prediction that would only be interesting if the results confirmed it. So my lab started asking questions. We would think through logical reasons that one might expect Pattern A vs. Pattern B vs. Pattern C, and then we would design a study to see which pattern occurred.* And, whenever we were familiar enough with the research paradigm and context to anticipate the likely researcher degrees of freedom we would encounter when analyzing our data, we would carefully specify a pre-analysis plan so that we could have high confidence in results that we knew were based on data-independent analyses.
One day, as my student was writing up some of these studies for the first time, we came across a puzzle. Should we describe the studies as “exploratory,” because we hadn’t made a clear directional prediction ahead of time? Or as “confirmatory,” because our analyses were all data-independent (that is, we had exactly followed a carefully specified pre-analysis plan when analyzing the results)?
This small puzzle became a bigger puzzle as I read more work, new and old, about prediction, falsification, preregistration, HARKing, and the distinction between exploratory and confirmatory research. It became increasingly clear to me that it is often useful to distinguish between the goal of theory falsification and the goal of Type I error control, and to be clear about exactly what kinds of tools can help achieve each of those goals.
I wrote a short piece about this for PNAS. Here’s what it boils down to:
1. If you have the goal of testing a theory, it can be very useful to preregister directional predictions derived from that theory. In order to say that your study tested a theory, you must be willing to upgrade or downgrade your confidence in the theory in response to the results. If you would trust the evidence if it supported your theory but question it if it contradicted your theory, then it’s not a real test. Put differently, it’s not fair to say that a study provides support for your theory if you wouldn’t have been willing to say (if the results were different) that the same study contradicted your theory.**
2. If you have the goal of avoiding unintentional Type I error inflation so that you can place higher confidence in the results of your study, it can be very useful to preregister an analysis plan in which you carefully specify the various researcher decisions (or “researcher dfs”) that you will encounter as you construct your dataset and analyze your results. If your analyses are data-independent and if you account for multiple testing, you can treat your p-values as diagnostic of how likely your results would be if there were no true effect.***
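To see why data-independent analyses matter for Type I error control, here is a toy simulation (my illustration, not from the letter; the scenario and all names are hypothetical). It compares an analyst who preregisters a single outcome against one who measures five outcomes and reports "success" if any of them reaches p < .05, when in truth there is no effect anywhere:

```python
import math
import random

random.seed(1)

def two_sample_p(a, b):
    """Two-sided p-value for a two-sample z-test (large-sample approximation)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def run_study(n_outcomes):
    """One null study (no true effect): 100 per group, several measured outcomes."""
    return [
        two_sample_p([random.gauss(0, 1) for _ in range(100)],
                     [random.gauss(0, 1) for _ in range(100)])
        for _ in range(n_outcomes)
    ]

n_sims = 2000
# Preregistered analyst: tests only the single planned outcome.
prereg_fp = sum(run_study(1)[0] < 0.05 for _ in range(n_sims)) / n_sims
# Flexible analyst: measures 5 outcomes, claims a finding if ANY p < .05.
flexible_fp = sum(min(run_study(5)) < 0.05 for _ in range(n_sims)) / n_sims

print(f"Preregistered single test: false-positive rate ~ {prereg_fp:.3f}")
print(f"Pick-any-of-5 outcomes:    false-positive rate ~ {flexible_fp:.3f}")
```

The preregistered analyst's false-positive rate stays near the nominal 5%, while the flexible analyst's climbs toward 1 - 0.95^5, roughly 23%, even though every individual test is perfectly valid. That is the inflation that a data-independent analysis plan guards against.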
Why do I think this distinction is so important? Because thinking clearly about WHY you are preregistering (a) helps ensure that the way you preregister actually helps achieve your goals and (b) answers a lot of questions that may arise along the way.
Here’s an example of (a): If you want to test a theory, you need to be ready to update your confidence in the theory in EITHER direction, depending on the results of the study. If you can’t specify ahead of time (even at the conceptual level) a pattern of evidence that would reduce your confidence in the theory, the study is not providing a true test...and you don’t get to say that evidence consistent with the prediction provides support for the theory. (For instance: If a study is designed to test the theoretical prediction that expressing prejudice will increase self-esteem, you must be willing to accept the results as evidence against the theory if you find that expressing prejudice decreases self-esteem.)
Here’s an example of (b): You might preregister before running a study and then find unexpected results. If they aren’t very interesting to you, do you need to find a way to publicize them? The answer depends on why you preregistered in the first place. If you had the goal of combatting publication bias and/or theory testing, the answer is definitely YES. But if your goal was solely to constrain your Type I error rate, you’re done—deciding not to publish obviously won’t increase your risk of reporting a false positive.
Read the (very short) PNAS letter here. You can also read a reply from Nosek et al. here, and two longer discussions (Part I and Part II) that I had with the authors about where we agree and disagree, which I will unpack more in subsequent posts.
*The best part of this approach? We were no longer motivated to find a particular pattern of results, and could approach the design and analysis of each study in a much more open-minded way. We could take our ego out of it. We got to yell "THAT'S FASCINATING!" when we analyzed our data, regardless of the results.
**In philosophy of science terms, tests of theoretical predictions must involve some risk. The risk in question derives from the fact that the theoretical prediction must be “incompatible with certain possible results of observation” (Popper, 1962, p. 36)—it must be possible for the test to result in a finding that is inconsistent with the theoretical prediction.
***DeGroot (2014) has a great discussion of this point.