Mystery boxes are like that. Full of mystery and promise. We imagine they could be anything. We imagine them as their best selves. We imagine that they will Make Life Better, or in science, that they will Make Science Better.
Lately, I’ve been thinking a lot about what’s inside the mystery box of preregistration.
Why do I think of preregistration as a mystery box? I’ll talk about the first reason now, and come back to the second in a later post. Reason the First is that we don’t know what preregistration is or why we’re doing it. Or rather, we disagree as a field about what the term means and what the goals are, and yet we don’t even know there’s disagreement. Which means that when we talk about preregistration, we’re often talking past each other.
Here are three examples that I gave in my talk last week at SPSP.
Back in the olden days, in June 2013, Chris Chambers and colleagues published an open letter in the Guardian entitled “Trust in Science Would be Improved by Preregistration.” They wrote: “By basing editorial decisions on the question and method, rather than the results, pre-registration overcomes the publication bias that blocks negative findings from the literature.” In other words, they defined preregistration as peer review and acceptance of articles before the results are known, and they saw the major goal or benefit of preregistration as combatting publication bias.
Fast forward to December 2017: Leif Nelson and colleagues publish a blog post on “How to Properly Preregister a Study.” They define preregistration as “time-stamped documents in which researchers specify exactly how they plan to collect their data and to conduct their key confirmatory analyses,” and they explain that the goal is “to make it easy to distinguish between planned, confirmatory analyses…and unplanned exploratory analyses.”
Then just a few weeks ago, out of curiosity, I posted the following question on PsychMAP: A researcher tweets: "A new preregistered study shows that X affects Y." What would you typically infer about this study? By far the most popular answer (currently at 114 votes, compared to the next most popular of 44) was that the main hypothesis was probably a priori (no HARKing). In other words, many psychologists seem to think preregistration means writing down your main hypothesis ahead of time, and perhaps that a primary goal is to avoid HARKing.
So if I tell you that I preregistered my study, what do I mean? And why did I do it—what was my goal?
I think we are in desperate need of a shared and precise language to talk about preregistration. As with any other scientific construct, we’re not going to make much headway on understanding it or leveraging it if we don’t have a consensual definition of what we are talking about.
It seems to me that there are (at least) four types of preregistration, and that each one serves its own potential function or goal. These types of preregistration are not mutually exclusive, but doing any one of them doesn’t necessarily mean you’re doing the others.
Notice that I said POTENTIAL benefit or function. That’s crucial. It means that depending on how you actually implement your preregistration, you may or may not achieve the ostensible goal of that type of preregistration. Given how important these goals are for scientific progress, we need to be paying attention to how and when preregistration can be an effective tool for achieving them.
Let’s say you want to combat publication bias, so you preregister your study on AsPredicted or OSF. Who is going to be looking for your study in the future? Will they be able to find it, or might the keywords you’re using be different from the ones they would search for? Will they be able to easily find and interpret your results? Will you link to your data? If so, will a typical researcher be able to understand your data file at a glance, or did you forget to change the labels from, say, the less-than-transparent VAR1043 and 3REGS24_2?
Let’s say you have a theory that’s ready to be tested.* So you record your hypothesis ahead of time: “Stereotyping will increase with age.” But what kind of stereotypes? Measured how? The vagueness of the prediction leaves too much wiggle room for HARKing later on—and here I mean HARKing in the sense of retroactively fitting the theory to whatever results you happen to find. If you find that racial stereotypes get stronger but elderly stereotypes get weaker as people age, your vague a priori prediction leaves room for your motivated human mind to think to itself “well of course, the prediction doesn’t really make sense for stereotypes of the elderly, since the participants are themselves elderly.” Obviously, adjusting your theory in light of the data is fine if you’re theory building (e.g., asking “Do various stereotypes increase with age, and if so, when?”), but you wouldn’t want to do it and then claim that you were testing a theory-derived hypothesis in a way that allowed for the theory to be falsified. [Update: As Sanjay Srivastava pointed out in a PsychMAP discussion about this post, it's important to recognize that often, researchers wishing to provide a strong and compelling test of a theory will want to conduct a preregistration that combines elements 2 and 3—that is, specific, directional predictions about particular variables that are clearly derived from theory PLUS clear a priori constraints on researcher degrees of freedom.]
Or let’s say that you want to be able to take your p-value as an indicator of the strength of evidence for your effect, in a de Grootian sense, and so you preregister a pre-analysis plan. If you write down “participants have to be paying attention,” it doesn’t clearly constrain flexibility in data analysis because there are multiple ways to decide whether participants were paying attention (e.g., passing a particular attention check vs. how long participants spent completing the survey vs. whether a participant clicked the same number for every item on a survey). If you want to avoid unintentional researcher degrees of freedom (or “HARKing” in the sense of trying out different researcher decisions and selecting only the ones that produce the desired results), you need to clearly and completely specify all possible researcher decisions in a data-independent way.**
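To make the attention example concrete, here is a minimal Python sketch of how “participants have to be paying attention” can turn into three different exclusion rules. Everything in it is invented for illustration: the field names, the 120-second cutoff, and the four toy participants are assumptions, not taken from any real study or preregistration template.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Participant:
    pid: str
    passed_attention_check: bool
    seconds_on_survey: float
    responses: Tuple[int, ...]


# Hypothetical cutoff: a real plan would fix this number before data collection.
MIN_SECONDS = 120.0


def failed_attention_check(p: Participant) -> bool:
    return not p.passed_attention_check


def finished_too_fast(p: Participant) -> bool:
    return p.seconds_on_survey < MIN_SECONDS


def straightlined(p: Participant) -> bool:
    # True when the participant clicked the same number for every item.
    return len(set(p.responses)) == 1


# Three readings of the same vague sentence.
RULES: Dict[str, Callable[[Participant], bool]] = {
    "attention_check": failed_attention_check,
    "completion_time": finished_too_fast,
    "straightlining": straightlined,
}


def excluded_ids(participants: List[Participant],
                 rule: Callable[[Participant], bool]) -> Set[str]:
    """Return the ids of participants that a given rule would exclude."""
    return {p.pid for p in participants if rule(p)}


# Toy data, made up for this sketch.
SAMPLE = [
    Participant("p1", True, 300.0, (1, 4, 2, 5)),
    Participant("p2", False, 280.0, (3, 3, 4, 2)),
    Participant("p3", True, 45.0, (2, 5, 1, 3)),
    Participant("p4", True, 310.0, (4, 4, 4, 4)),
]

exclusions = {name: excluded_ids(SAMPLE, rule) for name, rule in RULES.items()}

for name, ids in exclusions.items():
    print(name, sorted(ids))
```

In this toy sample, each rule drops a different participant, so the analyzed sample depends on which reading you pick. A plan that names one rule (and its cutoff) ahead of time removes that choice; a plan that only says “paying attention” leaves all three open.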
In fact, the registered report is really the only kind of preregistration on here that’s straightforward to implement in an effective way, because much of the structure of how to do it well has been worked out by the journal and because the pieces of the preregistration are being peer reviewed.
Which brings me to the second reason why I call preregistration a mystery box: What’s inside a preregistration when it HASN’T been part of a peer reviewed registered report? Maybe not what you would assume. Stay tuned.
---
*Many of us don’t do a lot of theory testing, by the way—we might be collecting data to help build theory, or asking a question that’s not very theoretical but might later end up connecting speculatively to some theories in potentially interesting ways (the sort of thing you do in a General Discussion)—but we’re not working with a theory that generates specific, testable predictions yet.
**Yeah, so, we’ve been using “HARKing” to mean two things…sometimes we use it to mean changing the theory to fit the results, which hampers falsifiability, and sometimes we use it to mean reporting a result as if it’s the only one that was tested, which hampers our ability to distinguish between exploratory and confirmatory analyses. (In his 1998 article, Kerr actually talked about multiple variants of HARKing and multiple problems with it.)***
***We’ve also been using “exploratory/confirmatory” to distinguish between both exploratory vs. confirmatory PREDICTIONS (do you have a research question or a directional prediction?) and exploratory vs. confirmatory ANALYSES (are your analyses data-dependent or data-independent/selected before seeing the data?).****
****Did I mention that our terminology is a hot mess?