Mystery boxes are like that. Full of mystery and promise. We imagine they could be anything. We imagine them as their best selves. We imagine that they will Make Life Better, or in science, that they will Make Science Better.
Lately, I’ve been thinking a lot about what’s inside the mystery box of preregistration.
Why do I think of preregistration as a mystery box? I’ll talk about the first reason now, and come back to the second in a later post. Reason the First is that we don’t know what preregistration is or why we’re doing it. Or rather, we disagree as a field about what the term means and what the goals are, and yet we don’t even know there’s disagreement. Which means that when we talk about preregistration, we’re often talking past each other.
Here are three examples that I gave in my talk last week at SPSP.
Back in the olden days, in June 2013, Chris Chambers and colleagues published an open letter in the Guardian entitled “Trust in Science Would be Improved by Preregistration.” They wrote: “By basing editorial decisions on the question and method, rather than the results, pre-registration overcomes the publication bias that blocks negative findings from the literature.” In other words, they defined preregistration as peer review and acceptance of articles before the results are known, and they saw the major goal or benefit of preregistration as combating publication bias.
Fast forward to December 2017: Leif Nelson and colleagues publish a blog post on “How to Properly Preregister a Study.” They define preregistration as “time-stamped documents in which researchers specify exactly how they plan to collect their data and to conduct their key confirmatory analyses,” and they explain that the goal is “to make it easy to distinguish between planned, confirmatory analyses…and unplanned exploratory analyses.”
Then just a few weeks ago, out of curiosity, I posted the following question on PsychMAP: A researcher tweets, “A new preregistered study shows that X affects Y.” What would you typically infer about this study? By far the most popular answer (currently 114 votes, compared with 44 for the next most popular option) was that the main hypothesis was probably a priori (no HARKing). In other words, many psychologists seem to think preregistration means writing down your main hypothesis ahead of time, and perhaps that a primary goal is to avoid HARKing.
So if I tell you that I preregistered my study, what do I mean? And why did I do it—what was my goal?
I think we are in desperate need of a shared and precise language to talk about preregistration. As with any other scientific construct, we’re not going to make much headway on understanding preregistration or leveraging it if we don’t have a consensual definition of what we are talking about.
It seems to me that there are (at least) four types of preregistration, and that each one serves its own potential function or goal. These types of preregistration are not mutually exclusive, but doing any one of them doesn’t necessarily mean you’re doing the others.
Notice that I said POTENTIAL function or goal. That’s crucial. It means that depending on how you actually implement your preregistration, you may or may not achieve the ostensible goal of that type of preregistration. Given how important these goals are for scientific progress, we need to pay attention to how and when preregistration can be an effective tool for achieving them.
Let’s say you want to combat publication bias, so you preregister your study on AsPredicted or OSF. Who is going to be looking for your study in the future? Will they be able to find it, or might the keywords you’re using be different from the ones they would search for? Will they be able to easily find and interpret your results? Will you link to your data? If so, will a typical researcher be able to understand your data file at a glance, or did you forget to change the labels from, say, the less-than-transparent VAR1043 and 3REGS24_2?
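(To make that last point concrete, here’s a minimal sketch in Python of the kind of relabeling that makes a shared data file legible at a glance. The file names and descriptive labels are invented for illustration; only the opaque variable names echo the examples above.)

```python
# A minimal sketch of relabeling a data file before sharing it.
# File names and descriptive labels are invented for illustration.
import pandas as pd

df = pd.read_csv("study1_raw.csv")

# Replace machine-generated labels with self-documenting names
# so a future researcher can interpret each column at a glance.
df = df.rename(columns={
    "VAR1043": "participant_age_years",
    "3REGS24_2": "stereotype_endorsement_item_2",
})

df.to_csv("study1_shared.csv", index=False)
```

A tiny renaming step like this costs almost nothing at analysis time, and it spares every future reader from guessing what each column means.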
Let’s say you have a theory that’s ready to be tested.* So you record your hypothesis ahead of time: “Stereotyping will increase with age.” But what kind of stereotypes? Measured how? The vagueness of the prediction leaves too much wiggle room for HARKing later on—and here I mean HARKing in the sense of retroactively fitting the theory to whatever results you happen to find. If you find that racial stereotypes get stronger but stereotypes of the elderly get weaker as people age, your vague a priori prediction leaves room for your motivated human mind to think to itself, “Well of course, the prediction doesn’t really make sense for stereotypes of the elderly, since the participants are themselves elderly.” Obviously, adjusting your theory in light of the data is fine if you’re theory building (e.g., asking “Do various stereotypes increase with age, and if so, when?”), but you wouldn’t want to do it and then claim that you were testing a theory-derived hypothesis in a way that allowed for the theory to be falsified. [Update: As Sanjay Srivastava pointed out in a PsychMAP discussion about this post, it’s important to recognize that often, researchers wishing to provide a strong and compelling test of a theory will want to conduct a preregistration that combines elements 2 and 3—that is, specific, directional predictions about particular variables that are clearly derived from theory PLUS clear a priori constraints on researcher degrees of freedom.]
Or let’s say that you want to be able to take your p-value as an indicator of the strength of evidence for your effect, in a de Grootian sense, and so you preregister a pre-analysis plan. If you write down “participants have to be paying attention,” that criterion doesn’t clearly constrain flexibility in data analysis, because there are multiple ways to decide whether participants were paying attention (e.g., passing a particular attention check vs. how long participants spent completing the survey vs. whether a participant clicked the same number for every item on a survey). If you want to avoid unintentional researcher degrees of freedom (or “HARKing” in the sense of trying out different researcher decisions and selecting only the ones that produce the desired results), you need to clearly and completely specify all possible researcher decisions in a data-independent way.**
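(And to illustrate what “clearly and completely specify” might look like, here’s a hypothetical sketch of a data-independent exclusion rule, written as code. The particular attention check, time cutoff, and column names are all invented; the point is just that every decision is pinned down before the data can influence it.)

```python
# Hypothetical sketch of a fully specified, data-independent exclusion rule.
# The attention check, time cutoff, and column names are invented examples.
import pandas as pd

def apply_preregistered_exclusions(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only participants who (1) passed the single preregistered
    instructed-response attention check AND (2) spent at least 120 seconds
    on the survey. No other exclusions are permitted."""
    passed_check = df["attention_check_response"] == "strongly agree"
    took_long_enough = df["survey_duration_seconds"] >= 120
    return df[passed_check & took_long_enough]
```

Compare that with “participants have to be paying attention”: the code version leaves no room to try several operationalizations after the fact and report only the one that produces the desired result.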
In fact, the registered report is really the only one of these four types of preregistration that’s straightforward to implement in an effective way, because much of the structure of how to do it well has been worked out by the journal, and because the pieces of the preregistration are being peer reviewed.
Which brings me to the second reason why I call preregistration a mystery box: What’s inside a preregistration when it HASN’T been part of a peer reviewed registered report? Maybe not what you would assume. Stay tuned.
---
*Many of us don’t do a lot of theory testing, by the way—we might be collecting data to help build theory, or asking a question that’s not very theoretical but might later end up connecting speculatively to some theories in potentially interesting ways (the sort of thing you do in a General Discussion)—but we’re not working with a theory that generates specific, testable predictions yet.
**Yeah, so, we’ve been using “HARKing” to mean two things…sometimes we use it to mean changing the theory to fit the results, which hampers falsifiability, and sometimes we use it to mean reporting a result as if it’s the only one that was tested, which hampers our ability to distinguish between exploratory and confirmatory analyses. (In his 1998 article, Kerr actually talked about multiple variants of HARKing and multiple problems with it.)***
***We’ve also been using “exploratory/confirmatory” to distinguish both between exploratory vs. confirmatory PREDICTIONS (do you have a research question or a directional prediction?) and between exploratory vs. confirmatory ANALYSES (are your analyses data-dependent, or data-independent/selected before seeing the data?).****
****Did I mention that our terminology is a hot mess?
Hi Alison. Thanks for leading this discussion about the risks of fractured communication. I know this is being discussed on Facebook and Twitter too, and I wonder what the best way forward is to help establish shared vocabulary and reduce confusion, so that expectations can be reasonably managed. All of the points you made are spot on, and there is even more confusion in some areas (for example, some Registered Reports don't have an associated preregistration, some preregistrations are not actually in a registry or a functional equivalent of one, and on and on!). We're happy to help amplify your message and want to do what you think is necessary to meet the goal of increasing clarity (and by extension, rigor!).
Thanks, David...I agree that "what's the best way forward?" is exactly the question we need to be asking -- how do we clarify our language to reduce confusion, but also how do we make sure that we're distinguishing between the different potential goals that preregistration can serve, and how do we design effective strategies for creating and reviewing preregistrations that ensure we are actually getting the benefits we want from them? It would be great to collaborate with COS on possible ways to move in that direction!
Good points. It would be terrific if wise folks did a reboot on the vocab in this domain.
Psych Science recently changed to a system in which the action editor vets the author's badge requests, including for the prereg badge. I hope that will contribute to improving the quality of the preregs associated with articles that get that badge. But just as important is ensuring that researchers have a good grasp on the purpose of preregistration (to differentiate exploration from hypothesis testing).
BTW, I would not be surprised if even RRs sometimes fail to anticipate all relevant researcher degrees of freedom.
Steve
I think we need not just a reboot on the vocab but also some careful thinking about the different problems that we're hoping preregistration can help us solve. One point I was trying to make is that because different people have used "preregistration" to mean different things over the last few years, we've lumped together all the different problems that these different things can solve. So now, we assume that preregistration almost automatically confers a whole range of benefits, from combating publication bias (which was touted as a benefit of "preregistration" meaning registered reports, but isn't very connected to "preregistration" meaning uploading a document with some predictions in it to one or another repository), to differentiating studies that test a priori directional predictions from studies that ask open-ended research questions, to differentiating analyses that were data-independent from analyses that were data-dependent.
I agree that RRs may sometimes or even often fail to anticipate all relevant researcher dfs. But I would predict that articles published via a registered report mechanism are more likely than articles published as a regular journal article with a preregistered study to transparently and clearly disclose all deviations from the preregistered analysis plan. In other words, I suspect that because the RR format usually requires researchers to write up their planned methods section and planned analyses and then to disclose any and all changes to the editor and potential reviewers, there is more accountability built into that system, and authors are more likely to be careful about clarifying any necessary changes to the analysis plan that emerged. (Of course, this prediction of mine could and arguably should be tested empirically!)