Sometime in the 2012-2014 range, as the reproducibility crisis was heating
up in my particular corner of science, I fell in love with a new (for me)
approach to research. I fell in love with the research question.
Before this time, most of my research had involved making directional
predictions that were at least loosely based on an existing theoretical
perspective, and the study was designed to confirm the prediction. I didn’t
give much thought to whether, if the study did not come out as expected, it
would help to falsify the prediction and associated theory. (The answer was
usually no: When the study didn’t come out as expected, I questioned the
quality of the study rather than the prediction. In contrast, when the study “worked,”
I took it as evidence in support of the prediction. Kids at home, don’t do
this. It’s called motivated reasoning, and it’s very human but not particularly
objective or useful for advancing science.)
But at some point, I began to realize that science, for me, was much
more fun when I asked a question that
would be interesting regardless of how the results turned out, rather than
making a single directional prediction that would only be interesting if the
results confirmed it. So my lab started asking questions. We would think
through logical reasons that one might expect Pattern A vs. Pattern B vs.
Pattern C, and then we would design a study to see which pattern occurred.* And,
whenever we were familiar enough with the research paradigm and context to
anticipate the likely researcher degrees of freedom we would encounter when
analyzing our data, we would carefully specify a pre-analysis plan so that we
could have high confidence in results that we knew were based on
data-independent analyses.
One day, as my student was writing up some of these studies for the
first time, we came across a puzzle. Should we describe the studies as
“exploratory,” because we hadn’t made a clear directional prediction ahead of
time? Or as “confirmatory,” because our analyses were all data-independent
(that is, we had exactly followed a carefully specified pre-analysis plan when
analyzing the results)?
This small puzzle became a bigger puzzle as I read more work, new and
old, about prediction, falsification, preregistration, HARKing, and the
distinction between exploratory and confirmatory research. It became
increasingly clear to me that it is often useful to distinguish between the
goal of theory falsification and the
goal of Type I error control, and to
be clear about exactly what kinds of tools can help achieve each of those
goals.
I wrote a short piece about this for PNAS. Here’s what it boils down
to:
1. If you have the goal of testing a theory, it can be very useful to preregister directional predictions derived from that theory. In order to say that your study tested a theory, you must be willing to upgrade or downgrade your confidence in the theory in response to the results. If you would trust the evidence if it supported your theory but question it if it contradicted your theory, then it’s not a real test. Put differently, it’s not fair to say that a study provides support for your theory if you wouldn’t have been willing to say (if the results were different) that the same study contradicted your theory.**
2. If you have the goal of avoiding unintentional Type I error inflation, so that you can place higher confidence in the results of your study, it can be very useful to preregister an analysis plan in which you carefully specify the various researcher decisions (or “researcher dfs”) that you will encounter as you construct your dataset and analyze your results. If your analyses are data-independent and you account for multiple testing, you can take your p-values as diagnostic, in the sense that they retain their nominal meaning.*** (The short simulation after this list illustrates the inflation that data-dependent analysis produces.)
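
To make point 2 concrete, here is a minimal simulation sketch. It is only an illustration, not anything from the PNAS letter: the three correlated outcome measures, the outlier-trimming rule, the group size of 30, and the 5,000 simulated studies are all arbitrary assumptions. Both "analysts" work with null data, so any significant result is a false positive. One tries six defensible analyses (three outcomes, with and without trimming) and keeps the best p-value; the other runs the single analysis specified in advance.

# Illustrative simulation: flexible (data-dependent) analysis vs. a single
# pre-specified analysis, when there is no true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, alpha = 5000, 30, 0.05
flexible_hits = 0   # "try several analyses, report whichever looks best"
prereg_hits = 0     # the one analysis written down in advance

# Three correlated outcome measures, no true group difference on any of them.
cov = np.full((3, 3), 0.3) + 0.7 * np.eye(3)

for _ in range(n_sims):
    a = rng.multivariate_normal(np.zeros(3), cov, size=n_per_group)
    b = rng.multivariate_normal(np.zeros(3), cov, size=n_per_group)

    # Researcher dfs: which outcome to analyze, and whether to trim "outliers".
    pvals = []
    for k in range(3):
        for trim in (False, True):
            x, y = a[:, k], b[:, k]
            if trim:
                x, y = x[np.abs(x) < 2], y[np.abs(y) < 2]
            pvals.append(stats.ttest_ind(x, y).pvalue)

    flexible_hits += min(pvals) < alpha   # best of the six analyses
    prereg_hits += pvals[0] < alpha       # outcome 1, no trimming, as pre-specified

print(f"False-positive rate, flexible analysis:      {flexible_hits / n_sims:.3f}")
print(f"False-positive rate, preregistered analysis: {prereg_hits / n_sims:.3f}")

In runs of this sketch, the flexible strategy rejects the (true) null hypothesis well above the nominal 5% of the time, while the pre-specified analysis stays near 5%. That gap is the unintentional inflation that a pre-analysis plan, or an explicit multiple-testing correction, is meant to prevent.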
Why do I think this distinction is so important? Because thinking clearly about WHY you are preregistering (a) helps ensure that the way you preregister actually helps achieve your goals and (b) answers a lot of questions that may arise along the way.
Here’s an example of (a): If you want to test a theory, you need to be
ready to update your confidence in the theory in EITHER direction, depending on
the results of the study. If you can’t specify ahead of time (even at the
conceptual level) a pattern of evidence that would reduce your confidence in
the theory, the study is not providing a true test...and you don’t get to say that evidence consistent with the prediction
provides support for the theory. (For instance: If a study is designed to
test the theoretical prediction that expressing prejudice will increase
self-esteem, you must be willing to accept the results as evidence against the
theory if you find that expressing prejudice decreases self-esteem.)
Here’s an example of (b): You might preregister before running a study
and then find unexpected results. If
they aren’t very interesting to you, do you need to find a way to publicize
them? The answer depends on why you preregistered in the first place. If
you had the goal of combatting publication bias and/or theory testing, the
answer is definitely YES. But if your goal was solely to constrain your Type I
error rate, you’re done—deciding not to publish obviously won’t increase your
risk of reporting a false positive.
Read the (very short) PNAS letter here. You can also read a reply from Nosek et al. here, and two longer discussions (Part I and Part II) that I had with the authors about where we agree and disagree, which I will unpack more in subsequent posts.
---
*The best part of this approach? We were no longer motivated to find a particular pattern of results, and could approach the design and analysis of each study in a much more open-minded way. We could take our ego out of it. We got to yell "THAT'S FASCINATING!" when we analyzed our data, regardless of the results.
**In philosophy of science terms, tests of theoretical predictions must
involve some risk. The risk in question derives from the fact that the
theoretical prediction must be “incompatible with certain possible results of
observation” (Popper, 1962, p. 36)—it must be possible for the test to result
in a finding that is inconsistent with the theoretical prediction.
***DeGroot (2014) has a great discussion of this point.