Limitations of RCTs: a post script

I wrote a few posts recently on the implications of politics and context for randomized controlled trials (RCTs). See here, here and here. I’ve received a few offline comments about them. Some clarifying points are in order.

“Limitations” of RCTs?

Yeah, just limitations. I could write a dozen blog posts about how RCTs are a great method for research and evaluation, and why their use is good for development practice and policymaking. But many others have already written about that. In fact, there are two recent books on the topic: Poor Economics by Banerjee and Duflo, and More Than Good Intentions by Karlan and Appel. I haven’t read either yet but I recommend them anyway. These books also explore the limitations of RCTs: a review of Poor Economics notes that the book goes beyond RCTs, and uses other methods to investigate why and how certain things work.

Similarly, I tried to explore some factors that are often glossed over in discussions of what RCTs mean for development. An RCT does certain things very well. But there are boundaries. That doesn’t change the fact that the RCT is the gold standard. As an analogy: gas chromatography-mass spectrometry is the gold standard for identifying substances, but it only works if the substance can be converted into a gas. It’s pretty important to know that. If you can’t convert the substance into a gas, you’ll have to use another method.

Failing to understand the limits of RCTs can result in two kinds of errors: 1) believing we know more about the efficacy of an intervention than we do; or 2) concluding that an intervention is ineffective unless it’s been affirmed by an RCT. Either of these errors could lead policymakers and practitioners to inefficient allocations of the resources available for development. Let’s take each in turn.

Error 1: Overconfidence in RCT results. (Or: Academics gone wild?)

At the end of the second post, I made a comment about humility. Here’s what I wrote:

Science brings a potentially inflated sense of our own expertise. RCTs, and the development industry as a whole, would benefit from less certainty and greater humility.

Two different people who regularly work on RCTs contacted me to share their thoughts on this issue. Both argued that academic researchers are the last people in the world to be overconfident. They are so worried about being critiqued by other academics that they cover their bases and fill their articles with caveats galore.

But the risk isn’t really that academics would go too far (though I can think of a few professors with large egos). My main worry is the policymakers and practitioners who read the top-line results without delving into the nuance. The caveats get stripped away as you get closer to the decision maker. That’s why the first section of most reports is called the “executive summary” — the person calling the shots on resource allocations often doesn’t read the whole thing.

Those who recognize the nuance have a responsibility to rein in the advocates, policymakers and others in order to ensure the research findings are used well.

Error 2: Underconfidence in other methods, and in the interventions they evaluate.

There are certain types of interventions that are not conducive to rigorous testing through RCTs. There are several reasons this might be the case. Perhaps the intervention’s unit of impact is too large to make randomization possible; e.g. doing an RCT on macroeconomic policies would require random assignment of entire countries. Perhaps the intervention takes too long or the expected impacts will only be measurable in the distant future, making an RCT incredibly expensive; e.g. some peacebuilding or community development projects.

We need other methods to evaluate these interventions, policies and programs. In the gas chromatography-mass spectrometry analogy: these issues are the substances that cannot be converted into gases. The “gold standard” test simply doesn’t apply. In development, these other methods include quasi-experimental evaluations (i.e. no random assignment). We should also look further afield to methods based on case studies of significant changes, such as participatory video evaluations and action research.

Proponents of RCTs would argue that these other methods don’t give us the same level of certainty. I agree. We will never be able to measure the impact of these interventions with as much certainty as we can have for the simpler interventions. However, a lack of certainty is not the same as a lack of impact. There’s no correlation between our ability to be certain about a result and the marginal benefit of investing in it.

In an industry that is increasingly looking to demonstrate impact (as it should be), the risk of holding up RCTs as the “gold standard” is that we might under-invest in those projects that cannot be evaluated by RCT. Tyler Cowen put it this way:

The main danger with RCTs is that, in development economics, they will lead to an excess focus on social engineering as a driver of development. They also will lead people to focus on problems which are amenable to success by piecemeal social engineering, which in turn will lead to biases in our understanding and a neglect of big picture questions about economic growth.

I concur, though I would amend Cowen’s “big picture questions” to include economic growth as well as politics, governance, human rights, conflict resolution, peacebuilding, and a whole lot more. The fact that we have less certainty when measuring our impact on these issues does not mean we should invest less in them.


Want to know more about RCTs? Check out the books I mentioned above: Poor Economics by Banerjee and Duflo, and More Than Good Intentions by Karlan and Appel. Also, there’s a new Development Impact blog that promises to be very interesting.