(This is the last in a three-part series on the limitations of randomized controlled trials (RCTs). See the first and second posts. The series responds to a recent set of essays in the Boston Review.)
The last post discussed how context and politics confound our ability to measure a program’s full impact. This reduces (and perhaps eliminates) our ability to draw accurate generalizations about an intervention’s impact, even if we repeat a trial in multiple contexts.
This post will focus on a different complication that context raises for RCTs: the fact that most programs adapt to context during the intervention. Before I get there, let’s start here:
Context matters. But what exactly is “context”?
Much of the learning in development over the past few decades has hammered in the principle that context matters. Cookie-cutter policy solutions and programs that ignore local conditions make about as much sense as a business that doesn’t respond to market conditions. You could try it, but you’ll probably fail.
What do we mean by “context”? As far as I can tell, it’s not a well-defined analytical category. “Context” just means all the factors that exist outside the program, and especially those factors that make a particular setting unique. Here’s a partial list of things we might consider as part of “context”:
- culture: norms, traditions, etc.
- economics: livelihoods, job prospects, etc.
- politics: local leadership, national structures, power relationships, etc.
- health: disease, sanitation, etc.
- education: literacy, schooling, etc.
- environmental factors: climate, seasons, agricultural possibilities, etc.
- and much more…
This list should make two things clear. First, “context” is complex and nuanced. It’s hard to fully describe a given context, regardless of whether you’re an outsider or a local. There are too many factors involved. Second, a lot of context will be unknown to program planners.
So how do you deal with context in programming?
Good programs adapt to context — both before and during implementation.
Accounting for context could occur in one of two ways: either you do it in advance or you do it along the way. In most cases, it’s a bit of both. Continuing with the business analogy, you need market research to write a business plan, but even great market research won’t allow you to write a perfect business plan. There are limits to how well you can know the context and plan your strategy in advance. Good businesses adapt along the way, both in response to changing conditions and in response to their own learning.
Good development programs do the same. At a minimum, programs rely on the skills of their staff to interpret and execute the plan. But beyond that, staff might also adjust the plan as they learn what works and what doesn’t. This might occur in response to formal program monitoring or just informal feedback loops. Sometimes log-frames are written to allow such flexibility. Often the program on the ground doesn’t reflect the log-frame at all.
The point is that there are many unknown factors which only become known over the course of a program. So the “context matters” principle applies throughout the program cycle, not just in the initial design.
RCTs only allow for adaptation that happens before the implementation.
I think the discourse around RCTs misses the fact that doing good development often means adapting the program during implementation. In discussing context, Glennerster and Kremer acknowledge that lentils may be a better incentive for program participation in Rajasthan than in Boston. However, the adjustments needed are usually much more complicated than swapping lentils for chowder. As described above, adaptation during implementation allows practitioners to incorporate what they learn about the unknown factors in that particular context.
In RCTs, unknown factors within a trial area are something to be boxed and set aside. That’s not a criticism: the scientific genius of RCTs lies in controlling for unknown factors. If an evaluation doesn’t randomly assign subjects to program and control groups (i.e. it’s not an RCT), you might end up with meaningful differences between the two groups before the intervention even starts. That’s called selection bias. You might try to avoid it by making sure the two groups match one another on observable characteristics (e.g. the same mix of gender, ethnicity, income levels, etc.). But unless subjects are randomly assigned, there will always be a risk of unknown differences. With a large enough sample size, random assignment lets us assume that any unknown variations in the control group match those in the program group.
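To make the selection-bias point concrete, here is a minimal simulation sketch (the numbers are hypothetical, invented for illustration): each subject has an unobserved “motivation” factor that raises outcomes on its own. If motivated subjects self-select into the program, the naive comparison badly overstates the program’s effect; a coin-flip assignment recovers something close to the true effect.

```python
import random

random.seed(0)
TRUE_EFFECT = 2.0   # hypothetical true program impact
N = 100_000         # large sample, as the argument above requires

# An unobserved confounder: subjects vary in "motivation",
# which raises the outcome regardless of the program.
motivation = [random.gauss(0, 1) for _ in range(N)]

def outcome(m, treated):
    """Outcome = 3 x motivation + program effect (if treated) + noise."""
    return 3.0 * m + (TRUE_EFFECT if treated else 0.0) + random.gauss(0, 1)

# Self-selection: the more motivated opt into the program.
self_selected = [m > 0 for m in motivation]
# Randomization: a coin flip, independent of motivation.
randomized = [random.random() < 0.5 for _ in range(N)]

def estimate(assignment):
    """Naive difference in mean outcomes: treated minus control."""
    treated = [outcome(m, True) for m, t in zip(motivation, assignment) if t]
    control = [outcome(m, False) for m, t in zip(motivation, assignment) if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

print(f"true effect:            {TRUE_EFFECT:.2f}")
print(f"self-selected estimate: {estimate(self_selected):.2f}")  # inflated by motivation gap
print(f"randomized estimate:    {estimate(randomized):.2f}")     # close to the true effect
```

The self-selected comparison mixes the program’s effect with the pre-existing motivation gap between the groups; randomization balances that unknown factor (in expectation) without anyone ever having to measure it.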
So randomization allows a researcher to evaluate program impact while controlling for unknown variations in the population. However, this means that the intervention itself must be defined in advance. Each study subject must receive the same intervention. Implementing the same intervention across large program and control groups precludes program staff from figuring things out as they go. A relevant personal story: I was once on the implementation end of a program experiment (though not an RCT) in which the program was struggling in our city. We could clearly see that the target population wasn’t responding to our efforts. We held a late-night session to discuss how we could change course, and presented a new strategy to our bosses. HQ put the kibosh on our new plan. You can imagine what that did for staff morale and the quality of our efforts going forward.
The kind of adaptation that many programs use is not an option with RCTs. As mentioned above, many of those unknown factors could become known over the course of the program. Most program implementers would take advantage of this new knowledge to adjust their intervention. The fact that RCTs are unable to do this limits their applicability to testing relatively simple interventions.
Perhaps some RCTs do change the intervention mid-study. I’m not aware of any like this, but I could be wrong. Perhaps you could build such adaptation into the intervention you’re testing. As I mentioned above, programs are often designed with flexibility. But then a great deal depends on the quality of the execution, management, staff decisions, etc. Assessing such intangibles would certainly reduce the scientific rigor of an RCT.
Variations within the program area matter too.
The last point I want to make is that good programs may also adapt to variations within the program area. That is, program implementers might adapt their approach to different members of the target group based on their individual characteristics. For example, perhaps gaining acceptance of a health program in one village requires a different incentive from another, or perhaps variations in the local economy call for varied approaches for an educational or vocational program.
In other words: Context is not just about the differences between Rajasthan and Boston, but about the differences within Rajasthan.
The challenge that context poses to RCTs goes beyond concerns over external validity. The challenge goes to the very heart of how good development programs are implemented. Randomized controlled trials can show that a program works, ceteris paribus. But ceteris is never paribus.