Randomization with learning is better than haphazard implementation without learning.

RCT proponents use variations on this refrain to defend against ethical questions about randomization. While I’m sure there are people who are troubled by the very thought of randomization, I’ve never met one of them. Most practitioners I know have larger questions about the efficacy and politics of RCTs than about the ethics. The ethical issue is a bit of a straw man: I see the defense far more than I see the critique. I’m getting tired of it, so I’ve decided to actually make the critique — but from the other direction.

In its most common form, the defense rests on an implicit assumption that the program (intervention, distribution, whatever) was going to be conducted in a haphazard manner among a larger population (people, households, whatever) than it could possibly serve. So why not spend a little bit of the funds to randomize and study the impact? The ethical tradeoff is a few people unserved, but massive learning that could lead to wiser program decisions in the future. In most cases, that’s a net positive.

Proponents often put a cap on the argument like so:

Let me be blunt: This is the way the Heifers of the world fool themselves. When you give stuff to some people and not to others, you are still experimenting in the world. You are still flipping a coin to decide who you help and who you don’t, it’s just an imaginary one.

Or like so:

That’s why we do experiments – because we want to know the best way to help the poor. Don’t Heifer International’s donors want to know the same?


But what if we remove the implicit assumption? What if we reframe the ethical comparison from “haphazard selection into the program” and instead recognize that NGOs and service providers often have very specific reasons for working with the individuals they do? Instead of spending funds on a randomization study, they take steps to target their work better: at those who need it most, or who stand the greatest chance of benefiting, or who can be served the most cost effectively, or who have been identified by the community, or who simply have to be served for some broader political or cultural reason.

This is what it means to respond to context in programs, and it might be at odds with randomization.

Despite accusations that they’re evidence-less or careless in this targeting, many NGO staff and donors incorporate the latest academic research into their program designs. Are RCT proponents suggesting that implementers should be less rigorous in their program design and targeting so that the pool remains large enough to have people left out, and then randomize?

If you use a well-targeted program as your ethical comparison — rather than lazily implying haphazard administration as your baseline — then it’s no longer merely a question of learning-or-not. It becomes a question of how-much-impact. And that’s an ethical question worth arguing over.

  1. I’m not sure I buy this argument. I think it rests on the assumption that NGOs are in fact able to know very precisely who the most needy are. If an organization can say, “we can rank people exactly from #1 to #1,000 in poverty, but only have funds for 500 people,” then by all means, only give it to the 500 poorest. However, if you’re more or less sure of the 1,000 or so neediest people, but have only a vague idea, for a large segment of them, of who is actually more deserving, then by choosing not to evaluate you absolutely are “experimenting with your eyes closed.” [Full disclosure: I work at IPA, so while these views are solely my own, I’m perhaps unlikely to be an unbiased source regarding the value of evaluation.]

    For instance, I worked on an RCT of a CGAP Graduation-style project in one of the poorest counties of Kenya. The average income per household member per day amounted to something like 30 cents. Furthermore, the organization only had funds to support approximately 600 individuals every six months. Yes, we could have looked very carefully and chosen to give funds first to those living on 25 cents a day, then, when we finished with them, to those living on 30 cents, and finally to those living on 35 cents. However, in this case (and for many other programs), I don’t think we can say with certainty who is poorer or more deserving of entry into our program. It isn’t because we are “evidence-less or careless” in our efforts; it’s simply that even the best targeting tools can only take you so far. I think the benefits of learning whether the program works, and why, far outweigh the benefits of choosing first those we think are most fitting at a very marginal level, and sacrificing the chance to have a quality evaluation.

    Furthermore, there are ways to still offer the program to everyone and to conduct an RCT that gives you critical knowledge. In our case, we chose to do a phase-in, where we randomly determined the time that people would enter the program. We recognized that our tools for targeting were imperfect, and used that limitation as a chance to learn.

    More generally, I absolutely think the ethical argument against RCTs is made very regularly. In fact, in the Heifer & GiveDirectly story you reference, Heifer makes that exact argument. The person in the interview explains:

    “I mean, it sounds like an experiment, and we’re not about experiments. These are lives of real people and we have to do what we believe is correct. We can’t make experiments with peoples’ lives. They’re just — they’re people. It’s too important.”

    I’ve personally heard the argument made on many occasions: “We don’t have time to do things like this, we KNOW what’s working, it would be unethical to help people we know we can be helping.”

    I guess I would conclude that overall, like in anything else, there are trade-offs. For the few NGOs with such fantastic levels of targeting that they can tell you with confidence the level of deservedness of everyone they are considering, yes, by all means, choose the most deserving. However, in cases where there is uncertainty about who is most deserving (which I would argue is the majority of cases), there is real value in expanding your targeting and learning through the process.


  2. Nate, good point about the reliability of targeting. I agree that there are tradeoffs: more resources spent on targeting will have diminishing marginal returns in terms of program impact in any circumstance, but the shape of that curve will vary widely. Though on the flip side, if an intervention has been studied before, more resources spent on a new RCT will have diminishing marginal returns on the new knowledge generated. Where those two curves meet will be different in every circumstance. It doesn’t take “very precise” or “fantastic levels” of targeting to make the targeting efforts worthwhile — just enough information to make a better decision in that particular case. And it is very case-specific. There’s no single answer. We need to recognize those tradeoffs, which the standard ethical defense of RCTs fails to do.

    On the Heifer representative: I find it funny how all the defenders of RCTs in the blogosphere reference that piece of her quote, but leave out the next part. She goes on to critique the ability of survey data to accurately capture program impacts. It’s not wonderfully articulated, but it suggests to me that she’s making more the epistemological argument about RCT efficacy (“do RCTs actually give us the knowledge they claim to?”) rather than the purely ethical one (“experimenting is always wrong”). Even the argument you paraphrase is more epistemological than ethical (“…we know we can be helping”).

    Again, it’s trade-offs: if you believe that RCTs are the one true way to know more about development impact, then any amount of reduced programmatic impact in the short term is worth the long-term gain. But if you’re a skeptic about their ability to help us better understand the world, then the reduced immediate impact might not be worth it. That’s where it becomes an ethical question, and one that can’t be answered universally or in the abstract. The proponents of RCTs (and, for that matter, of UCTs) go too far when they suggest that it can be.


    1. Right, I think I agree with most of your points. As you mention, every case is specific, so arguing in the abstract about the trade-off between more precise targeting and evaluating probably isn’t particularly fruitful. However, I think that in general, in spite of the vast number of RCTs out there, a very significant number of prominent ideas in development have not been adequately tested. In the case of Heifer for instance, it is my belief that yes, giving people cows and training will in most cases lead to an increase in income. However, I don’t think we have an especially good understanding of the value of the intervention relative to the cost. In this case, I don’t think it would be especially difficult to find twice as many people worthy of cows in Western and Nyanza, and to conduct an RCT to evaluate the benefit versus the cost. Again of course, every context is different, but I think that the number of interventions that have been studied to the point where the marginal benefit of learning is exceeded by the marginal cost of less precise targeting is a very, very small segment of development interventions.

      With regard to the Heifer quote, I am one of the many guilty people who focused on the first part rather than the second. Since it’s such a strong example of the straw-man argument you mention, it’s hard not to focus on it immediately. However, the second piece probably does deserve a critical look. She talks about a household that had been extremely empowered as a result of the program, and claims that such a change cannot be measured. Here again, I would disagree. Empowerment is a worthwhile end in itself, and to the extent that we care most about how program participants feel inside, perhaps we cannot measure that.

      However, I think that most of us also hope that empowerment leads to some sort of change in behavior or action. For instance, does the wife have more control over family resources? Is she more likely to feel comfortable traveling to the nearby village to buy goods without her husband’s permission? Are female children in the household more likely to attend school now? Admittedly, these are tougher to measure, and I’m not sure randomistas have fully mastered how to evaluate more intangible changes of this sort. However, I think her argument ought to be a call for impact evaluators to improve their means of measuring these issues, rather than a reason to reject evaluation altogether.

      Overall, I do concede that a hypothetical case can exist where the reduced programmatic impact would outweigh the value of learning through an RCT. However, I think that in practice there are many interventions where the number of deserving people, and moreover the number of more-or-less equally deserving people, far exceeds the number who can be beneficiaries. (In fact, I might go so far as to say I’m not sure I’ve ever seen a program with more resources available than deserving participants.) In all programs, yes, the resources needed for the evaluation should absolutely be weighed against the opportunity cost in terms of improved targeting or serving more beneficiaries. The person who ignores this tradeoff is just as naive as the person who doesn’t think it’s ethical to randomize. However, I think the number of cases (and again, I know I’m guilty of arguing in the abstract) where people fail to consider the costs of evaluating is far exceeded by the number of people who don’t consider the possibility of an RCT or evaluation at all. There is a balance to be struck, but I think the current balance still tips far too heavily towards those who do not evaluate.


  3. I work in M&E for a large international NGO, and I have to say I don’t recognize what you say about how precisely interventions are targeted. In my experience it’s more common for the decision over where to implement a project to be based on existing relationships with communities, on government or donor priorities, or on untested assumptions about the best targeting strategy. It’s very unusual for a project to serve all of those who could potentially benefit from it: even in cases where (say) a whole district is covered by a project, who’s to say that people in the next district couldn’t also benefit?

    I know that there are *some* cases where targeting is very accurate and where the whole potential beneficiary population is being served. I understand that may be true in general in your organization. However, I’ve seen plenty of cases where “haphazard” does seem the appropriate word, and even some cases where random selection would seem to be an improvement over what was actually done – as well as giving us the opportunity to learn something.


    1. I never said NGOs were getting it right all the time… Though I would argue that relationships with communities and government priorities can be very good targeting justifications.

      Also, from a purely utilitarian standpoint, serving all those who could *potentially* benefit is not terribly compelling. Maximizing the overall benefit might rather involve serving those who would benefit most.


      1. I just think we rarely know who those people that could benefit most are.

  4. Dave, you’ve probably seen this, but just in case: “Context Matters for Size” (http://www.cgdev.org/sites/default/files/context-matters-for-size_0.pdf) and a Businessweek article on the study: http://buswk.co/163wAUg

