Development that Works: RCTs (Methodology)

December 14, 2011  //  Economic & Social Policy

For my first set of posts for Critical Twenties (thank you to Suhrith and Arghya for bringing me on board), I thought it apposite to write about a subject I have been fortunate to be associated with: randomized controlled trials in development economics. From coverage in The Economist to two recent books, Poor Economics and More Than Good Intentions, randomized controlled trials, or RCTs, have carved out a large niche in academic development economics. The work of people like Abhijit Banerjee and Esther Duflo at MIT's Jameel Poverty Action Lab, much of it conducted in India, has gone a long way towards helping us understand what "works" in development. In this first post, I outline the methodology and motivation for RCTs. In a follow-up post, I shall look at RCTs in practice and some substantive findings coming out of recent experiments in India. In the future, I also hope to address some of the issues with RCTs and ultimately to provide a holistic sense of their uses and limitations specific to development in India. Readers are asked to bear with this methodology-heavy post; while it may not be as interesting as the substantive findings in subsequent posts, it is critical to understanding how we arrive at such results.

RCTs are born of the desire to establish causality, which is necessary to any form of positivist social science. A great deal of money is pumped into development and aid, yet it remains unclear how these funds can best be utilized. With funds limited in many cases, it is imperative that aid interventions be evaluated rigorously. However, such evaluation is not always easy. Even blockbuster projects like Jeffrey Sachs's Millennium Villages Project have been heavily criticized for failing to incorporate appropriate evaluation mechanisms into the project design.

The motivation behind conducting a randomized controlled trial is a simple one: to isolate the effect of a particular policy intervention. The scope of these interventions is wide-ranging, from the provision of a commitment contract to help smokers quit smoking, to the provision of microcredit to low-income families. In each case, the aim is to find a causal explanation for the effects of a certain policy intervention, or treatment, i.e., to answer the question "what is the difference in outcomes for a subject that receives a certain treatment X as opposed to the outcome for the same subject who does not receive said treatment X?", in the formulation of economists Joshua Angrist and Jörn-Steffen Pischke. The standard approach with observational data is a multiple regression, in which various controls are included along with the independent variable of interest: thus, to find the impact of a stimulus package on total employment, one might run a regression with employment as the dependent variable and the stimulus amount as the key independent variable, along with controls for, say, cyclical employment fluctuations, other exogenous shocks, and so on. One problem readily becomes apparent: have all the unobservables been controlled for? Despite including a whole host of controls, researchers may miss variables that have potentially significant confounding effects on the causal relationship between the stimulus package and employment. Given the sheer multitude of unobservables that may be present (along with data limitations), it is entirely plausible that certain controls will be left out. This worry compromises the internal validity (i.e., the solidity of the context-specific findings) of a non-experimental study, casting doubt on the causal relationships identified.
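To make the omitted-variable worry concrete, here is a minimal, purely illustrative simulation; the variable names and numbers are invented for this post, not drawn from any real stimulus data. An unobserved confounder drives both the policy variable and the outcome, and a naive regression that omits it overstates the policy's effect.

```python
# Illustrative simulation (hypothetical data): how an omitted confounder
# biases a naive observational regression.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Unobserved confounder, e.g. underlying local economic health,
# drives both the "policy" variable and the outcome.
confounder = rng.normal(size=n)
stimulus = 1.0 * confounder + rng.normal(size=n)        # policy is not randomly assigned
employment = 2.0 * stimulus + 3.0 * confounder + rng.normal(size=n)

def ols_slope(x, y):
    """Slope from a simple bivariate OLS regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Naive regression omitting the confounder is badly biased (true effect is 2.0).
print("naive estimate:     ", round(ols_slope(stimulus, employment), 2))   # ~3.5

# If the confounder could be observed and controlled for, the bias disappears.
X_full = np.column_stack([np.ones(n), stimulus, confounder])
beta_full, *_ = np.linalg.lstsq(X_full, employment, rcond=None)
print("controlled estimate:", round(beta_full[1], 2))                      # ~2.0
```

The point of the sketch is simply that the naive estimate is only as good as the set of controls one happens to observe; anything left out can distort the answer.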

To borrow the parlance of medical trials, one can think of subjects receiving a policy intervention as the "treatment" subjects and subjects not receiving the policy intervention as the "control" subjects. The control group provides the counterfactual scenario; without this counterfactual, one cannot be certain that the outcomes observed among treated subjects are driven by the treatment itself rather than by factors unrelated to the policy being enacted. In the context of the stimulus example, researchers need to ask whether the effects on employment would have been observed even in the absence of a stimulus package, a question that calls for a comparison group.

In order to evaluate the effects of a policy intervention/treatment rigorously, experimentalists advocate conducting targeted policy experiments. Such experiments usually involve the identification of a target population, followed by the selection of a representative sample from within that population. Subjects receiving the policy intervention (the "treatment" group) are then randomly chosen from within the sample, with the non-treated sample subjects forming the control group. Randomization ensures that unobservables are controlled for ex ante, since a treatment subject i is the same as a control subject j in expectation. Although one cannot be sure of what all the unobservables are, randomization ensures that they are balanced, in theory, between the two groups. It is our best approximation of including an infinite number of control variables in one's regression. The difference in average outcomes between the treatment and control groups gives the treatment effect; when compliance with the randomized assignment is imperfect, the effect estimated for those actually induced to take up the treatment is known, in formal terms, as the Local Average Treatment Effect, or LATE. (As a short aside: the central role played by randomization here prompted venerable development economist Angus Deaton to dub practitioners of such experiments "randomistas". In response to the critiques of Deaton and others, fellow economist Guido Imbens was moved to write a rebuttal titled "Better LATE Than Nothing".)
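A short simulation, again with made-up numbers, illustrates why randomization does the heavy lifting here: under random assignment the unobserved confounder ends up with roughly the same distribution in both groups, so a simple difference in mean outcomes recovers the true effect.

```python
# Illustrative sketch (hypothetical data): random assignment balances the
# unobservable across groups, so a difference in means recovers the effect.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

confounder = rng.normal(size=n)                  # never observed by the researcher
treated = rng.integers(0, 2, size=n)             # random assignment: coin flip
outcome = 2.0 * treated + 3.0 * confounder + rng.normal(size=n)

# Balance check: the unobservable has (almost) the same mean in both groups.
print("confounder, treated:", round(confounder[treated == 1].mean(), 3))
print("confounder, control:", round(confounder[treated == 0].mean(), 3))

# Difference in mean outcomes between treatment and control ~ true effect of 2.0.
effect = outcome[treated == 1].mean() - outcome[treated == 0].mean()
print("estimated treatment effect:", round(effect, 2))
```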

By implicitly controlling for all such unobservables, RCTs give one a tremendous amount of internal validity: randomization ensures that the treatment is the only factor that varies systematically across the two groups, allowing one to isolate the context-specific effects of a particular policy intervention.

Consider the following illustrative example, an experiment conducted by economists Dean Karlan and Jonathan Zinman on expanded microenterprise credit access in Manila. In partnership with a lender in Manila, the researchers identified a pool of marginally creditworthy applicants; owing to lender constraints, only a limited number of applicants could be provided with a loan. The treatment group, comprising recipients of loans, was randomly chosen, and average outcomes for this group were compared with those for the control group of applicants who did not receive a loan. The researchers found that "the canonical case for microcredit- that access increases profits, business scale, and household consumption- is not supported on average…in all, [the] results suggest that microcredit may work broadly through risk management and investment at the household level, rather than directly through the targeted businesses." A similar methodology was used by another set of researchers to look at the impact of expanded loan access for marginal applicants in Hyderabad.
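For readers curious what the underlying comparison looks like in practice, here is a bare-bones sketch of the estimator, using invented profit figures rather than the actual Manila data: the estimated effect is simply the difference in group means, reported with a conventional confidence interval.

```python
# Minimal sketch of a difference-in-means comparison (invented numbers,
# standing in for outcomes like business profits in a loan experiment).
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical monthly profits for loan recipients (treatment) and
# non-recipients (control); these figures are made up for illustration.
profits_treated = rng.normal(loc=5_100, scale=2_000, size=800)
profits_control = rng.normal(loc=5_000, scale=2_000, size=800)

diff = profits_treated.mean() - profits_control.mean()
se = np.sqrt(profits_treated.var(ddof=1) / len(profits_treated)
             + profits_control.var(ddof=1) / len(profits_control))

print(f"difference in means: {diff:.1f}")
print(f"95% CI: [{diff - 1.96 * se:.1f}, {diff + 1.96 * se:.1f}]")
```

A confidence interval that straddles zero is the kind of result behind statements that an effect is "not supported on average"; the real studies, of course, use richer data and regression adjustments than this toy comparison.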

As a concluding note, it should be emphasized that RCTs will not, nor were they ever intended to, provide us with a silver bullet or an all-encompassing prescription to "end" poverty. Furthermore, it is clear that RCTs cannot be used to answer every question: to use an example from above, one would be hard-pressed to run a stimulus experiment. However, one could argue that for those questions RCTs are equipped to answer, they do a good job of giving us a handle on context-specific causality, as with the case of expanded loan access amongst marginal applicants in Manila. The hope is that by building our knowledge of what works in small steps, from specific policies in specific contexts to broader policies in broader contexts (see Innovations for Poverty Action's list of "Proven Impact Initiatives", initiatives that research has shown to be broadly applicable across multiple contexts in a cost-efficient way), we can work towards a better approach to development. Issues persist: how does one generalize from a specific context to broader applicability? How confident can one be of one's LATE estimates? Can such incremental steps ever translate into change on a grand scale? These are issues that I hope to work through in subsequent, hopefully less dry, posts.

Watch this space!
——————————————————————————————-
In the interests of full disclosure: Dean Karlan (co-author of More Than Good Intentions and the aforementioned Manila study) is one of the professors I work for and the founder of Innovations for Poverty Action, the organization that currently employs me. However, the views expressed in this post, and in all posts to come, are mine alone.

About the Author

Rohit completed a BA in economics and philosophy at a university in the Midwest, USA. He subsequently returned for an MA in political science before moving on to his current position as a data monkey for development economics professors in the greater New York area. When he isn't plugging away at Stata, or downing a pint of some fine craft brew while taking in a cricket match, he can be found on Twitter as @Noompa.


2 Comments on "Development that Works: RCTs (Methodology)"

  1. Arghya December 17, 2011 at 11:58 am

    Hi Rohit,

    Great article, and thanks for demystifying the idea of a randomised controlled trial for types like me who keep hearing the words without knowing what they mean. I have an intuitive problem, though, with some such trials and the backing that they provide for decisions to be made. I think trials, which are the foundation of what is now known as evidence-based policy-making, can often act as shrouds to push through certain ideological points of view, backed by seemingly ‘objective’ data. Now this may not always be the case, but looking at the way governments keep bandying data around as if it were gospel truth, and seeing that they do enjoy a fair degree of unquestioning credibility, makes me worried. I think if a political judgment is taken, we should know it is a political judgment and be able to question it on that basis, rather than coming up against a brick wall of inaccessible data. Perhaps, and this is my question after this long ranting detour, we should limit such trials, and by extension this type of data collection to aid decision-making, to certain kinds of decisions and not others? Not sure where the distinction can be drawn currently and what a reasonable classification would be, but I intuitively feel that in certain situations where RCTs are used currently, intuition may be better than argument.
    Thanks, Arghya.

  2. Rohit December 18, 2011 at 12:19 pm

    Thanks Arghya, those are all very important points. I absolutely agree that RCTs have their niche and that they cannot answer every question. Indeed, one of the objections leveled at RCTs is that, ultimately, the questions they CAN answer are far too narrow once one adds in the various caveats. Ultimately, they help us answer a specific policy question in a specific context; that is all. However, insofar as the development project is a large, heterogeneous one, well-conducted RCTs can form a very useful part of the overall program.

    RCTs are currently motivated either by specific questions that researchers/academics have, or by policies that NGOs and aid groups would like rigorous evidence on. In my experience, while government assistance and compliance are a crucial part of facilitating RCTs, governments are not usually the originators of the research. However, they certainly form a large part of the target audience, so it is essential that we recognize the uses and limitations of RCTs when presenting our data.

    In future posts, I hope to work through these issues a bit more, so keep the questions coming!
