> Posted by Guy Stuart, Lecturer in Public Policy, Harvard Kennedy School

Karlan and Appel’s book More Than Good Intentions is an easy, and often compelling, read.  You are drawn in at first by the anecdotes that open each chapter, which Karlan and Appel use to illustrate a broader development issue that they then tackle in the rest of the chapter.  But the stories that really drive the chapters forward, and make the book compelling, are the ones in which Karlan describes teaming up with people with “good intentions” – people trying to deliver goods and services to poor individuals and families in developing countries – to gather data in an effort to address a problem they both want to solve.  In essence, this is the strength of the work Karlan and Appel describe in this book: for researchers to help solve the problems confronting poor people, they have to begin with the problems that the poor and those trying to help them actually face, gather evidence about what works and what does not in solving those problems, and then act on that evidence.

This is a pretty simple message, but it is an important one, because it is not how research is necessarily done.  Professional researchers tend to want to answer a particular question that fits their own research agenda or the agenda of the organization for which they work.  These agendas may coincide with the problems facing the poor in general, but even when they do, the research is unlikely to be specific enough to be useful, and its results are unlikely to be reported in a way that anyone can act on.

Karlan and Appel want us to believe that randomized controlled trials (RCTs) are an essential tool for identifying solutions to problems through research.  In fact, there is clear evidence that they think this is the only way to do research.  They continually, and condescendingly, equate “rigor” with the use of RCTs, which raises the question: are all the people who generate evidence about what works and what does not without using RCTs failing to do rigorous work?  Are RCTs the only way to get at the truth of the matter?

Given that Karlan and Appel devote so many chapters to microfinance, we can test their assertion about RCTs by looking at what they have learned about microfinance through them.  The first thing to note is that, for those who have been in the field a long time, many of the stories Karlan and Appel tell are not new.

For example, Karlan and Appel report that requiring microcredit borrowers to spend their loan proceeds on a particular item, say business inventory, is a bad idea because it, at the very least, makes liars of your borrowers, and, at worst, forces them to engage in deceptive practices that undermine their ability to repay.  This is not news to many people in the field, and is discussed extensively by J.D. von Pischke in Finance at the Frontier, published in 1991.  Data from Financial Diaries I have done in India, Malawi, and Kenya, and data from the book Portfolios of the Poor, all show that loan money is fungible, and reveal what a Sisyphean task it is to try to monitor its use.

Karlan and Appel also make a big deal out of the fact that microcredit has relied heavily for many years on peer lending to manage the risk inherent in lending in an information-poor environment, the kind of environment in which most microfinance clients live.  But a cursory review of the history of microfinance tells us that MFIs have not relied solely on the group-lending method to manage risk.  A case study that I wrote on Sogesol in Haiti, one that I have been teaching from for almost 10 years, documents how that MFI makes use of informal knowledge systems to lend to low-income Haitians in Port-au-Prince on an individual basis.  Another case study, this time of Compartamos’ efforts to open a branch in metropolitan Mexico City, revealed the failure of the group-lending method in reaching low-income residents of the metro area, and prompted the organization to change course and offer individual loans.  And, of course, we must not forget that the largest MFI in the world, Bank Rakyat Indonesia (BRI), has been making individual loans since the inception of its KUPEDES program in 1983.

Given how much we know about what works and what does not work in microfinance in the absence of RCTs, we are forced to ask: what insights do RCTs add?  Let’s dig deeper into the discussion of peer lending.  As Karlan and Appel note, the evidence from Karlan’s RCT on group lending shows that the joint liability requirement of peer lending is not essential to risk management – in the absence of its enforcement the lender is still able to collect.  Again, those who have been in this field for some time already know that joint liability is not always enforced fully, because of practitioners’ real concerns that it puts too much pressure on borrowers.  Furthermore, as Karlan and Appel note, one of the main proponents of this requirement, Grameen Bank, dropped the practice in 2002 when it instituted its Grameen II reforms.  Did it do so in the wake of findings from an RCT?  No.  Nor was it facing a catastrophic cascade of losses as one group after another collapsed under the weight of the joint liability, which Karlan and Appel argue is a likely consequence of the joint liability requirement.  Rather, Grameen talked to its clients and its employees and worked out that individual loans in a group setting could work, and would better serve its clients.

Though this process of deciding how to change the product Grameen was offering its clients did not involve an RCT, it was evidence-based.  But was this evidence good evidence?  Did Grameen just get lucky that Grameen II worked?  Maybe, but more likely it was able to generate an accurate picture of the needs of its clients and the potential dynamics ensuing from its product changes, and made the changes on that basis.  The ability to be accurate is not a given.  First, it requires having the data, which, in this case, were based on years of experience of being in daily contact with clients.  Second, it requires that the organization be willing and able to learn from its data.  Note how these two requirements are key to what Karlan and Appel are arguing for: have the data, and analyze and learn from it.  The only thing that is different is the nature of the data – experience versus differences in measured results for a treatment and a control group.

This seems like a big difference, and in many ways it is.  But, I would argue, even here there is commonality.  How does a researcher choose what question to answer, what solution to a problem to test, using an RCT?  There are three answers: 1) theory; 2) practice; and 3) a combination of the two, coming out of a conversation between researchers and practitioners.  Karlan is interested in theory, and is especially enamored of behavioral economics, though he also gives a nod to network theory.  But he is just as interested in practice – in speaking to practitioners about the problems they encounter in delivering goods and services to the poor and what can be done about them.  If Karlan had been hired by Grameen to work with it on reforming its microcredit operations, it is as likely as not that the bank would have suggested to him that the joint liability requirement needed to be looked at, rather than the other way round.

So, again, what does the RCT bring to the table?  Let’s pretend that Grameen did hire Karlan to help with its microcredit operations.  What would have happened differently?  I assume that Karlan would have suggested: before you roll out a peer-lending program without a joint liability requirement, let’s do an RCT to see whether removing the requirement does any harm.  He would have randomly assigned lending groups to a treatment arm, in this case groups no longer bound by joint liability (the “intervention” being the release of the requirement), and a control arm, groups still bound by joint liability.  One assumes that he would also have insisted that those in the control arm be subjected to proper enforcement of the joint liability requirement.  Over a suitable period of time, Karlan would have observed the performance of the two sets of groups using loan performance records from the bank, and been able to tell Grameen whether removing joint liability was a problem or not.
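To make the mechanics concrete, here is a minimal sketch of what such a trial boils down to analytically, written in Python.  Everything in it is hypothetical – the number of groups, the repayment figures, and the variable names are made up for illustration, and a real analysis would use the bank’s actual loan records and add standard errors or a regression.

```python
import random
import statistics

random.seed(42)  # reproducible illustration

# Hypothetical lending groups; in a real trial these would be
# identified from the bank's own records.
group_ids = list(range(200))

# Step 1: random assignment. Half the groups are released from
# joint liability (treatment); the rest keep it (control).
random.shuffle(group_ids)
treatment = set(group_ids[:100])

# Step 2: observe repayment over the trial period. The rates
# below are made up; in practice they would come from the
# lender's loan performance records.
repayment = {g: min(1.0, max(0.0, random.gauss(0.95, 0.03)))
             for g in group_ids}

# Step 3: compare mean repayment across the two arms.
treat_mean = statistics.mean(repayment[g] for g in treatment)
ctrl_mean = statistics.mean(repayment[g] for g in group_ids
                            if g not in treatment)
print(f"treatment mean repayment: {treat_mean:.3f}")
print(f"control mean repayment:   {ctrl_mean:.3f}")
print(f"estimated effect:         {treat_mean - ctrl_mean:+.3f}")
```

The randomization in step 1 is what licenses reading the difference in step 3 as the effect of the intervention, rather than of pre-existing differences between groups.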

Such a trial seems like it would give a definitive answer, in very much the same way as many of the stories Karlan and Appel tell in their book.  But the answer would be less definitive than it appears.  For one thing, those in the treatment group, including the credit officers who served them, would have known that things had changed – unlike medical trials, social science RCTs are very difficult to make “blind” in the sense that the person receiving the intervention does not know they are getting something different.  So if the data had come back showing that removing joint liability had no effect on repayment rates, one could not conclude that the removal itself was harmless, only that it was harmless in a trial setting where everyone was “on their best behavior” because they wanted it to work.  It is therefore likely that the organization would roll out the new process to other parts of the organization, but keep a close eye on the performance data to make sure it had not been misled by the results of the RCT.

It turns out that this approach is not much different from what an organization might do if it were piloting a new process before rolling it out fully.  In such a scenario, the organization rolls out the new process as a pilot, knowing that people in the pilot might be on their “best behavior,” and then, if no problems emerge in the pilot, gradually rolls it out across the rest of the organization, learning from and adapting to problems as they arise.  This alternative approach does not require an external researcher to work out what is going on, just a managerial team with the capacity to collect and analyze real-time data from its operations.  Sure, a manager trained in doing RCTs would want to choose the site of the pilot randomly, which would be a sensible thing to do.
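Choosing pilot sites randomly is, in itself, nearly a one-line exercise.  The sketch below, again in Python with made-up branch names and an arbitrary pilot size, is all the “randomization” a manager would need; the hard part is the monitoring that follows.

```python
import random

random.seed(7)  # reproducible illustration

# Hypothetical branch list; in practice this would come from the
# organization's own records.
branches = [f"branch_{i:02d}" for i in range(1, 41)]

# Randomly pick 5 branches to pilot the new process; the rest
# continue as before and serve as the comparison.
pilot_sites = random.sample(branches, k=5)
comparison_sites = [b for b in branches if b not in pilot_sites]

print("pilot sites:", pilot_sites)
print("comparison sites:", len(comparison_sites))
```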

The discussion thus far has focused on the way in which RCTs might help a manager interested in tweaking their processes to improve customer service.  What about the grander claims made for RCTs?  The study that has received the most attention in this regard is the Spandana study, which sought to evaluate the impact of microcredit on poverty – essentially taking on Muhammad Yunus’ claim that microcredit can alleviate poverty.  If nothing else, the study revealed the complexity of trying to evaluate societal effects using any sort of experimental method.  That does not mean we should not try, but we should be more modest in our approach – we should not try to answer a question as grand as “does microcredit alleviate poverty?” with one study.

If we look outside microfinance to a study cited by Karlan and Appel, we can see what a more modest approach might look like.  The Progresa/Oportunidades conditional cash transfer program received a lot of accolades for incorporating an evaluation process into its rollout.  The initial evaluation of the program focused on whether the cash transfers got kids to school, got them into health clinics, and changed their nutritional intake, rather than on whether they were better educated or healthier.  In other words, it initially focused on operational outputs.  Only in more recent years, after considerable experience with the program, and after it had been rolled out extensively, did the focus turn to the ultimate impacts of the program.  And the most recent evaluation report notes that there are real concerns that Oportunidades is not generating long-term results because of the poor quality of the Mexican educational and health systems.  One could argue that Oportunidades got ahead of the evidence, but it is also clear that without this type of engagement between the practical realities of the program and good research (which, it should be noted, was not limited to RCTs), the Mexican government would have been none the wiser as to how to tackle endemic poverty.

Finally, I cannot complete this review without a note on theory.  One of the things researchers bring to the table when they enter into conversations with practitioners is a theoretical perspective.  It may seem odd to say this, as researchers are often criticized for being overly theoretical and unconcerned with practical matters.  But this is more a matter of presentation than of content – my own experience in interacting with practitioners in workshops and other sorts of training programs is that practitioners truly appreciate being given new ways to think about an old problem.  What they want to understand is the relevance of the theory and how it can be used to sort through what they already know.  Karlan and Appel seem to appreciate this fact, and their use of theory in addressing the practical concerns of the people they are working with seems appropriate – at least from this distance.

Ironically, given that they are researchers, they fail to draw any theoretical conclusions from Karlan’s studies.  They offer us no coherent way to think about problems of development – or even rules of thumb that might help us tackle specific issues in different contexts.  Even their recommendations at the end seem to miss the boat in some cases.  For example, in their list of “things that work” they highlight deworming kids as a way to increase school attendance.  It seems to me that a more general conclusion one can draw from the experiments the authors describe is that healthy kids are more likely to go to school, which could then be paired with a list of cost-effective interventions proven to keep kids healthy.  If the work that Karlan’s IPA is doing is going to amount to much more than a large number of isolated experiments, he needs to show that he can draw some general conclusions that might be helpful to practitioners, at least as a starting point for how to think about a problem they are facing.

In sum, this is a highly readable book, if a little glib at times.  The main lesson for me was a reaffirmation of the importance of treating development as a continuous series of problems to be solved, using concrete evidence to determine what works and what does not.  The authors want us to believe that RCTs are the only way to produce such evidence, but their own writing clearly suggests that there are other ways.  Furthermore, they want to scale up the use of RCTs.  It is not obvious that this should be done by expanding the work of organizations like IPA.  Rather, such organizations should be in the business (and I believe IPA would argue it is in the business) of training people in developing countries to do this work.  I would go one step further and specify that those being trained in these methods should be mid-level managers in development organizations, who should be steeped in the practice of sound data collection and in the value of a learning organization that is not afraid to test its ideas.  Finally, I would recommend that individual and institutional donors look for organizations that are continuously engaged in working to solve the problems faced by the poor in developing countries, and are set up to learn from their mistakes.