Category Archives: Uncategorized

Do randomized controlled trials engage in less specification searching?

Originally posted on Eva Vivalt’s blog.

An excerpt from ongoing work, based on AidGrade’s database of impact evaluation results in development economics.

These are results from caliper tests, which essentially compare the number of results just above a critical threshold (t = 1.96) with the number just below it. You can vary the width of the band; for example, a 5% caliper would look at the range 1.862–2.058. If you see a jump at 1.96, you might suspect specification searching is going on: researchers reporting only the results they like, biasing the literature.
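To make the mechanics concrete, here is a minimal sketch (not AidGrade’s actual code) of how a caliper band is defined and how results are counted on either side of the threshold:

```python
def caliper_counts(t_stats, critical=1.96, caliper=0.05):
    """Count t-statistics falling just over vs. just under `critical`,
    within a band of +/- (caliper * critical) around it."""
    low = critical * (1 - caliper)    # 1.862 for a 5% caliper
    high = critical * (1 + caliper)   # 2.058 for a 5% caliper
    over = sum(1 for t in t_stats if critical < abs(t) <= high)
    under = sum(1 for t in t_stats if low <= abs(t) <= critical)
    return over, under
```

If reporting were unbiased, results within such a narrow band should fall on either side of 1.96 about equally often, so a lopsided split toward "over" is the red flag.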

                   Over   Under   p-value   Sig.
All studies
  2.5% Caliper       45      26      0.02   <0.05
  5% Caliper         73      51      0.03   <0.05
  10% Caliper       127     117      0.28
  15% Caliper       182     185      0.58
  20% Caliper       220     231      0.71
RCTs
  2.5% Caliper       24      14      0.07   <0.10
  5% Caliper         35      28      0.22
  10% Caliper        64      68      0.67
  15% Caliper        97     107      0.78
  20% Caliper       119     134      0.84
Quasi-experimental studies
  2.5% Caliper       21      12      0.08   <0.10
  5% Caliper         38      23      0.04   <0.05
  10% Caliper        63      49      0.11
  15% Caliper        85      78      0.32
  20% Caliper       101      97      0.42
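The p-values above appear consistent with a one-sided exact binomial (sign) test under the null that a result in the band is equally likely to fall on either side of 1.96. A sketch of that calculation (my reconstruction, not necessarily the exact procedure used):

```python
import math

def sign_test_p(over, under):
    """One-sided exact binomial test: probability of seeing at least
    `over` successes in over + under fair coin flips."""
    n = over + under
    return sum(math.comb(n, i) for i in range(over, n + 1)) / 2**n

p = sign_test_p(45, 26)  # first row above; close to the reported 0.02
```

(`math.comb` requires Python 3.8+.)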

Okay, there seems to be a jump. Possibly more among quasi-experimental studies than among RCTs.

Overall, though, this jump is actually quite small. Gerber and Malhotra did the same kinds of tests for political science and sociology. They used different selection criteria when gathering their papers, essentially maximizing the probability they would see a jump, but take a look at their numbers:

Political science:

                   Over   Under   p-value
A. APSR
Vol. 89-101
  10% Caliper        49      15    <0.001
  15% Caliper        67      23    <0.001
  20% Caliper        83      33    <0.001
Vol. 96-101
  10% Caliper        36      11    <0.001
  15% Caliper        46      17    <0.001
  20% Caliper        55      21    <0.001
Vol. 89-95
  10% Caliper        13       4     0.02
  15% Caliper        28      12     0.008
  20% Caliper        21       6     0.003
B. AJPS
Vol. 39-51
  10% Caliper        90      38    <0.001
  15% Caliper       128      66    <0.001
  20% Caliper       165      95    <0.001
Vol. 46-51
  10% Caliper        56      25    <0.001
  15% Caliper        80      45     0.001
  20% Caliper       105      66     0.002
Vol. 39-45
  10% Caliper        34      13     0.002
  15% Caliper        48      21    <0.001
  20% Caliper        60      29    <0.001


Sociology:

                   Over   Under   p-value
ASR (Vols. 68-70)
  5% Caliper         15       4     0.01
  10% Caliper        26      15     0.06
  15% Caliper        47      17    <0.001
  20% Caliper        54      19    <0.001
AJS (Vols. 109-111)
  5% Caliper         16       4     0.006
  10% Caliper        25      11     0.01
  15% Caliper        41      14    <0.001
  20% Caliper        48      18    <0.001
TSQ (Vols. 44-46)
  5% Caliper         13       4     0.02
  10% Caliper        22       7     0.004
  15% Caliper        26      11     0.01
  20% Caliper        30      20     0.1
Combined (recent vols.)
  5% Caliper         44      12    <0.001
  10% Caliper        73      33    <0.001
  15% Caliper       114      42    <0.001
  20% Caliper       132      57    <0.001
ASR (Vols. 58-60)
  5% Caliper         17       2    <0.001
  10% Caliper        22       5    <0.001
  15% Caliper        27      11     0.007
  20% Caliper        30      15     0.02

Wow! Economics is not doing so badly after all! (Some public health papers are also included, but results are comparable if you break it down.) To match Gerber and Malhotra, these are all reporting number of results rather than number of papers, and sometimes papers report more than one result, so there are some subtleties here that I get into in the longer working paper. Data are still being gathered, and there is much more to be said on this topic. If you’d like to see more of this kind of work on research credibility, please support us in the last few days of our Indiegogo campaign!

Want to make a difference in development?

Update: please see revised deadline of January 10, 2014.

Do you know someone interested in development? It’s that time of year again – with some people moving on to bigger and better things, we are looking for new analysts and interns interested in research and public outreach.

Because we are a very small organization, you will find you have a lot of autonomy working with us. While everyone here wears multiple hats, we are particularly looking for people to work in the two following areas:

Research: One of the main components of conducting a meta-analysis is reading through academic papers and coding up different characteristics of the papers. You would likely learn this process and progress to reconciling the coding work that others have done. Depending on your skills, you might also be involved in some analysis and writing papers.

Publicity: We also need to improve our social media presence and publicity, and are seeking a social media intern and a director of development to improve our outreach, manage donor relations, and lead our fundraising activities.

There is a preference for applicants in Washington, DC, New York, or San Francisco, but work can quite successfully be done remotely. We have internships available in both these categories for people who would like to learn more but have less experience. To apply, please send a CV and a letter of interest indicating the role(s) for which you would like to be considered by January 10. Early application is encouraged.

What do we know in development? What don’t we know?

Post by Eva Vivalt, @evavivalt.

AidGrade is starting another crowdfunding effort. You might wonder: why now? You’re not new, so why should I support you?

There have been a lot of interesting things coming out of this work. Dare I say, more interesting than when we launched last year.

For one, we are looking more closely at the issue of generalizability. How much can you generalize from an impact evaluation’s results, and are there any factors that improve your ability to make out-of-sample predictions?

This is a huge, important topic, and without having painstakingly collected the data from hundreds of impact evaluations, I’m not sure how you could answer it. A preliminary working paper (which still contains errors) is available here. By adding more impact evaluations to the data set, we’ll be able to say a lot more.

We’re expanding our previous analyses to look more closely at study quality and will put out white papers on each of the topics.

And we are looking at the issue of specification searching and publication bias. So far, it seems like it’s not as bad as you would think.

So, yes, we’re asking for more money. It’s because we’re doing a whole lot more than we did when we started (even more at the link). And frankly, it’s not a lot of money relative to the importance of the questions answered, and you will not find such a cost-effective group elsewhere.

A lot of people seem to like to focus on what we know. I’d like to focus for a second on what we don’t know. Particularly on the big, important questions which we can, with your help, answer.

Individual charities

Post by Eva Vivalt, @evavivalt.

We’ve stayed away from recommending specific charities. AidGrade gathers data from impact evaluations and analyzes that data, focusing more broadly on the effects of different types of programs and how they vary across contexts. Very few NGOs do any kind of impact evaluation, so we’re naturally a bit distant from that. Further, other organizations already have a lot to say about individual charities. Whenever someone asked me about how to best contribute to the relief efforts in the Philippines, for example, I would recommend they check out this old GiveWell post.

But people keep asking and asking for ways they can help, so we’re now letting you click through the “donate” links under “Examine a Program” and “Compare Programs by Outcome” (under the “Donors” tab), linking every type of intervention to a specific charity.

The links aren’t exact. How would you go about donating to a conditional cash transfer program, for example, when they are typically run by governments? You might think that some kinds of child sponsorship programs or scholarship programs are in effect conditional cash transfers, but there remain some differences. In the absence of a good match, AidGrade is instead directing those interested in conditional cash transfer programs to GiveDirectly, which provides unconditional cash transfers.

With this caveat, how did we come to decide which organizations to feature for each intervention?

We followed several rules of thumb:

1. It should be an organization that does the work itself instead of being largely focused on advocacy. Advocacy work can be highly important, so we may revisit this in the future, but the concern is that it’s very hard to measure when advocacy is having a real effect and our findings are on programs themselves and not on advocacy for programs. It’s simply a closer match.

2. It should be a one-program organization as much as possible, to avoid the fungibility problem whereby if an organization does 10 things, and you donate to support 1 of them, they end up redirecting funds to support the other 9.

3. Where possible, go with the clear frontrunners. For example, regarding insecticide-treated bed nets to prevent malaria, the Against Malaria Foundation is one of the top charities of any kind recommended by both GiveWell and Giving What We Can. We look at their recommendations for charities that focus on a particular type of program.

4. All else equal, the organization should care about evaluation, like Evidence Action does.

These rules of thumb can sometimes conflict, but we adhered to them as closely as we could. The image attached to this post is misleading: nothing is so pristine as the straight road pictured. Undoubtedly, some of our matches are inexact, but this is a step in the right direction. Bear in mind that, since we are trying to link all the programs to specific NGOs, we even provide links for programs which do not seem to be the most effective at achieving a particular goal.

That said, here are the matches we came up with. Would you suggest anything different?

Bed nets: Against Malaria Foundation
Conditional cash transfers: GiveDirectly
Deworming: Evidence Action: DeWorm the World
Improved cookstoves: Global Village Energy Partnership
Microfinance: Kiva
Safe Water Storage: Evidence Action: Dispensers for Safe Water
Scholarships: Pratham
School Meals: World Food Programme
Unconditional cash transfers: GiveDirectly
Water Treatment: Evidence Action: Dispensers for Safe Water

Coming soon: new topics!

Progress report

We are making good progress on the next set of topics and hope to release them one by one over the fall. This set of topics includes:

Contract teachers
Financial literacy
HIV education
Micronutrient supplementation
Micro-health insurance
Mobile phones
Performance pay
Rural electrification
Women’s empowerment programs

Several of these topics actually comprise more than one intervention, each grouped separately (for example, different kinds of micronutrient supplementation). So far, eight of the topics have been fully coded and reconciled.

Also, remember to try our new and improved meta-analysis app! You can now view results by paper and download them. Special thanks to our web development genius Alex Robson!