Some friendly concerns with GiveWell

Post by Eva Vivalt.

Let me preface this by saying that GiveWell has been great at drawing attention to the issue of effective donations. It spearheaded this movement and improved the conversation about aid.

However, there are some things that could be improved:

– Their literature reviews frequently use “vote counting” when results of meta-analyses are not available.[1] What is vote counting? Suppose there are 10 studies on a subject, 3 of which find a positive, significant effect and 7 of which find no statistically significant effect. One might think those 7 studies are evidence of no effect and conclude that the intervention is not effective, or not as effective as the 3 studies alone would suggest. But perhaps the studies’ sample sizes were simply too small to detect the effect. Those 7 studies, if combined through meta-analysis, might actually provide evidence in favour of the intervention having a positive effect.

Given that a review of the literature suggests that many, if not most, studies are underpowered (their samples are too small for a real effect to be likely to show up as statistically significant), this is a major flaw. Vote counting makes an error akin to “accepting the null”, when one can only either reject the null or fail to reject it.
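To make the contrast concrete, here is a minimal sketch assuming ten hypothetical studies (the estimates and standard errors below are invented, not taken from any review) and a simple fixed-effect, inverse-variance meta-analysis. Vote counting sees 3 significant results out of 10; pooling the same estimates, or even just the 7 individually non-significant ones, yields a clearly significant positive effect.

```python
import math

# Hypothetical effect estimates (standardized units) and standard errors for
# 10 studies of one intervention. Invented for illustration; not real data.
estimates = [0.25, 0.30, 0.22, 0.15, 0.12, 0.18, 0.10, 0.14, 0.16, 0.11]
std_errors = [0.10, 0.12, 0.10, 0.10, 0.10, 0.12, 0.10, 0.12, 0.10, 0.10]

Z_CRIT = 1.96  # two-sided test at the 5% level


def is_significant(estimate, se):
    """True if a single study's z-statistic clears the 5% threshold."""
    return abs(estimate / se) > Z_CRIT


def vote_count(estimates, std_errors):
    """Vote counting: tally how many studies are individually significant."""
    return sum(is_significant(b, se) for b, se in zip(estimates, std_errors))


def fixed_effect_pool(estimates, std_errors):
    """Fixed-effect meta-analysis: inverse-variance weighted mean, its SE, and z."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se, pooled / pooled_se


print(vote_count(estimates, std_errors), "of", len(estimates), "studies significant")

pooled, pooled_se, z_all = fixed_effect_pool(estimates, std_errors)
print(f"all 10 studies: pooled = {pooled:.3f}, SE = {pooled_se:.3f}, z = {z_all:.2f}")

# Pool only the studies that were individually non-significant.
nonsig = [(b, se) for b, se in zip(estimates, std_errors) if not is_significant(b, se)]
_, _, z_nonsig = fixed_effect_pool([b for b, _ in nonsig], [se for _, se in nonsig])
print(f"{len(nonsig)} non-significant studies pooled: z = {z_nonsig:.2f}")
```

The fixed-effect model assumes every study estimates the same true effect. When effects plausibly differ across contexts, a random-effects model (for example DerSimonian-Laird) is the usual alternative; it widens the confidence interval but does not revive the vote-counting logic.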

– They aren’t in a good position to evaluate studies that did not use randomization. Despite stating that they appreciate quasi-experimental methods, their Aug. 23, 2012 summary of methods of causal attribution doesn’t even include difference-in-differences or matching. It also oddly puts instrumental variables at the top of its list,[2] and it invents a new form of causal identification: “visual and informal reasoning”. Economists will be delighted to hear that they no longer have to bother finding a valid counterfactual; they need merely follow these steps for causal attribution:

Visual and informal reasoning. Researchers sometimes make informal arguments about the causal relationship between two variables, by e.g. using visual illustrations. An example of this: the case for VillageReach includes a chart showing that stock-outs of vaccines fell dramatically during the course of VillageReach’s program. Though no formal techniques were used to isolate the causal impact of VillageReach’s program, we felt at the time of our VillageReach evaluation that there was a relatively strong case in the combination of (a) the highly direct relationship between the “stock-outs” measure and the nature of VillageReach’s intervention (b) the extent and timing of the drop in stockouts, when juxtaposed with the timing of VillageReach’s program. (We have since tempered this conclusion.)

We sometimes find this sort of reasoning compelling, and suspect that it may be an under-utilized method of making compelling causal inferences.
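Why a counterfactual matters for a before/after chart like the one described above can be shown with a minimal difference-in-differences sketch. The stock-out rates below are hypothetical (they are not VillageReach data), and the comparison region is an assumed untreated area. The naive before/after drop attributes the whole decline to the program; difference-in-differences subtracts the decline that happened anyway in the comparison region.

```python
from dataclasses import dataclass


@dataclass
class GroupMeans:
    """Mean outcome for one group before and after a program starts."""
    pre: float
    post: float

    def change(self) -> float:
        return self.post - self.pre


def before_after(treated: GroupMeans) -> float:
    """Naive estimate: change in the treated group, with no counterfactual."""
    return treated.change()


def diff_in_diff(treated: GroupMeans, comparison: GroupMeans) -> float:
    """Difference-in-differences: treated change minus comparison change.

    Valid only under the parallel-trends assumption: absent the program,
    the treated group's outcome would have moved like the comparison group's.
    """
    return treated.change() - comparison.change()


# Hypothetical stock-out rates in percentage points (NOT VillageReach data).
treated = GroupMeans(pre=30, post=5)
comparison = GroupMeans(pre=28, post=15)

print("before/after:", before_after(treated))                      # before/after: -25
print("difference-in-differences:", diff_in_diff(treated, comparison))  # difference-in-differences: -12
```

Here roughly half of the apparent 25-point drop would have occurred without the program. Difference-in-differences is only as credible as the parallel-trends assumption behind it, which is exactly the kind of reasoning about a counterfactual that an informal reading of a chart skips.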

– While they acknowledge that people may wish to support different things (e.g. health, education), in the end they provide their own list of recommended organizations and interventions, implicitly based on what they find important or what they assume others find important.

In contrast, AidGrade doesn’t try to make comparisons without solid grounding. It doesn’t impose judgments about the value of a year of life versus the value of an education, but instead focuses on specific outcome variables separately. To say anything about the relative value of these different outcomes, one needs a theory of well-being. GiveWell does look at DALYs (disability-adjusted life years), which are one way of aggregating health outcomes, but they don’t really apply to other things one might care about, such as education or income. When you start by focusing on outcomes separately, you can always aggregate them up again later, and work is underway to provide a variety of tools for people to do so themselves rather than making that decision for them.
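What aggregating outcomes up again later might look like can be sketched minimally, assuming each outcome has already been put on a common standardized scale (itself a simplification) and using a plain weighted sum. The programs, effect sizes, and weights are all hypothetical, and this is not AidGrade's actual tool, which the post does not describe. The point is that the weights, which stand in for a theory of well-being, are supplied by the reader rather than fixed in advance.

```python
# Hypothetical standardized effect sizes for two programs on three separately
# reported outcomes. Invented for illustration; not AidGrade or GiveWell data.
EFFECTS = {
    "program_a": {"health": 0.30, "education": 0.05, "income": 0.10},
    "program_b": {"health": 0.05, "education": 0.25, "income": 0.15},
}


def aggregate(effects, weights):
    """Combine outcome-level effects into one score per program.

    The weights encode the user's own value judgments about how much each
    outcome matters; outcomes missing from `weights` count for nothing.
    """
    return {
        program: sum(weights.get(outcome, 0.0) * size for outcome, size in outcomes.items())
        for program, outcomes in effects.items()
    }


def best_program(effects, weights):
    """Return the program with the highest weighted score."""
    scores = aggregate(effects, weights)
    return max(scores, key=scores.get)


# Two hypothetical users with different priorities.
health_first = {"health": 0.6, "education": 0.2, "income": 0.2}
education_first = {"health": 0.2, "education": 0.6, "income": 0.2}

print(best_program(EFFECTS, health_first))     # program_a
print(best_program(EFFECTS, education_first))  # program_b
```

The same outcome-level evidence supports different recommendations depending on the weights plugged in, which is why reporting outcomes separately leaves that choice to the donor.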

Again, I have a lot of respect for the people there, and they have been the best game in town for the past few years. AidGrade’s positioning in this space is different: it aims for the far end of the spectrum in terms of statistical rigor, even if that means its statements are all couched in terms of the limits of the data.

These views are my own. (Main blog:, twitter: @evavivalt.)

[1] See, for example, their most recent (2008) review of microfinance: (accessed Jan. 3, 2013). The review contains a warning directing readers elsewhere for up-to-date content, but that newer summary does not itself cite any impact evaluations or meta-analyses, and following its links leads back to more old vote counting.

[2] Instrumental variables are rarely used and have generally come to be viewed with suspicion; their heyday was the 1980s.
