This is the accessible text file for GAO report number GAO-12-673 entitled 'President's Emergency Plan For Aids Relief: Agencies Can Enhance Evaluation Quality, Planning, and Dissemination' which was released on May 31, 2012. This text file was formatted by the U.S. Government Accountability Office (GAO) to be accessible to users with visual impairments, as part of a longer term project to improve GAO products' accessibility. Every attempt has been made to maintain the structural and data integrity of the original printed product. Accessibility features, such as text descriptions of tables, consecutively numbered footnotes placed at the end of the file, and the text of agency comment letters, are provided but may not exactly duplicate the presentation or format of the printed version. The portable document format (PDF) file is an exact electronic replica of the printed version. We welcome your feedback. Please E-mail your comments regarding the contents or accessibility features of this document to Webmaster@gao.gov. This is a work of the U.S. government and is not subject to copyright protection in the United States. It may be reproduced and distributed in its entirety without further permission from GAO. Because this work may contain copyrighted images or other material, permission from the copyright holder may be necessary if you wish to reproduce this material separately. United States Government Accountability Office: GAO: Report to Congressional Committees: May 2012: President's Emergency Plan For Aids Relief: Agencies Can Enhance Evaluation Quality, Planning, and Dissemination: GAO-12-673: GAO Highlights: Highlights of GAO-12-673, a report to congressional committees. Why GAO Did This Study: PEPFAR, reauthorized by Congress in fiscal year 2008, supports HIV/AIDS prevention, treatment, and care overseas. The reauthorizing legislation, as well as other U.S.
law and government policy, stresses the importance of evaluation for improving program performance, strengthening accountability, and informing decision making. OGAC leads the PEPFAR effort by providing funding and guidance to implementing agencies, primarily CDC and USAID. Responding to legislative mandates, GAO (1) identified PEPFAR evaluation activities and examined the extent to which evaluation findings, conclusions, and recommendations were supported and (2) examined the extent to which PEPFAR policies and procedures adhere to established general evaluation principles. GAO reviewed these principles as well as agencies’ policies and guidance; surveyed CDC and USAID officials in 31 PEPFAR countries and 3 regions; and analyzed evaluations provided by OGAC, CDC, and USAID. What GAO Found: The Department of State’s (State) Office of the U.S. Global AIDS Coordinator (OGAC), the Department of Health and Human Services’ (HHS) Centers for Disease Control and Prevention (CDC), and the U.S. Agency for International Development (USAID) have evaluated a wide variety of President’s Emergency Plan for AIDS Relief (PEPFAR) program activities, demonstrating a clear commitment to evaluation. However, GAO found that the findings, conclusions, and recommendations were not fully supported in many PEPFAR evaluations. Agency officials provided nearly 500 evaluations addressing activities ongoing in fiscal years 2008 through 2010 in all program areas relating to HIV/AIDS treatment, prevention, and care. GAO’s assessment of a selected sample of seven OGAC-managed evaluations found that they generally adhered to common evaluation standards, as did most of a selected sample of 15 evaluations managed by CDC and USAID headquarters. Based on this assessment, GAO determined that these evaluations generally contained fully supported findings, conclusions, and recommendations.
However, based on a similar assessment of a randomly selected sample taken from 436 evaluations provided by PEPFAR country and regional teams, GAO estimated that 41 percent contained fully supported findings, conclusions, and recommendations, while 44 percent contained partial support and 15 percent were not supported. Table: Extent to Which Findings, Conclusions, and Recommendations Were Supported in Selected Evaluations: Source of PEPFAR evaluation (number assessed): OGAC-managed evaluations (7 total); Fully supported: 7; Partially supported: 0; Not supported: 0. Source of PEPFAR evaluation (number assessed): CDC and USAID headquarters evaluations (15 total); Fully supported: 9; Partially supported: 6; Not supported: 0. Source of PEPFAR evaluation (number assessed): Country and regional team evaluations (436 total)[A]; Fully supported: 179 (41 percent); Partially supported: 190 (44 percent); Not supported: 67 (15 percent). Source: GAO analysis. [A] Numbers and percentages reported in this row are estimates based on analysis of 78 evaluations randomly selected from the 436 total. The margin of error associated with proportion estimates is no more than plus or minus 11 percentage points at the 95 percent level of confidence. The margin of error for totals is not more than 44 evaluations. [End of table] State, OGAC, CDC, and USAID have established detailed evaluation policies, as recommended by the American Evaluation Association (AEA). However, PEPFAR does not fully adhere to AEA principles relating to evaluation planning, independence and qualifications of evaluators, and public dissemination of evaluation results. Specifically, OGAC does not require country and regional teams to include evaluation plans in their annual operational plans, limiting its ability to ensure that evaluation resources are appropriately targeted. 
Further, although OGAC, CDC, and USAID evaluation policies and procedures provide some guidance on how to ensure evaluator independence and qualifications, they do not require documentation of these issues. GAO found that most PEPFAR program evaluations did not fully address whether evaluators had conflicts of interest and some did not include detailed information on the identity and makeup of evaluation teams. Finally, although OGAC, CDC, and USAID use a variety of means to share evaluation findings, not all evaluation reports are available online, limiting their accessibility to the public and their usefulness for PEPFAR decision makers, program managers, and other stakeholders. What GAO Recommends: GAO recommends that State work with CDC and USAID to (1) improve adherence to common evaluation standards, (2) develop PEPFAR evaluation plans, (3) provide guidance for assessing and documenting evaluators’ independence and qualifications, and (4) increase online accessibility of evaluation results. Commenting jointly with HHS’s CDC and USAID, State agreed with these recommendations and noted steps it will take to implement them. View [hyperlink, http://www.gao.gov/products/GAO-12-673]. For more information, contact David Gootnick at (202) 512-3149 or gootnickd@gao.gov. 
[End of section] Contents: Letter: Background: PEPFAR Agencies Have Evaluated a Broad Range of PEPFAR Programs, but Results Are Not Fully Supported in Many Evaluations: PEPFAR Policies and Procedures Do Not Fully Adhere to AEA Evaluation Principles Relating to Planning, Independence, and Dissemination: Conclusions: Recommendations for Executive Action: Agency Comments: Appendix I: Objectives, Scope, and Methodology: Appendix II: GAO Evaluation Definitions and Standards: Appendix III: Statistical Comparison of PEPFAR Evaluations: Appendix IV: Comments from the Department of State: Appendix V: GAO Contact and Staff Acknowledgments: Related GAO Products: Tables: Table 1: Level of Support for Evaluation Findings, Conclusions, and Recommendations in Selected OGAC-Managed Public Health Evaluations, as Indicated by Adherence to Common Evaluation Standards: Table 2: Level of Support for Evaluation Findings, Conclusions, and Recommendations in Selected CDC and USAID Headquarters-Managed Evaluations, as Indicated by Adherence to Common Evaluation Standards: Table 3: Estimated Extent to Which Country and Regional Teams' Evaluations Contained Fully Supported Findings, Conclusions, and Recommendations, as Measured by Adherence to Common Evaluation Standards: Table 4: PEPFAR Program Area Descriptions: Table 5: Questions Included in the GAO Evaluation Assessment Tool: Table 6: Statistical Analysis of Support for Findings in CDC and USAID Evaluations, by Agency, Methods Used, and Type of Evaluation: Table 7: Odds Ratios from Logistic Regression Models, Where Support for Findings Was Regressed on Agency, Methods Used, and Type of Evaluation: Figures: Figure 1: Approved Funding for PEPFAR Prevention, Treatment, Care, and Other Program Areas, Fiscal Years 2009 through 2011: Figure 2: Program Evaluations Provided by PEPFAR Country and Regional Teams, by PEPFAR Program Area: Abbreviations: 2008 Leadership Act: Tom Lantos and Henry J. 
Hyde United States Global Leadership Against HIV/AIDS, Tuberculosis, and Malaria Reauthorization Act of 2008: AEA: American Evaluation Association: ARV: antiretroviral: DEC: Development Experience Clearinghouse: DGHA: Division of Global HIV/AIDS: HHS: Department of Health and Human Services: OHA: Office of HIV/AIDS: OGAC: Office of the U.S. Global AIDS Coordinator: PEPFAR: President's Emergency Plan for AIDS Relief: CDC: Centers for Disease Control and Prevention: PHE: public health evaluation: State: Department of State: USAID: U.S. Agency for International Development: [End of section] United States Government Accountability Office: Washington, DC 20548: May 31, 2012: Congressional Committees: Through the multibillion-dollar President's Emergency Plan for AIDS Relief (PEPFAR), the United States has supported significant advances in global HIV/AIDS prevention, treatment, and care. Since the program was first authorized in 2003, the estimated number of new HIV infections and AIDS-related deaths has steadily declined while millions of people in low- and middle-income countries have received antiretroviral treatment. Yet for every person placed on treatment, an estimated two people are newly infected with HIV, and the number of people living with HIV expanded from about 28 million in 2001 to 34 million in 2010. Congress reauthorized PEPFAR in 2008 through passage of the Tom Lantos and Henry J.
Hyde United States Global Leadership Against HIV/AIDS, Tuberculosis, and Malaria Reauthorization Act of 2008 (2008 Leadership Act),[Footnote 1] which sets multiyear targets for prevention, treatment, care, and health systems strengthening programs supported through PEPFAR through fiscal year 2013.[Footnote 2] The 2008 Leadership Act stated, among other things, that assistance provided to combat HIV/AIDS shall expand impact evaluation and other research and analysis efforts to improve accountability, increase transparency, measure the outcomes and impacts of interventions, ensure the delivery of evidence-based services, and identify and replicate effective models.[Footnote 3] The Government Performance and Results Act of 1993, amended in 2010 as the Government Performance and Results Modernization Act, also encourages evaluation of federal programs. Moreover, since 2002, the Office of Management and Budget has set expectations for agencies to conduct program evaluations as essential tools for improving program design and operations, determining whether intended outcomes are achieved effectively, and informing decision making. Responding to requirements in the Consolidated Appropriations Act of 2008 and the 2008 Leadership Act to review global HIV/AIDS program monitoring,[Footnote 4] this report (1) identifies PEPFAR evaluation activities and examines the extent to which evaluation findings, conclusions, and recommendations are supported and (2) examines the extent to which PEPFAR policies and procedures adhere to established general principles for the evaluation of U.S. government programs. To address these objectives, we reviewed the American Evaluation Association's (AEA) An Evaluation Roadmap for a More Effective Government (AEA Roadmap)[Footnote 5] as well as policies and guidance developed by the Department of State (State), State's Office of the U.S. 
Global AIDS Coordinator (OGAC), the Department of Health and Human Services' (HHS) Centers for Disease Control and Prevention (CDC), and the U.S. Agency for International Development (USAID). We conducted interviews with officials at OGAC, USAID, and CDC. We also surveyed CDC and USAID headquarters officials as well as CDC and USAID officials in the 31 countries and 3 regions that had PEPFAR annual operational plans in fiscal year 2010[Footnote 6] about which of their PEPFAR-funded activities operating in fiscal years 2008 through 2010 had ongoing or completed evaluations. In addition, we obtained electronic copies of completed evaluations for programs operating during this time period from CDC and USAID officials at headquarters and in the PEPFAR countries and regions. Using a standard assessment tool, we systematically assessed the level of support for findings, conclusions, and recommendations in samples of these evaluations, as indicated by the degree to which they were conducted in adherence with selected common evaluation standards. We assessed judgmental samples of evaluations submitted by OGAC and by CDC and USAID headquarters. We assessed a randomly selected sample of the evaluations submitted by PEPFAR country and regional teams, in order to generalize our assessment results to all of the submitted evaluations. Finally, we assessed State, OGAC, CDC, and USAID policies and practices against selected general principles of evaluation defined in the AEA Roadmap. (See appendix I for a detailed description of our objectives, scope, and methodology.) We conducted this performance audit from October 2011 to May 2012 in accordance with generally accepted government auditing standards. Those standards require that we plan and perform the audit to obtain sufficient, appropriate evidence to provide a reasonable basis for our findings and conclusions based on our audit objectives. 
We believe that the evidence obtained provides a reasonable basis for our findings and conclusions based on our audit objectives. Background: OGAC establishes overall PEPFAR policy and program strategies and coordinates PEPFAR program activities. In addition, OGAC allocates PEPFAR resources from the Global Health and Child Survival account to PEPFAR implementing agencies, primarily CDC and USAID.[Footnote 7] The agencies execute PEPFAR program activities through agency headquarters offices[Footnote 8] and interagency teams consisting of PEPFAR implementing agency officials in the countries and regions with PEPFAR-funded programs (PEPFAR country and regional teams). OGAC coordinates these activities through its approval of operational plans, which serve as annual work plans and document planned investments in, and the anticipated results of, HIV/AIDS-related programs. OGAC provides annual guidance on how to develop and submit operational plans. In fiscal years 2009 through 2011, OGAC approved operational plans representing $11.7 billion in PEPFAR program activities. These activities fall primarily in three broad program areas--prevention, treatment, and care--and 18 related program areas.[Footnote 9] Program activities aimed at preventing HIV infection and at treating those infected each represented about 30 percent of approved PEPFAR funding, while activities aimed at caring for AIDS patients represented about 20 percent. The remaining approximately 20 percent funded a variety of other program areas, such as health systems strengthening and building laboratory infrastructure. Figure 1 summarizes approved funding for these program areas in fiscal years 2009 through 2011.
Figure 1: Approved Funding for PEPFAR Prevention, Treatment, Care, and Other Program Areas, Fiscal Years 2009 through 2011: [Refer to PDF for image: illustrated horizontal bar graph] Program area: Prevention; Prevention of mother-to-child transmission: (8.0%) $936.5 million; Abstinence/be faithful: (4.9%) $567.0 million; Other sexual prevention: (7.0%) $819.5 million; Blood safety: (1.4%) $167.3 million; Injection safety: (0.6%) $71.4 million; Medical male circumcision: (1.8%) $214.1 million; Prevention among injecting and non-injecting drug users: (0.6%) $69.6 million; Testing and counseling: (5.4%) $628.5 million; Total $3.474 billion (29.7%). Program area: Care: Adult care and support: (8.3%) $975.3 million; Pediatric care and support: (1.4%) $159.7 million; Orphans and vulnerable children: (8.7%) $1.011 billion; TB/HIV: (3.8%) $439.5 million; Total $2.586 billion (22.1%). Program area: Treatment: Antiretroviral drugs: (9.6%) $1.126 billion; Adult treatment: (17.2%) $2.016 billion; Pediatric treatment: (2.7%) $319.2 million; Total $3.461 billion (29.6%). Program area: Other: Laboratory infrastructure: (5.2%) $610.3 million; Strategic information: (5.0%) $580.0 million; Health systems strengthening: (8.4%) $983.1 million; Total $2.173 billion (18.6%). PEPFAR approved funding for prevention, treatment, care, and other programs, FY 2009-2011: $11.7 billion. Source: GAO analysis of OGAC data reported in FY 2009-2011 PEPFAR operational plans. Note: Numbers do not always add to totals because of rounding. These OGAC data were reported in PEPFAR operational plans for fiscal years 2009 through 2011. [End of figure] To carry out activities in these program areas, CDC and USAID use implementing mechanisms--grants, cooperative agreements, and contracts--with a variety of implementing partners.[Footnote 10] These partners include partner country governments, nongovernmental and international organizations, and academic institutions.
CDC and USAID used more than 3,000 implementing mechanisms in fiscal years 2008 through 2010. CDC and USAID offices employ a wide variety of individuals and organizations to conduct PEPFAR evaluations, including implementing agency officials, consultants, and academic institutions as well as partner government organizations and implementing partners. Evaluation teams sometimes comprise representatives from several of these organizations. OGAC coordinates, and PEPFAR implementing agencies also engage in, several related activities that support evaluation, such as oversight of implementing partners,[Footnote 11] routine performance planning and reporting,[Footnote 12] biological and behavioral health surveillance,[Footnote 13] baseline studies and needs assessments, and development of health management information systems.[Footnote 14] PEPFAR evaluations are subject to common evaluation standards defined in various agency-specific and governmentwide guidance. This guidance includes CDC's Framework for Program Evaluation in Public Health [Footnote 15] and USAID's evaluation policy[Footnote 16] and Automated Directives System guidance.[Footnote 17] In addition, GAO published guidance on designing evaluations and assessing social program impact evaluations.[Footnote 18] Also, in September 2010, the AEA published a framework to guide the development and implementation of federal agency evaluation programs and policies. The framework offers a set of general principles intended to facilitate the integration of evaluation activities with program management. 
These principles include developing evaluation policies and procedures; developing evaluation plans; ensuring independence of evaluators in designing, conducting, and determining findings of their evaluations; ensuring professional competence of evaluators; and disseminating evaluation results publicly and in a timely fashion.[Footnote 19] PEPFAR Agencies Have Evaluated a Broad Range of PEPFAR Programs, but Results Are Not Fully Supported in Many Evaluations: OGAC, CDC, and USAID managed and conducted evaluations of a wide variety of PEPFAR programs that were ongoing during fiscal years 2008 through 2010. However, we found that many of these evaluations--particularly evaluations managed by PEPFAR country and regional teams--did not consistently adhere to common evaluation standards, in many cases calling into question the evaluations' support for their findings, conclusions, and recommendations. OGAC, CDC, and USAID Evaluated a Broad Range of Programs: OGAC, CDC, and USAID provided 496 evaluations addressing programs ongoing during fiscal years 2008 to 2010 in all PEPFAR program areas relating to HIV/AIDS treatment, prevention, and care. Of these 496 evaluations, 18 were public health evaluations (PHE), managed by OGAC; 42 were program evaluations provided by CDC and USAID headquarters officials; and 436 were program evaluations provided by CDC and USAID country and regional team officials. (For more information about these evaluations, see appendix III.) * OGAC-managed evaluations. OGAC provided 18 PHEs that CDC and USAID had completed as of November 2011 under an OGAC-managed approval, implementation review, and reporting process. The completed PHEs addressed the following program areas: prevention of mother-to-child transmission, testing and counseling, adult care and support, adult treatment, sexual prevention, and pediatric care and support.[Footnote 20] In addition, OGAC indicated that 82 other PHEs had been initiated as of November 2011.
According to OGAC, PHEs are intended to assess the effectiveness and impact of PEPFAR programs; compare evidence-based program models in complex health, social, and economic contexts; and address operational questions related to program implementation within existing and developing health systems infrastructures. OGAC guidance states that these evaluations focus on strategies to increase program efficiency and impact to guide program development and inform the public, using rigorous quantitative or qualitative methods that permit broad generalization. For all PHEs, OGAC requires PEPFAR country and regional teams to submit evaluation concepts or protocols for approval by an interagency subcommittee[Footnote 21] and requires periodic progress and closeout reports. * CDC and USAID headquarters-managed evaluations. CDC headquarters officials provided 20 evaluations in the following program areas: blood safety, injection safety, adult treatment, pediatric treatment, and strategic information. USAID headquarters officials provided 22 evaluations in the following program areas: abstinence/be faithful, sexual prevention, orphans and vulnerable children, strategic information, and health systems strengthening programs. Four CDC and USAID headquarters evaluations addressed more than one program area. * Country and regional team-managed evaluations. CDC and USAID officials representing 31 PEPFAR country and 3 regional teams provided a total of 436 evaluations; CDC officials provided 185 evaluations, and USAID officials provided 251 evaluations. The evaluations addressed 18 program areas related to PEPFAR prevention, treatment, and care, with about one-fifth of the evaluations addressing activities in more than one program area (see figure 2). CDC and USAID officials also provided copies of evaluation protocols and statements of work, indicating that additional evaluations had been initiated.
Further, based on our analysis of a randomly selected sample of 78 evaluations,[Footnote 22] we estimate that 51 percent of the evaluations used qualitative methods, 35 percent used quantitative methods, and 14 percent used a mix of quantitative and qualitative methods.[Footnote 23] In addition, evaluations provided by USAID tended to employ qualitative methods (32 of 48 evaluations), while those provided by CDC tended to use quantitative methods (20 of 30 evaluations). (See appendix III for additional results of our analysis.) Figure 2: Program Evaluations Provided by PEPFAR Country and Regional Teams, by PEPFAR Program Area: [Refer to PDF for image: illustrated horizontal bar graph] Program area: Prevention; Prevention of mother-to-child transmission: (4.4%) 19; Abstinence/be faithful: (2.3%) 10; Other sexual prevention: (6.7%) 29; Blood safety: (1.4%) 6; Injection safety: (0.9%) 4; Medical male circumcision: (0.2%) 1; Prevention among injecting and non-injecting drug users: (0.9%) 4; Testing and counseling: (2.1%) 9; Total: 82 evaluations (18.8%). Program area: Care: Adult care and support: (3.2%) 14; Pediatric care and support: (0.7%) 3; Orphans and vulnerable children: (14.4%) 63; TB/HIV: (5.0%) 22; Total: 102 evaluations (23.4%). Program area: Treatment: Antiretroviral drugs: (1.6%) 7; Adult treatment: (11.2%) 49; Pediatric treatment: (2.8%) 12; Total: 68 evaluations (15.6%). Program area: Other: Laboratory infrastructure: (2.8%) 12; Strategic information: (6.7%) 29; Health systems strengthening: (11.9%) 52; Total: 93 evaluations (21.3%). Program area: More than one program area: Total: 91 evaluations (20.9%). Total number of evaluations: 436. Source: GAO analysis of evaluations submitted by 31 country and 3 regional PEPFAR teams. Note: Percentages do not always add to totals because of rounding. We initially identified 436 evaluations provided by CDC and USAID officials in 31 PEPFAR country and 3 regional teams.
After examining a sample of 84 evaluations drawn from these 436 evaluations, we determined that a subset of these were outside the scope of our review. However, this figure presents counts by program area for all 436 evaluations. See appendix I for more information. [End of figure] Findings, Conclusions, and Recommendations Are Not Fully Supported in Many Evaluations: Our assessments of judgmental and randomly selected samples of PEPFAR evaluations indicate that many--particularly those managed by PEPFAR country and regional teams--contain findings, conclusions, and recommendations that are not fully supported. To determine the extent to which these elements are supported, we synthesized our assessments of the extent to which evaluations generally adhered to several common evaluation standards defined in guidance issued by CDC, USAID, and GAO. Specifically, we considered whether the evaluations describe the program to be evaluated and its objectives, the purpose of the evaluation, and the criteria used to reach conclusions about the achievement of the program's objectives. We also considered the extent to which evaluations incorporate appropriate designs, sample selection methods, measures, and data collection and analysis methods. All OGAC-managed PHEs that we reviewed generally adhered to these standards and thus their findings, conclusions, and recommendations were fully supported. We found similar results for most CDC and USAID headquarters' program evaluations we reviewed. However, PEPFAR country and regional teams' evaluations did not consistently adhere to common evaluation standards, and thus, in most cases, their findings, conclusions, and recommendations were not fully supported. OGAC-managed evaluations. 
Our assessment of seven OGAC-managed PEPFAR PHEs indicates that they all generally adhered to common evaluation standards, and thus their findings, conclusions, and recommendations were fully supported.[Footnote 24] All of the evaluations that we reviewed identified program and evaluation objectives and used appropriate measures, and most used appropriate evaluation designs and data collection and analysis methods. Three of the evaluations employed fully appropriate sampling methods. Table 1 summarizes our assessments of these evaluations. Table 1: Level of Support for Evaluation Findings, Conclusions, and Recommendations in Selected OGAC-Managed Public Health Evaluations, as Indicated by Adherence to Common Evaluation Standards: Findings, conclusions, and recommendations appear to be fully supported[A]; GAO assessments (n=7): Yes: 7; Partial: 0; No: 0. Common evaluation standards: Evaluation identifies program and evaluation objectives; GAO assessments (n=7): Yes: 7; Partial: 0; No: 0. Evaluation specifies why evaluation is needed; GAO assessments (n=7): Yes: 7; Partial: 0; No: 0. Evaluation identifies evaluation criteria; GAO assessments (n=7): Yes: 5; Partial: 2; No: 0. Evaluation design appears to be appropriate; GAO assessments (n=7): Yes: 5; Partial: 2; No: 0. Participant/sample selection methods and sample size appear to be generally appropriate; GAO assessments (n=7): Yes: 3; Partial: 4; No: 0. Measures used for this evaluation appear to be appropriate; GAO assessments (n=7): Yes: 7; Partial: 0; No: 0. Data collection and analysis methods appear to be appropriate; GAO assessments (n=7): Yes: 6; Partial: 1; No: 0. Source: GAO analysis. [A] Overall determinations are based on synthesis--but not tally--of assessments of adherence to common evaluation standards listed in this table. See appendix I for more information. [End of table] CDC and USAID headquarters-managed evaluations. 
Our assessment of 15 CDC and USAID headquarters-managed evaluations indicates that most generally adhere to common evaluation standards.[Footnote 25] As a result, we found that findings, conclusions, and recommendations were fully supported in 9 evaluations and partially supported in 6 evaluations. Most of the evaluations employed appropriate evaluation designs, measures, and data collection and analysis methods. However, 7 evaluations did not fully identify the evaluation criteria, and 8 did not employ fully appropriate sampling methods. Table 2 summarizes our assessments of these evaluations. Table 2: Level of Support for Evaluation Findings, Conclusions, and Recommendations in Selected CDC and USAID Headquarters-Managed Evaluations, as Indicated by Adherence to Common Evaluation Standards: Findings, conclusions, and recommendations appear to be fully supported[A]; GAO assessments (n=15): Yes: 9; Partial: 6; No: 0. Common evaluation standards; GAO assessments (n=15): Yes: [Empty]; Partial: [Empty]; No: [Empty]. Evaluation identifies program and evaluation objectives; GAO assessments (n=15): Yes: 13; Partial: 2; No: 0. Evaluation specifies why the evaluation is needed; GAO assessments (n=15): Yes: 14; Partial: 0; No: 1. Evaluation identifies evaluation criteria; GAO assessments (n=15): Yes: 8; Partial: 4; No: 3. Evaluation design appears to be appropriate; GAO assessments (n=15): Yes: 10; Partial: 4; No: 1. Participant/sample selection methods and sample size appear to be generally appropriate; GAO assessments (n=15): Yes: 7; Partial: 6; No: 2. Measures used for this evaluation appear to be appropriate; GAO assessments (n=15): Yes: 10; Partial: 2; No: 3. Data collection and analysis methods appear to be appropriate; GAO assessments (n=15): Yes: 12; Partial: 2; No: 1. Source: GAO analysis. [A] Overall determinations are based on synthesis--but not tally--of assessments of adherence to common evaluation standards listed in this table. 
See appendix I for more information.

[End of table]

Country and regional team-managed evaluations. We found that evaluations managed by country and regional teams, which make up the bulk of all PEPFAR program evaluations, did not consistently adhere to common evaluation standards. Based on our analysis of a randomly selected sample of country and regional team evaluations, we estimate that findings, conclusions, and recommendations were fully supported in 41 percent of all evaluations provided to us by country and regional teams, partially supported in 44 percent of these evaluations, and not supported in 15 percent of these evaluations.[Footnote 26] We estimate that 24 percent of these evaluations did not identify any evaluation criteria, and more than half did not employ evaluation designs, sampling methods, measures, or data collection and analysis methods that were fully appropriate.[Footnote 27] For example, an evaluation of activities for providing care to orphans and vulnerable children drew conclusions about results and made recommendations, based almost exclusively on favorable anecdotal information collected from selected program participants and beneficiaries. As a result, the objectivity and credibility of these evaluations' findings, conclusions, and recommendations are in question. Table 3 summarizes our assessments of these evaluations.

Table 3: Estimated Extent to Which Country and Regional Teams' Evaluations Contained Fully Supported Findings, Conclusions, and Recommendations, as Measured by Adherence to Common Evaluation Standards:

Findings, conclusions, and recommendations appear to be fully supported[A]; GAO assessments (n=436): Yes: 179 (41 percent); Partial: 190 (44 percent); No: 67 (15 percent); Not applicable: 0 (0 percent).

Common evaluation standards:

Evaluation identifies program and evaluation objectives; GAO assessments (n=436): Yes: 363 (83 percent); Partial: 73 (17 percent); No: 0 (0 percent); Not applicable: 0 (0 percent).
Evaluation specifies why the evaluation is needed; GAO assessments (n=436): Yes: 375 (86 percent); Partial: 56 (13 percent); No: 6 (1 percent); Not applicable: 0 (0 percent).
Evaluation identifies evaluation criteria; GAO assessments (n=436): Yes: 224 (51 percent); Partial: 106 (24 percent); No: 106 (24 percent); Not applicable: 0 (0 percent).
Evaluation design appears to be appropriate; GAO assessments (n=436): Yes: 212 (49 percent); Partial: 184 (42 percent); No: 39 (9 percent); Not applicable: 0 (0 percent).
Participant/sample selection methods and sample size appear to be generally appropriate; GAO assessments (n=436): Yes: 168 (38 percent); Partial: 123 (28 percent); No: 117 (27 percent); Not applicable: 28 (6 percent).
Measures used for this evaluation appear to be appropriate; GAO assessments (n=436): Yes: 196 (45 percent); Partial: 140 (32 percent); No: 84 (19 percent); Not applicable: 17 (4 percent).
Data collection and analysis methods appear to be appropriate; GAO assessments (n=436): Yes: 134 (31 percent); Partial: 229 (53 percent); No: 73 (17 percent); Not applicable: 0 (0 percent).

Source: GAO analysis.

[A] Overall determinations are based on synthesis--but not tally--of assessments of adherence to common evaluation standards listed in this table. See appendix I for more information.

Notes: Numbers and percentages are based on analysis of a randomly selected sample of evaluations. The margin of error associated with proportion estimates is no more than plus or minus 11 percentage points at the 95 percent level of confidence. The margin of error for totals is not more than 44 evaluations. Numbers do not always add to totals because of rounding. See appendices I and III for more information about these assessments.
[End of table] Further analysis of the results of our assessments showed that evaluations using qualitative methods were more likely to contain results that were partially supported or not supported than evaluations using quantitative methods. (See appendix III for additional results of our analysis.) PEPFAR Policies and Procedures Do Not Fully Adhere to AEA Evaluation Principles Relating to Planning, Independence, and Dissemination: State, OGAC, CDC, and USAID have developed policies and procedures that apply to evaluations of PEPFAR programs, as called for in the AEA Roadmap. However, they have not fully adhered to other AEA Roadmap principles regarding evaluation planning, independence and competence of evaluators, and dissemination of evaluation results. First, OGAC has not developed PEPFAR evaluation plans at the program level or required the development of such plans in individual countries and regions, limiting its own ability to ensure that evaluation resources are appropriately targeted. Second, State, OGAC, CDC, and USAID guidance does not specify how to document the independence and competency of evaluators, and almost half of the evaluations we reviewed did not provide sufficient information to fully determine whether evaluators were free of conflicts of interest. Finally, not all evaluation reports are available online, thus limiting their accessibility and usefulness to PEPFAR decision makers and other stakeholders. State, OGAC, and PEPFAR Implementing Agencies Have Issued Evaluation Policies and Procedures: In accordance with AEA principles, State, OGAC, CDC, and USAID have issued policies and procedures that are applicable to PEPFAR program evaluation.[Footnote 28] * State evaluation policy. 
In February 2012, State's Bureau of Resource Management issued an evaluation policy that applies to all State bureaus and OGAC.[Footnote 29] The policy provides a framework for implementing evaluations of State's various programs and projects and encourages evaluations for programs and projects at all funding levels. * OGAC operational plan guidance. According to OGAC officials, OGAC generally has deferred to implementing agency policies. OGAC also issues annual guidance to PEPFAR implementing agencies for preparation of their operational plans. OGAC's fiscal year 2012 operational plan guidance to PEPFAR country and regional teams, issued in August 2011, addresses some elements of evaluation. The guidance differentiates three types of evaluation and research: basic program evaluation, which focuses on descriptive and normative evaluation questions; operations research, which focuses on program delivery and optimal allocation of resources; and impact evaluation, which measures the change in an outcome attributable to a particular program.[Footnote 30] * CDC evaluation framework. In September 1999, the Program Evaluation Unit at CDC's Office of the Associate Director for Program issued an evaluation framework for CDC programs.[Footnote 31] The framework summarizes essential elements of program evaluation, clarifies program evaluation steps, and reviews standards for effective program evaluation, among other things. According to CDC's Chief Evaluation Officer, as of May 2012, CDC plans to issue evaluation guidelines and recommendations as well as additional guidance for using the evaluation framework. * USAID evaluation policy. In January 2011, USAID's Bureau for Policy, Planning, and Learning revised its evaluation policy to supplement existing evaluation guidance in USAID's Automated Directives System.
[Footnote 32] According to USAID, this revised policy was intended to address a decline in the quantity and quality of evaluation practice within the agency in the recent past. The policy clarifies for USAID staff, partners, and stakeholders the purposes of evaluation; the types of evaluations that are required and recommended; and USAID's approach for conducting, disseminating, and using evaluations. Among other things, the policy sets forth the purposes of evaluation, the roles and responsibilities of USAID operating units, and evaluation requirements and practices for all USAID programs and projects. The policy requires all USAID operating units to consult with program office experts to ensure that scopes of work for external evaluations meet evaluation standards. The policy also states that operating units, in collaboration with the program office, must ensure that evaluation draft reports are assessed for quality by management and through an in-house peer technical review.[Footnote 33] PEPFAR Lacks Evaluation Plans: OGAC has not yet developed a program-level PEPFAR evaluation plan or required implementing agencies or country and regional teams to develop evaluation plans as called for by the AEA Roadmap.[Footnote 34] * OGAC. State's recently issued evaluation policy requires that each State bureau, including OGAC, develop and submit a bureauwide evaluation plan that encompasses major policy initiatives and new programs as well as existing programs and projects. According to a senior OGAC official, at the time of our review, OGAC was discussing with State's Bureau of Resource Management how it will comply with this new requirement. * CDC and USAID headquarters. OGAC defers to implementing agencies to plan evaluations of their headquarters-managed PEPFAR program activities, but CDC and USAID have not developed evaluation plans for such activities included in recent headquarters operational plans. 
OGAC's 2011 guidance for developing the headquarters operational plan requires a plan for technical area program priorities but does not address evaluation planning. Similarly, the fiscal year 2012 guidance does not include a requirement for an evaluation plan. * Country and regional teams. OGAC defers to PEPFAR country and regional teams to plan evaluations of their program activities, but does not require that the teams develop and submit annual evaluation plans. OGAC's 2011 guidance on developing country and regional operational plans urges country and regional teams to prioritize program evaluation in order to make PEPFAR programs more effective and sustainable. In addition, OGAC's fiscal year 2012 guidance calls for country and regional teams to address monitoring and evaluation in describing individual implementing partners' activities. However, neither the 2011 guidance nor the 2012 guidance instructs all country teams to develop evaluation plans.[Footnote 35] We reviewed PEPFAR country and regional operational plans for fiscal year 2011 and found that they did not include evaluation plans.[Footnote 36] Instead, these documents generally included (1) descriptions of ongoing or planned evaluations and related activities (e.g., surveillance) in program area narrative summaries and (2) descriptions of monitoring and evaluation activities in implementing partner activity narratives. In our analysis of information provided by country and regional teams, as well as CDC and USAID headquarters, we did not detect an evaluation rationale or strategy. Based on responses to our survey of CDC and USAID officials in 31 PEPFAR country and 3 regional teams,[Footnote 37] we calculated that evaluations had been conducted or were ongoing for about one-third of these countries' program activities in fiscal years 2008 through 2010. 
In addition, based on these officials' responses, we found similar percentages of ongoing and completed evaluations across the broad program areas of prevention, treatment, and care.[Footnote 38] We also analyzed CDC and USAID headquarters officials' responses to our survey and found that evaluations had been conducted or were ongoing for about half of the PEPFAR program activities managed by agencies' headquarters and implemented during fiscal years 2008 to 2010.[Footnote 39] However, we found no relationships between the percentages of program activities with ongoing or completed evaluations and budgets at the country, program area (i.e., prevention, treatment, or care), or program activity levels. Evaluator Independence and Qualifications Are Not Consistently Documented: State, CDC, and USAID policies and procedures address the independence of evaluators but do not consistently require that evaluation reports identify the evaluation team or address whether there are any potential conflicts of interest.[Footnote 40] In addition, some agency policies and procedures address the need to ensure that evaluators have appropriate qualifications, but none require that evaluations document those qualifications or certify that they are adequate. [Footnote 41] * State. State's recently issued evaluation policy addresses evaluator independence and integrity, stating that evaluators should be free from program managers and not subject to their influence. This policy does not address evaluator qualifications. * OGAC. OGAC's operational plan guidance to country and regional teams does not address the independence or professional qualifications of evaluators. According to OGAC officials, OGAC defers to implementing agency evaluation policies. * CDC. CDC's evaluation framework addresses the need to assemble an evaluation team with the needed competencies, highlighting the importance of ensuring that evaluators have no particular stake in the results of the evaluation. 
The CDC evaluation framework also discusses appropriate ways to assemble an evaluation team. * USAID. USAID's evaluation policy recommends that most evaluations be external and requires a disclosure of conflicts of interest for all evaluation team members. In addition, USAID's evaluation policy requires that evaluation-related competencies be included in staffing selection policies. Our analysis of a randomly selected sample of evaluations submitted by 31 PEPFAR country and 3 regional teams found that the evaluations often did not address whether evaluators had potential conflicts of interest, as called for by the AEA Roadmap. We estimate that 27 percent of the evaluations fully addressed potential conflicts of interest, 59 percent partially addressed the issue, and 14 percent did not address the issue. In addition, while we were unable to determine whether potential conflicts of interest existed with the information provided in some of the evaluation reports, it appeared that there were evaluations in which potential conflicts of interest existed but were not addressed. For example, one evaluation report, relating to strengthening a partner country's nongovernmental HIV/AIDS organizations, indicated that the evaluation team was employed by the program activity's implementing partner, but the report did not address potential conflicts of interest. Furthermore, some country and regional program evaluations did not provide enough identifying information about evaluators to allow an assessment of evaluator independence or qualifications. We estimate that 86 percent of the evaluations fully identified the evaluators, while 14 percent provided either partial or no information. For example, an evaluation report we reviewed relating to HIV prevention program activities in one region named the organization that conducted the evaluation but did not provide any information on the evaluation team members.
Moreover, we were unable to find any information about this organization in an online search based on the limited information available in the report. PEPFAR Evaluations Are Not All Publicly Disseminated: Agency policies and procedures generally support dissemination of evaluation results, but OGAC, CDC, and USAID have not ensured that evaluation methods, data, and evaluation results are made fully and easily accessible to the public.[Footnote 42] * State. State's newly released evaluation policy requires bureaus to submit evaluations to a central repository. * OGAC. OGAC officials told us that the office supports dissemination of the results of important global HIV/AIDS research and evaluations to a variety of stakeholders. For example, OGAC officials noted that the PEPFAR website contains information on PEPFAR results as well as monitoring and evaluation guides. OGAC officials also noted that dissemination strategies are a common component of evaluation protocols and the procurement mechanisms that fund them. In addition, OGAC maintains an intranet site, which is accessible to PEPFAR implementing agency officials and contains information about evaluation. However, OGAC does not have a mechanism for publicly and systematically disseminating evaluation results.[Footnote 43] * CDC. CDC policy advises that effort is needed to ensure that evaluation findings are disseminated appropriately but does not require online dissemination of evaluation reports.[Footnote 44] CDC officials told us that they recently made changes to CDC's public website, which, as of April 2012, includes some information on program evaluations. In addition, CDC's Division of Global HIV/AIDS (DGHA) Science Office maintains a catalog of published journal articles coauthored by DGHA officials. However, CDC does not maintain a complete online inventory of evaluations. * USAID. 
USAID's policy states that evaluation findings should be shared as widely as possible with a commitment to full and active disclosure. USAID requires submission of completed evaluations to the Development Experience Clearinghouse (DEC), the agency's online repository of research documentation,[Footnote 45] but does not enforce this requirement. In 2010, USAID reported that practices for disseminating evaluation results were generally limited, that dissemination practices varied across the agency, and that the requirement to submit completed evaluations to the DEC had not been fully enforced. Additionally, USAID found that documents in the DEC were sometimes difficult to find.[Footnote 46] In February 2012, USAID also found that missions had reported submitting only 20 percent of their evaluations to the DEC in fiscal year 2009.[Footnote 47] Although documents submitted by 31 PEPFAR country and 3 regional teams showed that CDC and USAID have disseminated evaluation findings within these countries and regions in several ways, we found no publicly accessible and easily searchable Internet source for PEPFAR program evaluations. We received abstracts from annual meetings and conferences, presentations to partner government officials and stakeholders, published journal articles, and periodic agency reports, which may be publicly accessible via the Internet.[Footnote 48] However, as of the time of our review, our searches of five key websites generated far fewer PEPFAR evaluations than the 496 evaluations we received from country teams, CDC and USAID headquarters, and OGAC.[Footnote 49] We searched PubMed, the U.S. National Library of Medicine's online database, but a search using "PEPFAR" and "evaluation" as search terms generated seven results. 
Likewise, as of April 2012, our search of USAID's DEC, using "HIV/AIDS" and "evaluation" as search terms, generated 87 results, including some that were not evaluations, but USAID officials, in response to our request, later provided us nearly 300 evaluations. We also found some evaluations at two USAID-maintained websites, OVCsupport.net and AIDStar-One, but neither site was comprehensive or fully searchable.[Footnote 50] In addition, a website called Global HIV M&E Information provides a repository of voluntarily submitted monitoring and evaluation resources; however, we found few evaluations of PEPFAR programs. Conclusions: PEPFAR's authorizing legislation emphasizes the importance of program evaluation as a tool for OGAC to ensure, among other things, that funds are spent on programs that show evidence of success. State, CDC, and USAID have demonstrated a clear commitment to program evaluation by conducting a wide variety of program evaluations that address at least one activity in each PEPFAR program area. However, many evaluations managed by PEPFAR country and regional teams lack fully supported findings, conclusions, and recommendations, evidenced by a lack of general adherence to common evaluation standards. Without fully supported findings, conclusions, and recommendations, these PEPFAR program evaluations have limited usefulness as a basis for decision making and may supply incomplete or misleading information for managers' and stakeholders' efforts to direct PEPFAR funding to programs that produce the desired outcomes and impacts. State, CDC, and USAID have demonstrated their commitment to program evaluation by developing policies and procedures that apply to evaluations, in accordance with established general principles. 
However, without a requirement that country and regional teams prepare and submit annual evaluation plans--for example, as a component of operational plans--OGAC is unable to ensure that program activities are subject to appropriate levels of evaluation. Moreover, without documentation of the independence and competence of PEPFAR program evaluators, OGAC, agency program managers, and other stakeholders have limited assurance that evaluation results are unbiased and credible. Finally, unless evaluation results are publicly and systematically disseminated and made easily searchable online, program officials and public health researchers may be unable to assess the credibility of evaluation findings or use them for program decision making. Recommendations for Executive Action: We recommend that the Secretary of State direct the U.S. Global AIDS Coordinator to take the following four actions in collaboration with CDC and USAID to enhance PEPFAR evaluations: 1. develop a strategy to improve PEPFAR implementing agencies' and country and regional teams' adherence to common evaluation standards; 2. require implementing agency headquarters and country and regional teams to include evaluation plans in their annual operational plans; 3. provide detailed guidance for implementing agencies and country and regional teams on assessing, ensuring, and documenting the independence and competence of PEPFAR program evaluators; and 4. increase the online accessibility of PEPFAR program evaluation results. Agency Comments: We provided a draft of this report to State, HHS's CDC, and USAID. Responding jointly with CDC and USAID, State OGAC provided written comments (see appendix IV). CDC and USAID also provided technical comments, which we incorporated as appropriate. In its written comments, State agreed with our recommendations and, emphasizing the interagency nature of the PEPFAR program, indicated that it will coordinate with PEPFAR agencies to implement our recommendations.
First, State explained that it will work with PEPFAR implementing agencies to carry out the agencies' evaluation policies and practices, which State deemed generally consistent with AEA principles, and will develop strategies to ensure the appropriate application of common evaluation standards. Second, State responded that it will work through PEPFAR interagency processes to develop PEPFAR program evaluation plans, which it noted could be included in annual PEPFAR operational plans. Third, State will work with PEPFAR implementing agencies to put in place guidance to document program evaluators' independence and qualifications. Fourth, State affirmed that OGAC will collaborate with PEPFAR implementing agencies to develop strategies for improving dissemination of evaluation results and will use PEPFAR's public website to link to agencies' online resources. We are sending copies of this report to the Secretary of State, the Office of the U.S. Global AIDS Coordinator, U.S. Agency for International Development's Office of HIV/AIDS, the Department of Health and Human Services' Office of Global Affairs, the Centers for Disease Control and Prevention's Division of Global HIV/AIDS, and appropriate congressional committees. In addition, the report is available at no charge on the GAO website at [hyperlink, http://www.gao.gov]. If you or your staffs have any questions about this report, please contact me at (202) 512-3149 or gootnickd@gao.gov. Contact points for our Offices of Congressional Relations and Public Affairs may be found on the last page of this report. GAO staff who made major contributions to this report are listed in appendix VI. Signed by: David Gootnick: Director: International Affairs and Trade: List of Committees: The Honorable John Kerry: Chairman: The Honorable Richard G. 
Lugar: Ranking Member: Committee on Foreign Relations: United States Senate: The Honorable Patrick Leahy: Chairman: The Honorable Lindsey Graham: Ranking Member: Subcommittee on State, Foreign Operations, and Related Programs: Committee on Appropriations: United States Senate: The Honorable Ileana Ros-Lehtinen: Chairman: The Honorable Howard L. Berman: Ranking Member: Committee on Foreign Affairs: House of Representatives: The Honorable Kay Granger: Chairwoman: The Honorable Nita M. Lowey: Ranking Member: Subcommittee on State, Foreign Operations, and Related Programs: Committee on Appropriations: House of Representatives: [End of section] Appendix I: Objectives, Scope, and Methodology: This report (1) identifies President's Emergency Plan for AIDS Relief (PEPFAR) evaluation activities and examines the extent to which evaluation results are supported and (2) examines the extent to which PEPFAR policies and procedures adhere to established principles for the evaluation of U.S. government programs. To identify PEPFAR program evaluations and examine the extent to which they generated supported evaluation results, we collected and analyzed program evaluation documents provided by Centers for Disease Control and Prevention (CDC) and U.S. Agency for International Development (USAID) officials in the 31 PEPFAR countries and 3 regions with PEPFAR country or regional operational plans in fiscal year 2010, as well as the Department of State's (State) Office of the U.S. Global AIDS Coordinator (OGAC) and CDC and USAID headquarters officials. To examine the extent to which PEPFAR program evaluation policies and procedures adhered to principles in the American Evaluation Association's (AEA) An Evaluation Roadmap for a More Effective Government (AEA Roadmap), we reviewed the general principles for conducting federal government program evaluations, as well as OGAC, State, USAID, and CDC policies and guidance. 
In addition, we surveyed CDC and USAID officials in the 31 PEPFAR countries and 3 regions with PEPFAR annual country or regional operational plans in fiscal year 2010, as well as CDC and USAID headquarters officials, regarding ongoing and completed evaluations. Finally, we conducted interviews with OGAC, CDC, and USAID officials in Washington, D.C., and Atlanta, Georgia. Survey of PEPFAR Officials: To survey PEPFAR country and regional team officials, we took the following steps: 1. We consulted with OGAC and CDC and USAID headquarters officials and decided to use the implementing mechanism[Footnote 51] as a proxy for a program activity. We determined that using implementing mechanisms was the only viable unit of analysis to estimate the percentage of PEPFAR programs with evaluations because (1) OGAC officials maintained updated data on implementing mechanisms and (2) PEPFAR officials regularly used and understood data on implementing mechanisms. However, when a broader program was evaluated, not all implementing mechanisms under that program were necessarily evaluated. We also recognized that evaluations may not be appropriate for all implementing mechanisms (such as those that provide funding for staffing costs). To the extent possible, we eliminated these implementing mechanisms from our analysis. 2. We obtained lists of program activities for fiscal years 2008 through 2010 from OGAC for each country and region. We then analyzed program activities by country (or region) and agency; the lists included identification numbers, names, and partner names for each of the program activities. Each survey tool then contained a list of program activities relevant to the country or regional team. 3. Based on GAO and OGAC guidance, we developed the following working definition of evaluation: Evaluations are systematic studies to assess how well a program is working.
Evaluations are often conducted by experts external to the program, either inside or outside the agency. Types of evaluations include process, outcome, impact, or cost-benefit analysis. 4. We developed a survey tool for ongoing and completed evaluations of PEPFAR programs. We consulted with OGAC and CDC and USAID headquarters officials about the survey tool and made revisions as appropriate. For example, based on input from CDC and USAID headquarters officials, we determined that some PEPFAR evaluations could address several implementing mechanisms. In addition, in some of these cases, if a broader program (e.g., national treatment program) was evaluated, not all implementing mechanisms under the broader program were necessarily evaluated. In response, we included questions in our survey prompting PEPFAR officials to indicate whether an implementing mechanism has been evaluated as part of a broader evaluation of several implementing mechanisms. 5. We tested the survey tool with officials in two PEPFAR countries-- Angola and Ethiopia--and finalized the survey tool based on discussions with these officials. 6. We sent the final survey tool to PEPFAR country contacts (PEPFAR coordinators and CDC and USAID officials) identified by OGAC and CDC and USAID headquarters. The survey tool instructed CDC and USAID country or regional team officials to provide "yes" or "no" responses to the following questions for each implementing mechanism in the country's (or region's) agency-specific lists: * Is this one of your agency's fiscal year 2008-2010 country or regional operational plan program activities? * Has at least one evaluation specific to this implementing mechanism been completed? * Is at least one evaluation specific to this implementing mechanism ongoing? * Has at least one evaluation covering, but broader than, this implementing mechanism been completed? * Is at least one evaluation covering, but broader than, this implementing mechanism ongoing? 
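The five yes/no questions listed above amount to one record per implementing mechanism. The following is a minimal sketch of such a record and of a simple coverage calculation; the class, field, and function names are hypothetical and are not taken from GAO's survey tool, and counting a blank answer as "no" is an assumption made for this illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class MechanismResponse:
    """Survey answers for one implementing mechanism (hypothetical layout).

    Each field holds True ("yes"), False ("no"), or None (left blank).
    """
    mechanism_id: str
    is_agency_activity: Optional[bool] = None   # question 1
    specific_completed: Optional[bool] = None   # question 2
    specific_ongoing: Optional[bool] = None     # question 3
    broader_completed: Optional[bool] = None    # question 4
    broader_ongoing: Optional[bool] = None      # question 5

    def has_any_evaluation(self) -> bool:
        """True if any of the four evaluation questions was answered yes."""
        answers = (
            self.specific_completed,
            self.specific_ongoing,
            self.broader_completed,
            self.broader_ongoing,
        )
        # Assumption of this sketch: a blank answer counts as "no".
        return any(answer is True for answer in answers)


def share_with_evaluation(responses: Iterable[MechanismResponse]) -> float:
    """Fraction of mechanisms with at least one ongoing or completed evaluation."""
    rows = list(responses)
    if not rows:
        return 0.0
    return sum(row.has_any_evaluation() for row in rows) / len(rows)
```

Keeping blank answers as None, separate from an explicit "no", lets an analyst change how nonresponse is treated without re-entering the data.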
We also prompted the country or regional officials to provide additional information for each implementing mechanism, such as explanations for program activities that do not belong to the agency and identification of duplicate program activities. Officials were instructed to either e-mail the completed surveys to GAO or upload them to a website regularly used by OGAC and country and regional teams for submitting and sharing planning and reporting documents. In some cases, we met with country or regional team officials via telephone, or corresponded via e-mail, to clarify the purpose of the survey, the questions themselves, and the evaluation document request as well as to correct anomalies and ask follow-up questions. One GAO analyst also attended the May 2011 PEPFAR implementing agency annual meeting in Johannesburg, South Africa, to provide information about the survey and evaluation document request to PEPFAR country and regional team officials also attending the annual meeting. We received responses from all 31 PEPFAR countries and 3 regions with fiscal year 2010 operational plans. Using a similar survey tool, we also conducted surveys of CDC and USAID headquarters officials regarding program activities managed by agency headquarters and listed in PEPFAR headquarters operational plans for 2008 through 2010. Analysis of Survey Responses: To analyze country and regional teams' survey responses, we made the following assumptions regarding the survey responses: * If officials did not provide a response to the question "Is this one of your agency's fiscal year 2008-2010 country or regional operational plans program activities?" we included that implementing mechanism in the analysis. Program activities with responses of "no" or "duplicate" were eliminated from the analysis. 
* If officials did not respond to any of the four questions regarding ongoing or completed evaluations, we assumed that there were no ongoing or completed evaluations for that implementing mechanism. In addition, we reviewed narrative comments provided by country and regional team officials. We recognized that evaluations may not be appropriate for all implementing mechanisms (such as those that provide funding for staffing costs). To the extent possible, we eliminated these implementing mechanisms from our analysis. Based in part on our review of the narrative comments, we flagged and eliminated implementing mechanisms with evidence indicating that the implementing mechanism was either "to be determined" (i.e., the agency had yet to make an award to an implementing partner), related to staffing costs, related to strategic information and monitoring and evaluation, recently begun, a duplicate of another implementing mechanism, or listed in error. Once the survey responses were ready for analysis, we calculated the summary statistics that are reported in the body of the report. We also included the survey responses provided by officials in CDC and USAID headquarters in the analysis. To check the reliability of the data analysis, a second independent analyst reviewed the statistical programs used to analyze the data for accuracy. Program Evaluation Document Collection: In addition to our survey of CDC and USAID officials in the 31 countries and 3 regions with fiscal year 2010 operational plans, we requested program evaluation documents. To do this, the survey tool instructions prompted CDC and USAID officials to provide documentation of completed and ongoing evaluations. Specifically, for implementing mechanisms where officials indicated that at least one evaluation had been completed, we requested documentation--such as an evaluation report--of all such completed evaluations.
For implementing mechanisms where officials indicated that at least one evaluation was ongoing, we requested documentation--such as terms of work or an evaluation plan. We generally advised country and regional team officials to err on the side of inclusion when in doubt about whether to submit documentation of ongoing and completed evaluations. We instructed these officials to e-mail, or, in some cases, mail electronic versions of the program evaluation documents to GAO, or to upload them to a website regularly used by OGAC and country and regional teams for submitting and sharing planning and reporting documents. In response to this document request, we received more than 1,350 documents. For example, we received documentation of ongoing or planned evaluations, such as statements of work or evaluation protocols and protocol approval forms. We also received meeting minutes, trip reports, financial review and audit documents, presentation slides, abstracts, and conference posters. To determine which documents met our definition of evaluation, we reviewed each of these documents and categorized them as meeting the definition of evaluation or not, following a set of decision rules. For example, we included data quality assessments, costing studies that compared costs and explained cost differences, and analyses of surveillance data pre- and post-intervention. We excluded surveillance studies that simply reported the results of a surveillance activity (but did not link it to a specific program or intervention); needs assessments, baseline studies, and situation analyses; trip and site visit reports; and pre- and post-event (e.g., workshop) questionnaires or surveys. We identified and eliminated duplicate documents. This categorization was checked by a second analyst and yielded 436 program evaluations. We believe that this final set of evaluations constitutes an essentially full universe of PEPFAR country and regional program evaluation documents. 
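The screening step described above can be sketched as a small rule-based filter. The document-type labels below are hypothetical tags that a reviewer might assign by hand; they paraphrase the examples given in the text and are not GAO's actual coding scheme. Because the text presents its included types only as examples, any label outside the two sets is sent back for manual review instead of being guessed.

```python
# Hypothetical labels paraphrasing the examples in the text above.
INCLUDED_TYPES = {
    "data_quality_assessment",
    "costing_study_with_cost_comparison",
    "pre_post_intervention_surveillance_analysis",
}
EXCLUDED_TYPES = {
    "surveillance_report_only",      # not linked to a program or intervention
    "needs_assessment",
    "baseline_study",
    "situation_analysis",
    "trip_report",
    "site_visit_report",
    "pre_post_event_questionnaire",  # e.g., workshop surveys
}


def meets_evaluation_definition(doc_type: str) -> bool:
    """Apply the include/exclude rules; unknown labels need a human decision."""
    if doc_type in INCLUDED_TYPES:
        return True
    if doc_type in EXCLUDED_TYPES:
        return False
    raise ValueError(f"No decision rule for {doc_type!r}; review manually")


def select_evaluations(documents):
    """Return IDs of unique documents that meet the definition.

    `documents` is an iterable of (doc_id, doc_type) pairs. Repeated doc_id
    values are treated as duplicates and only the first copy is considered.
    """
    seen = set()
    kept = []
    for doc_id, doc_type in documents:
        if doc_id in seen:
            continue  # duplicate submission
        seen.add(doc_id)
        if meets_evaluation_definition(doc_type):
            kept.append(doc_id)
    return kept
```

A second analyst's check, as described in the text, could be approximated by running the same function over independently assigned labels and comparing the two result lists.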
In addition to the program evaluation documents collected from CDC and USAID officials in PEPFAR countries and regions, we requested documents from OGAC related to PEPFAR public health evaluations. We also requested evaluation documents related to PEPFAR programs managed by CDC and USAID headquarters from officials at each agency's headquarters. OGAC provided copies of 18 completed public health evaluations, CDC headquarters provided copies of 20 completed evaluations, and USAID headquarters provided copies of 22 completed evaluations. We reviewed the program evaluation documents submitted by PEPFAR country and regional teams as well as CDC and USAID headquarters officials. We identified whether each program evaluation was ongoing or completed as well as which program area or areas (e.g., prevention, treatment, care, or other) were evaluated. To do this, we used program categories defined by OGAC's fiscal year 2011 operational plan guidance, resulting in the program areas and related areas reported in the report. This categorization was checked by a second analyst. Table 4 provides descriptions of the PEPFAR program areas. Table 4: PEPFAR Program Area Descriptions: Prevention: Program area: Prevention of mother-to-child transmission (PMTCT); Description: Activities aimed at preventing mother-to-child HIV transmission, including antiretroviral prophylaxis for HIV-infected pregnant women and newborns and counseling and support for maternal nutrition. Program area: Abstinence/be faithful; Description: Activities to promote abstinence, including delay of sexual activity or secondary abstinence, fidelity, reducing multiple and concurrent partners, and related social and community norms that impact these behaviors. Activities address programming for both adolescents and adults. 
Program area: Other sexual prevention; Description: Activities aimed at preventing HIV transmission, including purchase and promotion of condoms, management of sexually transmitted infections, and programs to reduce other risks of persons engaged in high-risk behaviors. Prevention services should be focused on target populations such as alcohol users; at risk youth; men who have sex with men; mobile populations, including migrant workers, truck drivers, and members of military and other uniformed services; and persons who exchange sex for money, other goods, or both with multiple or concurrent sex partners, including persons engaged in prostitution, transactional sexual partnerships, or both. Program area: Blood safety; Description: Activities supporting a nationally coordinated blood program to ensure a safe and adequate blood supply, including infrastructure and policies; donor-recruitment activities; blood collection, testing for transfusion-transmissible infections, component preparation, storage, and distribution; appropriate clinical use of blood, transfusion procedures, and hemovigilance; training and human resource development; monitoring and evaluation; and development of sustainable systems. Program area: Injection safety; Description: Policies, training, waste-management systems, advocacy, and other activities to promote medical injection safety, including distribution/supply chain, cost and appropriate disposal of injection equipment, and other related equipment and supplies. Program area: Medical male circumcision (MC); Description: Policy, training, outreach, message development, service delivery, quality assurance, and equipment and commodities related to male circumcision. 
All MC services should include the minimum package: HIV testing and counseling provided on-site; age-appropriate pre- and postoperative sexual risk reduction counseling; active exclusion of symptomatic sexually transmitted infections and syndromic treatment when indicated; provision and promotion of correct and consistent use of condoms; circumcision surgery in accordance with national standards and international guidance; counseling on the need for abstinence from sexual activity during wound healing; wound care instructions; and postoperative clinical assessments and care. Program area: Prevention among injecting and noninjecting drug users; Description: Activities including policy reform, training, message development, community mobilization, and comprehensive approaches, including medication assistance therapy to reduce injecting drug use. Procurement of methadone and other medical-assisted therapy drugs should be included under this program area budget code. Programs for prevention of sexual transmission within injecting drug users should be included in this category. Program area: Counseling and testing; Description: Activities in which both HIV counseling and testing are provided for those who seek to know their HIV status or provider-initiated testing and counseling. Care: Program area: Adult care and support; Description: All facility-based and home/community-based activities for HIV-infected adults and their families aimed at extending and optimizing quality of life for HIV-infected clients and their families throughout the continuum of illness through provision of clinical, psychological, spiritual, social, and prevention services. 
Clinical care should include prevention and treatment of opportunistic infections (excluding TB) and other HIV/AIDS-related complications, including malaria and diarrhea (providing access to commodities such as pharmaceuticals, insecticide-treated nets, safe water interventions, and related laboratory services); pain and symptom relief; and nutritional assessment and support, including food. Psychological and spiritual support may include group and individual counseling and culturally appropriate end-of-life care and bereavement services. Social support may include vocational training, income-generating activities, social and legal protection, and training and support of caregivers. Prevention services may include "prevention for positives" behavioral counseling and counseling and testing of family members. Program area: Pediatric care and support; Description: All health facility-based care for HIV-exposed children aimed at extending and optimizing quality of life for HIV-infected clients and their families throughout the continuum of illness through provision of clinical, psychological, spiritual, social, and prevention services. Clinical care should include early infant diagnosis, prevention and treatment of opportunistic infections (excluding TB) and other HIV/AIDS-related complications, including malaria and diarrhea (providing access to commodities such as pharmaceuticals, insecticide-treated nets, safe water interventions, and related laboratory services); pain and symptom relief; and nutritional assessment and support, including targeted food interventions. Other services, such as psychological, social, spiritual, and prevention services, should be provided as appropriate. Program area: Orphans and vulnerable children (OVC); Description: Activities aimed at improving the lives of orphans and other vulnerable children affected by HIV/AIDS, and doing so in a measurable way. 
Services to children (0-17 years) should be based on the actual needs of each child and could include ensuring access to basic education (from early childhood development through secondary level); basic health care services; targeted food and nutrition support, including support for safe infant feeding and weaning practices; protection; mitigation of factors that place children at risk; legal aid; economic strengthening; training of caregivers in HIV prevention and home-based care; and so forth. Household-centered approaches that link OVC services with HIV-affected families (linkages with PMTCT, palliative care, treatment, etc.) and strengthen the capacity of the family unit (caregiver) are included along with strengthening community structures that protect and promote healthy child development (schools, churches, clinics, child protection committees, etc.) and investments in local and national government capacity to identify, monitor, and track children's well-being. Programs may be included that strengthen the transition from residential OVC care to more family-centered models. Program area: TB/HIV; Description: Exams, clinical monitoring, related laboratory services, treatment, and prevention of tuberculosis (including medications); HIV testing and clinical care of clients in TB service locations; TB screening; and diagnosis, treatment and prevention of TB in people living with HIV/AIDS. Funding for these activities, including commodities and laboratory, should be included in the TB/HIV budget code rather than other budget codes. The location of TB/HIV activities can include general medical settings, HIV/AIDS clinics, home-based care, and traditional TB clinics and hospitals. Treatment: Program area: Antiretroviral (ARV) drugs; Description: Procurement, delivery, and transport of ARV drugs, including all antiretroviral postexposure prophylaxis procurement for rape victims. 
Program area: Adult treatment; Description: Infrastructure, training clinicians and other providers, exams, clinical monitoring, related laboratory services, and community-adherence activities. Program area: Pediatric treatment; Description: Infrastructure, training clinicians and other providers, exams, clinical monitoring, related laboratory services, and community-adherence activities. Other: Program area: Laboratory infrastructure; Description: Development and strengthening of laboratory systems and facilities to support HIV/AIDS-related activities, including purchase of equipment and commodities and provision of quality assurance, staff training, and other technical assistance. Program area: Strategic information; Description: HIV/AIDS behavioral and biological surveillance; facility surveys; monitoring of partner results; reporting of results; support of health information systems; assistance to countries in establishing such systems, strengthening them, or both; and related analyses and data dissemination activities. Program area: Health systems strengthening; Description: Activities that contribute to national-, regional-, or district-level health systems by supporting finance, leadership and governance (including broad policy reform efforts, including addressing stigma, gender issues, etc.), human resources for health, institutional capacity building, supply chain or procurement systems, information systems, Global Fund programs, and donor coordination. Source: GAO synthesis of OGAC information provided in fiscal year 2012 country and regional operational plan guidance. 
[End of table] Development of Evaluation Assessment Tool: To determine the degree to which these evaluations were conducted in adherence with common evaluation standards, we used an assessment tool to systematically conduct in-depth analyses of a probability sample of the evaluations submitted by the PEPFAR country and regional teams and a nonprobability sample of the evaluations submitted by OGAC and CDC and USAID headquarters officials. Our PEPFAR evaluation assessment tool was based on an assessment tool used for a prior GAO report, which we updated using guidance on evaluation from USAID,[Footnote 52] CDC,[Footnote 53] the Organization for Economic Cooperation and Development (OECD),[Footnote 54] and GAO. We piloted the assessment tool with three PEPFAR program evaluation documents provided by CDC and USAID headquarters officials and revised the evaluation assessment as appropriate. After piloting and revising the tool, we finalized the tool and used it to conduct the in-depth analyses of program evaluation documents. Table 5 lists the questions and supporting questions included in the assessment tool. Table 5: Questions Included in the GAO Evaluation Assessment Tool: Assessment questions: Does the evaluation specify why the evaluation is needed? Supporting questions: * Is the hypothesis or rationale underlying the program identified? * Are any related evaluations, studies, or other documents (e.g., mid-term evaluation) identified? Assessment questions: Does the evaluation identify stakeholders? Supporting questions: [Empty]. Assessment questions: Does the evaluation identify program and evaluation objectives? Supporting questions: * Are the program or intervention objectives identified? * Are the evaluation objectives identified? * Is the reason (i.e., intended use or purpose) for deciding to conduct an evaluation identified? * Is the link between program and evaluation objectives identified? 
* Is any information provided on how evaluation results should be used for decision making? Assessment questions: Does the evaluation identify evaluation criteria? Supporting questions: * Have the criteria or standards that will be used to measure performance been identified? Assessment questions: Does the evaluation identify the evaluation team and any conflicts of interest? Supporting questions: * Is the evaluation team composition identified? * Are potential conflicts of interest identified and/or addressed? Assessment questions: Does the evaluation identify time frames for conducting the evaluation? Assessment questions: Does the evaluation design appear to be appropriate? Supporting questions: * Is the overall evaluation design identified? * Have the assumptions underlying the design been articulated? * Have design limitations been identified? If so, are the ways in which these limitations were addressed identified? * Overall, is the identified evaluation design appropriate to answer the evaluation questions? Assessment questions: Do participant/sample selection methods and sample size appear to be generally appropriate? Supporting questions: * What are the criteria for selecting or sampling participants, respondents, or other entities? * Is participant selection bias acknowledged? If so, was it addressed? * If probability sampling is used: - Is the sampling strategy appropriate? - Is the sampling frame appropriate? - Is the sampling unit described? * If nonprobability sampling is used, is the sampling strategy appropriate? * If this is a comparison study, does it address how participants, respondents, or other units are assigned to the comparison groups or selected more generally? * If the evaluation involves human subjects, have Institutional Review Board or other human subjects review approval procedures been identified? * Have sample size calculations (e.g., confidence intervals) or limitations been identified? 
Assessment questions: Do the measures used for this evaluation appear to be appropriate? Supporting questions: * Have the key measures--that is, input, output, outcome, and/or impact--been identified? * Are measures clearly linked to evaluation questions? * Do the identified measures appear to be appropriate for answering the evaluation questions? * For pre-, post-, or comparison group evaluations, is there parallel measurement for comparison groups--that is, were the same data collected for comparison and treatment groups? * Have possible confounding effects been identified, measured, and/or controlled for? * If an instrument (e.g., survey or data collection instrument) is used to measure key variables, does it appear to be reliable and valid? * Has the possibility of negative side effects or unintended outcomes been considered? * If appropriate, are alternative explanations of the measured impacts discussed? Assessment questions: Do the data collection and analysis methods appear to be appropriate? Supporting questions: * Are data collection methods and procedures discussed? * Are data analysis methods and procedures discussed? * Are data collection and/or database management controls identified? * Were any robustness checks on the methodology or sensitivity analysis conducted? * Are issues related to nonrespondents, dropouts, or missing data identified and/or addressed? Assessment questions: Are the evaluation results specified? Supporting questions: * Are the following clearly documented? - Evaluation findings/results; - Conclusions; - Recommendations; - Lessons learned; - Stakeholder comments; * What are the key evaluation findings/results? * What are the key evaluation conclusions? * What are the recommendations? Based on the analysis of the elements above, do the evaluation findings/results, conclusions, and recommendations appear to be supported? Source: GAO. 
[End of table] Sampling from Program Evaluation Documents Submitted by PEPFAR Country and Regional Teams: To allow us to generalize to the entire set of evaluations provided by PEPFAR country and regional teams, we randomly selected a sample of 84 of 436 evaluations submitted by CDC and USAID officials in 31 PEPFAR countries and 3 regions. The list of all evaluations was sorted by total approved operational plan budgets for each country or region for fiscal years 2008 through 2010, so that a systematic sample would ensure representation of countries with relatively large, medium, and small budgets for fiscal years 2008 through 2010. After sampling, 6 evaluations--including, for example, baseline and feasibility studies--were found to be out of scope, resulting in a final sample of 78. Results based on random probability samples are subject to sampling error. The sample we drew is only one of a large number of samples we might have drawn. Because different samples could have provided different estimates, we express our confidence in the precision of our particular sample results as a 95 percent confidence interval. This is the interval that would contain the actual population values for 95 percent of the samples we could have drawn. The margin of error associated with proportion estimates is no more than plus or minus 11 percentage points at the 95 percent level of confidence and estimates of totals have a margin of error no larger than 44 evaluations. For the 18 public health evaluations submitted by OGAC, as well as the 20 and 22 evaluations submitted by CDC and USAID headquarters, respectively, we selected a nonprobability sample based on the type of program (e.g., prevention, treatment, care, or other) evaluated as well as country or countries addressed by each evaluation. 
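As an illustration of the sampling description above, the following Python sketch draws a systematic sample from a sorted list and computes a worst-case 95 percent margin of error for a proportion. It is a sketch under stated assumptions, not GAO's actual procedure: the report does not describe the estimator used, so the simple-random-sampling formula with a finite population correction, the worst-case proportion of 0.5, and the function names `systematic_sample` and `margin_of_error` are all assumptions introduced here.

```python
import math
import random


def systematic_sample(items, n, rng):
    """Pick n items at a fixed interval from an already sorted list.

    Hypothetical helper: a random start within the first interval, then
    every (len(items) / n)-th item, so all parts of the sort order
    (e.g., large, medium, and small budgets) are represented.
    """
    step = len(items) / n
    start = rng.random() * step  # random() is in [0, 1), so start < step
    last = len(items) - 1
    return [items[min(int(start + i * step), last)] for i in range(n)]


def margin_of_error(p, n, population, z=1.96):
    """Half-width of a 95 percent interval for a proportion.

    Assumes simple random sampling with a finite population correction;
    this is an illustrative formula, not one stated in the report.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc


# Stand-in list of 436 evaluation identifiers, assumed sorted by budget.
evaluations = list(range(436))
sample = systematic_sample(evaluations, 84, random.Random(0))

# Worst case (p = 0.5) for the final in-scope sample of 78.
moe = margin_of_error(0.5, 78, 436)
print(len(sample), round(moe * 100, 1))
```

Under these assumptions the worst-case margin falls within the "no more than plus or minus 11 percentage points" bound stated above.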
Because this is a nonprobability sample, the results of our assessments of these evaluations cannot be used to make inferences about all evaluations managed by OGAC and CDC and USAID headquarters. However, they do represent a mix of the types of evaluations managed by OGAC and CDC and USAID headquarters. Assessing Program Evaluation Documents: Using our evaluation assessment tool, we conducted in-depth analyses of the evaluation documents submitted by the PEPFAR country and regional teams and also those submitted by OGAC, USAID, and CDC headquarters. To do so, one analyst conducted an initial review of the evaluation document and then completed the evaluation assessment tool. The analyst also recorded basic information about each evaluation, including title, author, date of publication, and the country or countries included in the evaluation. For each of the questions in the assessment tool (see table 5), analysts were instructed to (1) respond using "yes," "no," "partial," "not sure," or "not applicable" and (2) summarize or cite relevant information from the evaluation documents. Analysts then were instructed to weigh the evidence and answers to these questions and provide "yes," "no," "partial," "not sure," or "not applicable" responses for each category. Based on the analysis of the elements addressed in the assessment tool, analysts determined the extent to which each evaluation's findings, conclusions, and recommendations were supported using "yes," "no," "partial," or "not sure" as their responses. This overall determination was not based on a tally of responses to individual elements in the evaluation assessment tool, but rather a synthesis of these responses and an assessment of the contribution of each element to the overall support for the evaluation's findings, conclusions, and recommendations. 
To help ensure consistency in the application of the standards and questions, the assessors met weekly during the assessment period to clarify the instructions and discuss their observations. After each assessment was complete, a second analyst independently verified the results of the analysis by reviewing the program evaluation document and the completed evaluation assessment tool. In cases where the two analysts did not concur on the results, or where there was a "not sure" response, they met to discuss the evidence and documented a final determination. All the results for the evaluation assessment tools were then entered into a spreadsheet and analyzed. Analyzing Data Generated by Evaluation Assessments: To assess potential associations between key attributes of the sample of 78 evaluations we randomly selected, we calculated chi-square tests and the associated odds ratios for all pairs of the following variables: agency, methods used, evaluation type, and program type. Key results from these analyses are presented in the report. Additional results can be found in appendix III. We also employed logistic regressions to assess which of these variables (i.e., agency, methods used, evaluation type, and program type) had the strongest effects on the extent to which sampled evaluations contained support for findings, conclusions, and recommendations. Assessing State, OGAC, CDC, and USAID Evaluation Policies: To assess State, OGAC, CDC, and USAID evaluation policies, we developed an assessment tool based on nine AEA Roadmap principles. [Footnote 55] For each principle, we developed a question or series of questions asking how the policies addressed the AEA Roadmap principles. One analyst reviewed each agency's policy and filled out the tool by citing evidence that would support the policy's consistency with the AEA Roadmap principle, or a conclusion that no evidence could be found to support adherence to the principle. 
The analyst then concluded whether the policy was consistent with each principle assessed. A second analyst conducted a review of the completed assessment tools and either concurred with or disputed the conclusion for each principle. In cases where the two analysts did not concur, they met to discuss the evidence and made a final determination. To determine the extent to which operational plans contained evaluation plans, we reviewed OGAC's fiscal year 2011 and 2012 annual guidance to implementing agency headquarters regarding development of the annual PEPFAR headquarters operational plan. We documented instances where the guidance addressed program evaluation and determined whether it constituted instructions to develop an evaluation plan. We conducted similar analysis of OGAC's fiscal year 2011 and 2012 annual guidance to PEPFAR country and regional teams to identify instances where the guidance addressed evaluation and, finally, to determine whether the guidance constituted instructions for developing evaluation plans. In addition, we assessed 11 of the 31 country operational plans and 2 of the 3 regional operational plans submitted to OGAC for fiscal year 2011, the most recent year in which plans were available. We documented instances where these operational plans discussed evaluation and whether they contained evaluation plans. To determine the extent to which the program evaluations documented potential conflicts of interest and the identity of evaluators, we included questions on these two elements in our evaluation assessment tool. Analysts were instructed to respond using "yes," "no," or "partial" to these questions and to cite relevant evidence. After each assessment was complete, a second analyst verified the results of the analysis by reviewing the program evaluation document and the completed evaluation assessment tool. In cases where the two analysts did not concur on the results, they met to discuss the evidence and documented a final determination. 
All the results for the evaluation assessment tools were then entered into a spreadsheet and analyzed. We searched five Internet databases referenced by OGAC, CDC, and USAID officials to determine the public accessibility of PEPFAR program evaluations. These five sites included the Development Experience Clearinghouse [hyperlink, http://dec.usaid.gov/index.cfm], PubMed [hyperlink, http://www.ncbi.nlm.nih.gov/pubmed/], OVCsupport.net [hyperlink, http://www.ovcsupport.net/s/], AIDSTAR-One [hyperlink, http://www.aidstar-one.com/], and Global HIV M&E Info [hyperlink, https://www.globalhivmeinfo.org/Pages/HomePage.aspx]. For each of these websites, we conducted searches using keywords that would capture any PEPFAR-related program evaluations or documentation, such as "PEPFAR," "evaluation," and "HIV/AIDS." Where applicable, we then captured the results and counted the number of documents that could reasonably be considered documentation of a PEPFAR program evaluation. We conducted this performance audit from August 2011 to May 2012 in accordance with generally accepted government auditing standards. Those standards require that we plan and perform the audit to obtain sufficient, appropriate evidence to provide a reasonable basis for our findings and conclusions based on our audit objectives. We believe that the evidence obtained provides a reasonable basis for our findings and conclusions based on our audit objectives. 
[End of section] Appendix II: GAO Evaluation Definitions and Standards: Past GAO work has emphasized evaluation as a key source of information to help agency officials and Congress make decisions about the programs they oversee.[Footnote 56] GAO distinguishes performance measurement--the ongoing monitoring and reporting of program accomplishments--from evaluation, which is defined as individual, systematic studies conducted periodically or on an ad hoc basis to assess how well a program is working.[Footnote 57] Further, according to GAO guidance, experts external to the program, program managers, or both conduct evaluations to examine the performance of a program within a given context to understand not only whether a program works but also how to improve results. GAO guidance identifies four types of evaluation: * Process evaluation. This type of evaluation assesses the degree to which a program is operating as it was intended. It typically assesses program activities' conformance to statutory or regulatory requirements, program design, and professional standards or customer expectations. * Outcome evaluation. This type of evaluation assesses the degree to which a program achieves its outcome-oriented objectives. It focuses on outputs and outcomes (including unintended effects) to judge program effectiveness, but may also assess program process to understand how outcomes are produced. * Impact evaluation. This is a form of outcome evaluation that assesses the net effect of a program by comparing program outcomes with an estimate of what would have happened in the absence of the program. Impact evaluation is used when external factors are known to influence the program's outcomes, in order to isolate the program's contribution to achievement of its objectives. * Cost-benefit or cost-effectiveness analysis. This type of evaluation compares a program's outputs or outcomes with the costs to produce them. 
Cost-effectiveness analysis assesses the cost of meeting a single objective and can be used to identify the least costly alternative for meeting that goal. In addition, GAO guidance provides basic information about the more commonly used evaluation methods; introduces key issues in planning evaluation studies of federal programs to best meet decision makers' needs; and describes different types of evaluations for answering varied questions about program performance, the process of designing evaluation studies, and key issues to consider in ensuring overall study quality. Further, the guidance recommends standards for evaluation design, including establishing evaluation objectives, identifying constraints, and assessing the appropriateness of the evaluation design.[Footnote 58] [End of section] Appendix III: Statistical Comparison of PEPFAR Evaluations: We conducted a statistical analysis of the adequacy of support for findings in evaluations provided to us by CDC and USAID, to determine whether the adequacy of support differed by agency, by methods used, or by type of evaluation. Our analysis indicated that fully supported findings were more likely in CDC's evaluations than in USAID's evaluations; in evaluations that used quantitative methods than in evaluations that used qualitative or mixed methods;[Footnote 59] and in cost-benefit or impact evaluations, as well as outcome evaluations, than in process evaluations. However, while CDC's evaluations' findings were more likely to be fully supported than USAID's evaluations' findings, the difference was not statistically significant after we accounted for the method used in the evaluations. This lack of statistical significance suggests that the difference was driven partly by the agencies' choice of evaluation method.[Footnote 60] Table 6 shows technical details of our statistical analysis of the level of support for findings in CDC and USAID evaluations. 
Table 6: Statistical Analysis of Support for Findings in CDC and USAID Evaluations, by Agency, Methods Used, and Type of Evaluation: Total; Support for findings: Partial or none: 46; 59.0%; Support for findings: Full: 32; 41.0%; Total: 78; 100.0%; Odds on full support: [Empty]; Odds: ratios: [Empty]. By agency: CDC; Support for findings: Partial or none: 12; 40.0%; Support for findings: Full: 18; 60.0%; Total: 30; 100.0%; Odds on full support: 1.50; Odds: ratios: 3.64. USAID; Support for findings: Partial or none: 34; 70.8%; Support for findings: Full: 14; 29.2%; Total: 48; 100.0%; Odds on full support: 0.41; Odds: ratios: REF. Chi-square statistic (L[2]) = 7.28 with 1 degree of freedom, P-value = .007. By methods used: Qualitative or mixed; Support for findings: Partial or none: 41; 80.4%; Support for findings: Full: 10; 19.6%; Total: 51; 100.0%; Odds on full support: 0.24; Odds: ratios: REF. Quantitative; Support for findings: Partial or none: 5; 18.5%; Support for findings: Full: 22; 81.5%; Total: 27; 100.0%; Odds on full support: 4.40; Odds: ratios: 18.04. Chi-square statistic (L[2]) = 29.25 with 1 degree of freedom, P-value < .001. By type of evaluation: Cost-benefit or impact; Support for findings: Partial or none: 2; 16.7%; Support for findings: Full: 10; 83.3%; Total: 12; 100.0%; Odds on full support: 5.00; Odds: ratios: 23.00. Outcome; Support for findings: Partial or none: 21; 55.3%; Support for findings: Full: 17; 44.7%; Total: 38; 100.0%; Odds on full support: 0.81; Odds: ratios: 3.72. Process; Support for findings: Partial or none: 23; 82.1%; Support for findings: Full: 5; 17.9%; Total: 28; 100.0%; Odds on full support: 0.22; Odds: ratios: REF. Chi-square statistic (L[2] ) = 16.26 with 2 degrees of freedom, P-value < .001. Source: GAO analysis of CDC and USAID evaluations. Notes: We collapsed two categories of the dependent variable, "support for findings," into one category, collapsing "partial support" and "no support" into "partial or none." 
We also collapsed categories of the two independent variables: for the variable "methods used," we collapsed "qualitative methods" and "mixed methods" into "qualitative or mixed methods," and for "type of evaluation," we collapsed "cost-benefit evaluations" and "impact evaluations" into "cost-benefit or impact." We collapsed these categories after preliminary investigations revealed that doing so would result in no statistically significant loss of information. These preliminary investigations involved comparing likelihood-ratio chi-square statistics for expanded and collapsed versions of the tables. Where the difference in chi-squares for the tables compared is not significant, given the difference in degrees of freedom, it can reasonably be concluded that no significant information was lost as a result of collapsing. REF signifies the category chosen as the referent category, or denominator, in calculating the odds ratios. [End of table] In table 6, the chi-square statistics at the base of each of the three panels show that the adequacy of support for findings varied significantly between the two agencies and differed significantly based on the methods used and type of evaluations. 
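The "by agency" panel of table 6 can be checked from its cell counts. The Python sketch below recomputes the odds ratio and a likelihood-ratio chi-square statistic for that panel. It assumes the statistic labeled L[2] is the usual likelihood-ratio form, 2 times the sum of observed times ln(observed/expected); the function names are hypothetical, and the p-value shortcut shown is valid only for 1 degree of freedom.

```python
import math


def odds_ratio(full, other, full_ref, other_ref):
    """Odds of full support in one group divided by the odds in the referent group."""
    return (full / other) / (full_ref / other_ref)


def likelihood_ratio_chi_square(table):
    """Likelihood-ratio chi-square for a two-way table of counts.

    Expected counts come from the independence model:
    row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # a zero cell contributes nothing to the sum
                total += observed * math.log(observed / expected)
    return 2 * total


def p_value_one_df(statistic):
    """Upper-tail probability of a chi-square variable with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Counts from table 6, by agency: [partial or none, full].
agency_table = [
    [12, 18],  # CDC
    [34, 14],  # USAID (referent category)
]

cdc_vs_usaid = odds_ratio(18, 12, 14, 34)
g2 = likelihood_ratio_chi_square(agency_table)
p = p_value_one_df(g2)
print(round(cdc_vs_usaid, 2), round(g2, 2), round(p, 3))
```

The results should agree with the agency panel of table 6 up to rounding. The same functions apply to the other two panels, except that the three-category "type of evaluation" panel has 2 degrees of freedom and so needs a different p-value calculation.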
The odds ratios in the far-right column show that the odds of evaluations' being fully supported were 3.6 times greater for CDC than for USAID; 18 times greater for quantitative evaluations than for qualitative or mixed-methods evaluations; 23 times greater for cost-benefit or impact evaluations than for process evaluations; and 3.7 times greater for outcome evaluations than for process evaluations.[Footnote 61] In addition, we estimated binary logistic regression models to determine whether the difference in adequacy of support for findings in CDC's and USAID's evaluations resulted from differences in the methods used or differences in the types of evaluations conducted.[Footnote 62] Table 7 shows the odds ratios that result from fitting logistic regression models to estimate the effects of the three different factors (agency, methods used, and type of evaluation) on the adequacy of support for findings. Models 1, 2, and 3 are bivariate models, which regress "support" on dummy variables for agency, methods used, and type of evaluation, with each variable considered one at a time. These produce the same odds ratios that we obtained from the observed data in table 6. In contrast, model 4 estimates the effects of agency and methods simultaneously, and model 5 estimates the effects of agency and type of evaluation. In comparing these models, we found that controlling for the methods used (model 4) rendered insignificant the differences between agencies in adequacy of support for findings, whereas controlling for type of evaluation (model 5) did not. Table 7: Odds Ratios from Logistic Regression Models, Where Support for Findings Was Regressed on Agency, Methods Used, and Type of Evaluation: Effects included: CDC; Model: 1: 3.64[A]; Model: 2: [Empty]; Model: 3: [Empty]; Model: 4: 0.88; Model: 5: 3.90[A]. Effects included: Quantitative methods; Model: 1: [Empty]; Model: 2: 18.04[A]; Model: 3: [Empty]; Model: 4: 19.38[A]; Model: 5: [Empty]. 
Effects included: Cost-benefit or impact evaluation; Model: 1: [Empty]; Model: 2: [Empty]; Model: 3: 23.00[A]; Model: 4: [Empty]; Model: 5: 21.04[A]. Effects included: Outcome evaluation; Model: 1: [Empty]; Model: 2: [Empty]; Model: 3: 3.72[A]; Model: 4: [Empty]; Model: 5: 4.94[A]. Source: GAO analysis of CDC and USAID evaluations. Note: The three-category variable representing the type of evaluation requires two dummy variables, one contrasting the cost-benefit or impact evaluations with the process evaluations, and the other contrasting the outcome evaluations with the process evaluations. [A] Odds ratio is statistically significant at the .05 level. [End of table] [End of section] Appendix IV: Comments from the Department of State: United States Department of State: Chief Financial Officer: Washington, DC 20520: May 21, 2012: Dr. Loren Yager: Managing Director: International Affairs and Trade: Government Accountability Office: 441 G Street, N.W. Washington, D.C. 20548-0001: Dear Dr. Yager: We appreciate the opportunity to review your draft report, "President's Emergency Plan For Aids Relief: Agencies Can Enhance Evaluation Quality, Planning, and Dissemination" GAO Job Code 320857. The enclosed Department of State comments are provided for incorporation with this letter as an appendix to the final report. If you have any questions concerning this response, please contact Leigh Ann Monk-Reyes, Program Support Officer, Office of the U.S. Global AIDS Coordinator at (202) 663-2753. Sincerely, Signed by: James L. 
Millette: cc: GAO — David Gootnick: S/GAC— Eric Goosby: State/OIG — Evelyn Klemstine: [End of letter] Department of State Comments on GAO Draft Report: President's Emergency Plan For Aids Relief: Agencies Can Enhance Evaluation Quality, Planning, and Dissemination (GAO-12-673, GAO Code 320857): Thank you for the opportunity to comment on your draft report entitled, "President's Emergency Plan for AIDS Relief: Agencies Can Enhance Evaluation Quality, Planning, and Dissemination, GAO-12-673, Job Code 320857." The GAO report included four recommendations for the Department of State's Office of the U.S. Global AIDS Coordinator (S/GAC). The Department of State's Office of the U.S. Global AIDS Coordinator (S/GAC) and the PEPFAR implementing agencies appreciate the work conducted by the GAO to produce these findings and this draft report. These results reflect an earlier phase of the larger PEPFAR evaluation efforts, and although this work has improved, some of these findings remain equally valid today. This GAO report provides guidance on several issues, and S/GAC will coordinate with the implementing agencies to carry out these recommendations. First, GAO recommends that the State Department coordinate with the Center for Disease Control (CDC) and United States Agency for International Development (USAID) to improve the PEPFAR implementing agencies' and country and regional teams' adherence to common evaluation standards. In response, State agrees to support the partner agencies in their implementation of agency evaluation policies and practices. These agency policies are generally consistent with the GAO-cited AEA standards. Headquarters based evaluations generally comply with these standards, but more effort will be important in each country. 
Different types of evaluations are conducted typically in the HQ (e.g., impact, outcome, operations research) and country (e.g., output, process, formative) settings, and S/GAC and PEPFAR partners will work over this next year to develop strategies to ensure the appropriate application of these standards accordingly. Second, GAO recommends that State require its CDC and USAID implementing agency headquarters, and country and regional teams, to include PEPFAR evaluation plans in their respective annual operational plans. In response, State agrees and will work through interagency processes to define overall PEPFAR evaluation objectives and plan, and apply this framework to agency-specific plans to account for PEPFAR-supported work and to evaluation planning for appropriate PEPFAR-supported countries. This latter effort requires collaborative planning with National partner governments and stakeholders. An evaluation plan also should be considered in conjunction with a broader research agenda for each country. These processes demand considerable effort in-country, and in the context of ongoing programmatic issues, this work will evolve over the next couple of years. The annual operational plan can be used as a mechanism to submit and update plans, once they have been developed. Third, GAO recommends that State, CDC and USAID provide detailed guidance at implementing agency headquarters and to country and regional teams on assessing, ensuring, and documenting the independence and competence of PEPFAR program evaluator qualifications. In response, State agrees and will support agencies in the application of this standard to evaluation studies. Among HQ supported studies, a peer-review process ensures compliance with these recommendations. 
In-country studies typically are not as rigorous in design or objective, but over the next year S/GAC will work with the implementing agencies to develop and implement protocols to document the competence and appropriate independence of evaluators. Fourth, GAO recommends that State, CDC and USAID increase on-line accessibility of PEPFAR evaluation results. In response, State agrees and will collaborate with and support agency partners in the implementation of agency dissemination practices. This work will involve assessing the current status of online dissemination activities and platforms, leading to development of agency strategies to strengthen these efforts and improve the availability of evaluation results. S/GAC also will develop the www.PEPFAR.gov website to maximize linkages to these agency resources and expanding access to this information. [End of section] Appendix V: GAO Contact and Staff Acknowledgments: GAO Contact: David Gootnick, (202) 512-3149 or gootnickd@gao.gov: Staff Acknowledgments: In addition to the contact named above, Jim Michels, Assistant Director; Todd M. Anderson; Chad Davenport; David Dornisch; Lorraine Ettaro; Justin Fisher; Brian Hackney; Kay Halpern; Fang He; Reid Lowe; Grace Lui; and Erika Navarro made key contributions to this report. In addition to these staff, the following GAO staff assisted by conducting in-depth assessments of selected evaluations: Sada Aksartova, Gergana Danailova-Trainor, Leah DeWolf, Rachel Girshick, Jordan Holt, Kara Marshall, Jeff Miller, Steven Putansu, Mona Sehgal, and Doug Sloane. Sushmita Srikanth and Katy Crosby assisted with quality assurance reviews. [End of section] Related GAO Products: President's Emergency Plan for AIDS Relief: Program Planning and Reporting. [hyperlink, http://www.gao.gov/products/GAO-11-785]. Washington, D.C.: July 29, 2011. Global Health: Trends in U.S. Spending for Global HIV/AIDS and Other Health Assistance in Fiscal Years 2001-2008. 
[hyperlink, http://www.gao.gov/products/GAO-11-64]. Washington, D.C.: October 8, 2010. President's Emergency Plan for AIDS Relief: Efforts to Align Programs with Partner Countries' HIV/AIDS Strategies and Promote Partner Country Ownership. [hyperlink, http://www.gao.gov/products/GAO-10-836]. Washington, D.C.: September 20, 2010. President's Emergency Plan for AIDS Relief: Partner Selection and Oversight Follow Accepted Practices but Would Benefit from Enhanced Planning and Accountability. [hyperlink, http://www.gao.gov/products/GAO-09-666]. Washington, D.C.: July 15, 2009. Global HIV/AIDS: A More Country-Based Approach Could Improve Allocation of PEPFAR Funding. [hyperlink, http://www.gao.gov/products/GAO-08-480]. Washington, D.C.: April 2, 2008. Global Health: Global Fund to Fight AIDS, TB and Malaria Has Improved Its Documentation of Funding Decisions but Needs Standardized Oversight Expectations and Assessments. [hyperlink, http://www.gao.gov/products/GAO-07-627]. Washington, D.C.: May 7, 2007. Global Health: Spending Requirement Presents Challenges for Allocating Prevention Funding under the President's Emergency Plan for AIDS Relief. [hyperlink, http://www.gao.gov/products/GAO-06-395]. Washington, D.C.: April 4, 2006. Global Health: The Global Fund to Fight AIDS, TB and Malaria Is Responding to Challenges but Needs Better Information and Documentation for Performance-Based Funding. [hyperlink, http://www.gao.gov/products/GAO-05-639]. Washington, D.C.: June 10, 2005. Global HIV/AIDS Epidemic: Selection of Antiretroviral Medications Provided under U.S. Emergency Plan Is Limited. [hyperlink, http://www.gao.gov/products/GAO-05-133]. Washington, D.C.: January 11, 2005. Global Health: U.S. AIDS Coordinator Addressing Some Key Challenges to Expanding Treatment, but Others Remain. [hyperlink, http://www.gao.gov/products/GAO-04-784]. Washington, D.C.: June 12, 2004. 
Global Health: Global Fund to Fight AIDS, TB and Malaria Has Advanced in Key Areas, but Difficult Challenges Remain. [hyperlink, http://www.gao.gov/products/GAO-03-601]. Washington, D.C.: May 7, 2003. [End of section] Footnotes: [1] Pub. L. No. 110-293, 122 Stat. 2918. [2] See Pub. L. No. 110-293, § 101(a). [3] See Pub. L. No. 110-293, § 301(c)(3). [4] The Consolidated Appropriations Act directed GAO to review PEPFAR "results monitoring activities," among other things. See Pub. L. No. 110-161, § 668(d), 121 Stat. 1844, 2353 (2007). The 2008 Leadership Act directed GAO to provide a report including "a description and assessment of the monitoring and evaluation practices and policies in place" for U.S. bilateral global HIV/AIDS programs, among other things. See Pub. L. No. 110-293, § 101(d). In response to these directives, we also issued President's Emergency Plan for AIDS Relief: Program Planning and Reporting [hyperlink, http://www.gao.gov/products/GAO-11-785] in July 2011. A list of related GAO products, including past work conducted in response to these congressional mandates, is provided at the end of this report. [5] The American Evaluation Association, an international professional association for evaluators of programs, products, personnel, and policies, developed general principles for the work of professionals in everyday practice and to inform evaluation clients and the general public of expectations for ethical behavior. For more information, see American Evaluation Association, An Evaluation Roadmap for a More Effective Government (Washington, D.C.: 2010), accessed March 31, 2012, [hyperlink, http://www.eval.org/Publications/GuidingPrinciples.asp]. 
[6] The 31 countries were Angola, Botswana, Cambodia, China, Côte d'Ivoire, Democratic Republic of the Congo, Dominican Republic, Ethiopia, Ghana, Guyana, Haiti, India, Indonesia, Kenya, Lesotho, Malawi, Mozambique, Namibia, Nigeria, Russia, Rwanda, South Africa, Sudan, Swaziland, Tanzania, Thailand, Uganda, Ukraine, Vietnam, Zambia, and Zimbabwe. The 3 regions were the Caribbean, Central America, and Central Asia. [7] Other PEPFAR implementing agencies are the Departments of State, Defense, Labor, and Commerce and the Peace Corps. Additional HHS offices and agencies receiving PEPFAR resources are the Office of Global Affairs, the Food and Drug Administration, the Health Resources and Services Administration, the National Institutes of Health, and the Substance Abuse and Mental Health Services Administration. [8] CDC's Division of Global HIV/AIDS (DGHA) and USAID's Office of HIV/AIDS (OHA) have responsibility for coordinating PEPFAR program implementation. [9] Prevention-related program areas are mother-to-child transmission, abstinence/be faithful, other sexual prevention, blood safety, injection safety, medical male circumcision, prevention among injecting and noninjecting drug users, and testing and counseling. Treatment-related program areas are antiretroviral drugs, adult treatment, and pediatric treatment. Care-related program areas are adult care and support, pediatric care and support, orphans and vulnerable children, and tuberculosis/HIV. Other program areas are laboratory infrastructure, strategic information, and health systems strengthening. [10] According to OGAC guidance, an implementing mechanism is a grant, cooperative agreement, or contract in which a discrete dollar amount is passed through a prime implementing partner and for which the prime implementing partner is held fiscally accountable. 
[11] See GAO, President's Emergency Plan for AIDS Relief: Partner Selection and Oversight Follow Accepted Practices but Would Benefit from Enhanced Planning and Accountability, [hyperlink, http://www.gao.gov/products/GAO-09-666] (Washington, D.C.: July 15, 2009). [12] See GAO, President's Emergency Plan for AIDS Relief: Program Planning and Reporting, [hyperlink, http://www.gao.gov/products/GAO-11-785] (Washington, D.C.: July 29, 2011). [13] Public health surveillance is the continuous, systematic collection, analysis, and interpretation of health-related data needed for the planning, implementation, and evaluation of public health programs. Surveillance can serve as an early warning system for impending public health emergencies; document the impact of an intervention, or track progress toward specified goals; and monitor and clarify the epidemiology of health problems, to allow priorities to be set and to inform public health policy and strategies. [14] In March 2011, in an article published in the Journal of Acquired Immune Deficiency Syndromes, the U.S. Global AIDS Coordinator and other senior PEPFAR officials wrote that given PEPFAR's emergency response during its first 5 years, "state-of-the-art monitoring, evaluation, and research methodologies were not fully integrated or systematically performed." As such, for PEPFAR's second 5 years, to demonstrate value and impact in resource-constrained environments, PEPFAR adopted an "implementation science" framework, which, in turn, includes monitoring and evaluation, operations research, and impact evaluation as its main components. See "Implementation Science for the U.S. President's Emergency Plan for AIDS Relief (PEPFAR)," Journal of Acquired Immune Deficiency Syndromes, vol. 56, no. 3 (March 1, 2011). 
[15] Department of Health and Human Services, Centers for Disease Control and Prevention, "Framework for Program Evaluation in Public Health," Morbidity and Mortality Weekly Report: Recommendations and Reports, vol. 48, no. RR-11 (September 1999), accessed May 23, 2012, [hyperlink, http://www.cdc.gov/eval/framework/index.htm]. [16] U.S. Agency for International Development, Bureau for Policy, Planning, and Learning, Office of Learning, Evaluation, and Research, Evaluation: Learning from Experience, USAID Evaluation Policy (Washington, D.C.: January 2011), accessed May 23, 2012, [hyperlink, http://www.usaid.gov/evaluation/USAIDEvaluationPolicy.pdf]. [17] U.S. Agency for International Development, "USAID Evaluation Policy: Automated Directives System, Chapter 203: Assessing and Learning" (2010). The Automated Directives System is USAID's directives management program. Agency policy directives, required procedures, and helpful, optional material are drafted, cleared, and issued through this system. Agency employees must adhere to these policy directives and required procedures. [18] GAO, Designing Evaluations: 2012 Revision, [hyperlink, http://www.gao.gov/products/GAO-12-208G] (Washington, D.C.: January 2012). This document addresses the logic of program evaluation designs, describes different types of evaluations and the process for designing them, and highlights issues related to overall evaluation quality. Further, it updates [hyperlink, http://www.gao.gov/products/GAO/PEMD-10.1.4] (Designing Evaluations, March 1991), which we used to develop our evaluation assessment tool. For more information, see appendix I. [19] Two additional AEA Roadmap principles that we did not address in this report relate to integrating evaluation into planning, developing, and managing programs and providing stable, continuous funding for evaluation. 
[20] According to a journal article written by OGAC and other officials, PHEs have been relatively limited in number and disparate in the range of research questions. See Padian et al., "Implementation Science for the U.S. President's Emergency Plan for AIDS Relief (PEPFAR)." [21] PEPFAR's public health evaluation interagency subcommittee oversees PEPFAR policies and procedures for proposing, approving, and disseminating the results of PEPFAR public health evaluations. OGAC's Office of Research and Science, established in October 2011, coordinates the work of the PHE subcommittee and their interactions with implementing agencies, country teams, and other stakeholders. [22] We drew a probability sample of 84 of 436 evaluations submitted by CDC and USAID officials in 31 PEPFAR countries and 3 regions. Six cases were found to be out of scope, resulting in a sample of 78. Results based on random probability samples are subject to sampling error. The sample we drew for our survey is only one of a large number of samples we might have drawn. Because different samples could have provided different estimates, we express our confidence in the precision of our particular sample results as a 95 percent confidence interval. This is the interval that would contain the actual population values for 95 percent of the samples we could have drawn. The margin of error associated with proportion estimates is no more than plus or minus 11 percentage points at the 95 percent level of confidence. The margin of error for totals is not more than 44 evaluations. [23] Qualitative methods include collecting data through interviews, focus groups, document or literature reviews, and observation, and analyzing data by discerning, examining, comparing, and contrasting meaningful patterns or themes in qualitative data. Quantitative methods typically involve collecting quantifiable information through probability sampling and using various forms of statistical analysis to generalize results. 
Evaluations using mixed methods employ a combination of qualitative and quantitative data collection and analysis techniques. See appendix III for more information. [24] From the 18 completed PHEs submitted by OGAC, we selected a judgmental sample of 7 evaluations based on the type of program (e.g., prevention, treatment, care, or other) evaluated as well as the country or countries addressed by each evaluation. Because this is a judgmental sample, results should not be used to make inferences about all evaluations managed by OGAC; however, the PHEs selected represent a mix of the types of evaluations managed by OGAC. See appendix I for more information. [25] From the 42 evaluations we received from CDC and USAID headquarters (20 from CDC, 22 from USAID), we selected a judgmental sample of 15 evaluations (7 from CDC, 8 from USAID) based on the type of program (e.g., prevention, treatment, care, or other) evaluated as well as the country or countries addressed by each evaluation. Because this is a judgmental sample, results should not be used to make inferences about all evaluations managed by CDC and USAID headquarters. However, they represent a mix of the types of evaluations managed by CDC and USAID headquarters. See appendix I for more information. [26] We drew a probability sample of 84 of 436 evaluations submitted by CDC and USAID officials in 31 PEPFAR countries and 3 regions. Six cases were found to be out of scope, resulting in a sample of 78. Results based on random probability samples are subject to sampling error. The sample we drew for our survey is only one of a large number of samples we might have drawn. Because different samples could have provided different estimates, we express our confidence in the precision of our particular sample results as a 95 percent confidence interval. This is the interval that would contain the actual population values for 95 percent of the samples we could have drawn. 
The margin of error associated with proportion estimates is no more than plus or minus 11 percentage points at the 95 percent level of confidence. The margin of error for totals is not more than 44 evaluations. [27] A June 2011 assessment of 56 USAID evaluations--including 8 evaluations of programs funded at least in part through PEPFAR--found that 41 of the evaluations used appropriate data collection methods, while 15 evaluations used data collection methods that were deemed to be partially or somewhat appropriate. See Office of the Director of U.S. Foreign Assistance, A Meta Evaluation of Foreign Assistance Evaluations (Washington, D.C.: June 2011), accessed October 2011, [hyperlink, http://pdf.usaid.gov/pdf_docs/PCAAC273.pdf]. [28] The AEA Roadmap advises agencies to publish policies and procedures for conducting evaluations within their purview. These policies and procedures should provide guidance to evaluators, identifying the kinds of evaluations to be performed and defining administrative steps for developing evaluation plans, setting priorities, ensuring evaluation product quality and independence, and publishing evaluation reports. [29] State's evaluation policy requires evaluation of all large programs, projects, and activities at least once in their lifetime or every 5 years, whichever is less. Further, the policy notes that some State bureaus and OGAC do not directly implement projects or programs and, instead, provide funds to other agencies or operating units. In these cases, State bureaus and OGAC are expected to ensure that implementing organizations carry out evaluations of programs, projects, and activities consistent with State policy. For more information see State, Department of State Program Evaluation Policy (Washington, D.C.: Feb. 23, 2012), accessed March 31, 2012, [hyperlink, http://www.state.gov/s/d/rm/rls/evaluation/2012/184556.htm]. 
[30] For current guidance, see The President’s Emergency Plan for AIDS Relief, FY 2012 Technical Considerations Provided by PEPFAR Technical Working Groups for FY 2012 COPs and ROPs (Washington, D.C.: 2011), accessed March 31, 2012, [hyperlink, http://www.pepfar.gov/documents/organization/169737.pdf]. [31] Department of Health and Human Services, Centers for Disease Control and Prevention, "Framework for Program Evaluation in Public Health." CDC's Program Evaluation Unit sets standards and expectations for evaluation and provides tools, technical assistance, and resources to enhance CDC's evaluation efforts. [32] USAID, Bureau for Policy, Planning, and Learning, Office of Learning, Evaluation, and Research, Evaluation: Learning from Experience, USAID Evaluation Policy. [33] USAID reported in February 2012 that it had taken several steps to implement the new evaluation policy, including training USAID staff in evaluation and establishing an evaluation point of contact in every USAID field mission. USAID's Office of HIV/AIDS provides technical assistance, training, and other support to USAID mission officials and other implementing partners responsible for implementing PEPFAR programs. See USAID, Evaluation Policy: Year One, First Annual Report and Plan for 2012 and 2013 (Washington, D.C.: February 2012), accessed March 31, 2012, [hyperlink, http://www.usaid.gov/evaluation/USAIDEvaluationPolicy-YearOne.pdf]. [34] The AEA Roadmap states that major program components should prepare annual and multiyear evaluation plans, taking into account the need for evaluation results to inform program budgeting, reauthorization, strategic planning, program development and management, and questions of program effectiveness. 
[35] An addendum to OGAC's fiscal year 2012 operational plan guidance, issued in November 2011, states that some country teams could submit a country implementation science strategy, as part of a pilot initiative, which would include descriptions of monitoring and evaluation activities, current knowledge gaps, reference to implementation science strategies and priorities, descriptions of ongoing evaluations, and implementation science and priorities for the coming year. OGAC officials further clarified that the pilot initiative would begin in fiscal year 2013. At the time of our review, the fiscal year 2012 country and regional operational plans had not yet been approved, and thus it is too early to determine how country and regional teams have implemented this new guidance. [36] We reviewed 11 country operational plans and 2 regional operational plans for fiscal year 2011, available at [hyperlink, http://www.pepfar.gov/countries/cop/2011/index.htm]. [37] We sent a total of 67 questionnaires to CDC and USAID officials in the 31 PEPFAR countries and 3 PEPFAR regions that were required to submit PEPFAR country or regional operational plans in fiscal year 2010. The questionnaires took form as spreadsheets listing each agency's PEPFAR implementing mechanisms--a proxy for program activity-- from fiscal years 2008 through 2010 and prompted officials to indicate whether each implementing mechanism had an ongoing or completed evaluation. See appendix I for more information. [38] CDC and USAID officials responding to our survey also indicated that a higher percentage of program activities had evaluations that were ongoing for programs starting in later years, that some programs at the time of our survey were not sufficiently completed for evaluation, and that evaluations were planned for later in the program life cycle. [39] CDC and USAID officials reported roughly the same percentages of PEPFAR programs with ongoing or completed evaluations. 
CDC officials more frequently reported that evaluations were broader than individual program activities, compared to USAID officials. [40] According to the AEA Roadmap, agencies should safeguard the independence of evaluators in the design and performance of evaluations and in presentation of the results. Agencies should also promote objectivity in examining program operations and impact. [41] The AEA Roadmap states that evaluators should be professionals drawn from an interdisciplinary field that encompasses many areas of expertise, and with the appropriate training and experience for the evaluation activity. [42] The AEA Roadmap states that federal agencies should publicly disseminate evaluation results systematically, broadly, and in a timely manner, making them easily accessible and usable through the Internet, and should make evaluation data and methods available to ensure transparency. [43] The Institute of Medicine recommended in 2007 that the U.S. Global AIDS Initiative increase its contribution to the global evidence base for HIV/AIDS programs by learning about and sharing what works. See Institute of Medicine, PEPFAR Implementation: Progress and Promise (Washington, D.C.: 2007). [44] CDC's DGHA requires approval for public dissemination of reports and articles. [45] The USAID evaluation policy requires each program office to submit both final evaluation results and summaries of its findings to the DEC within 3 months of their completion. This applies to both completed evaluation reports and the final drafts of any report submitted to USAID. The policy further requires evaluation data to be warehoused for future use but does not denote a specific repository for that purpose. [46] The report also notes that USAID was designing the DEC to make it more user friendly and useful for USAID staff and external stakeholders. See USAID, Report to Congress on Program Review and Evaluation Process (Washington, D.C.: Mar. 30, 2011). 
[47] USAID also reported that, from 2010 to 2011, the agency had increased the number of evaluation reports submitted to the DEC. See USAID, Evaluation Policy: Year One, First Annual Report and Plan for 2012 and 2013. [48] For example, country and regional teams submitted evaluations published in journals such as The Lancet, Journal of Acquired Immune Deficiency Syndromes, and Journal of the American Medical Association. In addition, CDC publishes evaluation research findings in its Morbidity and Mortality Weekly Report, CDC's "primary vehicle for scientific publication of timely, reliable, authoritative, accurate, objective, and useful public health information and recommendations." For more information, see [hyperlink, http://www.cdc.gov/mmwr]. [49] According to CDC and OGAC officials, public dissemination may be limited by concerns raised by partner country ministries of health or implementing partners' copyright concerns. [50] OVCsupport.net is an online repository for sharing information on programs supporting orphans and vulnerable children. For more information, see [hyperlink, http://www.ovcsupport.net]. AIDStar-One is managed by USAID's Implementation Support Division and provides "targeted assistance in knowledge management, program implementation support, technical leadership, program sustainability, and strategic planning." For more information, see [hyperlink, http://www.aidstar-one.com]. [51] An implementing mechanism is a grant, cooperative agreement, or contract in which a discrete dollar amount is passed through a prime implementing partner entity and for which the prime implementing partner is held fiscally accountable. [52] We used USAID's 2010 guidance, which was in effect for fiscal years 2008 through 2010 (the time frame used to request evaluations from implementing agency headquarters and country and regional team officials). 
[53] See Department of Health and Human Services, Centers for Disease Control and Prevention, "Framework for Program Evaluation in Public Health" Morbidity and Mortality Weekly Report: Recommendations and Reports, vol. 48, no. RR-11 (1999), accessed October 2011, [hyperlink, http://www.cdc.gov/eval/framework/index.htm]. [54] See OECD, Development Assistance Committee Guidelines and Reference Series, Quality Standards for Development Evaluation (Paris: April 2010), available at [hyperlink, http://www.oecd.org/dataoecd/55/0/44798177.pdf]. [55] The AEA Roadmap principles include scope, coverage, analytic approaches and methods, resources, professional competence, evaluation plans, dissemination of evaluation results, evaluation policies and procedures, and independence. [56] See GAO, Program Evaluation: Experienced Agencies Follow a Similar Model for Prioritizing Research, [hyperlink, http://www.gao.gov/products/GAO-11-176] (Washington, D.C.: Jan. 14, 2011). [57] See GAO, Performance Measurement and Evaluation: Definitions and Relationships, [hyperlink, http://www.gao.gov/products/GAO-11-646SP] (Washington, D.C.: May 2011). [58] See GAO, Designing Evaluations: 2012 Revision, [hyperlink, http://www.gao.gov/products/GAO-12-208G] (Washington, D.C.: January 2012). [59] Qualitative methods include collecting data through interviews, focus groups, document or literature reviews, and observation, and analyzing data by discerning, examining, comparing, and contrasting meaningful patterns or themes in qualitative data. Quantitative methods typically involve collecting quantifiable information through probability sampling and using various forms of statistical analysis to generalize results. Evaluations using mixed methods employ a combination of qualitative and quantitative data collection and analysis techniques. [60] Additional analyses (not shown) indicate that 67 percent of the CDC evaluations and 15 percent of the USAID evaluations used quantitative methods. 
[61] To calculate the odds on findings' being fully supported in CDC evaluations (shown in table 6 under "Odds on full support"), we divided the number of evaluations with full support by the number with partial or no support (18/12 = 1.5). We performed a similar calculation of odds on findings' being fully supported in USAID evaluations (14/34 = 0.41). The results of these calculations imply that 1.5 CDC evaluations were fully supported for every CDC evaluation that was not, while 0.41 USAID evaluations were fully supported for every USAID evaluation that was not. The ratio of these two odds--1.50/0.41 = 3.64, computed from the unrounded odds (shown in the far-right column of table 6)--shows that the odds on evaluation findings' being fully supported were about 3.6 times as great for CDC as for USAID.
[62] A June 2011 assessment of 56 USAID evaluations--including 8 evaluations of programs funded at least in part through PEPFAR--found that the majority of the evaluations used mixed methods and that about a fourth of the evaluations employed quasi-experimental or statistical evaluation methods. See Office of the Director of U.S. Foreign Assistance, A Meta Evaluation of Foreign Assistance Evaluations (Washington, D.C.: 2011), accessed March 2011, [hyperlink, http://pdf.usaid.gov/pdf_docs/PCAAC273.pdf].
[End of section]
GAO’s Mission:
The Government Accountability Office, the audit, evaluation, and investigative arm of Congress, exists to support Congress in meeting its constitutional responsibilities and to help improve the performance and accountability of the federal government for the American people. GAO examines the use of public funds; evaluates federal programs and policies; and provides analyses, recommendations, and other assistance to help Congress make informed oversight, policy, and funding decisions. GAO’s commitment to good government is reflected in its core values of accountability, integrity, and reliability.