Advice for Designing and Administering Evaluation Tools

Below you can find our Research Board’s answers to some of the most common questions asked about measuring outcomes and using instruments like the ones found in this Toolkit. Don’t see your question here? Please email us and we can offer some guidance and perhaps add your question to the list!

  • What’s a reasonable number of questions for a youth survey?

    One of the biggest challenges faced by youth mentoring programs when designing a comprehensive survey for participants is limiting the questions to a reasonable number. This is especially true when surveying youth participants as they may struggle to provide accurate information over a lengthy survey that tests their patience and energy level.

    While there is no magic number that serves as a maximum, in general, you are likely to want the survey to be something youth could complete in a relatively short period of time (20 minutes or so). You may consider breaking up a longer survey over multiple administration points, although that also increases the likelihood of getting incomplete responses from more participants. Yes/No, True/False, and Likert-style ratings (e.g., noting the extent to which you agree or disagree with something on a 5-point scale) generally will allow for more total questions to be asked than open-ended questions that require the writing of detailed thoughts.

    Youth participants are also more likely to provide robust and accurate information when the questions are relevant to their time in the program, use appropriate language for their reading levels, and are sequenced in a way that flows logically and makes it easy for them to think about their response to each in context. If you have doubts about the length or content of your survey, consider testing it with a small group of youth before finalizing it to get a sense for how quickly and thoroughly they can complete the items and if there are any words or questions that are difficult for them.

  • What is the best way to get survey responses from youth?

    Most youth will respond well to either pencil and paper or (for older youth) online surveys. In either case, it may be appropriate to have questions read aloud for younger youth and those who do not have strong reading skills. This can serve both to enhance the quality of the data and reduce the time and effort required on the part of the youth to complete the survey. Ideally, youth will be administered surveys at the program site or in another location with the oversight of program staff. If this is not possible, care should be taken to ensure that youth are in a quiet and private location when completing the survey, without the close presence of a parent or their mentor. Verbal responses are great for qualitative evaluations or evaluations that are focused on program implementation, as they can contain nuanced information that provides rich feedback about the program’s services. But responding to an outcome survey verbally introduces a host of concerns (e.g., that the question asker is influencing the responses given, that youth may not want to respond honestly to some types of questions) and should only be used when trying to solicit responses from very young children who might struggle with a written or online survey.

  • Can I use survey questions that I’ve developed myself? What if I like mine better than what’s offered here?

    The surveys and scales offered in this Toolkit were selected by the NMRC Research Board for their general appropriateness with mentoring program goals and outcomes. But, these are by no means the only scales available to assess the outcomes (and risk and protective factors) involved. You may find that our recommended measures are not quite what you are looking for in terms of reading level/language or the nuances of how they represent a particular outcome (e.g., examining changes in attitudes rather than changes in behavior).

    For many of the measures recommended here, links are provided to alternative measures that may be closer to what you are looking for. However, we strongly caution programs about using truly “homegrown” questions or scales that have not been tested for validity (i.e., that the scale truly measures what it’s intended to measure) and reliability (i.e., that it measures these things consistently across administrations and situations). In rare circumstances, a program may have unique needs that warrant the development and testing of a brand new set of survey questions. But chances are that an existing set of questions can meet your needs. If you don’t find what you are looking for in this Toolkit, remember that you can always request free, personalized technical assistance through the NMRC to get help with exactly this kind of instrument identification.
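    If you do work with an evaluator to test a new or modified set of items, one piece of the reliability picture is internal consistency. Below is a minimal sketch, in Python with pandas, of how an analyst might compute Cronbach’s alpha for a single scale; the item names, the responses, and the commonly cited .70 benchmark are illustrative assumptions, not Toolkit requirements.

        # A minimal sketch of one common internal-consistency check: Cronbach's alpha.
        # Item names, responses, and the .70 benchmark are illustrative assumptions.
        import pandas as pd

        def cronbach_alpha(items: pd.DataFrame) -> float:
            """Cronbach's alpha for a DataFrame whose columns are the items of one scale."""
            items = items.dropna()                          # listwise deletion, for simplicity
            k = items.shape[1]                              # number of items in the scale
            item_var = items.var(axis=0, ddof=1).sum()      # summed variance of the individual items
            total_var = items.sum(axis=1).var(ddof=1)       # variance of the total (summed) score
            return (k / (k - 1)) * (1 - item_var / total_var)

        # Hypothetical 5-point Likert responses to a four-item scale:
        responses = pd.DataFrame({
            "item1": [4, 5, 3, 4, 2, 5],
            "item2": [4, 4, 3, 5, 2, 4],
            "item3": [3, 5, 2, 4, 1, 5],
            "item4": [4, 5, 3, 4, 2, 4],
        })
        print(f"alpha = {cronbach_alpha(responses):.2f}")   # values around .70 or higher are often considered acceptable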

  • Can I change the wording of questions or drop questions I don’t like from the scales I find here in the Toolkit?

    In a couple of cases, it is noted that you may want to alter a recommended measure for your own use. In most cases, however, we recommend you use the scale exactly as provided. You will make the strongest case for your program if you use pre-existing sets of questions in their entirety. Typically—especially when asking youth about attitudes—this means asking several similar questions, rather than only one or two. Picking a subset of items out of the recommended scales or changing the wording of these items—even very small changes—may not yield valid and reliable measurement of the outcome (or risk or protective factor) in which you are interested, and you may not be able to compare your findings with those of other programs using the original instrument.

    You can, in some instances, make small edits to the wording of a question to make it more age appropriate, but this should be undertaken only with the guidance of an experienced evaluator. If you find yourself wanting to substantially rewrite a set of questions, you may be better off identifying a different, more appropriate scale.

  • What do I need to keep in mind as I blend scales/questions from different sources together to make my program’s own unique evaluation survey?

    Overarching principles for good survey design, several of which are addressed above, include:

    • Keep a close eye on overall length, especially for youth surveys.
    • Make sure that all the questions use age-appropriate and culturally-relevant language, concepts, and response options, especially if drawn from different sources. The scales included in this Toolkit were selected with these considerations in mind.
    • Try to group questions on a given topic together and consider grouping questions with similar response options together.
    • Ensure that the survey has a logical flow to it and that the major sections are sequenced in a way that allows for efficient answering.
    • Consider adding brief “introductory” sections that orient youth to a new topic or set of response options (e.g., moving from “true/false” to “agree…disagree”).
    • Consider placing scales that ask about more personally sensitive topics (e.g., depressive symptoms or delinquent behavior) later in the survey; confronting youth with these topics right at the beginning may reduce their comfort in responding to these types of questions.

    Also consider the following when developing your survey:

    Make sure you are selecting outcomes that fit precisely with your program’s logic model and/or theory of change. While it can be tempting to cast a wide net when looking for outcomes from your program’s good work, remember that every set of questions added to a survey increases the overall burden on respondents and may decrease both the volume and accuracy of the information you collect. Be as targeted in your surveying as possible, sticking to the outcomes your logic model says are tied most closely to your program vision, and perhaps try to uncover additional rich information about your program via exit interviews, focus groups, or other qualitative approaches.

    Don’t forget about the time investment for your staff. The more questions asked, the more data will need to be stored, analyzed, and reported on. Even if you have hired an external evaluator, your staff is still likely to have a role in these types of tasks (see the Key Evaluation Considerations page for more information about evaluation roles and staffing).

    Take care to ask personal or sensitive questions only when they are critical to your program’s goals and you have considered the ramifications. Many mentoring programs address serious youth needs and circumstances and may need to demonstrate real progress on these challenges to accurately capture the good they are doing. But asking youth about highly personal and potentially painful topics also carries with it the potential for youth to experience negative feelings both during and after completing a survey. It can also place an ethical requirement on programs to follow up appropriately, depending on the information that a youth shares. Youth should always be informed that they have the option of not responding to any question on a survey, or of providing the information in another way, if they feel uncomfortable. Personal and sensitive questions are also good examples of ones that are likely best placed toward the end of surveys.

    Consider how you can gather multiple opinions on the same outcomes. While most programs emphasize their youth surveys in efforts to get direct feedback from mentees about their gains in the program, additional perspectives from parents, mentors, teachers, and other caring adults or stakeholders in the relationship can be invaluable. As you are adding sets of questions to your youth survey, think about which topics might be meaningful to ask one or more other parties about. Their responses can often confirm or help to explain youth responses, providing added rigor and useful nuance to evaluation findings. For some topics, they may even be better respondents than youth. Future versions of this Toolkit will incorporate measures from sources other than youth in order to help programs take advantage of their potential benefits.

  • Should I always administer surveys pre-post (at the beginning and end of the program) or can I just do it at the end of the year?

    As a general rule, the scales recommended in this Toolkit are intended to be used in a pre-post design (ideally with a control/comparison group)—administering the scales before the youth begins receiving program services, and then again later—except for the Risk and Protective Factors, which are likely to be administered only at baseline to help clarify which types of children seem to benefit most from the program’s services. Outcome evaluation designs that do not establish a baseline of where participants are before they get a mentor, or that lack a thoughtfully identified comparison group of non-mentored youth, have a greatly reduced ability to attribute any subsequent “outcomes” to the program itself or to understand just how much mentored youth changed compared to where they started.
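    To make the pre-post idea concrete, here is a minimal sketch, in Python with pandas and SciPy, of matching baseline and follow-up scores and testing whether they changed on average. The file names, the youth_id field, and the self-esteem outcome column are hypothetical, and your evaluator may prefer a different analysis (especially one that also incorporates a comparison group, as discussed below).

        # A minimal sketch of a simple pre-post comparison on one outcome.
        # File names, the youth_id field, and the outcome column are hypothetical.
        import pandas as pd
        from scipy import stats

        pre = pd.read_csv("baseline_survey.csv")     # collected before mentoring begins
        post = pd.read_csv("followup_survey.csv")    # collected at the end of the year

        # Keep only youth who completed both administrations, matched on an ID.
        matched = pre.merge(post, on="youth_id", suffixes=("_pre", "_post"))
        matched["change"] = matched["self_esteem_post"] - matched["self_esteem_pre"]

        print("Mean change from baseline:", matched["change"].mean())

        # Paired t-test: did scores change, on average, between the two time points?
        t_stat, p_value = stats.ttest_rel(matched["self_esteem_post"], matched["self_esteem_pre"])
        print(f"t = {t_stat:.2f}, p = {p_value:.3f}")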

  • Why is a control or comparison group so important when evaluating my program?

    As noted above, without a valid comparison group of youth, your evaluation is unlikely to say much about how big an impact your program has made. Having a comparison group of youth, ideally as similar as possible to those who were mentored in your program, can provide much-needed context for findings. For example, a mentoring program serving middle schoolers might find that mentored youth are experimenting with drugs and alcohol more than they were at the beginning of the year. But a comparison group might show an even greater increase in drug and alcohol use—critical context that might otherwise have led people to believe, incorrectly, that the program was harmful. In another example, a program might find that mentored youth are faring much better in reading scores, but might take undue credit for that improvement when, in fact, all students improved because of other factors (e.g., a new reading curriculum) rather than because of the work of the mentors; a comparison group would have revealed this.
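    A rough sketch of that logic in Python follows (the data file, the group labels, and the outcome columns are hypothetical, and an evaluator may use a more formal model): the quantity of interest is how the change in the mentored group compares to the change in the comparison group, not the mentored group’s change alone.

        # A minimal sketch of putting program change in context with a comparison group.
        # The file name, the "group" labels, and the outcome columns are hypothetical.
        import pandas as pd

        data = pd.read_csv("matched_pre_post.csv")    # one row per youth, both groups
        data["change"] = data["substance_use_post"] - data["substance_use_pre"]

        change_by_group = data.groupby("group")["change"].mean()
        print(change_by_group)

        # Compare the mentored group's average change with the comparison group's
        # average change over the same period.
        relative_change = change_by_group["mentored"] - change_by_group["comparison"]
        print("Change relative to comparison group:", relative_change)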

    Smaller programs in particular may face challenges in coming up with a reasonable comparison group. Programs may, for example, have ethical concerns about denying mentoring to some youth to create a “comparison” group, or they may simply operate in too small a setting to find a reasonable group of youth against whom to compare mentees. Programs are encouraged to be creative in how they find groups of comparable youth: working with similar youth at another school or in another city as a comparison group, using the incoming wave of mentees as the comparison group for the currently mentored youth, etc. These designs are beyond the scope of this Toolkit, but are options to consider with the guidance of a professional evaluator or through technical assistance requested through the NMRC. It should also be noted that if your program wants to know with the greatest confidence and credibility that it is making meaningful change in the youth it serves, an experimental design is likely to be necessary, in which youth are randomly assigned to either the program or a non-program control group.

  • Is it OK if some of our youth skip some questions in our survey or don’t give complete answers? Are their responses usable or do I need to throw them out?

    In general, the scales in this Toolkit are valid when the questions are answered in their entirety. If youth skip questions, it can greatly affect the accuracy of the measures or prevent you from calculating an overall score on the measure altogether. So when possible, encourage mentees to answer all questions in a particular measure honestly and accurately, while still making clear that they may skip any question. There are, however, strategies available to score measures when some of the items have been skipped; these will be addressed as part of the further development of this Toolkit.
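    In the meantime, one common convention (an illustrative example only, not a Toolkit rule) is to prorate: score a scale as the average of the items a youth did answer, but only when a minimum share of the items, often around 80 percent, has been completed. A minimal sketch in Python, with hypothetical file and item names:

        # A minimal sketch of prorated scale scoring when some items are skipped.
        # The 80% threshold, file name, and item column names are illustrative assumptions.
        import pandas as pd

        def prorated_score(items, min_answered=0.8):
            answered = items.dropna()
            if len(answered) / len(items) < min_answered:
                return None               # too many skipped items: treat the scale score as missing
            return answered.mean()        # average of the answered items stands in for the scale score

        survey = pd.read_csv("youth_survey.csv")             # hypothetical data file
        scale_items = ["hope1", "hope2", "hope3", "hope4"]   # hypothetical items for one scale
        survey["hope_score"] = survey[scale_items].apply(prorated_score, axis=1)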

  • What if youth in our program are slightly younger or older than the recommended ages for a scale?

    Can we stretch the age limits any? Within reason, scales can sometimes be used with youth who are slightly younger than the lower end of the measure’s intended age range. This should only be considered when it is confirmed that such youth can still read and understand the questions. Even then, keep in mind that younger-than-intended youth may find it difficult to understand and discriminate reliably among the response options that accompany each question. In general, it is easier for older youth to answer questions intended for a younger audience, although questions designed for younger mentees may ask about attitudes or behaviors that are developmentally inappropriate for older youth. In sum, to ensure the most accurate measurement, it is always best to use instruments with youth who are within the intended age range.

  • In general, are self-reported outcomes valid?

    Yes, and in some instances they are likely to be much more accurate than asking parents, mentors, teachers, or other adults about the outcome (or risk or protective factor) as exhibited or experienced by the youth. That said, younger mentees may struggle to conceptualize their internal feelings, beliefs, and even actions, and older youth may be more likely to falsify answers, either to hide things adults may disapprove of or to tell an adult what they think that adult wants to hear. But in general, self-reports are a useful and well-accepted way of assessing attitudes, values, perceptions, and behaviors. One tip for helping to avoid bias in youths’ responses is to make sure that they know how their answers will be used—for example, that their answers will be used only when combined with those of other mentees for the purposes of your evaluation. Also, make sure that youth understand with whom their responses will and will not be shared, keeping in mind that promises of confidentiality may not extend to portions of surveys that ask about actions that may indicate self-harm or a potential for harm to others. The bottom line is that you should always tell youth exactly how you will and will not use their responses, and who will see them, before they complete their survey.

  • Do I need approval from an Institutional Review Board (IRB) to survey my mentees?

    The purpose of an IRB review is to ensure that sensitive information about individuals participating in a research study is protected and potential harm to them is minimized. For most nonprofit mentoring programs, the purpose of collecting survey data is to inform internal program improvements or to report performance measures to a funding agency. If data are being collected for one of these purposes, this is probably not considered “research” (for additional guidance, see these helpful resources developed by the Department of Justice or Child Trends). But keep in mind that federally funded initiatives that include research activities, or research being conducted by external evaluators, especially those affiliated with a higher education institution, may be required to go through IRB approval before surveying youth or families, so be sure to plan for what can sometimes be a lengthy delay. And while they may not have formal IRBs, many schools and school districts have strict policies regarding surveying or collecting other data from students (tribal governments may also have similar policies). So, before collecting data from your program participants, make sure that you have explored all necessary protocols and, if you are receiving funding, discussed the requirements with grant officials.

    Even if you are not required to undergo an IRB review, it is still a good idea to protect the sensitive information that youth and families may provide you in surveys. Some recommendations for protecting this information include collecting anonymous surveys, removing identifying information from surveys, ensuring that surveys are secured in locked files, and working with a data management specialist to design protected data files.
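    As one illustration of what “removing identifying information” can look like in practice, here is a minimal sketch in Python; the file and column names are hypothetical, and the list of fields that count as identifying should be decided with your evaluator or data specialist.

        # A minimal sketch of basic de-identification before storing or sharing survey data.
        # File and column names are hypothetical.
        import uuid
        import pandas as pd

        raw = pd.read_csv("youth_survey_raw.csv")

        # Replace each youth's name with a random study ID; keep the linking key in a
        # separate file stored securely (e.g., in a locked or access-restricted location)
        # so responses can still be linked to follow-up surveys if needed.
        raw["study_id"] = [uuid.uuid4().hex[:8] for _ in range(len(raw))]
        raw[["youth_name", "study_id"]].to_csv("id_key_store_securely.csv", index=False)

        # Drop the identifying fields before analysis or sharing.
        deidentified = raw.drop(columns=["youth_name", "date_of_birth", "home_address"])
        deidentified.to_csv("youth_survey_deidentified.csv", index=False)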

  • How much support can I give my mentees in answering the survey questions?

    How can I help them understand the meaning of the questions without influencing their answers? In general, providing mentees with additional information or guidance can taint your results or reduce the accuracy of your data, as even simple explanations or supports can make the survey experience different for some mentees. However, if you do need to help a mentee understand a concept, read a challenging word, or clarify the intent of a question, make sure that you provide the same information to all of the mentees taking the survey (e.g., you might create a survey “dictionary” that defines difficult words and provide it to all youth who take your survey). This is more easily done in group settings where you can explain a concept to everyone at once. But generally, program staff should avoid providing much help and instead focus their energy on selecting an appropriate instrument that will be easy for their mentees to understand and complete on their own.

  • If I receive a grant from a federal/state agency or foundation, can these resources be used to collect information to report to them as performance measures?

    The requirements of government or private organizations funding mentoring programs vary; however, many include the collection and reporting of performance measures. These survey resources may be able to assist with the data collection for these purposes, but you should carefully review the funder’s requirements.

    For OJJDP-funded projects, additional information on grant performance measures can be found on OJJDP’s Performance Measures Webpage.

  • How should I cite the toolkit?

    National Mentoring Resource Center Research Board (Bowers, E., DuBois, D., Elledge, C., Hawkins, S., Herrera, C., Neblett, E.) with Garringer, M., & Alem, F. (2016). Measurement guidance toolkit for mentoring programs. Washington, DC: Office of Juvenile Justice and Delinquency Prevention National Mentoring Resource Center. Available at http://www.nationalmentoringresourcecenter.org/index.php/learning-opportunities/measurement-guidance-toolkit.html

    Please note: This project was supported by Grant No. 2013-JU-FX-K001 awarded by the Office of Juvenile Justice and Delinquency Prevention, Office of Justice Programs, U.S. Department of Justice. Points of view or opinions in this document are those of the author and do not necessarily represent the official position or policies of the U.S. Department of Justice.
