Child Care and Early Education Research Connections


Key Questions to Ask

This section outlines key questions to ask in assessing the quality of research and describes internal validity, external validity, and construct validity.

  1. Was the research peer reviewed?

    Peer-reviewed research studies have already been evaluated by experienced researchers with relevant expertise. Most journal articles, books, and government reports have gone through a peer review process. Keep in mind that there are many types of peer review. Reports issued by the federal government have been subject to many levels of internal review and approval before being issued. Articles published in peer-reviewed professional journals have been evaluated by researchers who are experts in the field and who can vouch for the soundness of the methodology and the analysis applied. As a result, peer-reviewed research is usually of high quality. A research consumer, however, should still critically evaluate the study's methodology and conclusions.

  2. Can a study's quality be evaluated with the information provided?

    Every study should include a description of the population of interest, an explanation of the process used to select and gather data on study subjects, definitions of key variables and concepts, descriptive statistics for main variables, and a description of the analytic techniques. Research consumers should be cautious when drawing conclusions from studies that do not provide sufficient information about these key research components.

  3. Are there any potential threats to the study's validity?

    A valid study answers research questions in a scientifically rigorous manner. Threats to a study's validity are found in three areas: 

    • Internal Validity

    • External Validity

    • Construct Validity

Internal Validity refers to whether the outcomes observed in a study are due to the independent variables or experimental manipulations investigated in the study and not to some other factor or set of factors. To determine whether a research study has internal validity, a research consumer should ask whether changes in the outcome could be attributed to alternative explanations that are not explored in the study. For example, a study may show that a new curriculum had a significant positive effect on children's reading comprehension.

The study must rule out alternative explanations for the increase in reading comprehension, such as a new teacher, in order to attribute the increase in reading comprehension to the new curriculum. Studies that specifically explain how alternative explanations were ruled out are more likely to have internal validity. Threats to a study's internal validity can compromise the confidence consumers have in the findings from a study and include:

  1. The introduction of events while the study is being conducted that may affect the outcome or dependent variable of the study. For example, a study of the effectiveness of children's participation in an early childhood program could be affected if the program were closed for an extended period of time because of hurricane damage.

  2. Changes in the dependent variable due to normal developmental or other natural processes in study participants. For example, young children's performance on a battery of outcome measures (e.g., reading and math assessments) may decline during the testing or observation period due to fatigue or other factors.

  3. The circumstances surrounding the testing used to assess the dependent variable. For example, preschool children's performance on a standardized test may be questionable if test items are presented to children in unfamiliar ways or in group settings.

  4. Participants leaving or dropping out of the study before it is completed. This can be especially problematic if those who leave the study are different from those who stay. For example, in a longitudinal study of the effects of a school lunch program on children's academic achievement, the validity of the findings could be compromised if the most disadvantaged children in the program left the study at a higher rate than other children.

  5. Changes to or inconsistencies in how the dependent and independent variables were measured. For example, changing the way in which children's math skills are measured at two time points could introduce error if the two measures were developed using different assessment frameworks (i.e., they were developed to assess different math content and processes). Inconsistencies are also introduced when different staff follow different procedures when administering the same measure. For example, when administering an assessment to bilingual children, some staff give children credit for answering correctly in English or Spanish, and other staff only give credit for answering correctly in English.

  6. Statistical regression, or regression to the mean, can affect the outcome of a study. It is the tendency of extreme scores on an initial test to move toward the mean (average score) on a later test (post-test), independent of any effect of an independent variable. It is especially a concern when assessing the skills of low-performing individuals and comparing their skills with those of individuals with average or above-average performance. For example, kindergarten children with the weakest reading skills at the start of the school year may show the greatest gains in their skills over the school year (e.g., between fall and spring assessments) independent of the instruction they received from their teachers.
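Item 6 can be made concrete with a small simulation. The sketch below is illustrative only and rests on assumptions not found in the text: each child has a stable "true" reading skill, fall and spring scores each add independent measurement error, and the score scales and the 20% cutoff are arbitrary. By construction there is no instructional effect at all, yet the children who scored lowest in the fall still show an average gain by spring.

```python
import random
import statistics

def simulate_regression_to_mean(n_children=10_000, seed=1):
    """Simulate fall and spring reading scores with NO instructional effect.

    Observed score = stable true skill + independent measurement noise on each
    occasion. The means and spreads are illustrative assumptions only.
    """
    rng = random.Random(seed)
    true_skill = [rng.gauss(100, 10) for _ in range(n_children)]
    fall = [s + rng.gauss(0, 8) for s in true_skill]
    spring = [s + rng.gauss(0, 8) for s in true_skill]  # same skill, fresh noise
    return fall, spring

def mean_gain_for_lowest(fall, spring, fraction=0.2):
    """Average spring-minus-fall change for the lowest `fraction` of fall scorers."""
    cutoff_index = int(len(fall) * fraction)
    order = sorted(range(len(fall)), key=lambda i: fall[i])
    lowest = order[:cutoff_index]
    return statistics.mean(spring[i] - fall[i] for i in lowest)

fall, spring = simulate_regression_to_mean()
overall_gain = statistics.mean(s - f for f, s in zip(fall, spring))
lowest_gain = mean_gain_for_lowest(fall, spring)
print(f"Average gain, all children:       {overall_gain:+.2f}")
print(f"Average gain, lowest 20% in fall: {lowest_gain:+.2f}")
```

The lowest fall scores are partly the product of unlucky measurement error, so the same children's spring scores drift back toward the overall mean even though their underlying skill did not change.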

External Validity refers to the extent to which the results of a study can be generalized to other settings (ecological validity), other people (population validity) and over time (historical validity). To assess whether a study has external validity, a research consumer should ask whether the findings apply to individuals whose place and circumstances differ from those of study participants. For example, a research study shows that a new curriculum improved reading comprehension of third-grade children in Iowa. As a research consumer, you want to ask whether this new curriculum may also be effective with third graders in New York or with children in other elementary grades. Studies that randomly select participants from the most diverse and representative populations and that are conducted in natural settings are more likely to have external validity. Threats to a study's external validity come from several sources, including:

  1. The sample is not representative of the population of interest. As a result, findings from the study may be biased (sample selection bias) and may not accurately represent the population. Several factors can lead to a sample not being representative of the population.

    • The list of all those in the population who are eligible to be sampled is incomplete or contains duplicates. For example, in a household survey, the list of housing units from which the sample will be drawn may be missing some housing units (e.g., one of the two housing units in a duplex home). Or an address list that will be used to draw a sample may list some households twice.

    • Some members of the population or members of certain groups may not be adequately represented in the sample (undercoverage). For example, a survey of adult education that selects its sample from a published list of telephone numbers may not accurately estimate adults' participation in different education programs, because young adults, who have higher rates of participation, are less likely to have landlines or published numbers.

    • Not all individuals who are sampled agree to participate in the study. When those who participate are different in meaningful ways from those who do not, there is the potential for the findings from the study to be biased (nonresponse bias). That is, the findings may not represent an accurate picture of the total population.

    • Samples are selected using non-probability methods (e.g., purposive samples, volunteer samples), which tend to over- or under-represent certain groups in the population. For example, volunteer surveys on controversial topics such as school vouchers and sex education are more likely to overrepresent individuals with strong opinions. Similarly, shopping mall surveys generally represent only the small group of individuals who are shopping at a particular location at specific times.

  2. The findings from one study are difficult to replicate across locations, groups, and time. Even with the best efforts, it is extremely difficult to introduce and implement a program (treatment) in exactly the same way in different locations. Similarly, it is difficult to conduct a study the same way each time. While researchers control many features of their studies, some factors are beyond their control (e.g., the willingness of potential subjects to participate, scheduling conflicts that could lead to cancellations of data collection activities, data collection being suspended due to natural disasters). For example, the ability to carry out a study of school-age children's reading and math achievement in one school or in one school district may be affected by teachers' willingness to surrender instructional time for students to participate in a series of standardized assessments. In some cases, the study design must be modified (e.g., by shortening the assessment or limiting sensitive questions on a teacher or parent survey) to accommodate the concerns of school and district leaders.

  3. Changes in the behaviors and reported attitudes of study participants as a result of being included in a research study (Hawthorne effect). For example, parents participating in a research study on children's early development may change the ways in which they support their child's learning at home.
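The nonresponse bias described in the list above can be illustrated with a hypothetical simulation. The group names, population shares, participation rates, and response rates below are invented for illustration: one group participates in a program more often but is less likely to respond to the survey, so an estimate based only on respondents understates overall participation.

```python
import random

# Hypothetical groups: (share of population, participation rate, response rate).
# All numbers are illustrative assumptions, not survey results.
GROUPS = {
    "younger": (0.5, 0.40, 0.30),
    "older": (0.5, 0.20, 0.70),
}

def simulate_nonresponse(n=20_000, seed=7):
    """Return (true participation rate, rate among respondents, response rate)."""
    rng = random.Random(seed)
    names = list(GROUPS)
    weights = [GROUPS[g][0] for g in names]
    people = []  # one (participates, responded) pair per population member
    for _ in range(n):
        group = rng.choices(names, weights=weights)[0]
        _, p_participate, p_respond = GROUPS[group]
        participates = rng.random() < p_participate
        responded = rng.random() < p_respond
        people.append((participates, responded))

    true_rate = sum(p for p, _ in people) / n
    # Only people who responded contribute to the survey estimate.
    respondents = [p for p, responded in people if responded]
    observed_rate = sum(respondents) / len(respondents)
    return true_rate, observed_rate, len(respondents) / n

true_rate, observed_rate, response_rate = simulate_nonresponse()
print(f"True participation rate:     {true_rate:.3f}")
print(f"Rate among respondents only: {observed_rate:.3f}")
print(f"Survey response rate:        {response_rate:.3f}")
```

With these assumed rates, the expected participation rate is 0.30 in the whole population but 0.26 among respondents (0.13 / 0.50), so the simulated estimates should land near those values. Collecting more responses from the same people would not remove this gap, because the bias comes from who responds rather than from how many.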

Construct Validity refers to the degree to which a variable, test, questionnaire or instrument measures the theoretical concept that the researcher hopes to measure. To assess whether a study has construct validity, a research consumer should ask whether the study has adequately measured the key concepts in the study. For example, a study of reading comprehension should present convincing evidence that reading tests do indeed measure reading comprehension. Studies that use measures that have been independently validated in prior studies are more likely to have construct validity.

There are many threats to construct validity. These can arise during the planning and design stage, during assessment or survey administration, and during data processing and analysis. Some are attributable to researchers and others to the subjects of the research. Here are some of the more common threats:

  1. Threats that occur during the planning and design stage include:

    1. Poorly defined constructs are perhaps the largest threat to construct validity. This applies to constructs that are too narrowly defined as well as those that are defined too broadly.

    2. Validity can also be affected by the measures a researcher chooses to measure a construct. Measures that include too few items to adequately represent the construct pose a threat, as do measures that include items that tap other constructs. For example, a math assessment administered to four- and five-year-old children that only includes items that require children to count would not be adequate to represent their math skills. A math assessment administered to this same group of children that was made up mostly of word problems would be tapping both their math and language skills. A valid measure should cover all aspects of the theoretical construct and only aspects of the theoretical construct.

    3. Assessment items or survey questions that are poorly written are threats to validity. Such items would include double-barreled questions that ask multiple questions within a single item (e.g., are you happily married and do you and your spouse argue?). Other examples of poorly written questions include those that use language that is above the reading level of most respondents, use professional jargon or are written in such a way as to trigger a socially desirable response.

    4. The validity of an assessment is threatened if too many items fall outside the ability of the individuals being assessed (e.g., too many very easy items or too many very difficult items). For example, an early literacy assessment that only included passages that children were asked to read and answer questions about would not result in a valid assessment of children's early literacy, because such items would be too difficult for many young children.

  2. Threats that occur during administration include:

    1. Threats that are introduced by interviewers and assessment staff. These individuals can affect the reliability, and thus the validity, of an assessment when they deviate from the research protocol or when they signal a correct answer to the study participant through their actions. For example, an assessment of young children's English language vocabulary may specify that only responses in English are acceptable. However, when assessing bilingual children, some assessors comply with this rule while others accept responses in English or in the child's home language (e.g., Spanish). Assessors may unintentionally signal the correct responses to children by 'staring' at the correct response to a multiple-choice item or by smiling and giving praise only when the child answers correctly.

    2. Threats to validity can also be introduced by the research participants. These include participant apprehension or anxiety, which could result in poorer performance on an assessment or in incorrect or ambiguous responses to a series of interview items. These threats must be taken seriously and addressed when administering standardized assessments to young children, many of whom will have limited experience with these types of tests. The language used when administering an assessment can also threaten its validity if participants lack the language skills needed to understand what they are being asked to do or to respond.

  3. Threats that occur during data processing and analysis include:

    1. Coding errors - Systematic coding errors are especially problematic compared with random ones, because they push results consistently in one direction rather than only adding noise.

    2. Poor inter-coder or inter-rater reliability - When coding responses to open-ended survey items or assigning scores to behaviors observed in a video-recorded interaction, it is important that different coders or raters assign the same code or score to the same response or behavior. That is, the goal is high inter-coder or inter-rater reliability. When inter-coder or inter-rater reliability is poor, it can have an adverse effect on the validity of a measure. For example, the construct validity of an observation measure of the quality of parent-child interactions could be compromised if individual members of a group coding a set of videotaped mother-child interactions applied different standards to what they deem intrusive parenting practices.

    3. Inconsistencies in how data are analyzed and missing data are handled - Missing data may be handled in a number of different ways, and the approach chosen could prove problematic for a construct, especially when the data are not missing at random. For example, if items tapping certain math skills are disproportionately missing, the validity of the measure could be jeopardized if the researcher assigns the mean score for those items or simply averages the scores for the non-missing items.
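Inter-rater reliability, discussed under data processing and analysis, is often summarized with a chance-corrected agreement statistic. The sketch below uses Cohen's kappa for two raters, one common choice that the text above does not itself prescribe. The codes ("I" for intrusive, "N" for not intrusive) and the ten ratings are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one categorical code per item.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of
    items on which the raters agree and p_e is the agreement expected by chance,
    computed from each rater's own code frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Counter returns 0 for codes one rater never used.
    p_e = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical code for every item; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes for ten videotaped interactions: "I" = intrusive, "N" = not intrusive.
rater_a = ["I", "N", "N", "I", "N", "N", "I", "N", "N", "N"]
rater_b = ["I", "N", "I", "I", "N", "N", "N", "N", "I", "N"]

percent_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
kappa = cohens_kappa(rater_a, rater_b)
print(f"Percent agreement: {percent_agreement:.0%}")  # 70%
print(f"Cohen's kappa:     {kappa:.2f}")               # 0.35
```

The raters agree on 7 of 10 interactions, but because both mostly code "N", much of that agreement would be expected by chance (p_e = 0.54), so kappa is only about 0.35. A low value like this signals that coders may be applying different standards for what counts as intrusive parenting.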