The research glossary defines terms used in conducting social science and policy research, for example those describing methods, measurements, statistical procedures, and other aspects of research; the child care glossary defines terms used to describe aspects of child care and early education practice and policy.
The collection of data from all members, instead of a sample, of the target population.
Central Limit Theorem
A mathematical theorem that is central to the use of statistics. It states that for a random sample of observations from any distribution with a finite mean and a finite variance, the mean of the observations will follow a normal distribution. This theorem is the main justification for the widespread use of statistical analyses based on the normal distribution.
A measure that describes the "typical" or average characteristic; the three main measures of central tendency are mean, median and mode.
A statistic used when testing for associations between categorical, or non-numeric, variables. It is also used as a goodness-of-fit test to determine whether data from a sample come form a population with a specific distribution.
There are several different Chi-square tests in statistics. One of the more commonly used is the Chi-square test of independence. It is used to determine if there is a statistically significant association between two categorical variables. The frequency of each category for one variable is compared across the categories of the second variable, such as in a n x n cross tabulation. It is the null hypothesis for this test that there is no association between the two variables (i.e., the distributions of the two variables are independent of each other). The alternative hypothesis is that there is an association. For example, a Chi-square test could be used to examine whether parents' decision to delay their children's entry to kindergarten (delay vs. do not delay) is statistically significantly associated with their child's sex (male vs. female).
A type of multivariate analysis where the collected data are classified based on several characteristics in order to determine groups (or clusters) of cases that would be useful to explore further. This type of analysis can help one determine which groups of variables best predict an outcome.
Cluster analysis is a multivariate method used to classify a sample of subjects (or objects) in such a way that subjects in the same group (called a cluster) are more similar (e.g., in terms of their personal attributes, beliefs, preferences) to each other than to those in other groups (clusters).
A type of sampling method where the population is divided into groups, called clusters. Cluster designs are often used to control costs. For example, researchers first randomly select clusters of potential respondents, and then respondents are selected at random from within the pre-identified clusters. The researcher randomly selects several counties or groups of counties and then draws a random sample of households from within the selected counties. Cluster sampling is often used in education and early childhood research. Researchers sample schools/programs and then students/children enrolled in the selected schools/programs. Clustered sampling designs necessitate the use of special variance estimation techniques.
Information on the structure, content, and layout of a data set. The codebook typically provides background on the project, describes the study design, and gives detailed information on variable names and variable value codes. User's manuals and user's guides are examples of codebooks.
Values, typically numeric, that are assigned to different levels of variables to facilitate analysis of the variable. For example, codes such as strongly disagree=1, disagree=2, agree=3, and strongly agree=4 are often assigned.