Have you ever taken a test or questionnaire and wondered how your responses were being analyzed? If you’re interested in psychometrics, you may have heard of the Rasch model, a statistical model used in item response theory (IRT) to analyze data from tests and questionnaires. In this post, I’ll introduce you to the Rasch model: its history, its practical uses, how it works, and how to evaluate how well it fits your data.
A Brief History of the Rasch Model
The Rasch model is named after the Danish mathematician Georg Rasch, who developed it in the 1950s and published it in his 1960 book, Probabilistic Models for Some Intelligence and Attainment Tests. Rasch wanted a statistical model that could account for the relationship between a person’s ability or trait level and their responses to test items, while minimizing the influence of other factors such as guessing, luck, or cultural bias. His model became a cornerstone of modern IRT and has been used extensively in fields such as education, psychology, and the health sciences.
Practical Applications of the Rasch Model
One of the main uses of the Rasch model is to develop and evaluate tests and questionnaires. By analyzing the relationships between test items and a person’s ability or trait level, researchers can identify poorly performing items and remove them from the test, or revise them to better match the intended construct. The Rasch model can also be used to evaluate the quality and fairness of tests, by examining whether items function similarly across different groups of people or whether certain groups of people are advantaged or disadvantaged by certain items.
Another important use of the Rasch model is in estimating the ability or trait levels of individuals. The model provides estimates that are adjusted for the difficulty of the items a person actually answered and are expressed on an interval (logit) scale. This makes Rasch estimates more informative than raw scores or percentile ranks, which are only ordinal. (Note that the basic Rasch model does not include a guessing parameter; accounting for guessing requires extensions such as the three-parameter logistic model.)
How it works
The model assumes that each item has a certain level of difficulty, and that the probability of answering an item correctly depends on the individual’s level of the trait being measured and the item’s difficulty.
To illustrate this, let’s consider a simple example. Suppose we want to measure students’ mathematical ability using a set of three items: A, B, and C. The items are presented in order of increasing difficulty, and the responses are coded as 0 for incorrect and 1 for correct. The Rasch model posits that the probability of answering an item correctly can be modeled as follows:
P(item i is answered correctly) = exp(ability – difficulty_i) / (1 + exp(ability – difficulty_i))
where ability is the latent trait we want to measure, and difficulty_i is the difficulty of item i.
Once the scale of the latent trait is fixed (for example, by assuming it has mean 0 and variance 1 in the population), we can estimate an individual’s ability level from their responses to the items. For example, if a student answers items A and B correctly but fails item C, their maximum-likelihood ability estimate will fall somewhere between the difficulties of items B and C.
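To make this concrete, here is a minimal Python sketch of the calculation, using hypothetical difficulty values for items A, B, and C. It evaluates the Rasch probability for each item and finds the maximum-likelihood ability estimate for the response pattern above (A and B correct, C incorrect) with a simple grid search; a real analysis would use dedicated IRT software.

```python
import numpy as np

def rasch_probability(ability, difficulty):
    """P(correct) under the Rasch model: exp(a - d) / (1 + exp(a - d))."""
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

# Hypothetical difficulties (in logits) for items A, B, and C.
difficulties = np.array([-1.0, 0.0, 1.5])
responses = np.array([1, 1, 0])  # A and B correct, C incorrect

# Maximum-likelihood ability estimate via a simple grid search:
# choose the ability value that makes the observed pattern most likely.
grid = np.linspace(-4.0, 4.0, 801)
probs = rasch_probability(grid[:, None], difficulties[None, :])  # shape (801, 3)
log_lik = (responses * np.log(probs) + (1 - responses) * np.log(1 - probs)).sum(axis=1)
theta_hat = grid[np.argmax(log_lik)]

print(f"Estimated ability: {theta_hat:.2f} logits")  # lands between B (0.0) and C (1.5)
```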
Evaluating model fit
Model fit evaluation is the process of determining how well a statistical model describes the observed data. In the context of the Rasch model, this means comparing the observed response patterns to the response patterns the model predicts; fit statistics summarize how large those discrepancies are.
There are several fit statistics that can be used to evaluate the Rasch model. Some common fit statistics include the mean square error (MSE), the infit and outfit mean square statistics, and the standardized residuals.
Mean square error (MSE) is calculated by comparing the observed responses with the expected responses under the Rasch model. A small MSE indicates good model fit, while a large MSE indicates poor model fit.
Infit and outfit mean square statistics are based on the differences between the observed and expected responses, and are used to evaluate the fit of individual items. Infit statistics are sensitive to unexpected responses to items that are near the ability level of the respondent, while outfit statistics are sensitive to unexpected responses to items that are far from the respondent’s ability level.
Standardized residuals are another way to assess model fit. They are calculated by dividing the difference between the observed and expected responses by its standard error (the square root of the model-implied response variance). Standardized residuals with absolute values greater than 2.0 indicate poor fit, while absolute values between 1.0 and 2.0 indicate marginal fit.
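As a rough illustration, the sketch below computes the standardized residuals, the overall MSE, and the item-level infit and outfit mean squares for a small, made-up data set, assuming person abilities and item difficulties have already been estimated. The formulas follow the standard definitions: outfit is the unweighted mean of squared standardized residuals, while infit weights each residual by the model-implied response variance.

```python
import numpy as np

def rasch_probability(ability, difficulty):
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

# Made-up estimates: 5 persons, 3 items (all in logits).
abilities = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
difficulties = np.array([-1.0, 0.0, 1.5])
observed = np.array([[1, 0, 0],
                     [1, 1, 0],
                     [1, 0, 0],
                     [1, 1, 0],
                     [1, 1, 1]])

expected = rasch_probability(abilities[:, None], difficulties[None, :])
variance = expected * (1 - expected)      # model-implied variance per response
residual = observed - expected
z = residual / np.sqrt(variance)          # standardized residuals

mse = (residual ** 2).mean()                                 # overall mean square error
outfit = (z ** 2).mean(axis=0)                               # unweighted item mean squares
infit = (residual ** 2).sum(axis=0) / variance.sum(axis=0)   # variance-weighted mean squares

print(f"MSE: {mse:.3f}")
print("Outfit MSQ per item:", np.round(outfit, 2))
print("Infit MSQ per item: ", np.round(infit, 2))
```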
The chi-square statistic is commonly used but can be sensitive to sample size, leading to significant results even when the model fits reasonably well. Root mean square error of approximation (RMSEA) is a measure of the discrepancy between the predicted and observed covariance matrices and is less affected by sample size. A smaller value of RMSEA indicates better model fit.
The Tucker-Lewis index (TLI) is another measure of model fit that compares the chi-square value of the model to that of a baseline model (usually a null model with no predictors) to see how much improvement the model provides. A value of TLI close to 1 indicates good fit.
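Both indices can be computed directly from chi-square results. The sketch below uses the common closed-form expressions; the baseline chi-square and sample size are made-up values, chosen to roughly match the reporting example that follows.

```python
import math

def rmsea(chi_sq, df, n):
    """Root mean square error of approximation from a chi-square test."""
    return math.sqrt(max(0.0, (chi_sq / df - 1) / (n - 1)))

def tli(chi_sq, df, chi_sq_null, df_null):
    """Tucker-Lewis index: improvement of the model over a null baseline."""
    null_ratio = chi_sq_null / df_null
    return (null_ratio - chi_sq / df) / (null_ratio - 1)

# Hypothetical values (n and the null-model chi-square are assumptions):
print(f"RMSEA: {rmsea(chi_sq=10.26, df=8, n=250):.3f}")
print(f"TLI:   {tli(chi_sq=10.26, df=8, chi_sq_null=95.0, df_null=10):.2f}")
```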
In APA style, it is common to report the fit statistics used to evaluate the Rasch model, along with their values and a brief interpretation or conclusion about the fit of the model. For example:
“The Rasch model was evaluated for fit using several fit statistics, including the chi-square statistic, RMSEA, TLI, MSE, infit, and outfit. The chi-square statistic was not significant, χ²(8) = 10.26, p = .25, indicating acceptable fit. The RMSEA was .03 (90% CI [.00, .07]), which is below the recommended cutoff of .05, suggesting good fit. The TLI was .97, also indicating good fit. Additionally, the MSE, infit, and outfit statistics were all within acceptable ranges, further supporting the fit of the Rasch model.”
It is important to note that model fit statistics should be interpreted together rather than in isolation, and the overall judgment of the model fit should be based on multiple indicators rather than just one.
Graphs
Below are some graphs that are commonly used in Rasch analysis and can be useful for interpreting and presenting results.
- Category probability curves: These curves display the probability of a respondent endorsing each response category of a polytomous item as a function of the latent trait. They can help to identify disordered categories, where the thresholds between adjacent categories do not advance in order, so that some category is never the most probable response at any trait level.
- Person-item map: This map shows the distribution of respondents and items along a common latent trait continuum. The map can help to identify the range of ability levels of the respondents and the difficulty levels of the items (a minimal plotting sketch appears after this list).
- Item characteristic curves: These curves show the relationship between a respondent’s ability level and the probability of endorsing an item. Comparing the empirical (observed) curve with the model-based curve can help to identify items that do not behave as the model expects.
- Residual plots: These plots show the differences between observed and expected responses for each item. Residual plots can help to identify items that are not fitting the Rasch model well.
- Differential item functioning (DIF) plots: These plots show the differences in item difficulty or item characteristic curves for different groups of respondents. DIF plots can help to identify items that are functioning differently for different groups of respondents.
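As promised above, here is a minimal matplotlib sketch of a person-item map, using made-up ability and difficulty estimates; dedicated Rasch software (e.g., Winsteps, or the eRm package in R) produces such maps directly.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up estimates from a Rasch analysis (all in logits).
person_abilities = np.random.default_rng(1).normal(0.0, 1.0, 200)
item_difficulties = np.array([-1.8, -0.9, -0.2, 0.4, 1.1, 2.0])

fig, (ax_persons, ax_items) = plt.subplots(
    1, 2, sharey=True, figsize=(6, 4), gridspec_kw={"width_ratios": [2, 1]}
)
# Left panel: histogram of person ability estimates.
ax_persons.hist(person_abilities, bins=20, orientation="horizontal")
ax_persons.set_xlabel("Number of persons")
ax_persons.set_ylabel("Latent trait (logits)")
# Right panel: item difficulties plotted on the same vertical scale.
ax_items.scatter(np.zeros(len(item_difficulties)), item_difficulties)
for i, d in enumerate(item_difficulties):
    ax_items.annotate(f"Item {i + 1}", (0.1, d), va="center")
ax_items.set_xlim(-0.5, 1.0)
ax_items.set_xticks([])
ax_items.set_xlabel("Items")
fig.suptitle("Person-item map")
plt.tight_layout()
plt.show()
```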
When reporting the results of Rasch analysis in papers, it is common to include at least one of these graphs to help illustrate the findings. The choice of which graph to include will depend on the specific research question and the nature of the data being analyzed. It is important to provide a clear and concise description of the graph and its interpretation in the text of the paper.
The Good and the Bad of the Rasch Model
Like any statistical model, the Rasch model has its strengths and weaknesses. Its strengths include precise estimation of ability or trait levels, the ability to identify poorly performing test items, and flexibility in accommodating different response formats (e.g., multiple-choice or Likert-type items, the latter via polytomous extensions such as the rating scale model).
However, the Rasch model also has some limitations. It assumes that a single latent dimension underlies the responses (unidimensionality) and that the probability of a correct response follows a specific logistic function of the difference between ability and item difficulty. This means the Rasch model may not be appropriate for tests or questionnaires that measure complex constructs with multiple dimensions or facets. Additionally, the Rasch model can be sensitive to violations of its assumptions, such as guessing on items or differential item functioning across groups.
Other limitations and assumptions to be aware of are local independence (given the latent trait, responses to different items should be statistically independent) and the need for a sufficient number of items to estimate abilities precisely.
Conclusion
Overall, the Rasch model is a powerful and flexible tool for measuring latent traits, and a key component of item response theory in the field of psychometrics.
Here are some references related to the Rasch model and item response theory:
- Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Danish Institute for Educational Research.
- Embretson, S. E., & Reise, S. P. (2013). Item response theory (2nd ed.). Routledge.
- Wright, B. D., & Stone, M. H. (1979). Best test design. MESA Press.
- Bond, T. G., & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in the human sciences (3rd ed.). Routledge.
- de Ayala, R. J. (2009). The theory and practice of item response theory. Guilford Press.
- Andrich, D. (1988). Rasch models for measurement. Sage.
- Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage.
- Embretson, S. E. (1996). The new rules of measurement. Psychological Assessment, 8(4), 341–349.
- Reckase, M. D. (2009). Multidimensional item response theory. Springer.
- Wilson, M. (2005). Constructing measures: An item response modeling approach. Psychology Press.