Assessing Potential Predictors of Rater Fit Measures in the Establishment of Performance Standards
Thesis posted on 16.02.2016, 00:00, authored by Maria Incrocci
The purpose of this study was to determine the extent to which two rater background-related variables (i.e., a rater’s gender and content domain expertise) and two item characteristic-related variables (i.e., an item’s difficulty classification and content domain classification) could account for variance in rater fit indices in the context of a standard setting for a certification examination. Licensing and certification organizations convene groups of subject matter experts in the field (i.e., raters) and engage them in a standard-setting process to recommend a cut score (performance standard) to classify proficient, knowledgeable, and competent individuals. During a standard setting, it is common practice for raters to examine individual items and then provide an estimate of the proportion of minimally qualified candidates that the rater believes would answer each item correctly. Rater fit refers to the level of accuracy or precision that an individual rater attains when providing these estimates. In this study, the fit indices were based on the variance of raters’ proportion correct estimates of the performance of minimally qualified candidates on a 200-item certification examination and empirical data gathered on the performance of another group of minimally qualified candidates who took the same items. The 24 raters who participated in the 2011 standard setting were faculty members who had taught in U.S. colleges and schools of pharmacy. The researcher used a hierarchical linear model to conduct a two-level (items nested within raters) analysis, with the rater fit indices as the outcome variable. The two item characteristic-related variables accounted for 91% of the variance in the rater fit indices, suggesting that the ability to provide accurate proportion correct estimates for minimally qualified candidates was related to an item’s difficulty level and content domain classification.
By contrast, the ability to provide accurate proportion correct estimates for minimally qualified candidates was not related to a rater’s gender or content domain expertise. The study’s findings support the standard-setting experts’ view that rater training that includes multiple practice rounds, discussions, interactions, and feedback can be influential in decreasing the variance in raters’ proportion correct estimates.