Posted on 2016-04-01, 00:00; authored by Matthew Lineberry, Clarence D Kreiter, Georges Bordage
Recent reviews have claimed that the Script Concordance Test (SCT) methodology generally produces reliable and valid assessments of clinical reasoning. We describe three major validity threats not yet considered in prior research. First, the predominant method for aggregate and partial credit scoring of SCTs introduces logical inconsistencies in the scoring key. Second, reliability studies of SCTs have generally ignored inter-panel, inter-panelist, and test-retest measurement error. Instead, studies have focused on observed levels of coefficient alpha, which is neither an informative index of internal structure nor a comprehensive index of reliability for SCT scores. As such, claims that SCT scores show acceptable reliability are premature. Finally, SCT criteria for item inclusion, in concert with a statistical artifact of its format, cause anchors at the extremes of the scale to have less expected credit than anchors near or at the midpoint. Consequently, SCT scores are likely to reflect construct-irrelevant differences in examinees’ response style. This makes the test susceptible to bias against groups that endorse extreme scale anchors more readily; it also makes the test susceptible to score inflation due to coaching. In a re-analysis of existing SCT data, we found that simulating a strategy whereby examinees never endorse extreme scale points resulted in considerable score inflation (d = 1.51), and examinees who simply endorsed the scale midpoint for every item would still have outperformed most examinees who used the scale as intended. Given the severity of these threats, we conclude that aggregate scoring cannot be recommended. Recommendations for revisions of SCT methodology are discussed.
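
The response-style threat described above can be illustrated with a minimal sketch. It assumes the commonly described aggregate scoring rule, which the abstract does not spell out: each anchor earns credit equal to the number of panelists who chose it divided by the number who chose the modal anchor. The five-point scale, the panel responses, and the function names (`aggregate_key`, `mean_credit`, `avoid_extremes`) are hypothetical and chosen only for illustration; they are not the authors' data or analysis code, and the toy numbers are unrelated to the reported effect size.

```python
from collections import Counter

SCALE = (-2, -1, 0, 1, 2)  # assumed five-point response scale


def aggregate_key(panel_responses):
    """Assumed aggregate scoring rule: credit for an anchor is the number of
    panelists who chose it divided by the number who chose the modal anchor."""
    counts = Counter(panel_responses)
    modal_count = max(counts.values())
    return {anchor: counts[anchor] / modal_count for anchor in SCALE}


def mean_credit(responses, keys):
    """Mean credit per item for one examinee's responses."""
    return sum(key[r] for r, key in zip(responses, keys)) / len(keys)


def avoid_extremes(responses):
    """Simulated coaching strategy: recode -2 to -1 and +2 to +1."""
    return [max(-1, min(1, r)) for r in responses]


# Hypothetical panels of 10 experts for four items (made-up data).
panels = [
    [-2, -2, -1, -1, -1, -1, 0, 0, 0, 1],
    [-1, 0, 0, 0, 0, 1, 1, 1, 2, 2],
    [-1, 0, 0, 1, 1, 1, 1, 2, 2, 2],
    [-1, -1, 0, 0, 0, 0, 0, 1, 1, 2],
]
keys = [aggregate_key(p) for p in panels]

extreme_user = [-2, 2, 2, 2]          # examinee who readily endorses extremes
midpoint_only = [0] * len(panels)     # examinee who always picks the midpoint

print("extreme user:     ", round(mean_credit(extreme_user, keys), 4))
print("extremes recoded: ", round(mean_credit(avoid_extremes(extreme_user), keys), 4))
print("midpoint only:    ", round(mean_credit(midpoint_only, keys), 4))
```

In this toy example the examinee who endorses extreme anchors earns the least credit, the same responses recoded away from the extremes earn more, and the midpoint-only strategy earns the most, which is the direction of the effect the abstract reports.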