Exploring the use of item response theory models for analysing high-tariff items
By Yaw Bimpeh
Many high-stakes examinations in the UK use both constructed-response and selected-response items. The increased use of constructed-response items with high-tariff response categories has motivated interest in polytomous item response theory (IRT) models. Paper 1 for one of AQA’s GCSE qualifications (referred to herein as ‘Subject A’) consists entirely of high-tariff items (e.g. worth 8, 12, 16 or 24 marks), resulting in a large number of response categories per item. Furthermore, some response categories are likely to have zero observed frequency for some items. A category-based IRT model such as the Partial Credit Model fails to provide meaningful results when applied to such high-tariff items. The purpose of this study is to examine Samejima’s Continuous Response Model (CRM) as a suitable measurement model for high-tariff items.
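To make the CRM concrete, the following is a minimal sketch of its response function. It assumes the three-parameter form commonly attributed to Wang and Zeng (1998), in which the probability of scoring at or above a mark x on an item with maximum mark k is Φ(a(θ − b − (1/α) ln(x / (k − x)))). The abstract does not state which parameterisation the study used, so the function name, argument names and parameter values here are illustrative assumptions only.

```python
import math


def crm_prob_at_or_above(theta, x, max_mark, a, b, alpha):
    """P(X >= x | theta) under an assumed three-parameter CRM.

    theta    : candidate ability
    x        : mark of interest, strictly between 0 and max_mark
    max_mark : item tariff (e.g. 8, 12, 16 or 24)
    a, b     : item discrimination and difficulty (hypothetical values)
    alpha    : scaling parameter linking the mark scale to the ability scale
    """
    if not 0 < x < max_mark:
        raise ValueError("x must lie strictly between 0 and max_mark")
    # Logit-type transform of the bounded mark onto the real line.
    z = math.log(x / (max_mark - x))
    v = a * (theta - b - z / alpha)
    # Standard normal CDF evaluated at v.
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


# Illustrative call: a 24-mark item with hypothetical parameters.
p_mid = crm_prob_at_or_above(theta=0.0, x=12, max_mark=24, a=1.0, b=0.0, alpha=1.0)
```

Because the mark enters the model through a single continuous transform, the function needs no separate parameter for each mark point, so a mark that no candidate happened to receive does not leave any parameter without data.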
This paper discusses the application of the CRM to high-tariff item data, using both simulated data and 2018 data for Subject A. We compared the performance of the CRM with that of a new model, an extended Nominal Response Model (eNRM) with fixed slopes, which has recently been suggested as suitable for high-tariff items in the technical report for the National Reference Test (NFER, 2018). The results suggest that both the CRM and the eNRM fit these items well. Both approaches can be used with constructed-response items scored externally by human markers.
However, the empirical evaluation shows that the CRM has some advantages over the eNRM. The CRM does not require the calibration of a large number of response category parameters per item. Another attractive feature of the CRM approach is its robustness against violation of the postulated population distribution of ability.
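The point about parameter counts can be illustrated with simple arithmetic. The sketch below assumes a category-based model that estimates one free parameter per mark above zero (as the Partial Credit Model does with its step parameters) and a CRM with three parameters per item. The tariffs are the examples quoted in the abstract; the exact parameter count of the eNRM is not given there, so the per-category figure is an assumption used only for comparison.

```python
# Example tariffs quoted in the abstract.
TARIFFS = [8, 12, 16, 24]


def category_based_params(max_mark):
    # Assumption: one free parameter per mark above zero,
    # as with the step parameters of the Partial Credit Model.
    return max_mark


def crm_params(max_mark):
    # Assumption: three item parameters (discrimination, difficulty,
    # scaling), independent of the item's tariff.
    return 3


category_total = sum(category_based_params(m) for m in TARIFFS)
crm_total = sum(crm_params(m) for m in TARIFFS)

print(f"category-based model: {category_total} item parameters")
print(f"CRM: {crm_total} item parameters")
```

Under these assumptions the four example items need 60 category parameters against 12 for the CRM, and the gap widens as tariffs increase.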