Exploring second phase samples: what is the most appropriate basis for examiner adjustments?

By Lucy Billington

Abstract

Since 1918, if not before, the maintenance of standards over time in the English examination system has used approaches that assume that large cohorts of candidates sit their examinations at the same time of year, every year, after following a similar programme of study over a similar time period. These assumptions present barriers to the modernisation of the examination system.

Firstly, the personalisation policy agenda seeks to deliver a personalised classroom with a personalised examination timetable by 2020.

Secondly, the delivery of on-screen assessment is currently hampered by limits on the number of candidates that can be tested on-screen in any centre in any one sitting. Multiple parallel versions of tests would allow longer testing windows, but would pose the standard-setting problem of multiple heterogeneous populations.

Item Response Theory (IRT) test-equating approaches would seem to hold the answer, as the parameters that characterise an item do not depend on the ability distribution of the examinees. IRT approaches, however, depend on strong statistical assumptions that do not hold precisely in real testing situations.

This research was undertaken to investigate the extent to which the invariance of item parameters would hold for a post-equating non-equivalent group design intended to maintain standards between a June and a November test session.

How to cite

Billington, L. (2009). Exploring second phase samples: what is the most appropriate basis for examiner adjustments? Manchester: AQA Centre for Education Research and Policy.

Keywords