Using exam results to inform teaching and accountability

Anton Béguin & Alison Wood

The current approach to the use of test or examination results at school level is strongly externally focused. As a consequence, the results cannot clearly be seen to support school improvement or to enhance the professional development of teachers.

Test results are, at present, used as a key indicator of how well schools are performing. In many cases, they are clearly seen as the most important one. They are the central piece of information upon which accountability measures are based.

As an indicator of the quality of education, however, this type of measure is necessarily limited. A summative test will try to assess the relevant content of a subject, but the need to keep testing time manageable and the limitations of test logistics restrict how much content can be assessed. A single test can only ever sample the content of the subject. Because of this limitation, a test may well have a certain degree of predictability (“Well, there is always a question on x and another on y … and they can’t really ask about z, so we won’t worry about that too much …”). It might therefore reasonably be argued that there is a lot more to education than can be measured in a test.

When used for accountability purposes, test results are often analysed at an aggregate level, say as average scores or grades for a school. School performance is then evaluated in comparison with other schools. Sometimes comparisons are made between schools which are similar in terms of background variables, such as the prior attainment or socio-economic background of their intakes.

This approach to evaluating a school has value as a high-level indicator of quality, but it is too shallow to support improvements in teaching and learning. In the worst case, test results are given so much emphasis that they lead to unintended and undesirable consequences, such as teaching to the test, narrowing the curriculum and focusing on the students whose performance has the greatest impact on the headline accountability measures. All this potentially puts the validity of the accountability system at risk and might even undermine trust in schools and teachers.

This chapter gives a brief overview of different types of assessment and their suitability for accountability purposes, and explores the potential for a somewhat different approach to using test results in accountability. This approach is not a radical departure from current practice, but an adaptation of it, which builds on its strengths and begins to address its major weaknesses. For this reason, we will refer to our approach as the ‘adapted approach’.

The adapted approach builds on the current approach, because it retains test results as the core of the accountability system. The difference lies in the way in which the results are used. The adapted approach aims for a more process-based accountability system, in which test results are used for school self-evaluation, evidence-based improvement and school-driven accountability.  

Different types of assessment and different uses of results

It is helpful to begin by distinguishing between:

  1. the locus of assessment in the educational process
  2. the type of assessment carried out
  3. the entity being assessed.

With respect to the locus of assessment, a distinction is made between a final test or examination, which takes place at the end of a programme of study, and day-to-day teacher assessment, including progress testing, which takes place on an ongoing basis during the course of study.

Type of assessment distinguishes between a summative assessment, where the aim is to determine if predetermined standards have been reached, and formative assessment, where the primary aim is to support student learning. These two types of assessment are, of course, linked, because formative assessment can lead to interventions or adaptations to teaching and learning, better to enable a student to meet the standards of the summative assessment.

Finally, the entity being assessed can be the student, the teacher, the school, an organisation (such as a local authority or academy chain) or an educational system.

Where the entity being assessed is the student and the assessment is a final, summative test taken at the end of a course of study, the test must provide an accurate and reliable result. It must accurately measure or rank the student, in comparison with a pre-set standard and/or with his or her peers, and it must do so consistently, test after test. This is because, in many cases, the result is used for high-stakes decisions, such as entry to university.

Where the entity being assessed is the student and where the type of assessment is a progress test, which is formative in nature and takes place during a course of study, the requirement is for information which is as detailed and informative as possible. Clearly, the information provided cannot be manifestly inaccurate, but very high levels of accuracy are not as essential as they are in the case of a summative assessment.

In the literature, formative and summative assessment are usually discussed in contrasting terms. Summative assessment and testing have generally been seen to be of major interest at teacher and school level, informing evaluation of teaching and learning. Formative assessment is generally seen as more educationally valuable than summative assessment and testing, because it can focus on validity, using a wide range of assessment methods and ranging across the whole of the subject content. It can support learning, rather than just reporting final outcomes, as summative assessment is generally held to do.

It has generally been thought that formative assessment, using progress testing, cannot be used for summative purposes, largely because it lacks the controls that are put in place around summative assessments. Final tests or examinations take place in highly controlled conditions, with standardised marking and grading, to ensure that all students are treated equally and fairly. It has also been argued that trying to use formative assessments for summative purposes risks undermining the strengths of those kinds of assessments. If the results become high stakes, it is argued, then the focus will cease to be on identifying strengths and weaknesses, and the ability of the assessments to support student progression will be lost.

The adapted approach

Relatively little consideration has been given to the ways in which summative test results can best be used by teachers and schools formatively, to improve teaching and learning and to inform self-appraisal and evaluation.

At present, test results are usually reported to schools in terms of grades or marks. In the adapted approach, summative test outcomes could be reported in more detail and these more detailed reports could serve as the basis upon which teachers could evaluate the approaches and methods they use in the classroom. They could provide opportunities for systematic, evidence-centred evaluation of education. Different approaches to teaching and learning could be evaluated, and the effectiveness and efficiency of these approaches given evidence-based consideration.

Requirements for the adapted approach

For final tests to serve a formative function, it is essential that a clear link can be made, by teachers themselves, between the approaches they take in the classroom and the outcomes from the assessments. It is essential that teachers can see themselves as the owners of the educational process and as professionals, whose reasoned interventions can have an impact on student outcomes. To support this, we need to move away from reporting in terms of marks or grades to reporting which:

1. is given in terms of fine-grained outcomes:

  • sub-domains (small areas of the curriculum) and pre-specified standards
  • levels of understanding (based on taxonomies)
  • types of errors 

2. allows for analysis of student performance that takes account of background variables and, for example, specifies how minority groups perform on specific domains

3. allows for the identification of types of error, such as common misconceptions

4. can be based on indicators of growth that specify how results on the final test relate to previous performance. This allows for reports showing performance on specific domains (curriculum areas) relative to previous performance. So, for example, for the Mathematics GCSE in a particular school, we might see that students perform less well at manipulating algebraic expressions than students in other schools, but also that the same pattern appeared in that school’s Year 9 test results over the previous three years.

With richer reporting, teachers and schools can deepen their own understanding of their results and be better placed to explain them to key stakeholders, such as governors, parents and Ofsted. Richer reporting would also support engagement with partner schools, to help raise attainment, and with comparator schools, to inform robust self-evaluation.

Properly understood and used, richer reporting can help set educationally meaningful goals for future performance. Teachers and schools can agree their priorities, focusing coherently, say, on particular domains of the curriculum or on at-risk groups of students. They can set tangible, realistic targets for improvement and monitor their progress towards them. With this type of school evaluation, schools themselves are in charge of their own development. Over time, the inspection and accountability frameworks could evolve to focus on the robustness of schools’ own self-evaluation and improvement systems, and schools and Ofsted could engage meaningfully about these goals.

Infrastructure and information-sharing

The adapted approach has IT infrastructure implications: consideration will need to be given to how data is collected and validated, and to how it can be used appropriately by stakeholders. The database underpinning the adapted approach would need to hold student results alongside background variables, and to enable teachers to:

  • select sub-groups of students
  • select sub-sets of items (questions)
  • aggregate results from a student, over a range of subjects, based on overarching taxonomies
  • link test results with prior achievement
  • store test results of multiple years and allow for comparison of outcomes over years.

This functionality would allow teachers to construct their own outcome indicators and engage meaningfully with the results. They could, for example, select particular sub-groups of students for detailed analysis. They could exclude particular students, or sub-groups, from the analysis, to compensate for year-to-year variations in the cohort and so give themselves an indicator of the underlying stability of their results. They could investigate the performance of particular sub-groups of students over time. Teachers could test hypotheses about the comparability of their own students with the national cohort, or with those in similar schools.

Teachers could select particular test items for analysis, such as those which they judge to be essential to the mastery of a subject. They could analyse performance on particular sub-sets of items with particular sub-sets of students, enabling them to make evidence-based judgements about the performance of, say, their highest-achieving students, or those with particular needs.
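As an illustration of the kind of query such a database might support, the sketch below selects a sub-group of students by a background variable, restricts the analysis to a sub-set of items, and optionally excludes named students. It assumes item-level marks are available; the record layout, the background field `prior_band` and the function names are hypothetical, and “facility” here simply means the proportion of available marks gained.

```python
from dataclasses import dataclass


@dataclass
class Student:
    student_id: str
    background: dict  # hypothetical background variables, e.g. {"prior_band": "high"}


@dataclass
class ItemResult:
    student_id: str
    item_id: str
    score: int      # marks awarded on this item
    max_score: int  # marks available on this item


def select_students(students, **criteria):
    """Return the IDs of students whose background variables match every criterion."""
    return {
        s.student_id
        for s in students
        if all(s.background.get(key) == value for key, value in criteria.items())
    }


def facility(results, students=None, items=None, exclude=frozenset()):
    """Proportion of available marks gained, restricted to the chosen students
    and items (None means no restriction); students in `exclude` are left out."""
    selected = [
        r
        for r in results
        if (students is None or r.student_id in students)
        and (items is None or r.item_id in items)
        and r.student_id not in exclude
    ]
    available = sum(r.max_score for r in selected)
    if available == 0:
        return None  # nothing selected, so no meaningful indicator
    return sum(r.score for r in selected) / available


# Small illustrative data set (invented for the example).
students = [
    Student("s1", {"prior_band": "high"}),
    Student("s2", {"prior_band": "low"}),
    Student("s3", {"prior_band": "high"}),
]
results = [
    ItemResult("s1", "q1", 2, 2), ItemResult("s1", "q2", 1, 3),
    ItemResult("s2", "q1", 0, 2), ItemResult("s2", "q2", 3, 3),
    ItemResult("s3", "q1", 1, 2), ItemResult("s3", "q2", 0, 3),
]
high = select_students(students, prior_band="high")
print(sorted(high))                                    # ['s1', 's3']
print(facility(results, students=high, items={"q2"}))  # 1/6, about 0.167
print(facility(results, exclude={"s2"}))               # 0.4
```

Pooling marks across the selection, rather than averaging per-student percentages, is one choice among several; a real system might well report both.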

The ability to aggregate student results across a range of subjects could be a powerful driver for whole-school initiatives. Teachers might, for example, focus on literacy across the curriculum: identifying relevant items in tests across a range of subjects could break down barriers between departments and allow good practice to be disseminated between subjects. The analysis of student performance on those selected items could provide opportunities for discussion between departments about how work in one curriculum area could support that in others and how similarities and differences between subjects can be flagged to students.
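One way to picture this cross-curricular aggregation is to tag test items with themes from an overarching taxonomy and then summarise performance on the tagged items subject by subject. The sketch below assumes such tags exist; the tag names, item identifiers and data layout are all hypothetical.

```python
from collections import defaultdict

# Hypothetical tagging of test items (subject, item ID) with cross-curricular themes.
item_tags = {
    ("history", "h3"): {"literacy"},
    ("science", "c2"): {"literacy", "numeracy"},
    ("science", "c5"): {"numeracy"},
}

# Hypothetical item-level results: (subject, item ID, marks awarded, marks available).
results = [
    ("history", "h3", 3, 4),
    ("history", "h3", 1, 4),
    ("science", "c2", 1, 4),
    ("science", "c5", 2, 2),
]


def tagged_facility_by_subject(results, item_tags, tag):
    """For items carrying `tag`, return the proportion of available marks gained in each subject."""
    gained = defaultdict(int)
    available = defaultdict(int)
    for subject, item_id, score, max_score in results:
        if tag in item_tags.get((subject, item_id), set()):
            gained[subject] += score
            available[subject] += max_score
    return {s: gained[s] / available[s] for s in available if available[s] > 0}


print(tagged_facility_by_subject(results, item_tags, "literacy"))  # {'history': 0.5, 'science': 0.25}
print(tagged_facility_by_subject(results, item_tags, "numeracy"))  # {'science': 0.5}
```

Reporting per subject gives departments a shared indicator to discuss; the same tags could equally be used to aggregate over all subjects for each student.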

The linking of tests to prior achievement allows the construction of growth measures. The approach can be nuanced, with the outcomes from external assessments (such as National Curriculum tests) linked with those from school-based assessments. Where comparisons need to be made with other schools, school-based assessments could be excluded from those analyses, to ensure the robustness of the outcomes. Teachers might, of course, also be interested in analyses that take account of school-based assessment outcomes, especially if they work in groups of schools that collaborate on assessment.
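A very simple growth measure of this kind compares each student’s final result with the result expected for students with similar prior attainment, and averages the differences. The sketch below is not any official value-added method; it assumes a hypothetical table of expected scores by prior-attainment band, and shows how prior results from school-based assessments could be dropped when the comparison is with other schools.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Prior:
    student_id: str
    band: str    # prior-attainment band, e.g. "low", "mid", "high" (hypothetical)
    source: str  # "external" (e.g. a National Curriculum test) or "school"


def growth(final_scores, priors, expected_by_band, external_only=False):
    """Mean of (final score - expected score for the student's prior band).

    With external_only=True, prior results from school-based assessments are
    ignored, as might be wanted for comparisons with other schools."""
    residuals = [
        final_scores[p.student_id] - expected_by_band[p.band]
        for p in priors
        if (not external_only or p.source == "external")
        and p.student_id in final_scores
        and p.band in expected_by_band
    ]
    return mean(residuals) if residuals else None


final_scores = {"s1": 6, "s2": 4, "s3": 7}
priors = [
    Prior("s1", "mid", "external"),
    Prior("s2", "low", "school"),
    Prior("s3", "high", "external"),
]
expected_by_band = {"low": 3.5, "mid": 5.0, "high": 6.5}  # hypothetical expected scores

print(growth(final_scores, priors, expected_by_band))                      # about 0.67
print(growth(final_scores, priors, expected_by_band, external_only=True))  # 0.75
```

A positive value suggests students did better than expected given their starting points; with small cohorts, such figures would need to be read with considerable caution.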

So, the adapted approach is still based on test data, but uses that data in a far more nuanced way, giving teachers the ability to mine and manipulate that data in ways that can directly inform and support teaching and learning.

Supporting the implementation of the adapted approach would provide government with the opportunity to change the way in which it engages with education professionals and parents. Policymakers would be advocating and supporting the development of rich, summative assessments which would facilitate the use of new reporting tools. This, in turn, would support a move to a genuinely evidence-based approach to teaching and school improvement. In time, government could drive the evolution of the accountability system to one based on robust school self-evaluation and evidence-based approaches for school improvement.

Getting more out of exam results

In this chapter, we propose an adapted approach to school accountability. It is based on providing more granular information from summative assessments and then combining that information with other evidence in a database that teachers can use on a day-to-day basis. That data, with appropriate safeguards, could be shared and used between schools and across the education system, to drive teacher-led and school-based improvement.

In turn, external accountability could evolve to focus on the way individual schools use this information to improve teaching and learning. Targets could be set intelligently, based on this information. They could be based on comparisons with similar schools and so be challenging, but realistic.

This approach is efficient because it uses data that already exists within the system, both to set and monitor targets and to give schools a steer on where they could look for support. It makes proper monitoring of the educational approach possible. It gives test results a significant role, while giving schools far more freedom and creating clear opportunities for the professional development of teachers.

Most importantly, however, this approach will encourage teachers to own assessment outcomes and to take proper responsibility for the outcomes of the education they work so hard to provide for our young people.

A blueprint for getting more out of test results

  • Exam boards should work to develop rich assessments and reporting tools to inform teaching and learning and school improvement.
  • As richer assessment becomes used to support robust school self-evaluation, government should consider how the accountability system could evolve to reflect this and focus on supporting school improvement.