# Notes and guidance: large data set

Our new AS and A-level Maths specifications require students to study a large data set during the course. Each exam board chooses its own data set, based on Ofqual guidance.

The exams will include questions or tasks that relate to the prescribed large data set, giving a material advantage to students who have studied it.

The large data set is too large to be taken into an exam. Instead, we recommend using it as a classroom tool to support teaching the statistics content of the specification and to familiarise students with handling and manipulating data. Basic knowledge of a spreadsheet package such as Microsoft Excel, or of software such as GeoGebra, is required.

## Techniques for studying the large data set

Study of the large data set could include the following techniques:

- sampling
- histograms
- scatter graphs and correlation (not causation)
- measures of central tendency and spread (standard deviation)
- data cleansing
- selecting and critiquing different presentation techniques
- probability: mutually exclusive and independent events
- brief interpretation of the data in order to answer short questions
- deep interpretation of the data using given graphs and summaries
- selecting from given graphs and summary data
- modelling with trend lines for bivariate data
- modelling with distributions and hypothesis testing
- describing a situation where data needed to be collected and how it might be done
- using and interpreting correlation coefficients (A-level only).

Students should be prepared for exam questions that require knowledge of any of the above.

## Material advantage questions

Examples of questions that give a material advantage to students who have studied the large data set can be found in the sample assessment materials for AS (Paper 2, questions 14 and 16(b)) and A-level (Paper 3, questions 10(a) and 10(c)).

In answering these questions, students would have gained a material advantage through:

- understanding the categories and sub-categories that the large data set uses
- understanding how values in the large data set are rounded
- knowledge of trends in the data
- knowledge of outliers and other anomalies in the data.
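Spotting outliers can also be practised systematically. The following is a minimal Python sketch, assuming the common convention that a value lying more than 1.5 × IQR below the lower quartile or above the upper quartile is an outlier; a question may specify a different rule, such as mean ± 2 standard deviations. The `flag_outliers` helper and the data are hypothetical. Quartile methods differ between calculators, spreadsheets and textbooks, so values close to a boundary may be classified differently.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3.

    Uses statistics.quantiles with its default 'exclusive' method; other
    quartile conventions can give slightly different fences.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical values for illustration only; 420 has been planted as an anomaly.
sample = [110, 118, 125, 135, 140, 150, 160, 420]
print(flag_outliers(sample))
```

A flagged value is only a candidate anomaly: students should still judge, in context, whether it is a recording error to be cleansed or a genuine extreme value.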