# Subject content

This is an extract of the full specification, which you can download from this page.

## 10 Statistics 1

### Introduction

Candidates may use relevant formulae included in the formulae booklet without proof.

Candidates should learn the following formula, which is not included in the formulae booklet, but which may be required to answer questions.

$(\mathrm{residual})_i = y_i-a-bx_i$

### 10.1 Numerical Measures

##### Standard deviation and variance calculated on ungrouped and grouped data.

Where raw data are given, candidates will be expected to be able to obtain standard deviation and mean values directly from calculators.

Where summarised data are given, candidates may be required to use the formula from the booklet provided for the examination. It is advisable for candidates to know whether to divide by   $n$   or   $(n-1)$ when calculating the variance; either divisor will be accepted unless a question specifically requests an unbiased estimate of a population variance.
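As an illustration only, the two divisors can be compared using Python's standard `statistics` module. The data values are invented for the example and are not taken from the specification.

```python
import statistics

# Hypothetical raw data, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Divisor n: variance of the data treated as a complete set.
var_n = statistics.pvariance(data)

# Divisor (n - 1): unbiased estimate of a population variance from a sample.
var_n_minus_1 = statistics.variance(data)

print(var_n, var_n_minus_1)
```

The two values differ by the factor $n/(n-1)$, so the distinction matters most for small samples.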

##### Linear scaling.

Artificial questions requiring linear scaling will not be set, but candidates should be aware of the effect of linear scaling on numerical measures.

##### Choice of numerical measures.

Candidates will be expected to be able to choose numerical measures, including mean, median, mode, range and interquartile range, appropriate to given contexts. Linear interpolation will not be required.

### 10.2 Probability

##### Elementary probability; the concept of a random event and its probability.

Assigning probabilities to events using relative frequencies or equally likely outcomes. Candidates will be expected to understand set notation but its use will not be essential.

##### Addition law of probability.

$\mathrm{P}(A \cup B) = \mathrm{P}(A) + \mathrm{P}(B) - \mathrm{P}(A \cap B)$; two events only.

##### Mutually exclusive events.

$\mathrm{P}(A \cup B) = \mathrm{P}(A) + \mathrm{P}(B)$; two or more events.

$\mathrm{P}(A') = 1 - \mathrm{P}(A).$

##### Multiplication law of probability and conditional probability.

$\mathrm{P}(A \cap B) = \mathrm{P}(A) \times \mathrm{P}(B | A) = \mathrm{P}(B) \times \mathrm{P}(A | B)$; two or more events.

##### Independent events.

$\mathrm{P}(A \cap B) = \mathrm{P}(A) \times \mathrm{P}(B)$; two or more events.

##### Application of probability laws.

Only simple problems will be set that can be solved by direct application of the probability laws, by counting equally likely outcomes and/or the construction and the use of frequency tables or relative frequency (probability) tables. Questions requiring the use of tree diagrams or Venn diagrams will not be set, but their use will be permitted.
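A minimal sketch of how the laws above can be checked by counting equally likely outcomes. The events chosen (one roll of a fair die, with $A$ = "even score" and $B$ = "score greater than 3") are assumptions made for the example.

```python
from fractions import Fraction

# Equally likely outcomes for one roll of a fair die (illustrative choice).
outcomes = set(range(1, 7))
A = {x for x in outcomes if x % 2 == 0}   # even score
B = {x for x in outcomes if x > 3}        # score greater than 3

def prob(event):
    """Probability of an event by counting equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

# Addition law: P(A or B) = P(A) + P(B) - P(A and B).
p_union = prob(A) + prob(B) - prob(A & B)

# Conditional probability from the multiplication law: P(B | A) = P(A and B) / P(A).
p_b_given_a = prob(A & B) / prob(A)

# Independence holds only if P(A and B) = P(A) * P(B).
independent = prob(A & B) == prob(A) * prob(B)

print(p_union, p_b_given_a, independent)
```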

### 10.3 Binomial Distribution

##### Discrete random variables.

Only an understanding of the concepts; not examined beyond binomial distributions.

##### Calculation of probabilities using formula.

Use of $\Bigg(\begin{matrix}n\\ x \end{matrix} \Bigg)$ notation.

##### Mean, variance and standard deviation of a binomial distribution.

Knowledge, but not derivations, will be required.
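For illustration, the binomial formula and the standard results for the mean $np$ and variance $np(1-p)$ might be evaluated as follows. The values of $n$ and $p$ are hypothetical.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p), using the binomial coefficient n choose x."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical parameters chosen for the example.
n, p = 10, 0.3

p_exactly_3 = binomial_pmf(3, n, p)
mean = n * p
variance = n * p * (1 - p)
standard_deviation = sqrt(variance)

print(p_exactly_3, mean, variance, standard_deviation)
```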

### 10.4 Normal Distribution

##### Continuous random variables.

Only an understanding of the concepts; not examined beyond normal distributions.

##### Properties of normal distributions.

Shape, symmetry and area properties. Knowledge that approximately   $\frac{2}{3}$ of observations lie within   $\mu \pm \sigma$, and equivalent results.

##### Calculation of probabilities.

Transformation to the standardised normal distribution and use of the supplied tables. Interpolation will not be essential; rounding $z$-values to two decimal places will be accepted.
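The sketch below shows standardisation with the $z$-value rounded to two decimal places, using Python's `statistics.NormalDist` in place of the supplied tables. The mean, standard deviation and value of $x$ are invented for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Hypothetical distribution and value, chosen for illustration.
mu, sigma = 50, 4
x = 56

# Standardise, rounding z to two decimal places as the tables would require.
z = round((x - mu) / sigma, 2)
p_less_than_x = standard_normal.cdf(z)

# Area within one standard deviation of the mean (approximately 2/3).
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(z, p_less_than_x, within_one_sd)
```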

##### Mean, variance and standard deviation of a normal distribution.

To include finding unknown mean and/or standard deviation by making use of the table of percentage points. (Candidates may be required to solve two simultaneous equations.)

### 10.5 Estimation

##### Population and sample.

To include the terms 'parameter' and 'statistic'.

Candidates will be expected to understand the concept of a simple random sample. Methods for obtaining simple random samples will not be tested directly in the written examination.

##### Unbiased estimators of a population mean and variance.

$\bar{X}$   and   $S^2$ respectively.

##### The sampling distribution of the mean of a random sample from a normal distribution.

To include the standard error of the sample mean, $\frac{\sigma}{\sqrt{n}}$, and its estimator, $\frac{S}{\sqrt{n}}$.

##### A normal distribution as an approximation to the sampling distribution of the mean of a large sample from any distribution.

Knowledge and application of the Central Limit Theorem.

##### Confidence intervals for the mean of a normal distribution with known variance.

Only confidence intervals symmetrical about the mean will be required.

##### Confidence intervals for the mean of a distribution using a normal approximation.

Large samples only. Known and unknown variance.

##### Inferences from confidence intervals.

Based on whether a calculated confidence interval includes or does not include a 'hypothesised' mean value.
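A minimal sketch of a symmetric confidence interval for a mean with known variance, followed by the kind of inference described above. The sample mean, standard deviation, sample size and hypothesised mean are all invented for the example.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Symmetric interval: sample mean +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical summary values.
lower, upper = mean_confidence_interval(sample_mean=102, sigma=10, n=25)

# Inference: does the interval include the hypothesised mean?
hypothesised_mean = 100
includes_hypothesised_mean = lower <= hypothesised_mean <= upper

print(lower, upper, includes_hypothesised_mean)
```

For a large sample with unknown variance, the same calculation would be carried out with the sample estimate in place of `sigma`.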

### 10.6 Correlation and Regression

##### Calculation and interpretation of the product moment correlation coefficient.

Where raw data are given, candidates should be encouraged to obtain correlation coefficient values directly from calculators. Where summarised data are given, candidates may be required to use a formula from the booklet provided for the examination. Calculations from grouped data are excluded. Importance of checking for approximate linear relationship but no hypothesis tests. Understanding that association does not necessarily imply cause and effect.

##### Calculation of least squares regression lines with one explanatory variable. Scatter diagrams and drawing a regression line thereon.

Where raw data are given, candidates should be encouraged to obtain gradient and intercept values directly from calculators. Where summarised data are given, candidates may be required to use formulae from the booklet provided for the examination. Practical interpretation of values for the gradient and intercept. Use of line for prediction within range of observed values of explanatory variable. Appreciation of the dangers of extrapolation.

##### Calculation of residuals.

Use of $(\mathrm{residual})_i = y_i - a - bx_i$. Examination of residuals to check plausibility of model and to identify outliers. Appreciation of the possible large influence of outliers on the fitted line.
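As an illustration, residuals for a least squares line $y = a + bx$ can be computed as below. The data are invented, and the gradient and intercept are obtained from the usual results $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$.

```python
# Hypothetical paired observations, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares gradient b = Sxy / Sxx and intercept a = mean_y - b * mean_x.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
b = s_xy / s_xx
a = mean_y - b * mean_x

# Residual for each observation: y_i - a - b * x_i.
residuals = [yi - a - b * xi for xi, yi in zip(x, y)]

print(a, b, residuals)
```

The residuals from a least squares line sum to zero, so a single residual that is large compared with the rest suggests a possible outlier.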

##### Linear scaling.

Artificial questions requiring linear scaling will not be set, but candidates should be aware of the effect of linear scaling in correlation and regression.

## 11 Statistics 2

### Introduction

Candidates will be expected to be familiar with the knowledge, skills and understanding implicit in the module Statistics 1.

The emphasis is on using and applying statistics. Appropriate interpretation of contexts and the outcomes of statistical procedures will be required.

Candidates may use relevant formulae included in the formulae booklet without proof.

Candidates should learn the following formulae, which are not included in the formulae booklet, but which may be required to answer questions.

$\text{P(Type I error)} = \text{P(reject} \space \mbox{H}_0 | \mbox{H}_0 \space \text{true)}$   and

$\text{P(Type II error)} = \text{P(accept} \space \mbox{H}_0 | \mbox{H}_0 \space \text{false)}$

### 11.1 Time Series Analysis

##### Seasonal variation, trend, short-term and random variation.

Questions may require the use of regression to estimate trend.

Additive model assumed for seasonal effects.

##### Use of moving averages to estimate seasonal effects, to deseasonalise series and to make short-term forecasts.

Candidates should appreciate that numerical techniques can only project past patterns into the future and should not be expected to give accurate forecasts.
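A sketch of a centred four-point moving average for quarterly data under the additive model. The series is invented, and the helper name `centred_moving_average` is hypothetical; it assumes an even period.

```python
def centred_moving_average(series, period=4):
    """Centred moving average for an even period (e.g. quarterly data)."""
    # Simple moving averages over each run of `period` consecutive values.
    simple = [
        sum(series[i:i + period]) / period
        for i in range(len(series) - period + 1)
    ]
    # Average adjacent pairs so each value is centred on an observation.
    return [(first + second) / 2 for first, second in zip(simple, simple[1:])]

# Hypothetical quarterly observations over two years.
series = [10, 20, 30, 40, 14, 24, 34, 44]

trend = centred_moving_average(series)

# Under the additive model, observed value minus trend estimates the
# seasonal effect; the first trend value lines up with the third observation.
offset = 2
seasonal_deviations = [series[offset + i] - t for i, t in enumerate(trend)]

print(trend, seasonal_deviations)
```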

### 11.2 Sampling

##### Simple (without replacement) and unrestricted (with replacement) random samples. Use of random numbers from tables or calculators to obtain random samples.

Variance of sample mean not required for sampling without replacement.

##### Stratified random sample.

Use of prior information to make sample more representative of population. Calculation of means and variances not required.

##### Cluster, quota and systematic sampling.

Use to overcome practical problems of sampling. Advantages and disadvantages.

### 11.3 Discrete Probability Distributions

##### Expectation and variance.

Use of:

$\mbox{Var}(X) = \mbox{E}(X^2) - [\mbox{E}(X)]^2$

Candidates will be expected to apply these and to interpret the results in real-world situations.
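For illustration, the variance formula can be applied to a small discrete distribution. The probability table is invented for the example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical probability distribution: value -> probability.
distribution = {
    0: Fraction(1, 10),
    1: Fraction(3, 10),
    2: Fraction(4, 10),
    3: Fraction(2, 10),
}

# E(X) and E(X^2) as probability-weighted sums.
expectation = sum(x * p for x, p in distribution.items())
expectation_of_square = sum(x**2 * p for x, p in distribution.items())

# Var(X) = E(X^2) - [E(X)]^2.
variance = expectation_of_square - expectation**2

print(expectation, expectation_of_square, variance)
```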

##### Modelling a real-world situation using a Poisson distribution.

Evaluation of probabilities using formula will not be required.

Use, but not proof, of mean and variance of Poisson distribution may be tested.

##### Use of tables, distribution of the sum of independent Poisson distributions.

Questions may require knowledge of binomial distribution from module Statistics 1.

##### Knowledge of the conditions necessary for a Poisson Model to be applicable.

Candidates will be required to determine whether a Poisson distribution is appropriate in a particular real-world situation.

### 11.4 Interpretation of Data

##### Data may be presented in the form of diagrams, tables of secondary data, summary statistics and/or associated analysis.

Candidates may be asked to construct and interpret pie charts, line diagrams, box and whisker plots, cumulative frequency diagrams and scatter diagrams. Construction or interpolation of histograms will not be required. (This statement is included now as a histogram question appeared, in error, on the specimen paper.)

### 11.5 Application of Hypothesis Testing

##### Null and alternative hypothesis, significance levels, one and two tailed tests.

Questions may require understanding of the concept of $\text{Type I}$ errors $(\mbox{reject}\space \mbox{H}_0 \space | \space \mbox{H}_0 \space \mbox{true})$ and $\text{Type II}$ errors $(\mbox{accept} \space \mbox{H}_0 \space | \space \mbox{H}_0 \space \mbox{false})$ but questions requiring the calculation of the risk of $\text{Type II}$ errors will not be set.

##### Tests for means based on: 1. a sample from a normal distribution with known standard deviation; 2. a large sample from an unspecified distribution.

Appreciation of the need for random samples and of the necessary conditions.

Candidates will be required to identify and apply a suitable test appropriate to a particular context.

Interpretation of results in context.
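A minimal sketch of a two-tailed test for a mean, based on a sample from a normal distribution with known standard deviation. All numerical values, including the 5% significance level, are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu_0, sigma, n):
    """Test statistic for H0: population mean = mu_0, with sigma known."""
    return (sample_mean - mu_0) / (sigma / sqrt(n))

# Hypothetical summary values.
z = z_statistic(sample_mean=52, mu_0=50, sigma=5, n=25)

# Two-tailed test at the 5% significance level.
critical_value = NormalDist().inv_cdf(0.975)
reject_h0 = abs(z) > critical_value

print(z, critical_value, reject_h0)
```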

## 12 Statistics 3

### Introduction

Candidates will be expected to be familiar with the knowledge, skills and understanding implicit in the modules Statistics 1 and Statistics 2.

The emphasis is on using and applying statistics. Appropriate interpretation of contexts and the outcomes of statistical procedures will be required.

Candidates may use relevant formulae included in the formulae booklet without proof.

Candidates should learn the following formulae, which are not included in the formulae booklet, but which may be required to answer questions.

Contingency Tables

$\mathrm{E} = (\mbox{row} \space \mbox{total} \times \mbox{column} \space \mbox{total}) / \mbox{grand} \space \mbox{total}$

For an $m \times n$ table the degrees of freedom are $(m-1)(n-1)$

Yates' correction for a $2 \times 2$ contingency table is

$\Sigma \big(|\mbox{O} - \mbox{E}| - 0.5 \big)^2 / \mbox{E}$

### 12.1 Application of Contingency Tables in Real-world Situations

##### Use of   $\Sigma \Big(\mbox{O} - \mbox{E} \Big)^2 / \mbox{E}$ as an approximate   $\chi{^2}$-statistic. Conditions for approximation to be valid.

Identification and application of the appropriate test and the interpretation of the results in context.

The convention that all $\mathrm{E}$s should be $\gt{5}$ will be expected.

Yates' correction for  $2 \times 2$ contingency tables will be required.
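The sketch below computes expected frequencies, degrees of freedom and the Yates-corrected statistic for a $2 \times 2$ table. The observed frequencies and the function names are invented for the example.

```python
def expected_frequencies(observed):
    """E = (row total x column total) / grand total for each cell."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]

def chi_squared_yates(observed):
    """Sum of (|O - E| - 0.5)^2 / E; intended for 2 x 2 tables only."""
    expected = expected_frequencies(observed)
    return sum(
        (abs(o - e) - 0.5) ** 2 / e
        for observed_row, expected_row in zip(observed, expected)
        for o, e in zip(observed_row, expected_row)
    )

# Hypothetical observed frequencies.
observed = [[20, 30], [30, 20]]

expected = expected_frequencies(observed)
statistic = chi_squared_yates(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Convention: every expected frequency should exceed 5.
all_expected_above_5 = all(e > 5 for row in expected for e in row)

print(expected, statistic, degrees_of_freedom, all_expected_above_5)
```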

### 12.2 Distribution Free Methods

##### 1. Tests of Average. Sign test (for medians) and Wilcoxon signed-rank test (for medians/means). Choice of appropriate test in particular circumstances.

The Wilcoxon signed-rank test assumes that the distribution is symmetrical and consequently that the mean and median are identical. Questions may require choice between sign test, Wilcoxon signed-rank test and $z$-test (from module Statistics 2).

##### 2. Analysis of Paired Comparisons. Use of sign test and Wilcoxon signed-rank test to analyse results of a paired comparison.

Questions may be set which require an appreciation of simple ideas of experimental design - replication, randomisation and paired comparisons.

##### 3. Two Independent Samples. Mann-Whitney U test to test hypothesis that two independent samples come from identical populations.

Although the hypothesis is that the populations are identical in every respect, only a difference in mean is likely to lead to $\mathrm{H_0}$ being rejected. Normal approximations to the critical values of the Wilcoxon and Mann-Whitney tests will not be required.

##### 4. More Than Two Independent Samples. Kruskal-Wallis test to test the hypothesis that more than two independent samples come from identical populations.

Critical values for the Kruskal-Wallis $\mathrm{H}$ statistic are obtained from the  $\chi{^2}$ distribution with   $k - 1$ degrees of freedom where   $k$ is the number of samples compared.

Candidates will not be expected to rank results from more than 3 samples.

### 12.3 Correlation

##### Spearman's rank correlation coefficient. Use of tables to test no association between ranks.

Defined as the product moment correlation coefficient between ranks. For tied ranks, the convention of giving the mean rank to each equal item will be expected.
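As an illustration of this definition, the sketch below assigns mean ranks to tied values and then calculates the product moment correlation coefficient between the two sets of ranks. The data and the helper names are invented for the example.

```python
from math import sqrt

def mean_ranks(values):
    """Rank from smallest (1) upwards, giving tied values their mean rank."""
    ordered = sorted(values)
    ranks = []
    for value in values:
        first_position = ordered.index(value) + 1
        tie_count = ordered.count(value)
        ranks.append(first_position + (tie_count - 1) / 2)
    return ranks

def product_moment(u, v):
    """Product moment correlation coefficient between two equal-length lists."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / sqrt(s_uu * s_vv)

# Hypothetical paired data with one tie in x.
x = [10, 20, 20, 30]
y = [1, 2, 3, 4]

x_ranks = mean_ranks(x)
y_ranks = mean_ranks(y)
spearman = product_moment(x_ranks, y_ranks)

print(x_ranks, y_ranks, spearman)
```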

##### Use of tables to test   $\rho = 0$ for a bivariate normal distribution. Choice of appropriate correlation coefficient in particular cases.

Where  $\rho$ is the product moment correlation coefficient.

## 13 Statistics 4

### Introduction

Candidates will be expected to be familiar with the knowledge, skills and understanding implicit in the modules Statistics 1, Statistics 2 and Statistics 3.

The emphasis is on using and applying statistics. Appropriate interpretation of contexts and the outcomes of statistical procedures will be required.

Candidates may use relevant formulae included in the formulae booklet without proof.

Candidates should learn the following formulae, which are not included in the formulae booklet, but which may be required to answer questions.

When $X$ is $\mbox{N}(\mu_x, \sigma_x{^2})$ and $Y$ is independently $\mbox{N}(\mu_y, \sigma_y{^2})$ then

$aX \pm bY$   is   $\mbox{N}(a\mu_x \pm b\mu_y, a^2 \sigma_x{^2} \space + \space b^2 \sigma_y{^2})$
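A short sketch of this result, showing that the variances add whether the combination is a sum or a difference. The function name and the numerical values are hypothetical.

```python
def linear_combination(a, mu_x, var_x, b, mu_y, var_y, sign=1):
    """Mean and variance of aX + bY (sign=1) or aX - bY (sign=-1),
    for independent normal X and Y."""
    mean = a * mu_x + sign * b * mu_y
    # The variances are added in both cases.
    variance = a**2 * var_x + b**2 * var_y
    return mean, variance

# Hypothetical example: X ~ N(10, 4), Y ~ N(4, 2.25); distribution of 2X - 3Y.
mean, variance = linear_combination(2, 10, 4, 3, 4, 2.25, sign=-1)

print(mean, variance)
```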

### 13.1 Continuous Probability Distributions

##### Distribution of a linear combination of independent normal random variables.

Applied to practical situations. Interpretation of results in context.

### 13.2 Distributional Approximations

##### Poisson approximation to binomial. Normal approximation to binomial and Poisson distributions.

Candidates will be required to recognise that a particular approximation is appropriate in a particular context.

Conditions for approximations to be appropriate.

Continuity corrections required.

Calculations of Poisson probabilities using formula may be required.

Properties of  $e^x$ are not required.

### 13.3 Estimation in a Real-world Context

##### Application of confidence intervals for mean based on a sample from a normal distribution with unknown standard deviation using the t-distribution.

Questions may involve knowledge of confidence intervals from module Statistics 1.

Only confidence intervals symmetrical about the mean will be considered.

Candidates will be required to interpret the meaning of a confidence interval in the context of a problem.

##### Approximate confidence intervals, using normal approximations, for proportions and for the mean of a Poisson distribution.

Continuity correction not required.

### 13.4 Application of Hypothesis Testing

##### Hypothesis tests for mean based on a sample from a normal distribution with unknown standard deviation using the t-distribution.

Candidates will be expected to identify and apply a suitable test in context.

Questions may involve knowledge of hypothesis tests for mean from module Statistics 2.

##### Hypothesis tests for proportions and for the mean of a Poisson distribution.

Using exact probabilities or, where appropriate, normal approximations.

Continuity correction not required.

Interpretation of results in context.

## 14 Statistics 5

### Introduction

Candidates will be expected to be familiar with the knowledge, skills and understanding implicit in the modules Statistics 1, Statistics 2, Statistics 3 and Statistics 4.

The emphasis is on using and applying statistics. Appropriate interpretation of contexts and the outcomes of statistical procedures will be required.

Candidates may use relevant formulae included in the formulae booklet without proof.

Candidates should learn the following formulae, which are not included in the formulae booklet, but which may be required to answer questions.

For an exponential distribution   $\mbox{P}(X \leqslant x) = 1 - e^{-\lambda x}$
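For illustration, probabilities for an exponential distribution can be found from the cumulative distribution function alone, without integration. The rate $\lambda = 0.5$ is an assumption made for the example.

```python
from math import exp

def exponential_cdf(x, rate):
    """P(X <= x) = 1 - e^(-rate * x) for x > 0, and 0 otherwise."""
    return 1 - exp(-rate * x) if x > 0 else 0.0

# Hypothetical rate parameter.
rate = 0.5

p_at_most_2 = exponential_cdf(2, rate)

# P(1 < X <= 3) as a difference of two cumulative probabilities.
p_between_1_and_3 = exponential_cdf(3, rate) - exponential_cdf(1, rate)

print(p_at_most_2, p_between_1_and_3)
```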

### 14.1 Continuous Probability Distributions

##### Rectangular and Exponential distributions.

Candidates will be expected to recognise when these are appropriate models for a given real-world situation.

Use but not proof of mean and variance. Questions on exponential distribution will be solvable using cumulative distribution function only.

Integration will not be required or expected, but may be used by candidates.

### 14.2 Estimation

##### Determination of confidence intervals for variance and standard deviation based on a sample from a normal distribution.

Using   $\chi{^2} .$

Knowledge of the necessary conditions for application and deductions in context.

Questions may be set which require the calculation of confidence intervals for the mean using knowledge from modules Statistics 1 and/or Statistics 4.

### 14.3 Application of Hypothesis Testing

##### Tests for variance and standard deviation based on a sample from a normal distribution.

Throughout this section, candidates will be required to identify and apply a test appropriate to the context of a real-world situation. Interpretation of the results of such tests in context will be required.

Using   $\chi{^2} .$

Questions may be set which require hypothesis tests for the mean using knowledge from modules Statistics 2 and/or Statistics 4.

##### Goodness of fit test using   $\Sigma(\mbox{O}- \mbox{E})^2 / \mbox{E}$ as an approximate   $\chi{^2}$ -statistic. Conditions for approximation to be valid.

The convention that all $\mathrm{E}$s should be $> 5$ will be expected.

Tests may be required for binomial, Poisson, normal, rectangular, exponential or specified discrete distributions.

Integration will not be required.

##### 1. tests for equality of variance of two normal distributions;

Using  $F$.

##### 2. tests for equality (or for a given difference) of means for two normal distributions with known variances or with unknown but equal variances.

Using $z$ for known variances.

Using $t$ for unknown but equal variances.

## 15 Statistics 6

### Introduction

Candidates will be expected to be familiar with the knowledge, skills and understanding implicit in the modules Statistics 1, Statistics 2, Statistics 3, Statistics 4 and Statistics 5.

The emphasis is on using and applying statistics. Appropriate interpretation of contexts and the outcomes of statistical procedures will be required.

Candidates may use relevant formulae included in the formulae booklet without proof.

Candidates should learn the following formulae, which are not included in the formulae booklet, but which may be required to answer questions.

For Latin squares, $\mathrm{SS_R}$ and $\mathrm{SS_C}$ as for two-factor model with ${m} = {n}$ and

$\mathrm{SS_L} = \Sigma L_k{^2} / n-T^2/n^2$

Warning limits for means chart are  $\mu \pm 1.96 \sigma/ \sqrt{n}$

Action limits for means chart are  $\mu \pm 3.09 \sigma / \sqrt{n}$

where $\mu$ is the target value.
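A minimal sketch of the warning and action limits for a means chart. The target value, process standard deviation and sample size are invented for the example, and the function name is hypothetical.

```python
from math import sqrt

def means_chart_limits(target, sigma, n):
    """Warning (1.96) and action (3.09) limits for a chart of sample means."""
    standard_error = sigma / sqrt(n)
    return {
        "warning": (target - 1.96 * standard_error, target + 1.96 * standard_error),
        "action": (target - 3.09 * standard_error, target + 3.09 * standard_error),
    }

# Hypothetical process: target 50, standard deviation 2, samples of size 4.
limits = means_chart_limits(target=50, sigma=2, n=4)

print(limits)
```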

### 15.1 Experimental Design

##### Experimental error, randomisation, replication. Control and experimental groups, blind and double blind trials. Use of paired comparisons and blocking to reduce experimental error.

Reduction of experimental error by standardising conditions.

Use of replication to estimate magnitude of experimental error and randomisation to eliminate unconscious bias.

##### Analysis of paired comparisons using paired t-test.

Questions may be set which require the use of the sign test or Wilcoxon signed-rank test from module Statistics 3 to analyse paired comparisons.

### 15.2 Analysis of Variance

##### One-way analysis of variance, completely randomised design.

Including an appreciation of the underlying model, i.e. additive effects with experimental errors  $\mbox{N}(0, \sigma{^2})$. Interpretation of results in context.

##### Two-way analysis of variance without replicates, randomised block design.

Motivated by the idea of blocking. Appreciation of the assumption of no interaction. Interpretation of results in context.

##### Latin squares - purpose, construction and analysis.

Appreciation of assumption of no interactions. Latin squares with replicates excluded. Interpretation of results in context.

### 15.3 Statistical Process Control

##### Construction and use of charts for mean, range, standard deviation and proportions.

Where a target value is given, this should be used as the centre-line of the chart for means.

Advantages and disadvantages of using attributes or variables.

##### Ability of a process to meet tolerances.

Estimate of proportion not meeting tolerances.

### 15.4 Acceptance Sampling

##### Schemes for attributes and variables. Design to meet specific criteria. Construction and use of operating characteristics.

Advantages and disadvantages.

##### Operating characteristics for double sampling plans.

Comparison with single sampling plans.