20 Statistics 1

Candidates may use relevant formulae included in the formulae booklet without proof.

Candidates should learn the following formula, which is not included in the formulae booklet, but which may be required to answer questions.

$(\text{residual})_i = y_i - a - bx_i$

Standard deviation and variance calculated on ungrouped and grouped data. Where raw data are given, candidates will be expected to obtain standard deviation and mean values directly from calculators. Where summarised data are given, candidates may be required to use the formula from the booklet provided for the examination. It is advisable for candidates to know whether to divide by $n$ or $(n-1)$ when calculating the variance; either divisor will be accepted unless a question specifically requests an unbiased estimate of a population variance. Linear scaling. Artificial questions requiring linear scaling will not be set, but candidates should be aware of the effect of linear scaling on numerical measures. Choice of numerical measures. Candidates will be expected to choose numerical measures, including mean, median, mode, range and interquartile range, appropriate to given contexts. Linear interpolation will not be required.
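The difference between the two variance divisors, and the effect of linear scaling, can be illustrated with a short Python sketch. The helper names `summary_stats` and `scale` are hypothetical, and the data set is invented for illustration. It assumes summarised data in the form $n$, $\sum x$ and $\sum x^2$, and uses Python's standard `statistics` module as a stand-in for a calculator working on raw data.

```python
import math
import statistics

def summary_stats(n, sum_x, sum_x2):
    """Mean and both versions of the variance from summarised data (n, Σx, Σx²)."""
    mean = sum_x / n
    sxx = sum_x2 - sum_x ** 2 / n   # sum of squared deviations from the mean
    var_n = sxx / n                 # divisor n
    var_n1 = sxx / (n - 1)          # divisor n - 1: unbiased estimate of population variance
    return mean, var_n, var_n1

def scale(mean, sd, a, b):
    """Effect of the linear scaling y = a + b*x: mean -> a + b*mean, sd -> |b|*sd."""
    return a + b * mean, abs(b) * sd

# Hypothetical raw data
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean, var_n, var_n1 = summary_stats(len(data), sum(data), sum(x * x for x in data))
print(mean, var_n, round(var_n1, 3))   # 5.0 4.0 4.571

# Calculator-style check on the raw data:
# statistics.pvariance divides by n, statistics.variance divides by n - 1
print(statistics.pvariance(data), round(statistics.variance(data), 3))

# Hypothetical scaling y = 32 + 1.8x (e.g. Celsius to Fahrenheit)
print(scale(mean, math.sqrt(var_n), 32, 1.8))
```

Note that the constant $a$ shifts the mean but leaves the spread unchanged, while the variance is multiplied by $b^2$.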
Elementary probability; the concept of a random event and its probability. Assigning probabilities to events using relative frequencies or equally likely outcomes. Candidates will be expected to understand set notation, but its use will not be essential. Addition law of probability: $\mbox{P}(A \cup B)=\mbox{P}(A) + \mbox{P}(B) - \mbox{P}(A \cap B)$; two events only. Mutually exclusive events: $\mbox{P}(A \cup B)=\mbox{P}(A) + \mbox{P}(B)$; two or more events. $\mbox{P}(A')= 1-\mbox{P}(A)$. Multiplication law of probability and conditional probability: $\mbox{P}(A \cap B) = \mbox{P}(A) \times \mbox{P}(B|A) = \mbox{P}(B) \times \mbox{P}(A|B)$; two or more events. Independent events: $\mbox{P}(A \cap B) = \mbox{P}(A) \times \mbox{P}(B)$; two or more events. Application of probability laws. Only simple problems will be set that can be solved by direct application of the probability laws, by counting equally likely outcomes and/or by constructing and using frequency tables or relative frequency (probability) tables. Questions requiring the use of tree diagrams or Venn diagrams will not be set, but their use will be permitted.
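The probability laws above can be checked directly against a frequency table. The following is a minimal Python sketch, assuming a hypothetical two-way table of 50 outcomes classified by whether events $A$ and $B$ occur; `fractions.Fraction` keeps the relative frequencies exact. The names `counts` and `prob` are illustrative only.

```python
from fractions import Fraction

# Hypothetical two-way frequency table: counts[(A occurs, B occurs)]
counts = {(True, True): 6, (True, False): 14, (False, True): 9, (False, False): 21}
total = sum(counts.values())

def prob(event):
    """Relative frequency of the outcomes for which event(in_a, in_b) is true."""
    return Fraction(sum(c for (in_a, in_b), c in counts.items() if event(in_a, in_b)), total)

p_a = prob(lambda in_a, in_b: in_a)
p_b = prob(lambda in_a, in_b: in_b)
p_a_and_b = prob(lambda in_a, in_b: in_a and in_b)
p_a_or_b = prob(lambda in_a, in_b: in_a or in_b)

# Addition law: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
assert p_a_or_b == p_a + p_b - p_a_and_b
# Complement: P(A') = 1 − P(A)
p_not_a = 1 - p_a
# Conditional probability, rearranging the multiplication law: P(B|A) = P(A ∩ B) / P(A)
p_b_given_a = p_a_and_b / p_a
# Independence check: does P(A ∩ B) equal P(A) × P(B)?
independent = p_a_and_b == p_a * p_b

print(p_a_or_b, p_not_a, p_b_given_a, independent)   # 29/50 3/5 3/10 True
```

In this invented table $\mbox{P}(B|A) = \mbox{P}(B) = \frac{3}{10}$, so $A$ and $B$ happen to be independent; changing any single count breaks that equality.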
Discrete random variables. Only an understanding of the concepts is required; they are not examined beyond binomial distributions. Conditions for application of a binomial distribution. Calculation of probabilities using the formula. Use of $\displaystyle\binom{n}{x}$ notation. Use of tables. Mean, variance and standard deviation of a binomial distribution. Knowledge, but not derivations, will be required.
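As an illustration of the binomial formula, the cumulative values found in tables, and the standard results $\mu = np$ and $\sigma^2 = np(1-p)$, here is a minimal Python sketch. The function names are hypothetical, and $X \sim \mbox{B}(10, 0.3)$ is chosen only as an example.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): the cumulative probability given in binomial tables."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

def binomial_summary(n, p):
    """Mean np, variance np(1 - p) and standard deviation of B(n, p)."""
    mean = n * p
    var = n * p * (1 - p)
    return mean, var, math.sqrt(var)

print(round(binomial_pmf(3, 10, 0.3), 4), round(binomial_cdf(3, 10, 0.3), 4))   # 0.2668 0.6496
print(binomial_summary(10, 0.3))
```

`math.comb` requires Python 3.8 or later.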
Continuous random variables. Only an understanding of the concepts is required; they are not examined beyond normal distributions. Properties of normal distributions. Shape, symmetry and area properties. Knowledge that approximately $\frac{2}{3}$ of observations lie within $\mu \pm \sigma$, and equivalent results. Calculation of probabilities. Transformation to the standardised normal distribution and use of the supplied tables. Interpolation will not be essential; rounding $z$-values to two decimal places will be accepted. Mean, variance and standard deviation of a normal distribution. To include finding an unknown mean and/or standard deviation by making use of the table of percentage points. (Candidates may be required to solve two simultaneous equations.)
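The standardisation $z = \frac{x-\mu}{\sigma}$ and the two-equation method for an unknown mean and standard deviation can be sketched in Python. The sketch uses `statistics.NormalDist` (Python 3.8+) in place of printed tables. Its `cdf` plays the role of the normal distribution table and `inv_cdf` that of the percentage-point table. The function names and example values are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standardised normal distribution: mean 0, standard deviation 1

# Proportion within one standard deviation of the mean (about 0.683, roughly 2/3)
within_one_sd = Z.cdf(1) - Z.cdf(-1)

def prob_less_than(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2), via the standardised value z = (x - mu) / sigma."""
    z = round((x - mu) / sigma, 2)  # z rounded to two decimal places, as when using tables
    return Z.cdf(z)

def solve_mu_sigma(x1, p1, x2, p2):
    """Find mu and sigma given P(X < x1) = p1 and P(X < x2) = p2.

    Each condition gives x_i = mu + z_i * sigma, where z_i is the percentage
    point with P(Z < z_i) = p_i; subtracting the two equations eliminates mu.
    """
    z1, z2 = Z.inv_cdf(p1), Z.inv_cdf(p2)
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sigma
    return mu, sigma

print(round(within_one_sd, 4))               # 0.6827
print(round(prob_less_than(56, 50, 4), 4))   # 0.9332
```

For exam-style use, `solve_mu_sigma` would be given two observed values with known cumulative proportions, for example the 10% and 95% points of a distribution.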
Population and sample. To include the terms ‘parameter’ and ‘statistic’. Candidates will be expected to understand the concept of a simple random sample. Methods for obtaining simple random samples will not be tested directly in the written examination. Unbiased estimators of a population mean and variance: $\bar{X}$ and $S^2$ respectively. The sampling distribution of the mean of a random sample from a normal distribution. To include the standard error of the sample mean, $\displaystyle{\frac{\sigma}{\sqrt{n}}}$, and its estimator, $\displaystyle{\frac{S}{\sqrt{n}}}$. A normal distribution as an approximation to the sampling distribution of the mean of a large sample from any distribution. Knowledge and application of the Central Limit Theorem. Confidence intervals for the mean of a normal distribution with known variance. Only confidence intervals symmetrical about the mean will be required. Confidence intervals for the mean of a distribution using a normal approximation. Large samples only. Known and unknown variance. Inferences from confidence intervals, based on whether a calculated confidence interval includes or does not include a ‘hypothesised’ mean value.
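A symmetric confidence interval for a mean has the form $\bar{x} \pm z \frac{\sigma}{\sqrt{n}}$, and an inference is drawn from whether a hypothesised mean lies inside it. The following minimal Python sketch uses hypothetical function names and an invented sample summary ($\bar{x} = 102.3$, $\sigma = 6$, $n = 36$). It also assumes that, for a large sample with unknown variance, the sample standard deviation $s$ is simply substituted for $\sigma$.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(xbar, sigma, n, level=0.95):
    """Symmetric confidence interval for a population mean: xbar ± z * sigma / sqrt(n).

    sigma is the known population standard deviation; for a large sample with
    unknown variance, the sample estimate s is used in its place.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    half_width = z * sigma / math.sqrt(n)       # z times the standard error of the mean
    return xbar - half_width, xbar + half_width

def consistent_with(mu0, interval):
    """Inference: does the calculated interval include the hypothesised mean mu0?"""
    lower, upper = interval
    return lower <= mu0 <= upper

ci = mean_confidence_interval(102.3, 6, 36)
print(tuple(round(v, 2) for v in ci))                    # (100.34, 104.26)
print(consistent_with(100, ci), consistent_with(101, ci))
```

Here a hypothesised mean of 100 lies outside the 95% interval, suggesting it is implausible, whereas 101 lies inside it.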
Calculation and interpretation of the product moment correlation coefficient. Where raw data are given, candidates should be encouraged to obtain correlation coefficient values directly from calculators. Where summarised data are given, candidates may be required to use a formula from the booklet provided for the examination. Calculations from grouped data are excluded. Importance of checking for an approximate linear relationship, but no hypothesis tests. Understanding that association does not necessarily imply cause and effect. Identification of response (dependent) and explanatory (independent) variables in regression. Calculation of least squares regression lines with one explanatory variable. Scatter diagrams and drawing a regression line thereon. Where raw data are given, candidates should be encouraged to obtain gradient and intercept values directly from calculators. Where summarised data are given, candidates may be required to use formulae from the booklet provided for the examination. Practical interpretation of values for the gradient and intercept. Use of line for prediction within the range of observed values of the explanatory variable. Appreciation of the dangers of extrapolation. Calculation of residuals. Use of $(\text{residual})_i = y_i-a-bx_i$. Examination of residuals to check plausibility of model and to identify outliers. Appreciation of the possible large influence of outliers on the fitted line. Linear scaling. Artificial questions requiring linear scaling will not be set, but candidates should be aware of the effect of linear scaling in correlation and regression.
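The product moment correlation coefficient, the least squares line of $y$ on $x$ and its residuals can all be computed from the sums $S_{xx}$, $S_{yy}$ and $S_{xy}$. The following is a minimal Python sketch using the standard results $r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$, $b = \frac{S_{xy}}{S_{xx}}$ and $a = \bar{y} - b\bar{x}$. The function name `fit_and_correlate` and the five data points are hypothetical.

```python
import math

def fit_and_correlate(xs, ys):
    """PMCC r, least squares line y = a + b*x, and residuals y_i - a - b*x_i."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)   # product moment correlation coefficient
    b = sxy / sxx                    # gradient of the regression line of y on x
    a = ybar - b * xbar              # intercept: the line passes through (xbar, ybar)
    residuals = [y - a - b * x for x, y in zip(xs, ys)]
    return r, a, b, residuals

# Hypothetical data: x is the explanatory variable, y the response
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, a, b, residuals = fit_and_correlate(xs, ys)
print(round(r, 4), round(a, 2), round(b, 2))   # 0.7746 2.2 0.6
print([round(e, 2) for e in residuals])        # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

For a least squares line the residuals sum to zero. A residual much larger in size than the rest would flag a possible outlier. Rescaling $x$ as $c + dx$ with $d > 0$ leaves $r$ unchanged but divides the gradient $b$ by $d$.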