Measures of central tendency

The first step in dealing with any set of numbers usually involves calculating an average.  By average we usually mean the mean. However, you should remember from GCSE maths that there are three measures of central tendency, or averages (mean, median and mode). You’ll need to know how to calculate each, and also the good and bad points of each.

Mean

The mean is the arithmetic average. To calculate the mean, add your numbers up and divide by how many numbers there are.

So, in the case above, we add up all eight numbers and divide by 8, which equals 5. Getting a whole number was pure luck; you will not often end up with a whole number!

The mean is the most sensitive measure because it uses all the data.  In the example above it uses all eight numbers. The downside is that a single extreme value can drag it a long way.

The mean is most useful when the data approximates to a normal distribution.
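The add-up-and-divide recipe can be sketched in a few lines of Python. This is only an illustration: the numbers are borrowed from the median section of these notes, not from the data set referred to above, and the variable names are made up for the example.

```python
# Illustrative data: the eight numbers used in the median section of these notes.
scores = [19, 21, 23, 24, 32, 33, 36, 45]

total = sum(scores)   # add your numbers up
count = len(scores)   # how many numbers there are
mean = total / count  # divide one by the other

print(total, count, mean)  # 233 8 29.125
```

Notice that the answer is not a whole number, which is the usual state of affairs.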

Median


Place your numbers in ascending (or descending) order and pick out the middle one.  How much more central could you get?  If there’s an even number of values, find the halfway point between the two middle numbers:

19, 21, 23, 24, 32, 33, 36, 45
The middle two numbers are 24 and 32, so our median is 28.  If in doubt, add together your two middle numbers and divide by two:  24 + 32 = 56, and 56/2 = 28.

The advantages and disadvantages are the reverse of those of the mean. One visual example is the following. Imagine a group of children, all of similar ages:

  • In this example the median is a good measure of central tendency, as it reflects the average age of the group.

  • In this example, one child is replaced with an adult. The median age is still 12.5 years, so it does not take the older person into account. The median is not sensitive to outliers or extreme values.

It isn’t unduly influenced by extreme values.  Substituting 320 for 32 in our list gives: 19, 21, 23, 24, 33, 36, 45, 320.  The median now lies halfway between 24 and 33, at 28.5. So the median has shifted by only 0.5.
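To see how differently the two averages react to the outlier, here is a rough sketch using Python's built-in statistics module. The lists are the ones from the text; the variable names are just for illustration.

```python
from statistics import mean, median

original = [19, 21, 23, 24, 32, 33, 36, 45]
# Swap the 32 for 320, as in the text.
with_outlier = [320 if x == 32 else x for x in original]

# The median barely moves...
print(median(original), median(with_outlier))  # 28.0 28.5
# ...while the mean more than doubles.
print(mean(original), mean(with_outlier))      # 29.125 65.125
```

One extreme value shifts the median by 0.5 but drags the mean from about 29 to about 65.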

Mode

The mode is the most common number in the set.  In our first set 23 occurs twice, so the mode is 23.

  • Sometimes two numbers share the honours. In 19, 21, 21, 21, 24, 27, 33, 33, 33, 39 there are three 21s and three 33s. In this case we have two modes: a bimodal distribution.  Often this could be the best way of describing our set of data.  For example, the males in some species of fish tend to be either big or small, with few mediums.  A bimodal distribution would best describe these.

    Left: a bimodal distribution

The mode is super quick and easy to calculate!
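Finding the mode, including the two-mode case, can be sketched with Python's statistics.multimode (available from Python 3.8), which returns every value that ties for most common. The data is the bimodal list from the text; the variable names are illustrative.

```python
from statistics import multimode

data = [19, 21, 21, 21, 24, 27, 33, 33, 33, 39]

# multimode returns all the most common values, in the order they first appear.
modes = multimode(data)
print(modes)  # [21, 33]

if len(modes) == 2:
    print("bimodal")
```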

Measures of Dispersion

Dispersion refers to the extent to which a set of data is spread out, or dispersed from the ‘average’. There are various measures of dispersion, such as range and standard deviation.

Range

This is the simplest method of calculating dispersion.   Basically it is the difference between the largest and the smallest value in the data:

19, 21, 23, 23, 24, 25, 33, 33, 33

In the example above the range is 33 - 19 = 14.

Obviously this is easy to calculate, but it doesn’t consider all the data. Since it is based only on the largest and smallest values, it is very easily distorted by outliers!
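As a quick sketch in Python, using the data set from the text. The extra value of 90 is invented purely to show the effect of an outlier and is not part of the notes.

```python
data = [19, 21, 23, 23, 24, 25, 33, 33, 33]

# Range = largest value minus smallest value.
data_range = max(data) - min(data)
print(data_range)  # 14

# Add one made-up extreme score and the range balloons.
stretched = data + [90]
stretched_range = max(stretched) - min(stretched)
print(stretched_range)  # 71
```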

Standard deviation

This is the average distance of scores away from the mean. Essentially a mean is calculated and all other pieces of data are then compared to the mean.  As a result it uses all of the data.  After a little jiggery-pokery involving squaring and square rooting, we end up with a magical number.
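The squaring and square rooting can be spelled out step by step. The sketch below reuses the data set from the range example and assumes the "population" version of the formula (dividing by the number of scores); many textbooks divide by one less than that when the data is a sample, so check which version your course expects.

```python
from math import sqrt
from statistics import pstdev

data = [19, 21, 23, 23, 24, 25, 33, 33, 33]

mean = sum(data) / len(data)                     # 26.0
squared_diffs = [(x - mean) ** 2 for x in data]  # square each distance from the mean
variance = sum(squared_diffs) / len(data)        # average of the squares (244 / 9)
sd = sqrt(variance)                              # square root undoes the squaring

print(round(sd, 2))            # 5.21
print(round(pstdev(data), 2))  # same answer from the standard library
```

Squaring stops the negative and positive distances from cancelling each other out, and the final square root puts the answer back into the original units.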

To explain its value, we can use the example of IQ.  Tests are updated to ensure that the mean IQ is maintained at 100.  The standard deviation is 15:

If we drop one standard deviation (SD) below the mean to 85 (100-15) and move one standard deviation up to 115 (100+15), about 68% of our population will fall between these two numbers.  Magically this is not just true of IQ and SDs of 15!  Given any set of normally distributed data, about 68% will fall within one standard deviation of the mean!
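The 68% figure can be checked with Python's statistics.NormalDist (Python 3.8 and later), which models an idealised normal distribution. This is only a sketch: real IQ scores are not perfectly normal, and the second distribution is made up just to show that the result does not depend on the mean or the SD.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Proportion of the distribution lying between 85 and 115.
share = iq.cdf(115) - iq.cdf(85)
print(round(share * 100, 1))  # 68.3

# A different (made-up) mean and SD gives the same proportion.
other = NormalDist(mu=50, sigma=7)
other_share = other.cdf(57) - other.cdf(43)
print(round(other_share * 100, 1))  # 68.3
```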

Standard deviation therefore doesn’t just tell us the spread; it allows us to quantify that spread.

The smaller the standard deviation, the closer our data is to the mean. It isn’t widely spread out; it is consistent.  A large standard deviation tells us the opposite: the data is widely spread out around the mean, so it is inconsistent.

If I set you a mock (let’s say out of 35) and the average mark was 24 (dreaming!) with a standard deviation of 4, that would tell me that most people (about 68% in fact) scored between 20 and 28.  That’s a tightly packed set of data!

If the SD was 10 that would suggest there was a huge variety of scores, 68% falling between 14 and 34 and the other 32% being even more widely spread than that.
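The arithmetic behind these bands is just the mean minus one SD and the mean plus one SD. A tiny sketch, where one_sd_band is a made-up helper name and the marks are assumed to be roughly normally distributed:

```python
def one_sd_band(mean, sd):
    """Return the (lower, upper) marks one standard deviation either side of the mean."""
    return mean - sd, mean + sd

tight = one_sd_band(24, 4)
wide = one_sd_band(24, 10)

print(tight)  # (20, 28)
print(wide)   # (14, 34)
```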