Measures of central tendency

The first step in dealing with any set of numbers usually involves calculating an average. By ‘average’ we usually mean the mean. However, you should remember from GCSE maths that there are three measures of central tendency, or average (mean, median and mode). You’ll need to know how to calculate each, and also the good and bad points of each.

Mean

The mean is the arithmetic average: the sum of all the values divided by how many values there are. To calculate the mean, add your numbers up and divide by how many there are.

So, for the numbers 3, 8 and 4 it would be (3 + 8 + 4)/3, which equals 5. Getting a whole number was pure luck; you are not often likely to end up with a whole number!
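As a minimal sketch in Python, the rule is just a sum divided by a count. The helper name `mean_of` and the numbers used are illustrative assumptions, not taken from the text; the standard library's `statistics.mean` is included only as a cross-check.

```python
import statistics

def mean_of(values):
    """Arithmetic mean: add the values up and divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)

whole = [3, 8, 4]      # illustrative numbers that happen to give a whole-number mean
not_whole = [3, 8, 5]  # illustrative numbers that do not

print(mean_of(whole))          # 5.0  (15 / 3)
print(mean_of(not_whole))      # about 5.33  (16 / 3)
print(statistics.mean(whole))  # standard-library cross-check
```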

The mean is very sensitive because it uses all the data: every single value contributes to the result.
If there are any significantly anomalous results, they will pull the mean towards them, making it far less useful as a measure of central tendency. If you have outliers or extreme values, avoid using the mean!

The mean is most useful when the data approximates to a normal distribution.

Median


Place your numbers in ascending (or descending) order and pick out the middle one. How much more central could you get? If there is an even number of values, find the halfway point between the two middle numbers:

19, 21, 23, 24, 32, 33, 36, 45
The middle two numbers are 24 and 32, so our median is 28. If in doubt, add together your two middle numbers and divide by two: 24 + 32 = 56, and 56/2 = 28.
The advantages and disadvantages of the median are the reverse of those of the mean. Imagine a group of children, all of similar ages:

  • In this example the median is a good measure of central tendency, as it reflects the average age of the group.

  • In this example, one child is replaced with an adult. The median age is still 12.5 years; it takes no account of how old the adult is. The median is not sensitive to outliers or extreme values.

It isn’t unduly influenced by extreme values. Replacing 32 with 320 gives 19, 21, 23, 24, 33, 36, 45, 320. The median now lies halfway between 24 and 33, at 28.5: it has moved by only 0.5.
However, it ignores most of the numbers we’ve gone to all that trouble to collect. We might have hundreds of numbers and use only the middle one or two!
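The ordering rule, including the even-count case, can be sketched in Python as follows, using the eight numbers from the example before and after 32 is replaced by 320. `median_of` is a hypothetical helper written for illustration (the standard library's `statistics.median` does the same job); the mean is printed alongside so the two can be compared.

```python
import statistics

def median_of(values):
    """Middle value of the sorted data; for an even count, the midpoint of the two middle values."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: halfway between the middle two

original = [19, 21, 23, 24, 32, 33, 36, 45]
with_outlier = [19, 21, 23, 24, 320, 33, 36, 45]  # 32 replaced by 320

print(median_of(original))            # 28.0
print(median_of(with_outlier))        # 28.5
print(statistics.mean(original))      # 29.125
print(statistics.mean(with_outlier))  # 65.125
```

The median moves from 28 to 28.5, while the mean jumps from 29.125 to 65.125: the same outlier barely nudges one and drags the other a long way.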

Mode

The mode is the most common value. If, for example, 23 occurs twice in a set and every other number occurs only once, the mode is 23.

Sometimes two numbers share the honours: in 19, 21, 21, 21, 24, 27, 33, 33, 33, 39 there are three 21s and three 33s. In this case we have two modes, a bimodal distribution. Often this could be the best way of describing our set of data. For example, the males in some species of fish tend to be either big or small, with few medium-sized ones. A bimodal distribution would best describe these.

Figure: a bimodal distribution

The mode is super quick and easy to calculate!
However, imagine the rats had taken the following times: 19, 21, 23, 23, 24, 25, 33, 33, 33. Our mode is 33, the longest time of all! Remember we want a figure of central tendency. Too frequently the mode does not provide this.
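A short Python sketch of the mode that reports every value tied for most common, so a bimodal set returns two modes. `modes_of` is a hypothetical helper; since Python 3.8 the standard library's `statistics.multimode` offers similar behaviour. The two data sets are the ones given above.

```python
from collections import Counter

def modes_of(values):
    """Hypothetical helper: every value that occurs most often, in ascending order."""
    if not values:
        raise ValueError("the mode of an empty list is undefined")
    counts = Counter(values)                 # value -> how many times it occurs
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

bimodal = [19, 21, 21, 21, 24, 27, 33, 33, 33, 39]
rat_times = [19, 21, 23, 23, 24, 25, 33, 33, 33]

print(modes_of(bimodal))    # [21, 33]  two modes: bimodal
print(modes_of(rat_times))  # [33]
print(max(rat_times))       # 33  the mode is also the largest time
```

One design choice to note: if no value repeats, every value ties and this helper returns the whole list; you may prefer to report "no mode" in that case.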

Measures of Dispersion

Dispersion refers to the extent to which a set of data is spread out, or dispersed from the ‘average’. There are various measures of dispersion, such as range and standard deviation.

Range

This is the simplest method of calculating dispersion.   Basically it is the difference between the largest and the smallest value in the data:

19, 21, 23, 23, 24, 25, 33, 33, 33

In the example above, the range is 33 - 19 = 14.

Obviously this is easy to calculate, but it doesn’t consider all the data. Because it is based only on the largest and smallest values, it is very, very prone to outliers!
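A sketch of the range in Python, using the rat times from the example. `range_of` is a hypothetical helper, and the extra time of 120 is a made-up outlier added only to show how much a single extreme value can change the range.

```python
def range_of(values):
    """Range: the largest value minus the smallest value."""
    if not values:
        raise ValueError("the range of an empty list is undefined")
    return max(values) - min(values)

rat_times = [19, 21, 23, 23, 24, 25, 33, 33, 33]
print(range_of(rat_times))          # 14  (33 - 19)

# A single made-up extreme value transforms the range
print(range_of(rat_times + [120]))  # 101  (120 - 19)
```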

Standard deviation

Roughly speaking, this is the average distance of the scores from the mean. Essentially, the mean is calculated and every score is then compared to it, so the standard deviation uses all of the data. After a little jiggery-pokery (squaring each distance from the mean, averaging those squares, then taking the square root), we end up with a magical number.
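A sketch of the jiggery-pokery in Python: find the mean, take each score's distance from it, square those distances, average them, and take the square root. This version divides by n (the population standard deviation, `statistics.pstdev`); `statistics.stdev` divides by n - 1 for a sample, which gives a slightly larger value. The text does not say which version it intends, so both are shown. The data set is a made-up illustration chosen so that the answer is a whole number.

```python
import math
import statistics

def standard_deviation(values):
    """Population SD: square root of the mean squared distance from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("the standard deviation of an empty list is undefined")
    m = sum(values) / n
    squared_distances = [(x - m) ** 2 for x in values]  # squaring removes the minus signs
    return math.sqrt(sum(squared_distances) / n)        # average, then square root

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5

print(standard_deviation(data))  # 2.0
print(statistics.pstdev(data))   # same calculation from the standard library
print(statistics.stdev(data))    # sample version (divides by n - 1): about 2.14
```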

To explain what the standard deviation tells us, we can use the example of IQ. IQ tests are updated to ensure that the mean IQ stays at 100, with a standard deviation of 15.

If we drop one standard deviation (SD) below the mean to 85 (100 - 15) and move one standard deviation up to 115 (100 + 15), about 68% of our population will fall between these two numbers. Magically, this is not just true of IQ and SDs of 15! In any set of normally distributed data, about 68% will fall within one standard deviation of the mean!
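Assuming data that are exactly normally distributed, the 68% figure can be checked with Python's `statistics.NormalDist` (available from Python 3.8). The IQ mean of 100 and SD of 15 come from the text; real data will only match the figure approximately.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)
within_one_sd = iq.cdf(115) - iq.cdf(85)  # proportion between 85 and 115
print(round(within_one_sd, 4))            # 0.6827

standard = NormalDist()                   # mean 0, SD 1
print(round(standard.cdf(1) - standard.cdf(-1), 4))  # 0.6827: the same for any normal curve
```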

Standard deviation therefore doesn’t just tell us the spread; it allows us to quantify that spread.

The smaller the standard deviation, the closer our data is to the mean. It isn’t widely spread out; it is consistent. A large standard deviation tells us the opposite: the data is widely spread out around the mean, so it is inconsistent.
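To see what consistent and inconsistent look like, the Python sketch below compares two made-up sets of marks with the same mean but very different spreads. It uses the standard library's `statistics.pstdev` (the population SD).

```python
import statistics

consistent = [22, 23, 24, 24, 25, 26]    # made-up marks bunched around the mean
inconsistent = [10, 16, 24, 24, 32, 38]  # made-up marks with the same mean, far more spread

for label, marks in [("consistent", consistent), ("inconsistent", inconsistent)]:
    mean = statistics.mean(marks)
    sd = statistics.pstdev(marks)
    print(f"{label}: mean = {mean}, SD = {sd:.2f}")
# consistent: mean = 24, SD = 1.29
# inconsistent: mean = 24, SD = 9.31
```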

If I set you a mock (let’s say out of 35) and the average mark was 24 (dreaming ☺) with a standard deviation of 4, that would tell me that most people (about 68%, if the marks are roughly normally distributed) scored between 20 and 28. That’s a tightly packed set of data!

If the SD were 10, that would suggest a huge variety of scores, with about 68% falling between 14 and 34 and the other 32% even more widely spread than that.
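The mock-exam arithmetic is just the mean plus or minus one SD. `one_sd_interval` is a hypothetical helper, and the final lines assume the marks are normally distributed, which is what the 68% figure relies on.

```python
from statistics import NormalDist

def one_sd_interval(mean, sd):
    """Hypothetical helper: the scores from one SD below the mean to one SD above it."""
    return (mean - sd, mean + sd)

print(one_sd_interval(24, 4))   # (20, 28)
print(one_sd_interval(24, 10))  # (14, 34)

# Assuming marks are normally distributed with mean 24 and SD 4,
# the proportion expected between 20 and 28:
marks = NormalDist(mu=24, sigma=4)
print(round(marks.cdf(28) - marks.cdf(20), 2))  # 0.68
```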