Measures of central tendency
The first step in dealing with any set of numbers usually involves calculating an average. By average we usually mean the mean. However, you should remember from GCSE maths that there are three measures of central tendency, or average (mean, median and mode). You’ll need to know how to calculate each one, and also the good and bad points of each.
The mean is the sum of the items divided by the number of items. To calculate the mean, add your numbers up and divide by how many numbers there are.
So, for the five numbers 3, 8, 4, 5 and 5 it would be (3 + 8 + 4 + 5 + 5)/5, which equals 5. Getting a whole number was pure luck; you will not often end up with one!
The mean is most useful when the data approximates to a normal distribution.
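The add-up-and-divide recipe can be sketched in a few lines of Python. The scores below are made up purely for illustration and do not come from the example in the text.

```python
# Illustrative scores (invented for this sketch, not taken from the text).
scores = [4, 7, 9, 12]

total = sum(scores)   # add the numbers up
count = len(scores)   # how many numbers there are

mean_score = total / count
print(mean_score)  # 8.0
```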
To find the median, place your numbers in ascending (or descending) order and pick out the middle one. How much more central could you get? If there’s an even number of values, find the halfway point between the two middle numbers by adding them together and dividing by two.
In this example the median is a good measure of central tendency, as it reflects the average age of the group.
In this example, one child is replaced with an adult. The median age is still 12.5 years, but it fails to take into account the older person. The median is not sensitive to outliers or extreme values.
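To see how differently the median and the mean react to one extreme value, here is a minimal Python sketch. The ages are hypothetical, chosen only so that the median comes out at 12.5; they are not the figures from the original example.

```python
from statistics import mean, median

# Hypothetical ages (assumed for illustration only).
children = [11, 12, 12, 13, 13, 14]
with_adult = [11, 12, 12, 13, 13, 40]  # oldest child swapped for a 40-year-old

# All children: median and mean agree.
print(median(children), mean(children))                # 12.5 12.5

# One adult added: the median does not move, the mean is dragged upwards.
print(median(with_adult), round(mean(with_adult), 1))  # 12.5 16.8
```

The median only looks at the middle of the ordered list, so it cannot "see" how old the adult is. The mean uses every value, which is why it shifts.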
The mode is the number that occurs most often. In our first set 23 occurs twice so the mode is 23.
- Sometimes two numbers share the honours: in 19, 21, 21, 21, 24, 27, 33, 33, 33, 39 there are three 21s and three 33s. In this case we have two modes, a bimodal distribution. Often this could be the best way of describing our set of data. For example, the males in some species of fish tend to be either big or small, with few medium-sized ones. A bimodal distribution would best describe these.
Left: a bimodal distribution
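The two-mode case can be checked with Python's standard library. This sketch uses the ten values quoted above and assumes Python 3.8 or later, where `statistics.multimode` is available.

```python
from statistics import multimode

values = [19, 21, 21, 21, 24, 27, 33, 33, 33, 39]

# multimode returns every value that ties for most frequent.
modes = multimode(values)
print(modes)  # [21, 33]
```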
Measures of dispersion
Dispersion refers to the extent to which a set of data is spread out, or dispersed from the ‘average’. There are various measures of dispersion, such as range and standard deviation.
The range is the simplest measure of dispersion. Basically it is the difference between the largest and the smallest value in the data:
19, 21, 23, 23, 24, 25, 33, 33, 33
In the example above the range is 33 - 19 = 14.
Obviously this is easy to calculate, but it doesn’t consider all the data. Since it is based only on the greatest and smallest values, it is very, very easily distorted by outliers!
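The range calculation, and the effect of a single extreme score on it, can be sketched as follows. The first list is the set of nine values above; the extra score of 90 is a hypothetical outlier added only to show the effect.

```python
values = [19, 21, 23, 23, 24, 25, 33, 33, 33]

# Range = largest value minus smallest value.
value_range = max(values) - min(values)
print(value_range)  # 14

# Add one hypothetical extreme score and the range balloons.
with_outlier = values + [90]
outlier_range = max(with_outlier) - min(with_outlier)
print(outlier_range)  # 71
```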
The standard deviation is, roughly speaking, the average distance of scores away from the mean. Essentially a mean is calculated and every piece of data is then compared to that mean. As a result it uses all of the data. After a little jiggery-pokery involving squaring and square rooting, we end up with a magical number.
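The squaring and square rooting can be spelled out step by step. This is a minimal sketch using the nine values from the range example; it assumes the version of the formula that divides by the number of scores (some courses and calculators divide by one less than that, which gives a slightly larger answer).

```python
import math
from statistics import pstdev

scores = [19, 21, 23, 23, 24, 25, 33, 33, 33]

# Step 1: calculate the mean.
mean_score = sum(scores) / len(scores)  # 26.0

# Step 2: find each score's distance from the mean and square it
# (squaring stops the negatives cancelling out the positives).
squared_diffs = [(x - mean_score) ** 2 for x in scores]

# Step 3: average the squared distances (this is the variance).
variance = sum(squared_diffs) / len(scores)

# Step 4: square root to get back to the original units.
sd = math.sqrt(variance)
print(round(sd, 2))  # 5.21

# Cross-check against the standard library's population standard deviation.
print(math.isclose(sd, pstdev(scores)))  # True
```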
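To explain its value, we can use the example of IQ. Tests are updated to ensure that the mean IQ is maintained at 100. The standard deviation is 15, so about 68% of people score between 85 and 115 (within one standard deviation of the mean) and about 95% score between 70 and 130 (within two).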
Standard deviation therefore doesn’t just tell us the spread; it allows us to quantify that spread.
The smaller the standard deviation, the closer our data is to the mean. It isn’t widely spread out; it is consistent. A large standard deviation tells us the opposite: the data is widely spread out around the mean, so it is inconsistent.
If I set you a mock (let’s say out of 35) and the average mark was 24 (dreaming!) with a standard deviation of 4, that would tell me that most people (68% in fact) scored between 20 and 28. That’s a tightly packed set of data!
If the SD was 10 that would suggest there was a huge variety of scores, 68% falling between 14 and 34 and the other 32% being even more widely spread than that.
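The "68% of people" bands in the mock example are simply the mean minus and plus one standard deviation. The sketch below works them out and checks the 68% figure, assuming the marks follow a normal distribution. The helper names `one_sd_band` and `share_within` are made up for this illustration, and `NormalDist` needs Python 3.8 or later.

```python
from statistics import NormalDist


def one_sd_band(mean, sd):
    """Return the marks one standard deviation below and above the mean."""
    return mean - sd, mean + sd


def share_within(mean, sd, low, high):
    """Proportion of a normal distribution that falls between low and high."""
    dist = NormalDist(mean, sd)
    return dist.cdf(high) - dist.cdf(low)


# Mock exam with a mean of 24, first with SD 4 and then with SD 10.
for sd in (4, 10):
    low, high = one_sd_band(24, sd)
    print(sd, low, high, round(share_within(24, sd, low, high), 2))
# 4 20 28 0.68
# 10 14 34 0.68
```

The proportion is the same in both cases; what changes is how wide the band of marks has to be to capture it.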