Thursday, April 16, 2015

MEASURES OF CENTRAL TENDENCY

Measures of central tendency describe the "middle", or typical value, of a set of observations. The three most common measures of central tendency in statistics are: MEAN, MEDIAN and MODE. First, let us discuss what each is, and then when and how each can best be used.

MEAN 

Add up all the observations and divide by the total number of observations. This is also called the "arithmetic average".

EXAMPLE: 

5 students are asked their shoe size. Their shoe sizes are as follows: 6, 8, 9, 9 and 10. The sum is 6 + 8 + 9 + 9 + 10 = 42, which we divide by the number of students (5). So the MEAN is 42/5 = 8.4.
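The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the shoe size example; the variable names are made up for this sketch.

```python
# Shoe sizes of the 5 students in the example.
shoe_sizes = [6, 8, 9, 9, 10]

# Mean = sum of the observations / number of observations.
total = sum(shoe_sizes)   # 42
count = len(shoe_sizes)   # 5
mean = total / count

print(total, count, mean)  # prints: 42 5 8.4
```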

MEDIAN

Line up all of the observations in order and find the value that is in the middle: 

EXAMPLE: 

Using the same example on shoe size, line up the values in order: 6, 8, 9, 9, 10. The value in the middle (the third of the five) is 9.

So here the median is clearly 9, but it is not always that simple. What if we have two middle numbers as follows?

6, 7, 8, 9, 9, 10

8 and 9 are both in the middle, so which is the median? If there are two numbers in the middle, take the average = (8 + 9)/2 = 8.5. 

So in this second case, the median is 8.5...right between 8 and 9, the two middle numbers. 
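Both cases (one middle value, or two middle values that get averaged) can be sketched as a small Python function. The function name `median` and its structure are choices made for this illustration, and it assumes a non-empty list of numbers.

```python
def median(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)      # line up the observations in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: exactly one value sits in the middle.
        return ordered[mid]
    # Even number of observations: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Five shoe sizes: a single middle value.
odd_case = median([6, 8, 9, 9, 10])

# Six shoe sizes: the two middle values are 8 and 9.
even_case = median([6, 7, 8, 9, 9, 10])

print(odd_case, even_case)  # prints: 9 8.5
```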

It can also be tricky if you are using ordinal level data rather than ratio level data. For example, consider the following table showing the number of responses for different numbers of TV hours watched per week:


TV hours watched this week    Number of people
0-5                           4
6-9                           2
10+                           1


Let's line up everyone in a row: 

Four people watched "0-5", two watched "6-9" and one watched "10+": 0-5, 0-5, 0-5, 0-5, 6-9, 6-9, 10+.

REMEMBER: the median is the value that corresponds to the middle observation, not the position of the middle observation itself! With seven people lined up, the 4th person is in the middle, so some students are tempted to say that the median is "4". This is not the case! The 4th person's answer was "0-5", so the median here is "0-5", not "4".
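The difference between the middle position and the middle value can be seen in a short Python sketch. It assumes the three categories are listed in their natural order and uses the counts from the table (4, 2 and 1); the variable names are made up for this illustration.

```python
# Categories in order, each with the number of people who chose it.
tv_hours = [("0-5", 4), ("6-9", 2), ("10+", 1)]

# Line everyone up: repeat each category once per person.
lineup = [category for category, people in tv_hours for _ in range(people)]

# With an odd number of people there is a single middle position.
middle_index = (len(lineup) - 1) // 2      # 0-based index
middle_position = middle_index + 1         # 1-based position: the 4th person

# The median is the VALUE at that position, not the position number.
median_category = lineup[middle_index]

print(middle_position, median_category)  # prints: 4 0-5
```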

FOR MORE ON THIS SEE THE FREQUENCY TABLE PAGE.

MODE

The mode is simply the most common (most frequent) response. 

EXAMPLE:

In the example above on TV hours, the mode is "0-5 hours". More people watched 0-5 hours than any other category. 

There can also be more than one mode in a given dataset. Consider the following table of ages of season 19 contestants on The Bachelor:

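Name                  Age
Mackenzie             21
(name missing)        24
Megan                 24
Alissa Giambrone      24
Jordan Branch         24
(name missing)        24
Jillian Anderson      25
(name missing)        25
(name missing)        25
Michelle              25
(name missing)        26
Becca Tilley          26
(name missing)        26
Tara Eddings          26
(name missing)        26
(name missing)        27
Samantha Steffen      27
Jade Roper            28
(name missing)        28
Kimberly Sherbach     28
(name missing)        28
(name missing)        29
(name missing)        29
(name missing)        29
(name missing)        29
(name missing)        29
(name missing)        30
(name missing)        30
Nicole                31
Trina Scherenberg     33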

Five were 24, five were 26 and five were 29. So there are 3 modes: 24, 26 and 29. This dataset is said to be "multi-modal".
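Finding every mode means counting how often each value appears and keeping all the values tied for the highest count. Here is a minimal Python sketch using the 30 ages from the table; the variable names are made up for this illustration.

```python
from collections import Counter

# The 30 contestant ages from the table, grouped by value.
ages = ([21] + [24] * 5 + [25] * 4 + [26] * 5 + [27] * 2
        + [28] * 4 + [29] * 5 + [30] * 2 + [31, 33])

# Count how many times each age appears.
counts = Counter(ages)

# Keep every age that is tied for the highest count.
highest = max(counts.values())
modes = sorted(age for age, n in counts.items() if n == highest)

print(highest, modes)  # prints: 5 [24, 26, 29]
```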

WHEN TO USE MEAN, MEDIAN AND MODE:

Mean, median and mode are all "averages"; however, when people say "average" they are often talking about the mean, or "arithmetic average". So how do you know when to use each one?

  • The mean is the most sensitive to outliers, so if you have a dataset with very extreme cases (or observations) you may be better off using the median. 
INCOME is a classic example. If Bill Gates walks into a room with 6 other people who each make $50,000 a year, the mean income could be over $50 million. But that is not a very accurate picture of what the typical person makes! The median and mode would still be $50,000 with or without Bill Gates, and either is a better representation of what is typical!

  • The mean is not an option for nominal and ordinal level data (see the page on levels of measurement for more on this). 
How can you compute the mean of 3 cats, 4 dogs and 5 chickens (nominal), or even the mean TV hours in the example above (ordinal: the categories are not equal in width, and "10+" has no upper limit)? 

  • Median and mode can be used with ordinal level data
  • Mode can be used with nominal data
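To see how strongly one outlier pulls the mean in the income example, here is a small Python sketch using the standard `statistics` module. The $1 billion figure is a made-up stand-in chosen only for illustration, not Bill Gates's real income, and the variable names are hypothetical.

```python
from statistics import mean, median, mode

# Six people who each make $50,000 a year (from the income example).
others = [50_000] * 6

# ASSUMPTION: an illustrative, made-up income of $1 billion a year.
hypothetical_gates_income = 1_000_000_000

room = others + [hypothetical_gates_income]

# Compare the three measures without and with the outlier.
for label, incomes in [("without", others), ("with", room)]:
    print(label, mean(incomes), median(incomes), mode(incomes))
```

Under this assumption the mean jumps to about $142.9 million, while the median and mode both stay at $50,000.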

TABLE: WHICH MEASURES OF CENTRAL TENDENCY CAN BE USED WITH WHICH LEVEL OF MEASUREMENT

Can the measure be used?

Level of measurement    Mean    Median    Mode
Nominal                 No      No        Yes
Ordinal                 No      Yes       Yes
Ratio                   Yes     Yes       Yes

SEE MORE ON THE PAGE ON LEVELS OF MEASUREMENT.
