Thursday, May 7, 2015

MEASURES OF VARIANCE

HOW FAR SPREAD OUT SOMETHING IS 

This unit explains different measures of variance, which describe how spread out a dataset is. They include:

  • The range
  • Interquartile range
  • Standard deviation
  • Variance (this one seems obvious!)

After you complete this page and the quizzes on it, you should have a pretty solid foundation for understanding measures of variance.

If you think you may already be a pro at this, just skip the explanations and go right to the quizzes. If you pass all the quizzes, you may be ready to move on! 

Range = The biggest number - the smallest ("Max-Min")
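
Here is a minimal Python sketch of the "Max-Min" idea. The function name `value_range` and the sample numbers are made up for illustration (they are not the quiz data), so treat this as one possible way to write it:

```python
def value_range(data):
    """Return the biggest number minus the smallest for a non-empty dataset."""
    return max(data) - min(data)


# Hypothetical example data, chosen only for illustration.
sample = [4, 8, 15, 16, 23, 42]
print(value_range(sample))  # 42 - 4 = 38
```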

Pretty easy...Why don't you have a go at it:

What is the range of the following dataset?

2,3,3,3,4,5,5,6,7,9,10

    2 to 10
    9
    8
    7
    6



How did you do?

Interquartile Range = The 3rd quartile - the 1st quartile.

So, what's a quartile? It is literally a quarter (like 25 cents). So the first thing to do is to identify what 1 quarter of the data is. 

Look at this little dataset: 1,1,2,3,3,3,3,4,5,6,8,9. 

There are 12 observations, so a quarter (or 1/4) of 12 is 3. 

This means that 1,1,2 are the first quarter (the first 25%) of the dataset.

3,3,3 are the next (2nd) quarter.

3,4,5 are the 3rd quarter.

And 6,8,9 are the fourth or last quarter. 

However, in statistics we often talk about quartiles instead of quarters. A quartile is that infinitesimally small point "between" one quarter and the next. So the first quartile is right between 2 and 3. In this case, we average the two numbers: (2 + 3) / 2 = 2.5.

The 1st quartile is 2.5

The 2nd quartile is 3

The 3rd quartile is 5.5 (right between 5 and 6)

And...wait for it...There is no "4th quartile"! At least not that we talk about in statistics. Some people challenge this, and I suppose there is a theoretical 4th quartile right after the last number, but in this case, we don't know what the number after that is, so we can't average it anyway...

Although a few people want to talk about a "4th" quartile, you will almost never see it pop up, so no worries!

Now, we have the 1st, 2nd and 3rd quartiles. 
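
The "point between two quarters" idea can be sketched in Python as follows. The function name `quartiles_by_quarters` is made up for this page, and the sketch only handles datasets whose size divides evenly by 4, like the 12-observation example above:

```python
def quartiles_by_quarters(data):
    """Return (Q1, Q2, Q3) as the midpoints between adjacent quarters.

    Assumption: the number of observations is a multiple of 4.
    """
    values = sorted(data)
    n = len(values)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch needs a dataset whose size is a multiple of 4")
    size = n // 4  # number of observations in each quarter
    # Each quartile is the average of the last value in one quarter
    # and the first value in the next quarter.
    return tuple(
        (values[k * size - 1] + values[k * size]) / 2
        for k in (1, 2, 3)
    )


data = [1, 1, 2, 3, 3, 3, 3, 4, 5, 6, 8, 9]
q1, q2, q3 = quartiles_by_quarters(data)
print(q1, q2, q3)  # 2.5 3.0 5.5
```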


SIDE NOTE: It turns out that the 2nd quartile (sometimes called the "middle" quartile) is also the median. (Remember, the median is the number right in the middle. So is the 2nd quartile!)

Now that you know the quartiles, the interquartile range is very straightforward: find the 3rd quartile and the 1st quartile, then subtract the 1st from the 3rd.

Interquartile range = the 3rd quartile - the 1st quartile. 
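
As a quick check, here is the subtraction applied to the 12-observation dataset from earlier (Q1 = 2.5 and Q3 = 5.5). The helper name `interquartile_range` is just an illustrative choice:

```python
def interquartile_range(q1, q3):
    """The IQR is the 3rd quartile minus the 1st quartile."""
    return q3 - q1


# Quartiles of 1,1,2,3,3,3,3,4,5,6,8,9 as worked out in the text.
print(interquartile_range(2.5, 5.5))  # 3.0
```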

NOTE OF CAUTION: In the example above we had 12 observations, and 4 divides evenly into 12. But that is not always the case. Consider this mini dataset:

4,5,6,6,6,7,7,8,9,9,10

Here we have 11 observations, and 11/4=2.75, so it is a little harder to break it up into quartiles (4ths). To do this, first find the median:


4,5,6,6,6,7,7,8,9,9,10

Median=7. 

Now divide the dataset into two smaller datasets, including the median in each:

4,5,6,6,6,7
              7,7,8,9,9,10

Now, find the median of each half (remember to include the median in each):

4,5,6,6,6,7: the two middle numbers are 6 and 6, so (6+6)/2=6

So Q1=6.

7,7,8,9,9,10: the two middle numbers are 8 and 9, so (8+9)/2=8.5

So Q3=8.5

Now we have 
Q1=6
Q2=7 (the median)
Q3=8.5

Can you compute the interquartile range? Remember it is just Q3-Q1 :)

IQR: 8.5-6=2.5
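
The median-splitting steps above can be sketched in Python like this. The function name `quartiles_by_halves` is made up for this page. For an odd number of observations it includes the median in both halves, as described above. For an even number there is no single middle value, so the sketch simply splits the data down the middle, which is an assumption on my part. Keep in mind that software packages use several different quartile rules and may give slightly different answers.

```python
from statistics import median


def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) using the median of each half of the data."""
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    half = n // 2
    if n % 2 == 1:
        # Odd count: the median value is included in both halves.
        lower = values[:half + 1]
        upper = values[half:]
    else:
        # Even count (assumption): split straight down the middle.
        lower = values[:half]
        upper = values[half:]
    return median(lower), median(values), median(upper)


data = [4, 5, 6, 6, 6, 7, 7, 8, 9, 9, 10]
q1, q2, q3 = quartiles_by_halves(data)
print(q1, q2, q3)  # 6.0 7 8.5
print(q3 - q1)     # 2.5
```

Run on the earlier 12-observation dataset, the same function returns 2.5, 3 and 5.5, matching the quartiles found by hand.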

The interquartile range is often shown visually through a graphic known as the boxplot, or box and whisker plot.

(Figure: box and whisker plot explained)

Notably, the "min" and "max" lines usually exclude outliers, so they may not match up with the very smallest and very largest numbers. The exact rule differs between software packages, but a common default draws the "max" line at the largest value that is no more than 1.5 times the IQR above the 3rd quartile, and the "min" line at the smallest value that is no less than 1.5 times the IQR below the 1st quartile.
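
To make the outlier idea concrete, here is a rough Python sketch of one widely used convention: the whisker limits sit 1.5 times the IQR beyond Q1 and Q3. Because packages differ, the multiplier is left as a parameter `k`. The function name `whisker_ends` and the extra value of 20 in the second example are made up for illustration.

```python
def whisker_ends(data, q1, q3, k=1.5):
    """Return (min line, max line, outliers) for a boxplot.

    Assumption: values more than k * IQR below Q1 or above Q3 count as outliers.
    """
    iqr = q3 - q1
    low_limit = q1 - k * iqr
    high_limit = q3 + k * iqr
    inside = [x for x in data if low_limit <= x <= high_limit]
    outliers = [x for x in data if x < low_limit or x > high_limit]
    # The whiskers stop at the most extreme values that are not outliers.
    return min(inside), max(inside), outliers


# The 11-observation dataset from the text, with Q1=6 and Q3=8.5.
data = [4, 5, 6, 6, 6, 7, 7, 8, 9, 9, 10]
print(whisker_ends(data, 6, 8.5))  # (4, 10, [])

# Hypothetical: the same data plus a value of 20 (Q1=6, Q3=9 by the halves method).
print(whisker_ends(data + [20], 6, 9))  # (4, 10, [20])
```

In the second example the biggest number is 20, but the "max" line still sits at 10 because 20 is treated as an outlier.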

Here the boxplot is turned sideways, but it can be shown vertically as well: