Wednesday, September 28, 2016

Z SCORES FOR SAMPLE MEANS

CAVEAT

Hopefully everything from the Descriptive vs. Inferential statistics module makes sense. If not, consider reviewing it again! We are going to take that information and combine it with what you know about standard deviation to move onward!

Let's start today with a video overview of what we will be discussing:

{VIDEO}

Introduction

In the lesson on Z Scores, we learned that a Z Score is nothing scary--it just means "number of standard deviations from the mean". So, what if we have a dataset with a mean of 50 and a standard deviation of 15? What is the Z Score of an observation in that dataset with the value 80?

Take a quiz to see if you remember!

{Quiz. A: Z Score=2}

That quiz is a reminder that Z Score is nothing scary. It just means "number of standard deviations from the mean!" Sometimes we have to first figure out how far away something is from the mean, but it is otherwise quite straightforward!
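
If you like to check your work with a little code, here is a minimal sketch in Python of the quiz question. The function name z_score is just a label chosen for this illustration, not something from the lesson:

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` sits from `mean`."""
    return (value - mean) / sd


# The quiz: mean of 50, standard deviation of 15, observation of 80.
quiz_z = z_score(80, mean=50, sd=15)
print(quiz_z)  # 2.0
```

Notice the two steps hiding in that one line: first find how far the observation is from the mean (80-50=30), then divide by the standard deviation (30/15=2).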

"So, how can we combine this with inferential statistics?"

I'm glad you asked! We can do the same thing with sample means that we can with observations in a sample.

Observations in a Sample vs. Sample Means in a Distribution

^^^^ That subtitle is a little intense! Let's break it down into English. (As always, this is about translating from Statistics language to English--or really whatever language you are learning stats in!).

Examples usually help make the point:

EXAMPLE OF OBSERVATIONS IN A SAMPLE

This one should seem familiar. Suppose we have a classroom of 10 students (it is a small classroom). They took a stats test and got the following scores:

98, 92, 90, 95, 88, 94, 95, 100, 99, 98

(They are very good students)

If we want to know the Z Score of the first observation (98), what do we do? What we have been doing all along! 

Let's try again (just for fun!).

Start by adding up the observations and dividing by the number of observations:

Xi
98
92
90
95
88
94
95
100
99
98
94.9<--mean!

Now, subtract 94.9 (the mean) from each observation to get the difference of each observation from the mean (that is the heart and soul of variance!).

Xi Xi-Xbar
98 3.1
92 -2.9
90 -4.9
95 0.1
88 -6.9
94 -0.9
95 0.1
100 5.1
99 4.1
98 3.1
Now, add up those differences. REMEMBER, THIS IS A CHECKPOINT AND SHOULD COME OUT TO ZERO, MEANING WE HAVE SUCCESSFULLY BALANCED THE SEESAW!

Xi-Xbar
3.1
-2.9
-4.9
0.1
-6.9
-0.9
0.1
5.1
4.1
3.1
-5.6843E-14

Wait! The sum is not zero, but -0.000000000000056843! 

All is well! In Stats language, we call this "rounding error". It means that somewhere along the way the numbers were rounded (here, the software doing the arithmetic can only store decimals like 94.9 to a limited number of digits), so our numbers are not exact. That is fine! As long as that number is tiny, we are still OK.

*The very aspiring and astute student will usually ask at this point, "How tiny is tiny?" As a rule of thumb, if this number is less than 0.01% of the mean, you are (probably) OK! So take your mean, multiply by .0001 and if the difference is less than that, chances are it is just rounding error!*
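
Here is a small sketch of that checkpoint in Python, using the ten test scores from this example. The variable names are made up for this illustration, and the exact leftover value you see may differ from the one shown in the table depending on your software:

```python
scores = [98, 92, 90, 95, 88, 94, 95, 100, 99, 98]

mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]

# Checkpoint: the deviations should balance the seesaw (sum to about zero).
total = sum(deviations)

# Rule of thumb from the lesson: "tiny" means less than 0.01% of the mean.
tolerance = 0.0001 * mean
is_rounding_error = abs(total) < tolerance

print(total)
print(is_rounding_error)
```

If the last line prints True, the leftover is small enough to treat as rounding error under the rule of thumb.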

Next, square the differences so that the summed differences don't add up to only zero!

Xi      Xi-Xbar        diff^2
98      3.1            9.61
92      -2.9           8.41
90      -4.9           24.01
95      0.1            0.01
88      -6.9           47.61
94      -0.9           0.81
95      0.1            0.01
100     5.1            26.01
99      4.1            16.81
98      3.1            9.61
94.9    -5.6843E-14    142.9    <--mean, sum of differences, sum of squared differences!

So, we have a sum of squared differences equal to 142.9!

Now, how do we compute variance? Divide by the number of observations (n)!

142.9/10=14.29

14.29 = variance

How do we get standard deviation? Variance = S squared! That is, variance is S (standard deviation) squared! So, just take the square root of variance to get S (standard deviation). 

sq rt of 14.29=3.78! This is standard deviation! 

Now, to get the Z Score--the number of standard deviations from the mean--we need to know how far the observation of interest (98) is from the mean:

98-94.9=3.1

So, how do we find out how many standard deviations 3.1 is? The same way you find out how many feet 24 inches is--DIVIDE! 

3.1/3.78=0.82

So, 0.82 is the number of standard deviations the 98 is from the mean! We can translate it like this: Z Score of 98 = 0.82. 

So, the Z Score is 0.82! 
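
For those who want to see every step in one place, here is a rough sketch in Python that follows the same recipe: mean, differences, squared differences, divide by n, square root, then divide the distance by the standard deviation. The variable names are chosen just for this illustration:

```python
import math

scores = [98, 92, 90, 95, 88, 94, 95, 100, 99, 98]
n = len(scores)

mean = sum(scores) / n                              # add up, divide by n
squared_diffs = [(x - mean) ** 2 for x in scores]   # (Xi - Xbar)^2
sum_sq = sum(squared_diffs)                         # sum of squared differences
variance = sum_sq / n                               # descriptive: divide by n
sd = math.sqrt(variance)                            # standard deviation
z = (98 - mean) / sd                                # Z Score of the first observation

print(round(mean, 1), round(sum_sq, 1), round(variance, 2), round(sd, 2), round(z, 2))
# 94.9 142.9 14.29 3.78 0.82
```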

EASY! (?)

EXAMPLE OF SAMPLE MEANS IN A DISTRIBUTION

Now, suppose we want to know the last stats exam score for every college student in America! It just became unfeasible to collect all of those scores! It will cost too much (time, money, etc). This means you will have to take the largest sample you can and hope it comes close to the "true" population average! 

OF UTMOST IMPORTANCE--> As the video shows, choosing the largest sample will give you the best chance of getting a sample average that matches the "true" population average. This is due to the "Law of Large Numbers" and the Central Limit Theorem (CLT)! More on these later...

Think back to the video now. Imagine 100,000 test scores (I won't put a column that big in this module, so your imagination will have to do!). Let us suppose that you can afford to do a sample of 100 students. You also have taken a stats class and know that you should take a random sample to give yourself the best chance at that sample average matching the "true" population average. 

Random = All elements in the population have a non-zero chance of being chosen and all elements have an equal chance of being chosen. 

Because we are taking a random sample, you never know which 100 observations will be chosen. This means that there will be some variation in the sample's average compared to other possible samples. In the video, we used a simulator to show that a sample of a given size will be a little different each time if you take repeated samples. 

This means that (drum roll...) SAMPLE MEANS HAVE A DISTRIBUTION THEMSELVES!
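
If you want to watch this happen yourself, here is a minimal simulation sketch in Python. Everything in it is made up for illustration: the 100,000 "test scores" are random numbers between 40 and 100, not real data, and the seed is only there so the run is repeatable:

```python
import random
import statistics

random.seed(2016)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 made-up test scores between 40 and 100.
population = [random.randint(40, 100) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Take 10 random samples of 100 students each and record each sample's mean.
sample_means = [
    statistics.fmean(random.sample(population, 100))
    for _ in range(10)
]

print(round(true_mean, 2))
print([round(m, 2) for m in sample_means])
```

Each of the 10 sample means should land near the population mean, but they will not all be the same number. That spread is the distribution of sample means.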


So, now picture that we take 10 samples of 100 (10 samples of 100 observations each) and we record the average for each sample of 100. The first has an average score of 78, the second has an average of 72, the third of 70 and so on.

Here are the averages for all 10 samples of 100:

78, 72, 70, 75, 68, 74, 75, 80, 79, 78

THERE IS GOOD NEWS AND BAD NEWS!

The good news is that we can still compute Z Scores in the same GENERAL way we have been doing! The bad news is that we now have to make some adjustments because we are not collecting all the information in the population--we are trying to estimate it.

When we estimate, we must adjust.
When we estimate, we must adjust.
When we estimate, we must adjust. 


Let's first look at what we do exactly the same way as what we have been doing:

Means   Mean-Avg. of the means   diff^2
78      3.1                      9.61
72      -2.9                     8.41
70      -4.9                     24.01
75      0.1                      0.01
68      -6.9                     47.61
74      -0.9                     0.81
75      0.1                      0.01
80      5.1                      26.01
79      4.1                      16.81
78      3.1                      9.61
74.9    -5.68434E-14             142.9

Notice that there are slight modifications to the headings because we now have Means in column 1. We still take the average of that column just like always and it gets a little awkward in the wording because we now have a mean of the means...


To slip out of that awkward wording, we call it THE GRAND MEAN!

The Grand Mean!
Here it is again, with the new title:

Means   Mean-Grand Mean   diff^2
78      3.1               9.61
72      -2.9              8.41
70      -4.9              24.01
75      0.1               0.01
68      -6.9              47.61
74      -0.9              0.81
75      0.1               0.01
80      5.1               26.01
79      4.1               16.81
78      3.1               9.61
74.9    -5.68434E-14      142.9


Notice that everything up to this point is just as you are used to.

However, when we divide the sum of squared differences (142.9), we must make some of those adjustments!

When we estimate, we must adjust!

Because we took samples, and did not get information from the whole population, there is going to be some error in our variance and standard deviation if we just divide 142.9 by the number of means (10). This is where "n-1" comes into play! We have 10 means, but we can subtract 1 as an adjustment that will make the variance and standard deviation closer to the "true" population variance and standard deviation.

There will be a quiz in a minute, so be sure to take note:

FOR SD OF OBSERVATIONS IN A SAMPLE, DIVIDE BY N
FOR SD OF MEANS IN A DISTRIBUTION, DIVIDE BY N-1!

DESCRIPTIVE SD = DIVIDE BY N
INFERENTIAL SD = DIVIDE BY N-1!

So 142.9/10 gives variance =14.29
BUT 142.9/9=15.88

***Some of you will be asking, "How do we know that dividing by n-1 gives us a better estimate of the "true" population standard deviation than dividing by just n?" Here is proof:

Option 1




Option 2, think of this as more STATS MAGIC! 


If you have all elements of the population, divide by n, if not, divide by n-1!

Most students prefer Option 2...***
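
For the curious, here is one more way to convince yourself, sketched as a simulation in Python. It is not a proof, just an experiment under made-up assumptions: a hypothetical population of 10,000 random scores, samples of only 5, and 20,000 repeated samples. It compares the average result of dividing by n with the average result of dividing by n-1:

```python
import random

random.seed(1)  # fixed seed so the run is repeatable


def sum_sq_diffs(values):
    """Sum of squared differences from the mean, as in the tables above."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


# Hypothetical population of 10,000 made-up scores (mean near 75, SD near 10).
population = [random.gauss(75, 10) for _ in range(10_000)]
true_variance = sum_sq_diffs(population) / len(population)

n = 5
trials = 20_000
total_div_n = 0.0
total_div_n_minus_1 = 0.0
for _ in range(trials):
    sample = random.sample(population, n)
    ss = sum_sq_diffs(sample)
    total_div_n += ss / n
    total_div_n_minus_1 += ss / (n - 1)

avg_div_n = total_div_n / trials
avg_div_n_minus_1 = total_div_n_minus_1 / trials

print(round(true_variance, 1))
print(round(avg_div_n, 1), round(avg_div_n_minus_1, 1))
```

In a run like this, dividing by n tends to come out too small on average, while dividing by n-1 tends to land much closer to the "true" variance.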

Now that we have 142.9/9=15.88, we know the variance.

We can take the square root and have standard deviation! sq rt of 15.88=3.98!
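
Here is a short sketch of the same arithmetic using Python's built-in statistics module, where pvariance divides by n and variance/stdev divide by n-1. The last line, the Z Score of the first sample mean (78), is an extra illustration that goes one step beyond the lesson's worked numbers:

```python
import statistics

sample_means = [78, 72, 70, 75, 68, 74, 75, 80, 79, 78]

grand_mean = statistics.mean(sample_means)          # the Grand Mean
var_n = statistics.pvariance(sample_means)          # 142.9 / 10
var_n_minus_1 = statistics.variance(sample_means)   # 142.9 / 9
sd_n_minus_1 = statistics.stdev(sample_means)       # square root of 142.9 / 9

# Illustration only: Z Score of the first sample mean, using the n-1 SD.
z_of_78 = (78 - grand_mean) / sd_n_minus_1

print(round(grand_mean, 1), round(var_n, 2), round(var_n_minus_1, 2),
      round(sd_n_minus_1, 2), round(z_of_78, 2))
# 74.9 14.29 15.88 3.98 0.78
```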

Now you try...

{quiz}

Hopefully that quiz went well. If not, don't worry, you will get more practice!

Means have a distribution

Some of you will have caught on by now, but sample means have a distribution just like observations in a sample! Remember the bell curve?

{Image: z-score normal distribution}

It still works! It is mostly a theoretical distribution, but if you took all possible samples of a given size, the sample averages would make this same shape! Most would be close to the "true" population mean, right in the center. Few would be much larger or much smaller than the "true" population mean. Everything that holds true for observations within a sample also holds true for the averages of samples in a theoretical distribution of all possible samples. The standard deviation is computed (almost) the same way (just remember n-1 instead of n!), and the standard deviations still correspond to the same respective areas under the curve at each point. 

Now is a good time to memorize some of those main areas:

Area between -1 and 1 = 68.3%
Area between -2 and 2 = 95.4%
Area between -3 and 3 = 99.7%
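
If you ever forget these, they can be recomputed. Here is a minimal sketch in Python that uses the error function (math.erf) from the standard library; the area between -k and +k standard deviations under the normal curve is erf(k/√2). The function name area_within is made up for this illustration:

```python
import math


def area_within(k):
    """Area under the standard normal curve between -k and +k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, f"{area_within(k):.1%}")
# 1 68.3%
# 2 95.4%
# 3 99.7%
```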

Take a second to commit them to memory, then try a quiz!

{quiz}

How did that go? You will definitely be seeing more of those in the future so be sure to memorize them if you didn't get them correct on the quiz!


Population parameters:

The theoretical distribution of sample means has a standard deviation, but because it is a theoretical distribution, we almost never know what it "truly" is. We estimate it like this: standard deviation of your sample divided by the square root of the number in your sample. 

In stats language: s/(√n)

{quiz. How do you write: standard deviation of your sample divided by the square root of the number in your sample in Stats language?}. 

Great! This standard deviation also has a special name: "standard error".

Standard error = standard deviation of the theoretical distribution of all possible sample means ≈ standard deviation of your sample divided by the square root of the number in the sample = s/(√n)

≈ (means that it is roughly equivalent. In this case, it is an estimate!)
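
As a quick sketch of s/(√n) in Python: the example below reuses the ten classroom test scores and, purely as an assumption for illustration, pretends they are a random sample from a bigger population. The function name standard_error is made up here; statistics.stdev is the built-in that divides by n-1:

```python
import math
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    s = statistics.stdev(sample)        # sample SD (divides by n-1)
    return s / math.sqrt(len(sample))


# Assumption for illustration: treat the ten scores as a random sample.
sample = [98, 92, 90, 95, 88, 94, 95, 100, 99, 98]
se = standard_error(sample)
print(round(se, 2))  # 1.26
```

Notice that the standard error (1.26) is smaller than the standard deviation of the sample (3.98). Sample means bounce around less than individual observations do.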

Conclusions

  • When we have a really big population, and it is not feasible to get information from every single person, we can take a sample and use it to estimate the true population mean!
  • We can take a sample of a given size and compute the mean of that sample in the same way we have been computing the mean all along!
  • The larger the sample, the more likely it will be to be a close estimate of the "true" population mean.
  • We can compute Z Scores for sample means the same way we computed Z Scores for observations in a sample--the only difference is that we divide the sum of squared differences by n-1 instead of just n. That helps make it a more accurate estimate of the "true" population standard deviation.
  • There is a theoretical distribution of all possible random samples of a given size. It has the standard normal shape with standard deviations that correspond to the same respective areas under the curve as the standard normal distribution you have learned about already!
  • The theoretical distribution of all possible random samples of a given size has a standard deviation that is almost never known! You can estimate it by computing the standard deviation of your sample and dividing by the square root of your sample size! This standard deviation has a special name=standard error!

DESCRIPTIVE VS. INFERENTIAL STATISTICS

Introduction to inferential statistics


In the 1991 hit movie "What About Bob?", psychiatric patient Bob Wiley tells his therapist something like, "There are only two kinds of people in the world, those who love Neil Diamond and those who don't."

Some might call Bob's statement a "false dichotomy", but there really are only two kinds of statistics: Those that DESCRIBE and those that INFER. 








 
So far, we have been dealing with those that describe. Now, we are going to start talking about statistics that infer! (It is a great day!)

Descriptive statistics are very nice to us...they don't care about anything other than describing the data that we have in front of us. 

The real fun comes when we use statistics to scientifically predict things. When I was a child, I always wanted to become a weather person...I dreamed about how everyone would love me because of my ability to predict the weather. Almost like magic, I would predict impending weather disasters and save entire cities. Well, I became a statistician, and you are becoming one too! So, we can do similar things (and, sadly, sometimes get it just as wrong as the weather people :(  More on this later...)

Samples: Because collecting data from everyone is too costly!


The entire group of people* you want to study is called the "population". If you can get information from everyone in your population, you are all set! There is nothing else to do besides the descriptive statistics we have been learning: mean, median, mode (measures of central tendency), standard deviation, variance, skew, kurtosis, and maybe some nice visuals: bar chart, pie chart, histogram, frequency table and so on.

Those things tell you all about--or describe--your population.

So, where do inferential statistics come in? We need inferential statistics when we are no longer able to gather data from the entire population!


Gathering data from every person in the entire population is called a "census". Hence the name of the U.S. Census that happens every 10 years--they are trying to gather data from everyone!

Q. So, why not gather data from the whole population?
A. It is too costly!

If you can afford to collect data from every member of your population (the group of people that you want to study), then do it! Usually, that is not the case.

{Quiz}

If you need to know how many people in your office prefer cheese pizza, pepperoni, or black olive, then what is your population?

{Quiz}

Thanks for taking that quiz! Hopefully by now you realize that your population is the group of people you want to study!

After you ask everyone in your office, you might have a frequency table like this:

              Fx    %
Cheese        5     35.7%
Pepperoni     7     50.0%
Black Olive   2     14.3%
TOTAL         14    100.0%

Now, there is nothing else to do! You know you will need 35.7% of your pizza to be cheese, 50% to be pepperoni and so on (hopefully the pizza place is good at math!).
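
Here is a small sketch of how that frequency table could be built in Python with collections.Counter. The list of answers is reconstructed from the counts in the table (5, 7 and 2), and the variable names are just for illustration:

```python
from collections import Counter

# One answer per person in the office, rebuilt from the counts in the table.
answers = ["Cheese"] * 5 + ["Pepperoni"] * 7 + ["Black Olive"] * 2

counts = Counter(answers)
total = sum(counts.values())
percents = {pizza: round(100 * count / total, 1) for pizza, count in counts.items()}

for pizza, count in counts.items():
    print(f"{pizza:<12}{count:>3}{percents[pizza]:>7.1f}%")
print(f"{'TOTAL':<12}{total:>3}{100.0:>7.1f}%")
```

Because everyone in the office answered, these percentages describe the whole population. There is nothing to estimate.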

But what if your population is on a bigger scope? Sociologists, doctors and others are often interested in national trends (or even global).

So, what if your question is: "What are the pizza preferences OF AMERICANS?"

Imagine that. Really imagine that. If your next assignment said, "Find out the pizza preference of each and every American," how would you do it? A survey would cost you THOUSANDS of dollars even if you had already identified every American. Even then, perhaps only 10% will respond without an incentive. Maybe if you gave everyone a $25 gift card for participating, you could get 50% of people to respond. That would cost you OVER $4 BILLION! Let's make it a $5 gift card and assume we still get a 50% response rate (which we probably won't). That is one-fifth of the cost, so it is only (*sarcasm alert*) going to cost you over $800,000,000. All of this isn't even taking into account the time it will take to do the surveys and analyze the data or what it will cost to hire staff to do all of those things.
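
Here is a back-of-the-envelope sketch of the gift card math in Python. The population figure is an ASSUMPTION that does not come from this lesson (roughly 323 million Americans, an approximate number for 2016), so treat the results as ballpark only:

```python
# ASSUMPTION: roughly 323 million Americans (approximate 2016 figure, not from the lesson).
population = 323_000_000
response_rate = 0.50  # the lesson's optimistic 50% response rate


def incentive_cost(gift_card_dollars):
    """Total gift card cost if `response_rate` of the population responds."""
    respondents = population * response_rate
    return respondents * gift_card_dollars


print(f"${incentive_cost(25):,.0f}")  # $4,037,500,000
print(f"${incentive_cost(5):,.0f}")   # $807,500,000
```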

Bottom line: We need a better way!

It turns out that we can estimate (key word alert!) ESTIMATE these preferences by taking a sample, a smaller number of people from our total population.

YOU WILL NOW BE LEARNING 2 RULES OF SAMPLING IN DETAIL:

1) If you use a random sample, you can use it to estimate the "true" statistics of your population (The "true" statistics of your population are called parameters. So we have Sample Statistics and Population Parameters! SS, PP).

2) The larger your sample is, the more confident you are that it is a good reflection of the population parameters.

Now, allow me to translate into English from Statistics language using the pizza example:

When you cannot get information about pizza preferences for every single person in the group you are interested in studying, you can use a smaller number of people to estimate pizza preferences for the large group! Using this smaller group means that we may or may not get numbers that match the large group. But we can improve our chances of it by doing two things. First, choose your smaller group without any "rhyme or reason". More specifically, you need to use a method of selection that gives every person in the population a chance to be chosen, and that gives everyone an equal chance of being chosen. Rolling dice is a good example (as long as it is not a weighted die from Las Vegas!). The second thing is to get the biggest number of people that you can! The more people you have, the more likely it will be that your numbers match what is really going on in the population!

Whew! See why Statistics is a helpful language? Instead of the whole paragraph above, once you know Statistics language we can just say: You may use a random sample to estimate population parameters and those estimates will be more precise the larger the sample is.
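
If you are wondering what "without rhyme or reason" looks like in practice, here is a minimal sketch in Python. The population is hypothetical (just ID numbers standing in for 1,000 people), and random.sample from the standard library does the choosing so that every person has the same chance of being picked:

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: ID numbers standing in for 1,000 people.
population = list(range(1, 1001))

# Draw a simple random sample of 50 people, with no one chosen twice.
sample = random.sample(population, 50)

print(len(sample))
print(sorted(sample)[:10])
```

Think of it as a computer rolling fair dice for you. Changing the 50 to a bigger number is how you would follow the second rule: get the biggest sample you can.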

Don't worry! You will get the hang of all of this as we go...The key is to remember that Samples have Statistics and Populations have Parameters.

*Here, we will use the term "people" but it is not limited to that. If you are studying Redwood Trees in a certain forest, your population is the group of trees. It might also be fish, amoebas or anything else that you want to study!