Preamble
You may remember the original Karate Kid movie where a wise and aged Mr. Miyagi takes a young and unseasoned Daniel-san under his wing to teach him karate. Mr. Miyagi puts Daniel-san to work sanding the floor, painting the fence, painting his house and waxing his cars. When Daniel-san starts to complain that he has learned nothing about karate, that he is only Mr. Miyagi's slave, Miyagi unleashes a barrage of karate attacks against which Daniel-san is able to defend himself using the motions from the chores he has been doing around Miyagi's house.
It is done in a way that can only exist in Hollywood. Still, it is hoped that in this learning module (if you have been diligent in the previous modules) you will have an experience something like the "sand the floor, paint the fence" scene from Karate Kid.
It might even be a good idea to take a 3 minute break and watch the clip from your favorite online video source! Come back when you feel inspired, or continue if you already feel inspired.
What is hypothesis testing?
Over the next few weeks, we will learn about hypothesis testing. This is intended to be a very short introduction to just the basics. More detail will come later. Just focus on these key terms and ideas and then let it set up over the Fall Break!
To begin, Daniel-san learned to sand the floor, paint the fence, wax on wax off, etc. What are the equivalent tools that you have in your arsenal?
- General knowledge about Z Scores
- Finding Z Scores for a sample mean in a theoretical distribution (just like Z Scores for observations in a sample except we divide by STANDARD ERROR instead of standard deviation!)
- Finding confidence intervals (finding two Z Scores that contain a certain percent of the distribution)
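If it helps to see the three tools side by side, here is a minimal Python sketch of them. The function names and the numbers (a mean of 100, a standard deviation of 15, a sample of 25) are made up for illustration and are not part of this lesson's data. The 1.96 cutoff assumes a 95% confidence interval.

```python
import math

def z_score(x, mean, sd):
    """Z Score for one observation: distance from the mean in standard deviations."""
    return (x - mean) / sd

def standard_error(sd, n):
    """Standard error of a sample mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

def z_score_of_mean(xbar, mu, sd, n):
    """Z Score for a sample mean: same idea, but divide by the STANDARD ERROR."""
    return (xbar - mu) / standard_error(sd, n)

def confidence_interval(xbar, sd, n, z=1.96):
    """Interval of xbar plus or minus z standard errors (z = 1.96 for 95%)."""
    margin = z * standard_error(sd, n)
    return (xbar - margin, xbar + margin)

# Hypothetical numbers, chosen only to keep the arithmetic easy
obs_z = z_score(130, 100, 15)
mean_z = z_score_of_mean(103, 100, 15, 25)
low, high = confidence_interval(103, 15, 25)

print(f"Z for one observation: {obs_z:.2f}")
print(f"Z for a sample mean:   {mean_z:.2f}")
print(f"95% interval:          {low:.2f} to {high:.2f}")
```

Notice that the only difference between the first and third functions is what goes in the denominator.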
You are ready to go!
The main idea of hypothesis testing is that you can use these tools with your data to make a point, but we have to play a little game rooted in philosophy. (Yes, many statistical concepts originate in philosophy).
Here's an example:
A local organization claims that children are spending 3 hours per day on screentime. This seems a little high to you, so you want to test it.
So, here is the game we play (first in English, then in Stats Language):
ENGLISH:
Ok...you think children spend 3 hours per day on screentime? I challenge this assertion! I challenge the status quo.
So, you say screentime is 3 hours per day.
I don't think it is.
I am willing to risk a 5% chance that I am wrong but I want a 95% chance that I am right.
I took a sample of 300 children and the average is only 2.3 hours. The standard deviation (not standard error!) of the sample is 1.8 hours. Let's compute a Z Score for my sample average to see how likely it is to get a sample mean of 2.3 hours if the real number is 3 hours. If fewer than 5% of sample means are this far away from the proposed 3 hour number, I am more than 95% confident that I am right and I am willing to reject the 3 hour number!
STOP AND THINK ABOUT ALL OF THAT UNTIL YOU FEEL CONFIDENT ABOUT IT IN ENGLISH. THEN MOVE ON TO THE STATS LANGUAGE INTERPRETATION:
STATS LANGUAGE:
H0: μ = 3 hours
H1: μ ≠ 3 hours
α = .05
Xbar = 2.3 hours
s = 1.8 hours
That's it! And this is hypothesis testing!
Hopefully you just had a "Miyagi moment", but if not, stop and think about it for a moment.
We are simply using what we know about distributions and Z Scores to put some assertion on trial!
H0 means "this is the thing on trial". It is usually the status quo because it is something we want to debunk! NOTE! NOTE! NOTE! It contains a statement of equality!
H0 is called the "Null hypothesis" and null means "nothing". It is the statement of "no difference" and no difference means equal!
It is the statement of what is! (Just remember you are putting this statement on trial and you have to know what something is in order to put it on trial).
Just remember this cartoon. You have to know what something is in order to put it on trial! H0 is on trial and must contain an "equal" statement. Usually, it says the population mean is (equals) some number, OR is less than or equal to (or greater than or equal to) some number.
H1 is the case against H0! The prosecution, if you will! You have to find enough evidence against H0, pointing toward H1, in order to convict it.
α is our tolerance for making a mistake. You know how in court, one is innocent until proven guilty? This is to avoid calling an innocent person guilty! Mistakes still happen, but we can focus on avoiding the mistake we do not want to make.
There are two kinds of mistakes:
- Calling a guilty person innocent
- Calling an innocent person guilty
In court, we want to avoid calling an innocent person guilty, so we avoid it in the way trials are set up: You only give a guilty verdict if you are sure beyond a reasonable doubt.
In statistics we have two kinds of mistakes:
- Rejecting H0 when it really is correct
- Failing to reject H0 when it is not correct
In statistics, we want to avoid the first one. It is called a Type I error (the second one is called a Type II error). It goes with our alpha level (α)!
So, when we said α=.05, it actually means, we will only reject H0 if we are satisfied that there is only a 5% chance or less that we are rejecting the true statement. We want to be 95% confident that we have not made a mistake!
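One way to convince yourself that α really is the Type I error rate is to simulate it. The sketch below is only an illustration under stated assumptions: it pretends screentime is normally distributed with a true mean of exactly 3 hours (so H0 is correct), reuses the standard deviation of 1.8 and sample size of 300 from our example, and runs 2,000 imaginary studies. The seed and the number of studies are arbitrary choices.

```python
import math
import random

random.seed(42)   # fixed seed so the run is repeatable

MU_0 = 3.0        # value claimed by H0; in this simulation H0 is TRUE
SIGMA = 1.8       # assumed population standard deviation (hours)
N = 300           # children per sample
Z_CRIT = 1.96     # two-tailed cutoff that goes with alpha = .05
TRIALS = 2000     # number of simulated studies

def one_study():
    """Draw one sample from a population where H0 is true; return its Z Score."""
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (N - 1))
    standard_error = s / math.sqrt(N)
    return (xbar - MU_0) / standard_error

# Count how often we would (wrongly) reject a true H0
rejections = sum(1 for _ in range(TRIALS) if abs(one_study()) > Z_CRIT)
type_i_rate = rejections / TRIALS

print(f"Rejected a true H0 in {type_i_rate:.1%} of simulated studies")
```

The printed rate should land close to 5%. Every one of those rejections is an innocent H0 being found guilty.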
So, in stats, we want to avoid the Type I error.
This means that sometimes we fail to reject H0 when we should have. That is why our wording must always be like this (repeat after me):
We reject H0
OR
We fail to reject H0.
There is no other conclusion in hypothesis testing.
There is no other conclusion in hypothesis testing.
And, finally, there is no other conclusion in hypothesis testing.
In court, there is only "guilty" or "not guilty".
In hypothesis testing, there is only "reject H0" or "fail to reject H0".
There is no "innocent" verdict in court and there is no "accept H0" verdict in hypothesis testing.
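If you like to think in code, the two allowed verdicts can be sketched as a tiny Python function. The name `verdict` and the example p values are hypothetical, chosen just for illustration. The comparison uses "less than or equal to" to match the "5% chance or less" wording above.

```python
def verdict(p_value, alpha=0.05):
    """Return one of the only two conclusions allowed in hypothesis testing."""
    if p_value <= alpha:
        return "reject H0"
    # Note: there is deliberately no "accept H0" branch
    return "fail to reject H0"

print(verdict(0.001))  # strong evidence against H0
print(verdict(0.20))   # not enough evidence against H0
```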
Example
Because examples are an effective way to learn statistics, we will end this shorter week with an example. Next week, we will resume and get into greater detail.
This example shows how to do a hypothesis test. We will use the numbers from the screentime example.
H0: μ = 3 hours
H1: μ ≠ 3 hours
α = .05
Xbar = 2.3 hours
s = 1.8 hours
Standard Error = s/√n = 1.8/√300
1.8/17.32 ≈ .10
Test: (Xbar - μ)/S.E.
(2.3 - 3.0)/.10 = -.7/.10 = -7.0
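The same arithmetic can be checked with a few lines of Python. This is just a sketch using the screentime numbers; the variable names are my own. It also shows what happens if the standard error is not rounded to .10 first: the score comes out near -6.74 rather than -7.0, which is still far out in the tail and does not change the conclusion.

```python
import math

mu_0 = 3.0   # the value claimed in H0 (hours)
xbar = 2.3   # sample mean (hours)
s = 1.8      # sample standard deviation (hours)
n = 300      # sample size

standard_error = s / math.sqrt(n)

# Test statistic without rounding the standard error
z = (xbar - mu_0) / standard_error

# Test statistic with the standard error rounded to .10, as done by hand
z_rounded = (xbar - mu_0) / round(standard_error, 2)

print(f"Standard error:         {standard_error:.4f}")
print(f"Test score (unrounded): {z:.2f}")
print(f"Test score (rounded):   {z_rounded:.2f}")
```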
What is -7.0? Think of it as a Z Score (it is technically a "t" score, but more on that next week).
If you use the Math Is Fun Z Score tool, you will see that -7.0 is literally off the charts! It is out there somewhere, but the tail of the distribution has thinned out so much that there is no need to include it in most charts. In fact, only 0.13% of sample means are expected to fall below a Z Score of -3.0. So, even with a Z of -3.0 we would have been 99.87% confident that we would not see a sample mean that low with a true mean of 3.0 hours.
With a Z of -7.0, we are greater than 99.999999% confident that we would not see a sample mean as low as 2.3 hours if the true mean is really 3.0 hours. Look how thin the tail is at that point. It almost looks like 0 to the naked eye, and it is so small (smaller than 0.000001) that it basically is 0.
So, what do we conclude?
We reject the null hypothesis (H0) that μ = 3 hours, in support of the alternative (H1) that the real mean is not equal to 3 hours!
We call our risk of making a Type I error, alpha.
We call the probability of getting a sample mean at least as far from the H0 value as ours, if H0 is really true, our p value. (p for probability.) The smaller it is, the harder it is to keep believing H0.
So, with a Z Score of -7.0 we have a p value less than 0.000001%. It is less than the alpha (tolerance) level of 5%.
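If you do not have a Z Score chart handy, the tail areas can be approximated in Python with the standard library's `math.erfc` function. This is a sketch that treats the test score as a Z Score from a normal distribution (recall it is technically a "t" score); the helper names `lower_tail` and `two_tailed_p` are my own. Because H1 says "not equal", the p value counts both tails.

```python
import math

def lower_tail(z):
    """Share of a standard normal distribution that falls below z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def two_tailed_p(z):
    """Share of the distribution at least as far from 0 as z, in either direction."""
    return math.erfc(abs(z) / math.sqrt(2))

alpha = 0.05

print(f"Below Z = -3.0:          {lower_tail(-3.0):.4%}")
print(f"Two-tailed p, Z = -7.0:  {two_tailed_p(-7.0):.2e}")
print(f"Is p below alpha?        {two_tailed_p(-7.0) < alpha}")
```

The first line should agree with the 0.13% figure quoted earlier, and the p value for -7.0 is far below the 5% alpha level.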
How do we remember this? With a small poem:
"When p is low, reject the null!
When p is low, reject H0!"