Hand in Monday:

Continue with shoebox numbers, on a separate sheet. If you haven't already, get a sample of size 4 from each of the two shoeboxes (in class, or outside my door). (White from the red-top box, yellow from the green box.)
Bring Monday:
A. For each of your samples of size n = 4 from the two shoeboxes* (keep track of which box they came from!), test H0: µ = 20 vs. Ha: µ > 20. Do it like this:
--Find xbar (you may have already).
--Standardize your xbar, thus finding a z (assuming the population mean is 20 and the population s.d. is 4, so the s.d. for Xbar is 4/√4 = 2).
--Use the standard normal table to find the probability to the right of your z. (This is the "P-value" for your xbar.)
--Is your P-value smaller (less likely) than alpha = .10? (Y/N) If yes, your result is "significant at the alpha = .10 level."
--NOW: do you think the box has mean > 20?
Be ready to add your results to the circulating sheets Monday.
*Boxes are outside my door, if you didn't get your samples in class or over the weekend.

<>Beginning Ch. 15
p. 364, 15.1 Anemia
Stating null and alternative hypotheses:
p. 366, 15.3 Anemia
p. 366, 15.4 Student attitudes (15.2, and more, was done in class)
p. 367, 15.6 Travel time
p. 367, 15.7 Stating hypotheses
- - - - - - - - - - - -
Test statistic: xbar to z
p. 368, 15.8, 15.9, 15.10 (same old examples)
- - - - - - - - - - - -
Calculating P-value (one-sided, mostly)
p. 371, 15.12, 15.13, 15.14 (same examples). Calculate by hand.
p. 371, 15.11, Applet. Do the one given (two-sided), then check your answers for 15.12, 13, 14 (one-sided) using the Applet: P-value of a test of significance. (How? See "How to" below.)
Do the rest on a separate sheet and keep it: More Setups and Calculations. Use the Applet: P-value of a test of significance to check your work. Use Table A (normal table) to find the P-value:
p. 376, 15.18 Water quality
p. 376, 15.19 SAT. Check the mean you calculate against the back of the book.

Read, to discuss:

Optional (more practice):
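The steps in part A can be sketched in Python as below. This is a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the helper name `shoebox_test` and the two xbar values are made up for illustration, so substitute your own sample means.

```python
from statistics import NormalDist

MU0 = 20                      # claimed mean under H0
SIGMA = 4                     # population s.d. (given)
N = 4                         # sample size
ALPHA = 0.10                  # significance level from the assignment
SD_XBAR = SIGMA / N ** 0.5    # s.d. of Xbar: 4/2 = 2


def shoebox_test(xbar):
    """Hypothetical helper: z, right-tail P-value, and whether P <= alpha (Ha: mu > 20)."""
    z = (xbar - MU0) / SD_XBAR
    p_value = 1 - NormalDist().cdf(z)   # area to the right of z
    return z, p_value, p_value <= ALPHA


# Made-up sample means, for illustration only
for box, xbar in [("white (red-top box)", 21.5), ("yellow (green box)", 23.0)]:
    z, p, significant = shoebox_test(xbar)
    print(f"{box}: xbar = {xbar}, z = {z:.2f}, P = {p:.4f}, "
          f"significant at alpha = .10? {'Y' if significant else 'N'}")
```

With these made-up values, xbar = 23.0 gives z = 1.5 and P of about .067 (significant at the .10 level), while xbar = 21.5 gives z = .75 and P of about .23 (not significant).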
Your shoebox results: Write your xbars (one on each pad--yellow or white) and make a dot for each on the circulating dotplot.
Exam 4's not finished.
Final exam: Thurs. Dec. 13, 9 am to 12 noon. If this is a problem for you, please email me soon.
Alternative: Tuesday, Dec. 11, morning/afternoon?
Full exam schedule is at http://www.wells.edu/academic/dates.htm#exams
| Examples: | | Ex1 | Ex2 | Ex3 | Ex4 | final % | final - 10 | |
| Student 1 | Original | 85 | 80 | 85 | 60 | 85 | 75 | 75 replaces lower 60 |
| | Treated | 85 | 80 | 85 | 75 | 85 | | <-- These will be used. |
| Student 2 | Original | 85 | 80 | 80 | 70 | 75 | 65 | 65 is lower than 70, don't replace. |
| | Treated | 85 | 80 | 80 | 70 | 75 | | |
| Student 3 | Original | 85 | 50 | 75 | 55 | 85 | 75 | 75 replaces lower 50 |
| | Treated | 85 | 75 | 75 | 55 | 85 | | <-- These will be used. |
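The rule the table illustrates appears to be: if your final exam % minus 10 is higher than your lowest exam score, it replaces that score. The sketch below encodes that reading of the three examples. It is an inference from the table, not an official statement of the policy, and the function name `treat_scores` is hypothetical.

```python
def treat_scores(exams, final_pct):
    """Hypothetical sketch: replace the lowest exam with (final % - 10) if that is higher.

    Rule inferred from the worked examples in the table.
    """
    replacement = final_pct - 10
    treated = list(exams)                     # copy, so the original list is unchanged
    lowest = min(range(len(treated)), key=treated.__getitem__)  # index of lowest exam
    if replacement > treated[lowest]:
        treated[lowest] = replacement
    return treated


# The three students from the table
print(treat_scores([85, 80, 85, 60], 85))   # Student 1 -> [85, 80, 85, 75]
print(treat_scores([85, 80, 80, 70], 75))   # Student 2 -> [85, 80, 80, 70]
print(treat_scores([85, 50, 75, 55], 85))   # Student 3 -> [85, 75, 75, 55]
```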
This is to encourage those who are nervous about Exam 4, and to encourage all to try to put it together for the final.
Homework questions? (Sample size: Day 35.)
Why do CIs work? (CIs: Day 35.)
"Statistics means never having to say you're certain."
Confidence interval estimation made our best guess at an unknown population mean.
Testing will investigate a claim that the unknown mean is actually a particular value.
~~~~~~~~~~~~~~~~
Ch. 15: "Significance tests use an elaborate vocabulary, but the basic idea is simple: an outcome that would "rarely" happen if a claim were true--is good evidence that the claim is NOT true." (p. 363, top)
Suppose someone claims that the average height of Wells women over the years is 70" (5'10"). I take samples (my 151 classes) every year. This year my sample has mean 65.67" (n = 20ish). The standard deviation of heights of women in the population is supposed to be about 2.5", so the s.d. for means of samples of 20 is about 2.5/√20 = 2.5/4.47 = 0.56. IF the real mean is 70", my sample is astonishingly unusual: (65.67 - 70)/0.56 = -4.33/0.56 = -7.73, that is, 7.73 s.d.'s below the mean.
Conclude that the claim is NOT true.
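The height arithmetic as a short Python sketch, taking n = 20 for "20ish" (an assumption). Without rounding the s.d. of xbar to 0.56, z comes out about -7.75 rather than -7.73; either way it is wildly far out.

```python
import math

mu0 = 70.0      # claimed mean height (inches)
sigma = 2.5     # population s.d. of heights (inches), as given
n = 20          # approximate sample size ("20ish"), assumed exactly 20 here
xbar = 65.67    # this year's sample mean (inches)

sd_xbar = sigma / math.sqrt(n)      # s.d. of means of samples of size n
z = (xbar - mu0) / sd_xbar          # how many s.d.'s xbar is from the claimed mean
print(f"s.d. of xbar = {sd_xbar:.3f}, z = {z:.2f}")
```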
- - - - - - - - - - - - - - - - - - - - -
- - -
Extended Standard Normal Table: "Normal Tails" (also available from the Weblinks page)
| z | P(Z < z) | P(Z > z) | P(Z > z) in scientific notation (E-03 = 10^-3) |
| 3.00 | .9986501019683700 | .0013498980316301 | 1.35E-03 |
| 4.00 | .9999683287581670 | .0000316712418331 | 3.17E-05 |
| 5.00 | .9999997133484280 | .0000002866515718 | 2.87E-07 |
| 6.00 | .9999999990134120 | .0000000009865877 | 9.87E-10 |
| 7.00 | .9999999999987200 | .0000000000012799 | 1.28E-12 |
| 8.00 | .9999999999999990 | .0000000000000007 | 6.66E-16 |
Below this, the machine can't compute. (Even at z = 8, the 6.66E-16 is round-off from computing 1 - P(Z < 8); the true tail area is about 6.22E-16.)
If your assumptions lead you to a(n almost) impossible z value, question your assumptions!
(The basis of significance/hypothesis testing)
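The tail probabilities in the table can be reproduced with Python's standard library. A minimal sketch, assuming `math.erfc` and the identity P(Z > z) = erfc(z/√2)/2. Computing the tail directly this way avoids the subtraction 1 - P(Z < z), which is what runs out of machine precision near z = 8.

```python
import math


def upper_tail(z):
    """P(Z > z) for standard normal Z, computed directly from erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def upper_tail_naive(z):
    """P(Z > z) computed as 1 - P(Z < z); loses precision for large z."""
    lower = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return 1 - lower


for z in [3, 4, 5, 6, 7, 8]:
    print(f"{z:.2f}  direct: {upper_tail(z):.3e}  via subtraction: {upper_tail_naive(z):.3e}")
```

For z = 8 the direct computation gives about 6.22E-16, while the subtraction gives a value limited by round-off.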
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
We need machinery to analyze less "obvious" results, building in the effect of the standard deviation (if the s.d. were 10", would my sample still be inconsistent with the claim?) and of the sample size (if n were only 4, would that change my result?).
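The two "what if" questions above can be answered by plugging into the z formula. A minimal sketch reusing the height numbers; the function name `z_for_mean` is made up for this illustration.

```python
import math


def z_for_mean(xbar, mu0, sigma, n):
    """z-score of a sample mean, assuming H0 mean mu0 and known population s.d. sigma."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


xbar, mu0 = 65.67, 70.0   # height example
for sigma, n in [(2.5, 20), (10, 20), (2.5, 4)]:
    print(f"sigma = {sigma}, n = {n}: z = {z_for_mean(xbar, mu0, sigma, n):.2f}")
```

With σ = 10, z is only about -1.94; with n = 4 (σ = 2.5), z is about -3.46. A larger s.d. or a smaller sample makes the same xbar less surprising.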
Do 15.2, p. 365: Are older students like traditional students, or higher (on average) on this measure? Normal, s.d. = 30. Claim: pop. mean = 115. n = 25. IF the mean is really 115, the xbars are N(115, 6), since 30/√25 = 6. Sketch!
--xbar = 118.6: 118.6 - 115 = 3.6. This is 3.6/6 = 0.6 s.d.'s above the mean, a pretty typical value.
--xbar = 125.8: 125.8 - 115 = 10.8. This is 10.8/6 = 1.8 s.d.'s above the mean, high enough to be pretty unusual (how unusual?) if the mean is really 115.
--xbar = 139: 139 - 115 = 24. This is 24/6 = 4 s.d.'s above the mean, unreasonably high if the mean is really 115.
So 125.8, or (even more so!) 139, would be evidence that the mean for this group (older students) is NOT 115, and is in fact higher.
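The "how unusual?" question can be answered with the normal distribution. A minimal sketch, assuming Python 3.8+ `statistics.NormalDist`, that computes z and the chance of an xbar at least this high for each of the three values above.

```python
from statistics import NormalDist

MU0, SIGMA, N = 115, 30, 25     # claim, population s.d., sample size (exercise 15.2)
SD_XBAR = SIGMA / N ** 0.5      # 30/5 = 6

results = {}
for xbar in [118.6, 125.8, 139]:
    z = (xbar - MU0) / SD_XBAR              # how many s.d.'s above 115
    right_tail = 1 - NormalDist().cdf(z)    # chance of an xbar this high or higher
    results[xbar] = (z, right_tail)
    print(f"xbar = {xbar}: z = {z:.1f}, P(Z > z) = {right_tail:.5f}")
```

For the middle case, there is roughly a 3.6% chance of an xbar of 125.8 or more if the mean is really 115.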
Shoeboxes (white and yellow slips): Take a sample of size 4 from each, record, and return the numbers.
I claim the mean value for both shoeboxes is µ = 20. Am I telling you the truth? I can't remember for sure. I do know that the distribution in each box is normal, with standard deviation 4.
I do remember that if µ is not 20, then it is greater than 20: µ > 20.
Take a sample of size 4 and find xbar, once for each shoebox! (You should have found xbar already.)
How far from 20 is it? Far enough that I believe the mean is not 20?
Take data. Calculate a test statistic, usually based on a statistic that estimates the parameter in the hypotheses. For µ, the test statistic is the z-score of xbar, so a z-score that is big in size (far from 0) means that xbar is far from the H0 value of µ.
Is it an unlikely result if H0 is true? Then that is evidence against H0.
Measuring the strength of the evidence against H0 (a common measuring stick for all distributions and parameters):
P-value of a test: the probability, computed assuming that H0 is true, that the test statistic would take a value as extreme as or more extreme than the one we actually observed (if we could repeat taking data again). p. 368.
The smaller the P-value, the stronger the data's evidence against H0 (and for Ha).
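The "if we could repeat taking data" part of the definition can be imitated by simulation. A minimal sketch, assuming the shoebox setting (µ = 20 under H0, σ = 4, n = 4) and an observed xbar of 24; the function name `simulated_p_value`, the number of repetitions, and the seed are arbitrary choices for illustration. The fraction of simulated xbars at least as large as 24 should come out close to the normal-table value .0228.

```python
import random


def simulated_p_value(xbar_obs, mu0, sigma, n, reps=100_000, seed=1):
    """Estimate P(Xbar >= xbar_obs) assuming H0: mu = mu0, by repeated sampling."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]   # one new sample under H0
        if sum(sample) / n >= xbar_obs:                      # as extreme or more so
            count += 1
    return count / reps


p = simulated_p_value(24, mu0=20, sigma=4, n=4)
print(f"simulated P-value = {p:.4f}")
```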
For a test of µ using xbar (sigma known), the P-value is
--the area of the tail beyond the observed xbar, in the direction of Ha (one tail)
(--or twice that area (two tails).)
We usually calculate it by standardizing the observed xbar (assuming H0 is true) and looking in the normal table. (p. 369 on)
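The recipe above (standardize, then take the tail in the direction of Ha, doubled for two-sided) can be sketched as a small function. A minimal sketch assuming Python's `math.erfc`; the name `p_value_mean` and its `alternative` strings are made up for this illustration.

```python
import math


def p_value_mean(xbar, mu0, sigma, n, alternative):
    """P-value for H0: mu = mu0 with sigma known (z test).

    alternative: "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    right = 0.5 * math.erfc(z / math.sqrt(2))    # P(Z > z)
    left = 0.5 * math.erfc(-z / math.sqrt(2))    # P(Z < z)
    if alternative == "greater":
        return right
    if alternative == "less":
        return left
    if alternative == "two-sided":
        return 2 * min(left, right)              # twice the smaller tail
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Shoebox setting: H0: mu = 20, Ha: mu > 20, sigma = 4, n = 4, observed xbar = 24
print(p_value_mean(24, 20, 4, 4, "greater"))
```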
H0: µ = 20   Ha: µ > 20. How far from 20 is your xbar? Find z for xbar.
For xbar = 24, z = (24 - 20)/2 = 2.
Is this a far-out value of z? What is the probability of being farther out, i.e. being in the tail beyond this z? That's the P-value: P = .0228 (Table A).
<>Applet: P-value of a test of significance automates this. (It uses the "raw" scale of xbars rather than z-scores.) Use it as a check and a guide.
How to: At the top, put in the H0 value, choose the direction of Ha, and put in the sample size n and the s.d. of the population, sigma. Do Update (Reset sends you back to the "opening" values). The graph and scale axis show the distribution of xbars assuming H0 is true.
Under the graph, put your "observed" xbar value in the "I have data..." box and do Show P. The P-value is the size of the tail, shown in gold.
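The applet's "raw" scale and the z scale give the same tail area. A minimal sketch, assuming Python 3.8+ `statistics.NormalDist` and the shoebox numbers; it is not the applet's own code, only the same calculation done both ways.

```python
from statistics import NormalDist

mu0, sigma, n = 20, 4, 4          # shoebox setting
xbar = 24                         # observed sample mean

# "Raw" scale, like the applet: distribution of xbars if H0 is true, N(20, 2)
xbar_dist = NormalDist(mu=mu0, sigma=sigma / n ** 0.5)
p_raw = 1 - xbar_dist.cdf(xbar)

# z scale, like Table A
z = (xbar - mu0) / (sigma / n ** 0.5)
p_z = 1 - NormalDist().cdf(z)

print(p_raw, p_z)   # the same tail area, measured on two different axes
```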
For HW, draw the picture and label the axes both in "raw" and in z-values. Show the direction(s) of the alternative. Mark xbar and z, and shade the area that is the P-value. (And do the calculations, of course.)
Example (one-sided): H0: µ = 1000 hrs. (Average lightbulb life.) Competing bulb: show it's better.
Ha: µ > 1000 hrs. (one-sided)
Sample of size n = 25. Population sigma = 150 hrs. S.d. of xbars = 150/5 = 30.
Get xbar = 1075 hrs. Are these bulbs better than the "standard"?
z = (1075 - 1000) ÷ (150/5) = 75/30 = 2.5;
P(Z > 2.5) = .0062 = P-value. More than 6 in a thousand and less than 7 in a thousand. More crudely, less than a 1% chance of getting a result this high (or higher) if we did it again--if the real mean is 1000.
Example (one-sided): H0: µ = 1000 hrs. (Average lightbulb life.) Suspect the company's cheating: show it's worse.
Ha: µ < 1000 hrs.
Sample of size n = 25. Population sigma = 150 hrs. S.d. of xbars = 150/5 = 30.
Get xbar = 940 hrs. Are these bulbs worse than claimed?
z = (940 - 1000) ÷ (150/5) = -60/30 = -2.
P(Z < -2) = .0228 = P-value.
More than 2% and less than 3% chance of getting a result this low (this far below 1000) if we did it again--if the real mean is 1000.
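Both one-sided lightbulb examples in a few lines. A minimal sketch, assuming Python 3.8+ `statistics.NormalDist`; the key difference is which tail is taken, matching the direction of Ha.

```python
from statistics import NormalDist

MU0, SIGMA, N = 1000, 150, 25
SD_XBAR = SIGMA / N ** 0.5     # 150/5 = 30 hrs
Z = NormalDist()               # standard normal

# Competing bulb, Ha: mu > 1000, observed xbar = 1075 -> right tail
z_better = (1075 - MU0) / SD_XBAR          # +2.5
p_better = 1 - Z.cdf(z_better)

# Suspected cheating, Ha: mu < 1000, observed xbar = 940 -> left tail
z_worse = (940 - MU0) / SD_XBAR            # -2.0
p_worse = Z.cdf(z_worse)

print(f"Ha: mu > 1000: z = {z_better:.1f}, P = {p_better:.4f}")
print(f"Ha: mu < 1000: z = {z_worse:.1f}, P = {p_worse:.4f}")
```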
Example (two-sided): H0: µ = 1000 hrs. (Average lightbulb life.) (Quality control on an assembly line: find out if it is "off" either way.)
Ha: µ ≠ 1000 hrs. (two-sided)
Ha, the "alternative hypothesis," is a claim or statement about the population that we are trying to find evidence FOR. A value either much bigger than or much smaller than the H0 value is evidence against H0 and for Ha.
Sample of size n = 25. Population sigma = 150 hrs. S.d. of xbars = 150/5 = 30.
Get xbar = 940 hrs. Is the quality control "off"?
z = (940 - 1000) ÷ (150/5) = -60/30 = -2;
P(Z < -2) = .0228
P-value (two-sided): We measure the probability of seeing something (again) as extreme as the observed value (or more so), in either direction. So you need to measure the P-value symmetrically in both directions from the H0 value (here, at or below 940 and at or above 1060), so the P-value is double what it would be for a one-sided test. The P-value is approximately 5%; more precisely, 2·.0228 = .0456.
So for a test of a mean, the P-value for a one-sided test is half that for a two-sided test, IF the result is in the direction of evidence for the alternative.
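The two-sided calculation, with both tails measured symmetrically from the H0 value 1000, and the "half, IF in the direction of Ha" rule checked. A minimal sketch, assuming Python 3.8+ `statistics.NormalDist`. Computed exactly, the two-sided P-value is .0455; the .0456 above comes from doubling the rounded table value .0228.

```python
from statistics import NormalDist

mu0, sigma, n = 1000, 150, 25
xbar_dist = NormalDist(mu0, sigma / n ** 0.5)   # xbars if H0 is true: N(1000, 30)

xbar = 940
distance = abs(xbar - mu0)                      # 60 hrs from the H0 value

# Two-sided: both tails, measured symmetrically from the H0 value 1000
low_tail = xbar_dist.cdf(mu0 - distance)        # P(Xbar <= 940)
high_tail = 1 - xbar_dist.cdf(mu0 + distance)   # P(Xbar >= 1060)
p_two_sided = low_tail + high_tail

# One-sided P-values for each possible Ha
p_less = xbar_dist.cdf(xbar)                    # Ha: mu < 1000 (940 is on this side)
p_greater = 1 - xbar_dist.cdf(xbar)             # Ha: mu > 1000 (940 is on the wrong side)

print(f"two-sided P = {p_two_sided:.4f}")
print(f"one-sided (Ha: mu < 1000) P = {p_less:.4f}")
print(f"one-sided (Ha: mu > 1000) P = {p_greater:.4f}")
```

Only the one-sided P-value in the direction of the observed result (Ha: µ < 1000) is half the two-sided one; for Ha: µ > 1000 the P-value is large, about .98.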
Review the above, and continue here Monday:
Start with understanding "null and alternative hypotheses, P-value." Those are the foundation. Then:
A "significance level" alpha is a probability level we decide on in advance as the "rarely" amount that will push us over into believing (well, sort of) that the H0 claim is not true. (Historically, this is older language than P-value.)
We tend to use simple benchmark numbers for it, like .10 (1 in 10), .05 (1 in 20), .01 (1 in 100).
When the P-value is less than (or equal to) a particular significance level alpha (say .05), we say, "The results are significant at the alpha = .05 level," or "The results are significant (P < .05)."
A particular scientific discipline may have a commonly accepted set of benchmarks, and language to go with it. (I think I remember .05 = "significant", .01 = "highly significant" in psychology?)
We will be less doctrinaire and use the language "significant at the alpha = ___ level."
(However, "nobody" uses a significance level less rare than .10, 1 in 10.)
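A minimal sketch of the "significant at the alpha = ___ level" language, assuming the benchmark levels .10, .05, .01 mentioned above and the rule "significant when P ≤ alpha"; the function name `significance_statements` is hypothetical, and the P-values used are the ones from the lightbulb examples.

```python
BENCHMARKS = [0.10, 0.05, 0.01]   # common alpha levels (1 in 10, 1 in 20, 1 in 100)


def significance_statements(p_value, levels=BENCHMARKS):
    """Hypothetical helper: list the benchmark levels at which a result is significant."""
    return [f"significant at the alpha = {alpha:.2f} level"
            for alpha in levels
            if p_value <= alpha]


for p in [0.0062, 0.0228, 0.0456, 0.2]:
    found = significance_statements(p)
    print(p, found if found else "not significant at any of these levels")
```

A result significant at a smaller alpha is automatically significant at every larger one, so the smallest benchmark reached is the most informative one to report.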
| Sievers home | Math151-Fall07/Dayf37.htm | 2:30pm | 11/19/07 |