Hand in:
With significance levels (Sec. 6.2)
6.104 (p. 443) Plot n on the x-axis and z on the y-axis. Plot n on the x-axis and P-value on the y-axis.
6.67 (one-sided), 6.66 (two-sided) Table D
6.47 CI <==> sig
Read, discuss:
6.63c sig. 6.58, 9 Applet exploration of xbars and alphas 6.49 "significant"
6.82 n and P: answers are .3821, .1711, .0013. 6.84 P, alpha
Optional:
6.46 CI <==> sig
Questions on Testing HW? Day 31
Quiz returned. Almost everyone did almost everything correctly, EXCEPT: the standard deviation of X - Y! Which was on the last quiz too! Once more with feeling: If X and Y are independent, sigma^2_{X-Y} = sigma^2_X + sigma^2_Y (Day 23, bottom; IPS p. 302; example of Ann and Betty, Day 28).
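A quick numeric check of the rule may help it stick. The sketch below is only an illustration: the function name and the standard deviations 3 and 4 are made up, and it assumes X and Y are independent.

```python
import math

def sd_of_difference(sd_x: float, sd_y: float) -> float:
    """SD of X - Y for independent X and Y: the variances ADD, then take the root."""
    return math.sqrt(sd_x ** 2 + sd_y ** 2)

# Hypothetical values: sigma_X = 3, sigma_Y = 4
print(sd_of_difference(3.0, 4.0))  # 5.0, not 3 - 4 and not 3 + 4
```

Note that the standard deviations themselves neither add nor subtract; only the variances add.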
Significance tests use an elaborate vocabulary, but the basic idea is simple: a result that would "rarely" happen if a claim were true is good evidence that the claim is NOT true. (Notes, Day 30)
Look at the results from the shoeboxes:
Notice that there is always a chance of getting a "somewhat
unusual" result from a population where the null hypothesis is true.
And if the actual mean is not extremely different from the null, a result
may not be detectably different from the null-hypothesis results.
A "Significance level" alpha
is a probability
level we decide on in advance as being the "rarely" amount
that will push us over into believing (well, sort of) that the H0
claim is not true.
Simple benchmarks: .10 (1 in 10), .05 (1 in 20), .01 (1 in 100).
When the P-value is less than (or equal to) a particular significance
level alpha (say .05), we say,
"The results are significant at the alpha = .05
level," or "The results are significant (P < .05)" , or "Reject the
null hypothesis at level alpha = .05"
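The decision rule can be sketched in a few lines of Python. This is a minimal sketch, assuming a z test statistic; the function names and the sample z value are invented for illustration, and the normal tail area comes from the standard library's `math.erfc`.

```python
import math

def upper_tail_p(z: float) -> float:
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_value(z: float, alternative: str) -> float:
    """P-value of a z statistic; alternative is 'greater', 'less', or 'two-sided'."""
    if alternative == "greater":
        return upper_tail_p(z)
    if alternative == "less":
        return upper_tail_p(-z)
    if alternative == "two-sided":
        return 2 * upper_tail_p(abs(z))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

def is_significant(p: float, alpha: float) -> bool:
    """Significant at level alpha when the P-value is <= alpha."""
    return p <= alpha

# Hypothetical test statistic, one-sided alternative (Ha: mu > mu0)
p = p_value(2.111, "greater")
for alpha in (0.10, 0.05, 0.01):
    print(alpha, is_significant(p, alpha))
```

The alpha levels are fixed in advance; only the P-value comes from the data.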
Day 31
What if you don't have the Z-table but
only have the t-table (Table D)?
What if you have a demanded level of significance,
alpha?
Table D: a limited list of probabilities p across the top row:
p = right-tail probabilities for the bell curve distribution.
The value in the bottom (z*) row under p is the corresponding standard normal value.
"z* is the upper p critical value of the standard normal distribution."
Do this: Find your z from
the data. Make a sketch of the normal curve and mark z on it. Mark
the direction(s) of Ha.
(If your z is in the direction
of Ha , continue. Otherwise the results are hopelessly
not significant: you can quit.)
Find the two z*'s in Table D that bracket your
z
(ignore minus sign). Find the corresponding
p's.
e.g. z = 2.111
p     .02          .01
z*    2.054   \/   2.326
          z = 2.111
So the P-value for your z is:
between those 2 p's (one-sided test)
between double those 2 p's (two-sided test)
(Some versions of the table add another top line, for two-sided tests:
Double the one-sided values)
Test is significant at the
bigger bracketing probability; not sig. at the smaller.
One-sided: P-value is less than .02 and greater than .01
Significant at the .02 level, not at the .01 level
Two-sided: P-value is less than .04 and greater than .02
Significant at the .04 level, not at the .01 doubled, i.e. not at the .02 level
If you have a specific demanded significance
level, compare it with these levels.
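The bracketing steps above can be sketched as a table lookup. This is a sketch under stated assumptions: the list below is the standard normal (z*) row as it usually appears in a Table D-style table, the name `bracket_p_value` is made up, and for a one-sided test z is assumed to be already in the direction of Ha.

```python
# (upper-tail p, z*) pairs, ordered from large p to small p.
# Assumed to match the bottom (z*) row of Table D.
TABLE_D = [
    (0.25, 0.674), (0.20, 0.841), (0.15, 1.036), (0.10, 1.282),
    (0.05, 1.645), (0.025, 1.960), (0.02, 2.054), (0.01, 2.326),
    (0.005, 2.576), (0.0025, 2.807), (0.001, 3.091), (0.0005, 3.291),
]

def bracket_p_value(z, two_sided=False):
    """Return (low, high) bounds on the P-value from the table.

    None means the table runs out on that side
    (P is bigger than the largest p, or smaller than the smallest p).
    """
    z = abs(z)                      # ignore the minus sign
    factor = 2 if two_sided else 1  # double the p's for a two-sided test
    low, high = None, None
    for p, z_star in TABLE_D:
        if z_star <= z:
            high = p * factor       # z is at least this extreme: P <= p
        else:
            low = p * factor        # first z* beyond z: P > p
            break
    return low, high

print(bracket_p_value(2.111))                  # (0.01, 0.02)
print(bracket_p_value(2.111, two_sided=True))  # (0.02, 0.04)
```

The test is significant at the `high` value and not significant at the `low` value, which is then compared with any demanded alpha.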
If a test is significant at level b, then it is significant
at every level bigger than b.
If a test is Not significant at level d, then it is Not significant
at every level smaller than d.
"Significant at alpha": the probability of getting results at least as extreme as mine by chance (if H0 is true) is less than (or equal to) alpha.
Results:     Significant at  <--|-->  Not significant at
p     bigger    .10     .05     |    .01     .005    .001    smaller
                                /\
                             P-value
z-value (one-sided)
z*    smaller   1.282   1.645   |    2.326   2.576   3.091   bigger
You
can compare z directly to z* for your desired alpha. The 2-sided is a bit
tricky.
(2-sided: Split the alpha in 2, then find the z*. Don't
halve or double z's--it doesn't work!)
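Here is a small sketch of the direct z versus z* comparison, including the two-sided split of alpha. It assumes Python 3.8+ for `statistics.NormalDist`; the function names and the z value 1.8 are invented for illustration.

```python
from statistics import NormalDist

def z_star(alpha: float, two_sided: bool = False) -> float:
    """Critical value: split alpha between the two tails if two-sided."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)

def rejects_h0(z: float, alpha: float, alternative: str) -> bool:
    """Compare z with z*; alternative is 'greater', 'less', or 'two-sided'."""
    if alternative == "two-sided":
        return abs(z) >= z_star(alpha, two_sided=True)
    if alternative == "greater":
        return z >= z_star(alpha)
    if alternative == "less":
        return z <= -z_star(alpha)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# Hypothetical z = 1.8 at alpha = .05
print(round(z_star(0.05), 3), round(z_star(0.05, two_sided=True), 3))
print(rejects_h0(1.8, 0.05, "greater"))    # compared with z* for tail .05
print(rejects_h0(1.8, 0.05, "two-sided"))  # compared with z* for tail .025
```

The same z can be significant one-sided and not significant two-sided at the same alpha, which is why the split has to be done on alpha and not on z.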
CI's and Two-sided
tests (pp. 413-14):
Your 95% CI doesn't include µ0 <==> Reject H0: µ = µ0 (two-sided) at the alpha = .05 level. (Seems like common sense.)
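The CI/test equivalence can be checked numerically. The sketch below assumes a z procedure with known sigma; the function names and the numbers (xbar = 52, sigma = 10, n = 25) are hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Confidence interval for mu: xbar +/- z* sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

def two_sided_rejects(xbar, sigma, n, mu0, alpha=0.05):
    """Two-sided z test of H0: mu = mu0; reject when P-value <= alpha."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p <= alpha

low, high = z_interval(52, 10, 25)
for mu0 in (47, 50, 56):
    outside = not (low <= mu0 <= high)
    print(mu0, "outside CI:", outside, " rejected:", two_sided_rejects(52, 10, 25, mu0))
```

For each mu0 tried, "outside the 95% interval" and "rejected at alpha = .05" agree.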
Start here Friday
Sec. 6.3
>>Don't do inference on data that doesn't
look like probability-model data (All that bias, design flaws stuff
was for this!) and check the data for weirdness (Ch. 1)
>>(Not in text any more?) How small a P is "convincing
evidence" against H0?
In practice...beyond
the formal testing.
How
plausible is H0? Ha? Strong evidence
needed to reject "conventional wisdom."
How
expensive (mentally, economically) will abandoning H0 be?
(May need more than one set of data;
replicate, recast, refine.)
>> In reality,
no sharp border between "significance"
and "not significant"
>>"Statistically Significant" doesn't always
mean "Important." (e.g. medicine: "Clinically significant.") Big
enough sample sizes will allow you to distinguish even small
differences.
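To see the sample-size effect, hold a small difference fixed and let n grow. This is a sketch with made-up numbers (xbar = 101, mu0 = 100, sigma = 15), assuming a one-sided z test with known sigma.

```python
import math
from statistics import NormalDist

def one_sided_p(xbar, mu0, sigma, n):
    """Return (z, P-value) for Ha: mu > mu0, with known sigma."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return z, 1 - NormalDist().cdf(z)

# Same 1-point difference each time; only n changes.
for n in (10, 100, 1000, 10000):
    z, p = one_sided_p(xbar=101, mu0=100, sigma=15, n=n)
    print(f"n={n:>6}  z={z:6.2f}  P={p:.4f}")
```

The observed difference never changes, but z grows like the square root of n, so the P-value eventually drops below any alpha. Whether a 1-point difference matters is a separate question.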
>> Lack of significance--doesn't prove
H0 true. Best: "data are consistent with (not inconsistent
with) H0 "
>>You cannot legitimately test a hypothesis
on the same data that first suggested that hypothesis. Every
data set will turn up with some unusual pattern if you examine it hard
enough.
(If you
must explore and confirm with the same data set, one way is to (randomly)
take half the data set, explore and generate hypotheses; then use the other
half for confirmatory tests. You can use P-value to describe
unusualness, but be wary of making decisions with it if you didn't expect
that particular unusualness.)
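One way to do the random half-and-half split is sketched below. The function name and the `seed` parameter are assumptions for illustration; any fair random split would serve.

```python
import random

def split_explore_confirm(data, seed=None):
    """Randomly split data into an exploration half and a confirmation half."""
    rng = random.Random(seed)   # seed only makes the split reproducible
    shuffled = list(data)       # copy, so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

explore, confirm = split_explore_confirm(range(20), seed=1)
print(len(explore), len(confirm))
```

Hypotheses come only from `explore`; the tests are then run once on `confirm`, which was never looked at while hunting for patterns.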
>>Multiple Tests: beware!
If you do 100 tests and use the alpha = .05 significance level for each, then the structure of testing requires this:
When all 100 null hypotheses H0 are true, about 5 of the 100 (.05) will, on average, give "significant" results by chance alone (falsely indicating that the alternative hypothesis is to be preferred).
Moral: if you use the
testing mechanism as a screening instrument for many questions,
a proportion will give falsely significant results. You can't
accept the results from such multiple tests as good evidence, only as indicating
questions requiring further, more specific study. The game gives you one
shot, not a hundred shots.
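The arithmetic behind the warning can be sketched as follows. The function names are invented, and two assumptions are made: the tests are independent, and every H0 is true, so each P-value behaves like a uniform random number on [0, 1].

```python
import random

def expected_false_positives(num_tests, alpha):
    """Average number of 'significant' results when every H0 is true."""
    return num_tests * alpha

def prob_at_least_one_false_positive(num_tests, alpha):
    """Chance of at least one false alarm, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** num_tests

def simulate_false_positives(num_tests, alpha, rng):
    """Count P-values <= alpha when each P-value is uniform on [0, 1] (H0 true)."""
    return sum(rng.random() <= alpha for _ in range(num_tests))

print(expected_false_positives(100, 0.05))
print(round(prob_at_least_one_false_positive(100, 0.05), 3))
print(simulate_false_positives(100, 0.05, random.Random(0)))
```

With 100 independent tests at alpha = .05 the chance of at least one false "significant" result is roughly 0.994, so finding one proves very little.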
- - - - - - - - - - -
Statistical inference in
a nutshell:
Am I surprised (if H0 is true)? (Do I reject the null?)
How surprised? (give P-value)
What would not surprise me? (confidence interval--estimate
the actual value)
(IPS: Testing is over-used, Confidence interval estimation under-used)
Next: Brief look at issues of 6.4; then Ch. 7
| Sievers home | Math251-Fall05/Dayps32.htm | 2pm | 11/09/05 |