Hand in Wednesday.
p. 393, 16.5 Is it significant?
p. 394-5, 16.6 and 16.7 Acid rain. Do them by hand, and on the Applet. You should, of course, get the same answers both ways.
p. 395, 16.8 Acid rain, confidence intervals. (There are only 3, not 6, since #6 and #7 are the "same" problem.)
p. 396, 16.9 rich parents and education. Postpone
p. 397, 16.10 searching for ESP
p. 407, 16.32 evidence, pacemakers
p. 407, 16.34 a, b. larger samples
p. 407, 16.35 significance is good for... Postpone
p. 408, 16.40 success of trainees. Postpone
p. 408, 16.41 schizophrenia markers
p. 409, 16.43 comparing package designs (What did they not tell us that we would want to know?)
p. 409, 16.44 island life (correlation coefficient)
p. 409, 16.45 helping welfare mothers. Postpone

Leftover problems from Day 30. These ideas are related to those in Ch. 15.
p. 290, 11.39 Pollutants in auto exhausts
For 11.39: You might want to know L so that if you tested your 25 cars and found a high value of x-bar, you could compare it with L. If it was greater than L, you would go back to the manufacturer and say, "I believe you sold me a batch of bad cars, because the chance of getting an average emission level this high if the exhaust system is working properly is only 1 in 100. It is more reasonable to believe the exhaust system is not working than that we 'are' that 1 in 100 possibility."
p. 290, 11.38 Glucose testing
If we use this cutoff level L to say that people (with a mean of 4 tests) over L "have diabetes", then the chance of declaring that someone "has diabetes" when they really are OK (with mean 125 mg/dl) is .05. That .05, or 5%, is the chance of a "false positive" using this protocol, when the real mean is 125.

Read, to discuss
Optional (more practice)
Exams returned.
Comments
Buffer against one low hour exam: The final % exam grade minus 10 points will be substituted for the lowest hour exam grade, if it is higher.
| Examples: | | Ex1 | Ex2 | Ex3 | Ex4 | final % | final -10 |
| Student 1 | Original | 85 | 80 | 85 | 60 | 85 | 75, replaces lower 60 |
| | Treated | 85 | 80 | 85 | 75 | 85 | These will be used. |
| Student 2 | Original | 85 | 80 | 80 | 70 | 75 | 65, lower than 70, don't replace. |
| | Treated | 85 | 80 | 80 | 70 | 75 | |
| Student 3 | Original | 85 | 50 | 75 | 55 | 85 | 75, replaces lower 50 |
| | Treated | 85 | 75 | 75 | 55 | 85 | These will be used. |
This is to encourage all to try to put it together for the (cumulative!) final.
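If you want to check your own numbers, here is a small Python sketch of the buffer rule. The function name `apply_buffer` is made up for this illustration (it is not any official grading program), and it assumes exactly one lowest hour exam gets replaced.

```python
def apply_buffer(hour_exams, final_pct):
    """Return the hour-exam grades after the buffer rule is applied.

    Hypothetical helper: (final % - 10) replaces the lowest hour exam
    grade, but only if that substitute is higher.
    """
    treated = list(hour_exams)            # leave the original list untouched
    substitute = final_pct - 10
    lowest_index = treated.index(min(treated))
    if substitute > treated[lowest_index]:
        treated[lowest_index] = substitute
    return treated


# The three example students from the table
print(apply_buffer([85, 80, 85, 60], 85))   # Student 1
print(apply_buffer([85, 80, 80, 70], 75))   # Student 2
print(apply_buffer([85, 50, 75, 55], 85))   # Student 3
```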
Effect of sample size on distribution of x-bars: NormalandXbar.xls
Ch. 15: "Significance tests use an elaborate vocabulary, but the basic idea is simple: an outcome that would "rarely" happen if a claim were true--is good evidence that the claim is NOT true." (p.363 top)
HW questions? Day 37
See Day 34 and Day 37 for other details.
Summary, comments:
Take data. Calculate test statistic. For µ, test statistic is the z-score of xbar. (Start with xbar, standardize using mean of H0)
Is it an unlikely result if H0 is true? Then that is evidence against H0.
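The standardizing step can be sketched in a few lines of Python. The function name `z_statistic` is an assumption for illustration. The values mu0 = 125, sigma = 10, n = 4 are borrowed from the glucose problem (11.38); the observed xbar = 130 is invented.

```python
from math import sqrt


def z_statistic(xbar, mu0, sigma, n):
    """z-score of xbar, standardized using the mean claimed by H0.

    The standard deviation of xbar is sigma / sqrt(n), not sigma.
    """
    return (xbar - mu0) / (sigma / sqrt(n))


# Invented observed mean of 130 from n = 4 tests, H0: mu = 125, sigma = 10
z = z_statistic(xbar=130.0, mu0=125.0, sigma=10.0, n=4)
print(z)
```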
Measuring the strength of the evidence against H0 (a common measuring stick for all distributions and parameters):
P-value of a test: The probability, computed assuming that H0 is true, that the observed outcome would take a value as extreme or more extreme than that actually observed (if we could take the data again). p. 368.
The smaller the P-value, the stronger the data's evidence against H0 (for Ha).
For a test of µ, using xbar (sigma known), the P-value is
--the area of the tail beyond the observed xbar, in the direction of Ha (one-sided)
--or twice that area (two-sided).
Applet: P-value of a test of significance automates this. (Uses "raw" scale of xbars, rather than z-scores.)
So for a test of a mean, the P-value for one-sided is half that for two-sided, IF the result is in the direction of evidence for the alternative.
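If you have Python handy instead of the applet, the tail areas described above can be sketched with the standard library's `statistics.NormalDist`. The function name `p_value` and the labels "greater", "less", "two-sided" are assumptions made for this sketch. It works on the z scale, unlike the applet's "raw" xbar scale.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """Tail area beyond the observed z, in the direction of Ha.

    alternative: "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0). These labels are hypothetical.
    """
    std_normal = NormalDist()                 # mean 0, standard deviation 1
    if alternative == "greater":
        return 1 - std_normal.cdf(z)          # upper tail
    if alternative == "less":
        return std_normal.cdf(z)              # lower tail
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))   # both tails
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


one_sided = p_value(1.96, "greater")
two_sided = p_value(1.96, "two-sided")
print(round(one_sided, 4), round(two_sided, 4))
```

Note that a z on the wrong side of Ha gives a one-sided P-value above .5, which is why the "half of two-sided" shortcut only works when the result is in the direction of the alternative.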
A "significance level" alpha is a probability level we decide on in advance as being the "rarely" amount that will push us over into believing (well, sort of) that the H0 claim is not true. (Historically older language than P-value. Appropriate levels vary by discipline.)
We tend to use simple benchmark numbers for it, like .10 (1 in 10), .05 (1 in 20), .01 (1 in 100).
When the P-value is less than (or equal to) a particular significance level alpha (say .05), we say, "The results are significant at the alpha = .05 level," or "The results are significant (P < .05)". Giving the actual P is better, if you can.
Applet: Statistical Significance
You can pick the alpha you desire, and see if your x-bar lies outside the "alpha" barrier(s). (approach of p. 376-79) But P-value is more informative.
What if you don't have the Z-table but only have the t-table (Table C)? What if you have a demanded level of significance, alpha?
Table C: a limited list of probabilities across the bottom rows = tail values for the bell curve distribution. (one sided = one tail, two sided = two symmetrical tails)
The value in the z* row above P is the corresponding standard normal value ("critical value").
Check z* = 1.960: one-sided P = .025 is the area above it (or below -1.960); two-sided P = .05 is the area farther out than it in both tails. Corresponds to Table A.
Do this: Find your z from the data. Make a sketch of the normal curve and mark your z on it. Mark the direction(s) of Ha.
(If your z is in the direction(s) of Ha, continue. Otherwise the results are hopelessly not significant: you can quit.)
Find the two z*'s in Table C that bracket your z (ignore minus sign). Find the corresponding P's.
e.g. z = 2.111 lies between these two z*'s:
| z* | ... | 2.054 | 2.326 |
| One-sided P | ... | .02 | .01 |
| Two-sided P | ... | .04 | .02 |
So the P-value for your z is: between .02 and .01 (if it's a one-sided test), and between double those 2 P's, that is between .04 and .02 (if it's a two-sided test).
Test is significant at the bigger bracketing probability; not sig. at the smaller.
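The bracketing lookup can be mimicked in Python as a rough sketch. Two assumptions here: the list `ONE_SIDED_P` is my guess at which tail probabilities Table C prints, and the z*'s are computed with `NormalDist().inv_cdf` rather than copied from the table. The function name `bracket_p_value` is made up. It presumes you already checked that z is in the direction of Ha.

```python
from statistics import NormalDist

# One-sided tail probabilities ASSUMED to match the bottom rows of Table C.
ONE_SIDED_P = [.25, .20, .15, .10, .05, .025, .02, .01, .005, .0025, .001, .0005]


def bracket_p_value(z, two_sided=False):
    """Return (smaller, bigger) bracketing probabilities for z.

    None marks an open end, when z is off either edge of the list.
    """
    z = abs(z)                         # ignore minus sign
    factor = 2 if two_sided else 1     # two-sided P is double one-sided P
    bigger = None                      # P for the largest z* at or below z
    smaller = None                     # P for the smallest z* above z
    for p in ONE_SIDED_P:              # P decreasing means z* increasing
        z_star = NormalDist().inv_cdf(1 - p)
        if z_star <= z:
            bigger = factor * p
        else:
            smaller = factor * p
            break
    return smaller, bigger


print(bracket_p_value(2.111))                  # one-sided bracket
print(bracket_p_value(2.111, two_sided=True))  # two-sided bracket
```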
If you have a specific demanded significance level, compare it with these levels.
If a test is significant at level b, then it is significant at every level bigger than b.
If a test is Not significant at level d, then it is Not significant at every level smaller than d.
"Significant at alpha": the probability of getting results at least as extreme as mine (again) by chance (if H0 is true) is less than (or =) alpha. My result is less common than alpha.
- - - - - - - NEW STUFF - - - - - - -
Back to Ch. 17, cautions:
>>How small a P
is convincing evidence against H0? (What alpha, to "Reject H0"?)
--Is Ha surprising? (Entrenched opinion is "for" H0.) Need strong evidence (small P).
--Is rejecting H0 expensive? Need strong evidence for Ha. [May need to repeat experiment for doubters]
No sharp border between "significant" and "not"--though decisions may need to be made.
>>Statistical
significance is not the same as practical significance ("clinical significance").
Tiny difference can be statistically significant if sample size is large.
Big difference may not be statistically significant if sample size is too small.
Do confidence intervals: Estimate the size of the effect, not just yes/no of test.
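To see both cautions in numbers, here is a Python sketch with invented data. The setting (H0: µ = 125, sigma = 10, Ha: µ > 125) is borrowed from the glucose problem; the observed means and sample sizes are hypothetical, and `one_sided_test` is a made-up helper name. The confidence interval is the usual 95% z interval, xbar ± 1.960 sigma/sqrt(n).

```python
from math import sqrt
from statistics import NormalDist


def one_sided_test(xbar, mu0, sigma, n):
    """Return (P-value for Ha: mu > mu0, 95% confidence interval for mu)."""
    se = sigma / sqrt(n)                 # standard deviation of xbar
    z = (xbar - mu0) / se
    p = 1 - NormalDist().cdf(z)          # upper-tail area
    margin = 1.960 * se                  # z* for 95% confidence
    return p, (xbar - margin, xbar + margin)


# Hypothetical: tiny difference (0.5) but a huge sample
p_big_n, ci_big_n = one_sided_test(xbar=125.5, mu0=125, sigma=10, n=10000)

# Hypothetical: big difference (8) but a tiny sample
p_small_n, ci_small_n = one_sided_test(xbar=133, mu0=125, sigma=10, n=1)

print(p_big_n, ci_big_n)
print(p_small_n, ci_small_n)
```

The first interval is narrow and sits just above 125 (significant, but a small effect). The second is wide and includes 125 (not significant, but the effect could be large). The interval shows the size of the effect in a way the yes/no test does not.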
Start here Wednesday
>>Multiple Tests: beware! pp.
395-6
If you do 100 tests and use the alpha = .05 significance level for each, then the structure of testing requires this:
When all 100 null hypotheses H0 are true, about 5 of the 100 (.05) will give "significant" results by chance alone (falsely indicating the alternative hypothesis is to be preferred).
Moral: if you use the testing mechanism as a screening instrument for many questions, a proportion will give falsely significant results. You can't accept the results from such multiple tests as good evidence, only as indicating questions requiring further, more specific study. The game gives you one shot, not a hundred shots.
(This is becoming an important issue for developing new statistical techniques, for instance in biology, where microarrays can do a thousand tests at once.)
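A quick Python sketch of the 100-tests situation. The "at least one" calculation assumes the 100 tests are independent, which is my added assumption, not something stated above. The simulation uses the fact that when H0 is true, the P-value of a z test is equally likely to land anywhere between 0 and 1. The seed is arbitrary.

```python
import random

n_tests = 100
alpha = 0.05

# Long-run average number of falsely "significant" results when every H0 is true
expected_false_positives = n_tests * alpha

# Chance of at least one false positive, ASSUMING the tests are independent
p_at_least_one = 1 - (1 - alpha) ** n_tests

# One simulated batch of 100 tests with every H0 true:
# each P-value is drawn uniformly between 0 and 1.
rng = random.Random(38)                       # arbitrary seed, for repeatability
p_values = [rng.random() for _ in range(n_tests)]
false_positives = sum(p <= alpha for p in p_values)

print(expected_false_positives, round(p_at_least_one, 3), false_positives)
```

The count from any one simulated batch will bounce around 5; change the seed to see how much.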
(not in text) You cannot legitimately test a hypothesis on the same data that first suggested that hypothesis. Every data set will turn up with some unusual pattern if you examine it hard enough.
(If you must explore and confirm with the same data set, one way is to (randomly) take half the data set, explore and generate hypotheses; then use the other half for confirmatory tests. You can use P-value to describe unusualness, but be wary of making decisions with it if you didn't expect that particular unusualness.)
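The random half-and-half split could look like this in Python. It is only a sketch: `split_half` is a hypothetical helper, and the data values are placeholders.

```python
import random


def split_half(data, seed=None):
    """Randomly split data into an exploration half and a confirmation half.

    Hypothetical helper. With an odd number of observations the
    confirmation half gets the extra one.
    """
    rng = random.Random(seed)
    shuffled = list(data)        # copy, so the original order is kept
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


observations = list(range(1, 21))     # placeholder data: 20 observations
explore, confirm = split_half(observations, seed=2006)
print(len(explore), len(confirm))
```

Look for patterns only in `explore`; run the test you decided on only on `confirm`.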
>>All the warnings about
designing experiments and surveys still apply.
Today: Look back at 11.38, p. 297. "Backward normal" problem. From a proportion/probability, find a z*, from that a raw value (here an x-bar). We can think of this as a significance testing question. n = 4, sigma = 10 mg/dl.
H0: µ = 125 mg/dl (Sheila is normal), Ha: µ > 125 (Sheila has gestational diabetes.)
Find the L so that only .05 of random samples of 4 tests would have mean above L, among people (Sheila) whose real mean is 125.
L is the "cutoff" for doing an alpha = .05 test. 5% of "healthy" people will be diagnosed diabetic (false positive).
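Here is the "backward normal" calculation as a Python sketch, using the numbers given above (µ = 125, sigma = 10, n = 4, alpha = .05). The function name `cutoff` is made up, and `NormalDist().inv_cdf` stands in for reading z* from a table.

```python
from math import sqrt
from statistics import NormalDist


def cutoff(mu0, sigma, n, alpha):
    """Value L with only a fraction alpha of sample means above it, if H0 is true."""
    z_star = NormalDist().inv_cdf(1 - alpha)   # upper-tail critical value
    return mu0 + z_star * sigma / sqrt(n)      # un-standardize: back to the xbar scale


L = cutoff(mu0=125, sigma=10, n=4, alpha=0.05)
print(round(L, 2))   # about 133.22 mg/dl

# Check: the chance that a healthy person's mean of 4 tests exceeds L
false_positive_rate = 1 - NormalDist(mu=125, sigma=10 / sqrt(4)).cdf(L)
print(round(false_positive_rate, 3))
```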
Doctors like a "decision making rule", want an alpha cutoff to apply, rather than calculating a P-value for each individual's set of 4 tests.
Note that Table C gives us another way to get z*'s for some probabilities! Bottom row, "one sided P". The table is set up to go from "tail" probability to z*, without having to calculate "probability to the left."
| Sievers home | Math151-Fall06/Daym38.htm | 2pm | 11/27/06 |