Hand in Friday: Complete all problems in the Setups and Calculations section, Days 37 and 38. Remember, the P-value applet can be used to check any P-value computation.
Read, to discuss: p. 391, 16.3 environment; p. 407, 16.31 sampling at the mall.
Optional (more practice): p. 389, 16.1 TV poll; p. 391, 16.2 red lights.
Exams returned and discussed last time: Discussion, Solutions.
Buffer against one low hour exam: The final exam % grade minus 10 points will be substituted for the lowest hour exam grade, if it is higher.
| Examples: | | Ex1 | Ex2 | Ex3 | Ex4 | final % | final -10 |
| Student 1 | Original | 85 | 80 | 85 | 60 | 85 | 75, replaces lower 60 |
| | Treated | 85 | 80 | 85 | 75 | 85 | <-- These will be used. |
| Student 2 | Original | 85 | 80 | 80 | 70 | 75 | 65, lower than 70, don't replace. |
| | Treated | 85 | 80 | 80 | 70 | 75 | |
| Student 3 | Original | 85 | 50 | 75 | 55 | 85 | 75, replaces lower 50 |
| | Treated | 85 | 75 | 75 | 55 | 85 | <-- These will be used. |
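For anyone who wants to check their own grades against the buffer rule, here is a minimal sketch in Python. The function name `apply_buffer` is made up for illustration, and it assumes that when two hour exams tie for lowest, only the first one is replaced (the rule as stated does not cover ties).

```python
def apply_buffer(hour_exams, final_pct):
    """Return the hour-exam grades after the buffer rule is applied.

    Rule from the notes: (final % - 10) replaces the lowest hour exam
    grade, but only if it is higher than that grade.
    """
    treated = list(hour_exams)              # copy; leave the original alone
    substitute = final_pct - 10
    lowest_index = treated.index(min(treated))  # assumption: first one if tied
    if substitute > treated[lowest_index]:
        treated[lowest_index] = substitute
    return treated


# The three students from the table: (Ex1..Ex4, final %)
students = {
    "Student 1": ([85, 80, 85, 60], 85),
    "Student 2": ([85, 80, 80, 70], 75),
    "Student 3": ([85, 50, 75, 55], 85),
}

for name, (exams, final_pct) in students.items():
    print(name, exams, "->", apply_buffer(exams, final_pct))
```

Running it on the three students should reproduce the "Treated" rows of the table.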
This is to encourage all to try to put it together for the (cumulative!) final.
Wed. Dec. 17, 9 a.m. to 12 noon. If this is a problem for you, please email me very, very soon.
Ch. 15: "Significance tests use an elaborate vocabulary, but the basic idea is simple: an outcome that would "rarely" happen if a claim were true--is good evidence that the claim is NOT true." (p. 363 top)
I'm not making it up that this idea is important. From the Financial Times (an influential, high-end British newspaper) this winter:
Statistical Significance: #10 of "The Ten Things Everyone Should Know About Science" (with formatting and pictures) (without)
HW questions? Summarize and look at them in sequence.
Details: Day 38
Continuing with significance levels and use of Table C
Cautions (Ch. 16)
(SRS, Normal pop. or Xbars, sigma known)
How small a P is convincing?
Statistical significance is not the same as practical significance
Postpone this last? YES
>>Multiple Tests: beware! pp. 395-6
If you do 100 tests and use the alpha = .05 significance level for each, then the structure of testing requires this: when all 100 null hypotheses H0 are true, about 5 of your 100 (.05) will give "significant" results by chance alone (falsely indicating that the alternative hypothesis is to be preferred).
(About 10%, one-ish, of your 10 simulations of the shoebox with mean 20 will give "significant" (P < .10) results even though the mean is the null value of 20.) (My results)
Real shoeboxes earlier: only 1 falsely significant at .10. Last term's shoeboxes: 3/16 falsely significant. This term looks like 1/14.
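The 100-test scenario can also be tried on a computer. The sketch below is only an illustration under stated assumptions: it draws each sample from a Normal population whose true mean really is the null value 20, and it uses a two-sided z test with known sigma. The values sigma = 5 and n = 25 and the function names are invented for the example, not taken from the shoebox activity. If the tests are independent, the chance that at least one of the 100 comes out falsely "significant" is 1 - .95^100, about 0.994.

```python
import math
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0, sigma):
    """Two-sided P-value for H0: mu = mu0, with sigma known (z test)."""
    z = (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_false_positives(n_tests, alpha, mu0=20.0, sigma=5.0, n=25, seed=0):
    """Run n_tests z tests where H0 is TRUE; count the 'significant' ones.

    sigma and n are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        # Every sample comes from a population whose mean is exactly mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if z_test_p_value(sample, mu0, sigma) < alpha:
            hits += 1
    return hits


alpha, n_tests = 0.05, 100
print("expected number falsely significant:", n_tests * alpha)
print("P(at least one), if tests are independent:", 1 - (1 - alpha) ** n_tests)
print("falsely significant in this simulation:",
      count_false_positives(n_tests, alpha))
```

Changing `seed` gives a different batch of 100 tests; the count bounces around 5 the same way the class shoebox results bounce around 10%.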
Moral: if you use the testing mechanism as a screening instrument for many questions, a proportion will give falsely significant results. You can't accept the results from such multiple tests as good evidence, only as indicating questions requiring further, more specific study. The game gives you one shot, not a hundred shots.
(This is becoming an important issue for developing new statistical techniques, for instance in biology, where microarrays can do a thousand tests at once.)
>>(Not in text) You cannot legitimately test a hypothesis on the same data that first suggested that hypothesis. Every data set will turn up some unusual pattern if you examine it hard enough.
(If you must explore and confirm with the same data set, one way is to (randomly) take half the data set, explore it and generate hypotheses, then use the other half for confirmatory tests. You can use a P-value to describe unusualness, but be wary of making decisions with it if you didn't expect that particular unusualness.)
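The random half-and-half split can be sketched in a few lines of Python. This is only one way to do it; the function name `split_half` and the placeholder data are invented for illustration.

```python
import random


def split_half(data, seed=None):
    """Randomly split data into an exploration half and a confirmation half."""
    rng = random.Random(seed)
    shuffled = list(data)      # copy, so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Placeholder data: 20 observation labels, not real measurements.
observations = list(range(1, 21))
explore, confirm = split_half(observations, seed=2008)

print("explore (look for patterns here):", sorted(explore))
print("confirm (test hypotheses here):  ", sorted(confirm))
```

The point of the design is that the confirmation half is set aside before any looking happens, so a pattern spotted in the exploration half gets a fair, single test.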
>>All the warnings about designing experiments and surveys still apply!
| Sievers home | Math151-F08/Dayf39.htm | 10:30pm | 12/5/08 |