Quiz: better. If you got a B+ or lower, you may try a third time (max grade A--), Monday before or after class, or at 12:30. Returned quizzes are in HW folder outside my door.
Significance Testing, cont'd.
2-sided test: We measure the probability, assuming H0 is true,
of seeing something (again) as extreme as the observed value
(or more so), in either direction.
So you need to measure the P-value symmetrically in
both directions from the observed value--so the P-value is double what
it would be for a one-sided test.
#6.35, p. 333 Engine crankshafts: We want to stop the process
and fix it if the mean gets too far "off" from 224--either direction would
be bad. So two-sided. sigma = 0.060
mm. n = 16. Std. dev. of xbar = sigma/sqrt(n) = 0.060/4 = 0.015 mm
H0 : mu= 224 mm
Ha : mu ≠ 224 mm
xbar = 224.0019375 (sample
standard deviation = .0618)
Standardizing: z = (224.0019375 - 224)/
.015 = .0019375/.015 = 0.12917 ~ .13 (xbar is clearly close to
mu)
(If you used .0618, not the .06
you were supposed to, you would get .1254--still rounds to .13)
Farther out than .13 to the right has probability
(1- .5517) = .4483.
Farther out than -.13 (symmetrical) to the left
also has probability .4483.
So P-value, 2-sided, = .4483 + .4483 =
.8966
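A quick way to check the crankshaft arithmetic without Table A is sketched below. The function names (`normal_cdf`, `two_sided_z_test`) are made up for illustration; the numbers are the ones from #6.35 above. Because the code does not round z to .13 before looking up the area, its P-value comes out near .897 rather than the table answer .8966.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def two_sided_z_test(xbar, mu0, sigma, n):
    """Return (z, two-sided P-value) for H0: mu = mu0, sigma known."""
    se = sigma / math.sqrt(n)              # std. dev. of xbar
    z = (xbar - mu0) / se                  # standardize
    one_tail = 1 - normal_cdf(abs(z))      # area farther out than |z| on one side
    return z, 2 * one_tail                 # double it for the two-sided test

# Numbers from #6.35 (engine crankshafts)
z, p = two_sided_z_test(xbar=224.0019375, mu0=224, sigma=0.060, n=16)
print(f"z = {z:.4f}, two-sided P-value = {p:.4f}")

# Same calculation with z rounded to .13, as when reading Table A
p_table = 2 * (1 - normal_cdf(0.13))
print(f"with z rounded to .13: P-value = {p_table:.4f}")
```

Either way the P-value is close to .9, so xbar gives no evidence against H0.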
Results of shoebox samples.
Questions on HW
Sec 6.3, cont'd: cautions and
limitations: pp. 345-348
>>Data must be from SRS or reasonable
facsimile
>>All the other warnings, p. 312: normality, watch out for outliers, skewness. Sigma known
or n large.
>>Multiple Tests: beware!
If you do 100
tests and use the alpha = .05 significance level for each, then
the structure of testing guarantees this:
Even when all 100 null
hypotheses H0 are true, on average about 5 of the
100 (.05) will give "significant" results by chance alone (falsely
indicating the alternative hypothesis is to be preferred).
Moral: if you use the
testing mechanism as a screening instrument for many questions, a proportion
will give falsely significant results. You can't accept the results
from such multiple tests as good evidence, only as indicating questions
requiring further, more specific study. The game gives you one shot, not
a hundred.
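The arithmetic behind the multiple-tests warning can be sketched in a few lines. The "at least one" figure assumes the 100 tests are independent, which is an assumption added here for illustration, not something the notes state.

```python
alpha = 0.05      # significance level used for each test
n_tests = 100     # number of tests, all with H0 actually true

# Expected number of falsely "significant" results
expected_false_positives = n_tests * alpha

# Chance that at least one test comes out "significant",
# ASSUMING the tests are independent of each other
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one false positive): {p_at_least_one:.4f}")
```

Under that independence assumption, getting at least one "significant" result out of 100 is nearly certain even when nothing is going on.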
"Significance
testing" vs. "Hypothesis
testing"-- two different approaches that
blur...
Both start with null and alternative hypotheses.
You want to show the alternative is true.
Significance testing:
Calculate P-value (or closest alpha), describe
how unusual your result is if H0
is true.
Let the audience for your work decide if they
believe in the alternative hypothesis or not.
Language: "strong evidence for
Ha, against H0"
or not strong...
Hypothesis testing:
Make a decision
between H0 and Ha (often associated with predetermined
fixed alpha level)
We need to do something.
Language: "Accept
Ha, reject H0" if
P-value smaller than alpha.
What if we can't reject H0? Do we accept H0?
Safer: "fail to reject H0"
H0 "Innocent"            "Guilty" Ha
       \  "Not Proven"  /     but defendant goes free...
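The fixed-level decision rule can be written out as a tiny sketch. The function name `decide` and the default alpha = .05 are illustrative choices; the .003 P-value is a made-up number just to show the other branch.

```python
def decide(p_value, alpha=0.05):
    """Fixed-level decision: reject H0 only when the P-value is smaller than alpha."""
    if p_value < alpha:
        return "reject H0"
    # Not "accept H0": the verdict is only "not proven"
    return "fail to reject H0"

print(decide(0.8966))   # crankshaft P-value from #6.35
print(decide(0.003))    # made-up small P-value, for illustration only
```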
If we make a decision
we run the risk of error:
Type I error: accepting
alternative Ha when null H0 is true
(probability = alpha). The test is designed to focus on this one.
Type II error: accepting null H0
when alternative Ha is true (probability
= beta, depends on what exact parameter value in Ha is
true). Can't make this one if we refuse to commit, but
a small Type II error probability means the power
of the test to detect the alternative hypothesis is high (power = 1 - beta).
(Sec. 6.4, optional, takes this further)
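To see how beta depends on which value in Ha is true, here is a rough sketch using the crankshaft setup (mu0 = 224, sigma = 0.060, n = 16) with a two-sided test at alpha = .05 (critical value 1.96). The alternative mean 224.03 is a hypothetical value picked only for illustration, and the function names are made up.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def power_two_sided_z(mu_alt, mu0, sigma, n, z_star=1.96):
    """Probability of rejecting H0: mu = mu0 when the true mean is mu_alt."""
    se = sigma / math.sqrt(n)
    shift = (mu_alt - mu0) / se            # where z is centered if mu_alt is true
    upper = 1 - normal_cdf(z_star - shift) # P(z > z_star)
    lower = normal_cdf(-z_star - shift)    # P(z < -z_star)
    return upper + lower

mu_alt = 224.03   # HYPOTHETICAL true mean, not from the text
power = power_two_sided_z(mu_alt, mu0=224, sigma=0.060, n=16)
beta = 1 - power  # Type II error probability at this alternative
print(f"power = {power:.3f}, beta = {beta:.3f}")

# Sanity check: if the true mean equals mu0, the "power" is just alpha
power_at_null = power_two_sided_z(224, mu0=224, sigma=0.060, n=16)
print(f"rejection probability when H0 is true = {power_at_null:.3f}")
```

Trying alternatives farther from 224 shows the power rising toward 1 and beta shrinking.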
HW: Reread Ch. 6, bring questions. If no questions, I'll start Chapter 7 Monday.
From Day 32--Hand in Monday:
More p-values: p. 341, 6.44 CEO pay
Table C: p. 341, 6.48 CEO pay again; p. 341, 6.46, 6.49 general z statistic, significance (turn the page--6.49 continues); p. 342, 6.50 patent protection; another z.
Fixed significance levels: if you only have Table C, what can you say? p. 337, 6.37 testing number generator; 6.38 nicotine content
p. 342, 6.52 1% vs 5%; 6.53 define stat. signif.; p. 343, 6.54 knife edge .05; p. 345, 6.55 and 56 effect of n
Read, to discuss | Optional |
Bring questions on Ch. 6 (or Ch. 4).
Hand in: Sec. 6.3 (pp. 344-48 is new)
p. 346, 6.57 test ok?
p. 348, 6.61 strong vs. signif.
p. 347, 6.58 500 tests for psychic powers; 6.59 what is significance good for?; 6.60 radar detectors; 6.61 77 potential schizophrenia markers
Review of Ch. 6:
p. 339, 6.40 job satisfaction, 2-sided
p. 360, 6.74 wine--stemplot, CI, test. Notice "less sensitive" noses will have higher thresholds.
p. 362, 6.79 a,b effect of sample size; 6.83 train welfare mothers. This kind of study was the basis (plus conservative philosophy) for our present "welfare reform."
Read, to discuss | Optional
Review: p. 360, 6.75
Optional: Sec. 6.2, two-sided test is doable using confidence interval (pp. 337-9); 6.39 IQ tests. Use your calculator to get the sample mean.
| Sievers home | Math151-Sp01/Day33.htm | 10:30 am | 4/20/01 |