Questions on HW: significance levels
= = = = = = = = = = = = = = = = = = = = = =
"Significance testing" vs. "Hypothesis testing"--gathering evidence vs. making decisions.
Begin here Monday
Sec 6.3, cont'd: cautions and limitations: pp. 345-348
>>Data must be from an SRS or a reasonable facsimile.
All the other warnings, p. 312: normality, watch out for outliers, skewness. Sigma known or n large.
>>Multiple Tests: beware!
If you do 100 tests and use the alpha = .05 significance level for each, then the structure of testing builds this in: even when all 100 null hypotheses H0 are true, you should expect about 5 of the 100 (.05) to give "significant" results by chance alone (falsely indicating that the alternative hypothesis is to be preferred).
Moral: if you use the testing mechanism as a screening instrument for many questions, a proportion will give falsely significant results. You can't accept the results from such multiple tests as good evidence, only as indicating questions requiring further, more specific study. The game gives you one shot, not a hundred.
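The arithmetic behind the multiple-tests warning can be checked with a short sketch. It assumes the 100 tests are independent and that every H0 is true, so each P-value behaves like a uniform random number between 0 and 1 (true for a continuous test statistic). The names and the seed are illustrative only.

```python
import random

ALPHA = 0.05      # significance level used for each test
N_TESTS = 100     # number of independent tests, every H0 assumed true

# Expected number of falsely "significant" results among the 100 tests.
expected_false_positives = ALPHA * N_TESTS

# Chance that at least one of the 100 tests comes out "significant".
prob_at_least_one = 1 - (1 - ALPHA) ** N_TESTS


def count_false_positives(n_tests, alpha, rng):
    """Simulate n_tests true-null tests; under H0 a P-value is uniform on (0, 1)."""
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)


rng = random.Random(151)  # arbitrary seed so the run is repeatable
counts = [count_false_positives(N_TESTS, ALPHA, rng) for _ in range(1000)]
average_count = sum(counts) / len(counts)

print("expected false positives:", expected_false_positives)
print("P(at least one false positive):", round(prob_at_least_one, 3))
print("average count over 1000 simulated batches:", average_count)
```

The simulated average should land close to 5, and the chance of at least one false alarm in a batch of 100 is above 99 percent.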
Add these:
>>You cannot legitimately test a hypothesis on the same data that first suggested that hypothesis. Every data set will turn up some unusual pattern if you examine it hard enough. (If you must explore and confirm with the same data set, one way is to randomly take half the data set, explore it and generate hypotheses, then use the other half for confirmatory tests. You can use a P-value to describe unusualness, but be wary of making decisions with it.)
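The random half-and-half split can be sketched as follows. The function name, the seed, and the 20-case data set are all hypothetical, just for illustration.

```python
import random


def split_in_half(data, seed=0):
    """Randomly split data into an exploration half and a confirmation half."""
    rng = random.Random(seed)
    shuffled = list(data)      # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


observations = list(range(1, 21))   # hypothetical data set of 20 cases
explore, confirm = split_in_half(observations, seed=151)
print(len(explore), len(confirm))
```

Hunt for patterns only in `explore`; run the confirmatory test once, on `confirm`.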
>>All the warnings about designing experiments and surveys still apply. Another common lurking variable is the Hawthorne effect: people tend to respond positively when their environment is changed in a way they know is supposed to be "better," especially if they know they're being studied. (Get half-page handout.) (Prospective teachers, keep this in mind as the fads blow in and out.)
= = = = = = = = = = = = = = = = = = = = =
Chapter 7, Inference for Distributions (we'll do 7.1, 7.2, and the first segment, to p. 414, of 7.3)
Inference for means, using xbar from a SRS:
Population | n     | Sigma known                                   | Sigma unknown
normal     | large | Xbar is normal; find z using sigma            | Xbar is normal; find z using s
normal     | small | Xbar is normal; find z using sigma            | Xbar is normal; find t using s
not normal | large | Xbar is normal-ish (CLTh); find z using sigma | Xbar is normal-ish (CLTh); find z using s
not normal | small | Unrealistic                                   | (See p. 381) If you can't use t, find a statistician
t-distribution family: like standard normal, only slightly fatter in the tails. Mean = 0. Symmetrical around 0.
"Degrees of freedom" tell which member of the t family. t(k) is the t distribution with k degrees of freedom.
Lower d.f.--fatter tails. Higher d.f.--more like standard normal.
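To see the fatter tails numerically, here is a rough sketch that computes upper-tail probabilities for a few members of the t family and for the standard normal. It uses the standard formula for the t(k) density and a simple Simpson's-rule integration; the function names are made up for this illustration, and a real t table or statistics package is the better tool in practice.

```python
import math


def t_density(x, k):
    """Density of the t distribution with k degrees of freedom."""
    const = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return const * (1 + x * x / k) ** (-(k + 1) / 2)


def t_upper_tail(t_value, k, steps=2000):
    """P(T > t_value) for t_value >= 0, via Simpson's rule on [0, t_value].

    steps must be even. By symmetry, half the area lies above 0.
    """
    h = t_value / steps
    total = t_density(0, k) + t_density(t_value, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, k)
    return 0.5 - total * h / 3


def normal_upper_tail(z):
    """P(Z > z) for the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))


for k in (2, 10, 100):
    print("t(%d): P(T > 2) = %.4f" % (k, t_upper_tail(2.0, k)))
print("normal: P(Z > 2) = %.4f" % normal_upper_tail(2.0))
```

The tail area beyond 2 shrinks as the degrees of freedom rise and approaches the normal value.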
Table C: upper tail probability <--> "critical" t-value.
Start working on green box:
Assume Normal population. Mean µ, s.d. sigma, both unknown.
Take SRS, size n, find xbar, find s (sample standard dev.)
"Standard error of the (sample) mean" = s/sqrt(n): the standard deviation of xbar, estimated from the data.
Standardizing xbar with s instead of sigma results in
t = (xbar - µ) / (s/sqrt(n))
the one-sample t statistic, which has the t-distribution with n-1 degrees of freedom.
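A by-hand computation of the one-sample t statistic might look like the sketch below. The five data values and the hypothesized mean are invented for illustration; the function name is hypothetical.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for testing H0: mu = mu0."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample s.d., divides by n - 1
    standard_error = s / math.sqrt(n)     # estimated s.d. of xbar
    return (xbar - mu0) / standard_error, n - 1


sample = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical data
t_stat, df = one_sample_t(sample, mu0=2.0)
print(round(t_stat, 3), df)               # 1.414 4
```

Here xbar = 3, s is about 1.581, the standard error is about 0.707, and t is about 1.414 on 4 degrees of freedom.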
We'll now repeat all the stuff from Chapter 6, only wherever there was
a z, we'll substitute a t.
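As one instance of swapping t for z, a 95% confidence interval becomes xbar plus or minus t* times s/sqrt(n), with t* the upper-tail .025 critical value of t(n-1). The sketch below hard-codes a few standard t-table values for t*; the dictionary, the function name, and the data are assumptions made for the example.

```python
import math
import statistics

# Upper-tail .025 critical values t* (95% confidence), from a standard t table,
# keyed by degrees of freedom. Only a few rows are included here.
T_STAR_95 = {4: 2.776, 9: 2.262, 14: 2.145}


def t_confidence_interval(sample):
    """95% CI for the mean: xbar +/- t* s/sqrt(n), with n - 1 degrees of freedom."""
    n = len(sample)
    xbar = statistics.mean(sample)
    t_star = T_STAR_95[n - 1]             # KeyError if that d.f. is not in the table
    margin = t_star * statistics.stdev(sample) / math.sqrt(n)
    return xbar - margin, xbar + margin


sample = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical data, n = 5, so d.f. = 4
low, high = t_confidence_interval(sample)
print(round(low, 3), round(high, 3))
```

With z* = 1.960 in place of t* = 2.776 the interval would be narrower; the wider t interval is the price of estimating sigma with s from only 5 observations.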
t-distribution procedures:
Activstats Ch's 18 (CI's) pp 1,2,3 and 20 (tests) pp 1,2. For next, pp 1, 2 of each, or read Moore carefully.
- p. 18-1: Activities 1 and 2 introduce "Standard Error", review CI's from normal table, large n, s substituted for sigma. Activity 3 shows how using s instead of sigma (with n=15) gives CI lengths that vary from sample to sample.
- p. 18-2: Activities 1 and 2 introduce the t-distribution and a CI with it. Activity 3 shows a t-table, like ours (see note--ours is easier). Activity 4 stresses assumptions.
- p. 20-1: Activities 1 and 2 introduce the t-test, analyzing the data (same data as Moore p. 371, Eg. 7.2). (Activity 3, SPSS--we'll come back to that.) Activity 4 is a self-test.
- p. 20-2: Activity 1, using t-tables, repeating sweetness data (Moore p. 371). Activity 2 repeats conditions for the t test, same as for CI, p. 18-2 Activity 4. Activity 3, choosing a test. Cf. my webpage with the chart.
Moore: read about Gosset, p. 364, and Sec 7.1, at least thru p. 374.
We'll start by doing some by hand, then turn the computation over to SPSS.
Hand in
Review of ch. 6--these review material that may be on Exam 3:
p. 339 6.40 job satisfaction, 2 sided
p. 360 6.74 wine--stemplot, CI, test. Notice "less sensitive" noses will have higher thresholds.
p. 362 6.79 a,b effect of sample size
       6.83 Train Welfare mothers. This kind of study was the basis (plus conservative philosophy) for our present "welfare reform."
- - - - - - - - - - - - - - - - - - - - -
Will be assigned with Day 37
Sec. 6.3 (pp. 344-48 is new), and above notes
p. 346 6.57 test ok?
p. 348 6.61 strong vs. signif.
p. 347 6.58 500 tests for psychic powers
       6.59 what is significance good for?
       6.60 radar detectors
       6.61 77 potential schizophrenia markers
A. You have a theory that walls painted pale pink will have a mellowing effect on elementary school students and produce better grades. So you receive permission to repaint one classroom from each grade at the local school over Christmas vacation (the others stay as they were). Indeed, the students in the pink classrooms do better on end-of-year tests. What criticism can be made of your experiment, and how could it have been designed to avoid this?
= = = = = = = = = = = = = = = = =
Ch. 7, Sec. 7.1 "Standard error" & t-distribution family
Optional
Review: p. 360, 6.75
| Sievers home | Math151-Sp02/Day35.htm | 3pm | 4/24/02 |