Hand in:
Sec. 6.3 (pp. 346-48 is new), and the notes below.
p. 347, 6.58 (500 tests for psychic powers)
A. You have a theory that walls painted pale pink will have a mellowing effect on elementary school students and produce better grades. So you receive permission to repaint one classroom from each grade at the local school over Christmas vacation (the others stay as they were). Indeed, the students in the pink classrooms do better on end-of-year tests. What criticism can be made of your experiment, and how could it have been designed to avoid this?
p. 373, 7.4 (CI, t*)

Optional:
"Significance", using Table C; see Day 34.
Homework questions?
"Significance testing" vs. "hypothesis testing": gathering evidence vs. making decisions.
= = = = = = = = = = = = = = = = = = = = = =
More cautions and limitations: Sec. 6.3, cont'd (pp. 346-48)
>> (not in text) You cannot legitimately test a hypothesis on the same data that first suggested that hypothesis. Every data set will turn up some unusual pattern if you examine it hard enough. (If you must explore and confirm with the same data set, one way is to take a random half of the data set, explore it, and generate hypotheses; then use the other half for confirmatory tests. You can use a P-value to describe unusualness, but be wary of making decisions with it if you didn't expect that particular unusualness in advance.)
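A minimal Python sketch of the random split-half idea, assuming the data set is simply a list of observations. The function name `split_half` and the 20-value data set are hypothetical, for illustration only.

```python
import random

def split_half(data, seed=None):
    """Randomly split data into an exploratory half and a confirmatory half."""
    rng = random.Random(seed)
    shuffled = list(data)      # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)      # random order, so neither half is cherry-picked
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Hypothetical data set of 20 observations.
observations = list(range(20))
explore, confirm = split_half(observations, seed=1)
# Look only at `explore` to generate hypotheses;
# run the confirmatory tests on `confirm`, which you have not examined.
```

The shuffle is what makes the split random; the key discipline is that no hypothesis is tested on the half that suggested it.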
All the warnings about
designing experiments and surveys still apply.
>> (not in text) Another
common lurking variable is the Hawthorne effect: People tend
to respond positively when their environment is changed in a way they know
is supposed to be "better," especially if they know they're being studied.
(Get half-page handout.)
(Prospective teachers, keep this in mind as the fads blow in and out.)
>> Multiple tests: beware! (pp. 346-47)
If you do 100 tests and use the alpha = .05 significance level for each, then, because alpha is the probability of rejecting a true H0: when all 100 null hypotheses H0 are true, you should expect about 5 of the 100 (5%) to give "significant" results by chance alone, falsely indicating that the alternative hypothesis is to be preferred.
Moral: if you use the testing mechanism as a screening instrument for many questions, a proportion of them will give falsely significant results. You can't accept the results of such multiple tests as good evidence, only as pointing to questions that need further, more specific study. The game gives you one shot, not a hundred.
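A short Python sketch of the "100 tests" arithmetic and a simulation of it. It assumes the 100 tests are independent and that every H0 is true; each simulated test is a two-sided z test of H0: µ = 0 on a sample of 25 from a normal population with sigma = 1 (these settings and the name `count_false_positives` are illustrative choices, not from the notes).

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.05
N_TESTS = 100

# Exact arithmetic under independence, with every H0 true.
expected_false_positives = N_TESTS * ALPHA        # 5 on average
p_at_least_one = 1 - (1 - ALPHA) ** N_TESTS       # about 0.994

def count_false_positives(n_tests=N_TESTS, n=25, alpha=ALPHA, seed=0):
    """Run n_tests two-sided z tests on data where H0 (mu = 0) is true;
    return how many come out 'significant' anyway."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    hits = 0
    for _ in range(n_tests):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        xbar = sum(sample) / n
        z = (xbar - 0) / (1 / math.sqrt(n))       # sigma = 1 is known
        if abs(z) > z_crit:
            hits += 1
    return hits

print(expected_false_positives, round(p_at_least_one, 4))
print(count_false_positives())  # varies with the seed; averages about 5
```

Note the second number: with 100 independent true nulls, at least one false "discovery" is almost certain.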
= = = = = = = = = = = = = = = = = = = = =
Start here Wed.
Chapter 7, Inference for Distributions (we'll
do 7.1, 7.2, maybe the first segment, to p. 414, of 7.3)
Inference for means, using xbar from an SRS to make inference about µ:
| Population | n | Sigma known | Sigma unknown |
| Normal | large | Xbar is normal; find z using sigma | Xbar is normal; find z using s |
| Normal | small | Xbar is normal; find z using sigma | Xbar is normal; find t using s |
| Not normal | large | Xbar is normal-ish (Central Limit Theorem); find z using sigma | Xbar is normal-ish (Central Limit Theorem); find z using s |
| Not normal | small | Unrealistic: sigma is only "good" for normal populations | See p. 381. If you can't use t, find a statistician. |
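The decision table can be encoded as a small lookup function. This is a sketch under one interpretation: the two rows under each population type are read as the large-n and small-n cases. The function name `choose_procedure` and its return strings are hypothetical.

```python
def choose_procedure(pop_normal: bool, n_large: bool, sigma_known: bool) -> str:
    """Return the table's advice for inference about mu from xbar (SRS)."""
    if pop_normal:
        if sigma_known:
            return "z using sigma"            # xbar is exactly normal
        # sigma unknown: large n lets s stand in for sigma; small n needs t
        return "z using s" if n_large else "t using s"
    if n_large:
        # Non-normal population, but xbar is approximately normal (CLT)
        return "z using sigma" if sigma_known else "z using s"
    # Small sample from a non-normal population
    return "unrealistic" if sigma_known else "see p. 381; ask a statistician"

print(choose_procedure(pop_normal=True, n_large=False, sigma_known=False))
```

Only one cell of the table calls for t; the rest of Chapter 7 is about that cell.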
t-distribution family: like the standard normal, only slightly fatter in the tails. Mean = 0; symmetric around 0.
"Degrees of freedom" tell which member of the t family.
t(k) is the t distribution with k degrees of freedom.
Comparison with normal (Excel
file)
Lower d.f.--fatter tails. Higher d.f.--more
like standard normal.
Table C: upper tail: probability
<--> "critical" t-value.
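In place of the Excel comparison, here is a Python sketch that computes t upper-tail areas directly. It is an illustration, not how Table C was produced: it integrates the t density (written with `math.gamma`) by Simpson's rule, and finds a critical t* by bisection. The function names are hypothetical, and `math.gamma` overflows for df above a few hundred.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x: float, df: int, steps: int = 2000) -> float:
    """P(T > x) for x >= 0: Simpson's rule on [0, x], then symmetry about 0."""
    h = x / steps                          # steps is even, as Simpson requires
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(p: float, df: int) -> float:
    """Table C lookup in reverse: t* with P(T > t*) = p, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid                       # tail still too big: move right
        else:
            hi = mid
    return (lo + hi) / 2

# Same cutoff, different df: lower df leaves more area in the tail.
for df in (2, 9, 30):
    print(df, round(t_upper_tail(2.0, df), 4))
print("normal", round(1 - NormalDist().cdf(2.0), 4))
print(round(t_critical(0.05, 9), 3))      # compare with Table C, df = 9, .05
```

For a confidence level C, Table C's t* is the value with upper-tail area (1 - C)/2.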
Start working on green box:
Assume a normal population with mean µ and s.d. sigma, both unknown.
Take an SRS of size n; find xbar and s (the sample standard deviation).
Standard error of the (sample) mean = standard deviation of xbar, estimated from the data.
"Standard error of the mean": s/sqrt(n), written SEM, SE_xbar, etc.
Just like sigma/sqrt(n), only s from the data replaces sigma.
When you estimate the standard
deviation of a statistic,
the resulting estimate is called the "standard error" of the
statistic.
Standardizing xbar with s instead of sigma gives the one-sample t statistic,
t = (xbar - µ)/(s/sqrt(n)),
which has the t distribution with n - 1 degrees of freedom.
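A minimal Python sketch of the standard error and the one-sample t statistic, using the standard library's `statistics.stdev` (which divides by n - 1, as s does). The function names and the 8-value data set are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(data):
    """SEM = s / sqrt(n), with s the sample standard deviation (divisor n - 1)."""
    return statistics.stdev(data) / math.sqrt(len(data))

def one_sample_t(data, mu0):
    """One-sample t statistic for H0: mu = mu0, and its degrees of freedom."""
    xbar = statistics.mean(data)
    t = (xbar - mu0) / standard_error_of_mean(data)
    return t, len(data) - 1

# Hypothetical data: xbar = 5, s = sqrt(32/7), n = 8.
t, df = one_sample_t([2, 4, 4, 4, 5, 5, 7, 9], mu0=4)
print(round(t, 3), df)
```

The only change from the z statistic of Chapter 6 is that s replaces sigma in the denominator.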
We'll now repeat all the stuff from Chapter 6, only wherever there was
a z, we'll substitute a t.
Here we go....
"One-sample" t procedures:
SRS of size n. Use xbar to estimate µ.
Substitute s for sigma in the standardizing formula. We get t instead of z, with n - 1 degrees of freedom.
It's a good idea to check for at least approximate normality in the data set.
Confidence intervals:
Choose t* from Table C, using the n - 1 row and confidence level C.
Special case of the common pattern: estimate ± t* SE_estimate
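The "estimate ± t* SE" pattern as a Python sketch, assuming t* has already been read from Table C (df = n - 1). The function name and the 8-value data set are hypothetical; 2.365 is the Table C entry for df = 7 and 95% confidence.

```python
import math
import statistics

def t_confidence_interval(data, t_star):
    """xbar ± t* · s/sqrt(n); t_star comes from Table C, row n - 1."""
    n = len(data)
    xbar = statistics.mean(data)               # the estimate
    sem = statistics.stdev(data) / math.sqrt(n)  # its standard error
    margin = t_star * sem
    return xbar - margin, xbar + margin

# Hypothetical data, n = 8, so df = 7; 95% confidence gives t* = 2.365.
print(t_confidence_interval([2, 4, 4, 4, 5, 5, 7, 9], t_star=2.365))
```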
Significance tests:
State hypotheses as in Ch. 6, then find t from the data by calculating the one-sample t statistic, using the null-hypothesis value of µ (call it µ0): t = (xbar - µ0)/(s/sqrt(n)).
Then proceed as if it were a "z", only using the (n - 1) d.f. row in Table C: find the two t*'s that t lies between, and write "P-value is between ___ and ___".
(Or use software, which will find the P-value exactly.)
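A Python sketch of the Table C bracketing step. The lists hold the upper-tail probabilities and the df = 9 critical values for a subset of Table C's columns (enough for the milk example that follows); the function name `bracket_p_value` is hypothetical. It returns bounds on the one-sided P-value in the direction of the observed t; double both bounds for a two-sided test.

```python
# Upper-tail probabilities (a subset of Table C's columns)
TAIL_PROBS = [0.25, 0.20, 0.15, 0.10, 0.05, 0.025, 0.02, 0.01, 0.005]
# Matching critical values t* in the df = 9 row
T9_ROW = [0.703, 0.883, 1.100, 1.383, 1.833, 2.262, 2.398, 2.821, 3.250]

def bracket_p_value(t, row=T9_ROW, probs=TAIL_PROBS):
    """Return (low, high) with low <= one-sided P-value <= high."""
    t = abs(t)                            # tail in the observed direction
    if t < row[0]:
        return probs[0], 0.5              # off the left end of the table
    for i in range(len(row) - 1):
        if row[i] <= t < row[i + 1]:
            return probs[i + 1], probs[i]  # larger t* means smaller tail
    return 0.0, probs[-1]                 # off the right end of the table

low, high = bracket_p_value(1.767)
print(f"P-value is between {low} and {high}")
```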
Example: bacteria per milliliter in 10 specimens of raw milk from one producer.
Parameter: actual mean bacteria/ml.
5370, 4890, 5100, 4500, 5260, 5150, 4900, 4760, 4700, 4870

Stemplot (stem = thousands, leaf = hundreds, truncated):
4 | 5
4 | 77
4 | 889
5 | 11
5 | 23

n = 10, xbar = 4950, s = 268.45
SEM = 268.45/sqrt(10) = 268.45/3.162 = 84.89
Degrees of freedom = 9
90% CI: from the t(9) row in the table, t* = 1.833.
CI is 4950 ± 1.833 x 268.45/sqrt(10) = 4950 ± 1.833 x 84.89, or 4950 ± 155.6 bacteria/ml.
If we had KNOWN the population sigma = 268.45, we'd have used z* = 1.645 and gotten a narrower CI (but we don't know sigma!).
Test: H0: µ = 4800
t = (4950 - 4800)/SEM = 150/84.89 = 1.767
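The milk-example arithmetic can be checked with Python's standard `statistics` module. This is a sketch; t* = 1.833 is typed in from Table C rather than computed.

```python
import math
import statistics

# Bacteria per ml in 10 raw-milk specimens, from the example above.
milk = [5370, 4890, 5100, 4500, 5260, 5150, 4900, 4760, 4700, 4870]

n = len(milk)
xbar = statistics.mean(milk)      # 4950
s = statistics.stdev(milk)        # sample s.d. (divisor n - 1), about 268.45
sem = s / math.sqrt(n)            # about 84.89
t_star = 1.833                    # Table C, df = 9, 90% confidence
ci = (xbar - t_star * sem, xbar + t_star * sem)  # about 4950 ± 155.6
t = (xbar - 4800) / sem           # about 1.767, with 9 d.f.

print(f"xbar = {xbar}, s = {s:.2f}, SEM = {sem:.2f}")
print(f"90% CI: ({ci[0]:.1f}, {ci[1]:.1f}); t = {t:.3f}")
```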
SPSS: get the handout for 7.1 next time. Type in the above data and find the P-value and CI. (Dataset)
Analyze>Compare Means>One-Sample T Test:
Test value = 4800.
The P-value is labeled "Sig. (2-tailed)"; divide it by 2 for a one-tailed test (if the observed result is in the direction the alternative predicts).
Analyze>Descriptive Statistics>Explore. Statistics button: set the confidence level.
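The halving rule as a Python sketch. The function name `one_tailed_p` and the input values are hypothetical. It covers both cases: when t points the way the alternative predicts, the one-sided P-value is half of "Sig. (2-tailed)"; when it points the other way, symmetry of the t distribution makes it 1 minus that half.

```python
def one_tailed_p(sig_2tailed: float, t: float, alternative: str) -> float:
    """Convert a two-tailed P-value to a one-sided one.

    alternative: 'greater' (Ha: mu > mu0) or 'less' (Ha: mu < mu0).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    half = sig_2tailed / 2
    in_direction = (t > 0) if alternative == "greater" else (t < 0)
    return half if in_direction else 1 - half

# Hypothetical output: Sig. (2-tailed) = 0.12 with t = 1.5, Ha: mu > mu0
print(one_tailed_p(0.12, 1.5, "greater"))
```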
| Sievers home | Math151-Sp04/Days37.htm | 3pm | 5/3/04 |