| Hand in:
A) Re-create the results on the SPSS handout, for the matched pairs situations.
7.31, 7.32 SPSS vit. C, test and CI's; matched pairs +. For 32b, you have to re-express the 5 "after" numbers as percents (e.g. Sample 1: 20/98 = 20.4%...) and then find a new CI for this data set.
7.39 SPSS C: Factory to Haiti, matched pairs. The answers weirdly assume you'll do Haiti - Factory, when it seems more natural to do Factory - Haiti; and the SPSS file is set up to do Factory - Haiti. Do it the natural way. (Note something weird: this must not be the same WSB as in 7.31 and 7.32, because it starts out with half the vitamin C. What's up??)
Using SPSS in lieu of tables: You may use your calculator as an aid. Sketch the probabilities, and show your computations. If feasible, check with the book table.
7.47 piano lessons sign test by hand. Also get an exact value for P from SPSS using the Binomial dist.
7.27, 7.44 TBBMC Read, don't do the problems as written! Sometimes a quick "sign test" will give an indication of whether there's a significant difference. For these data (p. 478) just count the number of +'s in the 8 trials. From your knowledge of flipping coins, will there be a significant difference between the operators? |
Read, discuss |
Optional (more practice) |
What is the significance to Statistics of the Guinness Stout Bottle?
~~~~~~~~~~~~~~~
Homework questions: Day 35
MATCHED PAIRS t procedures: (get for free!)
Example by hand, robustness Day 35
SPSS: Analyze > Compare Means > Paired-Samples T Test. (handout)
Data in parallel columns--subtracts rightmost from left column. Don't get to choose which way to subtract.
CI level under Options.
or Transform > Compute: Let Target variable be Difference, Numeric expression be VarA - VarB. You can use the Difference to examine for Normality, do one-sample procedures on Difference.
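A minimal Python sketch of the same "compute the Difference, then do one-sample procedures" idea. The numbers are made up for illustration (they are not from the handout or the book), and the names var_a and var_b are just placeholders for the two parallel columns.

```python
import math
import statistics

# Hypothetical matched-pairs data (NOT from the handout or the book).
var_a = [12, 15, 11, 14, 13]
var_b = [10, 14, 12, 11, 11]

# Same idea as Transform > Compute with Difference = VarA - VarB.
difference = [a - b for a, b in zip(var_a, var_b)]

# One-sample t procedure on the differences.
n = len(difference)
d_bar = statistics.mean(difference)      # mean difference
s_d = statistics.stdev(difference)       # sample SD of the differences
se = s_d / math.sqrt(n)                  # standard error of the mean difference
t_stat = d_bar / se                      # tests H0: mean difference = 0
df = n - 1

print(difference, round(d_bar, 2), round(t_stat, 3), df)
```

Look the t statistic up in Table D with df = n - 1, exactly as for a one-sample problem.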
What if t's not suitable?
Skewness: Try log or other transformation, work on transformed data. (Sadly, CI's can't be transformed back, because µlog(X) is not equal to log(µX).) Covered last time.
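A quick numerical check of why the back-transformation fails, sketched in Python with three made-up, right-skewed values. Undoing the log on the mean of the logs gives the geometric mean, not the ordinary mean.

```python
import math
import statistics

# Hypothetical right-skewed data, chosen so the logs come out even.
x = [1, 10, 100]

mean_of_logs = statistics.mean(math.log10(v) for v in x)  # mean of 0, 1, 2
log_of_mean = math.log10(statistics.mean(x))              # log10 of 37

# Back-transforming the mean of the logs gives the geometric mean of x.
back_transformed = 10 ** mean_of_logs

print(mean_of_logs, round(log_of_mean, 3), back_transformed, statistics.mean(x))
```

Here the back-transformed value is 10 while the ordinary mean is 37, so an interval built on the log scale does not turn back into an interval for µX.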
Outliers or other nonnormality: Distribution-free/nonparametric procedures. Usually less power than distribution-based. (Uses less information, duh!) Often based on binomial or similar models.
Sign test (p. 465-8) is a nice "trick" that turns any paired sample situation into a binomial situation.
For each pair, "success" is that the item from Group A is bigger than the matched item from Group B. If there are ties, just throw them away (like the flipped coin that balances on its edge).
The null hypothesis is always that the groups are the same, so it is just like a coin-flip: the prob. of success is 1/2 under H0.
Then see how likely you are to get at least as many successes as you saw, using the binomial distribution. That's the P-value, for the alternative Ha that Group A is bigger on average than Group B. More specifically, writing X for the within-pair difference (Group A - Group B), we're testing this:
H0: (the median of X is 0) ~ (probability that X is positive = .5) ~ (p = .5).
Ha: (the median of X is above 0) ~ (probability that X is positive > .5) ~ (p > .5).
Example: We suspect that students living on campus for their first semester gain weight. Poll 11 students, asking just the sign of their weight change:
Get these results: + + + 0 + - + + + - + (0 means no change). That's 8 +'s and 2 -'s out of 10.
If there's no weight gain on average (median gain is 0) we have a B(10, .5) distribution. One-sided alternative, that median gain is higher.
Let X be B(10, .5). Then the P-value is P(X = 8, 9, or 10) = .0439 + .0098 + .0010 = .0547, from Table C in the book.
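The weight-gain computation can be sketched in Python using only the standard library. The string of signs is the one from the example; the helper name binom_pmf is just an illustrative choice.

```python
from math import comb

# The 11 responses from the weight-gain example; "0" means no change.
signs = "+++0+-+++-+"

plus = signs.count("+")
minus = signs.count("-")
n = plus + minus          # ties are thrown away


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# One-sided P-value: chance of at least as many +'s as we saw, under p = .5.
p_value = sum(binom_pmf(k, n, 0.5) for k in range(plus, n + 1))

print(plus, minus, n, round(p_value, 4))  # 8 2 10 0.0547
```

The exact value is 56/1024, which agrees with the Table C sum to four places.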
SPSS: Transform/Compute (pp. 8 & 9, first handout)
CDF.BINOM(7, 10, .5) gives the probability that X is less than or equal to 7, in a B(10, .5) distribution. You will probably want to increase the number of digits after the decimal point (Decimals). To find P(X = 8, 9, or 10), subtract the SPSS number from 1.
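For checking the SPSS number away from the lab, here is a small Python stand-in. The function cdf_binom is a hypothetical helper written to mirror the argument order of CDF.BINOM(quantity, n, prob); it is not an SPSS function.

```python
from math import comb


def cdf_binom(q, n, p):
    """P(X <= q) for X ~ B(n, p); argument order mirrors SPSS CDF.BINOM."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(q + 1))


lower = cdf_binom(7, 10, 0.5)   # P(X <= 7)
upper = 1 - lower               # P(X = 8, 9, or 10)

print(lower, upper)
```

Note the cutoff: to get P(X >= 8) you ask for the CDF at 7, not at 8.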
Disadvantage: You're obviously throwing away a lot of information (how big the differences are). The result is that the power to detect a difference--if there is one--is much less than that of a t-test, when the t is usable.
The sign test can be extended to a single data set, where you test the median: If a is the median, then in the population, half the observations will be above a, and half below. Each data point is then like a coin flip, above or below the median. (Can you see how this could be extended to test for a particular value of the first quartile, for instance?)
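One possible way to sketch the single-sample version in Python. The data and the hypothesized value 10 are made up, and the function names are illustrative. The only change for a quartile is the "coin": under H0 an observation falls below the first quartile with probability .25 instead of .5.

```python
from math import comb


def upper_tail(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def count_below(data, a):
    """Return (number of observations below a, number kept after dropping ties)."""
    kept = [x for x in data if x != a]
    return sum(x < a for x in kept), len(kept)


# Hypothetical data; hypothesized value a = 10.
data = [3, 5, 6, 7, 8, 12, 15, 20]
below, n = count_below(data, 10)

# H0: median = 10, Ha: median < 10 (too many observations below 10).
p_median = upper_tail(below, n, 0.5)

# H0: first quartile = 10, Ha: first quartile < 10.
p_quartile = upper_tail(below, n, 0.25)

print(below, n, round(p_median, 4), round(p_quartile, 4))
```

With 5 of 8 observations below 10, that is unremarkable for a median of 10 but surprising for a first quartile of 10.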
SPSS will do the sign test if you have the two "matched pair" variables. (Be sure you have descriptive labels.)
Analyze > Nonparametric Tests > 2 Related Samples. Get a box where you choose the pair (can't choose direction of subtraction). Under Test Type, choose Sign. Get counted results and two-sided P-value.
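Since SPSS reports a two-sided P-value, it helps to see how that relates to the one-sided value computed by hand. The sketch below doubles the tail of B(n, .5), which is symmetric; that this matches the SPSS output digit for digit is an assumption, not something the handout states. The function name is illustrative.

```python
from math import comb


def sign_test_two_sided(plus, minus):
    """Two-sided sign-test P-value from the counts of +'s and -'s (ties dropped)."""
    n = plus + minus
    k = max(plus, minus)
    # Tail of B(n, .5) at least as extreme as the larger count.
    tail = sum(comb(n, j) for j in range(k, n + 1)) / 2**n
    return min(1.0, 2 * tail)


print(sign_test_two_sided(8, 2))  # 0.109375
```

For a one-sided alternative in the direction the data went, halve the two-sided value: .109375 / 2 = .0547 for the weight-gain example.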
Start here Monday Yes
Sec. 7.2, Comparing two means: "Two-sample tests". Two SRS's, independent, from distinct populations. (Populations are normally distributed.)
Often--comparing means from an experiment with two treatments (usually control and "treatment"). Cf. p. 202.
               /--- Group 1, n1 ---- Treatment 1 ---\
Random asst.                                          Compare results
               \--- Group 2, n2 ---- Treatment 2 ---/
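To examine the difference of the two means, µ1 - µ2:
Theoretical assumption is normal populations. Back-to-back stemplots are good; boxplots will do.
We use the difference of the two x-bars, diff = xbar1 - xbar2 = $\bar{x}_1 - \bar{x}_2$.
The Standard Deviation is calculated like the hypotenuse of a right triangle (Pythagorean Theorem), from the individual standard deviations:
$\sigma_{\text{diff}} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Then the "Two-sample z-statistic"
$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
is N(0,1) (p. 488).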
But we don't know the population standard deviations! We need the Standard Error of the difference xbar1 - xbar2,
$SE_{\text{diff}} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$,
and then we can proceed as before, more or less. As usual, we substitute sample standard deviations for population standard deviations, and our z's are replaced by t's.
For testing, if H0 is "population means are equal", the "Two-sample t-statistic" is
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
Unfortunately, this doesn't have an exact t-distribution, and its exact distribution is very hard to deal with; but if we "adjust" the degrees of freedom, t is a good approximation.
For doing by hand: df = smaller of (n1 - 1) and (n2 - 1). Will give a "conservative" result--slightly wider C.I., slightly less significance--than a "sharper" value. If your results hinge on the difference between this result and the computer result, they're too close for comfort anyway. Table D? Go to the next lower df if the one you want isn't given.
From a computer: df = complicated formula on p. 498. Produces non-integer degrees of freedom. Very good approximation to the exact distribution, if both sample sizes are at least 5. Unsuitable for doing by hand.
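For the curious, here is a Python sketch comparing the two choices of df. It assumes the "complicated formula" is the usual Satterthwaite (Welch) approximation; the notes don't reproduce the formula, so treat that identification as an assumption. The summary statistics are made up.

```python
def conservative_df(n1, n2):
    """By-hand choice: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


def welch_df(s1, n1, s2, n2):
    """Satterthwaite/Welch approximate df (assumed to be the software formula)."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Hypothetical summary statistics: s1 = 2.0, n1 = 10, s2 = 3.5, n2 = 15.
print(conservative_df(10, 15), welch_df(2.0, 10, 3.5, 15))
```

The software value always lands between the conservative df and n1 + n2 - 2, which is why the by-hand answer is never the more optimistic one.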
Once we have (xbar1 - xbar2), SEdiff, and the df, our formulas pattern on the earlier ones.
CI: estimate ± t* · SEestimate
CI for µ1 - µ2, difference of means, is
$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
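A minimal Python sketch of the two-sample confidence interval from summary statistics. The summary numbers are made up; t* is passed in by hand because it comes from Table D (here 2.262, the 95% value for the conservative df = 9).

```python
import math


def two_sample_ci(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """CI for mu1 - mu2: (xbar1 - xbar2) +/- t* times SE of the difference."""
    diff = xbar1 - xbar2
    se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)
    margin = t_star * se_diff
    return diff - margin, diff + margin


# Hypothetical summary statistics, n1 = n2 = 10, so conservative df = 9.
low, high = two_sample_ci(52.0, 4.0, 10, 48.0, 3.0, 10, t_star=2.262)

print(round(low, 2), round(high, 2))
```

If the interval excludes 0, the two-sided test at the matching level rejects "no difference".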
Test: H0: µ1 - µ2 = 0, same as µ1 = µ2, "no difference" (always)
Ha: µ1 - µ2 > 0, same as µ1 > µ2. Be careful with these, that you know which direction you want.
or Ha: µ1 - µ2 < 0, same as µ1 < µ2.
Often we label our variables "1" and "2" so that we expect µ1 > µ2.
or Ha: µ1 - µ2 ≠ 0, same as µ1 ≠ µ2 (not equal)
Calculate t, find P-value (approximate, conservative).
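The test statistic itself is easy to sketch in Python. The summary statistics are made up, and the critical values quoted in the comments are the standard Table D entries for df = 9; bracketing the P-value between two table entries is how the by-hand method works.

```python
import math


def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Two-sample t statistic for H0: mu1 = mu2."""
    se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (xbar1 - xbar2) / se_diff


# Hypothetical summary statistics.
t_stat = two_sample_t(52.0, 4.0, 10, 48.0, 3.0, 10)
df = min(10 - 1, 10 - 1)   # conservative df

# Table D, df = 9: upper-tail .025 -> 2.262, upper-tail .01 -> 2.821.
between = 2.262 < t_stat < 2.821

print(round(t_stat, 3), df, between)
```

Here t falls between the .025 and .01 critical values, so for Ha: µ1 > µ2 the P-value is between .01 and .025; double those bounds for the two-sided alternative.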
Example by hand.
You can check your by-hand work with the Excel Two-sample calculator.
Robust? Yes... p. 493. Outliers are bad, as before:
Use same guidelines (p. 463) with n = n1 + n2.
Large n's have robustness from the CLT.
Equal sample sizes help: then robust against non-normality, more so if populations have the same shape, down to n = 5 each.
In doubt? Use the conservative df!
--SPSS will do our computations when we are given raw data. Next.
| Sievers home | Math251-Fall07/Day2s36.htm | 10pm | 11/15/07 |