7.1 continued. pp. 518-523. Sign test and log transformation,
for when the data are clearly not normal. (remaining cell in table)
Log transformation can sometimes turn right-skewed data normal.
(A variable for which this is true is said to have a "lognormal" distribution.)
The only unsatisfactory thing is that you can't translate your confidence
intervals back into intervals for the mean in the original units, because
the mean of the log(xi)'s doesn't equal log(mean of the xi's).
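A quick numerical check of that last point, as a minimal sketch. The data values are made up for illustration and are not from the text.

```python
import math
from statistics import mean

# Made-up right-skewed data (illustrative only, not from the text)
x = [1.0, 2.0, 4.0, 8.0, 100.0]

mean_of_logs = mean(math.log(v) for v in x)   # mean of the log(xi)'s
log_of_mean = math.log(mean(x))               # log of the mean of the xi's

# Undoing the log on the mean of the logs gives the geometric mean,
# which is not the ordinary (arithmetic) mean of the original data.
back_transformed = math.exp(mean_of_logs)

print(mean_of_logs, log_of_mean)
print(back_transformed, mean(x))
```

The two printed pairs differ, which is why an interval computed on the log scale does not back-transform into an interval for the original mean.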
Will cover Sign Test Friday:
The sign test is a nice "trick" that turns
any paired-sample situation into a binomial situation. For
each pair, "success" is that the item from group 1 is bigger. If
there are ties, just throw them away (like the flipped coin that balances
on its edge). The null hypothesis is always that the groups are the
same, so it is just like a coin flip: the prob. of success is 1/2 under
H0. Then see how likely you are to get at least as many successes
as you saw, using the binomial distribution. That's the p-value.
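The recipe above can be sketched in a few lines of Python. The function names and the data are hypothetical, chosen only for illustration; the p-value is the one-sided "at least as many successes" probability described above.

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def sign_test_paired(group1, group2):
    """One-sided sign test: 'success' means the group 1 item is bigger."""
    diffs = [a - b for a, b in zip(group1, group2)]
    diffs = [d for d in diffs if d != 0]          # throw away ties
    n = len(diffs)
    successes = sum(1 for d in diffs if d > 0)
    return successes, n, binomial_upper_tail(successes, n)

# Made-up paired data (illustrative only)
group1 = [12, 15, 9, 14, 11, 13, 10, 16]
group2 = [10, 15, 8, 12, 12, 11, 9, 13]
successes, n, p_value = sign_test_paired(group1, group2)
print(successes, n, p_value)   # 6 successes in 7 non-tied pairs, p = 8/128 = 0.0625
```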
Disadvantage:
You're obviously throwing away a lot of information (how big the differences
are). The result is that the power to detect a difference--if
there is one--is much less than that of a t-test, when the t-test is usable.
The sign test can be extended to a single data set, where you test the median: If
a is the median, then in the population, half the observations will be
above a, and half below. Each data point is then like a coin flip,
above or below the median. (Can you see how this could be extended
to test for a particular value of the first quartile, for instance?)
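For the single-sample version, a minimal sketch along the same lines. The function name, the data, and the choice of a one-sided alternative (median greater than a) are assumptions made for illustration.

```python
from math import comb

def sign_test_median(data, a):
    """One-sided sign test of H0: median = a against Ha: median > a."""
    above = sum(1 for x in data if x > a)
    below = sum(1 for x in data if x < a)
    n = above + below                      # values equal to a are dropped as ties
    # Under H0 each point is a fair coin flip: P(at least `above` heads in n flips)
    p_value = sum(comb(n, i) for i in range(above, n + 1)) / 2**n
    return above, n, p_value

# Made-up data (illustrative only); test whether the median exceeds 5
data = [3, 7, 8, 9, 10, 12, 5, 11, 14, 6]
above, n, p_value = sign_test_median(data, 5)
print(above, n, p_value)
```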
- - - - - - - - - - - - - - - - - -
- - -
Sec. 7.2, Comparing two means
"Two-sample tests". Two SRS's, independent, from distinct
populations. (Populations are normally distributed.)
Often--comparing means from an experiment with two treatments (usually
control and "treatment"). Cf. p. 242.
                 /--- Group 1, n1 ---- Treatment 1 ---\
Random asst. ---                                        --- Compare results
                 \--- Group 2, n2 ---- Treatment 2 ---/
To examine the difference of the two means, µ1 - µ2:
We need fairly normal populations; no extreme outliers.
Back to back stemplots are good; boxplots will do.
We use the difference of the two x-bars, xbar1 - xbar2.
We need the Standard Deviation of xbar1 - xbar2,
and then we can proceed as before, more or less.
Using the algebra of means and variances (the two samples are independent), we find that
Variance(xbar1 - xbar2) = Variance(xbar1) + Variance(xbar2)
So the Standard Deviation is calculated like the hypotenuse of a right
triangle (Pythagorean Theorem), from the individual standard deviations.
We can use this to standardize the difference (xbar1 - xbar2),
and get a standard normal Z (p. 539).
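A small sketch of the "hypotenuse" calculation and the resulting Z, assuming the population sigmas are known and using the usual fact that the standard deviation of an x-bar is sigma/sqrt(n). The function names and the numbers are hypothetical.

```python
from math import sqrt

def sd_of_difference(sigma1, n1, sigma2, n2):
    """SD of (xbar1 - xbar2): variances add, so the SDs combine like a hypotenuse."""
    return sqrt(sigma1**2 / n1 + sigma2**2 / n2)

def two_sample_z(xbar1, xbar2, sigma1, n1, sigma2, n2, delta0=0.0):
    """Standardize (xbar1 - xbar2); delta0 is the hypothesized mu1 - mu2."""
    return ((xbar1 - xbar2) - delta0) / sd_of_difference(sigma1, n1, sigma2, n2)

# Made-up numbers: SD(xbar1) = 6/sqrt(4) = 3, SD(xbar2) = 8/sqrt(4) = 4,
# so the SD of the difference is 5 (a 3-4-5 right triangle).
print(sd_of_difference(6, 4, 8, 4))       # 5.0
print(two_sample_z(110, 100, 6, 4, 8, 4)) # 2.0
```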
But usually the standard deviations are unknown, and we substitute
s's for sigmas. Then our hypotenuse formula is
SEdiff = sqrt( SE(xbar1)^2 + SE(xbar2)^2 )
"t" = [ (xbar1 - xbar2) - (µ1 - µ2) ] / SEdiff
(See p. 541 for another way of writing the same thing.)
It would be nice if substituting s's gave us a t distribution.
Unfortunately, this doesn't quite have an exact t-distribution, and its
exact distribution is very hard to deal with.
For doing by hand: df = smaller of (n1 - 1) and (n2 - 1).
Will give a "conservative" result--slightly wider C.I., slightly less
significance, than a "sharper" value. If your results
hinge on the difference between this result and the computer result, they're
too close for comfort anyway.
From a computer: df = complicated formula on p. 549. Produces non-integer degrees of freedom. Very good approximation to the exact distribution, if both sample sizes are at least 5. Unsuitable for doing by hand.
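To compare the two choices of df, here is a minimal sketch. It assumes the "complicated formula" is the standard Welch-Satterthwaite approximation, which is what statistical software generally uses for the unequal-variances t; the function names and sample values are hypothetical.

```python
def conservative_df(n1, n2):
    """By-hand choice: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate df (assumed to be the software formula)."""
    v1 = s1**2 / n1          # squared SE of xbar1
    v2 = s2**2 / n2          # squared SE of xbar2
    return (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Made-up sample SDs and sizes (illustrative only)
print(conservative_df(5, 20))      # 4
print(welch_df(3.0, 5, 1.0, 20))   # non-integer, at least as large as the by-hand df
```

The software df is never smaller than the by-hand df, which is why the by-hand result is the conservative one.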
Once we have (xbar1 - xbar2), SEdiff, and the df, our formulas follow
the pattern of the earlier ones.
Example
CI: estimate ± t* · SEestimate
CI for µ1 - µ2, difference of means, is (xbar1 - xbar2) ± t* · SEdiff
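The interval above can be sketched as follows. The function names are hypothetical, and t* is assumed to be looked up by the reader in a t table for the chosen df and confidence level (for example, 2.262 for 95% with df = 9); the sample values are made up.

```python
from math import sqrt

def se_diff(s1, n1, s2, n2):
    """Standard error of (xbar1 - xbar2), using sample SDs in place of sigmas."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

def ci_for_difference(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """(xbar1 - xbar2) +/- t* times SEdiff; t_star comes from a t table."""
    center = xbar1 - xbar2
    margin = t_star * se_diff(s1, n1, s2, n2)
    return center - margin, center + margin

# Made-up summary statistics, n1 = n2 = 10, so the by-hand df is 9
low, high = ci_for_difference(110, 6, 10, 100, 8, 10, t_star=2.262)
print(low, high)
```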
Test: H0: µ1 - µ2 = 0, same as µ1 = µ2, "no difference"
Ha: µ1 - µ2 > 0, same as µ1 > µ2. Be careful with these, that you
know which direction you want.
or Ha: µ1 - µ2 < 0, same as µ1 < µ2. Often we label our variables
"1" and "2" so that we expect µ1 > µ2.
or Ha: µ1 - µ2 ≠ 0, same as µ1 ≠ µ2 (not equal)
Calculate t, find P-value (approximate, conservative)
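Putting the test together, a minimal sketch using the conservative by-hand df. In place of a t table or SPSS, the tail area is obtained here by numerically integrating the t density with Simpson's rule; that substitution, the function names, and the data are all assumptions for illustration.

```python
from math import exp, lgamma, pi, sqrt

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for Student's t; df may be non-integer. steps must be even."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def pdf(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)   # Simpson weights 4, 2, 4, ...
    upper = 0.5 - total * h / 3    # 0.5 minus the area between 0 and |t|
    return upper if t >= 0 else 1 - upper

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """t statistic for H0: mu1 - mu2 = 0, with the conservative df."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    t = (xbar1 - xbar2) / se
    df = min(n1 - 1, n2 - 1)
    p_greater = t_upper_tail(t, df)             # Ha: mu1 > mu2
    p_less = 1 - p_greater                      # Ha: mu1 < mu2
    p_two_sided = 2 * t_upper_tail(abs(t), df)  # Ha: mu1 not equal to mu2
    return t, df, p_greater, p_less, p_two_sided

# Made-up summary statistics (illustrative only)
print(two_sample_t(110, 6, 4, 100, 8, 4))
```

Which of the three P-values to report depends on the direction chosen for Ha before looking at the data.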
There is a third way of doing these: the "pooled two-sample t-procedure," p. 550.
It was the only choice in many circumstances before the above good
approximations were developed, computing power increased, and robustness
was explored. It requires that the variances of the two populations
be equal. The newer ways are usually preferable in practice.
However, the pooling of the data to estimate the common variance is a device
also used elsewhere, so is worth looking at.
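A brief sketch of the pooling device, using the standard pooled-variance formula (a weighted average of the two sample variances, weighted by their degrees of freedom). The notes do not spell out the formula, so the function names and numbers are illustrative assumptions.

```python
from math import sqrt

def pooled_variance(s1, n1, s2, n2):
    """Estimate of the common variance, weighting each s^2 by its df."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_se_diff(s1, n1, s2, n2):
    """Standard error of (xbar1 - xbar2) under the equal-variance assumption."""
    return sqrt(pooled_variance(s1, n1, s2, n2) * (1 / n1 + 1 / n2))

def pooled_df(n1, n2):
    """Degrees of freedom for the pooled two-sample t."""
    return n1 + n2 - 2

# Made-up sample SDs and sizes (illustrative only)
print(pooled_variance(6, 4, 8, 4), pooled_se_diff(6, 4, 8, 4), pooled_df(4, 4))
```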
- - - - - - - - - - - - - - - - - - - -
Read pp. 518-523.
Read ahead 7.2, pp. 537-549, then continue.
We'll do all of 7.2. You will NOT need to remember the d.f. formula
on p. 549.
| Hand in Friday (7.1): The last assignment
( SPSS work we cleaned up in class today)
Sign tests HW postponed till Friday. Sign tests can be done easily by hand (do at least one by hand). Try SPSS (on the handout you have).
7.43 a, b (turn page!) sign test, rt. threads
7.44 sign test, summer institute.
7.45 sign test?? Log transformations. Need SPSS.
|
Read, discuss
|
Optional
(more practice) |
| Sec. 7.2, part of the next assignments.
Those that need to be done on the computer are labeled SPSS (two-sample
is on the handout you have)
p. 556, 7.48, 49, 50 (SPSS) bread vitamins
Pooled-sample (equal sigma's). Pooled-sample computation gets a bigger d.f. and therefore a shorter CI & smaller p-value than the unequal variances method, on the same data.
7.65 and 7.77 rowing--weight. (unequal and equal methods compared.)
7.75 social insight. This is Example 7.16, p. 546, not 526.
Some algebra: General advice is to put equal numbers into each sample if you can. Here are some hints why.
A) If n1 = n2 then the expression for the standard error of (xbar1 - xbar2), i.e. the denominator of the t-statistic, is the same for the unequal variances version p. 541 and for the pooled-t p. 551. Use algebra to show they are the same (set n = n1 = n2). [Thus the only difference in computing with the different versions in this case will be the d.f. you use]
B) If n1 = n2 = n and s1 = s2 = s, the complicated df formula on p. 549 collapses
into
|
Read, discuss
|
Optional
(more practice) |
| Sievers home | Math251-Fall01/DayP36.htm | 3pm | 11/29/01 |