MATH 251, Probability and Statistics I, Fall 2001, Wed. Nov. 28, Day 36 final version

Exam 2 due this afternoon.
Science Colloquium Friday:  Mathematics! Logical paradoxes, speaker from Cornell.
- - - - - - - - - - - - - - - -
Interesting news release:  Government "experimental methods" to measure tar and nicotine in cigarettes are not useful.  Smokers manage to smoke in such a way as to get more tar and nicotine, and low-tar cigarettes have not helped with cancer rates.   http://newscenter.cancer.gov/pressreleases/lowtar.html
Also the National Cancer Institute's fantastic website of charts and graphs http://www.nci.nih.gov/atlasplus/charts.html
(lots of 95% confidence intervals)
 - - - - - - - - - - - - - - - -

7.1 continued, pp. 518-523. Sign test and log transformation, for when the data are clearly not normal.  (remaining cell in table)
Log transformation can sometimes turn right-skewed data normal. (A distribution for which this is true is said to be "lognormal.")  The one unsatisfactory thing is that you can't translate a confidence interval for the mean of the logs back into a confidence interval for the mean in the original units, because the mean of the log(xi)'s doesn't equal the log of the mean of the xi's.
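To see that last point with numbers, here is a minimal Python sketch. The data values are made up for illustration (they are not from the book); the sketch just compares the mean of the logs with the log of the mean.

```python
import math

# Made-up right-skewed values, for illustration only (not from the textbook).
data = [1.0, 2.0, 4.0, 8.0, 100.0]

logs = [math.log(x) for x in data]
mean_of_logs = sum(logs) / len(logs)

arithmetic_mean = sum(data) / len(data)
log_of_mean = math.log(arithmetic_mean)

# Undoing the log on the mean of the logs gives the geometric mean,
# which is not the ordinary (arithmetic) mean of the original data.
geometric_mean = math.exp(mean_of_logs)

print(f"mean of logs    = {mean_of_logs:.4f}")
print(f"log of mean     = {log_of_mean:.4f}")
print(f"geometric mean  = {geometric_mean:.4f}")
print(f"arithmetic mean = {arithmetic_mean:.4f}")
```

For right-skewed data like these, the back-transformed value falls well below the ordinary mean, which is why an interval computed on the log scale does not convert into an interval for the original mean.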

Will cover Sign Test Friday:
Sign test is a nice "trick", that turns any paired sample situation into a binomial situation. For each pair, "success" is that the item from group 1 is bigger.  If there are ties, just throw them away (like the flipped coin that balances on its edge).  The null hypothesis is always that the groups are the same, so it is just like a coin-flip, the prob. of success is 1/2 under H0. Then see how likely you are to get at least as many successes as you saw, using the binomial distribution.  That's the p-value.
     Disadvantage:  You're obviously throwing away a lot of information (how big the differences are).  The result is that the power to detect a difference--if there is one--is much less than that of a t-test, where the t is usable.
     The sign test can be extended to a single data set, where you test the median:  If a is the median, then in the population, half the observations will be above a, and half below. Each data point is then like a coin flip, above or below the median.  (Can you see how this could be extended to test for a particular value of the first quartile, for instance?)
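The paired sign test described above can be sketched in a few lines of Python. This is only an illustration: the function name `sign_test_p_value` and the data are made up, and the p-value is the one-sided "at least as many successes as you saw" version from the notes.

```python
from math import comb

def sign_test_p_value(pairs):
    """One-sided sign test for paired data.

    Returns (successes, n, p_value), where a "success" means the
    group 1 item is bigger, and n counts the pairs left after ties
    are thrown away.
    """
    diffs = [a - b for a, b in pairs if a != b]   # drop ties
    n = len(diffs)
    successes = sum(1 for d in diffs if d > 0)
    # P(X >= successes) for X ~ Binomial(n, 1/2), the coin-flip model under H0.
    p_value = sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n
    return successes, n, p_value

# Hypothetical paired observations (group 1, group 2); one pair is a tie.
pairs = [(12, 10), (15, 11), (9, 9), (14, 13), (8, 10), (16, 12), (13, 11), (11, 9)]
successes, n, p_value = sign_test_p_value(pairs)
print(successes, n, p_value)
```

With these made-up pairs, 6 of the 7 non-tied pairs are successes, so the p-value is (7 + 1)/128 = 0.0625. Notice that the sizes of the differences never enter the calculation, which is exactly the information the sign test throws away.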
 - - - - - - - - - - - - - - - - - - - - -
Sec. 7.2, Comparing two means
"Two-sample tests".  Two SRS's, independent, from distinct  populations. (Populations are normally distributed)
Often--comparing means from an experiment with two treatments (usually control and "treatment"). Cf. p. 242.
                /--- Group 1, n1---- Treatment 1---\
              /                                    \
 Random asst.                                       Compare results
              \                                    /
               \--- Group 2, n2---- Treatment 2---/
To examine  the difference of the  two means, µ1 - µ2:
We need fairly normal populations; no extreme outliers.  Back to back stemplots are good; boxplots will do.
We use the difference of the two x-bars, xbar1 - xbar2.
We need the Standard Deviation  of  xbar1 - xbar2 , and then we can proceed as before, more or less.
Using the Algebra of means and variances, we find that
      Variance (xbar1 - xbar2) = Variance (xbar1) + Variance (xbar2)
So the Standard Deviation is calculated like the hypotenuse of a right triangle (Pythagorean Theorem), from the individual standard deviations.  We can use this to standardize the difference (xbar1 - xbar2), and get a standard normal Z (p. 539).
But usually the standard deviations are unknown, and we substitute s's for sigmas.  Then our hypotenuse formula is

    SEdiff  = sqrt( SE(xbar1)^2 + SE(xbar2)^2 )

    "t" = [ (xbar1 - xbar2) - (µ1 - µ2) ] / SEdiff       (See p. 541, for another way of writing the same thing.)
   It would be nice if substituting s's gave us a t distribution.  Unfortunately, this doesn't quite have an exact t-distribution, and its exact distribution is very hard to deal with.
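As a rough sketch of the computation, the standard error of the difference and the two-sample "t" can be worked out like this in Python. The function name `two_sample_t` and the two small data sets are made up for illustration; SE(xbar) is taken to be s/sqrt(n), as in the one-sample case.

```python
from math import sqrt
from statistics import mean, stdev

def two_sample_t(x1, x2, null_diff=0.0):
    """Return (t, se_diff) for two independent samples.

    null_diff is the hypothesized value of mu1 - mu2 (0 for "no difference").
    """
    se1 = stdev(x1) / sqrt(len(x1))   # SE of xbar1, using s1 in place of sigma1
    se2 = stdev(x2) / sqrt(len(x2))   # SE of xbar2, using s2 in place of sigma2
    se_diff = sqrt(se1 ** 2 + se2 ** 2)   # the "hypotenuse" formula
    t = ((mean(x1) - mean(x2)) - null_diff) / se_diff
    return t, se_diff

# Hypothetical samples, for illustration only.
group1 = [3.0, 4.0, 5.0, 6.0, 7.0]
group2 = [1.0, 2.0, 3.0, 4.0, 5.0]
t, se_diff = two_sample_t(group1, group2)
print(f"SEdiff = {se_diff:.4f}, t = {t:.4f}")
```

The sketch stops at the statistic itself; which degrees of freedom to use with it is a separate question.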

For doing by hand:  df = smaller of (n1- 1) and (n2- 1).
This will give a "conservative" result--slightly wider C.I., slightly less significance--than a "sharper" value of df would.  If your results hinge on the difference between this result and the computer result, they're too close for comfort anyway.

From a computer:  df = complicated formula on p. 549.  Produces non-integer degrees of freedom.  Very good approximation to the exact distribution, if both sample sizes are at least 5. Unsuitable for doing by hand.
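For comparison, here is a small Python sketch of the two df choices. It assumes the book's "complicated formula" is the usual Welch-Satterthwaite approximation that statistical software uses; that is an assumption on my part, and the function names and sample numbers are made up.

```python
def conservative_df(n1, n2):
    """By-hand choice: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate df (assumed to match the book's formula)."""
    v1 = s1 ** 2 / n1   # squared SE of xbar1
    v2 = s2 ** 2 / n2   # squared SE of xbar2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Hypothetical summary statistics, for illustration only.
s1, n1 = 2.0, 10
s2, n2 = 5.0, 15
print("by hand :", conservative_df(n1, n2))
print("computer:", round(welch_df(s1, n1, s2, n2), 2))
```

The computer value is generally not an integer, and it always lands between the by-hand value and n1 + n2 - 2, which is why the by-hand choice is the conservative one.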

Once we have (xbar1 - xbar2), SEdiff and the df, our formulas follow the pattern of the earlier ones. For example:
CI :  estimate ± t* . SEestimate
    CI for µ1 - µ2, difference of means,  is (xbar1 - xbar2) ± t* . SEdiff
Test:  H0: µ1 - µ2 = 0 same as µ1 = µ2 , "no difference"
           Ha: µ1 - µ2 > 0 same as µ1 > µ2   Be careful with these, that you know which direction you want.
      or Ha: µ1 - µ2 < 0 same as µ1 < µ2 Often we label our variables "1" and "2" so that we expect µ1 > µ2
      or Ha: µ1 - µ2 <> 0 same as µ1 <> µ2  (not equal)
        Calculate t, find P-value (approximate, conservative)

There is a third way of doing these: the "pooled two-sample t-procedure," p. 550.  It was the only choice in many circumstances before the above good approximations were developed, computing power increased, and robustness was explored. It requires that the variances of the two populations be equal.  The newer ways are usually preferable in practice.  However, pooling the data to estimate the common variance is a device also used elsewhere, so it is worth looking at.
- - - - - - - - - - - - - - - - - - - -
Read pp. 518-523.
Read ahead 7.2, pp. 537-549, then continue.  We'll do all of 7.2.  You will NOT need to remember the d.f. formula p. 549. 

Hand in Friday (7.1):  The last assignment (SPSS work we cleaned up in class today)
Sign tests HW postponed till Friday.
Sign tests can be done easily by hand.  (Do at least one by hand.)
Try SPSS. (On the handout you have.)
7.43 a, b (turn page!) sign test, rt. threads
7.44 sign test, summer institute.
7.45 sign test??

Log transformations.  Need SPSS.
7.46  guinea pigs
7.47 failure time Do you see a common thread here in the kind of data that is lognormal? (Hint--book title in problem)

Sec. 7.2,  part of the next assignments. Those that need to be done on the computer are labeled SPSS (two-sample is on the handout you have)

p. 556, 7.48, 49, 50 (SPSS) bread vitamins
7.51, 52 (SPSS)  piano lessons again 
7.53  compare all the piano lesson analyses.  I don't necessarily believe all of the book answer.
7.57 cocaine and birthweight by hand--the unequal sigma method.
7.64 rowing a, b, c (turn page) Use the unequal sigma method (note the std dev's are quite different).



Pooled-sample (equal sigma's).  Pooled-sample computation gets a bigger d.f. and therefore a shorter CI & smaller p-value than the unequal variances method, on the same data. 
7.65 and 7.77 rowing--weight.  (unequal and equal methods compared.)
7.75 social insight.  This is Example 7.16, p. 546, not 526.


Some algebra:  General advice is to put equal numbers into each sample if you can.  Here are some hints why.
A)  If n1 = n2 then the expression for the standard error of (xbar1 - xbar2), i.e. the denominator of the t-statistic, is the same for the unequal variances version p. 541 and for the pooled-t p. 551.  Use algebra to show they are the same (set n = n1 = n2).  [Thus the only difference in computing with the different versions in this case will be the d.f. you use]

B) If n1= n2 = n and  s1= s2 =s, the complicated df formula on p. 549 collapses into 
df = 2n - 2 = n1 + n2 - 2.  Use algebra to show it. [So at least for equal n's and similar s's, the complicated  df formula will not lose you much sharpness compared to the pooled  version.  ]
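Before doing the algebra, you can spot-check both claims numerically. The sketch below is a sanity check, not a proof. It assumes the usual pooled-variance formula and the Welch-Satterthwaite df formula (assumed to be what the book gives on pp. 549-551), and all names and numbers are made up.

```python
from math import sqrt, isclose

def se_unpooled(s1, n1, s2, n2):
    """SE of xbar1 - xbar2, unequal-variances version."""
    return sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def se_pooled(s1, n1, s2, n2):
    """SE of xbar1 - xbar2 using the pooled variance estimate."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return sqrt(sp2 * (1 / n1 + 1 / n2))

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate df."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# A) Equal n, different s's: the two standard errors agree.
print(se_unpooled(2.0, 12, 5.0, 12), se_pooled(2.0, 12, 5.0, 12))

# Unequal n: they no longer agree.
print(se_unpooled(2.0, 8, 5.0, 20), se_pooled(2.0, 8, 5.0, 20))

# B) Equal n and equal s: the df formula gives 2n - 2.
print(welch_df(3.0, 12, 3.0, 12), 2 * 12 - 2)
```

Trying a few other values of n, s1 and s2 is a quick way to convince yourself the pattern is real before you work through the algebra.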

Sievers home  Math251-Fall01/DayP36.htm  3pm   11/29/01
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.