MATH 251, Probability and Statistics I, Fall 2005, Mon. Nov. 28, Day 38, after class

Read: Finish 7.2.  I won't require you to remember the details of the pooled two-sample t, but the example and comments are instructive. Next, 7.3, pp. 515-16, NOT including the F-test for equality of spread; pp. 518-19 Robustness ONLY.  Chapter 8 next.
Hand in: everything: 
Sec. 7.2.  Problems that need to be done on the computer are labeled SPSS (the two-sample procedure is on the Handout).  Unless otherwise instructed, use the "Equal variances not assumed" results.
A) (SPSS)  Redo the analysis on the Handout.
p. 511, 7.77, 7.78 (SPSS)  piano lessons again.  The given data file conveniently gives the grouping variable in 2 forms, "piano/control" and "0/1".  Piano/control gives nicer labels on your output if you can make it work.
7.79  Compare all the piano lesson analyses.  A lurking variable in the previous problems (7.29-30) was the passage of six months' time, during which preschool children learn a lot of stuff.
p. 532, 7.140 (SPSS) home prices.  The SPSS file doesn't exist!  You'll have to type it in.  If you're the first, do the rest of us a favor and email us the file?

p.508, 7.68, 7.69, 7.70 (SPSS) bread vitamins again. See notes here: 
Notes:  For 7.68 our sample sizes are too small to use the "Equal variances not assumed" df (need 5 or more in each sample).  IPS says to use min(n1 - 1, n2 - 1) instead, which here = 1.  So the t that SPSS reports is correct but its Sig. (P-value) is not.  Use the t table (Table D) to approximate the P-value.
For 7.69 you can't do the analysis with the data file as given.  You have to rearrange the data into the correct form to use the Paired Samples t-test.
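A way to check your Table D reading: with exactly 1 degree of freedom, the t distribution is the standard Cauchy distribution, so its tail area has a closed form. Here is a minimal Python sketch, assuming a two-sided test. The function names are made up for this note, and the sample sizes and t value below are illustrative only, not the numbers from 7.68.

```python
import math

def conservative_df(n1, n2):
    """Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

def two_sided_p_df1(t):
    """Two-sided P-value for a t statistic with exactly 1 degree of freedom.

    t(1) is the standard Cauchy distribution, so
    P(|T| >= |t|) = 1 - (2/pi) * atan(|t|).
    Valid ONLY for df = 1.
    """
    return 1.0 - (2.0 / math.pi) * math.atan(abs(t))

# Illustrative numbers only (not the data from the problem).
df = conservative_df(2, 2)
# 12.71 is the Table D critical value for df = 1 with upper tail area 0.025.
p = two_sided_p_df1(12.71)
print(df, round(p, 3))
```

For any other df there is no such shortcut, which is why the table (or software) is needed.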

= = = = = = = = = = = = = = = = = = 
Pooled-sample (equal sigmas).  On the same data, the pooled-sample computation usually gets a bigger d.f., and therefore a shorter CI and a smaller P-value, than the unequal variances method.
p. 528, 7.122 a, b,  and 7.123 (both by hand). 

Some algebra:  General advice on designing experiments is to put equal numbers into each sample if you can.  Here are some hints why.
A)  If n1 = n2 then the expression for the standard error of (xbar1 - xbar2), i.e. the denominator of the t-statistic, is the same for the unequal variances version (p. 489) and for the pooled t (pp. 499-500).  Use algebra to show they are the same (set n = n1 = n2 in all the formulas, and simplify the two denominators).  [Thus the only difference in computing with the different versions in this case will be the d.f. you use.]
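If you want a numeric spot check before (not instead of) doing the algebra, here is a minimal Python sketch. The function names and the standard deviations are made up for illustration. It compares the two standard errors once with equal n's and once with unequal n's.

```python
import math

def se_unpooled(s1, s2, n1, n2):
    """Standard error of xbar1 - xbar2 without assuming equal variances."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def se_pooled(s1, s2, n1, n2):
    """Standard error of xbar1 - xbar2 using the pooled variance estimate."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)

# Made-up standard deviations; equal sample sizes.
n = 12
equal_n = (se_unpooled(3.1, 5.4, n, n), se_pooled(3.1, 5.4, n, n))

# Same standard deviations, unequal sample sizes.
unequal_n = (se_unpooled(3.1, 5.4, 8, 20), se_pooled(3.1, 5.4, 8, 20))

print(equal_n)    # the two values agree (up to rounding)
print(unequal_n)  # the two values differ
```

One example agreeing is not a proof; the algebra shows it happens for every choice of s1, s2, and n.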

B) If n1 = n2 = n and s1 = s2 = s, the complicated df formula on p. 498 collapses into
df = 2n - 2 (= n1 + n2 - 2).  Use algebra to show it. [So at least for equal n's and similar s's, the complicated df formula will not lose you much sharpness compared to the pooled version.]
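Again as a spot check rather than a proof, here is a minimal Python sketch. It assumes the "complicated df formula" is the usual Satterthwaite approximation, df = (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1 - 1) + (s2^2/n2)^2/(n2 - 1) ]; compare with p. 498 before relying on it. The function name and the numbers are made up.

```python
def welch_df(s1, s2, n1, n2):
    """Approximate df for the unequal-variances two-sample t (Satterthwaite form)."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

n = 10
same_s = welch_df(4.0, 4.0, n, n)   # equal n's, equal s's
close_s = welch_df(4.0, 5.0, n, n)  # equal n's, similar s's

print(same_s, 2 * n - 2)  # should match
print(close_s)            # a little below 2n - 2
```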

 Read, discuss
 
 
 
 
 
 

 

Optional
(more practice) 
Exams not finished; sorry!

Homework questions Day 37
Two-sample procedures, review  Example
Using SPSS for two-sample:  Handout.  Analyze> Compare Means> Independent-Samples T-test
    Define groups using the values of the grouping variable.  Use the "Equal variances not assumed" row of results.


"Equal Variances" assumption, "pooled sample" (p. 499ff.)
Pooled estimator "sp^2" of the common variance:
  Rationale:  Give each individual data point equal weight in estimating sigma.  The sigmas are the same but the means are not!
         If the values from sample 1 are x1, x2,...xn1, and those from sample 2 are y1, y2,...yn2,
our standard deviation-making table would look like this
value       value - mean    (value - mean)^2
x1          x1 - xbar       (x1 - xbar)^2
x2          x2 - xbar       (x2 - xbar)^2
. . .
xn1         xn1 - xbar      (xn1 - xbar)^2
y1          y1 - ybar       (y1 - ybar)^2
y2          y2 - ybar       (y2 - ybar)^2
. . .
yn2         yn2 - ybar      (yn2 - ybar)^2
__________________________________________________________________
Sum the right-hand column to get the numerator.
The degrees of freedom is the total number of points (n1 + n2) minus one for each estimated mean, xbar and ybar, so (n1 + n2 - 2) is the denominator.
    If you already have the separate sample variances, s1^2 and s2^2, you can get the same numerator this way:  Multiply each one by its separate denominator (degrees of freedom) and add:  (n1 - 1)s1^2 + (n2 - 1)s2^2
          (This is the book's formula, p. 499: sp^2 = [(n1 - 1)s1^2 + (n2 - 1)s2^2] / (n1 + n2 - 2))
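The two routes to the pooled variance (summing the right-hand column of the table, or combining the separate sample variances) can be compared in a minimal Python sketch. The function names and the data are made up for illustration.

```python
import statistics

def pooled_variance_from_table(x, y):
    """Sum squared deviations from each sample's OWN mean, divide by n1 + n2 - 2."""
    xbar = statistics.mean(x)
    ybar = statistics.mean(y)
    numerator = sum((v - xbar) ** 2 for v in x) + sum((v - ybar) ** 2 for v in y)
    return numerator / (len(x) + len(y) - 2)

def pooled_variance_from_s(x, y):
    """Combine the separate sample variances, each weighted by its df."""
    n1, n2 = len(x), len(y)
    s1_sq = statistics.variance(x)  # sample variance, divisor n1 - 1
    s2_sq = statistics.variance(y)  # sample variance, divisor n2 - 1
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Made-up data.
x = [4.0, 6.0, 8.0]
y = [1.0, 2.0, 3.0, 6.0]

via_table = pooled_variance_from_table(x, y)
via_s = pooled_variance_from_s(x, y)
print(via_table, via_s)
```

Note that each deviation is taken from its own sample's mean, not from the mean of all the data lumped together.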
    This only estimates the common variance sigma^2.  To get the standard error of the difference, you need the analogue of dividing an estimate of sigma by sqrt(n).  Since variances are what add, and sigma is assumed to be the same in both groups, the variance of the difference is sigma^2/n1 + sigma^2/n2 = sigma^2 (1/n1 + 1/n2), so we multiply sp by sqrt(1/n1 + 1/n2).  (Hypotenuse rule again; p. 499, middle.)
    The nice thing about this approach is that the resulting "pooled two-sample t-statistic" really does have a t distribution (with (n1 + n2 - 2) degrees of freedom).  The not-nice thing is that it's quite hard to know if two variances are equal if you only have small n's.  Until modern computing methods tested out the "unequal variance" methods, it was the only t procedure.  Use the unequal variances method!
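For reference, the pooled computation can be put together in a minimal Python sketch: pooled variance, standard error of the difference, t statistic, and df. The function name and the data are made up, and the sketch only shows how the pieces fit; it is not a recommendation to use the pooled method.

```python
import math
import statistics

def pooled_t(x, y):
    """Return (t, df) for the pooled two-sample t statistic."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance.
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / df
    # Standard error of xbar - ybar: sp * sqrt(1/n1 + 1/n2).
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return t, df

# Made-up data.
x = [4.0, 6.0, 8.0]
y = [1.0, 2.0, 3.0, 6.0]

t, df = pooled_t(x, y)
print(round(t, 3), df)
```

The P-value would then come from Table D with the returned df.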
The reasoning of the pooled-sample method is used in comparing 3 or more treatments, with "Analysis of Variance" (ANOVA), Ch. 12 & 13.  (ANOVA is fairly robust, though the pooled two-sample t is not.)


Math251-Fall05/Dayps38.htm  1:40pm   11/28/05
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.