MATH 251, Probability and Statistics I, Fall 2006, Mon. Nov. 26, Day 38.

Read: Finish 7.2.  I won't require you to remember the details of the pooled two-sample t, but the example and comments are instructive.  Then 7.3, pp. 515-16, NOT including the F-test for equality of spread; pp. 518-19 Robustness ONLY.  Chapter 8 next.
Hand in:
Sec. 7.2 Those that need to be done on the computer are labeled SPSS (two-sample is on the Handout).  Unless otherwise instructed, use the "Equal variances not assumed" results.
A) (SPSS)  Redo the analysis on the Handout.
p. 511, 7.77, 78 (SPSS)  piano lessons again.  The given data file conveniently gives the grouping variable in 2 forms, "piano/control" and "0/1".  Piano/control gives nicer labels on your output if you can make it work.
7.79  compare all the piano lesson analyses. A lurking variable in the previous problems (7.29-30) was the passage of six months' time, during which preschool children learn a lot of stuff.
p. 532, 7.140 (SPSS) home prices  The SPSS file doesn't exist! You'll have to type it in.  If you're the first, do the rest of us a favor and email us the file?

p.508, 7.68, 7.69, 7.70 (SPSS) bread vitamins again. See notes here: 
Notes:  for 7.68 our sample size is too small to use the "Equal variances not assumed" df (need 5 or more each.)  IPS says to use min(n1 -1, n2 -1), here it = 1.  So the SPSS t is correct but the sig. is not.  Use the T-table (table D) to approximate the P-value. 
For 7.69 you can't do it with the data file given.  You have to rearrange the data in the correct form to use the Paired Samples t-test.
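A side check for the 7.68 note, as a minimal Python sketch: with the conservative df = min(n1 - 1, n2 - 1) = 1, the t distribution is the standard Cauchy distribution, so the two-sided P-value has a closed form and you can compare it with what you read off Table D. The function names are made up for illustration, and you supply the t that SPSS reports.

```python
import math

def conservative_df(n1, n2):
    """Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

def two_sided_p_df1(t):
    """Two-sided P-value for a t statistic with exactly 1 degree of freedom.

    With df = 1 the t distribution is the standard Cauchy distribution,
    whose CDF is 1/2 + atan(t)/pi, so P(|T| >= |t|) = 1 - (2/pi) * atan(|t|).
    This shortcut is valid ONLY for df = 1.
    """
    return 1.0 - (2.0 / math.pi) * math.atan(abs(t))
```

For instance, two_sided_p_df1(1.0) is about 0.5, which matches the df = 1 row of Table D (t* = 1.000 has upper-tail probability .25). With only 1 df, even a fairly large t gives weak evidence.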

= = = = = = = = = = = = = = = = = = 
Pooled-sample (equal sigmas).  On the same data, the pooled-sample computation usually gets a bigger d.f., and therefore a shorter CI and a smaller P-value, than the unequal variances method, but we usually can't justify the equal variances assumption.
p. 528, 7.122 a, b,  and 7.123 (both by hand).  Note you're given the SE's, so for the pooled estimator also, don't forget to use s.d.'s to plug into the formula p. 499.  We can "justify" the pooled sample method because the s.d.'s are quite close.

Some algebra:  General advice on designing experiments is to put equal numbers into each sample if you can.  P. 503 notes that the pooled t-procedure is fairly robust against nonnormality and differing sigmas if the n's are equal. Here are two other reasons why equal n's is nice.
A)  If n1= n2 then the expression for the standard error of (xbar1 -xbar2 ), i.e. the denominator of the t-statistic, is the same for the unequal variances version p. 489  and for the pooled-t p.500, 499.  Use algebra to show they are the same (set n= n1= n2 in all the formulas, and simplify the two denominators).  [Thus the only difference in computing with the different versions in this case will be the d.f. you use] 

B) If n1= n2 = n and  s1= s2 =s, the complicated df formula on p. 498 collapses into 
df = 2n - 2 (= n1 + n2 - 2).  Use algebra to show it. [So at least for equal n's and similar s's, the complicated  df formula will not lose you much sharpness compared to the pooled  version.  ]
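If you want to check your algebra for A) and B) on actual numbers, here is a minimal Python sketch. It assumes the "complicated" df formula is the usual one, (s1^2/n1 + s2^2/n2)^2 divided by [(s1^2/n1)^2/(n1 - 1) + (s2^2/n2)^2/(n2 - 1)]; the function names and the numbers are made up for illustration. A numerical check on one example does not replace the algebra.

```python
import math

def se_unpooled(s1, s2, n1, n2):
    """Standard error of xbar1 - xbar2, unequal-variances version."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def se_pooled(s1, s2, n1, n2):
    """Standard error of xbar1 - xbar2 using the pooled variance sp^2."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)

def df_unequal(s1, s2, n1, n2):
    """Approximate df for the unequal-variances t (assumed formula, see above)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Hypothetical numbers, chosen only for illustration.
n = 12
# A) equal n's, different s's: the two standard errors agree.
print(se_unpooled(3.0, 5.0, n, n), se_pooled(3.0, 5.0, n, n))
# B) equal n's and equal s's: the df formula gives 2n - 2.
print(df_unequal(4.0, 4.0, n, n), 2 * n - 2)
```

Try unequal s's in df_unequal with the same n to see how far the df drops below 2n - 2.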

 Read, discuss
 
 
 
 
 
 

 

Optional
(more practice) 
Exams not finished; sorry!

Homework questions Day 37
Two-sample procedures, review  Example
Using SPSS for two-sample:  Handout.  Analyze> Compare Means> Independent-Samples T-test
    Define groups using values of grouping variable.  Use equal variances not assumed row of results.

Sec. 7.3: "Pooled two-sample t-procedure " == "Equal variances assumed"
("Equal variances" assumption, "pooled sample", p. 499ff.)
 Recall: we are looking at the difference xbar1 - xbar2.
We're trying to estimate the standard deviation of xbar1 - xbar2,

    sqrt(sigma1^2/n1 + sigma2^2/n2)

but now we're assuming the two variances are equal, so we can pull the common s.d. out from under the square root:

    sigma * sqrt(1/n1 + 1/n2)

We now need to estimate the common s.d..
Pooled estimator "sp^2" of the common variance:   Recall standard deviation formula (day 4)
  Rationale:  Give each individual data point equal weight in estimating sigma.  The sigmas are the same but the means are not!
         If the values from sample 1 are x1, x2,...xn1, and those from sample 2 are y1, y2,...yn2,
our standard deviation-making table would look like this
value   | value - mean  | (value - mean)^2
x1      | x1 - xbar     | (x1 - xbar)^2
x2      | x2 - xbar     | (x2 - xbar)^2
. . .
xn1     | xn1 - xbar    | (xn1 - xbar)^2
y1      | y1 - ybar     | (y1 - ybar)^2
y2      | y2 - ybar     | (y2 - ybar)^2
. . .
yn2     | yn2 - ybar    | (yn2 - ybar)^2
__________________________________________________________________
Sum the right-hand column to get the numerator.
The degrees of freedom is the total number of points (n1 + n2) minus one for each estimated mean, xbar and ybar, so (n1 + n2 - 2) is the denominator.
    If you already have the separate sample variances, s1^2 and s2^2, you can get the same numerator this way:  Multiply each one by its separate denominator (degrees of freedom) and add:  (n1 - 1)s1^2 + (n2 - 1)s2^2
          (This is the book's formula, p. 499:  sp^2 = [(n1 - 1)s1^2 + (n2 - 1)s2^2] / (n1 + n2 - 2).)
    This only estimates the common variance sigma^2.  To get the standard error of the difference, you multiply sp by sqrt(1/n1 + 1/n2).  (This is analogous to dividing a single estimate of sigma by sqrt(n).)
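The pooled computation can be sketched in a few lines of Python. This is only an illustration: the function names and the tiny data set are made up, and the point is that the "table" route and the "separate variances" route give the same pooled variance.

```python
import math

def sample_variance(values):
    """Ordinary sample variance: sum of squared deviations over (n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def pooled_variance_from_data(xs, ys):
    """Table route: deviations from each sample's OWN mean, summed, over n1 + n2 - 2."""
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    numerator = sum((x - xbar) ** 2 for x in xs) + sum((y - ybar) ** 2 for y in ys)
    return numerator / (len(xs) + len(ys) - 2)

def pooled_variance_from_summaries(s1_sq, s2_sq, n1, n2):
    """Summary route: weight each sample variance by its degrees of freedom."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_se(sp_sq, n1, n2):
    """Standard error of xbar1 - xbar2: sp * sqrt(1/n1 + 1/n2)."""
    return math.sqrt(sp_sq) * math.sqrt(1 / n1 + 1 / n2)

# Made-up data, for illustration only.
xs = [4.0, 6.0, 8.0]
ys = [1.0, 2.0, 3.0, 6.0]
sp_sq_a = pooled_variance_from_data(xs, ys)
sp_sq_b = pooled_variance_from_summaries(
    sample_variance(xs), sample_variance(ys), len(xs), len(ys))
print(sp_sq_a, sp_sq_b)                     # the two routes agree (about 4.4)
print(pooled_se(sp_sq_a, len(xs), len(ys)))
```

Note that each deviation is taken from its own sample's mean, never from the mean of all the data combined: the sigmas are assumed equal, the means are not.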
    The nice thing about this approach is that the resulting "pooled two-sample t-statistic" really does have a t distribution
(with  (n1 + n2 - 2) degrees of freedom).  The not-nice thing is that it's quite hard to know if two variances are equal if you only have small  n's.  Until modern computing methods tested out the "unequal variance" methods, it was the only t procedure.   Use the unequal variances method!
The reasoning of the pooled-sample estimation of sp^2 is used in comparing 3 or more treatments, with "Analysis of Variance" (ANOVA), Ch. 12 & 13.
 Sec. 7.3:  Brief comments
The "pooled two-sample t-procedure" (== "Equal variances assumed") was the only choice in many circumstances before the good "Equal variances not assumed" approximations were developed, computing power increased, and robustness was explored.
Big problem: How do we know that we have equal variances?  We don't.  The usual test for equal variances has these problems:
1) the Null hypothesis is that the variances are equal, and we gather evidence only against a null hypothesis.  So we don't have a way of assessing evidence for equal variances (the null hypothesis).  Best we can say is we don't have strong evidence against.
2) the usual test on variances is highly non-robust (highly sensitive) to departures from normality in the populations.
So don't bother.  But: the usual "F" test provides an example of a different approach to estimation and testing; it uses the ratio s1^2/s2^2.  All our techniques so far have been for differences, but ratios are used too.


Sievers home  Math251-Fall07/Day2s38.htm  9pm   11/25/07
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.