MATH 251, Probability and Statistics I, analysis of variance

A brief nod at Analysis of Variance.  This can be done for more than 2 groups--we'll just go through the idea for 2 groups.
Basic assumption:  All groups have the same population variance.

Think again about two groups:  Do they have a common mean?
    H0: µ1 - µ2 = 0, the same as µ1 = µ2, "no difference"
In the pooled two-sample t statistic, we had:
--on the bottom of the fraction, the estimate of the common standard deviation (times a factor that depends only on n1 and n2), obtained by using each data point once: take the sum of squared deviations from the individual group means, divide by (n1 + n2 - 2), and take the square root.
If we stop before taking the square root, we have what is called the MSE, the Mean Squared Error:
it is the SSE (Sum of Squares of the Error) divided by the degrees of freedom (n1 + n2 - 2).
The name "Error" is often replaced (as in SPSS) by "Within Groups".
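As a small numeric check of the MSE recipe, here is a minimal Python sketch. The two data lists and all the variable names are made up for illustration; they do not come from the course materials.

```python
from math import sqrt

# Hypothetical data for two groups (invented for illustration).
group1 = [2.0, 4.0, 6.0]
group2 = [5.0, 7.0, 9.0, 11.0]

def mean(xs):
    return sum(xs) / len(xs)

def ss_within(xs):
    """Sum of squared deviations of one group from its own group mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

n1, n2 = len(group1), len(group2)
sse = ss_within(group1) + ss_within(group2)  # each data point is used once
df_error = n1 + n2 - 2                       # degrees of freedom
mse = sse / df_error                         # stop here: Mean Squared Error
pooled_sd = sqrt(mse)                        # one more step gives the pooled SD

print("SSE =", sse, " df =", df_error, " MSE =", mse, " pooled SD =", pooled_sd)
```

For these made-up numbers the SSE is 8 + 20 = 28 on 5 degrees of freedom, so the MSE is 5.6.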

--on the top of the fraction we had xbar1 - xbar2. A little fussing led to a t distribution when the population means were equal.
If we make the null hypothesis assumption that the two groups have a common population mean µ, we can think about the differences
(xbar1 - µ) and (xbar2 - µ). Since we don't know µ, substitute for it the overall sample mean xbarbar, obtained by adding all the observations from both groups and dividing by (n1 + n2).
If µ is really the common mean, then (xbar1 - xbarbar)^2 and (xbar2 - xbarbar)^2, properly weighted, should give an estimate of the common variance.
The weighting is n1(xbar1 - xbarbar)^2 + n2(xbar2 - xbarbar)^2: this is called the SSG (Sum of Squares Between Groups).
Then divide by the degrees of freedom, which is the number of groups minus 1 (here 2 - 1 = 1), and get the MSG (Mean Square Between Groups).
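The between-groups calculation can be sketched the same way. This is only an illustration: the data and names are invented, and the divisor used for MSG is the between-groups degrees of freedom, (number of groups - 1), which is the usual ANOVA convention.

```python
# Hypothetical data for two groups (invented for illustration).
group1 = [2.0, 4.0, 6.0]
group2 = [5.0, 7.0, 9.0, 11.0]

def mean(xs):
    return sum(xs) / len(xs)

n1, n2 = len(group1), len(group2)
xbar1, xbar2 = mean(group1), mean(group2)

# Overall sample mean: all observations from both groups, divided by n1 + n2.
xbarbar = (sum(group1) + sum(group2)) / (n1 + n2)

# Weighted squared distances of the group means from the overall mean.
ssg = n1 * (xbar1 - xbarbar) ** 2 + n2 * (xbar2 - xbarbar) ** 2

df_groups = 2 - 1        # number of groups minus 1
msg = ssg / df_groups    # Mean Square Between Groups

print("xbarbar =", xbarbar, " SSG =", ssg, " MSG =", msg)
```

With two groups the divisor is 1, so MSG and SSG are the same number (192/7, about 27.43, for this data).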

Take the fraction MSG/MSE.  If there is a common mean, this should be close to 1, since top and bottom both estimate the common variance.  Its exact behavior follows the F distribution family; for 2 groups the degrees of freedom are 1 on top and (n1 + n2 - 2) on the bottom.
If there is not a common mean, then the numerator MSG, which looks at the distances of the group means from the overall mean, will be bigger than expected, and the MSG/MSE ratio will be bigger than expected.  The P-value is the right tail of the F distribution beyond the observed ratio.
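Here is a rough sketch of the whole two-group calculation, ending with a P-value. The data and names are invented. To stay within the Python standard library, it relies on the fact that with 2 groups the F ratio equals the square of the pooled t statistic, so the right tail of F with (1, n1 + n2 - 2) degrees of freedom is the two-sided tail of t with (n1 + n2 - 2) degrees of freedom. The tail area is approximated by Simpson's rule, which is a shortcut chosen for this sketch and not how statistical software does it.

```python
from math import gamma, pi, sqrt

# Hypothetical data for two groups (invented for illustration).
group1 = [2.0, 4.0, 6.0]
group2 = [5.0, 7.0, 9.0, 11.0]

def mean(xs):
    return sum(xs) / len(xs)

n1, n2 = len(group1), len(group2)
xbar1, xbar2 = mean(group1), mean(group2)
xbarbar = (sum(group1) + sum(group2)) / (n1 + n2)

# Within groups (Error): SSE and MSE.
sse = sum((x - xbar1) ** 2 for x in group1) + sum((x - xbar2) ** 2 for x in group2)
df_error = n1 + n2 - 2
mse = sse / df_error

# Between groups: SSG and MSG (divisor is 2 - 1 = 1).
msg = (n1 * (xbar1 - xbarbar) ** 2 + n2 * (xbar2 - xbarbar) ** 2) / (2 - 1)

f_stat = msg / mse

# Pooled two-sample t statistic, for comparison: its square equals f_stat.
t_stat = (xbar1 - xbar2) / (sqrt(mse) * sqrt(1 / n1 + 1 / n2))

def t_density(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# P(F > f_stat) = P(|T| > sqrt(f_stat)) = 1 - 2 * (area under t from 0 to sqrt(f_stat))
central_area = 2 * simpson(lambda t: t_density(t, df_error), 0.0, sqrt(f_stat))
p_value = 1 - central_area

print("F =", f_stat, " t^2 =", t_stat ** 2, " P-value =", p_value)
```

For this data F = 240/49, about 4.90, and the P-value lands between 0.05 and 0.10, which agrees with what a t table for 5 degrees of freedom shows for |t| of about 2.21.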

More groups than 2?  Just add in more terms, one for each group.  The not-nice thing is that Ha is "not all groups have equal means" (notice that's not the same as "all groups have different means").  Sorting out how different the groups are is messy.
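"Adding in more terms" can be sketched as a small function that takes any number of groups. The function name one_way_f and the data are hypothetical, made up for this illustration. The degrees of freedom follow the usual convention: (number of groups - 1) on top and (total number of observations - number of groups) on the bottom.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # One SSG term per group: group size times squared distance from the overall mean.
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SSE: every data point's squared distance from its own group mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ssg / df_between) / (sse / df_within)
    return f_stat, df_between, df_within

# Hypothetical data for three groups (invented for illustration).
groups = [
    [2.0, 4.0, 6.0],
    [5.0, 7.0, 9.0, 11.0],
    [1.0, 3.0, 5.0],
]

f_stat, df1, df2 = one_way_f(groups)
print("F =", f_stat, " with", df1, "and", df2, "degrees of freedom")
```

A large F here only says that the group means are not all equal; it does not say which groups differ from which.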


Math251-Fall05/anova.htm     12/5/05
This page belongs to Sally Sievers, who is solely responsible for its content.