| Hand in:
Two-sample, p. 566: 8.31 (choose which), 8.41, 8.42 (downloading music), 8.48 (drunken cyclists), 8.47, and p. 574, 8.66, 8.67 (gender bias in 10 textbooks). This is an example of how the same data set can be re-analyzed in many ways. If you get tired of doing these by hand, at least read and understand the solutions. (The text gives no HW problems where the two-sample plus-four technique needs to be implemented. Sorry about that!) |
Read, discuss
8.33 what's wrong? p.570, 8.54 presenting results |
Optional |
Ch. 8, Inference for proportions.
Sec. 8.2, Comparing 2 proportions (from independent samples)
Comparing proportions from an experiment with two treatments (usually
control and "treatment").
              /--- Group 1, n1 ---- Treatment 1 ---\
Random asst. <                                      > Compare results -- "proportions"
              \--- Group 2, n2 ---- Treatment 2 ---/
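To examine the difference of the two population proportions, $p_1 - p_2$:
We use the difference of the two sample proportions, $D = \hat{p}_1 - \hat{p}_2$ (with $\hat{p}_1 = X_1/n_1$ and $\hat{p}_2 = X_2/n_2$), and assume it's approximately Normal.
We find the standard deviation of D, and
make estimates of the p's as before.
Large-sample CI (90%-99% C, all successes and
failures > 10): $D \pm z^* SE_D$, where
$SE_D = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$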
(Plus four (p. 559): Add 2 to each n, one to each of the successes
and the failures. Good down to n's >5!)
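The two intervals above can be sketched in a few lines of Python. This is only an illustration: the counts are made up, the function name `two_prop_ci` is hypothetical, and z* is taken from the standard library's Normal distribution rather than from the table in the text.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, conf=0.95, plus_four=False):
    """CI for p1 - p2 from success counts x1, x2 and sample sizes n1, n2."""
    if plus_four:
        # Plus four: add one success and one failure to each sample,
        # so each n grows by 2.
        x1, n1, x2, n2 = x1 + 1, n1 + 2, x2 + 1, n2 + 2
    p1_hat, p2_hat = x1 / n1, x2 / n2
    d = p1_hat - p2_hat
    # Unpooled standard error SE_D.
    se_d = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # z* for the requested confidence level (two-sided).
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return d - z_star * se_d, d + z_star * se_d

# Made-up data: 30 successes out of 50 versus 20 out of 50.
large_ci = two_prop_ci(30, 50, 20, 50)
plus4_ci = two_prop_ci(30, 50, 20, 50, plus_four=True)
print("large-sample:", large_ci)
print("plus four:   ", plus4_ci)
```

With these counts the plus-four interval is centered slightly closer to 0 than the large-sample one, since the added observations pull both sample proportions toward 1/2.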
Test: $H_0: p_1 = p_2$
As usual, find $D = \hat{p}_1 - \hat{p}_2$; the mean of D is 0 under the null hypothesis, so all that remains to do the
test is to divide D by its standard deviation to get a z-value, and find
a P-value from the Normal table.
What should we use for the standard deviation of D, under the null hypothesis?
We aren't assuming that we know either p1 or p2 now, only
that they are the same.
We could use SED, as in the CI.
BUT since we're assuming the two p's are equal, we can use a "pooled"
technique, which gives each observation equal weight.
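Assuming $p_1 = p_2 = p$, the common value, the standard deviation of D is
$\sigma_D = \sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$.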
We still need to estimate the common p.
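Do it by throwing both sets of data into the same pot, so we have a
total of
$n_1 + n_2$ observations, and
we have $X_1 + X_2$ total "successes",
so our pooled estimate is $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$,
and we use this to build a "pooled" SE,
$SE_{Dp} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$.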
Note how this development parallels the development for the pooled two-sample
t.
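The pooled test described above can be sketched as follows. The counts are made up, the function name `pooled_two_prop_z` is hypothetical, and a two-sided alternative is assumed here (the notes do not specify one); the P-value comes from the standard library's Normal distribution instead of the table.

```python
from math import sqrt
from statistics import NormalDist

def pooled_two_prop_z(x1, n1, x2, n2):
    """z statistic and two-sided P-value for H0: p1 = p2 (pooled SE)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pooled estimate of the common p: all successes over all observations.
    p_pool = (x1 + x2) / (n1 + n2)
    # Pooled standard error of D under H0.
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_pooled
    # Two-sided P-value (assumed alternative: p1 != p2).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up data: 30 successes out of 50 versus 20 out of 50.
z, p_value = pooled_two_prop_z(30, 50, 20, 50)
print("z =", z, " P-value =", p_value)
```

For a one-sided alternative, the P-value would instead be the single tail area beyond z in the direction of the alternative.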
Another approach to comparison of two proportions--Relative risk
(pp. 563-4)
Looks at the ratio of the two proportions, $p_1/p_2$. For instance,
if the proportion of people who die from disease A under treatment 1 is
.30, and the proportion who die under treatment 2 is .60, then the relative
risk of treatment 1 to treatment 2 is .30/.60 = .5; the risk of treatment
1 is half that of treatment 2.
CI's for relative risk can be built based on sample proportions; they
are not of the form estimate ± m, and aren't symmetrical
around the estimate. I delved into SPSS to find out where they calculated
these, and it's buried very deep, inside "log linear" analyses.
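To see why such an interval is not symmetrical, here is a minimal sketch of one common large-sample approach: build an ordinary "estimate ± margin" interval for the log of the relative risk, then exponentiate the endpoints. This is not necessarily the method used in the text or in SPSS. The sample sizes of 100 per group are an assumption chosen to match the .30 and .60 in the example above, and the function name `relative_risk_ci` is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def relative_risk_ci(x1, n1, x2, n2, conf=0.95):
    """Relative risk p1_hat / p2_hat with a log-scale large-sample CI."""
    rr = (x1 / n1) / (x2 / n2)
    # Approximate standard error of log(RR).
    se_log_rr = sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Symmetric interval on the log scale, then back-transform.
    lower = exp(log(rr) - z_star * se_log_rr)
    upper = exp(log(rr) + z_star * se_log_rr)
    return rr, lower, upper

# Assumed counts: 30 deaths of 100 under treatment 1, 60 of 100 under treatment 2.
rr, lower, upper = relative_risk_ci(30, 100, 60, 100)
print("RR =", rr, " CI =", (lower, upper))
```

Because the interval is symmetric only on the log scale, the upper endpoint ends up farther from the estimate than the lower one after exponentiating.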
Some links, to show this stuff is "real" (and often much more complex than
we've covered): The Census Bureau not only does the census of everybody,
but also does complex sampling with the "long form" census form, and interim sampling
between the ten-year censuses. Then it has to make estimates based on
these samples.
County median income, proportion of poor, etc.
http://www.census.gov/hhes/www/saipe/county.html
Their discussion of CI's: http://www.census.gov/hhes/www/saipe/techdoc/stcty/ci.html
3 or more independent samples have "entirely different"
analyses:
3 or more independent samples: comparing proportions--
use two-way tables and "Chi-square" statistics (Ch. 9) to test whether the proportions are different;
Extension: are the two "dimensions" of the table (color of medicine package, willingness
to buy it) independent? (Research methods of Sociology)
"Chi-square goodness-of-fit" (9.4)--Biology models.
(The Chi-square distribution
is based on the sum of squared independent Z's, where the Z's are standard Normal.)
3 or more independent samples: comparing means--
Analysis of variance (Ch. 12&13) (Quantitative Research Methods of Psychology)
Analysis of variance nod, if time. (No more HW.)
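The remark above that the Chi-square distribution comes from a sum of squared independent standard Normal Z's can be checked with a small simulation. This is a sketch only: the choice of k = 3, the number of draws, the seed, and the helper name `sum_sq_z` are arbitrary assumptions. It relies on the fact that a Chi-square variable with k degrees of freedom has mean k.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def sum_sq_z(k):
    """One draw: the sum of k squared independent standard Normal values."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(k))

k = 3  # degrees of freedom (arbitrary choice for the illustration)
draws = [sum_sq_z(k) for _ in range(20000)]
mean_draws = sum(draws) / len(draws)

# The average of the simulated values should land close to k.
print("simulated mean:", mean_draws, " degrees of freedom:", k)
```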
| Sievers home | Math251-Fall07/Day2s40.htm | 3pm | 11/30/07 |