MATH 251, Probability and Statistics I, Fall 2005, Fri. Dec. 2, Day 40 After class

(Re)Read Ch 8.1, inference for a single proportion.  Read 8.2, two independent sample proportions.
Hand in: 
p. 549, sample size: 8.26, 8.27.  Note (no surprise, I hope) that your graph is a parabola.
Two-sample, p. 566
8.31 choose which. 
8.41, 42 downloading music
8.48 drunken cyclists
8.47, and p.574, 8.66, 8.67 gender bias in 10 textbooks  This is an example of how the same data set can be re-analyzed in many ways.  If you get tired of doing these by hand, at least read and understand the solutions.
(The text gives no HW problems where the two-sample plus-four technique needs to be implemented. Sorry about that!)
Read, discuss 
8.33 what's wrong?
p.570, 8.54 presenting results
Optional 
Exams still not finished.  Monday for certain sure.
Homework questions?  Day 39

Ch. 8, Inference for proportions.
Plan ahead for desired CI Margin of Error (pp. 545-6)
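(Large n) CI:   $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$, so the margin of error is $m = z^* \sqrt{\hat{p}(1-\hat{p})/n}$.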

Decide on the desired margin of error m and the confidence level C (thus z*).  Guesstimate the true p; call the guess p*.  (p* = 1/2 requires the largest sample size, so it is the safest choice.  Remember, you showed that p(1-p) has its maximum at p = .5.)
Solve the margin-of-error equation for n:  $n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$.   (Some results, p. 546.  If you wonder why so many polls feature a 3% margin of error and a sample size of 1060 or so, this tells you why.)
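Here is a minimal Python sketch of the planning calculation, in case you would like to check the p. 546 results yourself. The function names `z_star` and `sample_size` are made up for this illustration; they are not from the text or from SPSS. The result is rounded up, since rounding down would leave the margin of error slightly too large.

```python
import math
from statistics import NormalDist

def z_star(confidence):
    """Critical value z*: central area `confidence` under the standard Normal curve."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def sample_size(margin, confidence=0.95, p_guess=0.5):
    """Smallest n for which z* * sqrt(p*(1 - p*)/n) is at most `margin`."""
    z = z_star(confidence)
    return math.ceil((z / margin) ** 2 * p_guess * (1 - p_guess))

# A 3% margin of error at 95% confidence, with the safest guess p* = 1/2
print(sample_size(0.03))  # 1068
```

Trying a guess other than 1/2, for example `sample_size(0.03, p_guess=0.3)`, gives a smaller n, which is why p* = 1/2 is the conservative choice.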

Sec. 8.2, Comparing 2 proportions (from independent samples)

Comparing proportions from an experiment with two treatments (usually control and "treatment").
                /--- Group 1, n1---- Treatment 1---\
              /                                    \
 Random asst.                                       Compare results --"proportions"
              \                                    /
               \--- Group 2, n2---- Treatment 2---/
To examine  the difference of the  two proportions, p1 -p2:

We use the difference of the two sample proportions,  $D = \hat{p}_1 - \hat{p}_2$, and assume it's approximately Normal.
We find the standard deviation of D, and make estimates of the p's as before.
Large sample CI  (90%-99% C, all successes and failures > 10):  $D \pm z^* SE_D$, where
$SE_D = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

(Plus four (p. 559):  Add 2 to each n, one to each of the successes and the failures.  Good down to n's >5)
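Since the text gives no problems that call for the two-sample plus-four interval, here is a small Python sketch that computes both versions so you can compare them. The function name `two_prop_ci` and the counts in the usage lines are hypothetical, chosen only for illustration; this is one straightforward way to code the recipe above, not the text's or SPSS's implementation.

```python
import math
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, confidence=0.95, plus_four=False):
    """CI for p1 - p2 from x1 successes in n1 trials and x2 successes in n2 trials."""
    if plus_four:
        # Plus four: one extra success and one extra failure in each sample
        x1, n1, x2, n2 = x1 + 1, n1 + 2, x2 + 1, n2 + 2
    p1, p2 = x1 / n1, x2 / n2
    d = p1 - p2
    se_d = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # z* for the chosen C
    return d - z * se_d, d + z * se_d

# Hypothetical counts: 30 of 100 successes in group 1, 20 of 100 in group 2
print(two_prop_ci(30, 100, 20, 100))
print(two_prop_ci(30, 100, 20, 100, plus_four=True))
```

With samples this large the two intervals are nearly the same; the plus-four adjustment matters most when the n's are small.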

Test:  H0  : p1 = p2
  As usual, find $D = \hat{p}_1 - \hat{p}_2$; the mean of D is 0 under the null hypothesis, so all that remains to do the test is to divide by the standard deviation of D to get a z-value, and find a P-value from the normal table.
What should we use for the standard deviation of D, under the null hypothesis?  We aren't assuming that we know either p1 or p2 now, only that they are the same.
We could use $SE_D$, as in the CI.
BUT since we're assuming the two p's are equal, we can use a "pooled" technique, which gives each observation equal weight.
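Assuming p1 = p2 = p, the common value, the standard deviation of D is $\sigma_D = \sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$.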
We still need to estimate the common p.
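Do it by throwing both sets of data into the same pot, so we have a total of
  n1 + n2 observations,  and we have X1 + X2 total "successes",
so our pooled estimate is $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$, and we use this to build a "pooled" SE,
$SE_{Dp} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$.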
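The pooled test can be sketched in a few lines of Python. The function name `pooled_z_test` and the counts are hypothetical, and the P-value is computed for a two-sided alternative (p1 not equal to p2), which is an assumption made here; for a one-sided alternative you would use one tail only.

```python
import math
from statistics import NormalDist

def pooled_z_test(x1, n1, x2, n2):
    """z statistic and two-sided P-value for H0: p1 = p2, using the pooled SE."""
    d = x1 / n1 - x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)          # both samples in the same pot
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = d / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided (assumed)
    return z, p_value

# Hypothetical counts: 30 of 100 successes versus 20 of 100
z, p_value = pooled_z_test(30, 100, 20, 100)
print(z, p_value)
```

Here `NormalDist().cdf` stands in for the normal table.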

Note how this development parallels the development for the pooled two-sample t.


Another approach to comparison of two proportions--Relative risk (pp. 563-4)
Looks at the ratio of the two proportions,  p1/p2.  For instance, if the proportion of people who die from disease A under treatment 1 is .30, and the proportion who die from disease A under treatment 2 is .60, then the relative risk of treatment 1 to treatment 2 is .30/.60 = .5; the risk under treatment 1 is half that under treatment 2.
CI's for relative risk can be built based on sample proportions; they are not of the form estimate ± m, and aren't symmetrical around the estimate.  I delved into SPSS to find out where they calculated these, and it's buried very deep, inside "log linear" analyses.
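To see why such an interval is lopsided, here is a Python sketch of one common large-sample method: build an ordinary interval for the log of the relative risk, then exponentiate the endpoints. This is an assumption about method, offered for illustration; it is not necessarily what the text or SPSS does. The function name `relative_risk_ci` is made up, and the sample sizes of 100 are hypothetical, chosen so the sample proportions match the .30 and .60 of the example above.

```python
import math
from statistics import NormalDist

def relative_risk_ci(x1, n1, x2, n2, confidence=0.95):
    """Estimated relative risk and a log-based CI (all counts must be positive)."""
    rr = (x1 / n1) / (x2 / n2)
    # Large-sample standard error of ln(RR)
    se_log = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi

# Hypothetical data: 30 of 100 die under treatment 1, 60 of 100 under treatment 2
rr, lo, hi = relative_risk_ci(30, 100, 60, 100)
print(rr, lo, hi)
print(rr - lo, hi - rr)  # the two distances from the estimate differ
```

The interval is symmetric on the log scale, so after exponentiating, the upper endpoint sits farther from the estimate than the lower one.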


Some links, to show this stuff is "real" (and often much more complex than we've covered):  The census bureau not only does the census of everybody, but does complex sampling with the "long form" census form, and interim sampling between the ten-year censuses.  Then it has to make estimates based on these samples.
County median income, proportion of poor, etc.   http://www.census.gov/hhes/www/saipe/county.html
Their discussion of CI's:   http://www.census.gov/hhes/www/saipe/techdoc/stcty/ci.html

Sievers home  Math251-Fall05/Dayps40.htm    4pm   12/2/05
This page belongs to Sally Sievers, who is solely responsible for its content.