MATH 251, Probability and Statistics I, Fall 2005, Wed. Nov. 30, Day 39

Read Ch 8.1, inference for a single proportion.  8.2 next.
Hand in: 
p. 549
8.1 choosing
8.3 what's wrong
8.5, 8.6 Gambling athletes' CI's
8.7 Student engagement CI
8.8 rats, plus four CI
8.22 Coffee, test & CI (For part d, notice that this doesn't quite meet the criterion given for large-sample.  Do it anyway.)
8.23a Kerrich coin-flip test (Q: Where did he find the time?  A: He was a prisoner of war...)
Postpone sample size: 8.26, 8.27 Sample size  Note, no surprise I hope, that your graph is a parabola.
p. 576, 8.77 a,b legal discrimination case  This is a fairly old case now. I believe that the Rehnquist court cut down on the usefulness of statistical arguments in discrimination cases, though I don't know the details.
Read, discuss 
8.9 compare plus four and large-sample for large n.
Optional 
 
Exams still not finished.  Blame my husband's germs.
Homework questions?  Day 38
 Sec. 7.3: "Pooled two-sample t-procedure" == "Equal variances assumed".  This was the only choice in many circumstances before good approximations for the "Equal variances not assumed" case were developed, computing power increased, and robustness was explored.
Big problem: How do we know that we have equal variances?  We don't.  The usual test for equal variances has these problems:
1) the Null hypothesis is that the variances are equal, and we gather evidence only against a null hypothesis.  So we don't have a way of assessing evidence for equal variances (the null hypothesis).  Best we can say is we don't have strong evidence against.
2) the usual test on variances is highly NONRobust (highly sensitive) to departures from normality in the populations.
So don't bother.
But:  it provides an example of a different approach to estimation and testing; it uses the ratio of the sample variances, s1^2/s2^2.  All our techniques so far have been for differences, but ratios are used too.

Ch. 8, Inference for proportions.
Sample survey:  statistic is the proportion who say "yes" out of n.  Let q = 1-p.

Our procedures will parallel those for means, using the fact that for "large" n, the sampling distribution of the sample proportion p-hat  is approximately Normal.

Level C confidence interval for population proportion p:  (Large n, approximate)
p-hat ± z*·SE(p-hat),  where SE(p-hat) = sqrt[p-hat·(1 - p-hat)/n]
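
A minimal Python sketch of this large-sample interval, using only the standard library. The function name large_sample_ci and its arguments are made up for illustration (they are not from Moore), and the counts are the smoke-detector data used in the test example later in these notes (376 of 400), where successes and failures are both well above 15.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ci(successes, n, level=0.95):
    """Approximate level-C confidence interval for a population proportion."""
    p_hat = successes / n
    # z* cuts off area (1 - level)/2 in each tail of the standard Normal
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of p-hat
    margin = z_star * se
    return p_hat - margin, p_hat + margin

low, high = large_sample_ci(376, 400)
print(f"{low:.3f} to {high:.3f}")   # 0.917 to 0.963
```

Changing level to 0.90 or 0.99 gives the other usual intervals.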

How large n?  For usual CI's (90%, 95%, 99%): Number of successes and number of failures both > 15.  (So n > 30, and p is not too close to the end of the range.)

What if your n is smaller?  (but n >10)
Modification:  "plus four estimate" p. 539:  Add 4 "fake" observations, so our new sample size is n + 4.  Call 2 of these observations "Successes" and 2 "Failures", and proceed as before!  Moore uses p-wiggle (p with a tilde on top) for this modified estimate.
Example:   n = 16, with 5 "successes". p-hat=5/16 =.3125; but we'll use
p-wiggle = (5+2)/(16+4) = 7/20 = .35
SE(p-wiggle) = sqrt[(.35·.65)/20] = .107.  So an approximate 95% CI for the population proportion p is .35 ± 1.96·.107,   or  .35 ± .209; roughly .14 to .56.  Wide because the sample size is small.
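
The plus four arithmetic above can be checked with a short Python sketch. The name plus_four_ci is hypothetical; the only inputs are the 5 successes out of n = 16 from the example.

```python
from math import sqrt
from statistics import NormalDist

def plus_four_ci(successes, n, level=0.95):
    """Plus four interval: add 2 fake successes and 2 fake failures."""
    p_tilde = (successes + 2) / (n + 4)          # p-wiggle
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(p_tilde * (1 - p_tilde) / (n + 4))  # note n + 4, not n
    margin = z_star * se
    return p_tilde - margin, p_tilde + margin

low, high = plus_four_ci(5, 16)
print(f"{low:.3f} to {high:.3f}")   # 0.141 to 0.559
```

This sketch does not clip the interval to the range 0 to 1, which can matter when the count of successes is very near 0 or n.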

Test: (Large n) p. 540-3
How large n?  Expected numbers of successes and of failures both > 10:  n·p0 > 10, n·q0 > 10 (where q0 = 1 - p0).  Also:  Need that we don't "use up" too much of the population; population should be at least 10n.  (Pp. 336-7, using Binomial for sample proportion, said 20n. This is looser.)
  H0 : p = p0  ; usual possibilities for Ha.
Base the test on the fact that IF H0 is true, then p-hat is approximately N(p0, sqrt[p0·q0/n]).  Standardize p-hat:  z = (p-hat - p0)/sqrt[p0·q0/n], and find the P-value from the standard Normal table.  CI gives more information than test; also, it's rare to have a particular p to test against.  (Some appropriate situations:  Is a coin fair? Is Coke preferred to Pepsi? (Cf. sign test.) Is present proportion of Cesarean-section births different from proportion 20 years ago?)

Example:  U.S. Consumer Product Safety Comm. says 90% of American homes have smoke detector(s). Fire department in our city runs a big publicity campaign to raise awareness and use.  Have they raised the level in this city?  Data:  Building inspectors visit 400 (random) homes, find 376 have detectors.   p is the proportion of homes with detectors in the population (the whole city).
H0 : p = .9  (unchanged after campaign)
Ha : p > .9  (raised after campaign)
Assumptions: random sample. n = 400. Population (city) > 4000 homes (10n rule).  n·p0 = 400·.9 = 360, n·q0 = 400·.1 = 40, both > 10, so success/failure rule met.  Modeling sample proportion p-hat by a Normal is OK.
Computations:  n = 400, successes x = 376. p-hat = 376/400 = .940.   SD(p-hat) = sqrt(.9·.1/400) = .015, since we're assuming H0 true.  z = (.940 - .9)/.015 = .04/.015 = 2.67.   One-sided alternative, evidence for it is to the right.  So P-value is proportion in the tail above z = 2.67; P = P(z > 2.67) = 1 - .9962 = .0038, about 0.4%.
Conclusions:  P-value is quite low, <.01 (but not <.001):  Strong evidence that the city proportion has risen.  (Reject H0)
Problems?  We don't know that our city conformed to the American proportion before the campaign.  Would be better to have taken a sample before the campaign and another after, and compared (Sec. 8.2).
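
The smoke-detector computation can be reproduced with a few lines of Python. This is only a sketch: the function name proportion_z_test is made up, it handles just the one-sided "greater than" alternative used here, and it gets the tail area from the standard library's Normal distribution instead of the printed table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """One-sided large-sample z test of H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    sd = sqrt(p0 * (1 - p0) / n)         # uses p0, since H0 is assumed true
    z = (p_hat - p0) / sd
    p_value = 1 - NormalDist().cdf(z)    # area in the upper tail
    return z, p_value

z, p_value = proportion_z_test(376, 400, 0.9)
print(f"z = {z:.2f}, P = {p_value:.4f}")   # z = 2.67, P = 0.0038
```

Note that the standard deviation uses p0 (the null value), not p-hat, which is the one place the test differs from the confidence interval calculation.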
Start here Friday
Plan ahead for desired CI Margin of Error (pp. 545-6)  Decide on desired Margin of error m,  and C (thus z*).  Guesstimate a p*. (p*=1/2 requires the largest sample size--safest.  Remember  you showed that p(1-p) had its maximum at p = .5).
Solve margin-of-error equation  m = z*·sqrt[p*(1 - p*)/n]  for n:   n = (z*/m)^2 · p*(1 - p*), rounded up.   (Some results,   p. 546.  If you wonder why so many polls feature a 3% margin of error and a sample size of 1060 or so, this tells you why.)
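
As a rough check on the polling remark, here is a small Python sketch of the sample-size calculation. The function name sample_size and its defaults are assumptions for illustration; it simply solves the margin-of-error equation for n and rounds up.

```python
from math import ceil
from statistics import NormalDist

def sample_size(m, level=0.95, p_star=0.5):
    """Smallest n giving margin of error at most m, for guessed proportion p_star."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    n = (z_star / m) ** 2 * p_star * (1 - p_star)
    return ceil(n)   # always round up to a whole observation

print(sample_size(0.03))   # 1068
```

With the safest guess p* = 1/2 and a 95% level, a 3% margin of error needs a little over a thousand respondents; any other guess for p* gives a smaller n.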

Next:  Sec. 8.2, Comparing 2 proportions (from independent samples)



Sievers home  Math251-Fall05/Dayps39.htm    8pm   11/29/05
This page belongs to Sally Sievers who is solely responsible for its content.