Math 151, Spring 2005, Day 29, Wed. April 13 (After Class)

Day 29 (Wed. Apr. 13): Reading: Ch. 18, p. 341 on, for means. Please read: Ch. 19, Confidence Intervals for Proportions. ActivStats 18-1 is extremely good for the concept of the sampling distribution of the proportion, and 18-2 is good for the Central Limit Theorem. ActivStats 19 does confidence intervals for means, not proportions, so it is not useful here.
Hand in (All D&V)
 Sampling dist. of means, cont. p. 350
Add: B: "Normal" body temperature is 98.6 deg. on average.  (Assume this is true.) 
 Assume a normal distribution, and that the s.d. among many people is 0.6.  What is the--
   Probability that one (random) healthy individual's normal temperature is above 98.8? 
   Probability that the mean of a sample of 4 is above 98.8? 
   Probability that the mean of a sample of 36 is above 98.8? 
   Probability that the mean of a sample of 100 is above 98.8? 
 Note: as n grows, the SD of the sample mean shrinks, but only by a factor of the square root of n.
21 c, d Pregnancy
23 Pregnancy skewed
22 Rainfall
24 At work
28 Potato chip bags
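
A way to check your arithmetic on the body-temperature problem (Add: B) above: this is only a sketch, assuming the mean 98.6 and s.d. 0.6 given in the problem and the Normal model for the sample mean. The function name prob_mean_above is made up for illustration; NormalDist is from Python's standard statistics module. It shows how SD(y-bar) = sigma/(square root of n) shrinks as n goes 1, 4, 36, 100.

```python
from math import sqrt
from statistics import NormalDist

MU = 98.6     # population mean, as assumed in the problem
SIGMA = 0.6   # population s.d., as assumed in the problem

def prob_mean_above(cutoff, n, mu=MU, sigma=SIGMA):
    """P(y-bar > cutoff) for the mean of a random sample of size n (hypothetical helper)."""
    sd_of_mean = sigma / sqrt(n)   # SD(y-bar) = sigma / sqrt(n)
    return 1 - NormalDist(mu, sd_of_mean).cdf(cutoff)

# n = 1 is a single individual; the SD column shrinks like 1/sqrt(n)
for n in (1, 4, 36, 100):
    print(n, round(SIGMA / sqrt(n), 3), round(prob_mean_above(98.8, n), 4))
```

Work the problem by hand with the Normal table first, then compare.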

Postpone: Ch. 19, p. 386
5, 6 Conclusions. Do these with the "Don't misstate..." section, pp. 361-2.
9 Cars
A.   Use the Normal table to find z* for a 99% CI (This is the z* for which 99% of the area is between -z* and +z*).  Find z* for a 90% CI.  Check your results by comparing with the corresponding results in Table T p. A-53.
11 Ghosts
13 Teenage drivers
21 Rickets
3, 4 Conditions
16 Local news

Read, to discuss
Optional
If you haven't done the computation: A) If you got your 30 slips of paper from the green shoebox and counted how many 1's you have, calculate the proportion p-hat for your sample.  Also plug in p-hat (instead of p) to the formula for SD(p-hat). So if you got 12/30, p-hat = .4 and "q-hat" = 1 - p-hat = 1 - .4 = .6.   SD formula: square root of (p-hat·q-hat/n) = square root of (.4·.6/30) = square root of .008 = .089.  Bring these estimates of p and SD(p-hat) to class.

Homework questions? Day 28

Sampling Distributions, Ch. 18
Take a Sample from a population. SRS!
 Imagine (simulate) what would happen if you took "all possible" SRS's.   For each sample, calculate a statistic.
    Your coin flips:  p-hats, n = 25, p = 1/2.    SD(p-hat) = square root of (.5·.5/25) = .10, so P(p-hat > .60) = P(z > 1) = 16% using the normal model.
    62 SRS's. Would expect (16% of 62 =) 9.9 to be "above" .60.   9 were at .6, 8 were > .6.  More SRS's would probably do somewhat better....
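
The coin-flip numbers above can be reproduced in a few lines. This is a sketch of the normal-model calculation only (not an exact binomial count), using NormalDist from Python's standard statistics module; the variable names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

n, p = 25, 0.5                          # 25 flips of a fair coin
sd_phat = sqrt(p * (1 - p) / n)         # SD(p-hat) = sqrt(pq/n)
z = (0.60 - p) / sd_phat                # z-score of p-hat = .60
prob_above = 1 - NormalDist().cdf(z)    # normal-model P(p-hat > .60)
expected_count = 62 * prob_above        # expected number among the class's 62 samples

print(round(sd_phat, 3), round(z, 2), round(prob_above, 3), round(expected_count, 1))
```

Because p-hat can only take the values 0/25, 1/25, ..., the samples sitting exactly at .6 make the "above .60" count ambiguous, which is part of why the class results only roughly match the model.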

Sampling distribution of the mean, y-bar:    Day 28
Distribution of all means from all possible random samples of size n from a population.
   Need Random Sample, Independence (in particular, for sampling without replacement, n < 10% of population.)
Population has mean µ and standard deviation sigma. Whatever the shape of the population distribution that we draw the sample from, the sampling distribution of the y-bars has mean µ and standard deviation sigma/(square root of n).
IF the population is Normal, the sampling distribution of the y-bars is Normal.
The Central Limit Theorem (CLT): In any case, for "large" n, the sampling distribution of the y-bars is Approximately Normal.
SPSS simulation: average of  spinners which can land on any number between 0 and 1.
How large is "large"?  How approximate is "approximate"?
    If the population was close to normal, n doesn't need to be very large.
    Even if the population is pretty weird, n=25 gives a pretty good approximation to normal.  But if we have really Big outliers or really BAD skewness, we may need much more.
Pictures on overhead.    Moore applet, Central Limit theorem
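
If you want to redo the spinner simulation outside SPSS, here is a rough Python sketch. It is not the class's SPSS setup: the function name spinner_means, the seed, and the number of samples are all assumptions chosen for illustration. Each "spinner" is a uniform random number between 0 and 1, whose population mean is 0.5 and s.d. is the square root of 1/12 (about 0.289).

```python
import random
from math import sqrt
from statistics import mean, stdev

def spinner_means(n, num_samples=10_000, seed=151):
    """Means of num_samples random samples, each averaging n spinners (hypothetical helper)."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    return [sum(rng.random() for _ in range(n)) / n for _ in range(num_samples)]

POP_SD = sqrt(1 / 12)   # s.d. of a single uniform(0, 1) spinner

for n in (1, 2, 5, 25):
    means = spinner_means(n)
    # columns: n, simulated mean of y-bars, simulated SD of y-bars, sigma/sqrt(n)
    print(n, round(mean(means), 3), round(stdev(means), 3), round(POP_SD / sqrt(n), 3))
```

A histogram of the means for each n (in any plotting tool) should look flat for n = 1 and more bell-shaped as n grows.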

START HERE FRIDAY
Next job:  We usually DON'T KNOW the population parameter; use the statistic from our sample to ESTIMATE it.
YOU don't know the real proportion of 1's in the green shoebox.  Each of you has an estimate.  In "real life" you won't have a bunch of classmates with other samples; you'll only have your own. (Also, in this case, I know the real proportion. Not so in "real life".)  How "good" is your estimate of the real p?

You know how the sampling distributions of sample proportions (and sample means) behave; we'll use that.  But we want to know how much they are spread, and for that we need the parameter p (and q) for proportions (and the parameter sigma for means).
And we don't know those!  So we use the sample statistics p-hat and s in place of them.

Standard Error (p. 347):  When we estimate the standard deviation of a sampling distribution of a statistic, using the data from our sample, we call that the Standard Error  of the statistic.
Confidence Interval Estimate of p: (Chapter 19)
p-hat is your best guess at p, but it's bound to be wrong, almost always.  (see p. 356)
Make an interval estimate of p, by adding and subtracting a Margin of Error (ME).
   For instance, 39% ± 2%.
Can we say "This interval contains (captures) the true proportion p"?  Wrong.  It may or may not, and you have no way of knowing.

What we can do  is use a rule to construct the ME so that intervals made using the rule will contain p a known proportion of the time.  The "known proportion" is our confidence level.  If our rule makes ME's that capture p 95% of the time, we've made 95% confidence intervals.  "I have 95% confidence that this interval captures the true proportion p"

A level C confidence interval for a parameter is an interval, usually of the form estimate ± margin of error,
  found from data, in such a way that
C% of all random samples will yield intervals that capture the true parameter value.

Rule for ME:   ME = z*·SE(p-hat), where z* is the "critical value" from the Standard Normal table that has C% of the area in the symmetric central interval between -z* and +z*.
Level C confidence interval for population proportion p ("One-proportion z-interval"):  p-hat ± z*·SE(p-hat), where SE(p-hat) = square root of (p-hat·q-hat/n).

(Why it works:  later.)
Example:  You drew a sample of size n = 30. p is the (unknown) proportion of 1's in the shoebox. You found the sample proportion, and you calculated the SE for the sample proportion.  Use z* = 1.  Then C is about 68%.
Calculate:  if I got 12/30, p-hat = .400.  "q-hat" = 1 - p-hat = 1 - .4 = .6.   SE formula: square root of (p-hat·q-hat/n) = square root of (.4·.6/30) = square root of .008 = .089 = SE(p-hat).
68% Confidence Interval:  .400 ± .089, or  (.311, .489).
Whose intervals captured the real proportion?  (Expect roughly 68% of you to do so.)
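
To see what "expect roughly 68% of you" means, we can let the computer play a very large class. This is a sketch only: the true proportion p_true = 0.4 below is a made-up value (the shoebox's real proportion is not given here), and the function name coverage, the seed, and the number of samples are assumptions for illustration. Each simulated student draws n = 30 slips, builds p-hat ± z*·SE(p-hat), and we count how often the interval captures p_true.

```python
import random
from math import sqrt

def coverage(p_true, n, z_star, num_samples=10_000, seed=29):
    """Fraction of simulated samples whose interval captures p_true (hypothetical helper)."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_samples):
        successes = sum(rng.random() < p_true for _ in range(n))   # number of 1's drawn
        p_hat = successes / n
        se = sqrt(p_hat * (1 - p_hat) / n)                          # SE uses p-hat, not p
        if p_hat - z_star * se <= p_true <= p_hat + z_star * se:
            hits += 1
    return hits / num_samples

print("z* = 1:   ", coverage(0.4, 30, 1.0))
print("z* = 1.96:", coverage(0.4, 30, 1.96))
```

With n as small as 30 the capture rates land in the neighborhood of the nominal 68% and 95%, not exactly on them, because the normal model is only an approximation here.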

Usually, want higher Confidence Level:  90%, 95%, 99%....
     For 95%:  z* = 1.96 (approximately 2).  (How?  95% in the middle.  2.5% in each tail.  The table shows area .0250 to the left of z = -1.96.)
            Shortcut: Table T, p. A-53, bottom two rows.  (The "infinity" row gives the Standard Normal values.)
      z*·SE(p-hat) = 1.96·.089 = .174.  95% Confidence Interval:  .400 ± .174, or  (.226, .574).

Note the trade-off:  Higher confidence means a wider interval (bigger ME, less "precision").
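
The table lookup and the trade-off can both be checked with a short sketch. The function names z_star and one_proportion_z_interval are made up for illustration; NormalDist().inv_cdf from Python's standard statistics module plays the role of reading the Normal table backwards. The example reuses the 12/30 sample from above.

```python
from math import sqrt
from statistics import NormalDist

def z_star(confidence):
    """Critical value with `confidence` of the Standard Normal area between -z* and +z*."""
    tail = (1 - confidence) / 2              # area in each tail
    return NormalDist().inv_cdf(1 - tail)

def one_proportion_z_interval(successes, n, confidence):
    """p-hat ± z*·SE(p-hat), returned as (low, high)."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)       # SE(p-hat) = sqrt(p-hat·q-hat/n)
    me = z_star(confidence) * se             # margin of error
    return p_hat - me, p_hat + me

# Higher confidence level -> larger z* -> wider interval
for level in (0.90, 0.95, 0.99):
    low, high = one_proportion_z_interval(12, 30, level)
    print(f"{level:.0%}: z* = {z_star(level):.3f}, interval = ({low:.3f}, {high:.3f})")
```

The code carries the SE unrounded, so its endpoints can differ from the hand calculation in the third decimal place.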

Assumptions/conditions:  Assumes Central Limit Theorem for proportions is appropriate.
  Independence:  Data values shouldn't affect each other.  Randomization helps!  n < 10% of population.
  Sample Size:  Expect at least 10 successes and 10 failures (rephrase of np >= 10 and nq >= 10).
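
The countable conditions can be written as a small checklist. This is a sketch: the function name check_conditions and the population size of 1000 slips in the example are made up for illustration (the shoebox's actual size is not stated here). Code cannot check randomization or independence of the data values; those still depend on how the sample was taken.

```python
def check_conditions(successes, n, population_size=None):
    """Return which countable conditions hold for a one-proportion z-interval (hypothetical helper)."""
    failures = n - successes
    results = {
        "at_least_10_successes": successes >= 10,
        "at_least_10_failures": failures >= 10,
    }
    if population_size is not None:
        # 10% condition for sampling without replacement
        results["sample_under_10_percent"] = n < 0.10 * population_size
    return results

# 12 ones in 30 slips, from a hypothetical box of 1000 slips
print(check_conditions(12, 30, population_size=1000))
```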

  Bias?  Here's why we studied bias in sampling.  Biases or other bad sampling methods can make our computations worthless! p. 363.


Math151-Sp05/Days29.htm  1:30pm 4/13/05
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.