### MATH 251, P&SI, Fall 2011, Fri. Nov. 5, Day 30. After class... correction: shoebox results added

HW Day 30: (Re)Read Sec. 5.2. I have covered pp. 312-19; mean and s.d., pp. 319-21; proportions, pp. 321-3. Now the binomial formula, pp. 328-9: memorize the binomial formula and the mean and s.d. for a binomial distribution. Then pp. 323-327 (proportions and normal approx.). Skip "continuity correction," pp. 327-8.
Next: Read about Weibull distributions, pp. 330-2, and Ch. 6, Inference.
Read 6.1. I'll go through "Choosing the sample size" (pp. 352-3) in class, but read Cautions, pp. 354-5, carefully now. I won't lecture on it or outline it on the webpage.
Hand in:
The Formula
p. 334, 5.53 blood types (Calculate the values using the formula, check with the table.)
p. 336, 5.64 scuba trip marketing (Note: b is NOT answerable with the Binomial distribution.)
5.66 binomial coefficient facts. (Not sure they're true? Check with your computations in #5.53 for plausibility. Remember, in any binomial situation you can reverse the roles of Success and Failure. Note b and c together give nC1 = n.)
p. 338, 5.74 common names (What are your chances if you're taking a statistics course? Lower...)
Postpone the rest.
Normal approximation
p. 335, 5.59 alcohol study (this reasoning is that of significance tests, coming up)
5.65 proportion of sample survey
5.25 Mult. choice tests
5.54, 5.56 Gallup poll: effect of n on margin of error
5.55 Gallup poll: effect of pop. p on spread of phat?
Sec. 6.1, confidence intervals, p. 357ff., mostly.
p. 347, 6.4 80% CI, with Applet: Confidence Interval
6.30 Modified: Do 10 sets of 50 at C=.95, not 30. Complete the problem. Use Applet: Confidence Interval
p. 346, 6.1, 2, 3 intro to idea of calculated CI, using 95% "rule".
Using formula: (ConfidenceInterval.xls Excel spreadsheet automates it, to check answers.)
p. 351, 6.5, 6.6 anxiety C
6.22, 6.23 apartment rents
6.11, 6.12 changes of m with n and C
6.19 all students/ in major. Assume "your school" is the authors'--Purdue, with 30,000 students.
Cautions (pp. 354-5)
6.9 Sally Mae, margin of error
6.35ab five intervals (Notice, these questions are about a binomial distribution, B(5, .95))
6.33 radio poll. Read, discuss.
Postpone the rest.
6.21 sweet
6.27 Job satisfaction, Gallup, margin of error
Optional (more practice). Postpone the rest.
6.15 sports
6.17, 18 biomarkers
You took 4 numbers (a random sample) from the Birkenstock box, found the mean xbar, then found xbar ± .841. This is your interval estimate of the unknown mean of the box's population. (The "margin of error" is .841.) (Returned your numbers afterward.)
If your xbar = 8.0       7.159|_____________8.0_____________|8.841

Quiz (Probably Wednesday (not before)): Knowing and using:
Mean and st. dev. for X-bar from SRS of size n
(Summary p. 308)
Normal?  Central limit theorem: says yes for all parent distributions, approximately, for n large.
If population(s) normal to start with, linear combinations stay normal (including X-bar); mean and s.d. follow algebra rules (as last quiz).
Binomial distribution formula (p. 329 bottom)
mean and st. dev. for Binomial:  X (count), p-hat (proportion)
(Summary p.332)

Homework questions? Day 29

General issue from Central Limit Th. (don't remember saying this...): What if the population is not 10 to 20 times the sample size? The real s.d. of the x-bars will be smaller than the sigma/sqrt(n) formula gives, and the distribution of x-bar may not "get to" normal as a shape. Example: sample of 4 grades from a population of 10. All possible samples. ...
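To see the small-population effect in numbers, here is a minimal sketch that lists every possible sample of 4 from a population of 10. The ten grades below are made up for illustration (they are NOT the class data), and the correction factor sqrt((N-n)/(N-1)) is the standard finite population correction, which these notes do not otherwise use.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

# Hypothetical population of N = 10 grades (illustration only, not the class data).
population = [52, 58, 61, 65, 68, 70, 74, 79, 83, 90]
N, n = len(population), 4

sigma = pstdev(population)  # population s.d. (divisor N)

# Every possible sample of size 4 drawn without replacement: C(10, 4) = 210 samples.
xbars = [mean(sample) for sample in combinations(population, n)]

sd_actual = pstdev(xbars)        # real s.d. of x-bar over all possible samples
sd_formula = sigma / sqrt(n)     # what sigma/sqrt(n) claims
fpc = sqrt((N - n) / (N - 1))    # finite population correction factor

print(f"number of samples    = {len(xbars)}")
print(f"sigma/sqrt(n)        = {sd_formula:.3f}")
print(f"actual s.d. of x-bar = {sd_actual:.3f}")
print(f"sigma/sqrt(n) * fpc  = {sd_formula * fpc:.3f}")
```

The actual s.d. comes out smaller than sigma/sqrt(n) and matches the corrected value, which is why the plain formula is reserved for populations much larger than the sample.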

Binomial:  Tell me mean & s.d. of X, p-hat.
Some sample p-hats;
(n = 25, p = .6, .2)  Compare with your graphs of distributions.
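A quick way to check your answers for these two cases: a minimal sketch of the binomial formula and of the mean and s.d. for the count X and the proportion p-hat. The function names are just for illustration.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): (n choose k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_summary(n, p):
    """Return (mean of X, s.d. of X, mean of p-hat, s.d. of p-hat)."""
    mean_x = n * p
    sd_x = sqrt(n * p * (1 - p))
    return mean_x, sd_x, p, sd_x / n

for p in (0.6, 0.2):
    mean_x, sd_x, mean_phat, sd_phat = binom_summary(25, p)
    print(f"n=25, p={p}: X mean={mean_x:.1f}, s.d.={sd_x:.3f}; "
          f"p-hat mean={mean_phat:.2f}, s.d.={sd_phat:.3f}")

# The probabilities over k = 0..n should add up to 1.
print(sum(binom_pmf(k, 25, 0.6) for k in range(26)))
```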

Growth of binomial X as n increases: Mean grows like n, but spread grows like sqrt(n).
Sample Proportion as n increases: Mean stays at p, but spread shrinks like 1/sqrt(n).
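The two scaling statements above can be tabulated directly. This is a small sketch with p = .6 chosen arbitrarily; each step multiplies n by 4, so the s.d. of X should double while the s.d. of p-hat is cut in half.

```python
from math import sqrt

def sd_count(n, p):
    """S.d. of the binomial count X: sqrt(n p (1-p))."""
    return sqrt(n * p * (1 - p))

def sd_phat(n, p):
    """S.d. of the sample proportion p-hat = X/n: sqrt(p (1-p) / n)."""
    return sqrt(p * (1 - p) / n)

p = 0.6  # arbitrary choice for illustration
print("    n   mean X   s.d. X   mean p-hat   s.d. p-hat")
for n in (25, 100, 400, 1600):
    print(f"{n:5d} {n * p:8.1f} {sd_count(n, p):8.3f} {p:12.2f} {sd_phat(n, p):12.4f}")
```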
Applet
"Normal approximation to Binomial", Excel graph of Binomial

Binomial:  formula,
Did a bunch of HW Friday, only new material was Binomial formula.
Monday,

Normal approximation
Day 29

Ch. 6: Introduction to Statistical Inference:
Requires: Random sample or Randomized experiment.  (Our theory: Simple Random Sample usually)
First example:  Use sample mean xbar  to "estimate" (unknown) population mean µ

Mean of 4 grades estimates population mean of all 10 ("known"= 69.4) Population dist, Dist. of all sample means. Some samples
E.g. 69.75,  64.25,  73.5    (Each is a "point estimate")

• "xbar IS µ"   Never true exactly
• "xbar is close to µ"  True for most xbars, depending on "close", and sample size n.
• "xbar is probably close to µ" ["probably"?? It is or it isn't. We have "confidence" ours is a close one]
• "xbars are usually close to µ" True.
• If we have only one sample -- one xbar, we have NO IDEA if ours is one of the "close" ones.
• Quantify "usually" and "close."

• Fall 2002: 33% (16 of 48) xbars  recorded were within 1 of µ. (between 68.4 and 70.4).
83% (40 of 48) xbars  recorded were within 4 of µ. (between 65.4 and 73.4).
94% (45 of 48) xbars  recorded were within 5 of µ. (between 64.4 and 74.4).
Note tradeoff between "close"(accuracy) and "usually" (confidence)
Confidence intervals (sec. 6.1) This is one of the two big ideas of inference that we will study.  Chapter 7 will extend this simple idealized situation, so this needs to be firmly in place.

Interval estimate:  xbar ± margin of error (fudge factor)  estimates population mean µ (69.4)

69.75 ± 1:   "µ is between 68.75 and 70.75"  True
69.75 ± 4:   "µ is between 65.75 and 73.75"  True
73.5 ± 4:    "µ is between 69.5 and 77.5"  False
73.5 ± 5:    "µ is between 68.5 and 78.5"  True
64.25 ± 4:   "µ is between 60.25 and 68.25"  False
64.25 ± 5:   "µ is between 59.25 and 69.25"  False
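The True/False checks in this list are just arithmetic, so they are easy to verify. A minimal sketch, using the three sample means above and the "known" µ = 69.4 (in real inference µ is unknown, so this check is only possible in a classroom setup like this one):

```python
MU = 69.4  # population mean of the 10 grades, known here only for illustration

def interval(xbar, m):
    """Interval estimate xbar - m to xbar + m."""
    return xbar - m, xbar + m

def captures(xbar, m, mu=MU):
    """True if the interval xbar +/- m contains mu."""
    lo, hi = interval(xbar, m)
    return lo <= mu <= hi

for xbar, m in [(69.75, 1), (69.75, 4), (73.5, 4), (73.5, 5), (64.25, 4), (64.25, 5)]:
    lo, hi = interval(xbar, m)
    print(f"{xbar} +/- {m}: {lo} to {hi}  captures mu? {captures(xbar, m)}")
```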

A level C  Confidence interval
estimate of a(n unknown) population parameter: (p. 347)

• an  Interval computed from the sample data,
• by a method  that has probability C of producing an interval that will capture the true, unknown, parameter.

• (Over many repeated samples, the intervals would capture the unknown parameter a proportion C of the time. Unfortunately, we only get one sample, as a rule.)
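The "probability C" in this definition describes the method over many samples, and a simulation makes that concrete. The sketch below assumes a Normal population with made-up values (mean 69.4, s.d. 10, n = 4) and uses the xbar ± z* sigma/sqrt(n) interval; the function name and parameter values are illustrative only.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(mu, sigma, n, C, reps, seed=0):
    """Fraction of simulated samples whose interval xbar +/- m captures mu."""
    rng = random.Random(seed)                    # fixed seed so runs are repeatable
    z_star = NormalDist().inv_cdf((1 + C) / 2)   # area C lies between -z* and +z*
    m = z_star * sigma / sqrt(n)                 # margin of error
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - m <= mu <= xbar + m:
            hits += 1
    return hits / reps

# Assumed values for illustration: mu = 69.4, sigma = 10, samples of size 4.
print(coverage(mu=69.4, sigma=10, n=4, C=0.95, reps=5000))
```

The printed fraction should land near .95. Any single interval either captures µ or it doesn't; it is the long-run fraction that equals C.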

Birkenstock Shoebox: You're constructing a
Confidence Interval of the form  estimate ± margin-of-error  for the mean µ with Confidence level C: (p.346)
Does yours capture the real shoebox mean?

Applet:  Confidence Interval.  Many sample means: shows it's not the individual interval that C describes, but the Method.

Formula:
Confidence Interval of the form  estimate ± margin-of-error  for the mean µ with Confidence level C: (p.349)
• the estimate is xbar
• margin of error m is:  z* times standard deviation of sample mean
• z* from Normal table.  Probability C is between -z* and +z*.
(Table A, or Table D (back flyleaf), t dist. bottom row)
Standard deviation of sample mean:  Sigma /sqrt(n)
Must know standard deviation of population!
or, If sample size is large, use s (standard deviation calculated from sample)

Example:  Sample of size 9 from a Normal population with unknown mean and pop. s.d. sigma = 6,  xbar = 12.
Find a 90% CI estimate for the unknown mean µ:  z* = 1.645,  (sigma)/ sqrt(n) = 6/3=2, so m = 3.290;
CI is 12 ± 3.290, or  8.710 to 15.290.
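The same example can be checked by computer instead of Table A. This sketch uses Python's `statistics.NormalDist` to find z*; the function name `z_interval` is made up for illustration. It carries z* unrounded (about 1.6449), so the margin agrees with the hand calculation to three decimals.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, C):
    """Level-C interval xbar +/- z* sigma/sqrt(n); population s.d. sigma must be known."""
    z_star = NormalDist().inv_cdf((1 + C) / 2)  # leaves (1 - C)/2 in each tail
    m = z_star * sigma / sqrt(n)                # margin of error
    return xbar - m, xbar + m, m

lo, hi, m = z_interval(xbar=12, sigma=6, n=9, C=0.90)
print(f"m = {m:.3f}, CI = {lo:.3f} to {hi:.3f}")
```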

Got to here Monday
The Birkenstock box contains numbers from a normally distributed population, with population standard deviation 2.
You each constructed a 60% confidence interval for the unknown mean: (proof:)
n = 4.
Standard deviation of sample mean = 2/sqrt(4) = 2/2 = 1
z* for C = 60% is .841, so margin of error m is .841 times 1= .841.
To get the z* for C = 60% from the normal table, note that this is the middle 60%, which leaves 40% to be split between the 2 tails.  So 20% above z*,  and 80% below.  Go into the body of table A, find 80%= .8000 is between values .7995 and .8023, closer to .7995.  The z value with .7995 below it is .84.  Table D gives it more precisely as  .841.
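The table lookup just described can be mirrored with `statistics.NormalDist`, as a check on the .841. A minimal sketch; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, s.d. 1

C = 0.60
area_below = C + (1 - C) / 2     # middle 60% plus the lower 20% tail = .80
z_star = Z.inv_cdf(area_below)   # z value with area .80 below it

# The two Table A entries that bracket .8000:
print(f"area below 0.84 = {Z.cdf(0.84):.4f}")
print(f"area below 0.85 = {Z.cdf(0.85):.4f}")

# Shoebox margin of error: sigma = 2, n = 4.
m = z_star * 2 / sqrt(4)
print(f"z* = {z_star:.4f}, margin of error = {m:.4f}")
```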
How many people captured the true mean?

Previous classes: 11/20 = 55%, 22/29 = 76%, 9/18 = 50%, 11/20 = 55%, 15/22 = 68%, 16/24 = 67%, 16/18 = 89%, 7/13 = 54%, 8/16 = 50%, 7/14 = 50%, 5/10 = 50%, 11/14 = 79%, 8/17 = 47%, 10/16 = 63%, 12/19 = 63%. Combined: 168/270 = 62.2%. This year's classes (now): 24/31 = 77%. Combined: 192/301 = 63.8%.
(Quite variable for small samples, but settling down?)
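How variable should a class's capture rate be? If we assume each student's interval is an independent "success" with probability C = .60 (an idealization of what happened in class), the capture rate is a sample proportion p-hat, and its s.d. is sqrt(p(1-p)/n) from Sec. 5.2. A rough sketch:

```python
from math import sqrt

def sd_of_capture_rate(n_students, C=0.60):
    """S.d. of the fraction of intervals that capture mu, assuming independent intervals."""
    return sqrt(C * (1 - C) / n_students)

# Class sizes taken from the results above: a small class, a typical class,
# this year's class, and all classes combined.
for n_students in (10, 20, 31, 301):
    print(f"n = {n_students:3d}: expected rate .60, s.d. about {sd_of_capture_rate(n_students):.3f}")
```

With 10 to 30 students the s.d. is roughly .09 to .15, so single-class rates anywhere from about 50% to 80% are not surprising. With 301 intervals combined it is under .03.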

Graphed results (151 combined with 251): Fall '07 CI's, Sp. '08 CI's, Fall '08 CI's, Sp. '09 CI's, Sp. '10 CI's. Compare with Applet: Confidence intervals. This year's shows more uniformity (some who were "out" didn't graph!)

Extension:  If n is  large, we can use the formula even if population is not normal.
(Because only the distribution of Xbar is used, and for large n Xbar is approximately normal, by the Central Limit Theorem.)
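A simulation sketch of this extension, using a strongly skewed population. The choice of an exponential population (mean 1, s.d. 1), n = 50, and the function name are all assumptions for illustration; sigma is treated as known, as in the formula above.

```python
import random
from math import sqrt
from statistics import NormalDist

def skewed_coverage(n, C=0.95, reps=4000, seed=1):
    """Capture rate of xbar +/- z* sigma/sqrt(n) when sampling an exponential population."""
    rng = random.Random(seed)   # fixed seed so runs are repeatable
    mu = sigma = 1.0            # exponential with rate 1 has mean 1 and s.d. 1
    z_star = NormalDist().inv_cdf((1 + C) / 2)
    m = z_star * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if xbar - m <= mu <= xbar + m:
            hits += 1
    return hits / reps

print(skewed_coverage(n=50))
```

Even though the individual observations are far from normal, the capture rate for samples of size 50 should come out close to the stated C = .95.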