| Hand in
(All D&V)
Sampling dist. of means, cont.
p. 350
Add: B: "Normal" body temperature is 98.6 deg. on average. (Assume this is true.) Assume a normal distribution, and that the s.d. among many people is 0.6. What is the:
Probability that one (random) healthy individual's normal temperature is above 98.8?
Probability that the mean of a sample of 4 is above 98.8?
Probability that the mean of a sample of 36 is above 98.8?
Probability that the mean of a sample of 100 is above 98.8?
Note: as n grows, the SD shrinks, but only by the square root of n!
21 c, d Pregnancy
23 Pregnancy skewed
22 Rainfall
24 At work
28 Potato chip bags
Postpone: Ch. 19, p. 386
|
Read,
to discuss |
Optional |
Homework questions? Day 28
Sampling Distributions, Ch. 18
Take a Sample from a population. SRS!
Imagine (simulate) what would happen if you took "all possible" SRS's. For each sample, calculate a statistic.
Your coin flips: p-hats, n = 25, p = 1/2. SD(p-hat) = square root of (.5 ·.5/25) = .10, so P(p-hat > .60) = P(z > 1.00) = 16% using the normal model.
62 SRS's. Would expect (16% of 62 =) 9.9 to be "above" .60. 9 were at .6, 8 were > .6. More SRS's would probably do somewhat better....
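The coin-flip calculation can be checked with a few lines of Python. This is only a sketch using the normal model from the notes (n = 25, p = 1/2, 62 samples); the variable names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

n, p = 25, 0.5                        # sample size and true proportion for a fair coin
sd_phat = sqrt(p * (1 - p) / n)       # SD of the sampling distribution of p-hat
z = (0.60 - p) / sd_phat              # z-score of the cutoff .60
prob_above = 1 - NormalDist().cdf(z)  # area to the right of z under the Standard Normal

expected_count = prob_above * 62      # expected number of the 62 class samples above .60

print(round(sd_phat, 3), round(z, 2), round(prob_above, 3), round(expected_count, 1))
```

Using the unrounded probability (about .159) instead of 16% changes the expected count only slightly.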
Sampling distribution of the mean, y-bar: Day 28
Distribution of all means from all possible random samples of size
n from a population.
Need Random Sample, Independence (in particular, for sampling without replacement, n < 10% of population).
Population has mean µ and standard deviation σ.
Whatever the shape of the population distribution that we draw the sample from, the sampling distribution of the y-bars has mean µ and standard deviation σ / (square root of n).
IF the population is Normal, the sampling distribution of the y-bars is Normal.
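As an illustration, here is a sketch that applies the standard result SD(y-bar) = σ / (square root of n) to the body-temperature question from the hand-in list (µ = 98.6, σ = 0.6, Normal population). The function name prob_mean_above is made up for this example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 98.6, 0.6   # population mean and SD from the body-temperature problem

def prob_mean_above(cutoff, n):
    """P(y-bar > cutoff) for samples of size n from a Normal population."""
    sd_ybar = sigma / sqrt(n)                  # SD of the sampling distribution of y-bar
    return 1 - NormalDist(mu, sd_ybar).cdf(cutoff)

for n in (1, 4, 36, 100):
    # columns: n, SD of y-bar, probability that y-bar is above 98.8
    print(n, round(sigma / sqrt(n), 3), round(prob_mean_above(98.8, n), 4))
```

Going from n = 4 to n = 100 multiplies the sample size by 25 but divides the SD by only 5.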
The Central Limit Theorem (CLT): In any case, for "large" n, the sampling distribution of the y-bars is approximately Normal.
SPSS simulation: average of spinners which can land on any number between 0 and 1.
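The spinner simulation can be imitated outside SPSS. The sketch below assumes each spinner is Uniform on (0, 1), so that its mean is .5 and its SD is the square root of 1/12 (about .289); the seed, the number of repetitions, and the function name are arbitrary choices, not part of the class demonstration.

```python
import random
from statistics import mean, stdev

random.seed(151)  # arbitrary seed so the run is repeatable

def spinner_averages(n, reps=5000):
    """Average n spinner values, and repeat that reps times."""
    return [sum(random.random() for _ in range(n)) / n for _ in range(reps)]

for n in (1, 2, 5, 25):
    avgs = spinner_averages(n)
    theory_sd = (1 / 12) ** 0.5 / n ** 0.5   # sigma / sqrt(n) for a Uniform(0, 1) spinner
    # columns: n, mean of the averages, SD of the averages, theoretical SD
    print(n, round(mean(avgs), 3), round(stdev(avgs), 3), round(theory_sd, 3))
```

A histogram of avgs for each n should look flat for n = 1, triangular for n = 2, and close to Normal by n = 25.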
How large is "large"? How approximate is "approximate"?
If the population was close to normal, n doesn't need to be very large.
Even if the population is pretty weird, n = 25 gives a pretty good approximation to normal.
But if we have really Big outliers or really BAD skewness, we may need much more.
Pictures on overhead.
Moore applet, Central Limit theorem
START HERE FRIDAY
Next job: We usually DON'T KNOW the population parameter;
use the statistic from our sample to ESTIMATE it.
YOU don't know the real proportion of 1's in the green shoebox.
Each of you has an estimate. In "real life" you won't have a bunch of classmates with other samples; you'll only have your own. (Also, in this case, I know the real proportion. Not so in "real life.") How "good" is your estimate of the real p?
You know how the sampling distributions of sample proportions (and sample means) behave; we'll use that. But we want to know how much they are spread, and for that we need the parameter p (and q) for proportions (and the parameter σ for means).
And we don't know those! So we use the sample statistics p-hat and s in place of them.
Standard Error (p. 347): When we estimate
the standard deviation of a sampling distribution of a statistic, using
the data from our sample, we call that the Standard Error
of the statistic.
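A minimal sketch of the two standard errors, SE(p-hat) = square root of (p-hat · q-hat/n) and SE(y-bar) = s / (square root of n). The function names are made up, the 12-out-of-30 count is the shoebox example used in these notes, and the four temperatures are invented purely for illustration.

```python
from math import sqrt
from statistics import stdev

def se_proportion(successes, n):
    """Standard error of p-hat: the SD formula with p-hat and q-hat in place of p and q."""
    p_hat = successes / n
    return sqrt(p_hat * (1 - p_hat) / n)

def se_mean(sample):
    """Standard error of y-bar: the sample SD s in place of sigma."""
    return stdev(sample) / sqrt(len(sample))

print(round(se_proportion(12, 30), 3))   # 12 ones in a sample of 30

temps = [98.2, 98.9, 98.6, 98.4]         # hypothetical data, not from the class
print(round(se_mean(temps), 3))
```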
Confidence Interval Estimate of p: (Chapter 19)
p-hat is your best guess at p, but it's bound to be wrong, almost always. (see p. 356)
Make an interval estimate of p, by adding and subtracting a Margin of Error (ME).
For instance, 39% ± 2%.
Say "This interval contains (captures) the true proportion p"?
Wrong. It may or may not, and you have no way of knowing.
What we can do is use a rule to construct the ME so that intervals made using the rule will contain p a known proportion of the time. The "known proportion" is our confidence level. If our rule makes ME's that capture p 95% of the time, we've made 95% confidence intervals. "I have 95% confidence that this interval captures the true proportion p"
A level C confidence interval for a parameter is an interval, usually of the form estimate ± margin of error, found from data, in such a way that C% of all random samples will yield intervals that capture the true parameter value.
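The "C% of all random samples" idea can be seen by simulation. The sketch below draws many samples from a population whose true proportion is known to the program (0.4 here, a made-up value, as is n = 30), builds a p-hat ± z* · SE(p-hat) interval from each, and counts how often the interval captures the truth. The seed and the function name coverage are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(19)  # arbitrary seed so the run is repeatable

def coverage(true_p, n, conf, reps=10000):
    """Fraction of simulated intervals p-hat +/- z* SE(p-hat) that capture true_p."""
    z_star = NormalDist().inv_cdf((1 + conf) / 2)   # critical value for level conf
    hits = 0
    for _ in range(reps):
        successes = sum(random.random() < true_p for _ in range(n))  # one random sample
        p_hat = successes / n
        me = z_star * sqrt(p_hat * (1 - p_hat) / n)                  # margin of error
        if p_hat - me <= true_p <= p_hat + me:
            hits += 1
    return hits / reps

rate = coverage(0.4, 30, 0.95)   # true_p = 0.4 and n = 30 are hypothetical choices
print(rate)
```

With n as small as 30 the capture rate usually comes out a little below the nominal 95%, because the normal model is only approximate for a sample that size.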
Rule for ME: ME = z* · SE(p-hat), where z* is the "critical value" from the Standard Normal table that has C% of the area in the symmetric central interval between -z* and +z*.
Level C confidence interval for population proportion p: "One-proportion z-interval"
p-hat ± z* · SE(p-hat), where SE(p-hat) = square root of (p-hat · q-hat/n).
(Why it works: later.)
Example: You drew a sample of size n = 30. p is the (unknown) proportion of 1's in the shoebox. You found the sample proportion, and you calculated the SE for the sample proportion.
Use z* = 1. Then C is about 68%.
Calculate: if I got 12/30, p-hat = .400. "q-hat" = 1 - p-hat = 1 - .4 = .6.
SD formula: square root of (p ·q/n). With p-hat and q-hat in place of p and q: square root of (.4 ·.6/30) = square root of .008 = .089 = SE(p-hat).
68% Confidence Interval: .400 ± .089, or (.311, .489).
Whose intervals captured the real proportion? (Expect roughly
68% of you to do so.)
Usually, want higher Confidence Level: 90%, 95%, 99%....
For 95%: z* = 1.96 (approximately 2). (How? 95% in the middle. 2.5% in each tail. The Standard Normal table has area .0250 to the left of z = -1.96.)
Shortcut: Table T, p. A-53, bottom two rows. (The "infinity" row is the Standard Normal values.)
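Without a table, the critical value can be found from the inverse of the Standard Normal cumulative distribution. A short sketch, with z_star as an illustrative function name:

```python
from statistics import NormalDist

def z_star(conf):
    """Critical value with area conf between -z* and +z* under the Standard Normal."""
    tail = (1 - conf) / 2                  # area in each tail
    return NormalDist().inv_cdf(1 - tail)  # z with area `tail` to its right

for conf in (0.68, 0.90, 0.95, 0.99):
    print(conf, round(z_star(conf), 3))
```

For 95% this gives the 1.96 above; for 68% it gives a value just under 1, which is why z* = 1 corresponds to "about 68%."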
z* · SE(p-hat) = 1.96 · .089 = .174.
95% Confidence Interval: .400 ± .174, or (.226, .574).
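The whole interval calculation fits in one small function. This is a sketch of the one-proportion z-interval for the 12-out-of-30 example; the function name is made up, and z* is passed in directly rather than looked up.

```python
from math import sqrt

def one_prop_z_interval(successes, n, z_star):
    """Return (low, high) for p-hat +/- z* * SE(p-hat)."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of p-hat
    me = z_star * se                     # margin of error
    return p_hat - me, p_hat + me

low68, high68 = one_prop_z_interval(12, 30, 1.0)    # about 68% confidence
low95, high95 = one_prop_z_interval(12, 30, 1.96)   # 95% confidence

print(round(low68, 3), round(high68, 3))
print(round(low95, 3), round(high95, 3))
```

Because the code does not round SE(p-hat) to .089 before multiplying, its 95% endpoints differ from (.226, .574) by about .001.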
Note Trade-off: Higher Confidence means a Wider interval (bigger ME, less "precision").
Assumptions/conditions: Assumes Central Limit Theorem for proportions is appropriate.
Independence:
Data values shouldn't affect each other.
Randomization helps!
n < 10% of population.
Sample Size: Expect at least 10 successes and 10 failures (rephrase of np, nq ≥ 10)
Bias? Here's why we studied bias in sampling. Biases or other bad sampling methods can make our computations worthless! p. 363.
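The numerical conditions (10 successes and 10 failures, n < 10% of the population) are easy to check mechanically; randomization and bias are not, and still need judgment about how the data were collected. A sketch, with a made-up function name and a hypothetical population size of 1000 for the shoebox:

```python
def check_conditions(successes, n, population_size=None):
    """Check the countable conditions for a one-proportion z-interval."""
    failures = n - successes
    results = {
        "at_least_10_successes": successes >= 10,
        "at_least_10_failures": failures >= 10,
    }
    if population_size is not None:
        # 10% condition for sampling without replacement
        results["n_under_10_percent"] = n < 0.10 * population_size
    return results

# 12 ones out of 30; the population size 1000 is an assumption for illustration
print(check_conditions(12, 30, population_size=1000))
```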
| Sievers home | Math151-Sp05/Days29.htm | 1:30pm | 4/13/05 |