| Hand in Wednesday (all D&V): Ch. 11 pp. 219-20, #1 (Coin flip), #4 (Games); Ch. 12 p. 238ff., #23 (Sampling methods) | Read, to discuss | Optional |
Chapter 11 (transition)
Understanding Randomness
"Random": An event is a "random" event if we know what outcomes could happen, but not which particular values did or will happen. I like to use "chance" or "probabilistic."
"Random numbers": Lists of digits (or sets of digits of equal length, like 135, 238, 099) that are not only "random" (unpredictable) but equally likely.
Generating a series of equally likely numbers: List the possible values you want, then:
--Roll a die (1, 2, 3, 4, 5, 6 is usual). Repeat.
--Pick a card from a well-shuffled deck. (Mark the cards with as many values as you need.) Replace and repeat.
--Put labeled slips, or balls, in a "hat". Stir, pick. Replace, repeat.
--Go to Random.com (from radio static).
--Ask a computer program (SPSS--we'll learn).
--Look at p. A-49 (Table of Random Digits). Close your eyes and put your pencil point on the page. Start there. Read digits, as many as you need.
(The last two are "pseudo-random"--look random but in fact are computed in a predictable way. Good enough.)
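A minimal sketch of the "ask a computer program" option. It uses Python's standard `random` module instead of SPSS, purely for illustration. The function name `random_digits` and the seed value 151 are assumptions made for this example. Re-using a seed reproduces the same digits, which is what "computed in a predictable way" looks like in practice.

```python
import random

def random_digits(count, seed=None):
    """Return `count` digits, each of 0-9 equally likely."""
    rng = random.Random(seed)  # a seeded generator is reproducible
    return [rng.randrange(10) for _ in range(count)]

# Simulate rolling a fair six-sided die five times.
die_rolls = [random.randint(1, 6) for _ in range(5)]
print(die_rolls)

# Same seed, same "random" digits: pseudo-random, but good enough.
first = random_digits(20, seed=151)
second = random_digits(20, seed=151)
print(first == second)  # True
```

With `seed=None` the generator is seeded from the operating system, so repeated runs give different digits.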
--Pick a digit yourself? Look at the digits you picked. (stemplot) **
Compare with digits from a Table of random digits (p. A-49), which simulates rolling a die with faces 0, 1, ..., 9 over and over:
Every digit, every sequence of digits, is equally likely to be "next" in any direction.
We see much variability in the results from the table--but no consistent patterns.
The more data we pool, the closer we get to "equally likely" -- 10% of each of the 10 digits.
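The pooling idea can be tried out with a short simulation. This is only a sketch: the digits come from Python's `random` module rather than the printed table, and the counts 25, 250 and 25000 are arbitrary choices for illustration.

```python
import random
from collections import Counter

def digit_proportions(count, seed=0):
    """Tally `count` simulated random digits; return the proportion of each digit."""
    rng = random.Random(seed)
    tally = Counter(rng.randrange(10) for _ in range(count))
    return {digit: tally[digit] / count for digit in range(10)}

for count in (25, 250, 25000):
    props = digit_proportions(count)
    worst = max(abs(p - 0.10) for p in props.values())
    print(count, "digits: largest gap from 10% =", round(worst, 3))
```

A tally of 25 digits will typically look quite uneven; the gap from 10% tends to shrink as more digits are pooled.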
In class: in pairs, tally 25 random numbers from table A-49 (each start on a different line) Hand in at end of class.
People are very bad at making choices that would qualify as "equally likely". So we need to use an impersonal mechanism.
Rest of Ch. 11 (simulating random events) --postpone to before Ch. 14.
To now: Get a data set to tell us its secrets -- (Exploratory) Data Analysis. Only about the set itself.
Organize, Display, Summarize, Describe (with Models--Normal distribution, Straight line Regression)
Sample Surveys (D&V Ch. 12)
Goal: Use data to tell us something about the larger world that it came from (Statistical Inference--later)
Idea 1: Population: the whole group we'd like to learn something about. (Almost always too big, too expensive to get at.)
Sample: A (much?) smaller group from the population--which we actually can examine.
Hope: The Sample will be representative of the Population.
"Bias": a systematic failure of the Sample to represent the Population.
Idea 2: Use Randomization to pick us a representative sample (almost always; and we can quantify "almost always").
Protects us from bias; allows us to do statistical inference.
Idea 3: A larger sample will be more representative. But not because it's a larger proportion of the whole; just because it's a larger number of individuals. A spoonful samples a (stirred) cauldron of soup as well as it does a small pot. A toothpick doesn't do as well.
(If the sample is more than about 10% of the population, the proportion issue begins to matter.)
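The soup analogy can be checked with a small simulation. This is a sketch under assumed numbers, not anything from D&V: the population has exactly 30% "yes" individuals, the "pot" has 10,000 individuals, the "cauldron" has 1,000,000, and the "spoon" is a sample of 100.

```python
import random
import statistics

def spread_of_sample_proportion(pop_size, sample_size, reps=1000, seed=1):
    """St. dev. of p-hat over many SRSs from a population in which
    exactly 30% of individuals (the lowest labels) are 'yes'."""
    rng = random.Random(seed)
    yes_cutoff = pop_size * 3 // 10
    p_hats = []
    for _ in range(reps):
        sample = rng.sample(range(pop_size), sample_size)  # without replacement
        p_hats.append(sum(label < yes_cutoff for label in sample) / sample_size)
    return statistics.pstdev(p_hats)

# Same spoon (n = 100), small pot vs. cauldron: about the same spread.
small_pot = spread_of_sample_proportion(10_000, 100)
cauldron = spread_of_sample_proportion(1_000_000, 100)
# Bigger spoon (n = 400) from the cauldron: roughly half the spread.
big_spoon = spread_of_sample_proportion(1_000_000, 400)
print(small_pot, cauldron, big_spoon)
```

Both samples of 100 are far below 10% of their populations, so population size barely matters; quadrupling the sample size is what cuts the spread.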
Census: (try to) get whole population! (pp. 225-6) Expensive, difficult, impractical (cf. "destructive testing." Homeless. Illegal immigrants. "Snowbirds."). May be less accurate than careful sample.
We know that a sample from a population will not exactly represent the population. If we take a random sample, the behavior of individual samples will not be predictable, but there will be a predictable pattern across many random samples from the same population. Knowing that pattern is as good as we can do. Ch. 14 on.
| Sample (chosen from a Population) | Population |
| (varies) | (fixed, but usually unknown) |
| Calculate numerical summary: Statistic (Latin letter) | Parameter (Greek letter) (D&V p. 227) |
Examples:
| Sample mean xbar | Population mean mu (µ) |
| Sample st. dev. s | Pop. standard dev. sigma |
| Sample median | Pop. median |
| Sample proportion p-hat | Pop. proportion p |
| Sample line height y-hat | Pop. regression line height y |
| Sample line slope b1 | Pop. regression line slope beta1 |
The actual value of the Statistic will vary, depending on the particular sample: "Sampling variability".
The Statistic "estimates" the Parameter. We hope it is close to the parameter. If we choose simple random samples, we can understand the pattern of values the statistic can take.
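Sampling variability is easy to see by drawing many samples from one fixed population. The sketch below uses a made-up population of 5,000 values (generated with mean near 100 and st. dev. near 15) and samples of size 25; all of these numbers are assumptions for illustration.

```python
import random
import statistics

rng = random.Random(2005)

# Hypothetical population; its mean (the parameter mu) is fixed.
population = [rng.gauss(100, 15) for _ in range(5000)]
mu = statistics.mean(population)

# Each simple random sample of n = 25 gives a different statistic xbar.
xbars = [statistics.mean(rng.sample(population, 25)) for _ in range(1000)]

print("parameter mu:", round(mu, 2))
print("first five xbars:", [round(x, 2) for x in xbars[:5]])
print("mean of the xbars:", round(statistics.mean(xbars), 2))
print("st. dev. of the xbars:", round(statistics.stdev(xbars), 2))
```

No single xbar equals mu, but the xbars cluster around it, and their spread is much smaller than the spread of the individual values.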
Start here Wednesday after break:
Simple Random Sample (SRS) of size n: n individuals chosen in such a way that every possible set of n individuals has an equal chance of being chosen.
HOW? A chance mechanism: Cards, dice, computer program or Table of random digits.
Need a list of the population, labeled, usually with numbers.
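A minimal sketch of choosing an SRS by computer from a labeled list. It uses Python's `random.sample` instead of SPSS; the 40-student population and the name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Choose n individuals so that every possible set of n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)  # sampling without replacement

# Hypothetical labeled list of a population of 40 students.
population_list = [f"student_{label:02d}" for label in range(1, 41)]
print(simple_random_sample(population_list, 5))
```

Because the draw is without replacement, no individual can appear twice in the sample.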
Different methods (D&V pp. 229-30): to sample about 1/10 of the population, list random digits next to the population list and take all the "4's".
SPSS will take a random sample from a population listed in a data file. See "Using SPSS to Sample" (get handout).
"List of the Population"?? fat chance.
Sampling frame: the list of individuals from the population that you actually choose the sample from. May differ a little (or a lot!) from the population you desire to study.
Bias in sampling: any systematic failure of a sample (its method) to represent its population. (E.g. sampling frame excludes "different" part of population.)
Other (good) sampling designs: Cheaper, or avoid a problem...
--To make sure important groups are represented proportionately: Stratified random sample: Divide into subpopulations (strata) with different characteristics (M/F, income strata: 5ths, etc.). Decide how many from each, then random sample within each stratum.
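A sketch of a proportionate stratified random sample. Taking the same fraction from every stratum is one assumed way of deciding "how many from each"; the 60 women / 40 men population and the function name are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Take an SRS of the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for individual in population:
        strata[stratum_of(individual)].append(individual)
    sample = []
    for members in strata.values():
        n = round(fraction * len(members))  # how many from this stratum
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population: 60 women and 40 men, as (label, sex) pairs.
population = [(i, "F") for i in range(60)] + [(i, "M") for i in range(60, 100)]
chosen = stratified_sample(population, lambda person: person[1], 0.10)
print(chosen)
```

Every 10% sample drawn this way contains 6 women and 4 men, which a plain SRS of 10 would not guarantee.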
--Save cost, time: Cluster sample: Choose clusters (randomly), then within each cluster take a sample (sometimes all). (Door-to-door survey: pick a city block at random, then 3 households at random from the block, or all households on the block.) Cluster was rewritten since day 19.
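The door-to-door example can be sketched as follows. The city of 20 blocks with 12 households each, and the names `cluster_sample`, `n_clusters` and `per_cluster`, are assumptions for illustration only.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Pick clusters at random, then sample within each chosen cluster.
    With per_cluster=None, take every individual in the chosen clusters."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen_names:
        members = clusters[name]
        if per_cluster is None:
            sample.extend(members)
        else:
            sample.extend(rng.sample(members, min(per_cluster, len(members))))
    return sample

# Hypothetical city: 20 blocks with 12 households each.
blocks = {f"block_{b}": [f"block_{b}_house_{h}" for h in range(12)]
          for b in range(20)}
print(cluster_sample(blocks, n_clusters=4, per_cluster=3))
```

Only the chosen blocks need to be visited, which is where the savings in cost and time come from.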
--Multistage sampling: Combine several design types in sequence to get to final sample.
--Systematic sample: For a 1-in-6 systematic sample from a phonebook: Roll a die to get the number for the first individual. Suppose the 5th. Then choose every 6th after that (11th, 17th, 23rd, ...). If there's no relationship between the variables we're looking at and "every 6th", it's ok. (Every individual is equally likely, but not every sample: so Not a simple random sample. All that's randomized is the starting point.) (But remember Playground; jockeying to be with friend after countoff.)
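A sketch of the 1-in-6 systematic sample, with a hypothetical phonebook of 60 numbered entries standing in for the real thing. The function name and the list are assumptions for illustration.

```python
import random

def systematic_sample(frame, k, seed=None):
    """1-in-k systematic sample: random start among the first k, then every kth."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # 0-based index; like one roll of a k-sided die
    return frame[start::k]

# Hypothetical phonebook of 60 numbered entries.
phonebook = list(range(1, 61))
print(systematic_sample(phonebook, 6))

# If the die shows 5, the sample is the 5th, 11th, 17th, 23rd, ... entries.
print(phonebook[4::6])
```

Only 6 different samples are possible here, one per starting point, which is why this is not a simple random sample even though each entry has a 1-in-6 chance.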
If time, look at #21, p.240
Sources of bias, next
| Sievers home | Math151-Fall05/Dayf19.htm | 3pm | 10/7/05 |