| Hand in Friday (all D&V): Ch. 11 pp. 219-20: #1 Coin flip, #4 Games. Ch. 12 p. 238ff. (start now, hand in after break): #23 Sampling methods, #19 Accounting |
| Read, to discuss |
| Optional |
Chapter 11
(transition)
Understanding Randomness
"Random" An event is a
"random"
event if we know what outcomes could happen, but not which
particular
values did or will happen. I like to use
"chance",
or "probabilistic."
"Random numbers"
Lists
of digits (or sets of digits of equal length, like 135, 238, 099) that
are not only "random" (unpredictable) but equally likely.
Generating a series of equally likely numbers: List the possible values you want, then:
--Roll a die (1, 2, 3, 4, 5, 6 is usual). Repeat.
--Pick a card from a well-shuffled deck. (Mark the cards with as many values as you need.) Replace and repeat.
--Put labeled slips, or balls, in a "hat". Stir, pick. Replace, repeat.
--Go to Random.org (from radio static).
--Ask a computer program (SPSS--we'll learn: get the "Using SPSS to Sample" handout).
--Look at p. A-49 (Table of Random Digits). Close your eyes and put your pencil point on the page. Start there. Read digits, as many as you need.
(The last two are "pseudo-random"--they look random but in fact are computed in a predictable way. Good enough.)
--Pick a digit yourself? Look at the digits you picked (stemplot). Compare with digits from a Table of random digits (p. A-49), which simulates rolling a die with faces 0, 1, ..., 9, over and over.
Every digit, every sequence of digits, is equally likely to be "next" in any direction.
We see much variability in the results from the table--but no consistent patterns.
The more data we pool, the closer we get to "equally likely" -- 10% of each of the 10 digits.
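The pooling claim can be checked with a small simulation. This is only a sketch: it uses Python's built-in pseudo-random generator as a stand-in for the printed table, and the function names, the seed, and the sample sizes (25 and 100,000) are choices made here for illustration, not anything from D&V.

```python
import random
from collections import Counter

def digit_proportions(n_digits, seed=None):
    """Draw n_digits pseudo-random digits 0-9 and return the share of each digit."""
    rng = random.Random(seed)  # seeded generator, so a run can be repeated exactly
    digits = [rng.randrange(10) for _ in range(n_digits)]
    counts = Counter(digits)
    return {d: counts[d] / n_digits for d in range(10)}

def max_deviation(proportions):
    """Largest gap between any digit's share and the ideal 10%."""
    return max(abs(p - 0.10) for p in proportions.values())

small = digit_proportions(25, seed=1)        # about the size of one pair's tally
large = digit_proportions(100_000, seed=1)   # like pooling a great many tallies

print("25 digits:      worst gap from 10% =", round(max_deviation(small), 3))
print("100,000 digits: worst gap from 10% =", round(max_deviation(large), 4))
```

With only 25 digits no digit can land on exactly 10% (that would need 2.5 occurrences), so some gap is unavoidable; with many digits the gaps shrink toward zero.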
In class: in pairs, tally 25 random numbers from the table on p. A-49 (each pair starts on a different line). Hand in at end of class.
People are very bad at making choices that would qualify as "equally likely". So we need to use an impersonal mechanism.
Rest of Ch. 11 (simulating random events)--postpone to before Ch. 14.
Up to now: Get a data set to tell us its secrets -- (Exploratory) Data Analysis. Only about the set itself.
Organize, Display, Summarize, Describe (with Models--Normal distribution, Straight line Regression)
Sample Surveys (D&V Ch. 12)
Goal: Use data to tell us something about the larger world that it came from (Statistical Inference--later)
Idea 1: Population: the whole group we'd like to learn something about. (Almost always too big, too expensive to get at.)
Sample: A (much?) smaller group from the population--which we actually can examine.
Hope: The Sample will be representative of the Population.
"Bias": a systematic failure of the Sample (sampling process) to represent the Population.
Idea 2: Use Randomization to pick us a representative sample (almost always representative; and we can quantify "almost always"). Protects us from bias; allows us to do statistical inference.
Idea 3: A larger sample will be more representative. But not because it's a larger proportion of the whole; just because it's a larger number of individuals. A spoonful samples a (stirred) cauldron of soup as well as it does a small pot. A toothpick doesn't do as well. (If the sample is more than about 10% of the population, the proportion issue begins to matter.)
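The soup analogy can be tried out numerically. The sketch below assumes a made-up population in which 30% of individuals say "yes", and compares how much the sample proportion bounces around for a "spoonful" (n = 100) drawn from a small pot (10,000 individuals) and from a cauldron (1,000,000), and for a "toothpick" (n = 10). The function name `phat_spread`, the 30%, and all the sizes are illustrative assumptions.

```python
import random
import statistics

def phat_spread(pop_size, n, true_p=0.30, repeats=400, seed=0):
    """Standard deviation of the sample proportion over many simple random samples."""
    rng = random.Random(seed)
    cutoff = int(true_p * pop_size)   # individuals numbered below cutoff are the "yes" group
    phats = []
    for _ in range(repeats):
        sample = rng.sample(range(pop_size), n)   # n distinct individuals, no repeats
        phats.append(sum(1 for i in sample if i < cutoff) / n)
    return statistics.stdev(phats)

pot = phat_spread(10_000, 100)           # spoonful from a small pot
cauldron = phat_spread(1_000_000, 100)   # same spoonful from a cauldron
toothpick = phat_spread(1_000_000, 10)   # much smaller sample from the cauldron

print("n=100 from N=10,000:    spread of p-hat =", round(pot, 3))
print("n=100 from N=1,000,000: spread of p-hat =", round(cauldron, 3))
print("n=10  from N=1,000,000: spread of p-hat =", round(toothpick, 3))
```

The two n = 100 spreads should come out nearly equal even though one population is 100 times larger, while the n = 10 spread is clearly bigger: the number of individuals sampled, not the fraction of the population, drives the precision.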
Census: (try to) get whole population! (pp. 225-6) Expensive, difficult, impractical (cf. "destructive testing." Homeless. Illegal immigrants. "Snowbirds."). May be less accurate than careful sample.
We know that a sample from a population will not exactly represent the population. If we take a random sample, the behavior of samples will not be individually predictable, but there will be a predictable pattern in many random samples from the same population. Knowing the pattern will be as good as we can do. (Ch. 14 on.)
A Sample is chosen from a Population; from each we calculate a numerical summary:
| | Sample (varies) | Population (fixed, but usually unknown) |
| Numerical summary | Statistic (Latin letter) | Parameter (Greek letter) (D&V p. 227) |
Examples:
| Mean | xbar | mu (µ) |
| Standard deviation | s | sigma |
| Median | sample median | population median |
| Proportion | p-hat | p |
| Regression line height | y-hat | y |
| Regression line slope | b1 | beta1 |
The actual value of the Statistic will vary, depending on the particular sample: "sampling variability".
The Statistic "estimates" the Parameter. We hope it is close to the parameter. If we choose simple random samples, we can understand the pattern of values the statistic can take.
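To see sampling variability at work, one can build a population whose parameter is known and watch the statistic vary from sample to sample. This is a sketch under invented assumptions: the population of 5,000 values (roughly Normal, mean near 70, standard deviation near 10), the sample size 25, and the helper name `sample_means` are all hypothetical.

```python
import random
import statistics

rng = random.Random(2006)

# Hypothetical population; in practice we would not get to see all of it.
population = [rng.gauss(70, 10) for _ in range(5000)]
mu = statistics.mean(population)          # the parameter: fixed, usually unknown

def sample_means(pop, n, repeats, rng):
    """xbar from each of `repeats` simple random samples of size n."""
    return [statistics.mean(rng.sample(pop, n)) for _ in range(repeats)]

xbars = sample_means(population, 25, 1000, rng)   # the statistic: varies by sample

print("parameter mu        =", round(mu, 2))
print("first five xbars    =", [round(x, 1) for x in xbars[:5]])
print("average of xbars    =", round(statistics.mean(xbars), 2))
print("spread (SD) of xbar =", round(statistics.stdev(xbars), 2))
```

No single xbar equals mu, but the collection of xbars centers on mu with a spread much smaller than the spread of individual values. That is the "pattern" that simple random sampling lets us understand.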
Simple Random Sample (SRS) of size n: n individuals chosen in such a way that every possible set of n individuals has an equal chance of being chosen.
HOW? A chance mechanism: cards, dice, computer program, or Table of random digits.
Need a list of the population, labeled, usually with numbers.
Different methods (D&V pp. 229-30): to get about 1/10 of the population, list random digits next to the population list and take all the "4's".
SPSS will take a random sample from a population listed in a data file (get the "Using SPSS to Sample" handout).
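Both methods can be sketched in a few lines of Python, as a stand-in for what SPSS does (the SPSS steps themselves are in the handout and are not reproduced here). The labeled list of 57 students mirrors the Ch. 11 example; the label format and both function names are made up for this illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """SRS of size n: every set of n individuals from the list is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def one_in_ten_by_digit(frame, keep_digit=4, seed=None):
    """Attach a random digit to each individual and keep those that get keep_digit."""
    rng = random.Random(seed)
    return [person for person in frame if rng.randrange(10) == keep_digit]

# Hypothetical labeled list of the population: 57 students, labels 01..57.
frame = [f"student_{i:02d}" for i in range(1, 58)]

srs = simple_random_sample(frame, 5, seed=3)
tenth = one_in_ten_by_digit(frame, seed=3)

print("SRS of 5:", srs)
print("All the 4's:", tenth, "-> size", len(tenth))
```

The first method fixes the sample size in advance. The "take all the 4's" method gives each individual a 1-in-10 chance, so the sample size itself is left to chance and is only about 1/10 of the list.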
Start here Friday:
"List of the Population"?? fat chance.
Sampling frame: the list of
individuals
from the population that you actually choose the sample from.
May differ a little (or a lot!) from the population you desire
to
study.
Bias in sampling: any systematic failure of a sample (its method) to represent its population. (E.g. sampling frame excludes "different" part of population.)
Other (good) sampling designs: Cheaper, or avoid a problem...
--To make sure important groups are represented proportionately: Stratified random sample: Divide into subpopulations (strata) with different characteristics (M/F, income strata: 5ths, etc.). Decide how many from each, then take a random sample within each stratum.
--Save cost, time: Cluster sample: Choose clusters (randomly), then within each cluster take a sample (sometimes all). (Door-to-door survey: pick a city block at random, then 3 households at random from the block, or all households on the block.)
--Multistage sampling: Combine several design types in sequence to get to the final sample.
--Systematic sample: For a 1-in-6 systematic sample from a phonebook: Roll a die to get the number for the first individual. Suppose the 5th. Then choose every 6th after that (11th, 17th, 23rd, ...). If there's no relationship between the variables we're looking at and "every 6th", it's ok. (Every individual is equally likely, but not every sample: so not a simple random sample. All that's randomized is the starting point.) (But remember Playground; jockeying to be with friend after countoff.)
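The designs in this list can be sketched as three short functions. Everything named here is hypothetical: the two strata "M" and "F" with 40 and 60 members, the five city blocks of eight households, and the 60-entry "phonebook" are invented just to show the mechanics, and the function names are not from D&V.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Take a separate random sample of the stated size within each stratum."""
    return {name: rng.sample(members, per_stratum[name])
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, per_cluster, rng):
    """Pick clusters at random, then sample within each (per_cluster=None takes all)."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: list(clusters[c]) if per_cluster is None
               else rng.sample(clusters[c], per_cluster)
            for c in chosen}

def systematic_sample(frame, k, rng):
    """1-in-k sample: random starting point among the first k, then every k-th."""
    start = rng.randrange(k)       # 0-based position, like a die roll minus 1 when k = 6
    return frame[start::k]

rng = random.Random(12)

strata = {"M": [f"M{i}" for i in range(1, 41)],      # 40 individuals
          "F": [f"F{i}" for i in range(1, 61)]}      # 60 individuals
strat = stratified_sample(strata, {"M": 4, "F": 6}, rng)   # 10% from each stratum

blocks = {f"block_{b}": [f"{b}{h}" for h in range(1, 9)] for b in "ABCDE"}
clus = cluster_sample(blocks, n_clusters=2, per_cluster=3, rng=rng)

frame = list(range(1, 61))                           # entries 1..60 of a "phonebook"
syst = systematic_sample(frame, 6, rng)

print("stratified:", strat)
print("cluster:   ", clus)
print("systematic:", syst)
```

In the systematic sample only the starting point is random, so just six different samples are possible from this list. Each individual has a 1-in-6 chance, but most sets of 10 individuals can never be chosen, which is why it is not a simple random sample.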
If time, look at #21, p.240
Using Random Number Table to sample (p. A-49)
Example: Ch. 11 pp. 216-7. The Step-by-step simulation effectively takes a random sample of size 3 from the 57 students.
Every digit, every sequence of digits, is equally likely to be "next" in any direction. (Division into groups of 5 is just for legibility.)
To use: label everyone in the population with a number.
Important: Every labeling number needs the same number of digits.
To label 9 people, use the labels 1, 2, 3, ..., 9 (1-digit chunks)
To label 15 people, use the labels 01, 02, ..., 10, 11, ..., 15 (2-digit chunks)
To label 125 people, use the labels 001, 002, ..., 124, 125 (3-digit chunks)
Pick a place (at random) in the table, start reading across in chunks of that size. Get n eligible numbers (discard repeats and any chunk that is not one of the labels).
For example: 07511 88915 41267 16853 84569 79367 ...
From 9 people, a sample of n = 5: 0, 7, 5, 1, 1, 8, 8, 9, 1, 5, 4, ... (sample is individuals 7, 5, 1, 8, 9; the 0 is not a label, and the repeated 1 and 8 are discarded)
From 15 people, a sample: 07, 51, 18, 89, 15, 41, 26, 71, 68, 53, 84, 56, 97, 93, 67, ... Keep reading; go to the next line (or back to the top line) if you need more. Individuals 7, 15, ... are chosen using this line.
From 125 people, a sample: 075, 118, 891, 541, 267, 168, 538, 456, 979, 367, ... Keep reading. Individuals 75, 118, ...
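The table-reading procedure can be written out as a short function and checked against the line of digits used above. This is a sketch: the function name `sample_from_digit_table` is made up here, and it assumes labels run from 1 up to the population size, each padded with leading zeros to the same width.

```python
def sample_from_digit_table(digits, population_size, n):
    """Read a row of table digits in equal-width chunks and keep eligible labels.

    Chunks that are not labels (0, or larger than population_size) and repeats
    are discarded. Stops at n labels, or earlier if the digits run out.
    """
    digits = "".join(ch for ch in digits if ch.isdigit())   # drop the spacing
    width = len(str(population_size))    # 9 -> 1 digit, 15 -> 2, 125 -> 3
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen

row = "07511 88915 41267 16853 84569 79367"

print(sample_from_digit_table(row, 9, 5))     # [7, 5, 1, 8, 9]
print(sample_from_digit_table(row, 15, 5))    # [7, 15]   (row used up; keep reading the table)
print(sample_from_digit_table(row, 125, 5))   # [75, 118] (row used up; keep reading the table)
```

With 15 or 125 people this one line yields only two individuals, because most chunks fall outside the range of labels. In practice you would continue onto the next line of the table until you have n.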
Sources of bias, next
| Sievers home | Math151-Sp06/Daysp20.htm | 2:40pm | 10/15/06 |