| Hand in (All D&V)
Ch. 11 pp. 219-20 1 Coin flip 4 Games A. Making & Examining SRS's with SPSS. Do the assignment
on the handout Using SPSS to find a Simple Random
Sample.
1,4,6,7,8, 9 Do parts a,b,c,d, of these and save
your work for next time, when I'll assign parts e, f.
|
Read,
to discuss |
Optional
|
Chapter 11 (transition)
Understanding Randomness
"Random" An event is a "random"
event if we know what outcomes could happen, but not which particular
values did or will happen. I like to use "chance",
or "probabilistic."
"Random numbers" Lists
of digits (or sets of digits of equal length, like 135, 238, 099) that
are not only "random" (unpredictable) but equally likely.
Generating a series of equally
likely numbers: List
the possible values you want, then:
--Roll a die (1,2,3,4,5,6
is usual). Repeat.
--Pick a card from a well-shuffled
deck (mark the cards with as many values as you need). Replace and repeat.
--Put labeled slips, or balls,
in a "hat". Stir, pick. Replace, repeat.
--Go to Random.org (from radio
static)
--Ask a computer program
(SPSS--we'll learn).
--Look at p. A-49 (Table of Random
Digits). Close your eyes and put your pencil point on the page.
Start there. Read digits, as many as you need.
(The last two are "pseudo-random"--look
random but in fact are computed in a predictable way. Good
enough.)
--Pick a digit yourself? Look at the digits you picked. (stemplot) **
Compare with digits from a
Table of random digits (simulates rolling a 10-sided die with faces
0,1,...,9, over and over) (p. A-49)
Every digit, every sequence of digits, is equally
likely to be "next" in any direction.
We see much variability
in the results from the table--but no consistent patterns.
The more data we pool, the closer we get to "equally likely" -- 10% of
each of the 10 digits.
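The pooling claim can be checked with a short simulation. This is only a sketch in Python (not SPSS, and not the table on p. A-49): the function name digit_shares and the seed are made up for illustration, and the computer's digits are pseudo-random.

```python
import random
from collections import Counter

def digit_shares(n_digits, seed=None):
    """Draw n_digits pseudo-random digits 0-9 and return each digit's share."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(10) for _ in range(n_digits))
    return {d: counts[d] / n_digits for d in range(10)}

# Small tallies look lumpy; large pooled tallies settle near 0.10 per digit.
for n in (25, 250, 25000):
    shares = digit_shares(n, seed=151)
    print(f"{n:>6} digits:", " ".join(f"{shares[d]:.2f}" for d in range(10)))
```

With 25 digits some digits may not appear at all; with 25,000 every share should sit close to 0.10.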
In class: in pairs, tally 25 random numbers from table A-49 (each start on a different line).
People are very bad at making
choices that would qualify as "equally likely".
So we need
to use an impersonal mechanism.
Rest of Ch. 11 (simulating random
events) --postpone to before Ch. 14.
To now: Get a data
set to tell us its secrets -- (Exploratory) Data Analysis. Only about
the set itself.
Organize, Display,
Summarize, Describe (with Models--Normal distribution, Straight
line Regression)
Sample Surveys (D&V Ch. 12)
Goal: Use data
to tell us something about the larger world that it came from (Statistical
Inference--later)
Idea 1: Population:
the whole group we'd like to learn something about. (Almost always
too big, too expensive to get at.)
Sample:
A (much?) smaller group from the population--which we actually can examine.
Hope: The Sample will be representative of the Population.
Bias: a systematic failure of the Sample to represent the
Population.
Idea 2: Use
Randomization to pick us a representative sample (almost
always; and we can quantify "almost always")
Protects us from bias; allows
us to do statistical inference.
Idea 3: A larger
sample will be more representative. But not because
it's a larger proportion of the whole; just because it's a larger number
of individuals. A spoonful samples a (stirred) cauldron of soup
as well as it does a small pot. A toothpick doesn't do as well.
(If the sample is more than about 10% of the population, the proportion
issue begins to matter.)
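A rough simulation of the soup idea, as a sketch only: the population sizes, the 30% "yes" rate, and the function name spread_of_p_hat are all assumptions chosen for illustration. It takes many simple random samples and measures how much the sample proportion bounces around.

```python
import random
import statistics

def spread_of_p_hat(pop_size, n, true_p=0.30, reps=2000, seed=0):
    """Standard deviation of the sample proportion over many SRSs of size n."""
    rng = random.Random(seed)
    n_yes = round(true_p * pop_size)   # individuals 0..n_yes-1 answer "yes"
    p_hats = []
    for _ in range(reps):
        sample = rng.sample(range(pop_size), n)   # drawn without replacement
        p_hats.append(sum(i < n_yes for i in sample) / n)
    return statistics.stdev(p_hats)

# Same "spoon" (n = 100) in a small pot and in a cauldron, then a "toothpick".
print("pot,      n=100:", round(spread_of_p_hat(10_000, 100), 3))
print("cauldron, n=100:", round(spread_of_p_hat(1_000_000, 100), 3))
print("pot,      n=25: ", round(spread_of_p_hat(10_000, 25), 3))
```

The first two spreads should come out nearly equal even though the populations differ by a factor of 100; the smaller sample should show a noticeably larger spread.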
Census: (try to) get whole population! (pp. 225-6) Expensive, difficult, impractical (cf. "destructive testing." Homeless. Illegal immigrants. "Snowbirds."). May be less accurate than careful sample.
We know that a sample from a population will not exactly represent the population. If we take random samples, no individual sample is predictable, but there will be a predictable pattern across many random samples from the same population. Knowing that pattern is as good as we can do (Ch. 14 on).
Sample: chosen from a Population
| | Sample (varies) | Population (fixed, but usually unknown) |
| Calculate numerical summary | Statistic (Latin letter) | Parameter (Greek letter) (D&V p. 227) |
Examples:
| Mean | xbar | mu (µ) |
| Standard deviation | s | sigma |
| Median | sample median | pop. median |
| Proportion | p-hat | p |
| Regression line height | y-hat | y |
| Regression line slope | b1 | beta1 |
The actual value of the Statistic will vary,
depending on the particular sample. "Sampling variability"
The Statistic "estimates" the Parameter.
We hope it is close to the parameter. If we choose simple random
samples, we can understand the pattern of values the statistic can
take.
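To see "parameter fixed, statistic varies" in action, here is a minimal sketch. The population is 5,000 made-up values (not data from the course or from D&V), and the names sample_means, mu, and sigma are illustrative.

```python
import random
import statistics

rng = random.Random(14)

# Hypothetical population: 5,000 made-up values between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(5000)]
mu = statistics.mean(population)       # parameter: one fixed number
sigma = statistics.pstdev(population)  # parameter: one fixed number

def sample_means(n, reps=1000):
    """xbar from each of `reps` simple random samples of size n."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(reps)]

xbars = sample_means(n=25)
print("mu (fixed):          ", round(mu, 2))
print("first five xbars:    ", [round(x, 1) for x in xbars[:5]])
print("mean of all xbars:   ", round(statistics.mean(xbars), 2))
print("st. dev. of the xbars:", round(statistics.stdev(xbars), 2))
```

Each xbar differs from the next (sampling variability), yet the xbars as a group center near mu and spread out much less than the individual population values do.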
Simple Random Sample (SRS) of size n: n individuals
chosen in such a way that every possible set of n
individuals has an equal chance of being chosen.
HOW? A chance mechanism: Cards, dice, computer program
or Table of random digits.
A list of the population, labeled, usually with numbers.
Different methods (D&V pp. 229-30): to take 1/10 of the population, list random
digits next to the population list and take all the "4's".
SPSS will take a random sample from a population listed in a
data file. Using SPSS to Sample. Get
Handout.
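For comparison with the SPSS handout, here is a minimal Python sketch of two chance mechanisms. It is not the SPSS procedure. The 40-person frame, the function name srs_from_digit_table, and the string of digits are illustrative assumptions (the digits are not copied from p. A-49).

```python
import random

def srs_from_digit_table(digits, pop_size, n):
    """Read 2-digit groups left to right; keep labels 01..pop_size, skip repeats."""
    chosen = []
    for k in range(0, len(digits) - 1, 2):
        label = int(digits[k:k + 2])
        if 1 <= label <= pop_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                return chosen
    raise ValueError("ran out of digits before the sample was complete")

# Illustrative line of digits, with the population labeled 01..40.
line = "1922395034057562871396409125"
print(srs_from_digit_table(line, pop_size=40, n=5))

# The "ask a computer program" route: every set of 5 labels is equally likely.
frame = list(range(1, 41))
print(random.Random(11).sample(frame, 5))
```

In the table method, groups such as 50 or 75 are skipped because no individual carries that label, and a label that comes up twice is used only once.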
"List of the Population"?? fat chance.
Sampling frame: the list of individuals
from the population that you actually choose the sample from.
May differ a little (or a lot!) from the population you desire to
study.
Bias in sampling:
any systematic failure of a sample (or its method) to represent
its population. (E.g. sampling frame excludes "different" part of
population.)
Start here Wednesday
Other (good) sampling designs: Cheaper, or avoid a problem...
To make sure important groups are represented proportionately:
Stratified random sample: Divide
into subpopulations (strata) with different characteristics (M/F, income
strata: 5ths, etc.) Decide how many from each, then random sample
within each stratum.
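A sketch of a stratified random sample with the same fraction taken from each stratum. The strata (60 M, 40 F), the 10% fraction, and the function name stratified_sample are assumptions for illustration only.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Take a simple random sample of the same fraction within each stratum.

    strata: dict mapping stratum name -> list of individuals.
    """
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = round(fraction * len(members))   # how many from this stratum
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 60 men and 40 women.
strata = {
    "M": [f"M{i}" for i in range(60)],
    "F": [f"F{i}" for i in range(40)],
}
picked = stratified_sample(strata, 0.10, seed=5)
for name, members in picked.items():
    print(name, len(members), members)
```

Because the count from each stratum is decided in advance, the sample can never come out "all men" by bad luck, which an SRS of 10 could do.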
Save cost, time: Cluster sample:
Choose clusters at random, then take a sample within each cluster.
(Door-to-door survey: pick city blocks at random, then some number of
households in each block.)
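The door-to-door example might be sketched like this. The 20 blocks of 12 households, the choice of 3 blocks and 4 households, and the function name cluster_sample are all hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Pick clusters at random, then a random sample within each chosen cluster.

    clusters: dict mapping cluster name -> list of individuals.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
        for c in chosen
    }

# Hypothetical city: 20 blocks with 12 households on each.
blocks = {
    f"block{b:02d}": [f"block{b:02d}-house{h}" for h in range(1, 13)]
    for b in range(1, 21)
}
visits = cluster_sample(blocks, n_clusters=3, n_per_cluster=4, seed=7)
for block, houses in visits.items():
    print(block, houses)
```

The interviewer only has to travel to 3 blocks instead of up to 12 scattered addresses, which is where the cost saving comes from.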
Multistage sampling: Combine
several design types in sequence to get to final sample.
Systematic sample:
For a 1-in-6 systematic sample from a phonebook: Roll a die to get
the number for the first individual. Suppose it's the 5th. Then
choose every 6th after that (11th, 17th, 23rd, ...). If there's no
relationship between the variables we're looking at and "every 6th", it's ok.
(Every individual is equally likely, but not every sample,
so it is not a simple random sample. All that's randomized is the
starting point.) (But remember Playground; jockeying to be with friend
after countoff.)
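A small sketch of the 1-in-6 systematic sample. The 30-entry list of positions and the function name systematic_sample are assumptions for illustration.

```python
import random

def systematic_sample(population, k, start=None, seed=None):
    """1-in-k systematic sample: random start in 1..k, then every k-th after it."""
    if start is None:
        start = random.Random(seed).randint(1, k)   # "roll a die" when k = 6
    return population[start - 1::k]

positions = list(range(1, 31))   # positions 1..30 in a hypothetical phonebook

# Suppose the die shows 5.
print(systematic_sample(positions, 6, start=5))

# Only 6 samples are possible, one per starting point.
all_samples = [systematic_sample(positions, 6, start=s) for s in range(1, 7)]
for s in all_samples:
    print(s)
```

Every position appears in exactly one of the six possible samples, so each individual has a 1-in-6 chance. But neighbors such as the 5th and 6th entries can never be chosen together, which is why this is not a simple random sample.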
If time, look at #21, p.240
Sources of bias, next
| Sievers home | Math151-Sp05/Days19.htm | 3pm | 3/14/05 |