Confounding: Two variables (explanatory
or lurking) are confounded when you can't separate their effects
on a response variable.
--Classic example: coffee drinking and smoking. Most
people used to do both, or neither, so the effect of one could not be told apart from the effect of the other.
______________________
Ch. 3.1 Designing Samples
>>Population: Entire group that we want information about
>>Sample: The part of the population we actually examine.
Hope: Sample will be representative
of the population.
(SAMPLING) BIAS: The design of a study is biased if
it systematically favors certain outcomes.
Pick a digit (from 0,1,2,3,4,5,6,7,8,9).
Write it down.
Some refinements:
*Sampling frame (Moore p. 179, problem 3.13): the group from which
the sample is actually chosen, as distinct from the "population," the
group you want information about. The sampling frame is often, unfortunately,
smaller than the population. The sample is (usually
much) smaller than the sampling frame.
* "Chosen" sample may not turn out to be actual sample, if some individuals
don't respond--"Nonresponse", p. 178.
Non-probability samples:
Simple Random Sample (SRS) of size n:
n individuals chosen in such a way that every possible set
of n individuals has an equal chance
of being chosen.
HOW? A chance mechanism: cards, dice, a computer program, or a
table of random digits (simulates rolling a ten-sided die with faces 0, 1, ..., 9,
over and over) (Table B, back flyleaf).
Every digit is equally likely to be "next," and so is every sequence of digits
of a given length, whichever direction you read.
To use: label everyone in the population
with a number.
Important: Every labeling number needs the same number of digits.
To label 9 people, use the labels 1, 2, 3, ..., 9 (1-digit chunks)
To label 15 people, use the labels 01, 02, ..., 10, 11, ..., 15 (2-digit chunks)
To label 125 people, use the labels 001, 002, ..., 124, 125 (3-digit chunks)
Pick a place (at random) in the table and start reading
across in chunks of that size. Skip any chunk that is not a label or that
repeats a label already chosen, and stop when you have n eligible numbers.
Read Row 150: 07511 88915 41267 16853 84569 79367 ..
From 9 people, a sample of n = 5: read 0, 7, 5, 1, 1, 8, 8, 9, 1, 5, 4, ...
(sample is individuals 7, 5, 1, 8, 9; 0 is not a label, and the second 1 and second 8 are repeats)
From 15 people: read 07, 51, 18, 89, 15, 41, 26, 71, 68, 53, 84, 56, 97, 93, 67, ...
keep reading, go to the next line (or back to the top line) if you need
more. Individuals 7, 15, ... are chosen using this line.
From 125 people: read 075, 118, 891, 541, 267, 168, 538, 456, 979, 367, ...
keep reading. Individuals 75, 118, ...
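The table-reading procedure can be sketched in a few lines of Python. This is only an illustration, not part of Moore's text: the function name srs_from_digits and its parameters are made up here, and the digits are the part of Row 150 quoted in these notes.

```python
def srs_from_digits(digits, population_size, n):
    """Pick n distinct labels in 1..population_size by reading
    fixed-width chunks from a string of random digits."""
    width = len(str(population_size))      # 9 -> 1, 15 -> 2, 125 -> 3
    stream = digits.replace(" ", "")       # ignore the spacing of the table
    chosen = []
    for i in range(0, len(stream) - width + 1, width):
        label = int(stream[i:i + width])
        # Keep the chunk only if it is a real label and not a repeat.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen


ROW_150 = "07511 88915 41267 16853 84569 79367"

print(srs_from_digits(ROW_150, 9, 5))      # [7, 5, 1, 8, 9]
print(srs_from_digits(ROW_150, 15, 2))     # [7, 15]
print(srs_from_digits(ROW_150, 125, 2))    # [75, 118]
```

If the row runs out before n labels are found, the function returns what it has so far; by hand you would go on to the next line of the table.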
Why the same number of digits in each label? Each individual 3-digit chunk is as likely as any other 3-digit chunk. But a 1- or 2-digit chunk is more likely than any 3-digit chunk. So 2 will come up more often than 12, but 02 will come up just as often as 12.
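A small check of that argument, as a sketch: list every possible chunk of a given width, assume each is equally likely (as in the table), and count. The helper name chunk_probability is made up for this illustration.

```python
from fractions import Fraction
from itertools import product

DIGITS = "0123456789"


def chunk_probability(chunk):
    """Chance that the next len(chunk) digits of the table equal `chunk`,
    assuming every digit is equally likely at every position."""
    all_chunks = ["".join(p) for p in product(DIGITS, repeat=len(chunk))]
    return Fraction(all_chunks.count(chunk), len(all_chunks))


print(chunk_probability("2"))     # 1/10
print(chunk_probability("12"))    # 1/100
print(chunk_probability("02"))    # 1/100
```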
Why across? For consistency on
HW, go the way they say (so you get the answer in the book). In practice,
you can read up, down, backwards, as long as you decide beforehand, and
don't change in the middle of choosing the sample.
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
We will focus on the mathematics of the SRS,
the most basic. In practice, more sophisticated sampling methods
may be preferred. The math needed to analyze their effects is beyond
our course.
With Wednesday's class!:
Here are some other probability samples:
Stratified Random Sample: population is
cut into natural segments ('strata'). A specific number of
individuals is chosen
from each stratum (within each stratum we
take a simple random sample). Advantage: Every stratum is represented
with a known proportion of the sample; a simple random sample might under-
or over-represent a stratum, by chance.
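A minimal sketch of a stratified random sample, assuming the strata are already known. The class-year strata, the student labels, and the function name stratified_sample are all invented for the example; the computer's random.sample stands in for the table of random digits.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Take a simple random sample of sizes[name] individuals
    from each stratum, independently of the other strata."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}


# Hypothetical population of 100 students, split by class year.
strata = {
    "first-year": [f"F{i}" for i in range(40)],
    "sophomore": [f"S{i}" for i in range(30)],
    "junior": [f"J{i}" for i in range(20)],
    "senior": [f"R{i}" for i in range(10)],
}
# 10% of each stratum, so every stratum has a known share of the sample.
sizes = {"first-year": 4, "sophomore": 3, "junior": 2, "senior": 1}

sample = stratified_sample(strata, sizes, seed=0)
for name, chosen in sample.items():
    print(name, chosen)
```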
Multistage Sample: Useful when individuals are at the bottom of a sequence of categories. E.g., to choose a sample of college women, first select 10 colleges at random, then from each of those colleges select 2 dorms at random, then from each dorm select 10 students to interview. Total sample = 10 x 2 x 10 = 200. Advantage: you only have to visit 10 colleges, 2 dorms in each. An SRS from the whole country, even if you could do it, might mean 200 colleges. (You can also mix this with stratification, for instance selecting the 10 colleges in a stratified way from large coed, small coed, women's,...)
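The college example can be sketched as nested random choices. Everything about the data here is made up for illustration (50 colleges, 5 dorms each, 40 students per dorm), as is the function name multistage_sample; only the 10 / 2 / 10 design comes from the example above.

```python
import random


def multistage_sample(colleges, n_colleges=10, n_dorms=2, n_students=10,
                      seed=None):
    """Pick colleges at random, then dorms within each chosen college,
    then students within each chosen dorm."""
    rng = random.Random(seed)
    sample = []
    for college in rng.sample(sorted(colleges), n_colleges):
        dorms = colleges[college]
        for dorm in rng.sample(sorted(dorms), n_dorms):
            sample.extend(rng.sample(dorms[dorm], n_students))
    return sample


# Hypothetical data: college -> dorm -> list of student labels.
colleges = {
    f"college{c}": {
        f"dorm{d}": [f"c{c}-d{d}-s{s}" for s in range(40)]
        for d in range(5)
    }
    for c in range(50)
}

picked = multistage_sample(colleges, seed=1)
print(len(picked))    # 200
```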
Systematic Random Sample (p.184, problem
3.27) Using a list, to pick a sample of 1/20 of the list: First pick
a number at random from 1,2,....20. Suppose you get 8. The
8th individual in the list is the first one in the sample. Then take
every 20th individual after that, numbers 28, 48, 68,.... Advantage:
Easy to implement, avoids "clumps" that might occur with SRS.
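A sketch of the systematic sample on a list, using a made-up roster of 100 numbered individuals. The function name systematic_sample and its parameters are assumptions for the illustration; passing start=8 reproduces the example above, and leaving it out picks the start at random from 1 to 20.

```python
import random


def systematic_sample(items, step=20, start=None, seed=None):
    """Take the start-th item (counting from 1), then every step-th
    item after it. The start is random unless one is given."""
    if start is None:
        start = random.Random(seed).randint(1, step)
    return items[start - 1::step]


roster = list(range(1, 101))    # hypothetical list, numbered 1..100
print(systematic_sample(roster, step=20, start=8))    # [8, 28, 48, 68, 88]
```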
- - - - - - -
Sources of bias, even in probability samples:
Sampling: If you didn't do the asterisks in ACT ch. 7, do them: good examples!
Look up in Moore and read about Stratified, Systematic Random Samples, Multistage Sample.
Designing Experiments: Know from ACT ch. 11 (These are also all in Moore ch. 3.2)
ACT p. 11-1: rules of Exp. Design (Activity 2), Randomized Comparative experiment; Placebo (Activity 3), Blinding & Double Blinding (Activity 4)
p. 11-2: Treatment-Response, Experimental Units (Subjects), Factor/Level
The pencil-reviews are good.
HW assignment Day 19, Monday March 11,
Moore, from The Basic Practice of Statistics
Reading: Ch. 3 thru 3.1. Ahead in 3.2
Hand in, all from Moore
Ch.3 Intro:
Read, to discuss (all Moore)
Ch.3 Intro:
p.180 3.14 ring-no-answer
Optional
p. 3.24 SRS
| Sievers home | Math151-Sp02/Day19.htm | 4pm | 3/11/02 |