Math 151, Fall 2002, Wednesday Day 18, Oct. 9

Pick a digit (from 0,1,2,3,4,5,6,7,8,9).  Write it down.
HW questions?
Cars.sav relevant to econ graduates problem.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
HW   Reading:  Ch. 3 thru 3.1.  Ahead in  3.2
Hand in Friday
Ch.3 Intro: 
p. 167, 3.1, 3.2, 3.3 exp, obs
= = = = = = = = = = = =
Sampling
 p. 170, 3.4 employed women. Also: What is the sampling frame? (Def. p. 179, #3.13)
 3.6 letters to Congress
- - - - - - - - - - - - - - - - 
p. 173 3.7 SRS
p. 207, 3.65 SRS
p. 184, 3.26 Random digits
- - - - - - - - - - - - - - - -
Postpone to next asst:
p. 185 3.30 survey questions
- - - - - - - - - - - - - - - - -
p. 181  3.16 bigger sample size
p.185 3.31 sampling error for men
= = = = = = = = = = = = = = 
Probability Samples (other):
 p. 176 3.11 stratified sample, accounts
3.12 multistage design, schoolkids
p. 184, 3.27 Systematic.
3.28 same chance for each.  SRS?
Read, to discuss 
Ch.3 Intro: 
p. 170, 3.5 pop, samp...
p.182, 3.17 obsn/exp
    3.18 novel--pop, samp.
= = = = = = 
Sampling
p. 183, 3.22 president
3.23 black police
- - - - - - - - - - -
Postpone to next asst:
p.180 3.14 ring-no-answer
3.15 2 campaign questions

Optional
- - - - - - - -
p. 3.24 SRS

Activstats: Ch. 10, then 11.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Chapters 1 and 2 have covered analyzing data that was given to us--what it said about itself.
    Informally, develop guesses, suspicions, hypotheses about the world the data came from.
Ch. 3:  Producing Data:  Aim:  create data sets that will allow us to make inferences to a larger world than just the data we have.
       Observational Study:  Observes individuals, measures variables, does not influence the responses. (3.1)
                    Take Sample from a population, examine it,
                           hope it's representative so we can infer population is like sample.
                            (Not very useful for cause-and-effect--see Day 17)
        Experiment: Imposes treatment  on individuals, to see how the treatment influences  the response. (3.2)
                            Best for cause-and-effect.

Confounding:  Two variables (explanatory or lurking) are confounded when you can't sort out their effects on a response variable.
--Used to be: coffee drinking and smoking--most people did both, or neither, so you couldn't tell which one was responsible for a health effect.
______________________
Ch. 3.1 Designing Samples

>>Population: Entire group  that we want information about
>>Sample: The part of the population we actually examine.
      Hope:  Sample will be representative of the population.

(SAMPLING) BIAS:  The design of a study is biased if it systematically favors certain outcomes.
    Check our "sample" of digits

Some refinements:
*Sampling frame (Moore p. 179, problem 3.13): the group from which the sample is actually chosen--as distinct from the "population," the group you want information about. The sampling frame is often, unfortunately, smaller than the population.  The sample is (usually much) smaller than the sampling frame.
* "Chosen" sample may not turn out to be actual sample, if some individuals don't respond--"Nonresponse", p. 178.

Non-probability samples:

Probability samples--each member of population has a known chance of being chosen (pp. 174-6)
We can't guarantee the sample is representative, but with a probability sample we can calculate how often (or seldom) it isn't. (Part 3 of the course).
The METHOD is what matters--it will guarantee that most of the time we'll get a representative sample.  (Sometimes we'll do everything right and still have bad luck.  Better than doing it wrong, systematically getting an unrepresentative sample.)

Simple Random Sample (SRS) of size n: n individuals chosen in such a way that every possible set of n individuals has an equal chance of being chosen.
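To make "every possible set of n individuals" concrete, here is a minimal Python sketch. The 5-person population labeled A to E, the sample size 2, and the seed are all made up for illustration; they are not from Moore. It lists all the possible samples and then lets the computer's chance mechanism pick one.

```python
import math
import random
from itertools import combinations

population = ["A", "B", "C", "D", "E"]   # hypothetical 5-person population
n = 2                                     # sample size

# Every possible set of n individuals. There are C(5, 2) = 10 of them,
# and an SRS gives each one the same chance, 1/10.
all_samples = list(combinations(population, n))
num_samples = math.comb(len(population), n)

# random.sample draws n distinct individuals without replacement,
# so each set of size n is equally likely to be the one chosen.
rng = random.Random(151)                  # fixed seed so the run is repeatable
chosen = rng.sample(population, n)

print(num_samples, "possible samples; this time we got", chosen)
```

With a real population the list of all possible samples is far too long to write out, which is why we only need the chance mechanism, not the list.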
HOW?  A chance mechanism: Cards, dice, computer program, or
Table of random digits (simulates rolling a 10-sided die with faces 0,1,...,9, over and over...) (Table B, back flyleaf)
    Every digit, every sequence of digits, is equally likely to be "next" in any direction.
To use:  label everyone in the population with a number.
    Important:  Every labeling number needs the same number of digits.
    To label 9 people, use the labels 1,2,3,....9 (1-digit chunks)
    To label 15 people, use the labels 01, 02, ...10, 11, ...15 (2-digit chunks)
    To label 125 people, use the labels 001, 002, ... 124, 125 (3-digit chunks)
Pick a place (at random) in the table, start reading across in that size chunk.  Get n eligible numbers (discard repeats)
                    Read Row 150:   07511   88915   41267   16853   84569   79367 ..
From 9 people, a sample of n = 5:  0, 7, 5, 1, 1, 8, 8, 9, 1, 5, 4, ...   (skip 0, which is not a label, and skip the repeated 1 and 8; stop once you have 5: sample is individuals 7, 5, 1, 8, 9)
From 15 people, a sample:  07, 51, 18, 89, 15, 41, 26, 71, 68, 53, 84, 56, 97, 93, 67.... keep reading,
    go to next line (or back to top line) if you need more.  Individuals 7, 15,...are chosen using this line.
From 125 people, a sample:  075, 118, 891, 541, 267, 168, 538, 456, 979, 367...keep reading.  Individuals 75, 118, ...

    Why the same number of digits in each label?  Each individual 3-digit chunk is as likely as any other 3-digit chunk (chance 1/1000 each).  But a particular 1-digit chunk (chance 1/10) or 2-digit chunk (chance 1/100) is more likely than any particular 3-digit chunk. So 2 will come up more often than 12, but 02 will come up just as often as 12.

    Why across?  For consistency on HW, go the way they say (so you get the answer in the book).  In practice, you can read up, down, backwards, as long as you decide beforehand, and don't change in the middle of choosing the sample.
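The table procedure can be sketched in Python. This is only an illustration: the function name `srs_from_digits` is made up, and the digit string is just the part of Row 150 quoted in these notes, not the whole row of Table B. It reads equal-width chunks left to right, skips chunks that aren't labels, and discards repeats.

```python
def srs_from_digits(digits, population_size, n):
    """Read equal-width chunks from a string of random digits and return
    up to n distinct labels in 1..population_size, in the order found."""
    width = len(str(population_size))      # 9 -> 1 digit, 15 -> 2, 125 -> 3
    stream = "".join(ch for ch in digits if ch.isdigit())   # drop the spaces
    chosen = []
    for i in range(0, len(stream) - width + 1, width):
        label = int(stream[i:i + width])
        # Keep the chunk only if it is a real label and not a repeat.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen

row_150 = "07511 88915 41267 16853 84569 79367"   # portion quoted in the notes

print(srs_from_digits(row_150, 9, 5))     # [7, 5, 1, 8, 9]
print(srs_from_digits(row_150, 15, 5))    # [7, 15]    (line runs out: keep reading)
print(srs_from_digits(row_150, 125, 5))   # [75, 118]  (line runs out: keep reading)
```

For 15 and 125 people this one line supplies only two individuals, so in practice you would append the next row of the table to the digit string and keep going.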
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~  ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Start here Friday
Sources of bias, even in probability samples:

Inference to the population: Sample results will vary.
   Different samples will represent the population with differing accuracy.
   Well-designed random (probability) sampling will avoid systematic bias.
   In general, a larger random sample will give more accurate information about the population than a smaller random sample.
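A small simulation can illustrate "sample results will vary, and vary less for larger samples." Everything here is an assumption made for the sketch: a made-up population of 10,000 in which 30% have some trait, sample sizes 25 and 400, and 500 repeated samples of each size. It is not a calculation from the text.

```python
import random
import statistics

rng = random.Random(2002)                 # fixed seed so the run is repeatable

# Hypothetical population: 10,000 individuals, 30% with the trait (1), 70% without (0).
population = [1] * 3000 + [0] * 7000

def spread_of_sample_proportions(n, repeats=500):
    """Draw `repeats` SRSs of size n; return the standard deviation of the
    sample proportions, i.e. how much the results vary from sample to sample."""
    props = [sum(rng.sample(population, n)) / n for _ in range(repeats)]
    return statistics.pstdev(props)

small = spread_of_sample_proportions(25)
large = spread_of_sample_proportions(400)

print("spread of sample proportion, n = 25: ", round(small, 3))
print("spread of sample proportion, n = 400:", round(large, 3))
```

Both sample sizes center on the true 30% (no systematic bias), but the n = 400 results cluster much more tightly around it.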
- - - - - - - - - - - -  -- - - - - - - - - - - - - - - - -
More kinds of probability samples:
We will focus on the mathematics of the SRS, the most basic.  In practice, more sophisticated sampling methods may be preferred.  The math needed to analyze their effects is beyond our course.
   Here are some other ways to design a probability sample:
Stratified Random Sample: population is cut into natural segments ('strata').  A specific number of individuals is chosen from each stratum (within each stratum we take a simple random sample).  Advantage: Every stratum is represented with a known proportion of the sample; a simple random sample might under- or over-represent a stratum, by chance.
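A minimal Python sketch of a stratified sample. The function name `stratified_sample`, the two strata (600 small accounts, 400 large accounts), and the sample sizes are all hypothetical, chosen only to show the idea of taking a separate SRS inside each stratum.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Take an SRS of the requested size within each stratum.

    strata: dict mapping stratum name -> list of individuals
    sizes:  dict mapping stratum name -> number to draw from that stratum
    """
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}

# Hypothetical population: 600 small accounts and 400 large accounts.
strata = {
    "small": [f"small-{i}" for i in range(600)],
    "large": [f"large-{i}" for i in range(400)],
}

rng = random.Random(3)                    # fixed seed so the run is repeatable
sample = stratified_sample(strata, {"small": 6, "large": 4}, rng)

# The stratum counts are fixed by design, not left to chance.
print({name: len(chosen) for name, chosen in sample.items()})
```

An SRS of 10 from all 1,000 accounts could, by bad luck, contain no large accounts at all; here that cannot happen.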

Multistage Sample: Useful when individuals are at the bottom of a sequence of categories. E.g. to choose a sample of college women, first select 10 colleges at random, then from each of those colleges select 2 dorms at random, then from each dorm select 10 students to interview.  Total sample = 10 x 2 x 10 = 200.  Advantage: you only have to visit 10 colleges, 2 dorms in each.  An SRS from the whole country, even if you could do it, might mean visiting 200 colleges.  (You can also mix this with stratification, for instance selecting the 10 colleges in a stratified way from large coed, small coed, women's,...)
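The colleges / dorms / students example can be sketched in Python. The frame below (50 colleges, each with 8 dorms of 40 students) is invented just to have something to sample from; only the 10 / 2 / 10 design comes from the example above.

```python
import random

rng = random.Random(18)                   # fixed seed so the run is repeatable

# Hypothetical frame: 50 colleges, each with 8 dorms of 40 students.
colleges = {
    f"college{c}": {
        f"dorm{d}": [f"c{c}-d{d}-s{s}" for s in range(40)]
        for d in range(8)
    }
    for c in range(50)
}

sample = []
# Stage 1: choose 10 colleges at random.
for college in rng.sample(sorted(colleges), 10):
    dorms = colleges[college]
    # Stage 2: within each chosen college, choose 2 dorms at random.
    for dorm in rng.sample(sorted(dorms), 2):
        # Stage 3: within each chosen dorm, choose 10 students at random.
        sample.extend(rng.sample(dorms[dorm], 10))

print(len(sample), "students from",
      len({s.split("-")[0] for s in sample}), "colleges")
```

Each stage is itself a simple random sample, but the final 200 students are not an SRS of all students: they are guaranteed to come from only 10 colleges.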

Systematic Random Sample (p. 184, problem 3.27): Using a list, to pick a sample of 1/20 of the list: first pick a number at random from 1,2,....20.  Suppose you get 8.  The 8th individual in the list is the first one in the sample.  Then take every 20th individual after that, numbers 28, 48, 68,....   Advantage: Easy to implement, avoids "clumps" that might occur with SRS.
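A minimal Python sketch of the systematic method. The function name `systematic_sample` and the 200-person list are made up for illustration; the "random start, then every 20th" rule is the one described above.

```python
import random

def systematic_sample(items, k, rng):
    """Pick a random start among the first k positions (1..k),
    then take every k-th individual after that."""
    start = rng.randint(1, k)             # randint includes both endpoints
    return start, items[start - 1::k]     # list positions are 0-based

people = list(range(1, 201))              # hypothetical list: individuals 1..200

rng = random.Random(9)                    # fixed seed so the run is repeatable
start, sample = systematic_sample(people, 20, rng)
print("start =", start, "sample =", sample)

# The example from the notes: a start of 8 gives 8, 28, 48, 68, ...
print(people[8 - 1::20])
```

Note that only 20 different samples are possible here (one for each starting point), even though each individual has the same 1-in-20 chance of being chosen.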


Math151-Fall02/Day-18.htm  1pm 10/9/02
This page belongs to Sally Sievers, who is solely responsible for its content.