MATH 251, Probability and Statistics I, Fall 2005, Oct. 7, Day 19 After class

Reading for Wednesday: Finish 3.3 (Sampling); read 3.4.
Hand in: Sec. 3.3 p. 225ff.
3.54 ring-no answer
3.47, 3.48 systematic
3.49 random digit dialing: do a. For b, don't find the sample; just tell what type of sampling it is.
3.52 stratified over/under 21. (Don't find the sample)
3.44 census tracts (use table B)

Sec. 3.4 p. 240ff
3.62, 63, 64, 65 parameter/statistic

Postpone the rest:
3.66 bias/variability
3.75 grades sample: Take 3 SRSs and find their means, i.e., repeat part a 3 times. To start, decide which way you'll read Table B. Then close your eyes and put your finger down on the table to pick your starting place. Bring your results to class to pool, to get an idea of the "sampling distribution of the mean of an SRS of 4 grades."
3.70 n = 61,239
3.68 Canada/U.S.

Read, discuss 
3.45 different starts
3.57, 58   questions
3.39 movies
3.50, 3.53 strata
3.46 census 

Postpone:
3.69 states

Optional
Exams not finished yet.  Exams+solutions back Wednesday.
Homework questions?  Day 17
    p.231, "how many children in your family?"  What's the bias?
  Ask: How are missing data handled? Example: police response time, where calls that were never answered were entered as a response time of "0", making the average response time look shorter than it really was.
Literary Digest poll, narrative
Some types of samples:
Non-probability Samples
   Voluntary Response Sample
   Convenience Sample (not in text): grab whatever individuals are handy, e.g., a stats class, or people interviewed in a mall.

Probability Samples:
   SRS--Simple Random Sample (& my initials)
   Systematic Random Sample
   Stratified Random Sample
   Multistage Sample
(All our later theory will be for SRSs; modifications are needed for other probability samples.)
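In place of Table B, a computer's pseudo-random number generator can pick an SRS. Below is a minimal Python sketch, assuming the population is just a list of labels 1 to 100 (a hypothetical population); the `seed` parameter is also hypothetical and plays the role of choosing a starting place in the table. `random.Random.sample` draws without replacement, so every set of n individuals is equally likely.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n: every set of n individuals is equally likely.

    `seed` (hypothetical parameter) makes the draw reproducible, much like
    choosing a starting place in a random number table.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)  # sampling without replacement


# Hypothetical population: individuals labeled 1..100
labels = list(range(1, 101))
print(simple_random_sample(labels, 5, seed=251))
```

Running it again with the same seed gives the same sample; leaving the seed out gives a fresh one each time.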

Stratified Random Sample: the population is divided into natural segments ("strata"). A specific number of individuals is chosen from each stratum (within each stratum we take a simple random sample). Advantage: each stratum makes up a known proportion of the sample; a simple random sample might, by chance, under- or over-represent a stratum. "Strata" are like "blocks": different subcultures, different jargon.
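A minimal Python sketch of a stratified random sample, assuming the strata are given as a dictionary of lists. The two strata (loosely echoing the over/under 21 split in problem 3.52), their sizes, and the choice to sample in proportion to stratum size are all hypothetical; the only essential step is an independent SRS inside each stratum.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Stratified random sample.

    strata: dict mapping stratum name -> list of individuals in that stratum
    sizes:  dict mapping stratum name -> number to draw from that stratum
    An independent SRS is taken within each stratum, so each stratum's share
    of the sample is fixed in advance rather than left to chance.
    """
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}


# Hypothetical population split into two strata (labels are made up)
strata = {
    "under 21": [f"U{i}" for i in range(300)],
    "21 and over": [f"O{i}" for i in range(700)],
}
# Draw 30 and 70, i.e., in proportion to stratum size (one possible choice)
strat_sample = stratified_sample(strata, {"under 21": 30, "21 and over": 70}, seed=7)
print({name: len(chosen) for name, chosen in strat_sample.items()})
# {'under 21': 30, '21 and over': 70}
```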

Multistage Sample: useful when individuals are at the bottom of a sequence of categories. E.g., to choose a sample of college women, first select 10 colleges at random, then from each of those colleges select 2 dorms at random, then from each dorm select 10 students to interview. Total sample = 10 x 2 x 10 = 200. Advantage: saves money and time, since you only have to visit 10 colleges, 2 dorms in each. An SRS from the whole country, even if you could do it, might mean visiting as many as 200 colleges. (You can also combine this with stratification, for instance selecting the 10 colleges in a stratified way from large coed, small coed, women's, ... colleges.)
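The three stages of the college example can be sketched in Python as below. This is an illustration under assumptions: the population (50 colleges, 5 dorms each, 40 students per dorm) and its labels are hypothetical, and each stage is an SRS drawn with `random.Random.sample`.

```python
import random


def multistage_sample(colleges, n_colleges=10, n_dorms=2, n_students=10, seed=None):
    """Three-stage sample: colleges, then dorms within each chosen college,
    then students within each chosen dorm (an SRS at every stage).

    colleges: dict college -> dict dorm -> list of students
    """
    rng = random.Random(seed)
    chosen = []
    for college in rng.sample(sorted(colleges), n_colleges):      # stage 1: colleges
        dorms = colleges[college]
        for dorm in rng.sample(sorted(dorms), n_dorms):            # stage 2: dorms
            chosen.extend(rng.sample(dorms[dorm], n_students))     # stage 3: students
    return chosen


# Hypothetical population: 50 colleges, 5 dorms each, 40 students per dorm
colleges = {
    f"C{c}": {f"C{c}-D{d}": [f"C{c}-D{d}-S{s}" for s in range(40)] for d in range(5)}
    for c in range(50)
}
chosen_students = multistage_sample(colleges, seed=19)
print(len(chosen_students))  # 200 = 10 colleges x 2 dorms x 10 students
```

Only 10 colleges and 20 dorms ever need to be visited, which is the cost advantage described above.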

Systematic Random Sample (p. 228, problem 3.47): Using a list, to pick a sample of 1/20 of the list, first pick a number at random from 1, 2, ..., 20. Suppose you get 8. The 8th individual on the list is the first one in the sample. Then take every 20th individual after that: numbers 28, 48, 68, .... Advantage: easy to implement, and avoids "clumps" that might occur with an SRS. Another description: a "one-in-twenty" sample from a list.
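The one-in-twenty procedure can be sketched in a few lines of Python. Assumptions: the list is a hypothetical roster numbered 1 to 100, and the `start` and `seed` parameters are hypothetical conveniences for reproducing the "suppose you get 8" case or drawing a random start.

```python
import random


def systematic_sample(items, k, start=None, seed=None):
    """One-in-k systematic random sample from a list.

    Pick a random start from 1..k (or pass `start` to reproduce a given choice),
    then take that individual and every k-th one after it.
    """
    if start is None:
        start = random.Random(seed).randint(1, k)   # randint includes both endpoints
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return items[start - 1::k]   # 1-based positions start, start+k, start+2k, ...


roster = list(range(1, 101))   # hypothetical list numbered 1..100
print(systematic_sample(roster, 20, start=8))   # [8, 28, 48, 68, 88]
```

Only the start is random; once it is chosen, the rest of the sample is determined.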

How to take an SRS using SPSS--Handout after vacation (No SPSS HW over vacation)

3.4, Toward Statistical Inference.
Chance behavior (a random phenomenon): unpredictable in the short run, but with a regular, predictable pattern in the long run.
  (Random numbers: each digit is equally likely in the long run. "Random" in this chapter is more general: the long-run pattern need not be equally likely outcomes.)
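A quick simulation shows "unpredictable in the short run, predictable in the long run." This Python sketch assumes a fair coin (P(heads) = 1/2, an assumption of the simulation, not a claim about real coins) and tracks the proportion of heads after each flip; the seed is arbitrary.

```python
import random


def running_proportion_heads(n_flips, seed=None):
    """Simulate fair-coin flips (assumes P(heads) = 1/2) and return the
    proportion of heads after each flip."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for i in range(1, n_flips + 1):
        if rng.random() < 0.5:
            heads += 1
        proportions.append(heads / i)
    return proportions


props = running_proportion_heads(10_000, seed=2005)
for n in (10, 100, 1_000, 10_000):
    print(n, round(props[n - 1], 3))
```

After 10 flips the proportion can be far from 0.5; after 10,000 flips it is typically within about 0.01 of it.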
25 digits from the random number table: individual sets of 25 show much variability. The pooled data look "flatter", but still show much variability. So you would be right to be skeptical when I told you, on the basis of just this class's data, that your "pick-a-number" choices were not random.
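The classroom digit exercise can be imitated by simulation. In this sketch the class size (30 students) and the seed are hypothetical; each student gets 25 pseudo-random digits, and the counts are then pooled with `collections.Counter`.

```python
import random
from collections import Counter


def simulate_digit_sets(n_sets=30, digits_each=25, seed=None):
    """Each of n_sets hypothetical students gets digits_each random digits 0-9.
    Returns the per-student counts and the pooled counts."""
    rng = random.Random(seed)
    sets = [Counter(rng.randrange(10) for _ in range(digits_each)) for _ in range(n_sets)]
    pooled = sum(sets, Counter())   # add the counts digit by digit
    return sets, pooled


sets, pooled = simulate_digit_sets(seed=19)
for counts in sets[:3]:   # individual sets of 25: each digit expected about 2.5 times
    print([counts[d] for d in range(10)])
total = sum(pooled.values())
print([round(pooled[d] / total, 3) for d in range(10)])   # pooled: each near 0.1
```

Individual rows are lumpy even though the digits really are equally likely, which is why a single class's picks are weak evidence either way.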
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
We know that a sample from a population will not exactly represent the population. If we take a random sample, the behavior of individual samples will not be predictable, but there will be a predictable pattern in many random samples from the same population. Knowing that pattern is as good as we can do.
Sec. 3.4
        Sample                  chosen from a    Population
        (varies)                                 (fixed, but usually unknown)
Calculate a numerical summary:
        Statistic (usually a Latin letter)       Parameter (usually a Greek letter)
    Examples:
        Sample mean xbar                         Population mean mu (µ)
        Sample st. dev. s                        Population st. dev. sigma (σ)
        Sample median                            Population median
        Sample proportion p-hat                  Population proportion p
        Sample regression line height y-hat      Population regression line height y
The actual value of the statistic will vary, depending on the particular sample. This is called "sampling variability."
The Statistic "estimates" the Parameter.  We hope it is close to the parameter.  If we choose simple random samples, we can understand the pattern of values the statistic can take.
Some examples of statistics:
    Height, U.S. young women: population mean = 64.5", population s.d. = 2.5" (from the text; caveat: rounded?)
    Math 151 classes:
        Semester      xbar     s
        Spring '01    64.2     3.75
        Fall '01      65.01    3.22
        Spring '02    64.53    2.91
        Fall '02      63.89    2.48
        Spring '03    64.98    3.29
        Spring '04    65.33    2.25
        Spring '05    64.31    2.93
    Coin flip: proportion of heads p = 1/2 (?); p-hat = 256/520 = .492 (combined data from many past classes)
    Thumbtack: proportion point-up p = (??); p-hat = 441/691 = .6382 (one past class, Math 251)
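The numbers in these examples can be checked directly. This Python sketch only re-computes values already listed (the class xbars, and the coin and thumbtack counts); comparing to 64.5 assumes the text's population mean, which may be rounded.

```python
from statistics import mean

# Sample mean heights (xbar, inches) from the Math 151 classes listed above
xbars = {
    "Spring '01": 64.2, "Fall '01": 65.01, "Spring '02": 64.53, "Fall '02": 63.89,
    "Spring '03": 64.98, "Spring '04": 65.33, "Spring '05": 64.31,
}
mu = 64.5   # population mean from the text (possibly rounded)

for term, xbar in xbars.items():
    print(f"{term}: xbar - mu = {xbar - mu:+.2f}")   # sampling variability, term by term
print(round(mean(list(xbars.values())), 2))   # 64.61: the xbars scatter around mu

coin_phat = 256 / 520    # proportion of heads
tack_phat = 441 / 691    # proportion of thumbtacks landing point-up
print(round(coin_phat, 3), round(tack_phat, 4))   # 0.492 0.6382
```

Each class's xbar misses 64.5 by up to about 0.8 inch, yet the seven xbars together center near it.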
Start here Wednesday
Sampling distribution of a statistic: the distribution of values the statistic takes over "all possible" samples of the given size, as if we could repeat the sampling process again and again. This assumes probability sampling or a randomized experimental design.
Shape, center, spread.
Shape: mound-shaped, often normal; we wouldn't want bimodality or outliers.
Center of the sampling distribution should be close to the parameter value:
     if the statistic systematically under- or over-estimates the parameter, it is a "biased estimator."
Spread: want it "tight" (around the parameter value!)
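For a tiny population we can list "all possible" samples and get the exact sampling distribution. This Python sketch uses a hypothetical population of five values and SRSs of size 2. The sample maximum is an extra example (not from these notes) of a biased estimator: it estimates the population maximum but systematically comes in low.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8, 10]    # tiny hypothetical population
n = 2                            # sample size
mu = Fraction(sum(population), len(population))    # population mean = 6

samples = list(combinations(population, n))        # all 10 possible SRSs of size 2
xbars = [Fraction(sum(s), n) for s in samples]     # sample mean of each
maxes = [max(s) for s in samples]                  # sample maximum of each

# Sampling distribution of xbar: (value, how many of the 10 samples give it)
print(sorted((float(v), c) for v, c in Counter(xbars).items()))
# [(3.0, 1), (4.0, 1), (5.0, 2), (6.0, 2), (7.0, 2), (8.0, 1), (9.0, 1)]

# Center of each sampling distribution vs. the parameter it estimates
print(float(sum(xbars) / len(xbars)), float(mu))                  # 6.0 6.0  (unbiased)
print(float(Fraction(sum(maxes), len(maxes))), max(population))   # 8.0 10   (biased low)
```

The distribution of xbar is mound-shaped and centered exactly at mu, which is what "unbiased" means.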

--SRS produces unbiased estimators for most common statistics.
--Larger (random) sample produces less variability (spread)
          What matters is the size of the sample, not the fraction of the population sampled (as long as the population is at least 10 times the sample size).
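The "sample size, not population fraction" point can be made concrete with the standard formula for the standard deviation of p-hat under an SRS drawn without replacement, including the finite population correction (N - n)/(N - 1). That formula is not in these notes, and the values p = 0.5, n and N below are hypothetical; this is a sketch, not the text's development.

```python
from math import sqrt


def sd_phat(p, n, N):
    """Standard deviation of the sample proportion p-hat for an SRS of size n
    drawn without replacement from a population of size N with proportion p.
    The factor (N - n)/(N - 1) is the finite population correction."""
    return sqrt(p * (1 - p) / n * (N - n) / (N - 1))


p = 0.5   # hypothetical population proportion
for N in (1_000, 10_000, 1_000_000):       # same n, very different population sizes
    print(f"n=100, N={N:,}: sd = {sd_phat(p, 100, N):.4f}")
print(f"n=400, N={1_000_000:,}: sd = {sd_phat(p, 400, 1_000_000):.4f}")
```

With n = 100 the spread barely changes as N goes from 1,000 (exactly 10 times n) to 1,000,000, while quadrupling n to 400 cuts it in half.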

Random sampling will allow us to do inferential statistics:
  How far off from the parameter value are we likely to be? (margin of error)
  How plausible is a claim, given the data? (significance level)

Next: Looking at some sampling distributions; then Ch. 4, Ch. 5...



 
11pm   10/6/05
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.