MATH 251, P&S I, Fall 2007, Oct. 3, Day 18. After class: hit reload.

Reading: Finish 3.3 (Sampling). Read 3.4.
Hand in: Sec. 3.3 p. 225ff.
3.36 students
3.41 SRS, use Table B
3.55 sampling frame
3.56 online poll
3.59 why biased?

3.77 CSDATA (SPSS) (looking ahead to Sec. 3.4). Do this with the SPSS to Sample Handout; details are on the back. Find the mean of each of your 5 samples, and hand in a dotplot of the means, the 6 histograms (1 population, 5 samples), and comments.

3.54 ring-no answer
Postpone the rest (non-SRSs)
3.47, 3.48 systematic
3.49 random digit dialing: do a. For b, don't find the sample; just tell what type of sampling it is.
3.52 stratified over/under 21. (Don't find the sample)
3.44 census tracts (use table B) 

Postpone 3.4 (yes)
Sec. 3.4 p. 240ff

3.62, 63, 64, 65 parameter/statistic
3.66 bias/variability
3.75 grades sample: Take 3 SRSs and find their means, i.e., repeat part a 3 times. To start, decide which way you'll read Table B. Then close your eyes and put your finger down on the table to pick your starting place. Bring your results to class to pool, to get an idea of the "sampling distribution of the mean of an SRS of 4 grades".
3.70 n = 61,239
3.68 Canada/U.S.

Read, discuss 
3.45 different starts
3.57, 3.58 questions
3.39 movies
Postpone
3.50, 3.53 strata
3.46 census 

Postpone (3.4)
3.69 states

Optional
News just in: NY Times, Tuesday Science News, p. 10: "While nearly as many patients receiving a sham form of acupuncture also reported relief, 34% of them needed extra pain pills, compared with just 15% of patients receiving legitimate acupuncture." [and 59% of the control group.]
Homework questions? (Day 17)
Sampling Design Notes (Day 17)

Using SPSS to take a sample
from a "population" (or sampling frame) that is listed in a data file: SPSS to Sample Handout.
 
Some types of samples:

Non-probability Samples
   Voluntary Response Sample
   Convenience Sample (not in text): grab whatever individuals are handy, e.g., a stats class, or people interviewed in a mall.

Probability Samples:
   SRS--Simple Random Sample (& my initials)
new: Start here Friday
   Systematic Random Sample
   Stratified Random Sample
   Multistage Sample
(All our later theory will be for SRSs; modifications need to be made for other probability samples.)
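As a computer stand-in for Table B (or SPSS), here is a minimal Python sketch of an SRS. It uses Python's `random.sample`, which draws without replacement so that every set of n individuals is equally likely to be the sample. The frame of 100 labels, the sample size 10, and the seed are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return an SRS of size n: each set of n individuals is equally likely."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: individuals labeled 1..100.
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=251)
print(sorted(sample))
```

Changing the seed (or leaving it out) gives a different SRS each time, just as a different starting place in Table B would.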

Stratified Random Sample: population is cut into natural segments ('strata').  A specific number of individuals is chosen from each stratum (within each stratum we take a simple random sample).  Advantage: Every stratum is represented with a known proportion of the sample; a simple random sample might under- or over-represent a stratum, by chance.  "Strata" are like "blocks"--sampling & design of experiments are different subcultures, different jargon.
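A minimal sketch of a stratified random sample, assuming the strata are already listed separately: take a separate SRS of a chosen size within each stratum. The two strata (in the spirit of 3.52, under/over 21), their sizes, and the per-stratum sample sizes are hypothetical.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a separate SRS of the requested size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

# Hypothetical population of 100, split into two strata.
strata = {
    "under 21": [f"U{i}" for i in range(1, 61)],      # 60 individuals
    "21 and over": [f"A{i}" for i in range(1, 41)],   # 40 individuals
}
# 10% of each stratum, so each stratum's share of the sample is fixed in advance.
sample = stratified_sample(strata, {"under 21": 6, "21 and over": 4}, seed=1)
for name, chosen in sample.items():
    print(name, chosen)
```

Whatever the seed, the sample always has exactly 6 under 21 and 4 aged 21 and over; a single SRS of 10 from all 100 could, by chance, over- or under-represent either group.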

Multistage Sample: Useful when individuals are at the bottom of a sequence of categories. E.g., to choose a sample of college women, first select 10 colleges at random, then select 2 dorms at random from each of those colleges, then select 10 students from each dorm to interview. Total sample = 10 × 2 × 10 = 200. Advantage: $$ and time: you only have to visit 10 colleges, 2 dorms in each. An SRS from the whole country, even if you could do it, might mean visiting 200 colleges. (You can also mix this with stratification, for instance selecting the 10 colleges in a stratified way from large coed, small coed, women's, ...)
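The three stages of the college example can be sketched in Python as three nested SRSs. This is a sketch under assumptions: the frame (50 colleges, each with 5 dorms of 40 students) and the labels are made up, and each stage uses `random.sample`.

```python
import random

def multistage_sample(colleges, n_colleges, n_dorms, n_students, seed=None):
    """Stage 1: SRS of colleges; stage 2: SRS of dorms in each; stage 3: SRS of students per dorm."""
    rng = random.Random(seed)
    sample = []
    for college in rng.sample(sorted(colleges), n_colleges):
        dorms = colleges[college]
        for dorm in rng.sample(sorted(dorms), n_dorms):
            sample.extend(rng.sample(dorms[dorm], n_students))
    return sample

# Hypothetical frame: 50 colleges, each with 5 dorms of 40 students.
colleges = {
    f"college{c}": {f"dorm{d}": [f"c{c}-d{d}-s{s}" for s in range(40)] for d in range(5)}
    for c in range(50)
}
sample = multistage_sample(colleges, n_colleges=10, n_dorms=2, n_students=10, seed=7)
print(len(sample))  # 10 colleges x 2 dorms x 10 students = 200
```

Only 10 colleges and 20 dorms ever have to be visited, which is the cost advantage described above.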

Systematic Random Sample (p. 228, problem 3.47): Using a list, to pick a sample of 1/20 of the list, first pick a number at random from 1, 2, ..., 20. Suppose you get 8. The 8th individual in the list is the first one in the sample. Then take every 20th individual after that: numbers 28, 48, 68, .... Advantage: easy to implement, and it avoids "clumps" that might occur with an SRS. Another description: a "one-in-twenty" sample from a list.
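The one-in-twenty procedure is short enough to sketch directly in Python. The 100-position list is hypothetical; the random start is drawn from 1 to k inclusive, and Python slicing with a step of k then picks every kth individual.

```python
import random

def systematic_sample(frame, k, start=None, seed=None):
    """One-in-k sample: a random start among positions 1..k, then every kth individual."""
    if start is None:
        start = random.Random(seed).randint(1, k)   # 1-based start, 1..k inclusive
    return frame[start - 1::k]                       # list positions start, start+k, ...

frame = list(range(1, 101))                    # hypothetical list positions 1..100
print(systematic_sample(frame, 20, start=8))   # [8, 28, 48, 68, 88]
```

Note that only the starting number is random; once it is chosen, the whole sample is determined.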

3.4, Toward Statistical Inference.
Chance behavior (a random phenomenon): unpredictable in the short run, but with a regular, predictable pattern in the long run.
  (Random numbers: each digit is equally likely in the long run. "Random" in this chapter is more general: the pattern need not be equally likely outcomes.)
25 digits from the random number table: individual sets of 25 show much variability. The pooled digits show more "flatness", but still much variability. You would be right to be skeptical when I told you, on the basis of just this class's data, that your "pick-a-number" choices were not random.
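To see the same effect without a random number table, here is a small Python simulation, assuming five hypothetical "students" who each record 25 random digits. Each set of 25 should have about 2.5 of each digit, but the counts bounce around a lot; the pooled 125 digits (about 12.5 of each) look flatter, though still far from exactly equal. The seed is arbitrary.

```python
import random
from collections import Counter

rng = random.Random(2007)   # arbitrary seed so the run is reproducible

def digit_counts(n_digits):
    """Count how often each digit 0-9 appears among n_digits random digits."""
    counts = Counter(rng.randrange(10) for _ in range(n_digits))
    return [counts[d] for d in range(10)]

# Five hypothetical sets of 25 digits each.
sets = [digit_counts(25) for _ in range(5)]
for counts in sets:
    print(counts)

# Pool the five sets (125 digits in all).
pooled = [sum(col) for col in zip(*sets)]
print("pooled:", pooled)
```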
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
We know that a sample from a population will not exactly represent the population. If we take a random sample, the behavior of individual samples will not be predictable, but there will be a predictable pattern in many random samples from the same population. Knowing that pattern is as good as we can do.
Sec. 3.4
                          Sample            chosen from a     Population
                          (varies)                            (fixed, but usually unknown)
Calculate a
numerical summary:        Statistic (Latin letter)            Parameter (Greek letter)
    Examples:             Sample mean xbar                    Population mean mu (µ)
                          Sample st. dev. s                   Pop. st. dev. sigma (σ)
                          Sample median                       Pop. median
                          Sample proportion p-hat             Pop. proportion p
                          Sample line height y-hat            Pop. regression line height y
The actual value of the statistic will vary, depending on the particular sample: "sampling variability."
The statistic "estimates" the parameter; we hope it is close to the parameter. If we choose simple random samples, we can understand the pattern of values the statistic can take.
Some examples of  statistics:
    Height of U.S. young women: pop. mean = 64", pop. s.d. = 2.7" (text, p. 86, #1.89)
        (Caveat: the last edition gave pop. mean = 64.5", pop. s.d. = 2.5". Rounded? Shrinking?)
        Math 151 class samples:
            Semester       xbar      s
            Spring '01     64.2     3.75
            Fall '01       65.01    3.22
            Spring '02     64.53    2.91
            Fall '02       63.89    2.48
            Spring '03     64.98    3.29
            Spring '04     65.33    2.25
            Spring '05     64.31    2.93
            Fall '05       63.92    2.80
            Spring '06     62.93    2.78
            Fall '06       62.81    2.65
            Spring '07     65.18    2.26
            Fall '07       65.67    2.73
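A quick Python check of the class means above shows sampling variability in action: the parameter (64") stays fixed, while the statistic xbar changes from class to class. The classes are not SRSs of U.S. young women, so treat this only as a rough illustration.

```python
from statistics import mean

mu = 64.0   # population mean height (inches), as given in the text
xbars = [64.2, 65.01, 64.53, 63.89, 64.98, 65.33, 64.31, 63.92,
         62.93, 62.81, 65.18, 65.67]   # class means, Spring '01 through Fall '07

print(f"average of class means: {mean(xbars):.2f}")           # 64.40
print(f"range of class means: {min(xbars)} to {max(xbars)}")   # 62.81 to 65.67
print("largest miss from mu:", round(max(abs(x - mu) for x in xbars), 2))   # 1.67
```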
    Coin flip: proportion of heads p = 1/2 (?)        p-hat = 256/520 = .492 (combined data from many past classes)
    Thumbtack: proportion of point-up p = (??)        p-hat = 441/691 = .6382 (one past class, Math 251)
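The two sample proportions above are just counts divided by totals; this short Python sketch recomputes them from the class data. For the coin we can compare p-hat with the believed value p = 1/2; for the thumbtack, p is unknown and p-hat is our only estimate of it.

```python
heads, flips = 256, 520   # pooled coin-flip data from past classes
up, tosses = 441, 691     # thumbtack data from one past class

p_hat_coin = heads / flips
p_hat_tack = up / tosses
print(f"coin: p-hat = {p_hat_coin:.3f}, off from 1/2 by {abs(p_hat_coin - 0.5):.3f}")   # .492, .008
print(f"thumbtack: p-hat = {p_hat_tack:.4f}")                                         # .6382
```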
Sampling distribution of a statistic: if we could repeat the sampling process, the distribution of the values of that statistic calculated from "all possible" samples (of the given size). Assumes probability sampling or a randomized experimental design.
Shape, center, spread.
Shape: mound-shaped, often approximately normal; we wouldn't want it bimodal or with outliers.
Center: the center of the sampling distribution should be at the parameter value.
     A statistic that systematically under- or over-estimates the parameter is a "biased estimator."
Spread: want it "tight" (around the parameter value!).
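Since we can't take "all possible" samples by hand, here is a minimal Python simulation that approximates a sampling distribution: draw many SRSs of size 10 from a made-up population and look at the center and spread of the resulting xbar values. The population (1,000 normal heights with mean about 64 and s.d. about 2.7), the number of repetitions, and the seed are all assumptions for illustration.

```python
import random
from statistics import mean, pstdev

rng = random.Random(251)

# Hypothetical population of 1,000 heights (inches), roughly mean 64, s.d. 2.7.
population = [rng.gauss(64, 2.7) for _ in range(1000)]
mu = mean(population)   # the parameter (known here only because we made up the population)

def sampling_distribution(pop, n, reps):
    """Means of `reps` independent SRSs of size n drawn from pop."""
    return [mean(rng.sample(pop, n)) for _ in range(reps)]

xbars = sampling_distribution(population, n=10, reps=2000)
print(f"parameter mu         = {mu:.2f}")
print(f"center of the xbars  = {mean(xbars):.2f}")   # close to mu: no systematic bias
print(f"spread of the xbars  = {pstdev(xbars):.2f}")
```

A histogram of `xbars` would show the mound shape; its center sitting at mu is what "unbiased" means.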

--SRS produces unbiased estimators for most common statistics.
--A larger (random) sample produces less variability (spread) (pp. 242-243, #3.71b,c).
          The size of the sample matters, not the proportion of the population sampled! (as long as the population is at least 10 times the sample size).
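Both points can be checked with a small Python simulation, assuming a hypothetical 0/1 population with true proportion p = 0.6. We measure the spread (standard deviation) of p-hat over many SRSs, for two sample sizes and two population sizes that are both at least 10 times the sample size. Expect the spread to drop by about half when n goes from 25 to 100, but to barely change when the population grows from 2,000 to 200,000.

```python
import random
from statistics import pstdev

rng = random.Random(3)

def spread_of_p_hat(pop_size, n, p=0.6, reps=1000):
    """Standard deviation of p-hat over many SRSs of size n from a 0/1 population."""
    ones = int(p * pop_size)
    population = [1] * ones + [0] * (pop_size - ones)
    p_hats = [sum(rng.sample(population, n)) / n for _ in range(reps)]
    return pstdev(p_hats)

for pop_size in (2_000, 200_000):
    for n in (25, 100):
        print(f"population {pop_size:>7,}, n = {n:>3}: spread of p-hat = {spread_of_p_hat(pop_size, n):.3f}")
```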

Random sampling will allow us to do inferential statistics:
  How far off from the parameter value are we likely to be? (margin of error)
  How plausible is a claim, given the data? (significance level)

Next: Looking at some sampling distributions; then Ch. 4, Ch. 5...



Math251-Fall07/Day2s18.htm, 4pm 10/9/07
This page belongs to Sally Sievers, who is solely responsible for its content. Please see our statement of responsibility.