Math 151, Spring 2006, Day 20, Wed. March 15.  Hit reload...  After class: data file link added 3/17

Day 20 Reading: D&V Ch. 11 thru p. 13.  Ch. 12, thru 230 for tonight's HW, then rest.  AS12 is very good, and entertaining.
Hand in Friday (All D&V)
Ch. 11 pp. 219-20 
1 Coin flip 
4  Games 

Ch.12 p.238ff.
18 a,b,d Fuel economy
+ + + + + + + + + + + +
A.  Making & Examining SRS's with SPSS.  Do the assignment on the handout Using SPSS to find a Simple Random Sample.  This is in the white folder outside my door, as well as at the link.  Sorry!  Here's the data file: Oldfaith.sav

Start now, hand in after break:
 1, 4, 6, 7, 8, 9 Do parts a,b,c,d, of these and save your work for next time (Fri.), when I'll assign parts e, f. 

23 Sampling methods
21 Quality Control Use a cluster sample. (SPSS) (You can do the SPSS sampling now along with A above)  Get the individuals  like this: 
In SPSS, type in values of a variable for case code, with values 61, 62,....80.  Get a sample of size 3 & write down which cases are chosen.   Then  choose one from each case: Consider them labeled from 1 to 12: enter those numbers into a variable.  For each case, take a sample of size 1 to decide which bottle from that case.   (Are we running a risk if we take the same (place) bottle from each case?)  Write down which bottles were chosen. 
AND When we've covered sampling from the Random Number Table(p. A-49) Do it again! Use line 16, reading across, to first choose 3 cases from 61, 62,....80, then choose a bottle from each case .  You'll get a different sample from your SPSS sample, of course.
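
For anyone who wants to see the two-step idea outside SPSS, here is a minimal sketch in Python (not part of the assignment; the variable names are made up for illustration). It picks 3 cases from the codes 61-80, then makes a separate random choice of bottle 1-12 for each chosen case. It is unseeded, so every run gives a different sample, just as SPSS and the table will.

```python
import random

rng = random.Random()   # unseeded: each run gives a different sample

cases = list(range(61, 81))                  # case codes 61, 62, ..., 80
chosen_cases = sorted(rng.sample(cases, 3))  # 3 different cases

# A separate random draw for each case decides which bottle (1..12) to take,
# so we do not take the same-place bottle from every case.
chosen_bottles = {case: rng.randint(1, 12) for case in chosen_cases}

for case, bottle in chosen_bottles.items():
    print(f"case {case}: bottle {bottle}")
```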

19 Accounting

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
If you didn't Monday, Pick a digit (from 0,1,2,3,4,5,6,7,8,9).  Write it down.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Fay is sick; won't be here at 12:30 today.  See me instead!
Exam 2 the Friday after break.  Sample soon.

Homework questions?
Day 18

Chapter 11 (transition)  Understanding Randomness
"Random" An event is a "random" event if we know what outcomes could happen, but not which particular values did or will happen.     I like to use "chance", or "probabilistic."
"Random numbers"  Lists of digits (or sets of digits of equal length, like 135, 238, 099) that are not only "random" (unpredictable) but equally likely.
Generating a series of equally likely numbers:  List the possible values you want, then:
--Roll a die(1,2,3,4,5,6 is usual). Repeat.
--Pick a card from a well-shuffled deck (Mark the cards with as many values as you need) Replace and repeat .
--Put labeled slips, or balls, in a "hat".  Stir, pick.  Replace, repeat.
--Go to random.org (from radio static: atmospheric noise)
--Ask a computer program  (SPSS--we'll learn: Using SPSS to Sample Get Handout.).
--Look at p. A-49 (Table of Random Digits).  Close your eyes and put your pencil point on the page.  Start there.  Read digits, as many as you need.
(The last two are "pseudo-random"--look random but in fact are computed in a predictable way.  Good enough.)
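
A computer version of the die, the hat, and the digit table might look like the sketch below. It uses Python's standard random module (pseudo-random, like SPSS and the table); the seed value and the slip labels are arbitrary choices made for illustration.

```python
import random

rng = random.Random(151)  # seed chosen arbitrarily so the run can be repeated

# One roll of a fair die: each of 1..6 is equally likely.
die_roll = rng.randint(1, 6)

# "Slips in a hat": pick, replace, repeat.
slips = ["A", "B", "C", "D"]
hat_draws = [rng.choice(slips) for _ in range(10)]

# A string of random digits, printed in groups of 5 like the table.
digits = [rng.randrange(10) for _ in range(30)]
line = "".join(str(d) for d in digits)
groups = " ".join(line[i:i + 5] for i in range(0, len(line), 5))

print(die_roll)
print(hat_draws)
print(groups)
```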

--Pick a digit yourself? Look at the digits you picked.  (stemplot) **

Compare with digits from a Table of random digits (Simulates rolling a die with 0,1,....9, over and over...) (p.A49)
    Every digit, every sequence of digits, is equally likely to be "next" in any direction.
We see much variability in the results from the table--but no consistent patterns.  The more data we pool, the closer we get to "equally likely" -- 10% of each of the 10 digits.
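
To watch the pooling effect, here is a small simulation sketch (an illustration, not the class tally). The helper name digit_shares and the sample sizes are made up; the function reports what share of n pseudo-random digits landed on each of 0-9.

```python
import random
from collections import Counter

def digit_shares(n, rng):
    """Share of each digit 0..9 among n pseudo-random digits."""
    counts = Counter(rng.randrange(10) for _ in range(n))
    return [counts[d] / n for d in range(10)]

rng = random.Random(2006)   # arbitrary seed
for n in (25, 250, 25000):
    shares = digit_shares(n, rng)
    print(n, " ".join(f"{s:.3f}" for s in shares))
```

With 25 digits the shares should bounce around a lot; with 25000 they should all sit near 0.100.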

In class:  in pairs, tally 25 random numbers from table A-49 (each start on a different line)  Hand in at end of class.

People are very bad at making choices that would qualify as "equally likely".
    So we need to use an impersonal mechanism.

Rest of Ch. 11 (simulating random events) --postpone to before Ch. 14.
To now:  Get a data set to tell us its secrets -- (Exploratory) Data Analysis.  Only about the set itself.
   Organize, Display, Summarize, Describe (with Models--Normal distribution, Straight line Regression)

Sample Surveys  (D&V Ch. 12)
Goal:  Use data  to tell us something about the larger world that it came from (Statistical Inference--later)
Idea 1:  Population:  the whole group we'd like to learn something about.  (Almost always too big, too expensive to get at.)
   Sample:  A (much?) smaller group from the population--which we actually can examine.
     Hope:  The Sample will be representative of the Population.
      "Bias":  a systematic failure of the Sample (sampling process) to represent the Population.
Idea 2:   Use Randomization to pick us a representative sample (almost always representative; and we can quantify "almost always")
 Protects us from bias; allows us to do statistical inference.
Idea 3:   Larger sample will be more representative.  But not because it's a larger proportion of the whole; just because it's a larger number of individuals.  Spoonful samples a (stirred) cauldron of soup as well as it does a small pot.  Toothpick doesn't do as well.  (If the sample is more than about 10% of the population, the proportion issue begins to matter. )
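
The soup idea can be checked with a simulation sketch. Everything here is an assumption made for illustration: a population where 30% say "yes", a small pot of 10,000 and a cauldron of 1,000,000, a toothpick of n = 10 and a spoonful of n = 100. The function measures how much the sample proportion bounces around from sample to sample.

```python
import random
import statistics

def spread_of_p_hat(pop_size, n, reps, rng, true_p=0.3):
    """St. dev. of the sample proportion over many SRS's of size n."""
    cutoff = int(true_p * pop_size)   # individuals 0..cutoff-1 are "yes"
    p_hats = []
    for _ in range(reps):
        sample = rng.sample(range(pop_size), n)   # SRS, no repeats
        p_hats.append(sum(i < cutoff for i in sample) / n)
    return statistics.pstdev(p_hats)

rng = random.Random(12)   # arbitrary seed
for pop_size in (10_000, 1_000_000):
    for n in (10, 100):
        spread = spread_of_p_hat(pop_size, n, 2000, rng)
        print(f"population {pop_size:>9}, n = {n:>3}: spread {spread:.3f}")
```

The spread should shrink when n goes from 10 to 100, and should barely change between the pot and the cauldron.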

Census:  (try to) get whole population!  (pp. 225-6)  Expensive, difficult, impractical (cf. "destructive testing."  Homeless.  Illegal immigrants.  "Snowbirds.").  May be less accurate  than careful sample.

We know that a sample from a population will not exactly represent the population.  If we take random samples, the behavior of any one sample will not be predictable, but there will be a predictable pattern in many random samples from the same population.  Knowing the pattern is as good as we can do.  Ch. 14 on.

                        Sample   (chosen from a)    Population
                        (varies)                    (fixed, but usually unknown)
Calculate a
numerical summary:      Statistic (Latin letter)    Parameter (Greek letter)    (D&V p. 227)
Examples:
    mean                xbar                        mu (µ)
    standard deviation  s                           sigma
    median              sample median               pop. median
    proportion          p-hat                       p
    line height         y-hat                       y (pop. regression line)
    line slope          b1                          beta1 (pop. regression line)
The actual value of the Statistic will vary, depending on the particular sample. "Sampling variability"
The Statistic "estimates" the Parameter.  We hope it is close to the parameter.  If we choose simple random samples, we can understand the pattern of values the statistic can take.
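
A sketch of sampling variability, with a made-up population (5000 values, roughly Normal with mean 70 and st. dev. 12; these numbers are assumptions for illustration only). The parameter mu is computed once and stays fixed; each SRS of size 25 produces its own xbar.

```python
import random
import statistics

rng = random.Random(14)   # arbitrary seed

# A made-up population; its mean is the parameter (fixed).
population = [rng.gauss(70, 12) for _ in range(5000)]
mu = statistics.mean(population)

# Each SRS of size 25 gives its own statistic xbar (varies).
xbars = [statistics.mean(rng.sample(population, 25)) for _ in range(1000)]

print("parameter mu:", round(mu, 2))
print("first five xbars:", [round(x, 2) for x in xbars[:5]])
print("mean of the xbars:", round(statistics.mean(xbars), 2))
print("st. dev. of the xbars:", round(statistics.stdev(xbars), 2))
```

The individual xbars differ, but taken together they should center near mu and vary less than the individual population values do.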

Simple Random Sample (SRS) of size n n individuals chosen in such a way that every possible set of n individuals has an equal chance of being chosen.
HOW?  A chance mechanism: Cards, dice, computer program  or Table of random digits.
   Need a list of the population, labeled, usually with numbers.
Different methods: (D&V p. 229-30): 1/10 of population; list random digits next to population list, take all "4's" .
SPSS will take a random sample from a population listed in a data file.  Using SPSS to Sample Get Handout.
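
Outside SPSS, the same two ideas can be sketched in Python. The population list (labels 1-57) and the sample size are assumptions for illustration. random.sample draws n different labels so that every set of n is equally likely; the "all the 4's" method writes a random digit next to each individual and keeps the 4's.

```python
import random

rng = random.Random(229)   # arbitrary seed

# Hypothetical population list, labeled 1..57.
population = list(range(1, 58))

# SRS of size n: every set of n labels is equally likely to be drawn.
n = 5
srs = sorted(rng.sample(population, n))

# "Take all the 4's": a random digit next to each individual, keep the 4's.
# About 1/10 of the list is chosen, but the sample size is not fixed ahead.
tagged = [(label, rng.randrange(10)) for label in population]
fours = [label for label, digit in tagged if digit == 4]

print("SRS:", srs)
print("all the 4's:", fours)
```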

Start here Friday:
"List of the Population"?? 
fat chance.
Sampling  frame: the list of individuals from the population that you actually choose the sample from.  May differ a little (or a lot!) from the population you desire to study.

Bias in sampling: any systematic failure of a sample (its method) to represent its population.  (E.g. sampling frame excludes "different" part of population.)

Other (good) sampling designs:  Cheaper, or avoid a problem...
--To make sure important groups are represented proportionately: Stratified random sample:  Divide into subpopulations (strata) with different characteristics (M/F, income strata: 5ths, etc.) Decide how many from each, then random sample within each stratum.
--Save cost, time: Cluster sample:  Choose clusters at random, then take a sample within each chosen cluster (sometimes all of it).  (Door-to-door survey: pick a city block at random, then 3 households at random from the block, or all households on the block.)
--Multistage sampling:  Combine several design types in sequence to get to final sample.
--Systematic sample:   For a 1-in-6 systematic sample from a phonebook:  Roll a die to get the number for the first individual.  Suppose the 5th.  Then choose every 6th after that (11th, 17th, 23rd.....).  If there's no relationship between the variables we're looking at and "every 6th", it's ok.  (Every individual is equally likely, but not every sample: so Not a simple random sample.  All that's randomized is the starting point.)  (But remember Playground; jockeying to be with friend after countoff.)
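
Two of the designs above, stratified and systematic, sketched in Python. The population of 60 labeled individuals, the M/F tags, and the function names are all made up for illustration; this is one simple way to code the ideas, not the only one.

```python
import random

rng = random.Random(108)   # arbitrary seed

# Hypothetical population: 60 labeled individuals, 40 tagged F and 20 tagged M.
population = [(label, "F" if label % 3 else "M") for label in range(1, 61)]

def stratified_sample(pop, per_stratum, rng):
    """Take an SRS of the stated size within each stratum."""
    chosen = []
    for stratum, size in per_stratum.items():
        members = [label for label, s in pop if s == stratum]
        chosen.extend(rng.sample(members, size))
    return sorted(chosen)

def systematic_sample(labels, k, rng):
    """1-in-k sample: random start among the first k, then every k-th."""
    start = rng.randint(1, k)          # like rolling a die when k = 6
    return labels[start - 1::k]

labels = [label for label, _ in population]
print(stratified_sample(population, {"F": 4, "M": 2}, rng))
print(systematic_sample(labels, 6, rng))
```

Notice that the systematic sample has only 6 possible outcomes (one per starting point), which is why it is not an SRS.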

If time, look at #21, p.240

Using Random Number Table to sample (p. A-49)  Example: Ch. 11 pp.  216-7 The Step-by-step simulation effectively takes a random sample of size 3 from the 57 students.
    Every digit, every sequence of digits, is equally likely to be "next" in any direction. (The division into groups of 5 is just for legibility.)
To use:  label everyone in the population with a number.
    Important:  Every labeling number needs the same number of digits.
    To label 9 people, use the labels 1,2,3,....9 (1-digit chunks)
    To label 15 people, use the labels 01, 02, ...10, 11, ...15 (2-digit chunks)
    To label 125 people, use the labels 001, 002, ... 124, 125 (3-digit chunks)
Pick a place (at random) in the table, start reading across in that size chunk.  Get n eligible numbers (discard repeats)
                    For example :   07511   88915   41267   16853   84569   79367 ..
From 9 people, a sample n = 5:   0, 7, 5, 1, 1, 8, 8, 9, 1, 5, 4, ...    (discard 0 and the repeated 1 and 8; sample is individuals 7, 5, 1, 8, 9)
From 15 people, a sample   07, 51, 18, 89, 15, 41, 26, 71, 68, 53, 84, 56, 97, 93, 67.... keep reading,
    go to next line (or back to top line) if you need more.  Individuals 7, 15,...are chosen using this line.
From 125 people, a sample 075, 118, 891, 541, 267, 168, 538, 456, 979, 367...keep reading.  Individuals 75, 118, ...
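
The table-reading recipe can be written out as a short Python sketch (the function name sample_from_digits is made up for illustration). It strips the spacing, reads chunks as wide as the largest label, and keeps eligible labels that have not come up before. If the line of digits runs out before n labels are found, it returns what it has, and you would go on to the next line.

```python
def sample_from_digits(digits, pop_size, n):
    """Read a digit string in equal-size chunks; keep eligible, unrepeated labels."""
    digits = "".join(ch for ch in digits if ch.isdigit())   # drop the spacing
    width = len(str(pop_size))          # 9 -> 1 digit, 15 -> 2, 125 -> 3
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if 1 <= label <= pop_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen

line = "07511 88915 41267 16853 84569 79367"
print(sample_from_digits(line, 9, 5))     # [7, 5, 1, 8, 9]
print(sample_from_digits(line, 15, 2))    # [7, 15]
print(sample_from_digits(line, 125, 2))   # [75, 118]
```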

    Why the same number of digits in each label?  Each individual 3-digit chunk is as likely as any other 3-digit chunk (1 chance in 1000).  But a particular 1-digit chunk (1 in 10) or 2-digit chunk (1 in 100) is more likely than any 3-digit chunk.  So 2 will come up more often than 12, but 02 will come up just as often as 12.
    Why across?  For consistency on HW, start where I say and go across (so everyone who does it right gets the same answer).  In practice, you can read up, down, backwards, as long as you decide beforehand, and don't change in the middle of choosing the sample.

Sources of bias, next


Sievers home  Math151-Sp06/Daysp20.htm  2:40pm 10/15/06
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.
** I predict that there will be none or only one from 0, 1, 5, 9.  At least one and probably more 7's.  A "bias" against 0, 1, 5, 9, and toward 7.