Math 151, Fall 2005, Day 19, Fri. Oct. 7.  Hit reload... After class, corrections

Day 19 (Fri. Oct. 7): Reading: D&V Ch. 11 thru p. 13.  Ch. 12, thru 230 for tonight's HW, then rest.  AS12 is very good, and entertaining.
Hand in Wednesday (All D&V)
Ch. 11 pp. 219-20 
1 Coin flip 
4  Games 

Ch.12 p.238ff.
17 Arm length  Do all parts of the problem; Measure your own arm and try to get 2 other people not in the class to measure their arms too,  to get a sample of size 3.  Bring results to class Wednesday to pool. 
18 a,b,d Fuel economy
+ + + + + + + + + + + +
Start now, hand in Friday (or don't start it.  We covered only a little of it)
A.  Making & Examining SRS's with SPSS.  Do the assignment on the handout Using SPSS to find a Simple Random Sample
1, 4, 6, 7, 8, 9 Do parts a,b,c,d, of these and save your work for next time (Wed.), when I'll assign parts e, f. 

23 Sampling methods
21 Quality Control Use a cluster sample. (SPSS) Get the individuals  like this: 
In SPSS, type in values of a variable for case code, with values 61, 62,....80.  Get a sample of size 3 & write down which cases are chosen.  Then choose one bottle from each chosen case: Consider the bottles labeled from 1 to 12: enter those numbers into a variable.  For each case, take a sample of size 1 to decide which bottle from that case.  (Are we running a risk if we take the same (place) bottle from each case?)  Write down which bottles were chosen.
19 Accounting

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
If you didn't Friday, Pick a digit (from 0,1,2,3,4,5,6,7,8,9).  Write it down.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Homework questions? Day 18
Anscombe's quartet; Summary values give stronger-looking relationship;  Association doesn't imply causation. Day18

Chapter 11 (transition)  Understanding Randomness
"Random" An event is a "random" event if we know what outcomes could happen, but not which particular values did or will happen.     I like to use "chance", or "probabilistic."
"Random numbers"  Lists of digits (or sets of digits of equal length, like 135, 238, 099) that are not only "random" (unpredictable) but equally likely.
Generating a series of equally likely numbers:  List the possible values you want, then:
--Roll a die (1,2,3,4,5,6 is usual). Repeat.
--Pick a card from a well-shuffled deck (Mark the cards with as many values as you need). Replace and repeat.
--Put labeled slips, or balls, in a "hat".  Stir, pick.  Replace, repeat.
--Go to Random.org (from radio static)
--Ask a computer program  (SPSS--we'll learn).
--Look at p. A-49 (Table of Random Digits).  Close your eyes and put your pencil point on the page.  Start there.  Read digits, as many as you need.
(The last two are "pseudo-random"--look random but in fact are computed in a predictable way.  Good enough.)
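
A minimal sketch of the "ask a computer program" option, written in Python rather than SPSS (the function names and the seed value are made up for illustration).  Like the table, the output is pseudo-random: the same seed always gives the same list.

```python
import random

def random_digits(count, seed=None):
    """Return `count` digits, each of 0..9 equally likely (pseudo-random)."""
    rng = random.Random(seed)          # a fixed seed makes the list reproducible
    return [rng.randrange(10) for _ in range(count)]

def roll_die(count, seed=None):
    """Simulate rolling a fair six-sided die `count` times."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(count)]

digits = random_digits(25, seed=151)   # 25 digits, like one pair's tally from the table
rolls = roll_die(10, seed=151)
print(digits)
print(rolls)
```

Leaving out the seed gives a different list on each run, which is what you would usually want when actually choosing a sample.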

--Pick a digit yourself? Look at the digits you picked.  (stemplot) **

Compare with digits from a Table of random digits (Simulates rolling a die with 0,1,....9, over and over...) (p.A49)
    Every digit, every sequence of digits, is equally likely to be "next" in any direction.
We see much variability in the results from the table--but no consistent patterns.  The more data we pool, the closer we get to "equally likely" -- 10% of each of the 10 digits.
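
The pooling idea can be tried out with a short Python sketch (an illustration only; the sample sizes and seed are arbitrary choices, and the computer's pseudo-random digits stand in for the table).  It tallies 25 digits and then 100,000 digits and compares each digit's share with 10%.

```python
import random
from collections import Counter

def digit_proportions(n, seed):
    """Tally n pseudo-random digits and return the proportion of each digit 0..9."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(10) for _ in range(n))
    return {d: counts[d] / n for d in range(10)}

def max_gap(props):
    """Largest distance between any digit's proportion and the ideal 10%."""
    return max(abs(p - 0.10) for p in props.values())

small = digit_proportions(25, seed=19)        # one pair's tally
large = digit_proportions(100_000, seed=19)   # lots of pooled data

for d in range(10):
    print(d, f"{small[d]:.2f}", f"{large[d]:.3f}")
print("largest gap from 10%:", round(max_gap(small), 3), "vs", round(max_gap(large), 4))
```

With only 25 digits every proportion is a multiple of 4%, so no digit can sit at exactly 10%; the large tally comes much closer.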

In class:  in pairs, tally 25 random numbers from table A-49 (each start on a different line)  Hand in at end of class.

People are very bad at making choices that would qualify as "equally likely".
    So we need to use an impersonal mechanism.

Rest of Ch. 11 (simulating random events) --postpone to before Ch. 14.
To now:  Get a data set to tell us its secrets -- (Exploratory) Data Analysis.  Only about the set itself.
   Organize, Display, Summarize, Describe (with Models--Normal distribution, Straight line Regression)

Sample Surveys  (D&V Ch. 12)
Goal:  Use data  to tell us something about the larger world that it came from (Statistical Inference--later)
Idea 1:  Population:  the whole group we'd like to learn something about.  (Almost always too big, too expensive to get at.)
   Sample:  A (much?) smaller group from the population--which we actually can examine.
     Hope:  The Sample will be representative of the Population.
      "Bias":  a systematic failure of the Sample to represent the Population.
Idea 2:   Use Randomization to pick us a representative sample (almost always; and we can quantify "almost always")
 Protects us from bias; allows us to do statistical inference.
Idea 3:   Larger sample will be more representative.  But not because it's a larger proportion of the whole; just because it's a larger number of individuals.  Spoonful samples a (stirred) cauldron of soup as well as it does a small pot.  Toothpick doesn't do as well.  (If the sample is more than about 10% of the population, the proportion issue begins to matter. )
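
The soup idea can be checked with a small simulation.  This is a sketch under assumptions that are not in the text: a yes/no variable with half the population "yes", populations of 10,000 (pot) and 1,000,000 (cauldron), and 500 repeated samples.  It measures how much the sample proportion bounces around from sample to sample.

```python
import random
import statistics

def spread_of_p_hat(pop_size, n, reps, rng, p=0.5):
    """Standard deviation of the sample proportion over `reps` simple random samples."""
    cutoff = int(pop_size * p)          # labels below the cutoff count as "yes"
    p_hats = []
    for _ in range(reps):
        sample = rng.sample(range(pop_size), n)   # n distinct labels, no repeats
        p_hats.append(sum(label < cutoff for label in sample) / n)
    return statistics.stdev(p_hats)

rng = random.Random(12)
pot = spread_of_p_hat(10_000, 100, 500, rng)           # spoonful from a small pot
cauldron = spread_of_p_hat(1_000_000, 100, 500, rng)   # same spoonful from a cauldron
toothpick = spread_of_p_hat(1_000_000, 10, 500, rng)   # much smaller sample

print(f"n=100 from 10,000:    spread {pot:.3f}")
print(f"n=100 from 1,000,000: spread {cauldron:.3f}")
print(f"n=10  from 1,000,000: spread {toothpick:.3f}")
```

The two n=100 spreads should come out close to each other even though one population is 100 times larger, while the n=10 spread is noticeably bigger.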

Census:  (try to) get whole population!  (pp. 225-6)  Expensive, difficult, impractical (cf. "destructive testing."  Homeless.  Illegal immigrants.  "Snowbirds.").  May be less accurate  than careful sample.

We know that a sample from a population will not exactly represent the population.  If we take a random sample, the behavior of samples will not be individually predictable, but there will be a predictable pattern in many random samples from the same population.  Knowing the pattern will be as good as we can do.  Ch. 14 on.

                      Sample                      chosen from a   Population
                      (varies)                                    (fixed, but usually unknown)
Calculate
Numerical summary:    Statistic (Latin)                           Parameter (Greek letter)   (D&V p. 227)
    Examples:         Sample mean xbar                            Population mean mu (µ)
                      Sample st. dev. s                           Pop. standard dev. sigma
                      Sample median                               Pop. median
                      Sample proportion p-hat                     Pop. proportion p
                      Sample line height y-hat                    Pop. regression line height y
                      Sample line slope b1                        Pop. regression line slope beta1
The actual value of the Statistic will vary, depending on the particular sample. "Sampling variability"
The Statistic "estimates" the Parameter.  We hope it is close to the parameter.  If we choose simple random samples, we can understand the pattern of values the statistic can take.
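
To see sampling variability in action, here is a small Python sketch.  The population is made up for illustration (5000 invented measurements in cm), as are the sample size of 25 and the seed.  The parameter is computed once from the whole population; the statistic is recomputed for each new sample.

```python
import random
import statistics

rng = random.Random(2005)

# Hypothetical population: 5000 made-up measurements (cm).
population = [rng.gauss(70, 5) for _ in range(5000)]
mu = statistics.mean(population)        # parameter: fixed (usually unknown in practice)
sigma = statistics.pstdev(population)   # population standard deviation

def sample_statistics(n):
    """Draw one simple random sample of size n; return its mean and st. dev."""
    sample = rng.sample(population, n)
    return statistics.mean(sample), statistics.stdev(sample)

results = [sample_statistics(25) for _ in range(5)]

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
for xbar, s in results:
    print(f"statistics: xbar = {xbar:.2f}, s = {s:.2f}")
```

Each run of `sample_statistics` gives a somewhat different xbar and s, all in the neighborhood of the fixed mu and sigma.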
Start here Wednesday after break:
Simple Random Sample (SRS) of size n:  n individuals chosen in such a way that every possible set of n individuals has an equal chance of being chosen.
HOW?  A chance mechanism: Cards, dice, computer program  or Table of random digits.
   Need a list of the population, labeled, usually with numbers.
Different methods: (D&V p. 229-30): 1/10 of population; list random digits next to population list, take all "4's" .
SPSS will take a random sample from a population listed in a data file.  Using SPSS to Sample. Get Handout.
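
For comparison with the SPSS handout, a rough Python sketch of two ways to sample from a labeled list.  The list of 200 labels, the function names, and the seed are assumptions for illustration, and this is not the SPSS procedure itself.  The first function draws an SRS of a fixed size; the second imitates writing a random digit next to each name and taking all the "4's", which gives about 1/10 of the list but not a fixed sample size.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS: every possible set of n individuals is equally likely."""
    return rng.sample(frame, n)

def take_all_fours(frame, rng):
    """Attach a random digit to each individual; keep those that get a 4."""
    return [person for person in frame if rng.randrange(10) == 4]

rng = random.Random(229)
frame = list(range(1, 201))          # hypothetical list: individuals labeled 1..200

srs = simple_random_sample(frame, 20, rng)
tenth = take_all_fours(frame, rng)

print(sorted(srs))
print(len(tenth), "individuals got a 4:", tenth)
```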

"List of the Population"??  fat chance.
Sampling  frame: the list of individuals from the population that you actually choose the sample from.  May differ a little (or a lot!) from the population you desire to study.

Bias in sampling: any systematic failure of a sample (its method) to represent its population.  (E.g. sampling frame excludes "different" part of population.)

Other (good) sampling designs:  Cheaper, or avoid a problem...
--To make sure important groups are represented proportionately: Stratified random sample:  Divide into subpopulations (strata) with different characteristics (M/F, income strata: 5ths, etc.) Decide how many from each, then random sample within each stratum.
--Save cost, time: Cluster sample:  Choose (randomly) clusters, then within each cluster take a sample (sometimes all).  (Door to door survey: pick city block at random, then 3 households at random from the block, or all households on the block.)  (Cluster was rewritten since day 19.)
--Multistage sampling:  Combine several design types in sequence to get to final sample.
--Systematic sample:   For a 1-in-6 systematic sample from a phonebook:  Roll a die to get the number for the first individual.  Suppose the 5th.  Then choose every 6th after that (11th, 17th, 23rd.....).  If there's no relationship between variables we're looking at and "every 6th", it's ok.  (Every individual is equally likely, but not every sample: so Not a simple random sample.  All that's randomized is the starting point.)  (But remember Playground; jockeying to be with friend after countoff.)
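
The designs above can be sketched in Python.  All the data here are hypothetical: two strata "F" and "M", city blocks named A to E with 8 households each, and a 60-entry "phonebook".  Treat it as one possible illustration of each design, not a prescribed method.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Take a separate random sample of the stated size within each stratum."""
    return {name: rng.sample(members, per_stratum[name])
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, per_cluster, rng):
    """Choose clusters at random, then sample individuals within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], per_cluster) for c in chosen}

def systematic_sample(population, step, start):
    """Take the individual in position `start` (counting from 1), then every `step`-th after."""
    return population[start - 1::step]

rng = random.Random(7)

# Hypothetical strata: labels of individuals in each group.
strata = {"F": list(range(1, 61)), "M": list(range(61, 101))}
strat = stratified_sample(strata, {"F": 6, "M": 4}, rng)

# Hypothetical city blocks A..E, each with households numbered 1..8.
blocks = {name: [f"{name}{h}" for h in range(1, 9)] for name in "ABCDE"}
clus = cluster_sample(blocks, 2, 3, rng)

# 1-in-6 systematic sample; the die roll is fixed at 5 to match the example in the text.
phonebook = list(range(1, 61))
syst = systematic_sample(phonebook, 6, 5)

print(strat)
print(clus)
print(syst[:4])   # [5, 11, 17, 23]
```

In actual use the starting point would come from a die roll, for example `rng.randint(1, 6)`, rather than being fixed at 5.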

If time, look at #21, p.240
Sources of bias, next


Sievers home  Math151-Fall05/Dayf19.htm  3pm 10/7/05
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.
** I predict that there will be none or only one from 0, 1, 5, 9.  At least one and probably more 7's.  A "bias" against 0, 1, 5, 9, and toward 7.