### Math 151, Spring '12: Wednesday, Day 20, March 7 (Hit reload after class.)

HW: (Re)read pp. 142-146 (lurking variables; association isn't causation).
We will skip Ch. 6. Read p. 198 (explore vs. infer). Then: Chapter 8. (Read pp. 210-11 (Other designs) last; it's optional.) Check: p. 217, 8.16-18 and 8.23 first, then 8.19-21 with Table B. 8.22 is optional. In 8.24, there are 2 populations: all adults, and parents of children. Which percentage is more accurate for its population? Ahead: Chapter 9.

* There are 376,740 different possible samples of size 6 from a list of 28.
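A quick check of that count: the number of unordered samples of size 6 from 28 labels is the binomial coefficient "28 choose 6". A minimal Python sketch (needs Python 3.8+ for `math.comb`):

```python
from math import comb, factorial

# Number of different (unordered) samples of size 6 from a list of 28.
n_samples = comb(28, 6)

# Same count from the formula 28! / (6! * 22!).
by_formula = factorial(28) // (factorial(6) * factorial(22))

print(f"{n_samples:,}")         # 376,740
print(n_samples == by_formula)  # True
```

In a simple random sample of size 6 from these 28, each of those 376,740 samples has the same chance, 1/376,740, of being the one chosen.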

Exams are still not finished. Friday there is no class (CSE Activism Symposium; schedule on the Globe, soon?). I expect to have the exams finished by then, and I will be on campus. If you want to stop by my office to get yours, that will work, but email or phone (3210) to make sure I'll be there! (And enjoy the Symposium! Check out Florence Nightingale, primordial activist and statistician: "To understand God's thoughts we must study statistics, for these are the measure of His purpose".)

Nicole: TA hours change from Thursday 4:30-6 to Wednesday 1:30-2:30 and 4:30-5, this week only.
Danielle: also needs to change her hours this Thursday, March 8. Her hours will be moved to Friday, March 9, 2:30-4:30. "Only this week (hopefully!)."
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Homework questions, residuals, etc.? (Day 19)
Leftovers:  Extrapolation

Reprise: Association does not imply causation. http://xkcd.com/552/

Establishing that x "causes" y is difficult:
Best: Do an experiment in which we change x and keep lurking variables under control (e.g., rats; more in Ch. 9).
Otherwise, look for: a strong association; consistency over many studies; higher x goes with a stronger y; x precedes y in time; a plausible mechanism exists (parallel studies?).

Chapters 1 through 5 covered analyzing data that was given to us: what it said about itself.
Informally, we developed guesses, suspicions, and hypotheses about the world the data came from.
From Exploration to Inference p. 198

Ch. 8 & 9: Producing Data. Aim: create data sets that will allow us to make inferences about a larger world than just the data we have.

Ch. 8 p. 201ff.  Sampling
>>Population: The entire group that we want information about. [Census: measure them all. (Try it! In 2010, foreclosures made it especially difficult.)]
>>Sample: The part of the population we actually examine.
Hope: The sample will be representative of the population.
>>Sampling Design: Describes exactly how the sample is to be chosen from the population.

(SAMPLING) BIAS:  The design of a study is biased if it systematically favors certain outcomes.

Sample survey: (attempt to) choose a representative sample from a large, varied population. Not easy!
Some issues: What population do we want to understand? What exactly do we want to measure? (Pre-election polls: pitfalls?) (2010 Census: determines Congressional districts, and much else. Under the Dept. of Commerce. Can we use sampling techniques? Not to determine Congressional districts, but for other things.)

• Voluntary response sample (from a general appeal: Ann Landers, Cosmo, Hite Report, call-in and internet "instant polls"): biased toward strong opinions, especially negative ones. (The contented don't bother.) "Whatever it is, I'm against it." www.vote.com, captive (dying?).
• Convenience sample (whatever/whoever looks good or is handy): unlikely to be representative. Digits. Math 151 is a convenience sample from Wells for heights, siblings, majors... Mall surveys.

Simple Random Sample (SRS) of size n: n individuals chosen in such a way that every possible set of n individuals has an equal chance of being chosen. Random sampling (p. 210) uses chance to choose the sample.
HOW? A chance mechanism: label everyone in the population, then use cards, dice, lotto balls, or a computer program,
such as the Simple Random Sample Applet: enter the population size and sample size, hit Reset, then Sample.
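For the "computer program" option, here is a minimal Python sketch. It assumes the population is labeled 1 through N; `simple_random_sample` is a hypothetical helper name, not part of the applet or the textbook. Python's `random.sample` draws without replacement and makes every set of n labels equally likely, which is exactly what an SRS requires.

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Hypothetical helper: an SRS of n labels from 1..population_size.

    random.sample draws without replacement, so every possible set of
    n distinct labels has the same chance of being the one chosen.
    Passing a seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    labels = range(1, population_size + 1)
    return sorted(rng.sample(labels, n))

# One SRS of size 6 from a list of 28.
print(simple_random_sample(28, 6, seed=151))
```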
OR
(Start here Monday.)
Table of random digits (simulates rolling a 10-sided die with faces 0, 1, ..., 9, over and over) (Table B, p. 692).
Every digit, and every sequence of digits, is equally likely to be "next" in any direction.
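A computer's pseudorandom generator can stand in for Table B. A minimal sketch, assuming rows of eight 5-digit groups like row 105 as quoted here; `random_digit_row` is a hypothetical name, and the digits it prints are not the digits in Table B.

```python
import random

def random_digit_row(groups=8, group_size=5, seed=None):
    """Hypothetical helper: one row of random digits, printed in groups like Table B.

    Each digit is drawn independently from 0-9 with equal chance,
    like rolling a 10-sided die over and over.
    """
    rng = random.Random(seed)
    digits = "".join(str(rng.randrange(10)) for _ in range(groups * group_size))
    return " ".join(digits[i:i + group_size] for i in range(0, len(digits), group_size))

print(random_digit_row(seed=2012))
```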
To use:  label everyone in the population with a number.
Important:  Every labeling number needs the same number of digits.
To label 9 people, use the labels 1, 2, 3, ..., 9 (1-digit chunks).
To label 15 people, use the labels 01, 02, ..., 10, 11, ..., 15 (2-digit chunks).
To label 125 people, use the labels 001, 002, ..., 124, 125 (3-digit chunks).
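The labeling rule can be sketched in Python: pad every label with leading zeros to the number of digits in the population size. `make_labels` is a hypothetical helper, and it assumes labels start at 1, as in the examples above.

```python
def make_labels(population_size):
    """Hypothetical helper: labels 1..population_size, all with the same number of digits."""
    width = len(str(population_size))  # 9 -> 1 digit, 15 -> 2 digits, 125 -> 3 digits
    return [str(i).zfill(width) for i in range(1, population_size + 1)]

print(make_labels(9))         # ['1', '2', '3', '4', '5', '6', '7', '8', '9']
print(make_labels(15)[:3])    # ['01', '02', '03']
print(make_labels(125)[-2:])  # ['124', '125']
```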
Pick a place (at random) in the table and start reading across in chunks of that size. Continue until you have n eligible numbers (skip numbers that aren't labels, and discard repeats).
Reading row 105:    95592  94007  69971  91481  60779  53791  17297  59335 ...
From 9 people, a sample of n = 5:    9, 5, 5, 9, 2, 9, 4, 0, 0, 7, 6, ...    (the sample is individuals 9, 5, 2, 4, 7; repeats and 0, which is not a label, are skipped)
From 15 people:  95, 59, 29, 40, 07, 69, 97, 19, 14, 81, 60, 77, ... Keep reading;
go to the next line (or back to the top line) if you need more. So far, individuals 7 and 14 are chosen from this line.
From 125 people:  955, 929, 400, 769, 971, 914, 816, 077, 953, 791, 172, 975, 933, 5... Keep reading, go to the next line, etc. So far, individual 77 is the only one chosen.
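The reading procedure can be checked against the three examples above. A minimal sketch: `srs_from_digits` is a hypothetical helper that reads left to right in fixed-width chunks, skips chunks that are not labels 1 through N, and discards repeats. It only uses the digits it is given, so for 15 and 125 people it stops short of n; that is when you would move on to the next line.

```python
def srs_from_digits(digits, population_size, n):
    """Hypothetical helper: read random digits in fixed-width chunks and
    return up to n distinct labels between 1 and population_size."""
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop the spaces between groups
    width = len(str(population_size))                      # same number of digits per label
    chosen = []
    # Step through complete chunks only; a leftover partial chunk is ignored.
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen

row_105 = "95592 94007 69971 91481 60779 53791 17297 59335"
print(srs_from_digits(row_105, 9, 5))    # [9, 5, 2, 4, 7]
print(srs_from_digits(row_105, 15, 5))   # [7, 14]
print(srs_from_digits(row_105, 125, 5))  # [77]
```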

Why the same number of digits in each label? Any particular 3-digit chunk is as likely as any other 3-digit chunk (1 chance in 1,000). But any particular 1- or 2-digit chunk (1 chance in 10, or 1 in 100) is more likely than any particular 3-digit chunk. So 2 will come up more often than 12, but 02 will come up just as often as 12.
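The chances can be verified by listing every equally likely string of k digits and counting how many spell out a given chunk. A minimal sketch; `chance_of_chunk` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

def chance_of_chunk(chunk):
    """Hypothetical helper: exact chance that the next len(chunk) random digits spell out chunk.

    Lists all 10**k equally likely strings of k digits and counts the matches.
    """
    k = len(chunk)
    matches = sum(1 for digits in product("0123456789", repeat=k) if "".join(digits) == chunk)
    return Fraction(matches, 10 ** k)

print(chance_of_chunk("2"))    # 1/10
print(chance_of_chunk("12"))   # 1/100
print(chance_of_chunk("02"))   # 1/100
print(chance_of_chunk("077"))  # 1/1000
```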

Why across? For consistency on HW, read the way the book says (so you get the answer in the book). In practice, you can read up, down, or backwards, as long as you decide beforehand and don't change in the middle of choosing the sample.
+ + + + + + + + + + + + + + + + + +

Some more sources of bias, even in probability samples (pp. 212-14):
**Undercoverage: Some groups in the population are left out, or slighted, in the process of choosing the sample.

One possible source of undercoverage is the sampling frame (Moore p. 221, problem 8.42): the group from which the sample is actually chosen, as distinct from the population, the group you want information about. Unfortunately, the sampling frame is often smaller than the population (it is often a "list" that already exists). The sample is (usually much) smaller than the sampling frame.
**Nonresponse: The "chosen" sample may not turn out to be the actual sample if some individuals don't (won't / can't) respond.
**Response bias: lies, bad memory, pleasing the interviewer (nutrition surveys), interview technique.
**Wording of questions: Confusing? Leading? Limiting choices? The order in which questions were asked?
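A small simulation shows what "systematically favors certain outcomes" looks like when the sampling frame misses part of the population. Everything here is a hypothetical illustration, not data: a population of 10,000 in which 3,000 people are not on the list, and the off-list group answers "yes" less often. Even perfectly random samples from the frame miss the population value, sample after sample.

```python
import random

rng = random.Random(2012)

# Hypothetical population: 1 = "yes", 0 = "no".
on_list = [1] * 4200 + [0] * 2800    # 7,000 people on the list: 60% yes
off_list = [1] * 600 + [0] * 2400    # 3,000 people not on the list: 20% yes
population = on_list + off_list      # 10,000 people: 48% yes
frame = on_list                      # undercoverage: samples can only come from the list

true_p = sum(population) / len(population)

# 1,000 simple random samples of size 100, all drawn from the frame.
estimates = [sum(rng.sample(frame, 100)) / 100 for _ in range(1000)]
avg_estimate = sum(estimates) / len(estimates)

print(f"population proportion: {true_p:.2f}")                            # 0.48
print(f"average of 1,000 estimates from the frame: {avg_estimate:.2f}")  # about 0.60
```

Taking larger samples from the same frame would not help: the estimates would cluster even more tightly, but around 0.60 rather than 0.48.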

Suppose we've done it right....
A random sample (p. 210) comes from a design in which impersonal chance is used to pick the individuals. The simple random sample (SRS) (p. 205) is the most straightforward. More sophisticated methods are often used (pp. 210-11), but knowing them is optional this term.
+ + + + + + + +
We want to use the sample to make an inference about the population.  A sample will never exactly represent the population, but larger (RANDOM) samples give more accurate results than smaller random samples.
(Almost always. We will quantify "more accurate" and "almost always" in Chapter 14.)
(Not in text: Surprisingly (?), this usually isn't because you have more of the population. A tablespoon of soup gives a pretty good sample, whether it's from a quart of soup or a 10-gallon vat (as long as it's well-stirred). A toothpickful does not.)
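A simulation sketch of the soup claim, using hypothetical, well-stirred populations in which half the individuals are 1s. `spread_of_sample_proportions` (a hypothetical helper) draws many SRSs and reports the standard deviation of the sample proportions, a rough measure of how far a single sample tends to land from the truth. The spread shrinks a lot going from n = 10 to n = 100, but barely changes going from a population of 1,000 to one of 100,000.

```python
import random

def spread_of_sample_proportions(population_size, n, reps=2000, p=0.5, seed=0):
    """Hypothetical helper: standard deviation of the sample proportion over
    `reps` simple random samples of size n from a population in which a
    fraction p of the individuals are 1s and the rest are 0s."""
    rng = random.Random(seed)
    ones = round(p * population_size)
    population = [1] * ones + [0] * (population_size - ones)
    props = [sum(rng.sample(population, n)) / n for _ in range(reps)]
    mean = sum(props) / reps
    return (sum((x - mean) ** 2 for x in props) / (reps - 1)) ** 0.5

for N in (1_000, 100_000):    # "a quart" vs. "a 10-gallon vat"
    for n in (10, 100):       # "toothpickful" vs. "tablespoon"
        print(f"N = {N:>7,}  n = {n:>3}  spread = {spread_of_sample_proportions(N, n):.3f}")
```

The spread depends mainly on n; the population size matters only when the sample is a sizable fraction of the population.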

More discussion of terms used in sampling

 Sievers home Math151-Sp12/Days20.htm 2:30pm 3/7/12
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.