Math 151, Fall 2007. Wednesday, Day 18, Oct. 3. Hit reload (updated after class).

HW:  (Re)read pp. 133-136. Ch. 7 (summary review). Skip Chapter 6. Read p. 186. Read Chapter 8. Read p. 200 (Other designs) last. Check p. 206: 8.17-22 and 8.26 first, then 8.23-25 with Table B. Ahead: Chapter 9.

Hand in Friday, Oct. 12
Residual plots: due Friday, Oct. 12 (Day 21), so you don't have to worry about needing SPSS or a computer over break. (These are the postponed parts from Day 17.)
 p. 129, 5.7 fuel residuals. You did the book questions. Now, in SPSS, make a variable containing the residuals (SPSS handout, bottom of p. 4; also middle-bottom of Day 16). The values should match the ones in the book/SPSS file.
SPSS Handout p. 3 (Governors' salaries):  You can now finish #12, the last question.  Hand it all  in now.
p. 133, 5.9 (SPSS) Farm population. You did a, b, c (read p. 132 for a good word to use in part c). Now make a variable containing the residuals, and plot it against the x (year) values. Draw (in pencil) a horizontal line at height 0. What pattern do you see in the residuals?
B.  Use Residuals.xls from the website or the lab to graph these data sets, along with a graph of the residuals. Print the results, and describe the shape of the residuals. (It may help to connect the dots in pencil to see the pattern.)
a)  x  1  2  8  4  6  9
    y  1  3  6  6  7  5
b)  x  1  2  7  4  6  9
    y  7  6  2  4  2  1
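If you want to check the spreadsheet's numbers by hand, here is a minimal sketch (plain Python, no SPSS or Excel assumed) of the least-squares line y-hat = a + b*x and the residuals (observed y minus predicted y) for data sets a) and b). The function names are made up for this example. Residuals are listed in order of x, which is the order you read them off a residual plot.

```python
# Least-squares line y-hat = a + b*x and residuals, computed by hand.

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x-bar, y-bar)
    return a, b

def residuals(xs, ys):
    """Residual = observed y - predicted y, one per data point."""
    a, b = least_squares(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

data = {
    "a": ([1, 2, 8, 4, 6, 9], [1, 3, 6, 6, 7, 5]),
    "b": ([1, 2, 7, 4, 6, 9], [7, 6, 2, 4, 2, 1]),
}

for name, (xs, ys) in data.items():
    a, b = least_squares(xs, ys)
    # Sort by x so the residuals read left to right, like the residual plot.
    pairs = sorted(zip(xs, residuals(xs, ys)))
    print(f"{name}) y-hat = {a:.4f} + {b:.4f} x")
    for x, r in pairs:
        print(f"   x = {x}: residual = {r:+.4f}")
```

A quick check on any residual calculation: the residuals from a least-squares line always add up to 0 (up to roundoff).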

& & & & & & & & & & & & & &
Hand in NOTHING the first class after break (Wed. Oct. 10, Day 20).
Enjoy the break and/or read/work ahead.

p. 136 5.13 hospitals: big = bad?

p. 192, 8.1, 8.2, 8.3 experiment or observational study?
p. 207, 8.27 Alcohol & heart attacks

p. 194, 8.4, 5, 6 population/sample
. . . . . .

p. 195, 8.7 Sampling badly on campus
- - - - - - -
p. 199, 8.9 Apartment living, SRS. Use Table B.
p. 209, 8.36 Area code sample, SRS  Use Table B.
p. 211, 8.45 random digit dialing
p. 210, 8.41 random digit characteristics

p. 209-10, 8.38 b only Traffic lights
p. 208, 8.30 movie viewing

Read, to discuss 

p. 136,  5.12 lurking variables
p. 208, 8.29 safety of anesthetics
p. 192, 8.3 TV & aggression (lurking)
 . Postpone. .

p. 195, 8.8 more Sampling badly on campus
- - - -
p. 211, 8.47 guns

p. 204, 8.14, 8.15 biases.
p. 208, 8.31 world affairs
p. 211, 8.46 wording survey questions


Optional 
p. 136, 5.11, lurking variables 
- - -
. Postpone. .

p. 209, 8.35 Use Table B (more practice)

p. 209, 8.34 seat belt use


Pick a digit (from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9). Write it down. Write it to the left of your name on the sign-in sheet.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Exam 2 next class: Day 19 (Fri. Oct. 5, the day before break). If you want to start early Friday (after 9:15), put your desired start time on the second sheet on the sign-in board. Email or phone me (607-257-7641 Th, x3210 F) about any emergencies!
The exam starts with Ch. 3 (Normal distributions, tables) and goes through Ch. 4 and almost all of Ch. 5:
 Finding r (correlation coefficient) from a batch of data: no. Guesstimating r from a scatterplot: yes.
 Finding a residual: yes. Residual plots (p. 128): no.
 Influential points, effect of outliers, extrapolation: yes.
 Association/causation (pp. 134-36): no.
One sheet of notes allowed. I will give you paper copies of the Normal table.

Sample exam: handout outside my door after class, and linked here.
     Solutions: one copy outside my door; also linked here.
You can do ALL the problems.

HW questions?  Day 17

--Don't trust summary data alone.
You need to see the scatterplot to judge how suitable the summary numbers are.
     ("Anscombe's quartet," Moore p. 142, 5.34.) (Overhead slide last time. You can reconstruct these pictures using SPSS and Moore's problem, if you like.)
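Here is a sketch of what the quartet shows, in plain Python. The data values are Anscombe's published ones as they are commonly reproduced; I'm assuming they match Moore's table in 5.34, so double-check before relying on them. The code computes r with Moore's formula (average product of standardized values, dividing by n - 1) and the least-squares line from b = r*sy/sx and a = y-bar - b*x-bar.

```python
# Anscombe's quartet: four data sets with (nearly) the same summary numbers.
import statistics

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared x-values for sets I-III
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summary(xs, ys):
    """Return (x-bar, y-bar, r, a, b) for one data set."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)   # divide by n - 1
    # Correlation r: average product of standardized values.
    r = sum((x - x_bar) / sx * (y - y_bar) / sy for x, y in zip(xs, ys)) / (n - 1)
    b = r * sy / sx          # slope of the least-squares line
    a = y_bar - b * x_bar    # intercept
    return x_bar, y_bar, r, a, b

for name, (xs, ys) in quartet.items():
    x_bar, y_bar, r, a, b = summary(xs, ys)
    print(f"{name:>3}: x-bar = {x_bar:.2f}  y-bar = {y_bar:.2f}  "
          f"r = {r:.3f}  y-hat = {a:.2f} + {b:.3f} x")
```

All four sets print r of about 0.816 and a line of about y-hat = 3.00 + 0.500x, yet the scatterplots look completely different: roughly linear, curved, linear with one outlier, and one influential point doing all the work.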
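--Extrapolation: using the regression line to predict y far outside the range of x-values used to obtain the line.
Watch out for it: such predictions are often not accurate.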

--Examples from HW involving extrapolation and residual plots:
     ex. 7.28 Soap: data, output
     ex. 5.9 Farm population: data, output
   Your computation of the predicted value for the year 2000 may differ quite a bit from the book's. It's roundoff error: the x-values are so big, in the thousands, that the small roundoff error in a slope b given with only 2 decimal places gets multiplied by x and can become large (here, 0.0032*2000 = 6.4). (Often people use a different scale for years, say 1900 = year 0, to lessen this kind of roundoff error.)
Ours:  1166.93 - .59*2000 = 1166.93 - 1180 = -13.07
Theirs:  1166.93-.5868*2000 =  1166.93-1173.6 = -6.67
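A small sketch of the two computations above, using the intercept 1166.93 and the slopes -.5868 and -.59 from the notes. The rescaled version (years counted from 1900) is my own illustration, not the book's: I assume the intercept is shifted to a + b*1900 so it is the same fitted line, and then round only the slope, to show that the damage from rounding b is (rounding error)*x.

```python
# Roundoff in the slope gets multiplied by x, so big x-values magnify it.
a = 1166.93
b_full = -0.5868     # slope as given to 4 decimal places
b_rounded = -0.59    # same slope rounded to 2 decimal places

def predict(a, b, x):
    """Predicted y from the least-squares line y-hat = a + b*x."""
    return a + b * x

theirs = predict(a, b_full, 2000)
ours = predict(a, b_rounded, 2000)
print(f"theirs = {theirs:.2f}, ours = {ours:.2f}, gap = {theirs - ours:.2f}")

# Hypothetical rescaling: year 0 = 1900. Shift the intercept so the line is unchanged.
a_1900 = a + b_full * 1900
theirs_1900 = predict(a_1900, b_full, 2000 - 1900)
ours_1900 = predict(a_1900, b_rounded, 2000 - 1900)
print(f"from 1900: theirs = {theirs_1900:.2f}, ours = {ours_1900:.2f}, "
      f"gap = {theirs_1900 - ours_1900:.2f}")
```

The gap is 0.0032*2000 = 6.4 with the original years, but only 0.0032*100 = 0.32 when years are counted from 1900.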

Questions for exam?
- - -Exam material ends here - - -
Finishing Ch. 5: 
 
Plotting residuals: Day 16
  SPSS makes residuals: Day 16
 Class today: we did the residuals work above. We talked about lurking variables on Monday.
A "lurking" variable has an important effect on the relationship among the variables in a study, but is not itself one of the variables studied. (Day 16)

Association does not imply causation. (Day 16)
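To see how a lurking variable can create a strong association with no causation at all, here is a hypothetical simulation (the model, the noise sizes, and the seed 151 are all made up for illustration). A lurking variable z drives both x and y; x has no effect on y. Then, mimicking "keep lurking variables under control," we look only at individuals whose z is nearly the same.

```python
# Hypothetical simulation: lurking variable z drives both x and y.
import random
import statistics

def corr(xs, ys):
    """Correlation r: average product of standardized values (divide by n - 1)."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

rng = random.Random(151)                       # fixed seed so the run repeats
z = [rng.gauss(0, 1) for _ in range(1000)]     # lurking variable
x = [zi + rng.gauss(0, 0.5) for zi in z]       # x depends only on z
y = [zi + rng.gauss(0, 0.5) for zi in z]       # y depends only on z (not on x)

print(f"all individuals: r(x, y) = {corr(x, y):.2f}")

# Hold the lurking variable (nearly) fixed: keep only |z| < 0.1.
held = [(xi, yi) for zi, xi, yi in zip(z, x, y) if abs(zi) < 0.1]
r_held = corr([p[0] for p in held], [p[1] for p in held])
print(f"z held nearly fixed ({len(held)} individuals): r(x, y) = {r_held:.2f}")
```

With everyone included, r comes out strong (the model's theoretical value is 0.8); once z is held nearly fixed, the association between x and y essentially disappears.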

Establishing that x "causes" y is difficult:
    Best: Do an experiment in which we change x and keep lurking variables under control. (E.g., rats; Ch. 9)
    Otherwise, look for: a strong association; consistency over many studies; larger values of x associated with stronger responses y; x preceding y in time; a plausible mechanism (parallel studies?).

  E.g., proposed some years ago:
    Partially hydrogenated oils ("trans fats") --> heart disease? Yes. Homocysteines --> heart disease? Unclear.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Chapters 1 through 5 have covered analyzing data that was given to us--what it said about itself.
    Informally, develop guesses, suspicions, hypotheses about the world the data came from.
From Exploration to Inference (p. 186)

Ch. 8 & 9: Producing Data. Aim: create data sets that will allow us to make inferences about a larger world than just the data we have.

  Observational study: Observes individuals and measures variables, but does not influence the responses. (Ch. 8)
                 Sometimes we observe individuals who are (more or less) conveniently at hand, or, better,
                  take a sample from a population and examine it. (Ch. 8)
  Experiment: Imposes a treatment on individuals to see how the treatment influences the response. (Ch. 9)

Confounding: Two variables (explanatory or lurking) are confounded when their effects on a response variable cannot be sorted out from each other. (Rats: does mothers' grooming cause sociability, or do offspring inherit sociability from mothers who like to groom?)

Ch. 8 p. 192ff.  Sampling
>>Population: Entire group  that we want information about.
>>Sample: The part of the population we actually examine.
        Hope:  Sample will be representative of the population.
>> Sampling design:  Describes exactly how sample is to be chosen from population.

(SAMPLING) BIAS:  The design of a study is biased if it systematically favors certain outcomes.
   Check our class "sample" of digits (from the sign-in sheet).
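One way to check the class digits for bias: if people really picked at random, each digit 0-9 would show up about n/10 times. A minimal tally sketch follows; the list `picks` is hypothetical placeholder input, not our class data, so substitute the digits from the sign-in sheet.

```python
# Tally picked digits and compare each count with n/10 (the count expected
# if every digit 0-9 were equally likely).
from collections import Counter

def digit_tally(picks):
    """Return (digit, observed count, expected count) for each digit 0-9."""
    counts = Counter(picks)
    expected = len(picks) / 10
    return [(d, counts.get(d, 0), expected) for d in range(10)]

picks = [3, 7, 7, 1, 9, 4, 7, 2]   # hypothetical input; replace with the class's digits
for d, observed, expected in digit_tally(picks):
    print(f"{d}: observed {observed}, expected {expected:.1f}")
```

A digit that shows up far more (or less) often than expected is a sign that "picking a number in your head" systematically favors certain outcomes, which is exactly what bias means.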

Sample survey:  (attempt to) choose a representative sample from a large, varied population. Not Easy!
    Some issues:  What population do we want to understand?  What exactly do we want to measure?

Non-probability samples (sampling badly):


Simple Random Sample (SRS) of size n: n individuals chosen in such a way that every possible set of n individuals has an equal chance of being chosen. A probability sample (p. 200).
HOW? Use a chance mechanism: cards, dice, a computer program, or a
Table of random digits (simulates rolling a ten-sided die with faces 0, 1, ..., 9 over and over) (Table B, p. 686).
    Every digit, every sequence of digits, is equally likely to be "next" in any direction.
To use:  label everyone in the population with a number.
    Important:  Every labeling number needs the same number of digits.
    To label 9 people, use the labels 1, 2, 3, ..., 9 (1-digit chunks)
    To label 15 people, use the labels 01, 02, ..., 10, 11, ..., 15 (2-digit chunks)
    To label 125 people, use the labels 001, 002, ... 124, 125 (3-digit chunks)
Pick a place (at random) in the table and start reading across in chunks of that size. Keep the first n eligible numbers (discard repeats and numbers that are not labels).
                    Read Row 150:   07511   88915   41267   16853   84569   79367 ..
From 9 people, a sample of n = 5: 0, 7, 5, 1, 1, 8, 8, 9, 1, 5, 4, ... (0 is not a label and repeats are skipped, so the sample is individuals 7, 5, 1, 8, 9)
From 15 people, a sample: 07, 51, 18, 89, 15, 41, 26, 71, 68, 53, 84, 56, 97, 93, 67, ... keep reading;
    go to the next line (or back to the top line) if you need more. Individuals 7, 15, ... are chosen using this line.
From 125 people, a sample: 075, 118, 891, 541, 267, 168, 538, 456, 979, 367, ... keep reading. Individuals 75, 118, ...
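The Table B procedure above can be sketched in a few lines: read the digits in fixed-width chunks, keep a chunk only if it is a valid label (1 to N), skip repeats, and stop after n. The function name is made up, and only the part of row 150 quoted above is used (the real row continues).

```python
# Sketch of choosing an SRS from a line of random digits (Table B style).

ROW_150 = "07511 88915 41267 16853 84569 79367"   # the digits quoted above

def table_sample(digits, width, N, n):
    """Read chunks of `width` digits; keep labels 1..N, skip repeats, stop at n."""
    digits = digits.replace(" ", "")
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        if 1 <= label <= N and label not in chosen:   # eligible and not a repeat
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen   # fewer than n means: keep reading on the next line

print(table_sample(ROW_150, 1, 9, 5))     # [7, 5, 1, 8, 9]
print(table_sample(ROW_150, 2, 15, 5))    # [7, 15]
print(table_sample(ROW_150, 3, 125, 5))   # [75, 118]
```

The last two calls come back short of n = 5, which is the "keep reading on the next line" situation.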

    Why the same number of digits in each label? Each particular 3-digit chunk is exactly as likely as any other 3-digit chunk (chance 1/1000). But a particular 1- or 2-digit chunk is more likely (chance 1/10 or 1/100) than any particular 3-digit chunk. So with labels of mixed lengths, 2 would come up more often than 12; but 02 comes up just as often as 12.

    Why across? For consistency on HW, go the way the book says (so you get the book's answer). In practice, you can read up, down, or backwards, as long as you decide beforehand and don't change in the middle of choosing the sample.
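The "computer program" option from the HOW list can be sketched with Python's standard `random.sample`, which picks n distinct individuals so that every possible set of n is equally likely, which is exactly the definition of an SRS. The population of 125 labels and the seed are hypothetical.

```python
# Sketch: choose an SRS with a computer program instead of Table B.
import random

def srs(population, n, seed=None):
    """Return an SRS of size n (sorted for easy reading)."""
    rng = random.Random(seed)   # a fixed seed makes the choice reproducible
    return sorted(rng.sample(population, n))

people = list(range(1, 126))    # hypothetical population labeled 1..125
print(srs(people, 5, seed=2007))
```

Fixing the seed beforehand plays the same role as deciding the direction of reading beforehand: it keeps you from "shopping" for a sample you like.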
+ + + + + + + + + + + + + + + + + +

Some more sources of bias, even in probability samples (p. 201-3):
**Undercoverage:  Some groups in the population are left out, or slighted,  in the process of choosing the sample.
  
One possible source of undercoverage is the sampling frame (Moore p. 211, problem 8.45): the group from which the sample is actually chosen, as distinct from the "population," the group you want information about. Unfortunately, the sampling frame is often smaller than the population. (It is often a "list" that already exists.) The sample is (usually much) smaller than the sampling frame.
** "Chosen" sample may not turn out to be actual sample, if some individuals don't respond--"Nonresponse".
**Response bias: lies, bad memory, wanting to please the interviewer (nutrition surveys), interview technique.
**Wording of questions: Confusing? Leading? Limiting choices?
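Why undercoverage is a bias, not just bad luck: a hypothetical sketch (the population of 1000, the groups A and B, and their values are all made up). The sampling frame lists only group A, so even a perfectly good SRS from the frame can never include group B, and a bigger sample doesn't help.

```python
# Hypothetical undercoverage: the sampling frame leaves out group B.
import random

# Hypothetical population: 600 people in group A (value 1), 400 in group B (value 0).
population = [("A", 1)] * 600 + [("B", 0)] * 400
frame = [p for p in population if p[0] == "A"]   # the frame lists only group A

def mean_value(people):
    """Average of the 0/1 values, i.e. the proportion with value 1."""
    return sum(v for _, v in people) / len(people)

rng = random.Random(8)
sample = rng.sample(frame, 100)   # a genuine SRS... of the frame, not the population
print(f"population mean = {mean_value(population):.2f}")
print(f"sample mean     = {mean_value(sample):.2f}")
```

Every sample drawn this way gives 1.00 while the population value is 0.60: the design systematically favors one outcome.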

Math151-Fall07/Dayf18.htm  1:15pm 10/3/07
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.