Math 151, Fall 2007, Monday, Day 17, Oct. 1. After class. Hit reload...

Reading: Reread and finish Ch. 5: the equation of the least-squares line (p. 120) and Fact 2 (p. 123). Continuing regression, pp. 126-137. Read the Ch. 7 summary. (Skip Ch. 6.)
Hand in Wednesday. Also bring questions for the exam!
Residuals

p. 129, 5.7 (SPSS) Does fast driving waste fuel? Residuals. There is a data file for problem 5.7, and its third column is the residuals. Do all the parts.
POSTPONE THIS part: Also with 5.7, in SPSS, make a variable containing the residuals (Handout, bottom of p. 4; also middle to bottom of Day 16). The values should match the ones in the book/SPSS file.

POSTPONE THIS Part: SPSS Handout p. 3 (Governors' salaries):  You can now finish #12, the last question.  Hand it all  in (sometime).

p. 133, 5.9 (SPSS) Farm population. Do a, b, c (read p. 132 for a good word to use in part c). POSTPONE THIS part: Also, make a variable containing the residuals, and plot it against the x (year) values. Draw (in pencil) a horizontal line at height 0. What pattern do you see in the residuals?

POSTPONE THIS: B.  Use Residuals.xls from the website or the lab to graph these data sets, along with a graph of the residuals.  Print the results, and describe the shape of the residuals (it may help to connect the dots with pencil, to see the pattern.) 
a) x: 1 2 8 4 6 9
   y: 1 3 6 6 7 5
b) x: 1 2 7 4 6 9
   y: 7 6 2 4 2 1

p. 179, 7.28, 7.29, 7.30 (SPSS) Soap in the shower. Also, look carefully at the graph and guess why there are no data after day 21. (Read p. 132 for the word that describes using the line for day 30, and for a discussion of the issue.)

Postpone: p. 136 5.13 hospitals: big = bad?

Read, to discuss
 


C. Use the Correlation/regression applet at http://www.whfreeman.com/BPS4e. Make a cloud of data (about 15 points) and put in the regression line. Play with an outlier: drag a point to the far left (or right) and drag it up and down.
Try it again with a point in the middle range of x's. (Drag it up and down.) Answer: Where is it most influential? Now add a bunch more points (50 is the max). Play with an outlier again. Does the outlier have more or less influence with a larger data set?

p. 136,  5.12 lurking variables

Optional 
p. 179, 7.27 (review Normal)


p. 136, 5.11, lurking variables

Exam 2 this Friday: Day 19 (Oct. 5, the day before break). Please email me right away if you need to take the exam early (Wed. or Th.); it can only be done with prearrangement before Wed! Starts with Ch. 3, Normal distribution, tables. Goes through Ch. 4, and what we cover of Ch. 5 through Monday. One sheet of notes; I will give you paper copies of the Normal table.
Sample exam handout: outside my door after class, and linked here.
     Solutions: one copy outside my door, and linked here.
After today's class  you can do ALL the problems.

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Continuing with regression: summary so far--
The line formula yhat = a + bx tells us our best prediction or estimate of a response (y) value for a particular value of the explanatory variable (x). It says NOTHING about how good that "best" is; that is, it says nothing about how tight or scattered the data are around the line. R-squared does that job. Fact 4 (pp. 124-5): r^2 ("Coefficient of Determination") = fraction of the variability in the y-values explained/predicted by knowing x and using the least-squares regression line.
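
A small numerical check of Fact 4, as a sketch only: the data are made up for illustration (not from Moore), and the variable names are my own. It compares r^2 with 1 - (sum of squared residuals)/(total variation in y); the two should agree.

```python
from statistics import mean

# Made-up data for illustration only (not from the textbook).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

xbar, ybar = mean(x), mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)                       # variation in x
syy = sum((yi - ybar) ** 2 for yi in y)                       # total variation in y
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # co-variation

r = sxy / (sxx * syy) ** 0.5   # correlation
b = sxy / sxx                  # least-squares slope
a = ybar - b * xbar            # least-squares intercept

# Variation left over after using the line: sum of squared residuals.
leftover = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
fraction_explained = 1 - leftover / syy

print(f"r^2 = {r ** 2:.3f}, fraction explained = {fraction_explained:.3f}")
```

For these numbers both values come out to 0.6: the line accounts for 60% of the variation in y, and the other 40% is scatter around the line.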

Fact 1: Regressing Variable A on Variable B doesn't give the same line as regressing Variable B on Variable A: the line gives the "best" vertical value for a given horizontal value.
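
To see Fact 1 with numbers, here is a sketch using made-up data; the helper name least_squares is my own, not something from SPSS or the book. It fits y on x, then x on y, and redraws the second line on the same axes so the slopes can be compared.

```python
from statistics import mean

def least_squares(u, v):
    """Return (intercept, slope) of the least-squares line predicting v from u."""
    ubar, vbar = mean(u), mean(v)
    suv = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))
    suu = sum((ui - ubar) ** 2 for ui in u)
    slope = suv / suu
    return vbar - slope * ubar, slope

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a_yx, b_yx = least_squares(x, y)   # predict y from x: yhat = a_yx + b_yx * x
a_xy, b_xy = least_squares(y, x)   # predict x from y: xhat = a_xy + b_xy * y

# Redraw the second line with y on the vertical axis: y = (x - a_xy) / b_xy
slope_on_same_axes = 1 / b_xy

print(f"y on x: slope {b_yx:.2f}")
print(f"x on y, redrawn with y vertical: slope {slope_on_same_axes:.2f}")
```

Here the slopes are 0.6 and 1.0, so they really are two different lines. The two lines coincide only when r is exactly +1 or -1.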

Facts 2 & 3 give the line formula, and more! (Moore pp. 123-125) (For details see Day 15)
   b = r times (s.d. of y)/(s.d. of x)  (Equation p. 120)
   ybar = a + b (xbar).  Solve this for a: a = ybar - b (xbar).  (Other equation p. 120)
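
The two equations can be tried out directly. This is a minimal sketch with made-up data (not a textbook data set); it computes r from standardized values, then b and a exactly as written above.

```python
from statistics import mean, stdev

# Made-up data for illustration only (not from the textbook).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
xbar, ybar = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)    # sample standard deviations (divide by n - 1)

# Correlation r: sum of products of standardized values, divided by n - 1.
r = sum(((xi - xbar) / sx) * ((yi - ybar) / sy) for xi, yi in zip(x, y)) / (n - 1)

b = r * sy / sx        # slope:     b = r times (s.d. of y)/(s.d. of x)
a = ybar - b * xbar    # intercept: a = ybar - b (xbar)

print(f"r = {r:.4f}, b = {b:.4f}, a = {a:.4f}")
```

For these numbers the line is yhat = 2.2 + 0.6x. Notice that the line passes through the point (xbar, ybar) = (3, 4), which is what the second equation says.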

Homework questions? Day 16
New today:
Least Squares Property, and Residuals
  (Details Day 16)   Outline:

"Residual at x" = (y - yhat) = observed y - predicted y = "prediction error" (p. 119)
  Residual: each (x, y) data pair has one. The residual is the "leftover" amount of y after predicting y using the line. Visually, it is the length of the vertical line drawn from the data point to the regression line (+ if the point is above the line, - if the point is below the line).
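
A sketch of finding residuals by hand, again with made-up data and my own variable names: fit the line, then subtract the predicted y from each observed y.

```python
from statistics import mean

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

xbar, ybar = mean(x), mean(y)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
    (xi - xbar) ** 2 for xi in x
)
a = ybar - b * xbar

residuals = []
for xi, yi in zip(x, y):
    yhat = a + b * xi          # predicted y from the line
    residual = yi - yhat       # observed y - predicted y
    residuals.append(residual)
    side = "above" if residual > 0 else "below"
    print(f"x = {xi}: y = {yi}, yhat = {yhat:.1f}, residual = {residual:+.1f} ({side} the line)")

# Residuals from a least-squares line always add up to zero (apart from rounding).
print("residuals sum to zero:", abs(sum(residuals)) < 1e-9)
```

The positive and negative residuals cancel exactly, which is one reason the least-squares principle works with squared residuals instead of the residuals themselves.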
 
Least squares principle: "Least squares regression line" = the line that minimizes the sum of the squared residuals. Works well with "joint normal" data (elliptical clouds). For data of this sort, the line does give the mean of the y's for each given x (at least in the abstract).
BUT outliers get their residual distance squared: they may be very influential = change the slope of the line a lot.
           (This applies to outliers at low or high x's. Outliers toward the middle x's may not change the slope, but may affect r and r^2.)
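
The applet exercise can be imitated with a few lines of code. This is a sketch under simple assumptions: the "cloud" is nine made-up points lying exactly on the line y = x, and the helper slope_and_r is my own name. One outlier is added at a far-right x, then instead at the middle x.

```python
from statistics import mean

def slope_and_r(x, y):
    """Return (least-squares slope, correlation r) for paired data."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

# Made-up data: nine points exactly on y = x.
x = list(range(1, 10))
y = list(x)

base = slope_and_r(x, y)
far = slope_and_r(x + [20], y + [0])    # outlier far to the right, dragged down
mid = slope_and_r(x + [5], y + [15])    # outlier at the middle x, dragged up

print(f"no outlier:      slope {base[0]:.2f}, r {base[1]:.2f}")
print(f"outlier at x=20: slope {far[0]:.2f}, r {far[1]:.2f}")
print(f"outlier at x=5:  slope {mid[0]:.2f}, r {mid[1]:.2f}")
```

The far-right outlier drags the slope from 1 down to slightly below 0. The middle outlier leaves the slope at 1 but pulls r down to about 0.63.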
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Postponed: Finding one residual may be on the exam. Plotting or understanding the plot of all residuals will not be.
Plotting residuals:
Graph residual values against x (or against predicted y's): this visually removes the linear portion of the association. (No structure in the residuals = a straight line is a "good" fit.)

SPSS can make a new variable of residuals, Day 16 (bottoms of Handout pp. 3 and 4)
  Use this on the vertical axis of a scatterplot, against the original x's (or y's):  "Residual plot"
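
Outside SPSS, the same idea can be sketched in a few lines. The data here are made up and deliberately curved (y = x squared), so the residuals from a straight-line fit should show structure; the variable names are my own.

```python
from statistics import mean

# Made-up curved data for illustration only: y = x squared.
x = [1, 2, 3, 4, 5, 6, 7]
y = [xi ** 2 for xi in x]

xbar, ybar = mean(x), mean(y)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
    (xi - xbar) ** 2 for xi in x
)
a = ybar - b * xbar

# One residual per data point; these go on the vertical axis of a residual plot.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

for xi, res in zip(x, residuals):
    print(f"x = {xi}: residual {res:+.1f}")
```

The residuals run positive, then negative, then positive again: a U shape. That pattern is the structure left over after the straight line is removed, and it says a straight line is not a good description of these data.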


Cautions  pp. 132-136 Day 16
Plot the data: correlation and the regression line only describe a linear relationship properly, and they are not resistant to outliers or influential points.

Extrapolation = extra (outside) + polation (putting a point): using the line to predict outside the range of x's you have data for.
  Dangerous, though sometimes unavoidable.
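
A sketch of why extrapolation is dangerous. The heights below are invented for illustration (they are not real measurements and not from the book): a child's height at ages 2 through 8, fitted with a line, then used to "predict" height at age 30.

```python
from statistics import mean

# Invented data for illustration only: age in years, height in cm.
age = [2, 3, 4, 5, 6, 7, 8]
height = [86, 95, 102, 109, 115, 121, 127]

xbar, ybar = mean(age), mean(height)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(age, height)) / sum(
    (xi - xbar) ** 2 for xi in age
)
a = ybar - b * xbar

def predict(x):
    """Predicted height (cm) from the fitted line at age x."""
    return a + b * x

print(f"age 5  (inside the data range):  {predict(5):.0f} cm")
print(f"age 30 (far outside the range):  {predict(30):.0f} cm")
```

Inside the range of ages the line gives a sensible value. At age 30 it predicts a height of more than 2.5 meters, because the line knows nothing about what happens after age 8.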


A "lurking" variable has an important effect on the relationship, but is not one of the variables studied. Time sequence is a common one. Look behind every tree.

Exam (& Day 17 HW) ends here.

Association does not imply causation Day 16


Sievers home   Math151-Fall07/Dayf17.htm  3:30pm 10/1/07
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.