Math 151, Spring 2007, Monday, Day 19, Mar. 12. After class: hit reload (corrected 10:45am 3/14).

HW: Reading: Reread and finish Ch. 5: the equation of the least-squares line (p. 120) and Fact 2 (p. 123). Continuing regression, pp. 126-137. Next, read Ch. 7, summary. (Skip Ch. 6.) Next, Ch. 8, 9.

Hand in  Wednesday

pp. 143-144, 5.35, 5.37 (SPSS): Drilling into the past, silicon (one clear outlier). To graph the lines with and without the outlier on the same graph, make a new variable with a 1 in every case except the outlier, and give the outlier a 0. Then use this variable as your legend or panel variable. You'll also get a "nuisance" horizontal line at the outlier; ignore it.

Residuals
p. 129, 5.7 (SPSS): Does fast driving waste fuel? (residuals). There is a data file for problem 5.7, and its third column is the residuals. Do all the parts.
Also with 5.7: in SPSS, make a variable containing the residuals (Handout, bottom of p. 4; also the bottom of this page). The values should match the ones in the book/SPSS file.

SPSS Handout p. 3 (Governors' salaries): You can now finish #12, the last question. Hand it all in Wednesday!

B. Use Residuals.xls from the website or the lab to graph these data sets, along with a graph of the residuals. Print the results, and describe the shape of the residuals. (It may help to connect the dots in pencil to see the pattern.)
a) x: 1 2 8 4 6 9
   y: 1 3 6 6 7 5
b) x: 1 2 7 4 6 9
   y: 7 6 2 4 2 1
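If you want to check your Excel graphs, the same least-squares line and residuals can be computed in a few lines of Python. This is a minimal sketch, not a copy of how Residuals.xls works; the function names least_squares and residuals are made up for illustration. It fits yhat = a + bx to each data set and lists the residuals in order of increasing x, the order in which you would connect the dots.

```python
# Least-squares line and residuals for data sets a) and b) in part B.
# A plain-Python sketch; it does not reproduce the Residuals.xls spreadsheet.

def least_squares(x, y):
    """Return (a, b) for the least-squares line yhat = a + b*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    a = ybar - b * xbar  # Fact 3: the line passes through (xbar, ybar)
    return a, b

def residuals(x, y):
    """Residual = observed y - predicted y, listed in order of increasing x."""
    a, b = least_squares(x, y)
    return [(xi, yi - (a + b * xi)) for xi, yi in sorted(zip(x, y))]

data = {
    "a": ([1, 2, 8, 4, 6, 9], [1, 3, 6, 6, 7, 5]),
    "b": ([1, 2, 7, 4, 6, 9], [7, 6, 2, 4, 2, 1]),
}

for name, (x, y) in data.items():
    a, b = least_squares(x, y)
    print(f"{name}) yhat = {a:.3f} + {b:.3f}x")
    for xi, e in residuals(x, y):
        print(f"   x = {xi}: residual = {e:+.3f}")
```

Reading each list of residuals from top to bottom traces the same pattern that the pencil line through the residual plot would show.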

p. 133, 5.9 Farm population (SPSS): Do a, b, c (read p. 132 for a good word to use in part c). Also, make a variable containing the residuals, and plot it against the x (year) values. Draw (in pencil) a horizontal line at height 0. What pattern do you see in the residuals?

p. 179, 7.28, 7.29, 7.30 (SPSS): Soap in the shower. Also, look carefully at the graph and guess why there is no data after day 21. (Read p. 132 for the word that describes using the line for day 30, and for a discussion of the issue.)
Postpone:
p. 136, 5.13, hospitals: big = bad?

Read, to discuss
  If you haven't already, look at this, especially with reference to the r standard deviations in y for every 1 standard deviation in x: A. Open the Excel file RegressionSlope (or in the folder RegressionDemosExcel for D&V in ClassMaterial\Math151 D&V). Change x-y values in the yellow boxes and watch the line change. Change x-values in col. F and watch the "run" (red line) change in the rightmost 2 graphs. Notice that the slope = the coefficient of x = rise/run = increase in y per unit increase in x. Fix it so the increase in x (the "run") is exactly 1. Also, look at the leftmost graph, where the lengths of the standard deviations are shown, and note that in standard-deviation units the rise is r s.d.'s in y for each 1 s.d. run in x.
C. Use the Correlation/regression applet at http://www.whfreeman.com/BPS4e. Make a cloud of data (about 15 points) and put in the regression line. Play with an outlier: drag a point to the far left (or right) and drag it up and down.
Try it again with the point in the middle range of x's. (Drag it up and down.) Answer: Where is it most influential? Now add a bunch more points (50 is the max). Play with an outlier again. Does the outlier have more or less influence with a larger data set?
Postpone:

p. 136, 5.12, lurking variables
Optional:
Postpone:
p. 136, 5.11, lurking variables 

Exams not finished; sorry!
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
HW questions?  Day 16
Review: Facts 2 & 3 give the line formula! (Moore pp. 123-125) (Day 16)

2.   A change of one standard deviation in x corresponds to a change of r standard deviations in y, along the regression line.
The slope b expresses change in y-units per x-unit. (Suppose x is inches and y is pounds; then b is in pounds per inch.) You can find b by multiplying r by the standard deviation of the y's (that's in pounds) and dividing by the standard deviation of the x's (that's in inches).
In "algebra", b = r times (s.d. of y)/(s.d. of x)  (Equation p. 120)
       If we standardize both the x-values and the y-values, the slope simply equals r! (See RegressionSlope.xls.)
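A quick numerical check of Fact 2, as a minimal Python sketch. The heights (inches) and weights (pounds) below are made-up numbers, not data from the book, and the helper names corr, ls_slope and standardize are hypothetical. Standard deviations are sample standard deviations (dividing by n - 1). The slope found directly by least squares agrees with r times sy/sx, and after standardizing both variables the slope comes out equal to r.

```python
from statistics import mean, stdev

def corr(x, y):
    """Correlation r: average product of standardized values (divide by n - 1)."""
    n = len(x)
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum((xi - mx) / sx * (yi - my) / sy for xi, yi in zip(x, y)) / (n - 1)

def ls_slope(x, y):
    """Least-squares slope computed directly as Sxy / Sxx."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def standardize(v):
    """Convert each value to a z-score: (value - mean) / s.d."""
    m, s = mean(v), stdev(v)
    return [(vi - m) / s for vi in v]

# Hypothetical data, made up for illustration.
height = [62, 65, 66, 68, 70, 73]        # inches
weight = [120, 140, 135, 155, 165, 180]  # pounds

r = corr(height, weight)
b_formula = r * stdev(weight) / stdev(height)  # Fact 2: b = r * sy / sx (pounds per inch)
b_direct = ls_slope(height, weight)
b_standardized = ls_slope(standardize(height), standardize(weight))

print(f"r = {r:.4f}")
print(f"b = r*sy/sx = {b_formula:.4f} lb/in;  least-squares b = {b_direct:.4f} lb/in")
print(f"slope after standardizing both variables = {b_standardized:.4f}")
```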

3.   The regression line goes through the point given by the two means, (xbar, ybar).
If you know this, you know ybar = a + b(xbar). Solving for a gives a = ybar - b(xbar). (Other equation, p. 120)

The line formula yhat = a + bx from xbar, ybar, sx, sy, r:
     Find b:  b = r sy / sx       (Fact 2: r is the slope if x and y are standardized. Equation p. 120)
     Find a:  Solve ybar = a + b xbar for a:  a = ybar - b xbar    (Fact 3: (xbar, ybar) lies on the regression line(s). Equation p. 109)
 Example: xbar = 5, ybar = 8, sx = 10, sy = 6, r = -.3:
        b = -.3×6/10 = -0.18.   8 = a + (-0.18)×5 = a - 0.9, so a = 8.9.   yhat = 8.9 - 0.18x
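The two steps (find b, then solve for a) fit in a few lines of Python. This is a sketch; the function name line_from_summary is made up for illustration. With the example's numbers, b = -0.3 × 6 / 10 = -0.18 and a = 8 - (-0.18)(5) = 8.9.

```python
def line_from_summary(xbar, ybar, sx, sy, r):
    """Return (a, b) for yhat = a + b*x from the five summary statistics."""
    b = r * sy / sx       # Fact 2: slope = r times sy/sx
    a = ybar - b * xbar   # Fact 3: the line passes through (xbar, ybar)
    return a, b

a, b = line_from_summary(xbar=5, ybar=8, sx=10, sy=6, r=-0.3)
print(f"yhat = {a:.2f} + ({b:.2f})x")  # yhat = 8.90 + (-0.18)x
```

As a check, plugging x = xbar = 5 into the line gives back ybar = 8.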

New:
Least Squares Property, Residuals, Cautions (details Day 17)
"Residual at x" = y - yhat = observed y - predicted y = "prediction error" (p. 119)
Least squares principle: Find the line that minimizes the sum of the squared residuals. Works well for an oval-shaped cloud of points.
     Outliers get their residual distance squared, so they may be very influential in determining where the line sits.
             Especially at the lowest or highest x-values, an outlier may change the slope of the line a lot.
            (Outliers toward the middle x's may not change the slope much, but may affect r and r².)
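A small experiment in the spirit of the applet, as a Python sketch with made-up data: nine points lying exactly on y = 2 + x, plus one outlier 10 units above that line, placed either at the mean x (x = 5) or beyond the largest x (x = 15). Because least squares squares each residual, that single point pulls on the fitted line. The helper name fit is hypothetical.

```python
from math import sqrt

def fit(x, y):
    """Return (a, b, r): least-squares intercept, slope, and correlation."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b, sxy / sqrt(sxx * syy)

base_x = list(range(1, 10))
base_y = [2 + xi for xi in base_x]  # hypothetical points exactly on y = 2 + x, so r = 1

cases = {
    "no outlier": (base_x, base_y),
    # outlier 10 units above the line, at the mean x value (x = 5)
    "outlier at x = 5": (base_x + [5], base_y + [17]),
    # outlier 10 units above the line, beyond the largest x value
    "outlier at x = 15": (base_x + [15], base_y + [27]),
}

for label, (x, y) in cases.items():
    a, b, r = fit(x, y)
    print(f"{label:18} a = {a:6.3f}  b = {b:6.3f}  r = {r:.3f}")
```

With these made-up numbers, the middle outlier leaves the slope at 1 (it only lifts the intercept from 2 to 3) but drops r to about 0.63; the outlier at x = 15 raises the slope to 1.6 while r stays near 0.96.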
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Plotting residuals: Graph the residual values against x (or against the predicted y's) to remove, visually, the linear part of the association. Curvature or other structure may then stand out more clearly. A straight line is a "good" fit <=> there is no structure in the residuals.
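A text-only residual plot, as a minimal Python sketch. It assumes a made-up curved relationship, y = x² for x = 1 to 10, fits a straight line anyway, and prints each residual as a row of stars to the left (negative) or right (positive) of a "|" column that plays the role of the horizontal line at 0. The helper name least_squares is hypothetical.

```python
def least_squares(x, y):
    """Return (a, b) for the least-squares line yhat = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

# Hypothetical curved data: y = x**2 for x = 1, ..., 10.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]
a, b = least_squares(x, y)
res = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # observed y - predicted y

# Crude residual plot: "|" is the zero line; stars go left for negative
# residuals and right for positive ones (one star per unit, rounded).
for xi, e in zip(x, res):
    stars = "*" * round(abs(e))
    left = stars.rjust(12) if e < 0 else " " * 12
    right = stars if e >= 0 else ""
    print(f"x = {xi:2d}  residual = {e:+6.1f}  {left}|{right}")
```

Here the residuals are positive at both ends and negative in the middle, a U shape that is harder to see in the original scatterplot of y against x.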

SPSS: new variable of residuals: Do Analyze>Regression>Linear. Click your variables into Independent (X) and Dependent (Y).
Click the "Save..." button and check Residuals: Unstandardized. Click Continue, then OK. You'll get a new variable, the residuals. Use it on the vertical axis of a scatterplot: a "residual plot".

Cautions  pp. 132-136
Plot the data: correlation and the regression line describe only a linear relationship properly.
Correlation and regression are not resistant: outliers and influential points can change them a lot.

Extrapolation: extra (outside) + polation (putting a point). Using the line to predict outside the range of x's you have data for. Linear relationships don't go on forever; a straight line is often only a first approximation to a more complicated relationship.
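A sketch of why extrapolation is risky, assuming a made-up curved relationship y = x² that is observed only for x = 1 to 10. Inside that range the fitted line is a reasonable summary (r is about 0.97), but at x = 30 it predicts 308 when the actual value is 900. The helper name fit is hypothetical.

```python
from math import sqrt

def fit(x, y):
    """Return (a, b, r) for the least-squares line yhat = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b, sxy / sqrt(sxx * syy)

# Hypothetical curved relationship y = x**2, observed only for x = 1, ..., 10.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]
a, b, r = fit(x, y)
print(f"fitted on x = 1..10: yhat = {a:.1f} + {b:.1f}x, r = {r:.3f}")

for x_new in (5, 30):  # x = 5 is inside the data range; x = 30 is an extrapolation
    print(f"x = {x_new}: predicted {a + b * x_new:.1f}, actual {x_new ** 2}")
```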
Start here Wednesday:
A "lurking" variable has an important effect on the relationship but is not one of the variables studied.
    The trouble with lurking variables is that by definition you don't know they're there. 

Association does not imply causation. Establishing that x "causes" y is difficult (Ch. 9).


Math151-Sp07/Daysp19.htm, 10:45am 3/14/07
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.