Math 151, Spring 2006, Day 16, Monday March 6. Hit reload after class.

Day 16 (Mon. Mar. 6): Reading: Read D&V Ch8 & Ch9 thru 165 top, Do AS8 Regression.
      Looking ahead: rest of Ch9 (AS9, lightly).
Hand in Wed.
SPSS Handout: Do problems 7, 8, 9, 11 p. 3.  Keep this with the previous work. 

(Rest: all D&V p. 153ff unless otherwise noted)
21 (SPSS) a, b, c & 23 a, b, c, d Used cars. (Do 23d by hand with your calculator, and sketch the "up-and-over" on the graph.) Keep a copy of your equation. The SPSS data file is missing a value! The point age 4, price 6995 has been omitted. Without it, the fit is price = 12519.62 - 940.04*age, R-square = .91. When the missing value is restored, we get price = 12319.59 - 924.0*age, R-square = .89. The graphs don't look much different.
36 a thru d Gators  (See p. 149 for how to read  results)

1 a, b line equation. You should be able to do c also! Try it.
17 SAT scores
26 Chicken (y = calories, x = fat)  a thru f only. 
Add 1d, find y-bar (try for r, but don't worry if you can't).

Postpone the rest
C. Use Residuals.xls from here or the lab (in ClassMaterial\Math151 D&V\RegressionDemosExcel for D&V) to graph these data sets, along with a graph of the residuals. Print the results, and describe the shape of the residuals. (It may help to connect the dots with pencil, to see the pattern.)
   a)  x 1 2 8 4 6 9 
       y 1 3 6 6 7 5 
   b) x 1 2 7 4 6 9
      y 7 6 2 4 2 1
3 Residuals.
32 Birthrates (type the data into SPSS. Make a plot of residuals also, to help with 32c) 
SPSS Handout p. 3 (Governor's salaries): Add #12. You should now have done all but #10. Keep it till we finish that.
Read, to discuss
Was assigned day 15: Look at it again, with reference to the r standard deviations in y for every 1 standard deviation in x. A. Open the Excel file RegressionSlope (or in the folder RegressionDemosExcel for D&V in ClassMaterial\Math151 D&V). Change x-y values in the yellow boxes and watch the line change. Change x-values in col. F and watch the "run" (red line) change, in the rightmost 2 graphs. Notice the slope = the coefficient of x = the rise/run = increase in y per unit increase in x. Fix it so the increase in x (the "run") is exactly 1. Also, look at the leftmost graph, where the lengths of the standard deviations are shown, and note that in standard-deviation units, the rise is r s.d.'s in y for each s.d. run in x.
Optional 
Friday no formal class, but alternative work.  See Day 18
HW questions?
Day 15
Regression line (D&V Ch 8&9, AS8&9): A model that predicts or estimates a y (vertical) value for a given x, using a straight line ("line of best fit", "least squares line"). "Regressing y ON x".
See Day 15   yhat =  b0 + b1 x
  "Up-and-over": to show graphically what y is predicted for a given x, go straight up from the x-value till you hit the line; then go over from that point horizontally to the y-axis, to read off the predicted y.  Use as a check on computation.
    Slope b1 (number that multiplies "x") :  For every unit x increases, y will "increase"  b1 units.
SPSS will fit a regression line to data (back page of handout).  While  Editing graph, Insert>Fit line>Regression.
Get line, equation of line, and R-squared (the square of the correlation coefficient).  Govsal on avgpay
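To illustrate reading a prediction and a slope off the line, here is a small sketch in Python. It uses the used-car equation quoted in the homework list (the version with the missing value restored). The function name predict_price and the ages 5 and 6 are chosen only for illustration; they are not from the assignment.

```python
# Used-car line from the homework notes (missing value restored):
#   predicted price = 12319.59 - 924.0 * age
B0 = 12319.59   # intercept b0, in dollars
B1 = -924.0     # slope b1, in dollars per year of age

def predict_price(age):
    """Predicted price (yhat) for a car of the given age, using the line."""
    return B0 + B1 * age

# Ages 5 and 6 are illustrative choices, not values from the assignment.
p5 = predict_price(5)
p6 = predict_price(6)
print(f"predicted price at age 5: {p5:.2f}")
print(f"predicted price at age 6: {p6:.2f}")

# One more year of age changes the prediction by exactly the slope b1.
print(f"change for one extra year: {p6 - p5:.2f}")
```

The computed value should agree with what you read off the graph by "up-and-over"; if the two disagree badly, recheck the arithmetic.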

Extrapolation (p. 148 & 163-5): Using the line to predict for x's outside the range of the data. The association may change away from what you have data for. Be cautious, especially in predicting far into the future.
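A quick sketch of why extrapolation needs caution, again using the used-car equation from the homework list. The ages 14 and 20 are assumed, for illustration only, to lie beyond the ages in the data set (the actual range of ages is not given in these notes).

```python
# Used-car line from the homework notes (missing value restored).
B0 = 12319.59   # intercept, dollars
B1 = -924.0     # slope, dollars per year of age

def predict_price(age):
    """Predicted price from the line, whether or not the age makes sense."""
    return B0 + B1 * age

# Age at which the line itself crosses price = 0.
zero_age = -B0 / B1
print(f"line predicts a price of $0 at age {zero_age:.1f}")

# Ages 14 and 20 are ASSUMED to be outside the range of the data.
for age in (4, 14, 20):
    print(f"age {age:2d}: predicted price = {predict_price(age):10.2f}")
```

Past the age where the line crosses zero, the model predicts negative prices, which no real car has. The line is only a description of the data it was fitted to.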

The Regression line equation:
    If we standardize both the x-values and the y-values, the slope will just = r !   z_yhat = r * z_x
     And the intercept will be at  (0,0)  (Which was the point given by the two means, (xbar, ybar) in the original graph.)
         govsalstd.sav Govsalstd2.doc .   (also in SPSS for Class 05 folder--output file won't work)
     Also Excel,     RegressionSlope.xls in  ClassMaterial\Math151 D&V\RegressionDemosExcel for D&V
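The claim that the standardized line has slope r and passes through (0,0) can be checked numerically. The sketch below uses data set (b) from the residuals exercise in the homework list; the helper names mean and sd are just for this example, and the slope is fitted with the usual least squares formula.

```python
from math import sqrt

# Data set (b) from the residuals exercise in the homework list.
x = [1, 2, 7, 4, 6, 9]
y = [7, 6, 2, 4, 2, 1]
n = len(x)

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

# Standardize: subtract the mean, divide by the standard deviation.
mx, my, sx, sy = mean(x), mean(y), sd(x), sd(y)
zx = [(a - mx) / sx for a in x]
zy = [(b - my) / sy for b in y]

# Correlation coefficient from the z-scores.
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Least squares slope and intercept fitted to the z-scores.
zx_bar, zy_bar = mean(zx), mean(zy)
slope = (sum((a - zx_bar) * (b - zy_bar) for a, b in zip(zx, zy))
         / sum((a - zx_bar) ** 2 for a in zx))
intercept = zy_bar - slope * zx_bar

print(f"r         = {r:.4f}")
print(f"slope     = {slope:.4f}")
print(f"intercept = {intercept:.4f}")
```

The slope and r agree up to rounding, and the intercept is 0 up to rounding, because the z-scores have mean 0.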

To find the equation yhat =  b0 + b1 x  in "real" units:  calculate the "coefficients" b1, b0
   b1 : A change of one standard deviation in x corresponds to a change of r standard deviations in y, along the regression line.
 The slope b1 expresses change in y-units per x-unit. (Suppose x is inches, y is pounds. Then b1 is in pounds per inch.) You can find b1 by multiplying r by the standard deviation of the y's (that's in pounds) and dividing by the standard deviation of the x's (that's in inches). In algebra (p. 140):
            b1  = r times (s.d. of y)/(s.d. of x)
   b0 :  The line goes through (xbar, ybar).  If you know this, you know ybar = b0+ b1(xbar).  You can solve this for b0 ,
           b0= ybar - b1(xbar).
  So, if you have the means and standard deviations and r, you can find the regression equation.
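As a worked check of the two formulas above, this sketch computes the means, standard deviations, and r for data set (a) from the residuals exercise, then builds b1 and b0 from those summaries alone. The variable names are just for this example.

```python
from math import sqrt

# Data set (a) from the residuals exercise in the homework list.
x = [1, 2, 8, 4, 6, 9]
y = [1, 3, 6, 6, 7, 5]
n = len(x)

# Summary statistics: means and sample standard deviations.
xbar = sum(x) / n
ybar = sum(y) / n
sx = sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))
sy = sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))

# Correlation coefficient r.
r = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / ((n - 1) * sx * sy)

# The two formulas from the notes.
b1 = r * sy / sx          # slope: r times (s.d. of y) / (s.d. of x)
b0 = ybar - b1 * xbar     # intercept: the line goes through (xbar, ybar)

print(f"xbar = {xbar:.4f}, ybar = {ybar:.4f}")
print(f"sx = {sx:.4f}, sy = {sy:.4f}, r = {r:.4f}")
print(f"yhat = {b0:.4f} + {b1:.4f} x")
```

For this data set the slope comes out to 0.5 and the intercept to about 2.17, so the equation is yhat = 2.17 + 0.5 x.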
See p. 141, text. That example is incomplete:  they found the b's but didn't write the equation:
    Av. cost-per-person = 2,266.61 - 36.21 * Peak-fwy-speed.  Check units.
P. 142   slope = -36.21 $/mph:  For every mph increase in peak freeway speed, there is a decrease in cost of $36.21 per person.  Or: "Traffic delays cost each urban area resident about $36 for every mph the freeways are slowed at peak period."

Residual:  Look at an individual observed (x,y) data pair.  The residual is the "leftover" amount of y after predicting a y using the line.  Visually, it is the length of the vertical line drawn from the data point to the regression line (+ if the point is above the line, - if the point is below the line).
   Residual = observed - predicted = Data - Model:   e = y - yhat.
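For a concrete residual, take the used-car point that was missing from the SPSS file (age 4, price 6995) and the equation fitted with that point restored, both quoted in the homework list. A minimal sketch:

```python
# Used-car line from the homework notes (missing value restored).
B0 = 12319.59   # intercept, dollars
B1 = -924.0     # slope, dollars per year of age

# The data point that had been omitted from the SPSS file.
age = 4
observed = 6995.0

predicted = B0 + B1 * age          # yhat: height of the line at age 4
residual = observed - predicted    # e = y - yhat

print(f"predicted price at age {age}: {predicted:.2f}")
print(f"residual: {residual:.2f}")
```

The residual is negative, so this point lies below the line: the car sold for about $1,629 less than the line predicts for a 4-year-old car.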
Start here Wed:
 SPSS (handout, p. 3, bottom:  In Edit mode, Insert>Spikes: Spike to: Regression) Govsal-Deviations.doc

Pattern in graph of residuals:  (p.162) If you graph residual values against x (or against predicted y's), you eliminate visually the linear portion of the association. (The regression line "becomes" the new x-axis; a "shear" transformation)
   Excel Residuals.xls in  ClassMaterial\Math151 D&V\RegressionDemosExcel for D&V
Curving or other structure may stand out more visibly.  "Good" fit = no structure in residuals.
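To see structure in residuals without a graph, the sketch below fits the least squares line to data set (a) from the residuals exercise and lists the residuals in order of x. The string of + and - signs is a crude stand-in for the residual plot, used here only for illustration.

```python
# Data set (a) from the residuals exercise in the homework list.
x = [1, 2, 8, 4, 6, 9]
y = [1, 3, 6, 6, 7, 5]
n = len(x)

# Least squares slope and intercept.
xbar = sum(x) / n
ybar = sum(y) / n
b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
      / sum((a - xbar) ** 2 for a in x))
b0 = ybar - b1 * xbar

# Residuals e = y - yhat, listed in order of increasing x.
residuals = [(a, b - (b0 + b1 * a)) for a, b in sorted(zip(x, y))]
for a, e in residuals:
    side = "above" if e > 0 else "below"
    print(f"x = {a}: residual = {e:+.2f} ({side} the line)")

# Sign pattern from left to right across the x-axis.
signs = "".join("+" if e > 0 else "-" for _, e in residuals)
print("sign pattern:", signs)
```

The points sit below the line at both ends and above it in the middle, a curved pattern that is harder to spot in the original scatterplot. With no structure, the signs would be scattered with no obvious pattern.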
SPSS:  (old wing) (Handout bottom p. 4 & 3)  Analyze>Regression>Linear.   Plots button: *ZRESID on *ZPRED. Save button: Residuals: Unstandardized calculates all the residuals and saves them as a new variable; you can then graph residuals on "x".

"Least squares" (D&V p. 144, AS8-3 Activity 1&2): The regression line is the line that minimizes the sum of the squared residuals.  (RegressionLeastSqs.xls, or in Mac 101, ClassMaterials\Math151 D&V\ RegressionDemosExcel for D&V\RegressionLeastSqs.xls)
       This method of finding a "best fit" straight line for predicting y's from x's was derived mathematically to work well with "joint normal" data (elliptical clouds).  For data of this sort, the line does give the mean of the y's for each given x (at least in the abstract).
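The "least squares" property can be checked by brute force: compute the sum of squared residuals for the regression line, then for a few lines nudged away from it. This sketch uses data set (a) from the residuals exercise; the function name sum_sq_resid and the particular nudges are hypothetical choices for illustration.

```python
# Data set (a) from the residuals exercise in the homework list.
x = [1, 2, 8, 4, 6, 9]
y = [1, 3, 6, 6, 7, 5]
n = len(x)

def sum_sq_resid(b0, b1):
    """Sum of squared residuals for the line yhat = b0 + b1 * x."""
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

# Least squares slope and intercept.
xbar = sum(x) / n
ybar = sum(y) / n
b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
      / sum((a - xbar) ** 2 for a in x))
b0 = ybar - b1 * xbar

best = sum_sq_resid(b0, b1)
print(f"regression line: sum of squares = {best:.4f}")

# Nudge the intercept and/or slope; these amounts are arbitrary.
nudges = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1), (0.3, -0.05)]
nudged_sums = [sum_sq_resid(b0 + d0, b1 + d1) for d0, d1 in nudges]
for (d0, d1), s in zip(nudges, nudged_sums):
    print(f"b0 {d0:+.2f}, b1 {d1:+.2f}: sum of squares = {s:.4f}")
```

Every nudged line gives a larger sum of squares than the regression line. A handful of trial lines is not a proof, but it shows what "least" means here.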
ActivStats Least Squares tool: AS8-3, rightmost button, with line and red dots. "Show" button.  Checkmark all possibilities. Uncheck "ShowLS Line". Choose number of points, Do "Regenerate". Move green line to minimize Sum of Squares (red bar), and observe residuals as you do.  Confirm your result by checking "ShowLSLine".   (The Moore web applet does the same but can't draw the residual lines.)
"Regenerate" created "good clouds" of data.  To use your own data, do "Reset"; click in the picture to make dots (but not too close to the green line or it will think you're dragging that.).

Next---
R-squared: The line formula yhat =  b0 + b1 x   tells us our best prediction or estimate of a response (y) value for a particular value of the explanatory (x) variable.  It says NOTHING about how good that "best" is; that is, it says nothing about how tightly or loosely the data scatter around the line.  R-squared does that job.
 R-squared (R^2 = r^2, the "Coefficient of Determination") = Proportion of variability in y-values explained/accounted for by knowing x and using the regression line model.  More on this next time.
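As a preview, here is one common way to compute R-squared, sketched for data set (a) from the residuals exercise. The formula 1 - (sum of squared residuals)/(total sum of squares about ybar) is a standard one, not spelled out in these notes; the sketch also checks that it matches the square of r.

```python
from math import sqrt

# Data set (a) from the residuals exercise in the homework list.
x = [1, 2, 8, 4, 6, 9]
y = [1, 3, 6, 6, 7, 5]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x)
syy = sum((b - ybar) ** 2 for b in y)    # total variability in y
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))

# Least squares line and its leftover (residual) variability.
b1 = sxy / sxx
b0 = ybar - b1 * xbar
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

# Proportion of the variability in y accounted for by the line.
r_squared = 1 - sse / syy

# Correlation coefficient, for comparison.
r = sxy / sqrt(sxx * syy)

print(f"R-squared = {r_squared:.4f}")
print(f"r         = {r:.4f}, r squared = {r ** 2:.4f}")
```

Here the line accounts for about 51% of the variability in y; the rest is left over in the residuals.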


Sievers home  Math151-Sp06/Daysp16.htm 2pm 2/6/06
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.