Math 151, Fall 2005, Day 16, Fri. Sept. 30. Hit reload after class.

I'm very sorry.  Exams back Monday for certain!
Day 16 (Fri. Sept 30): Reading: Read D&V Ch8 & Ch9 thru 165 top, Do AS8 Regression.
      Ahead, rest of Ch9,  (AS9, lightly)
Hand in Mon.
(All are at D&V p. 153ff unless otherwise noted)
1 a, b  line equation--You should be able to do c also! try it. 
17, 26 Should not have been assigned Day 15; do them for tonight:
17 SAT scores
26 Chicken (y = calories, x = fat)  a thru f only. 
C.  Use Residuals.xls from here or the lab (in ClassMaterial\Math151 D&V\RegressionDemosExcel for D&V) to graph these data sets, along with a graph of the residuals.  Print the results, and describe the shape of the residuals. (It may help to connect the dots with pencil to see the pattern.)
   a)  x 1 2 8 4 6 9 
       y 1 3 6 6 7 5 
   b) x 1 2 7 4 6 9
      y 7 6 2 4 2 1
3 Residuals
Postpone the following problems:
32 Birthrates (type the data into SPSS. Make a plot of residuals also, to help with 32c) 
SPSS Handout p. 3 (Governors' salaries): Add #12.  You should now have done all but #10. Keep it till we finish that.
Read, to discuss
Was assigned day 15:  Look at it again, with reference to the r standard deviations in y for every 1 standard deviation in x: A. Open the Excel file RegressionSlope (or in the folder RegressionDemosExcel for D&V in ClassMaterial\Math151 D&V).  Change x-y values in the yellow boxes and watch the line change.  Change x-values in col. F and watch the "run" (red line) change, in the rightmost 2 graphs. Notice that the slope = the coefficient of x = the rise/run = the increase in y per unit increase in x.  Fix it so the increase in x (the "run") is exactly 1.   Also, look at the leftmost graph, where the lengths of the standard deviations are shown, and note that in standard-deviation units, the rise is r s.d.s in y for each 1 s.d. of run in x.
Optional 
HW questions?
Regression line (D&V Ch 8 & 9, AS8 & 9): A model that predicts or estimates a y (vertical) value for a given x, using a straight line ("line of best fit," "least squares line"). "Regressing y ON x."
See Day 15
SPSS will fit a regression line to data (back page of handout).  While editing the graph, Insert>Fit line>Regression.
You get the line, the equation of the line, and R^2 (the square of the correlation coefficient).  Example: Govsal on avgpay.

Extrapolation:  (p. 148 & 163-5) Using the line to predict for x's outside the range of the data.  The association may change away from where you have data.  Be cautious, especially in predicting far into the future!

The Regression line equation:
    If we standardize both the x-values and the y-values, the slope will just equal r:   z_yhat = r z_x
     And the intercept will be 0: the line goes through (0,0), which corresponds to the point given by the two means, (xbar, ybar), in the original graph.
         govsalstd.sav Govsalstd2.doc .   (also in SPSS for Class 05 folder--output file won't work)
     Also Excel,     RegressionSlope.xls in  ClassMaterial\Math151 D&V\RegressionDemosExcel for D&V
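
A quick check of the "slope = r" claim, as a minimal Python sketch.  The data and the helper names (standardize, least_squares_line, correlation) are made up for illustration; they are not from D&V or the Excel demo.  The sketch standardizes x and y with the sample standard deviation, fits a least-squares line to the z-scores, and compares its slope with r.

```python
from statistics import mean, stdev

def standardize(values):
    """Convert values to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line yhat = b0 + b1*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def correlation(xs, ys):
    """Pearson r: the sum of products of z-scores, divided by n - 1."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 5, 4, 6, 8]

r = correlation(xs, ys)
z_intercept, z_slope = least_squares_line(standardize(xs), standardize(ys))
print(f"r = {r:.4f}, slope in z-units = {z_slope:.4f}, intercept = {z_intercept:.4f}")
```

The slope printed for the standardized data should match r, and the intercept should be 0 up to rounding.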

To find the equation yhat =  b0 + b1 x  in "real" units:  calculate the "coefficients" b1, b0
   b1 : A change of one standard deviation in x corresponds to a change of r standard deviations in y, along the regression line.
 The slope b1 expresses change in y-units per x-unit. (Suppose x is inches, y is pounds. Then b1 is in pounds per inch.) You can find b1 by multiplying r by the standard deviation of the y's (that's in pounds) and dividing by the standard deviation of the x's (that's in inches).  In algebra (p. 140):
            b1  = r times (s.d. of y)/(s.d. of x)
   b0 :  The line goes through (xbar, ybar).  If you know this, you know ybar = b0+ b1(xbar).  You can solve this for b0 ,
           b0= ybar - b1(xbar).
  So, if you have the means and standard deviations and r, you can find the regression equation.
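
The two formulas above can be sketched in a few lines of Python.  The summary numbers (heights in inches, weights in pounds) and the function name regression_coefficients are hypothetical, chosen only to show the arithmetic and the units.

```python
def regression_coefficients(xbar, ybar, sx, sy, r):
    """Return (b0, b1) for yhat = b0 + b1*x from summary statistics."""
    b1 = r * sy / sx          # slope: y-units per x-unit
    b0 = ybar - b1 * xbar     # the line must pass through (xbar, ybar)
    return b0, b1

# Hypothetical summary statistics: x = height (inches), y = weight (pounds).
b0, b1 = regression_coefficients(xbar=68.0, ybar=160.0, sx=3.0, sy=20.0, r=0.6)
print(f"slope b1 = {b1:.2f} pounds per inch")
print(f"intercept b0 = {b0:.2f} pounds")
print(f"check: prediction at xbar = {b0 + b1 * 68.0:.2f} (should equal ybar)")
```

With these made-up numbers the slope is about 4 pounds per inch and the intercept about -112 pounds.  The intercept has no sensible real-world meaning here (nobody is 0 inches tall); it just positions the line.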
See p. 141, text. That example is incomplete:  they found the b's but didn't write the equation:
    Av.cost-per-person = 2,266.61 -36.21 Peak-fwy-speed.  Check units.
P. 142:  slope = -36.21 $/mph.  For every 1 mph increase in peak freeway speed, the predicted cost decreases by $36.21 per person.  Or: "Traffic delays cost each urban area resident about $36 for every mph the freeways are slowed at peak period."
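
To see what the slope says, here is a small Python sketch that plugs two speeds into the equation above.  The speeds (30 and 31 mph) are hypothetical, picked only to show the arithmetic; check them against the range of the data in the text before trusting any prediction (see Extrapolation).

```python
def predicted_cost(peak_speed_mph):
    """Predicted average cost per person (dollars), using the p. 141 coefficients."""
    return 2266.61 - 36.21 * peak_speed_mph

# Hypothetical speeds, 1 mph apart, to show the meaning of the slope.
slow, fast = 30, 31
change = predicted_cost(fast) - predicted_cost(slow)

print(f"Predicted cost at {slow} mph: ${predicted_cost(slow):.2f}")
print(f"Predicted cost at {fast} mph: ${predicted_cost(fast):.2f}")
print(f"Change for 1 mph faster: ${change:.2f}")
```

Whatever two speeds you pick 1 mph apart, the predicted cost changes by the slope, -36.21 dollars.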

Residual:  Look at an individual observed (x,y) data pair.  The residual is the "leftover" amount of y after predicting a y using the line.  Visually, it is the length of the vertical line drawn from the data point to the regression line (+ if the point is above the line, - if the point is below the line).
   Residual = observed - predicted = Data - Model:   e = y - yhat.
 SPSS (handout, p. 3, bottom):  In Edit mode, Insert>Spikes: Spike to: Regression.  Govsal-Deviations.doc
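
The definition e = y - yhat can be sketched in Python as follows.  The five data points and the helper names are made up for illustration (they are not the govsal data).  The sketch fits the least-squares line, then lists each residual with its sign.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line yhat = b0 + b1*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def residuals(xs, ys, b0, b1):
    """Residual = observed y minus predicted y, one per data point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

b0, b1 = least_squares_line(xs, ys)
res = residuals(xs, ys, b0, b1)

for x, y, e in zip(xs, ys, res):
    side = "above" if e > 0 else "below" if e < 0 else "on"
    print(f"x={x}  y={y}  yhat={b0 + b1 * x:.2f}  residual={e:+.2f}  ({side} the line)")
print(f"sum of residuals = {sum(res):.10f}")
```

For a least-squares line the residuals always add up to 0 (up to rounding), which is a handy check on your arithmetic.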

Pattern in graph of residuals:  (p.162) If you graph residual values against x (or against predicted y's), you eliminate visually the linear portion of the association. (The regression line "becomes" the new x-axis; a "shear" transformation)
   Excel Residuals.xls in  ClassMaterial\Math151 D&V\RegressionDemosExcel for D&V
Curving or other structure may stand out more visibly.  "Good" fit = no structure in residuals.
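
Here is a minimal Python sketch of structure showing up in residuals.  The data are made up (y = x squared, so the true relationship is curved, not linear); the helper name least_squares_line is also just for illustration.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line yhat = b0 + b1*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Made-up curved data: y is exactly x squared.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]

b0, b1 = least_squares_line(xs, ys)
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, res):
    print(f"x={x}  residual={e:+.1f}")
```

The residuals come out 5, 0, -3, -4, -3, 0, 5: positive at both ends and negative in the middle.  That U shape is the curve that the straight line missed.  It is much easier to see in the residuals than in the original scatterplot.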

Start here Monday:
SPSS:  (old wing) (Handout bottom p. 4 & 3)  Analyze>Regression>Linear.   Plots button: *ZRESID on *ZPRED.  Save button: Residuals: Unstandardized calculates all the residuals and saves them as a new variable; you can then graph the residuals on "x".

"Least squares" (D&V p. 144, AS8-3 Activity 1 & 2):  The regression line is the line that minimizes the sum of the squared residuals.  (RegressionLeastSqs.xls, or in Mac 101, ClassMaterials\Math151 D&V\RegressionDemosExcel for D&V\RegressionLeastSqs.xls)
       This method of finding a "best fit" straight line for predicting y's from x's was derived mathematically to work well with "joint normal" data--elliptical clouds.  For data of this sort, the line does give the mean of the y's for each given x (at least in the abstract).
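
The "minimizes the sum of squared residuals" idea can be checked numerically.  This is a minimal Python sketch with made-up data and made-up helper names: it computes the least-squares line, then nudges the intercept and slope a little in each direction and shows that the sum of squares never gets smaller.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line yhat = b0 + b1*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def sum_of_squares(xs, ys, b0, b1):
    """Sum of squared residuals for the line yhat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 5, 4, 6, 8]

b0, b1 = least_squares_line(xs, ys)
best = sum_of_squares(xs, ys, b0, b1)

# Try nearby lines: shift the intercept and/or tilt the slope a little.
nudged = {
    (d0, d1): sum_of_squares(xs, ys, b0 + d0, b1 + d1)
    for d0 in (-0.5, 0.0, 0.5)
    for d1 in (-0.2, 0.0, 0.2)
}

print(f"least-squares line: sum of squares = {best:.4f}")
for (d0, d1), ss in nudged.items():
    print(f"  intercept {d0:+.1f}, slope {d1:+.1f}: sum of squares = {ss:.4f}")
```

Only the un-nudged line (0.0, 0.0) matches the minimum; every other line tried has a larger sum of squares.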
ActivStats Least Squares tool: AS8-3, rightmost button, with line and red dots. "Show" button.  Checkmark all possibilities. Uncheck "ShowLS Line". Choose number of points, do "Regenerate". Move the green line to minimize the Sum of Squares (red bar), and observe the residuals as you do.  Confirm your result by checking "ShowLSLine".
"Regenerate" creates "good clouds" of data.  To use your own data, do "Reset"; click in the picture to make dots (but not too close to the green line or it will think you're dragging that).

Next---
R-squared:  The line formula yhat = b0 + b1 x tells us our best prediction or estimate of a response (y) value for a particular value of the explanatory (x) variable.  It says NOTHING about how good that "best" is--that is, it says nothing about how tight or scattered the data are around the line.  R-squared does that job.
 R^2 (= r^2 = "Coefficient of Determination") = proportion of variability in y-values explained/accounted for by knowing x and using the regression line model.  More on this next time.
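
As a preview, here is a minimal Python sketch (made-up data, made-up variable names) that computes R^2 two ways.  One way squares the correlation r.  The other way, a standard identity for least-squares lines, is 1 minus (sum of squared residuals) / (total sum of squares of the y's about ybar).  The two should agree.

```python
from math import sqrt
from statistics import mean

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3, 5, 4, 6, 8]

mx, my = mean(xs), mean(ys)
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)            # total variability in y
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

# Least-squares line and correlation.
b1 = sxy / sxx
b0 = my - b1 * mx
r = sxy / sqrt(sxx * syy)

# Variability left over after using the line (sum of squared residuals).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - sse / syy

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}")
print(f"1 - SSE/SST = {r_squared:.4f}")
```

A value near 1 means the points sit tightly around the line; a value near 0 means the line accounts for very little of the scatter in y.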


Sievers home  Math151-Fall05/Dayf16.htm 2:20pm 9/30/05
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.