Math 151, Day 15, Wednesday, Sept. 3, 2001.  Final version

Questions on Homework?  SPSS?  I'll go to the lab after class with anyone still having trouble with regression.

Section 2.3, Regression line, continued:  Files in Green are Excel files, and are in RegressionDemos folder in Class Materials\Math 151, in lab 101.

Review:
ANY straight line y = a + bx  (or bx + a):  b, the coefficient of x, is the slope of the line.  If x changes by one unit, y changes by b units, so b is the rate of change of y with respect to x.  (If y is weight in pounds, and x is height in inches, b is the number of pounds we expect weight to go up by, per inch that height goes up by.)  (RegressionSlope.xls)
From last time: Regression line formulas (memorize these)

#2  A change of one standard deviation in x corresponds to a change of r standard deviations in y, along the regression line.
      In "algebra", b = r times (s.d. of y)/(s.d. of x)   (Equation p. 104)
  The slope b expresses change in y-units per x-unit. (Suppose x is inches, y is pounds. Then b is in pounds per inch.)
       If we standardize both the x-values and the y-values, the slope will just equal r!  (Regression in SPSS)
#3  The regression line goes through the point given by the two means, (xbar, ybar).
    --If you know this, you know ybar = a + b (xbar).
           You can solve this for a:  a = ybar - b (xbar). (Other equation, p. 104)
     --So knowing 2 and 3, you can find the equation of the line from the means, s.d.'s, and r.
      --And if you draw the two lines, y on x and x on y, they will intersect at (xbar, ybar)
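
A small sketch of facts #2 and #3 in Python, in case it helps to see the arithmetic. The height/weight numbers are made up for illustration (they are not from the text), and the helper names are just ones chosen here. It builds the line from the means, s.d.'s, and r, checks the slope against a direct least squares calculation, and then standardizes both variables to see the slope turn into r.

```python
from statistics import mean, stdev

# Made-up data for illustration only: x = height (inches), y = weight (pounds).
x = [60, 62, 64, 66, 68, 70, 72]
y = [115, 120, 132, 139, 152, 160, 171]

def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    xbar, ybar, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    products = [((xi - xbar) / sx) * ((yi - ybar) / sy) for xi, yi in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

def least_squares_slope(xs, ys):
    """Slope found directly by least squares, for comparison with fact #2."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    return sxy / sxx

r = correlation(x, y)
b = r * stdev(y) / stdev(x)   # fact #2: b = r (s.d. of y)/(s.d. of x)
a = mean(y) - b * mean(x)     # fact #3: a = ybar - b (xbar)

# Standardize both variables; the least squares slope should then equal r.
zx = [(xi - mean(x)) / stdev(x) for xi in x]
zy = [(yi - mean(y)) / stdev(y) for yi in y]
b_standardized = least_squares_slope(zx, zy)

print(f"r = {r:.4f}, b = {b:.4f} pounds per inch, a = {a:.2f} pounds")
print(f"slope after standardizing = {b_standardized:.4f}")
```

Plugging xbar into a + bx gives back ybar, which is fact #3 read the other way around.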

LEAST SQUARES PROPERTY

"Residual at x" = y - yhat  = distance between observed y and  predicted y (what's left over after predicting)
    (Positive if observed is bigger than predicted, negative if observed is smaller than predicted.)
Least squares principle:  Find the line that minimizes the sum of the squared residuals. (RegressionLine.xls, Squares tab)
       This method of finding a "best fit" straight line for predicting y's from x's was derived mathematically to work well with "joint normal" data--elliptical clouds.  For data of this sort, the line does  give the mean of the y's for each given x (at least in the abstract.)
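
To see the least squares principle at work without Excel, here is a minimal Python sketch on made-up data. It finds the least squares line, then nudges the intercept and slope a little in each direction and recomputes the sum of squared residuals. The nudge sizes (0.5 and 0.2) are arbitrary choices for the demonstration.

```python
from statistics import mean

# Made-up data, roughly along a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line yhat = a + b x."""
    xbar, ybar = mean(xs), mean(ys)
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = ybar - b * xbar
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Add up (observed y - predicted y)^2 over all the points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(a, b, xs, ys)

# Try eight nearby lines: every one should have a larger sum of squares.
nudged = [
    sum_squared_residuals(a + da, b + db, xs, ys)
    for da in (-0.5, 0, 0.5)
    for db in (-0.2, 0, 0.2)
    if (da, db) != (0, 0)
]

print(f"least squares line: yhat = {a:.2f} + {b:.2f} x, sum of squares = {best:.3f}")
print("nearby lines:", [round(s, 3) for s in nudged])
```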

Drawback if the data is not the "elliptical cloud" type:
     Outliers get their residual distance squared, so they may be very influential in determining where the line sits.
Especially at the lowest or highest x-values, they may change the slope of the line a lot.
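
A quick Python sketch of this, with made-up data that sits exactly on y = 2x. One point is bumped up by 18 units (an arbitrary amount), first in the middle of the x range and then at the highest x, and the line is refit each time.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line yhat = a + b x."""
    xbar, ybar = mean(xs), mean(ys)
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = ybar - b * xbar
    return a, b

def with_outlier(ys, index, bump=18):
    """Copy of ys with one value pushed up by `bump` to make an outlier."""
    changed = list(ys)
    changed[index] += bump
    return changed

# Made-up data lying exactly on y = 2x.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2 * x for x in xs]

a0, b0 = least_squares_line(xs, ys)
a_mid, b_mid = least_squares_line(xs, with_outlier(ys, 3))  # outlier at x = 4 (middle)
a_end, b_end = least_squares_line(xs, with_outlier(ys, 6))  # outlier at x = 7 (highest)

print(f"no outlier:         slope {b0:.3f}, intercept {a0:.3f}")
print(f"outlier in middle:  slope {b_mid:.3f}, intercept {a_mid:.3f}")
print(f"outlier at high x:  slope {b_end:.3f}, intercept {a_end:.3f}")
```

In this setup the outlier in the middle of the x range lifts the whole line but leaves the slope alone, while the same-sized outlier at the highest x nearly doubles the slope.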

Plotting residuals:  This amounts to making the regression line into a new x-axis.  If you plot the residuals themselves vs. the original x values, without the distraction of the slanted line, outliers and patterns other than the linear one (if any) can emerge.  (In-class demo: ResidualsRSquared.xls, Graph of Residuals tab.)
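
Here is a rough Python sketch of the idea, using made-up data that actually follows a curve (y = x squared). A straight line is fit anyway, and the residuals are listed next to the x values. This is only an illustration, not the SPSS procedure.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line yhat = a + b x."""
    xbar, ybar = mean(xs), mean(ys)
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = ybar - b * xbar
    return a, b

# Made-up curved data: y = x squared.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

a, b = least_squares_line(xs, ys)

# Residual = observed y - predicted y, one for each x.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"fitted line: yhat = {a:.1f} + {b:.1f} x")
for x, res in zip(xs, residuals):
    print(f"x = {x}   residual = {res:+.1f}")
# The residuals come out 5, 0, -3, -4, -3, 0, 5:
# positive at both ends, negative in the middle.
```

The residuals add up to zero, as they always do for a least squares line, but the U-shaped pattern against x is the sign that a straight line is missing a curve.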

r^2  = Proportion of variability in y-values explained/predicted by knowing x and using the least squares regression line.
For a more exact explanation of what "variability" we're taking a proportion of, see ResidualsRSquared.xls, R-squared tab.
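
One common way to make "variability" precise, sketched in Python on made-up data: measure the total variability as the sum of squared distances of the y's from ybar, and the leftover variability as the sum of squared residuals. The proportion explained is then 1 - (leftover)/(total), and it matches the square of the correlation r. The variable names are choices made here, not notation from the text.

```python
from statistics import mean

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

xbar, ybar = mean(xs), mean(ys)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)

b = sxy / sxx                 # least squares slope
a = ybar - b * xbar           # least squares intercept
r = sxy / (sxx * syy) ** 0.5  # correlation

total = syy                                                     # variability of y about ybar
leftover = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # sum of squared residuals
r_squared = 1 - leftover / total

print(f"total = {total}, leftover = {leftover:.2f}")
print(f"proportion explained = {r_squared:.2f}, r squared = {r ** 2:.2f}")
```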

SPSS residuals:  The SPSS Manual, Section 2.2, pp. 67-71, tells how.  Run-through: Analyze>Regression>Linear.
Save: Unstandardized Residuals.  They become a new variable. Scatterplot them on y-axis, old x variable on x-axis.
Chart Editor: Chart>Reference line:Yscale: position at 0 (add).

Section 2.4 will be discussed next time.  Read it to be ready--lots of separate ideas!
Cautions  Sec. 2.4
Plot the data: Summary formulas and numbers don't tell the whole story.  (Anscombe's quartet, p.127, 2.46-7)

Extrapolation-- extra (outside) polation (putting a point): Using the line to predict outside the range of x's you have data for.  Unavoidable if x is time; but inevitably dangerous--nothing says the mechanism you see will persist in a wider range.
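
A toy Python illustration of the danger. The "true" mechanism here (true_response) is entirely hypothetical: y rises 10 units per unit of x, then levels off at 50. We only collect data for x from 1 to 5, where the relationship looks perfectly linear, and then use the line to predict at x = 10.

```python
from statistics import mean

def true_response(x):
    """Hypothetical mechanism: rises by 10 per unit of x, then levels off at 50."""
    return min(10 * x, 50)

def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line yhat = a + b x."""
    xbar, ybar = mean(xs), mean(ys)
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = ybar - b * xbar
    return a, b

# We only have data for x = 1 through 5.
xs = [1, 2, 3, 4, 5]
ys = [true_response(x) for x in xs]

a, b = least_squares_line(xs, ys)

x_new = 10                   # well outside the range of the data
predicted = a + b * x_new    # what the line says
actual = true_response(x_new)

print(f"line from the data: yhat = {a:.1f} + {b:.1f} x")
print(f"at x = {x_new}: line predicts {predicted:.1f}, actual value is {actual}")
```

The line fits the observed range perfectly and still misses by a factor of two at x = 10, because nothing in the data for x from 1 to 5 hints at the leveling off.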

Averaged data will produce a stronger relationship (higher correlation, r^2) than the merged raw data from individuals (the averaging hides much of the variability).
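
A small Python sketch of why this happens, on made-up data: 5 groups (think states) with 3 individuals each. Individuals scatter around their group's typical y value; the group averages do not. The group setup and scatter values are invented purely for the demonstration.

```python
from statistics import mean

def correlation(xs, ys):
    """Correlation r computed from sums of squares and products."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Made-up data: everyone in group g has x = g;
# individual y's scatter around 2g by -3, 0, or +3.
groups = [1, 2, 3, 4, 5]
scatter = [-3, 0, 3]
raw_x = [g for g in groups for d in scatter]
raw_y = [2 * g + d for g in groups for d in scatter]

# Averaging within each group wipes out the individual scatter.
avg_x = groups
avg_y = [mean([2 * g + d for d in scatter]) for g in groups]

r_raw = correlation(raw_x, raw_y)
r_avg = correlation(avg_x, avg_y)

print(f"individuals:    r = {r_raw:.3f}, r^2 = {r_raw ** 2:.3f}")
print(f"group averages: r = {r_avg:.3f}, r^2 = {r_avg ** 2:.3f}")
```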

Lurking variables and association/causation next time.

HW Day15  Reread section 2.3, esp. p.112-end.  Read sec. 2.4.  We'll skip 2.5 for now; Chapter 3 next.
Hand in:  Sec. 2.3
SPSS Manual, Section 2.2 pp. 67-71 (continuing with Sanchez heating).  Print and hand in the graph on p. 70.

p. 122, 2.36 speed&gas again a, b, c.  Use the SPSS Manual, Section 2.2 pp. 67-71 to generate the residuals.  Check that they match the ones in the text.  Use them to do the graphing, etc.

p 161, 2.99 SAT verbal as predictor of SAT math. In part a, also make and print the graph with the regression line.   In part b, use the SPSS Manual, Section 2.2 pp. 67-71 to generate the residuals.  Then you can make the graph, and finish the problem.

p. 123, 2.38 Gesell first word-point in middle of x range.

p. 111, 2.32  Manatees Review problem.  Use SPSS for part a, and draw in the points for part b by hand. 
 Now make two other graphs, with regression lines:  X axis = year for both, through 1990.  Y axis = powerboat registration for one, and Y axis = manatees killed for the other. 
Now add the four more years of data given in part b to your data file. Remake the two graphs with the four new years.  For each pair (year, powerboats) and (year, kills), answer this: Does the pattern shown before 1990 persist, or does it change, in the four new years? 
= = = = = = = = = = = = = = = = 
Sec. 2.4  Will be assigned with Day 16, on Friday
p. 131, 2.53 farm population (SPSS)
   Also connect the dots, or plot the residuals--is there any curve to the relationship?
p. 132  2.54 Dow average/stocks
p. 138 2.63 math&verbal r, states/individuals

Read, to discuss
 

 

Optional

Use the Excel files (Green ref's above) to reinforce understanding of least squares, residuals, r^2.

 


Sievers home  Math151-Fall01/DayS15.htm  1:30 pm 10/03/01
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.