Math 151, Day 15, Friday, March 2, 2001

Questions on Homework:

Section 2.3, Regression line, continued:  Files in Green are Excel files, and will be in RegressionDemos folder in Class Materials\Math 151, in lab 101.  Or downloadable from here, perhaps.

ANY Straight line y = a + bx  (or bx + a):  b, the coefficient of x, is the slope of the line.  If x changes one unit, y changes b units, so b is the rate of change of y with respect to x.  (If y is weight in pounds, and x is height in inches, b is the number of pounds we expect weight to go up per inch that height goes up.)  RegressionSlope.xls download?
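
A small sketch of the slope-as-rate-of-change idea. The intercept and slope here are made-up numbers for a hypothetical weight/height line, not values from the text or the Excel demo.

```python
# Hypothetical line: weight (lb) = a + b * height (in). Both numbers are invented.
a = -200.0   # intercept (assumed for illustration)
b = 5.0      # slope in pounds per inch (assumed for illustration)

def predict(x):
    """Predicted y on the line y = a + b*x."""
    return a + b * x

# Raising x by one unit changes the prediction by b, no matter where we start.
change_at_60 = predict(61) - predict(60)
change_at_70 = predict(71) - predict(70)
print(change_at_60, change_at_70)
```

Both differences equal b, which is what "rate of change of y with respect to x" means for a straight line.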

LEAST SQUARES PROPERTY (again)
"Residual at x" = y - yhat  = signed vertical distance of observed y above predicted y (what's left over after predicting)
Least squares principle:  Find the line that minimizes the sum of the squared residuals.
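
To see the principle at work, here is a minimal sketch using the standard textbook formulas for the least squares slope and intercept. The five data points are invented for illustration, and the helper names (least_squares, sse) are just labels chosen here. Nudging the intercept or slope away from the fitted values should always make the sum of squared residuals larger.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line yhat = a + b*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

def sse(xs, ys, a, b):
    """Sum of squared residuals (y - yhat)^2 for the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, roughly linear.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares(xs, ys)
best = sse(xs, ys, a, b)

# Try eight nearby lines: each one should have a larger sum of squares.
nudged = [
    sse(xs, ys, a + da, b + db)
    for da in (-0.1, 0.0, 0.1)
    for db in (-0.1, 0.0, 0.1)
    if (da, db) != (0.0, 0.0)
]
print(a, b, best, min(nudged))
```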

Outliers get their residual distance squared, so they may be very influential in determining where the line sits.
Especially if at the lowest or highest x-values, an outlier may change the slope of the line a lot.  (In-class demo RegressionLine.xls download?)
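
A rough sketch of the same point, separate from the Excel demo and using invented data: nine points lying exactly on y = 2x, with one y-value pulled down by 10. When the changed point sits in the middle of the x range the slope stays put and only the intercept drops; when it sits at the highest x, the slope changes a lot.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line yhat = a + b*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b

def with_outlier(xs, ys, at_x, shift):
    """Copy ys, moving the point whose x equals at_x by `shift`."""
    return [y + shift if x == at_x else y for x, y in zip(xs, ys)]

# Made-up data lying exactly on y = 2x.
xs = list(range(1, 10))
ys = [2.0 * x for x in xs]

a_base, b_base = least_squares(xs, ys)
a_mid, b_mid = least_squares(xs, with_outlier(xs, ys, at_x=5, shift=-10.0))
a_end, b_end = least_squares(xs, with_outlier(xs, ys, at_x=9, shift=-10.0))

print("no outlier:        slope", b_base)
print("outlier at x = 5:  slope", b_mid)
print("outlier at x = 9:  slope", b_end)
```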

Plotting residuals:  This amounts to making the regression line into a new x-axis--If you plot the residuals themselves vs. the original x values, without the distraction of the slanted line, outliers and patterns other than the linear (if any) can emerge.  (In-class demo ResidualsRSquared.xls download?)
SPSS Manual, Section 2.2 pp. 67-71 tells how.
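
Outside SPSS, the residuals can be sketched by hand as below. The data are invented (y = x squared, so the true relationship is curved), and the point is only to show what a residual list looks like when a straight line is the wrong shape: positive at the ends, negative in the middle.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line yhat = a + b*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b

# Made-up curved data: y = x^2.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [float(x * x) for x in xs]

a, b = least_squares(xs, ys)

# Residual = observed y minus predicted y, one per data point.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Plotting these against xs would show a U shape instead of random scatter.
for x, res in zip(xs, residuals):
    print(x, round(res, 3))
print("sum of residuals:", round(sum(residuals), 9))
```

Least squares residuals always add up to zero (up to rounding), so it is the pattern, not the total, that tells you something.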

r^2  = Proportion of variability in y-values explained/predicted by knowing x and using
     the least squares regression line.   More exact explanation of what the "variability" is we're taking a proportion of.
 (In-class demo ResidualsRSquared.xls download?)
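
One way to make "proportion of variability" concrete, as a sketch with invented data: take the total squared variation of the y-values around their mean, subtract what is left over in the residuals, and divide by the total. For a least squares line this matches the square of the correlation r.

```python
from math import sqrt

# Made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)          # total variability in y
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

b = sxy / sxx
a = ybar - b * xbar
r = sxy / sqrt(sxx * syy)                        # correlation

# Variability left over after using the line (sum of squared residuals).
leftover = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Proportion of the total variability that the line accounts for.
r_squared = 1 - leftover / syy

print("r =", r)
print("r squared from correlation:", r ** 2)
print("1 - leftover/total:        ", r_squared)
```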

Cautions  Sec. 2.4
Plot the data: Summary formulas and numbers don't tell the whole story.  (Anscombe's quartet, p.127, 2.46-7)
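
A sketch of why the summary numbers are not enough. The values below are the Anscombe (1973) data as commonly reproduced, typed in here rather than copied from the text, so check them against p. 127 before relying on them. All four sets give nearly the same line and correlation, yet plots of the four look nothing alike.

```python
from math import sqrt

def fit_and_r(xs, ys):
    """Return (a, b, r): least squares intercept, slope, and correlation."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b, sxy / sqrt(sxx * syy)

# Anscombe's quartet (values as commonly published; verify against the text).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

quartet = {"I": (x123, y1), "II": (x123, y2), "III": (x123, y3), "IV": (x4, y4)}
results = {name: fit_and_r(xs, ys) for name, (xs, ys) in quartet.items()}

for name, (a, b, r) in results.items():
    print(f"Set {name}: yhat = {a:.2f} + {b:.3f} x,  r = {r:.3f}")
```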

Extrapolation-- extra (outside) polation (putting a point): Using the line to predict outside the range of x's you have data for.  Unavoidable if x is time; but inevitably dangerous--nothing says the mechanism you see will persist in a wider range.
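
A sketch of the danger, under an invented mechanism: suppose y really follows a curve that levels off, y = 100x/(x + 10), but we only have data for x from 1 to 10. The line fits well inside that range and badly far outside it. The function name true_mechanism and the numbers are assumptions made up for this illustration.

```python
def true_mechanism(x):
    """Invented levelling-off relationship; not from the text."""
    return 100.0 * x / (x + 10.0)

def least_squares(xs, ys):
    """Return (a, b) for the least squares line yhat = a + b*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b

# We only observe x = 1..10.
xs = list(range(1, 11))
ys = [true_mechanism(x) for x in xs]
a, b = least_squares(xs, ys)

# Largest prediction error inside the range of the data.
worst_inside = max(abs(y - (a + b * x)) for x, y in zip(xs, ys))

# Prediction error at x = 50, far outside the range.
outside_error = abs(true_mechanism(50) - (a + b * 50))

print("worst error for x in 1..10:", round(worst_inside, 2))
print("error when predicting x = 50:", round(outside_error, 2))
```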

Averaged data will produce a stronger relationship (higher correlation, r^2) than the merged raw data from individuals (the averaging hides much variability).
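
A sketch with invented data: five groups (think of states), three individuals in each. Individuals in a group share the same x but their y-values are spread 3 units above and below the group's typical value. Correlating the 15 individuals gives a moderate r; correlating the 5 group averages gives a perfect one, because the averaging wiped out the within-group spread.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r between two equal-length lists."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

groups = [1, 2, 3, 4, 5]          # made-up group-level x values
spread = (-3.0, 0.0, 3.0)         # made-up individual variation in y

# Raw data: every individual, all groups merged together.
raw_x = [float(g) for g in groups for e in spread]
raw_y = [2.0 * g + e for g in groups for e in spread]

# Averaged data: one (x, mean y) point per group.
avg_x = [float(g) for g in groups]
avg_y = [sum(2.0 * g + e for e in spread) / len(spread) for g in groups]

r_raw = correlation(raw_x, raw_y)
r_avg = correlation(avg_x, avg_y)
print("r for individuals:", round(r_raw, 3))
print("r for group averages:", round(r_avg, 3))
```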

Lurking variables and association/causation next time.

HW Day15  Reread section 2.3, esp. p.112-end.  Read sec. 2.4.  We'll skip 2.5 for now; Chapter 3 next.
Hand in:  Sec. 2.3
p. 122, 2.36 speed&gas again a, b, c.  Use the SPSS Manual, Section 2.2 pp. 67-71 to generate the residuals.  Check that they match the ones in the text.  Use them to do the graphing, etc.

p. 161, 2.99 SAT verbal as predictor of SAT math. In part a, also make and print the graph with the regression line.   In part b, use the SPSS Manual, Section 2.2 pp. 67-71 to generate the residuals.  Then you can make the graph, and finish the problem.

p. 123, 2.38 Gesell first word-point in middle of x range.

p. 111, 2.32  Manatees Review problem.  Use SPSS for part a, and draw in the points for part b by hand. 
 Now make two other graphs, with regression lines:  X axis = year for both, through 1990.  Y axis = powerboat registration for one, and Y axis = manatees killed for the other. 
Now add the four more years of data given in part b to your data file. Remake the two graphs with the four new years.  For each pair (year, powerboats) and (year, kills), answer this: Does the pattern shown before 1990 persist, or does it change, in the four new years? 
= = = = = = = = = = = = = = = = 
Sec. 2.4
p. 131, 2.53 farm population
   Also connect the dots, or plot the residuals--is there any curve to the relationship?
p. 132  2.54 Dow average/stocks
p. 138 2.63 math&verbal r, states/individuals

Read, to discuss

Optional


Sievers home  Math151-Sp01/Day15.htm  3/2/01
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.