Section 2.3, Regression line, continued: Files in green are Excel files. They will be in the RegressionDemos folder in Class Materials\Math 151 in lab 101, and may also be downloadable from here.
ANY straight line y = a + bx (or bx + a): b, the coefficient of x, is the slope of the line. If x changes by one unit, y changes by b units, so b is the rate of change of y with respect to x. (If y is weight in pounds and x is height in inches, b is the number of pounds we expect weight to go up by per inch that height goes up by.) (RegressionSlope.xls download?)
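As a small illustration of the slope as a rate of change, here is a sketch using a made-up weight vs. height line. The values of a and b are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical line: predicted weight (lb) = a + b * height (in).
# Both numbers below are invented for illustration only.
a = -200.0  # intercept (assumed)
b = 5.0     # slope (assumed): pounds per inch


def predict(x):
    """Predicted y on the line y = a + b*x."""
    return a + b * x


# Raising x by one unit changes the prediction by exactly b units,
# no matter where on the line you start.
change_at_65 = predict(66) - predict(65)
change_at_70 = predict(71) - predict(70)
print(change_at_65, change_at_70)
```

Both printed differences equal b, which is what "rate of change of y with respect to x" means for a straight line.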
LEAST SQUARES PROPERTY (again)
"Residual at x" = y - yhat = vertical distance from observed y to predicted y (what's left over after predicting)
Least squares principle: Find the line that minimizes the sum of the squared residuals.
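The least squares principle can be sketched in a few lines of Python. The data here are made up, and the function names (least_squares, sse) are just labels for this example. The slope and intercept come from the standard formulas b = Sxy/Sxx and a = ybar - b*xbar.

```python
# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def least_squares(xs, ys):
    """Return (a, b) for the least squares line yhat = a + b*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared residuals (y - yhat) for the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = least_squares(xs, ys)
best = sse(a, b, xs, ys)

# Nudge the intercept and slope away from the least squares values;
# every nudged line has a larger sum of squared residuals.
nudged = [
    sse(a + da, b + db, xs, ys)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.2, 0.0, 0.2)
    if (da, db) != (0.0, 0.0)
]
print(a, b, best, min(nudged))
```

Trying a handful of nearby lines is not a proof, but it shows what "minimizes" means: no other line tried does better than the least squares line.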
Outliers get their residual distance squared, so they may be very influential in determining where the line sits. Especially if they are at the lowest or highest x-values, they may change the slope of the line a lot. (In-class demo RegressionLine.xls download?)
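To see the effect of where an outlier sits, here is a sketch with invented data. Seven points lie exactly on the line y = x, and one extra point is added 6 units below that line, first in the middle of the x range and then at the highest x. The data and variable names are assumptions made for this example only.

```python
def slope(xs, ys):
    """Least squares slope b = Sxy / Sxx."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


# Invented data: seven points exactly on the line y = x.
base_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
base_y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

slope_base = slope(base_x, base_y)

# Outlier 6 units below the line, in the middle of the x range.
slope_mid = slope(base_x + [4.0], base_y + [-2.0])

# Outlier 6 units below the line, at the highest x-value.
slope_end = slope(base_x + [7.0], base_y + [1.0])

print(slope_base, slope_mid, slope_end)
```

In this example the middle outlier pulls the whole line down without tilting it, while the same-sized outlier at the end of the x range cuts the slope nearly in half.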
Plotting residuals: This amounts to making the regression line into a new x-axis. If you plot the residuals themselves vs. the original x values, without the distraction of the slanted line, outliers and patterns other than the linear one (if any) can emerge. (In-class demo ResidualsRSquared.xls download?)
SPSS Manual, Section 2.2 pp. 67-71 tells how.
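The course uses SPSS for this, but the idea can also be sketched in Python. The data below are made up and deliberately curved (y = x squared), so the residuals show a pattern the fitted line hides. Only the numbers are printed here; any plotting tool could chart residual vs. x.

```python
# Made-up curved data: y = x**2 for x = 1..5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 2 for x in xs]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
    (x - xbar) ** 2 for x in xs
)
a = ybar - b * xbar

# Residual at each x: observed y minus predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Listing residuals against x is the "residual plot" in table form.
for x, r in zip(xs, residuals):
    print(x, r)
```

The residuals are positive at both ends and negative in the middle, a U-shaped pattern that signals the relationship is not a straight line. Residuals from a least squares line always sum to zero, so it is the pattern, not the total, that matters.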
r^2 = Proportion of variability in y-values explained/predicted by knowing x and using the least squares regression line.
For a more exact explanation of which "variability" we're taking a proportion of:
(In-class demo ResidualsRSquared.xls download?)
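One common way to make "proportion of variability" concrete, sketched here with made-up data: take the total squared deviation of the y-values from their mean (called sst below), see how much is left over as squared residuals (sse), and compute r^2 = 1 - sse/sst. The names sst and sse are labels chosen for this example, not notation from the text.

```python
from math import sqrt

# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

# Least squares line and its residuals.
b = sxy / sxx
a = ybar - b * xbar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Total variability of y about its mean.
sst = syy

# Proportion of that variability accounted for by the line.
r_squared = 1 - sse / sst

# The correlation r, squared, gives the same number.
r = sxy / sqrt(sxx * syy)
print(r_squared, r ** 2)
```

Here the y-values vary by 6 squared units about their mean, 2.4 of which is left in the residuals, so the line accounts for 60% of the variability.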
Cautions, Sec. 2.4
Plot the data: Summary formulas and numbers don't tell the whole story. (Anscombe's quartet, p. 127, 2.46-7)
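The point of Anscombe's quartet can be checked numerically. The sketch below uses the data as commonly published from Anscombe's 1973 paper; the values printed in the text's exercises may be presented differently, so treat the listing as an assumption. The helper name fit is made up for this example.

```python
from math import sqrt


def fit(xs, ys):
    """Return (intercept, slope, correlation) for the least squares line."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b, sxy / sqrt(sxx * syy)


# Anscombe's quartet as commonly published (sets 1-3 share x-values).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

results = [fit(x123, y1), fit(x123, y2), fit(x123, y3), fit(x4, y4)]
for a, b, r in results:
    print(round(a, 2), round(b, 2), round(r, 2))
```

All four data sets give essentially the same intercept, slope, and correlation, yet their scatterplots look completely different. The numbers alone cannot tell you that; only the plots can.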
Extrapolation, extra (outside) polation (putting a point): Using the line to predict outside the range of x's you have data for. Unavoidable if x is time, but always dangerous: nothing says the mechanism you see will persist in a wider range.
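A sketch of how extrapolation can go wrong, using an invented "mechanism" (y = x squared) that the data only sample for x from 0 to 4. Within that range a straight line is a rough fit; well outside it, the line's prediction is far off. The function names here are hypothetical.

```python
def true_mechanism(x):
    """Invented stand-in for the real (unknown) relationship."""
    return x ** 2


# We only have data for x = 0..4.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [true_mechanism(x) for x in xs]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
    (x - xbar) ** 2 for x in xs
)
a = ybar - b * xbar


def predict(x):
    """Prediction from the fitted line a + b*x."""
    return a + b * x


# Largest miss inside the range of the data.
worst_inside = max(abs(y - predict(x)) for x, y in zip(xs, ys))

# Miss when predicting far outside the range, at x = 10.
miss_outside = abs(true_mechanism(10.0) - predict(10.0))

print(worst_inside, miss_outside)
```

Inside the data range the line is never off by more than 2 units; at x = 10 it is off by 62. Nothing in the data from 0 to 4 warns you of that.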
Averaged data will usually produce a stronger relationship (higher correlation, r^2) than the merged raw data from individuals (the averaging hides much of the variability).
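The effect of averaging can be seen in a small constructed example. Below, five groups of three made-up individuals scatter around the line y = 2x; the correlation is computed once from all fifteen individuals and once from the five group averages. The data are invented so that the averages fall exactly on the line, which exaggerates the effect.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


group_x = [1.0, 2.0, 3.0, 4.0, 5.0]
offsets = [-3.0, 0.0, 3.0]  # invented individual scatter about the line

# Raw data: three individuals at each x, with y = 2x + offset.
raw_x = [x for x in group_x for _ in offsets]
raw_y = [2 * x + off for x in group_x for off in offsets]

# Averaged data: one mean y per group.
avg_y = [sum(2 * x + off for off in offsets) / len(offsets) for x in group_x]

r_raw = correlation(raw_x, raw_y)
r_avg = correlation(group_x, avg_y)
print(r_raw, r_avg)
```

The individuals give r of about 0.76, while the group averages give r = 1. A correlation computed from averages should not be read as a statement about individuals.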
Lurking variables and association/causation next time.
HW Day15: Reread Section 2.3, esp. p. 112-end. Read Sec. 2.4. We'll skip 2.5 for now; Chapter 3 next.
Hand in: Sec. 2.3
p. 122, 2.36 (speed & gas again), parts a, b, c. Use the SPSS Manual, Section 2.2 pp. 67-71 to generate the residuals. Check that they match the ones in the text. Use them to do the graphing, etc.
p. 161, 2.99 (SAT verbal as predictor of SAT math). In part a, also make and print the graph with the regression line. In part b, use the SPSS Manual, Section 2.2 pp. 67-71 to generate the residuals. Then you can make the graph and finish the problem.
p. 123, 2.38 (Gesell first word: point in middle of x range).
p. 111, 2.32 (Manatees, review problem). Use SPSS for part a, and draw in the points for part b by hand.
| Sievers home | Math151-Sp01/Day15.htm | 3/2/01 |