Section 2.3, Regression line, continued
Files in green are Excel files; they are in the RegressionDemos folder in Class Materials\Math 151, in lab 101.
Review:
ANY straight line y = a + bx (or bx + a): b, the coefficient of x, is the slope of the line.
If x changes by one unit, y changes by b units, so b is the rate of change of y with respect to x.
(If y is weight in pounds and x is height in inches, b is the number of pounds we expect
weight to go up for each inch that height goes up.) RegressionSlope.xls
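A quick numerical check of "b is the rate of change", as a small Python sketch. The intercept and slope are made-up numbers for illustration only; they are not fitted to any class data.

```python
# Hypothetical line: weight (pounds) = a + b * height (inches).
# a and b are made-up values for illustration only.
a = -230.0  # intercept, in pounds
b = 5.5     # slope, in pounds per inch


def predict(height_in):
    """Predicted weight in pounds for a height given in inches."""
    return a + b * height_in


# Raising x by one unit changes the prediction by b, wherever we start.
for h in (60, 65, 70):
    change = predict(h + 1) - predict(h)
    print(f"height {h} -> {h + 1} in: predicted weight changes by {change:.1f} lb")
```

The change is the same at every starting height, which is what makes a straight line a straight line.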
From last time: Regression line formulas (memorize these)
#2 A change of one standard deviation in x corresponds to a change of r standard
deviations in y, along the regression line.
In "algebra", b = r times (s.d. of y)/(s.d. of x) (Equation, p. 104)
The slope b expresses change in y-units per x-unit. (Suppose x is inches, y is pounds.
Then b is in pounds per inch.)
If we standardize both the x-values and the y-values, the slope is just r!
Regression in SPSS
#3 The regression line goes through the point given by the two means, (xbar, ybar).
--If you know this, you know ybar = a + b (xbar).
You can solve this for a: a = ybar - b (xbar). (Other equation, p. 104)
--So knowing #2 and #3, you can find the equation of the line from the means, s.d.'s, and r.
--And if you draw the two lines, y on x and x on y, they will intersect at (xbar, ybar).
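Facts #2 and #3 can be checked numerically. The Python sketch below uses made-up height/weight numbers (not class data), and the helper names are chosen just for this example. It builds the line from the means, s.d.'s, and r, then checks three things: that the slope matches the direct least squares slope, that standardizing makes the slope equal r, and that the y-on-x and x-on-y lines cross at (xbar, ybar).

```python
from statistics import mean, stdev

# Made-up illustration data: heights (inches) and weights (pounds).
x = [62, 64, 66, 68, 70, 72]
y = [120, 136, 139, 155, 160, 178]


def correlation(xs, ys):
    """Sample correlation r: sum of products of z-scores, divided by n - 1."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    products = [((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


def direct_slope(xs, ys):
    """Least squares slope straight from the data: Sxy / Sxx."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mx) ** 2 for xi in xs)
    return sxy / sxx


xbar, ybar = mean(x), mean(y)
r = correlation(x, y)

# Fact #2: slope from r and the two s.d.'s.
b = r * stdev(y) / stdev(x)
# Fact #3: the line passes through (xbar, ybar), so solve for a.
a = ybar - b * xbar

# Standardize both variables; the slope of the new line should equal r.
zx = [(xi - xbar) / stdev(x) for xi in x]
zy = [(yi - ybar) / stdev(y) for yi in y]
b_standardized = direct_slope(zx, zy)

# Regression of x on y (roles swapped): predicted x = a2 + b2 * y.
b2 = r * stdev(x) / stdev(y)
a2 = xbar - b2 * ybar

# Where y = a + b*x and x = a2 + b2*y cross, solved by substitution.
x_cross = (a2 + b2 * a) / (1 - b * b2)
y_cross = a + b * x_cross

print(f"r = {r:.4f}, b = {b:.4f}, a = {a:.2f}")
print(f"direct least squares slope = {direct_slope(x, y):.4f}")
print(f"slope after standardizing  = {b_standardized:.4f}")
print(f"lines cross at ({x_cross:.2f}, {y_cross:.2f}); means are ({xbar}, {ybar})")
```

Note that the two lines are different lines unless r is exactly 1 or -1; the substitution step divides by 1 - r^2, so it would fail in that special case.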
LEAST SQUARES PROPERTY
"Residual at x" = y - yhat = signed vertical distance between observed y and
predicted y (what's left over after predicting).
(Positive if observed is bigger than predicted,
negative if observed is smaller than predicted.)
Least squares principle: Find the line that minimizes
the sum of the squared residuals. (RegressionLine.xls, Squares tab)
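The least squares principle can be tried out by brute force. This Python sketch (made-up data, helper names chosen for the example) finds the least squares line, then nudges the intercept and slope a little and shows that the sum of squared residuals only gets bigger.

```python
from statistics import mean

# Made-up illustration data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]


def least_squares(xs, ys):
    """Return (a, b) for the least squares line yhat = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mx) ** 2 for xi in xs)
    b = sxy / sxx
    return my - b * mx, b


def sum_squared_residuals(a, b, xs, ys):
    """Sum of (observed y - predicted y)^2 for the line yhat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))


a, b = least_squares(x, y)
best = sum_squared_residuals(a, b, x, y)
print(f"least squares line: yhat = {a:.3f} + {b:.3f} x, sum of squares = {best:.4f}")

# Nudge the intercept and/or the slope: each nudged line does worse.
nudges = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1), (0.3, -0.1)]
for da, db in nudges:
    other = sum_squared_residuals(a + da, b + db, x, y)
    print(f"a {da:+.1f}, b {db:+.1f}: sum of squares = {other:.4f}")
```

Trying a handful of nudges is only a spot check, not a proof, but it is the same experiment as sliding the line around in the Squares tab.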
This method of finding a "best fit" straight line for predicting y's from x's was
derived mathematically to work well with "joint normal" data--elliptical clouds.
For data of this sort, the line does give the mean of the y's for each
given x (at least in the abstract).
Drawback if the data are not the "elliptical cloud" type:
Outliers get their residual distance squared, so they may be very
influential in determining where the line sits.
Especially if they are at the lowest or highest x-values, they may change the slope
of the line a lot.
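To see how much the position of an outlier matters, here is a Python sketch with made-up points. The same low y-value is added once at a high x-value and once at the mean of the x's. Only the first one drags the slope; the second shifts the line down without tilting it.

```python
from statistics import mean


def slope(points):
    """Least squares slope for a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mx) ** 2 for xi in xs)
    return sxy / sxx


# Made-up points that climb steadily; the mean of these x-values is 4.5.
base = list(zip([1, 2, 3, 4, 5, 6, 7, 8], [3, 4, 7, 8, 11, 12, 15, 16]))

# One extra point with an unusually low y, placed in two different spots.
outlier_at_end = base + [(12, 2)]       # far to the right of the other x's
outlier_in_middle = base + [(4.5, 2)]   # exactly at the mean of the x's

slope_base = slope(base)
slope_end = slope(outlier_at_end)
slope_mid = slope(outlier_in_middle)

print(f"no outlier:          slope = {slope_base:.3f}")
print(f"outlier at high x:   slope = {slope_end:.3f}")
print(f"outlier at middle x: slope = {slope_mid:.3f}")
```

A point at the mean of the x's contributes nothing to the slope calculation, because its x-deviation is zero. That is the extreme case; points near the middle have little leverage, points at the ends have a lot.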
Plotting residuals: This amounts to making the regression line into a new x-axis. If you plot the residuals themselves vs. the original x values, without the distraction of the slanted line, outliers and patterns other than the linear one (if any) can emerge. (In-class demo: ResidualsRSquared.xls, Graph of Residuals tab.)
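A rough text version of a residual plot, as a Python sketch. The data are deliberately curved (y = x^2, made up for the purpose), so a straight line is the wrong model, and the residuals show it.

```python
from statistics import mean

# Deliberately curved made-up data (y = x^2): a straight line is the wrong model.
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]

# Least squares line yhat = a + b*x.
mx, my = mean(x), mean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

# Residual = observed y minus predicted y.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Crude text "plot": one row per x-value, marks show the sign and rough size.
for xi, e in zip(x, residuals):
    marks = ("+" if e > 0 else "-") * round(abs(e))
    print(f"x = {xi}: residual {e:+5.1f}  {marks}")
```

The residuals are positive at both ends and negative in the middle, a U-shape. That pattern is hard to spot against the slanted line but obvious once the line is flattened out. The residuals from a least squares line always add up to zero, so a pattern like this is about shape, not about the line sitting too high or too low.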
r^2 = Proportion of the variability in the y-values that is explained/predicted
by knowing x and using the least squares regression line.
For a more exact explanation of what the "variability" is that we're taking a
proportion of, see ResidualsRSquared.xls, R-squared tab.
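One way to make "proportion of variability" concrete, sketched in Python with made-up data. Here "variability" is taken to mean the sum of squared deviations: of the y's about ybar (total), and of the y's about the regression line (left over). The fraction of the total that is not left over should come out equal to r squared.

```python
from statistics import mean

# Made-up illustration data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]

mx, my = mean(x), mean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

b = sxy / sxx                   # least squares slope
a = my - b * mx                 # least squares intercept
r = sxy / (sxx * syy) ** 0.5    # correlation

total = syy                                                     # y's about ybar
leftover = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # y's about the line
explained_fraction = 1 - leftover / total

print(f"total variability    = {total:.4f}")
print(f"left over after line = {leftover:.4f}")
print(f"fraction explained   = {explained_fraction:.4f}")
print(f"r squared            = {r ** 2:.4f}")
```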
SPSS residuals: The Manual, Section 2.2, pp. 67-71, tells how.
Run-through: Analyze>Regression>Linear.
Save: Unstandardized Residuals. They become a new variable.
Scatterplot them on the y-axis, with the old x variable on the x-axis.
Chart Editor: Chart>Reference line: Yscale: position at 0 (add).
Section 2.4 will be discussed next time. Read it to be ready--lots of separate ideas!
Cautions (Sec. 2.4)
Plot the data: Summary formulas and numbers don't tell the whole story. (Anscombe's quartet, p. 127, 2.46-7)
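Anscombe's four data sets are small enough to type in, so the point can be checked directly. This Python sketch (helper names are just for this example) computes the regression line and r for each set. The numbers come out nearly identical even though the four scatterplots look nothing alike, which is why the plots are the thing to look at.

```python
from statistics import mean

# Anscombe's four data sets. Sets I-III share the same x-values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (
        [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
    ),
}


def summarize(xs, ys):
    """Return (a, b, r): least squares intercept, slope, and correlation."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((xi - mx) ** 2 for xi in xs)
    syy = sum((yi - my) ** 2 for yi in ys)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b, sxy / (sxx * syy) ** 0.5


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (a, b, r) in summaries.items():
    print(f"set {name:>3}: yhat = {a:.2f} + {b:.3f} x, r = {r:.3f}")
```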
Extrapolation--extra (outside) + polation (putting a point): Using the line to predict outside the range of x's you have data for. Unavoidable if x is time, but always dangerous--nothing says the mechanism you see will persist over a wider range.
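A made-up example of extrapolation going wrong, as a Python sketch. The "mechanism" function is hypothetical: a quantity that rises quickly and then levels off near 100. The line is fitted only to x = 0 through 4, where it does tolerably well, and is then used to predict at x = 10.

```python
from statistics import mean


def mechanism(t):
    """Hypothetical 'true' relationship (made up): levels off near 100."""
    return 100 * (1 - 0.5 ** t)


# We only have data for x = 0..4.
x = [0, 1, 2, 3, 4]
y = [mechanism(xi) for xi in x]

# Least squares line fitted to the observed range.
mx, my = mean(x), mean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

# Worst miss inside the range of the data.
worst_inside = max(abs(yi - (a + b * xi)) for xi, yi in zip(x, y))

# Prediction far outside the range of the data.
x_new = 10
predicted = a + b * x_new
actual = mechanism(x_new)
error_outside = abs(actual - predicted)

print(f"line: yhat = {a:.2f} + {b:.2f} x")
print(f"worst miss for x in 0..4: {worst_inside:.2f}")
print(f"at x = {x_new}: line predicts {predicted:.2f}, mechanism gives {actual:.2f}")
```

The line keeps climbing at the same rate forever; the mechanism does not. Nothing in the data from x = 0 to 4 warns the line about that.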
Averaged data will usually produce a stronger relationship (higher correlation, r^2) than the raw data from individuals merged together (the averaging hides much of the variability).
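A contrived Python sketch of the averaging effect. The data are made up so that the arithmetic is easy to follow: five groups (x = 1 to 5), five individuals in each, with individual y-values scattered evenly around 2x. The correlation is computed once from all 25 individuals and once from the 5 group averages.

```python
from statistics import mean


def correlation(xs, ys):
    """Correlation r computed as Sxy / sqrt(Sxx * Syy)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((xi - mx) ** 2 for xi in xs)
    syy = sum((yi - my) ** 2 for yi in ys)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


# Made-up setup: individual y = 2 * (group x) + an individual offset.
groups = [1, 2, 3, 4, 5]
offsets = [-6, -3, 0, 3, 6]   # individual scatter, identical in every group

# Raw data: one (x, y) pair per individual, all groups merged.
raw_x, raw_y = [], []
for g in groups:
    for d in offsets:
        raw_x.append(g)
        raw_y.append(2 * g + d)

# Averaged data: one (x, mean y) pair per group.
avg_x = groups
avg_y = [mean([2 * g + d for d in offsets]) for g in groups]

r_raw = correlation(raw_x, raw_y)
r_avg = correlation(avg_x, avg_y)

print(f"r from the 25 individuals:  {r_raw:.3f}")
print(f"r from the 5 group averages: {r_avg:.3f}")
```

In this rigged example the offsets cancel exactly within each group, so the averages fall on a perfect line. Real averages will not be that clean, but the direction of the effect is the same: the individual scatter is what gets averaged away.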
Lurking variables and association/causation next time.
HW Day15 Reread section 2.3, esp. p.112-end. Read sec. 2.4. We'll skip 2.5 for now; Chapter 3 next.
Hand in: Sec. 2.3
SPSS Manual, Section 2.2, pp. 67-71 (continuing with Sanchez heating). Print and hand in the graph on p. 70.
p. 122, 2.36 (speed & gas again), parts a, b, c. Use the SPSS Manual, Section 2.2, pp. 67-71, to generate the residuals. Check that they match the ones in the text. Use them to do the graphing, etc.
p. 161, 2.99 (SAT verbal as predictor of SAT math). In part a, also make and print the graph with the regression line. In part b, use the SPSS Manual, Section 2.2, pp. 67-71, to generate the residuals. Then you can make the graph and finish the problem.
p. 123, 2.38 (Gesell first word--point in middle of x range).
p. 111, 2.32 (Manatees). Review problem. Use SPSS for part a, and draw in the points for part b by hand.
Read, to discuss:
Optional: Use the Excel files
| Sievers home | Math151-Fall01/DayS15.htm | 1:30 pm | 10/03/01 |