| Hand in Monday (D&V p.152 ff, unless otherwise noted) <Day 18 stuff due Monday also>
C. Use Residuals.xls from here or the lab (in ClassMaterial\Math151 D&V\RegressionDemosExcel for D&V) to graph these data sets, along with a graph of the residuals. Print the results, and describe the shape of the residuals. (It may help to connect the dots with pencil to see the pattern.)
a) x: 1 2 8 4 6 9
   y: 1 3 6 6 7 5
b) x: 1 2 7 4 6 9
   y: 7 6 2 4 2 1
3 Residuals
32 a-h Birthrates (Type the data into SPSS and get the equation from SPSS. Make a plot of residuals also, to help with 32c.)
SPSS Handout p. 3 (Governors' salaries): Add #12. You should now have done all but #10. Keep it until we finish that.
32 a-h Birthrates (Type the data into SPSS and let SPSS find the equation of the line. Make a plot of residuals also, to help with c. Do f, g, h by hand.)
RSquared: (For these parts, pattern your answers on the language in the text and webpage. We'll probably talk more Monday about what "proportion of the variability in y which is accounted for by the regression on x" actually means, but using the language right is the basic step.)
SPSS Handout p. 3 (Governors' salaries): You can now finish all the questions. Hand it in as part of Day 17!
36 e Gators, how good. Also, graph this line (by hand), and use it to estimate the weight of a 60-inch alligator.
21 d,e,f Used cars
ch8 #21/23: The SPSS data file is missing a value! Age 4, price 6995 has been omitted. This gives price = 12519.62 - 940.04 * age, R-square = .91. When the missing value is restored, we get price = 12319.59 - 924.0 * age, R-square = .89. The graphs don't look much different.
31 a-g El Nino
A. Income depends on height?! Read the article and answer this. |
Read,
to discuss |
Optional:
Use the ActivStats Least Squares tool (see below) and play with datasets; especially, drag points around and see what they do. |
Extrapolation: (p. 148 & 163-5) Using the line to predict for x's outside the range of the data. The association may change once you move away from the x's you have data for. Be cautious, especially in predicting far into the future.
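To see why caution is needed, here is a minimal Python sketch using the used-car line quoted on this page (price = 12319.59 - 924.0 * age). The function name `predicted_price` and the ages tried are only for illustration. The age range of the actual data is not given here, so 15 years is an assumed "well outside the data" value.

```python
# Used-car line from this page (fitted with the missing value restored).
B0 = 12319.59   # intercept, in dollars
B1 = -924.0     # slope, in dollars per year of age

def predicted_price(age):
    """Price the line predicts for a car of the given age in years."""
    return B0 + B1 * age

# Ages 4 and 15 are illustrative; 15 is ASSUMED to lie beyond the data.
for age in (4, 15):
    print(age, round(predicted_price(age), 2))
```

At age 15 the line predicts a negative price, which no real car has. The straight-line pattern cannot continue that far, whatever it does inside the range of the data.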
Regression line: D&V Ch 8&9, AS8&9, "Regressing y ON x"
Formula: yhat = b0 + b1 x
b1 = r times (s.d. of y)/(s.d. of x) = r sy / sx. b1 is the slope (rate of change), in y-units per x-unit.
b0 = ybar - b1(xbar), from ybar = b0 + b1(xbar).
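As a check on the formulas, here is a short Python sketch that builds b1 and b0 from r, sx, and sy for data set (a) in the hand-in problem above. The helper names (`mean`, `sd`, `correlation`) are made up for this example, and the s.d. is assumed to be the sample version (dividing by n - 1).

```python
from math import sqrt

# Data set (a) from the hand-in problem on this page.
xs = [1, 2, 8, 4, 6, 9]
ys = [1, 3, 6, 6, 7, 5]

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """Correlation r: sum of products of z-scores, divided by n - 1."""
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = sd(xs), sd(ys)
    zsum = sum(((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys))
    return zsum / (len(xs) - 1)

r = correlation(xs, ys)
b1 = r * sd(ys) / sd(xs)        # slope, in y-units per x-unit
b0 = mean(ys) - b1 * mean(xs)   # forces the line through (xbar, ybar)
print(round(b1, 3), round(b0, 3), round(r, 3))
```

For this data set the slope works out to 0.5 and the intercept to about 2.17, so the line is roughly yhat = 2.17 + 0.5 x.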
Residual: Residual = observed - predicted = y - yhat
Pattern in graph of residuals: (Ch9 p.162-3) For links and details, see Day 16.
If you graph residual values against x (or against predicted y's), you eliminate visually the linear portion of the association--eliminate the distraction of the slanted line. (The regression line "becomes" the new x-axis; a "shear" transformation.)
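The idea can be tried out in a few lines of Python. This is only a sketch: it uses data set (a) from the hand-in problem, the helper `least_squares_line` is a made-up name, and it computes the slope as Sxy / Sxx, which is algebraically the same as r sy / sx.

```python
# Data set (a) from the hand-in problem on this page.
xs = [1, 2, 8, 4, 6, 9]
ys = [1, 3, 6, 6, 7, 5]

def least_squares_line(xs, ys):
    """Return (b0, b1) for the regression of y on x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx              # same value as r * sy / sx
    b0 = ybar - b1 * xbar
    return b0, b1

b0, b1 = least_squares_line(xs, ys)

# Residual = observed - predicted, one per data point.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# List the residuals in order of x, as they would appear left to right.
for x, res in sorted(zip(xs, residuals)):
    print(x, round(res, 2))
```

Read left to right, the residuals start negative, turn positive in the middle, and go negative again. That bend is hard to see in the original scatterplot but stands out once the slanted line is removed.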
SPSS: (old wing) Analyze>Regression>Linear. Plots button, *ZRESID on *ZPRED. Save button, Residuals: Unstandardized calculates all the residuals and saves them as a new variable.....
"Least squares" (D&Vp.144, AS8-3Activity1&2)
The regression line is the line that minimizes the sum of the squared residuals. See Day 16.
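A quick numerical check of "least squares," again as a sketch on data set (a) from the hand-in problem: compute the sum of squared residuals for the regression line, then nudge the intercept or slope a little and watch the sum go up. The function name `sse` and the nudge sizes are arbitrary choices for this example.

```python
# Data set (a) from the hand-in problem on this page.
xs = [1, 2, 8, 4, 6, 9]
ys = [1, 3, 6, 6, 7, 5]

def sse(b0, b1):
    """Sum of squared residuals for the line yhat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Least-squares slope and intercept (Sxy / Sxx equals r * sy / sx).
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar

best = sse(b0, b1)
print("regression line:", round(best, 3))

# Nudge the intercept or the slope; the nudge sizes are arbitrary.
nudges = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1)]
for d0, d1 in nudges:
    print(d0, d1, round(sse(b0 + d0, b1 + d1), 3))
```

Every nudged line has a larger sum of squared residuals than the regression line, which is what dragging points or lines around in the Least Squares tool shows visually.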
R-squared: The line formula yhat = b0 + b1 x tells us our best prediction or estimate of a response (y) value for a particular value of the explanatory (x) variable. It says NOTHING about how good that "best" is--that is, it says nothing about how tight or scattered the data are around the line. R-squared does that job.
R^2 (= r^2 = "Coefficient of Determination") = proportion of variability in y-values explained/accounted for by knowing x and using the regression line model.
Un-accounted-for variability = (1 - r^2) = variance-of-residuals / total-variance-of-y's
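The two descriptions of R-squared can be compared directly. The sketch below, on data set (a) from the hand-in problem, computes 1 - (variance of residuals)/(variance of y's) and, separately, the square of the correlation r. The helper name `variance` is made up here, and variances are assumed to use the n - 1 divisor (the divisor cancels in the ratio anyway).

```python
from math import sqrt

# Data set (a) from the hand-in problem on this page.
xs = [1, 2, 8, 4, 6, 9]
ys = [1, 3, 6, 6, 7, 5]

def variance(values):
    """Sample variance (divides by n - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)

b1 = sxy / sxx                  # same value as r * sy / sx
b0 = ybar - b1 * xbar
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

unexplained = variance(residuals) / variance(ys)   # this is 1 - r^2
r_squared = 1 - unexplained
r = sxy / sqrt(sxx * syy)                          # correlation coefficient

print(round(r_squared, 4), round(r ** 2, 4))
```

The two printed numbers agree: a bit over half of the variability in these y's is accounted for by the line, and the rest is left in the residuals.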
More: R-Squared (ClassMaterials\Math151 D&V\RegressionDemosExcel for D&V\RSquared.xls)
(Optional: Further explanation of r^2)
r^2 is the square of the correlation coefficient r! (The sign, - or +, gets lost.)
If r = .7, about half (.49) of the variability in the y's is accounted for by using the regression line model to predict y from x. (If weight and height have a correlation of .7, then about half of the variability in weight can be accounted for by height.)
NOTE: The standard deviation doesn't say anything about the distance of any individual point from the mean; it's only about a kind of "average" variability. R^2 doesn't say anything about the line and any particular (x,y) pair--just about a kind of "average" goodness of the fit of the line and the data.
Line is not symmetric: The regression of weight on height uses a different line from the regression of height on weight. (Minimizing vertical residuals pulls the line "flatter" than the line that just goes through the middle of the cloud, which would rise 1 s.d. for each 1 s.d. of run. Related to the idea of "regression to the mean," p. 139.)
Demonstration on overhead projector; flip transparency to exchange axes.
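The transparency flip can be imitated numerically. This sketch uses data set (a) from the hand-in problem rather than weight and height, and the names `slope` and `sd` are made up for the example. It regresses y on x, then x on y, and redraws the second line on the usual axes so the slopes can be compared with the "1 s.d. up per 1 s.d. run" line.

```python
from math import sqrt

# Data set (a) from the hand-in problem on this page.
xs = [1, 2, 8, 4, 6, 9]
ys = [1, 3, 6, 6, 7, 5]

def slope(us, vs):
    """Least-squares slope for regressing vs ON us."""
    n = len(us)
    ubar, vbar = sum(us) / n, sum(vs) / n
    suv = sum((u - ubar) * (v - vbar) for u, v in zip(us, vs))
    suu = sum((u - ubar) ** 2 for u in us)
    return suv / suu

def sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

slope_y_on_x = slope(xs, ys)    # change in y per unit of x
slope_x_on_y = slope(ys, xs)    # change in x per unit of y

# Redraw the x-on-y line on the usual axes (x across, y up).
x_on_y_redrawn = 1 / slope_x_on_y

# Line through the middle of the cloud: 1 s.d. up for 1 s.d. of run.
sd_line_slope = sd(ys) / sd(xs)

print(round(slope_y_on_x, 3), round(sd_line_slope, 3), round(x_on_y_redrawn, 3))
```

On the same axes, the y-on-x line is flatter than the s.d. line and the x-on-y line is steeper, so the two regressions really are different lines. All three pass through (xbar, ybar), and they would coincide only if r were exactly 1 or -1.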
| Sievers home | Math151-Sp06/Daysp17.htm | 2:30pm | 2/8/06 |