| Hand in Wed.
SPSS Handout: Do problems 7, 8, 9, 11 on p. 3. Keep this with the previous work.
(The rest are all from D&V p. 153ff unless otherwise noted.)
21 (SPSS) a, b, c and 23 a, b, c, d Used cars. (Do 23d by hand with your calculator, and sketch the "up-and-over" on the graph.) Keep a copy of your equation. The SPSS data file is missing a value: age 4, price 6995 has been omitted. Without it, price = 12519.62 - 940.04*age, R-square = .91. When the missing value is restored, we get price = 12319.59 - 924.0*age, R-square = .89. The graphs don't look much different.
36 a through d Gators. (See p. 149 for how to read results.)
1 a, b line equation. You should be able to do c also; try it.
17 SAT scores.
26 Chicken (y = calories, x = fat), a through f only. Add 1d, find y-bar (try for r, but don't worry if you can't). Postpone the rest.
C. Use Residuals.xls from here or the lab (in ClassMaterial\Math151 D&V\RegressionDemosExcel for D&V) to graph these data sets, along with a graph of the residuals. Print the results, and describe the shape of the residuals (it may help to connect the dots with pencil to see the pattern).
a) x: 1 2 8 4 6 9
   y: 1 3 6 6 7 5
b) x: 1 2 7 4 6 9
   y: 7 6 2 4 2 1
3 Residuals. 32 Birthrates (type the data into SPSS. Make a plot of residuals also, to help with 32c).
SPSS Handout p. 3 (Governor's salaries): Add #12. You should now have done all but #10. Keep till we finish that. |
Read, to discuss: Was assigned day 15. Look at it again, with reference to the r standard deviations in y for every 1 standard deviation in x: A. Open the Excel file RegressionSlope (or find it in the folder RegressionDemosExcel for D&V in ClassMaterial\Math151 D&V). Change the x-y values in the yellow boxes and watch the line change. Change the x-values in col. F and watch the "run" (red line) change in the rightmost 2 graphs. Notice that the slope = the coefficient of x = the rise/run = the increase in y per unit increase in x. Fix it so the increase in x (the "run") is exactly 1. Also, look at the leftmost graph, where the lengths of the standard deviations are shown, and note that in standard-deviation units, the rise is r s.d.'s in y for each s.d. run in x. |
Optional |
Extrapolation (pp. 148 & 163-5): Using the line to predict for x's outside the range of the data. The association may change away from the x-values you have data for. Be cautious, especially in predicting far into the future.
The Regression line equation:
If we standardize both the x-values and the y-values, the slope will just equal r:
z_yhat = r z_x
And the intercept will be 0, so the line goes through (0,0). (That is the point given by the two means, (xbar, ybar), in the original graph.)
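The standardized form can be checked numerically. The sketch below is only an illustration: the height/weight numbers are made up, and the helper names (`least_squares`, `correlation`, `standardize`) are hypothetical, not from the text or from SPSS. It standardizes both variables, fits the least-squares line, and compares the slope with r.

```python
from statistics import mean, stdev

def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line yhat = b0 + b1*x."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def standardize(vals):
    """Convert values to z-scores using the sample standard deviation."""
    m, s = mean(vals), stdev(vals)
    return [(v - m) / s for v in vals]

def correlation(xs, ys):
    """r = sum of products of z-scores, divided by (n - 1)."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Made-up data for illustration only (x in inches, y in pounds).
xs = [60, 62, 65, 68, 70, 72]
ys = [115, 120, 140, 155, 160, 180]

r = correlation(xs, ys)
z_b0, z_b1 = least_squares(standardize(xs), standardize(ys))
print("r =", round(r, 4))
print("standardized slope =", round(z_b1, 4), " intercept =", round(z_b0, 4))
```

The printed slope should match r, and the intercept should be 0 up to rounding.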
govsalstd.sav, Govsalstd2.doc (also in SPSS for Class 05 folder; output file won't work)
Also Excel, RegressionSlope.xls, in ClassMaterial\Math151 D&V\RegressionDemosExcel for D&V
To find the equation yhat = b0 + b1 x in "real" units, calculate the "coefficients" b1 and b0.
b1:
A change of one standard deviation in x corresponds to a change of r standard deviations in y, along the regression line. The slope b1 expresses change in y-units per x-unit. (Suppose x is inches and y is pounds. Then b1 is in pounds per inch.) You can find b1 by multiplying r by the standard deviation of the y's (that's in pounds) and dividing by the standard deviation of the x's (that's in inches). In algebra (p. 140):
b1 = r * (s.d. of y)/(s.d. of x)
b0: The line goes through (xbar, ybar). If you know this, you know ybar = b0 + b1(xbar). You can solve this for b0:
b0 = ybar - b1(xbar).
So, if you have the means and standard deviations and
r, you can find the regression equation.
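As a rough sketch of that recipe, the function below builds b1 and b0 from the five summary numbers. The function name and the summary values (heights in inches, weights in pounds) are made up for illustration; they are not from the text.

```python
def line_from_summary(xbar, ybar, sd_x, sd_y, r):
    """Return (b0, b1) using b1 = r * sd_y / sd_x and b0 = ybar - b1 * xbar."""
    b1 = r * sd_y / sd_x          # units: y-units per x-unit
    b0 = ybar - b1 * xbar         # forces the line through (xbar, ybar)
    return b0, b1

# Hypothetical summary statistics (x = height in inches, y = weight in pounds).
xbar, ybar = 66.0, 145.0
sd_x, sd_y = 4.0, 25.0
r = 0.8

b0, b1 = line_from_summary(xbar, ybar, sd_x, sd_y, r)
print("slope b1 (pounds per inch):", round(b1, 3))
print("intercept b0 (pounds):", round(b0, 3))
print("check: b0 + b1*xbar =", round(b0 + b1 * xbar, 3), "should equal ybar =", ybar)
```

With these made-up numbers the slope comes out near 5 pounds per inch, and plugging xbar into the line returns ybar, as it should.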
See p. 141, text. That example is incomplete: they found the b's but didn't write the equation:
Av.cost-per-person = 2,266.61 - 36.21 * Peak-fwy-speed.
Check units.
P. 142: slope = -36.21 $/mph. For every mph increase in peak freeway speed, there is a decrease in cost of $36.21 per person. Or: "Traffic delays cost each urban area resident about $36 for every mph the freeways are slowed at peak period."
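To see the slope interpretation in numbers, here is a small sketch that plugs two speeds into the equation above. The coefficients are the ones quoted from the text; the function name and the speeds 40 and 41 mph are arbitrary choices for illustration.

```python
def predicted_cost(peak_speed_mph):
    """Predicted average cost per person ($) from peak freeway speed (mph)."""
    return 2266.61 - 36.21 * peak_speed_mph

cost_40 = predicted_cost(40)   # about 818.21 dollars
cost_41 = predicted_cost(41)   # one mph faster
change = cost_41 - cost_40     # equals the slope, in dollars per mph

print("predicted cost at 40 mph:", round(cost_40, 2))
print("predicted cost at 41 mph:", round(cost_41, 2))
print("change for +1 mph:", round(change, 2))
```

The change for one extra mph is just the slope, -36.21 dollars, whatever starting speed is chosen.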
Residual: Look at an individual observed (x,y) data pair. The residual is the "leftover" amount of y after predicting a y using the line. Visually, it is the length of the vertical line drawn from the point to the regression line (+ if the point is above the line, - if the point is below the line).
Residual = observed - predicted = Data - Model
e = y - yhat
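A one-point sketch of e = y - yhat, using the used-car numbers quoted in the homework note on this page (the line fitted with the missing value restored, and the restored point age 4, price 6995). The function name is made up for illustration.

```python
def predicted_price(age):
    """Used-car line quoted on this page (missing value restored)."""
    return 12319.59 - 924.0 * age

age, observed_price = 4, 6995           # the data pair that had been omitted
yhat = predicted_price(age)             # the model's prediction for age 4
residual = observed_price - yhat        # e = y - yhat

print("predicted:", round(yhat, 2))
print("residual:", round(residual, 2))  # negative: the point is below the line
```

The residual is about -1628.59, so this car sold for roughly $1,629 less than the line predicts for its age.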
Start here Wed:
SPSS (handout, p. 3, bottom: In Edit mode,
Insert>Spikes: Spike to: Regression)
Govsal-Deviations.doc
Pattern in graph of residuals (p. 162): If you graph residual values against x (or against predicted y's), you visually eliminate the linear portion of the association. (The regression line "becomes" the new x-axis; a "shear" transformation.)
Excel: Residuals.xls in ClassMaterial\Math151 D&V\RegressionDemosExcel for D&V
Curving or other structure may stand out more visibly. "Good" fit = no structure in residuals.
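A minimal sketch of structure showing up in residuals, using deliberately curved made-up data (y = x squared, not one of the homework sets). A straight line is fitted anyway, and the residuals are listed in x order.

```python
from statistics import mean

# Made-up curved data: y = x**2, so a straight line cannot fit well.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

xbar, ybar = mean(xs), mean(ys)
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar

# Residual = observed - predicted, one per data point.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(x, round(e, 2))
```

The residuals run positive, negative, negative, negative, positive: a U shape, which is the kind of leftover structure that says a straight line is missing something.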
SPSS: (old wing) (Handout bottom p. 4 & 3) Analyze>Regression>Linear. Plots button, *ZRESID on *ZPRED. Save button, Residuals: Unstandardized calculates all the residuals and saves them as a new variable; you can graph residuals on "x".
"Least squares" (D&V p. 144, AS8-3 Activity 1&2): The regression line is the line that minimizes the sum of the squared residuals. (RegressionLeastSqs.xls, or in Mac 101, ClassMaterials\Math151 D&V\RegressionDemosExcel for D&V\RegressionLeastSqs.xls)
This method of finding a "best fit" straight line for predicting y's from x's was derived mathematically to work well with "joint normal" data: elliptical clouds. For data of this sort, the line does give the mean of the y's for each given x (at least in the abstract).
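The least-squares property can be tried out by hand, much like dragging the line in the demos. The sketch below uses made-up data and a hypothetical helper `sum_sq_resid`; it computes the sum of squared residuals for the least-squares line and for a few nudged lines.

```python
from statistics import mean

def sum_sq_resid(b0, b1, xs, ys):
    """Sum of squared residuals for the line yhat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.5, 4.0, 7.5, 8.0, 11.0]

xbar, ybar = mean(xs), mean(ys)
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar
best = sum_sq_resid(b0, b1, xs, ys)
print("least-squares line:", round(b0, 3), "+", round(b1, 3), "x   SS =", round(best, 3))

# Nudge the intercept or the slope a little; each nudged line does worse.
nudged = [(b0 + 0.5, b1), (b0 - 0.5, b1), (b0, b1 + 0.2), (b0, b1 - 0.2)]
for c0, c1 in nudged:
    print("nudged line:", round(c0, 3), "+", round(c1, 3), "x   SS =",
          round(sum_sq_resid(c0, c1, xs, ys), 3))
```

Every nudged line has a larger sum of squares than the least-squares line, which is what "minimizes" means here.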
ActivStats Least Squares tool: AS8-3, rightmost button, with line and red dots. "Show" button. Checkmark all possibilities. Uncheck "ShowLSLine". Choose number of points, do "Regenerate". Move the green line to minimize the Sum of Squares (red bar), and observe the residuals as you do. Confirm your result by checking "ShowLSLine". (The Moore web applet does the same but can't draw the residual lines.)
"Regenerate" created "good clouds" of data. To use your own data, do "Reset"; click in the picture to make dots (but not too close to the green line or it will think you're dragging that).
Next---
R-squared: The line formula yhat = b0 + b1 x tells us our best prediction or estimate of a response (y) value for a particular value of the explanatory (x) variable. It says NOTHING about how good that "best" is; that is, it says nothing about how tight or scattered the data are around the line. R-squared does that job.
R^2 (= r^2 = "Coefficient of Determination") = proportion of variability in y-values explained/accounted for by knowing x and using the regression line model. More on this next time.
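As a preview, here is a sketch on made-up data that computes R^2 two ways: as the square of r, and as 1 minus (sum of squared residuals)/(total sum of squares of y around ybar). The second form is one standard way to express "proportion of variability explained"; it is an assumption that the text will present it this way.

```python
from statistics import mean

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

xbar, ybar = mean(xs), mean(ys)
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)   # total variability in y
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

b1 = sxy / sxx
b0 = ybar - b1 * xbar
r = sxy / (sxx * syy) ** 0.5

# Variability left over after using the line (sum of squared residuals).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - sse / syy

print("r squared (from r):", round(r ** 2, 4))
print("R squared (1 - SSE/SST):", round(r_squared, 4))
```

The two numbers agree, and a value near 1 means the points sit tightly around the line.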
| Sievers home | Math151-Sp06/Daysp16.htm | 2pm | 2/6/06 |