Bring questions for exam. p. 125, 5.5 (SPSS: let SPSS find the regression line. Get the mean yield and mean planting rate too; you need them for part (c).) Corn again; the straight line is a "bad fit." My book has a misprint in (c). It should read "when xbar is the mean planting rate." p. 179, 7.28, 29, 30 (SPSS): Soap in the shower.
Also, look carefully at the graph and guess why there are no data after day 21. (Read p. 132 for the word that describes using the line for day 30, and for a discussion of the issue.)
Cautions. Residuals.
Read: A. Practice calculating the line formula, following up class work (notes below). Highlight the space after each question to see the worked solution. B. Look at the Excel spreadsheet RegressionSlope07.
Op Added:
Formula: yhat = a + bx, so Govsal = a + b avgpay.
Govsal = 28,569.69 + 2.709*avgpay
Calculating, Montana (17,895, 55,502):
Predicted Govsal = 28,569.69 + 2.709*17,895 = 28,569.69 + 48,477.56 = 77,047.25 (higher than actual)
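The Montana arithmetic can be checked in a few lines of Python. This is a minimal sketch: the function name `predict_govsal` is hypothetical, and it reads the pair (17,895, 55,502) as (average pay, actual governor's salary). The residual (actual minus predicted) previews the Residuals topic; a negative residual means the line predicts too high.

```python
# Fitted line from the notes: Govsal = 28,569.69 + 2.709 * avgpay
A = 28_569.69  # intercept a, in dollars
B = 2.709      # slope b: dollars of governor's salary per dollar of average pay


def predict_govsal(avgpay: float) -> float:
    """Predicted governor's salary (yhat) for a state's average pay (x)."""
    return A + B * avgpay


# Montana: average pay 17,895; actual governor's salary 55,502
avgpay, actual = 17_895, 55_502
yhat = predict_govsal(avgpay)
residual = actual - yhat  # observed y minus predicted y

print(f"predicted Govsal: {yhat:,.0f}")      # predicted Govsal: 77,047
print(f"residual:         {residual:,.0f}")  # residual:         -21,545
```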
a is the y-intercept.
b is the slope: if x increases one unit, yhat increases b units.
Governors' salaries increase (on the average across the states) $2.71 for every $1 increase in average pay.
(In a straight-line relationship, the amount that y increases for a one-unit increase in x is the same no matter what value of x you start with.)
RegressionSlope.xls, or in ClassMaterial\Math151-BPS4e\RegressionDemos, Excel BPS4e
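A quick way to see the "same no matter where you start" point: step x up by one unit from several different starting values and compare the changes in yhat. A minimal sketch using the Govsal line above; the starting values 10,000 and 30,000 are arbitrary choices, not data.

```python
A, B = 28_569.69, 2.709  # Govsal line from the notes


def yhat(x: float) -> float:
    """Point on the line yhat = a + b x."""
    return A + B * x


# Raise average pay by $1 from several different starting values:
# the change in predicted salary is always the slope b.
steps = [yhat(start + 1) - yhat(start) for start in (10_000, 17_895, 30_000)]
print([round(s, 6) for s in steps])  # [2.709, 2.709, 2.709]

# A $1,000 increase in average pay predicts a $2,709 increase in salary.
print(round(yhat(18_895) - yhat(17_895), 2))  # 2709.0
```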
r² ("coefficient of determination") = the fraction of the variation in the y-values explained (predicted) by knowing x and using the least-squares regression line. (Fact 4)
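Fact 4 can be checked numerically: fit the least-squares line, measure the total variation in y (sum of squared deviations from ybar) and the variation the line leaves over (sum of squared residuals), and compare 1 - leftover/total with r². A sketch on small made-up data (the numbers are hypothetical, not from the text):

```python
from statistics import mean

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 8]

xbar, ybar = mean(x), mean(y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)

b = sxy / sxx                 # least-squares slope
a = ybar - b * xbar           # least-squares intercept
r = sxy / (sxx * syy) ** 0.5  # correlation

sst = syy                                                    # total variation in y
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # variation the line leaves unexplained
explained = 1 - sse / sst                                    # fraction explained by the line

print(round(r ** 2, 4), round(explained, 4))  # 0.84 0.84
```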
HW: Income depends on height?!
What is "$789", and what kind of analysis did they do? Regression.
$789 is the slope of the regression of Pay on Height. Less than 15% of the variability in Pay is explained by regression on height.
5.42 p. 146, a computer game, revisited. Can it really be that only about 9% of the variability in the speed of the right hand is accounted for by the distance? The eye is fooled by the graph: the right-hand data are squashed down at the bottom and look really linear. Here is the right hand by itself.
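Because r² is the square of the correlation, the two percentages above can be turned back into correlations. A small sketch: the square root gives only the size of r (its sign has to come from the slope or the scatterplot), and "less than 15%" is treated as an upper bound.

```python
import math

# r^2 values quoted in the notes (15% is an upper bound)
r_squared = {"Pay on Height (upper bound)": 0.15,
             "right-hand speed on distance": 0.09}

# |r| = sqrt(r^2); r^2 alone does not tell us the sign of r
r_size = {label: math.sqrt(v) for label, v in r_squared.items()}

for label, v in r_squared.items():
    print(f"{label}: r^2 = {v:.0%}, |r| = {r_size[label]:.2f}")
```

So "only about 9%" corresponds to a correlation of about 0.3 in size, a weak linear relationship even if the squashed graph looks tight.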
We all get the same line from a batch of data because we use the "least-squares best fit" criterion (p. 119): we'll investigate this more closely later.
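The least-squares criterion picks the line with the smallest sum of squared vertical distances (residuals) from the points. One way to see this is to compute that sum for the least-squares line and for a few nudged lines. A sketch on made-up data; the helper `sse` and the nudge sizes are illustrative choices, not from the text.

```python
from statistics import mean

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 8]


def sse(a: float, b: float) -> float:
    """Sum of squared vertical distances from the points to the line y = a + b x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Least-squares slope and intercept
xbar, ybar = mean(x), mean(y)
b_ls = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        / sum((xi - xbar) ** 2 for xi in x))
a_ls = ybar - b_ls * xbar
best = sse(a_ls, b_ls)

# Nudge the intercept and/or slope: each nudged line has a larger SSE.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.2, -0.05)]
others = [sse(a_ls + da, b_ls + db) for da, db in nudges]
print(f"least-squares line: yhat = {a_ls:.3f} + {b_ls:.3f}x, SSE = {best:.3f}")
print("SSE of nudged lines:", [round(v, 3) for v in others])
```

Since everyone minimizes the same quantity on the same data, everyone gets the same line.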
Order of facts: 1, a light version of 2, and 3 first. Then 4. Then 2 in full, and the formulas on p. 120 that follow from 2 & 3.
Facts again (Moore pp. 123-125)
New: Facts 2 & 3 give the line formula!
2. A change of one standard deviation in x corresponds
to a change of r standard deviations in y, along the regression
line.
y = a + bx: the slope b expresses change in y-units per x-unit. (Suppose x is in inches and y is in pounds. Then b is in pounds per inch.)
You can find b by multiplying r by the standard deviation of the y's (that's in pounds) and dividing by the standard deviation of the x's (that's in inches).
In "algebra": b = r × (s.d. of y)/(s.d. of x) (Equation p. 120)
If we standardize both the x-values and the y-values, the slope is just r!
govsalstd.sav, Govsalstd2.doc, RegressionSlope.xls, or RegressionSlope07.xls (for Excel 07)
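Fact 2 can be checked directly: standardize both variables, fit the least-squares line to the z-scores, and compare its slope with r; then compare the slope in the original units with r × sy/sx. A sketch on made-up data, using `statistics.stdev` (the n - 1 standard deviation); the helper names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 8]


def ls_slope(u, v):
    """Least-squares slope for predicting v from u."""
    ubar, vbar = mean(u), mean(v)
    return (sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))
            / sum((ui - ubar) ** 2 for ui in u))


def standardize(v):
    """z-scores: (value - mean) / standard deviation."""
    m, s = mean(v), stdev(v)
    return [(vi - m) / s for vi in v]


zx, zy = standardize(x), standardize(y)
# Correlation as the "average" product of z-scores (divide by n - 1)
r = sum(ui * vi for ui, vi in zip(zx, zy)) / (len(x) - 1)

slope_std = ls_slope(zx, zy)             # slope after standardizing both variables
slope_raw = ls_slope(x, y)               # slope in the original units
slope_formula = r * stdev(y) / stdev(x)  # b = r * sy / sx

print(round(slope_std, 4), round(r, 4))              # same number
print(round(slope_raw, 4), round(slope_formula, 4))  # same number
```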
3. The regression line goes through the point given by the two means, (xbar, ybar). http://www.whfreeman.com/bps4e
--If you know this, you know ybar = a + b(xbar). You can solve this for a: a = ybar - b(xbar). (Other equation, p. 120)
--So knowing Facts 2 and 3 gives you the equation of the line from the means, s.d.'s, and r.
--And if you draw the two lines, y on x and x on y, they will intersect at (xbar, ybar).
[Algebra lovers: given a point (xbar, ybar) and the slope b of a line, you can write its equation.]
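The claim that the two lines (y on x, and x on y) meet at (xbar, ybar) can be checked by fitting both, drawing the x-on-y line on the same axes, and solving for the crossing point. A sketch on made-up data; the two lines differ unless r is exactly +1 or -1.

```python
from statistics import mean

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 8]

xbar, ybar = mean(x), mean(y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)

# y on x:  yhat = a1 + b1 x
b1 = sxy / sxx
a1 = ybar - b1 * xbar
# x on y:  xhat = a2 + b2 y, rewritten as y = (x - a2) / b2 to draw it on the same axes
b2 = sxy / syy
a2 = xbar - b2 * ybar

# Solve a1 + b1 x = (x - a2) / b2 for the crossing point
x_meet = (a1 + a2 / b2) / (1 / b2 - b1)
y_meet = a1 + b1 * x_meet
print(round(x_meet, 4), round(y_meet, 4), "vs means", round(xbar, 4), round(ybar, 4))
```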
The line formula yhat = a + bx from xbar, ybar, sx, sy, r:
Find b: b = r sy / sx
(uses Fact 2: r is the slope if x and y are standardized. Equation p. 120)
Find a: solve ybar = a + b xbar for a: a = ybar - b xbar
(uses Fact 3: (xbar, ybar) lies on the regression line(s). Equation p. 109)
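The two steps above fit in a tiny function: slope from Fact 2, then intercept from Fact 3. A minimal sketch; the name `line_from_summary` is hypothetical, and the call uses the Rangs/Zobs numbers from the example that follows.

```python
def line_from_summary(xbar, ybar, sx, sy, r):
    """Return (a, b) for yhat = a + b x, built from the means, s.d.'s, and r."""
    b = r * sy / sx       # Fact 2: slope, in y-units per x-unit
    a = ybar - b * xbar   # Fact 3: (xbar, ybar) lies on the line
    return a, b


# Rangs/Zobs example (x in Rangs, y in Zobs)
a, b = line_from_summary(xbar=5, ybar=8, sx=10, sy=6, r=-0.3)
print(f"yhat = {a:.2f} + ({b:.2f})x")  # yhat = 8.90 + (-0.18)x
```

The same call with the numbers from exercise "A." can be used to check your hand calculation.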
Example. x is measured in Rangs, y in Zobs.
xbar = 5 Rangs, ybar = 8 Zobs, sx = 10 Rangs, sy = 6 Zobs, r = -.3:
b = -.3×6/10 (Zobs/Rang) = -0.18 Zobs/Rang.
8 = a + (-0.18)×5
8 Zobs = a Zobs + (-0.18)(Zobs/Rang) × 5 Rangs
8 = a - .90
a = 8.90 Zobs
yhat = 8.90 - 0.18x Zobs
"A." Try it at
desk/home: xbar = 7 cm, ybar = 8 oz, sx = 4 cm, sy = 10 oz, r = .6 (highlight the space just below here for the solution.)
b = .6×10/4 (oz/cm) = 1.5 oz/cm.
8 = a + (1.5)×7
8 oz = a oz + (1.5)(oz/cm) × 7 cm
8 = a + 10.5
a = 8 - 10.5 = -2.5 oz
yhat = -2.5 + 1.5x oz
Extrapolation (extra = outside, polation = putting a point): using the line to predict outside the range of x's you have data for. Dangerous!
Linear relationships don't go on forever; a straight line is often a first approximation to a more complicated relationship.
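To see why extrapolation is dangerous, suppose (hypothetically) the true relationship is curved, y = x², but we only observe x = 1 through 5. The fitted line is a decent approximation inside that range and badly wrong far outside it. A sketch with made-up data:

```python
from statistics import mean

# Hypothetical curved relationship y = x**2, observed only for x = 1, ..., 5
x = [1, 2, 3, 4, 5]
y = [xi ** 2 for xi in x]

xbar, ybar = mean(x), mean(y)
b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
     / sum((xi - xbar) ** 2 for xi in x))
a = ybar - b * xbar   # fitted line: yhat = -7 + 6x

for x_new in (3, 20):  # 3 is inside the data's range; 20 is far outside it
    print(f"x = {x_new}: line predicts {a + b * x_new:g}, true value {x_new ** 2}")
# x = 3: line predicts 11, true value 9
# x = 20: line predicts 113, true value 400
```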
"Lurking" variable: has an important effect, but not one of the variables
studied.
Govsal vs. pay: the size of the state (population and/or area) should affect salary.
Meatloaf shrinkage vs. placement in oven? (Using a cooking thermometer or not had the greatest influence.)
The time sequence of observations is a common lurker (learning, tiring, aging).
The trouble with lurking variables is that, by definition, you don't know they're there.
Look behind every tree.
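A lurking variable can produce a strong association between two variables that never act on each other. In this sketch with made-up numbers, z plays the role of a lurker such as state size; x and y are each built from z plus a small wobble, and y is never computed from x. The helper `corr` is hypothetical.

```python
from statistics import mean

# Hypothetical numbers, made up for illustration: z is the lurking variable;
# x and y are each built from z (plus small wobbles), never from each other.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2 * zi + e for zi, e in zip(z, [1, -1, 1, -1, 1, -1, 1, -1])]
y = [3 * zi + e for zi, e in zip(z, [1, 1, -1, -1, 1, 1, -1, -1])]


def corr(u, v):
    """Correlation r of two equal-length lists."""
    ubar, vbar = mean(u), mean(v)
    suv = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))
    suu = sum((ui - ubar) ** 2 for ui in u)
    svv = sum((vi - vbar) ** 2 for vi in v)
    return suv / (suu * svv) ** 0.5


# x and y look strongly related, but only because both track z.
print(f"r(x, y) = {corr(x, y):.2f}")  # r(x, y) = 0.96
```

If z had not been recorded, nothing in the x-y scatterplot alone would reveal it.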
Sievers home, Math151-Sp09/Daysp16.htm, 3:40pm, 3/2/09