Math 151, Fall 2006, Wednesday Day 15, Sept 27
HW assignment Day 15
Reading: Ch. 5, Regression, through p. 125. (Check p. 137: 5.14 through 5.20 are basic line and regression-line facts and tools; 5.21 is on r and slope; 5.22 is harder (changing units), so don't worry about it; for 5.23, if you sketch the graph and draw a line through the points, you should be able to guesstimate the slope well enough to choose among the 3 answers.) Next: continuing regression, pp. 126-137.
Regression
C. Use the SPSS Scatterplot handout and graph the regression line for govsal on avgpay (as shown on the back page), and also the lines for the 4 separate groups (either on one graph or on panels). Print them out and keep them. Answer questions 6-9 and 11 on p. 3 of the handout. Keep these with the previous ones until you can answer all the questions (only 10 and 12 to go).
Hand in Friday:
p. 118, 5.1 IQ and reading scores. Graph, slope, predict. Notice that we don't have a scatterplot of the data, only this straight-line summary.
p. 118, 5.2 equation from info. As written, this
is an algebra problem, not too hard, but not in the main focus of
the course. I will tell you that the intercept is -50, and
now the question is in the main focus of the course.
That is, what is the slope, and what is the equation?
p. 122, 5.4 (SPSS) Sparrowhawk colonies. Use SPSS to make the scatterplot, with the line, and find r. Do (c) and (d) by hand. Now use the "up and over" method of Fig. 5.1, p. 116, with a pencil and straightedge to mark the predicted value from (d) on the y-scale. Write your computed answer next to it. Make sure the two methods give consistent answers.
p. 139, 5.24 Penguins diving
p. 148, 5.54 (Applet) regression suitability
p. 140, 5.26 (SPSS) sisters & brothers
p. 146, 5.42 (SPSS) A computer circle game. The last part of the last question, "Give numerical measures that describe the success of the two regressions," is asking you to use Fact 4 (r²).
Read, to discuss (optional):
Regression: Use the Correlation and Regression applet at http://www.whfreeman.com/bps4e to do p. 148, 5.55, guessing lines.
= = = = = = = = = = = = = = = = = = = = = =
Day 13 HW (Scatterplots): Many people didn't realize that the SPSS work was due (everything except the questions on the handout). I decided not to take off points for this, but unfortunately I decided only after Jennifer had marked the papers. Please hand your returned paper back in, along with your SPSS work for that day, to get the correct grade.
HW questions? Day 14: educ-v-mortality.sav
Leftover: Timeplots are scatterplots where the x-axis shows time. (Time is often a lurking variable: plot the data against the order in which the observations were taken.)
- - - - - - - - - - -
Regression line (Ch. 5): predicts or estimates a y (vertical) value for a given x (horizontal) value. A straight line! "Regressing y ON x".
p. 104, 4.10, corn plant density: made a regression CURVE!
"Regression" with no other description means "least-squares best-fit line": a STRAIGHT line.
Experimenting: http://www.whfreeman.com/bps4e, Correlation and Regression applet.
SPSS: back of handout. Govsal on avgpay.
Formula: yhat = a + b x, so Govsal = a + b avgpay.
Govsal = 28,569.69 + 2.71*avgpay
To predict or estimate a y-value for a given x-value, plug the x-value into the formula and calculate.
To do it graphically, use the up-and-over method (Fig. 5.1, p. 116): find the x, go straight up to the line, then go over to the y-axis; that y-value is the predicted y.
Calculating:
Montana (avgpay, govsal) = (17,895, 55,502). Govsal = 28,569.69 + 2.71*avgpay
Predicted Govsal = 28,569.69 + 2.71*17,895 = 28,569.69 + 48,495.45 = 77,065.14 (higher than the actual 55,502)
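As a check on the arithmetic, here is a minimal Python sketch of the same plug-in calculation. The coefficients are the SPSS values quoted above; the function name `predict_govsal` is just a hypothetical label for this sketch. It also computes Montana's residual (actual minus predicted), which comes out negative because the line predicts a higher salary than Montana actually pays.

```python
# Coefficients from the SPSS output for govsal on avgpay (values from the notes)
A = 28_569.69   # intercept a
B = 2.71        # slope b

def predict_govsal(avgpay: float) -> float:
    """Predicted governor's salary: yhat = a + b * x (hypothetical helper name)."""
    return A + B * avgpay

# Montana: (avgpay, govsal) = (17,895, 55,502)
avgpay, actual = 17_895, 55_502
predicted = predict_govsal(avgpay)
residual = actual - predicted  # negative: the line overpredicts Montana

print(f"predicted = {predicted:,.2f}")  # predicted = 77,065.14
print(f"residual  = {residual:,.2f}")   # residual  = -21,563.14
```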
(Graphing a straight line: pick an x-value at one end of the useful range. Plug it into the formula and calculate the corresponding y. Graph the (x, y) pair. Repeat with an x-value at the other end of the range. Connect the 2 dots with a line (see pretest). Insurance: pick a third x and calculate its y. This point must also lie on the line, if you did it right.)
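The graphing recipe can be sketched the same way: compute two endpoints from the formula, then use a third point as insurance by checking that it falls on the line through the first two. The x-range below (15,000 to 30,000) and the third x (22,000) are hypothetical choices for illustration, not values taken from the data set.

```python
A, B = 28_569.69, 2.71  # govsal-on-avgpay line from the notes

def yhat(x: float) -> float:
    """Predicted govsal for a given avgpay."""
    return A + B * x

# Hypothetical ends of the useful x-range (not taken from the data set)
x_lo, x_hi = 15_000, 30_000
p1 = (x_lo, yhat(x_lo))
p2 = (x_hi, yhat(x_hi))

# "Insurance": a third point, also computed from the formula
x_mid = 22_000
p3 = (x_mid, yhat(x_mid))

# p3 lies on the line through p1 and p2 exactly when the two slopes from p1 agree
slope_12 = (p2[1] - p1[1]) / (p2[0] - p1[0])
slope_13 = (p3[1] - p1[1]) / (p3[0] - p1[0])
on_line = abs(slope_12 - slope_13) < 1e-9

for x, y in (p1, p2, p3):
    print(f"({x:,}, {y:,.2f})")
print(on_line)  # True
```

If a hand-computed third point fails this check, one of the three calculations has an arithmetic slip.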
a is the y-intercept.
b is the slope: if x increases by one unit, yhat increases by b units.
If you know that yhat increases 12 units for every unit that x increases, you know that the slope of the line is b = 12.
Governors' salaries increase (on average across the states) by $2.71 for every $1 increase in average pay.
This is a summary of the linear relationship, in the same way that the mean of a distribution is one summary of the distribution. Particular states won't match this exactly.
(In a straight-line relationship, the amount that y increases for a one-unit increase in x is the same no matter what value of x you start with.) RegressionSlope.xls, or in ClassMaterial\Math151-BPS4e\RegressionDemos, Excel BPS4e
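A small numerical illustration of the parenthetical point, assuming the govsal-on-avgpay line above: step x up by one unit from several arbitrary starting values and look at the change in yhat. Every change is b = 2.71 (up to floating-point rounding).

```python
A, B = 28_569.69, 2.71  # govsal-on-avgpay line from the notes

def yhat(x: float) -> float:
    return A + B * x

# Arbitrary starting x-values; the one-unit change in yhat is the same for each
changes = [yhat(x + 1) - yhat(x) for x in (0, 10_000, 17_895, 40_000)]
print([round(c, 6) for c in changes])  # [2.71, 2.71, 2.71, 2.71]
```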
We all get the same line from a batch of data because we all use the "least-squares best fit" criterion (p. 119); we'll investigate this more closely later.
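The least-squares criterion can be tried directly on a tiny made-up data set (these five points are hypothetical, not class data). This sketch uses the standard closed-form solution, b = Σ(x - xbar)(y - ybar) / Σ(x - xbar)² and a = ybar - b·xbar, then checks that nudging the intercept or slope makes the sum of squared residuals larger.

```python
# Five hypothetical (x, y) points, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def sse(a: float, b: float) -> float:
    """Sum of squared residuals (vertical distances) from the points to y = a + b x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
     / sum((x - xbar) ** 2 for x in xs))  # least-squares slope
a = ybar - b * xbar                       # least-squares intercept
best = sse(a, b)

print(f"yhat = {a:.2f} + {b:.2f} x")  # yhat = 2.20 + 0.60 x
# Nudging the intercept or the slope always increases the sum of squares
nudges = [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]
print(all(sse(a + da, b + db) > best for da, db in nudges))  # True
```

Because everyone minimizes the same quantity, everyone who fits these points by least squares gets this same line.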
Facts: 1 and 3 first, then 2, then 4. The formulas on p. 120 come from Facts 2 and 3. More on 4.
Facts (Moore, pp. 123-125)
1. Which variable is explanatory and which is response is crucial for regression! The regression line is trying to predict the "average y" for a given x (with the added requirement that it be a straight line). See the "residual" lines for govsal on avgpay.
   Unless the data lie perfectly on a straight line, the line for predicting weight from height ("regressing weight on height"), for example, will NOT be the same line as the one for predicting height from weight ("regressing height on weight"). (In-class demonstration.) (Example 5.3 and Fig. 5.4, pp. 123-124, are about this.)
2. A change of one standard deviation in x corresponds to a change of r standard deviations in y, along the regression line.
   The slope b expresses change in y-units per x-unit. (Suppose x is in inches and y is in pounds. Then b is in pounds per inch.) You can find b by multiplying r by the standard deviation of the y's (in pounds) and dividing by the standard deviation of the x's (in inches).
   In "algebra": b = r times (s.d. of y)/(s.d. of x) (equation, p. 120).
   If we standardize both the x-values and the y-values, the slope will just equal r! (Both standard deviations become 1, so b = r times 1/1 = r.)
   govsalstd.sav, Govsalstd2.doc, RegressionSlope.xls
3. The regression line goes through the point given by the two means, (xbar, ybar). http://www.whfreeman.com/bps4e
   - If you know this, you know ybar = a + b(xbar). You can solve this for a: a = ybar - b(xbar). (Other equation, p. 120)
   - So knowing Facts 2 and 3 gives you the equation of the line from the means, the s.d.'s, and r.
   - And if you draw the two lines, y on x and x on y, they will intersect at (xbar, ybar).
4. r² (the "coefficient of determination") = the fraction of the variation in the y-values explained/predicted by knowing x and using the least-squares regression line. (Exactly what that means mathematically is hard. Just get used to it as a measurement.) More: R-Squared (or the R-squared tab in ResidualsRSquared.xls: ClassMaterial\Math151\RegressionDemos)
   r² is the square of the correlation coefficient r! (The sign, - or +, gets lost.)
   If r = .7, about half (.49) of the variability in the y's is explained by using the regression-line relationship to predict y from x. (If weight and height have a correlation of .7, then about half of the variability in weight can be explained by knowing height. Or vice versa...)
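All four facts can be checked numerically on one tiny data set. This is a sketch under stated assumptions: the five points are hypothetical (not class data), the helper names (`mean`, `sd`, `corr`, `fit`) are made up for this example, and the s.d.'s divide by n - 1 as in Moore. SSE and SST are just labels here for the sum of squared residuals and the total sum of squares of y about ybar.

```python
import math

# Tiny hypothetical data set (not class data)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(v)
    return math.sqrt(sum((t - m) ** 2 for t in v) / (len(v) - 1))

def corr(u, v):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    mu, mv, su, sv = mean(u), mean(v), sd(u), sd(v)
    return sum((p - mu) / su * (q - mv) / sv for p, q in zip(u, v)) / (len(u) - 1)

def fit(u, v):
    """Least-squares line predicting v from u; returns (intercept, slope)."""
    mu, mv = mean(u), mean(v)
    slope = (sum((p - mu) * (q - mv) for p, q in zip(u, v))
             / sum((p - mu) ** 2 for p in u))
    return mv - slope * mu, slope

r = corr(xs, ys)
a, b = fit(xs, ys)  # y on x: yhat = a + b x
c, d = fit(ys, xs)  # x on y: xhat = c + d y

# Fact 1: redrawn as y vs x, the x-on-y line has slope 1/d, not b
print(f"Fact 1: b = {b:.2f}, 1/d = {1 / d:.2f}")  # Fact 1: b = 0.60, 1/d = 1.00

# Fact 2: slope from r and the s.d.'s; standardized data give slope r
b2 = r * sd(ys) / sd(xs)
zx = [(t - mean(xs)) / sd(xs) for t in xs]
zy = [(t - mean(ys)) / sd(ys) for t in ys]
print(f"Fact 2: r*sy/sx = {b2:.2f}, standardized slope = {fit(zx, zy)[1]:.4f}, r = {r:.4f}")

# Fact 3: intercept from the means; both lines pass through (xbar, ybar)
a2 = mean(ys) - b2 * mean(xs)
print(f"Fact 3: a = {a2:.2f}, yhat(xbar) = {a + b * mean(xs):.2f}, xhat(ybar) = {c + d * mean(ys):.2f}")

# Fact 4: r^2 equals the fraction of the variation in y explained by the line
sst = sum((y - mean(ys)) ** 2 for y in ys)                 # total variation about ybar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation left in the residuals
print(f"Fact 4: r^2 = {r ** 2:.2f}, 1 - SSE/SST = {1 - sse / sst:.2f}")
```

With these numbers b = 0.6 and d = 1.0, so the two lines really are different; their product b·d equals r² (0.6), and they would coincide only if r were +1 or -1.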
This page belongs to Sally Sievers, who is solely responsible for its content. Please see our statement of responsibility.