| Hand in Wed. Chapter 9: D&V p. 174ff. unless otherwise noted:
1, 3 Marriage age (Type your data into SPSS. Your answers may vary. Cf. #13, p. 73)
2 Age difference
Groups, from ActivStats Ch7 HW (same datasets you used before, different questions). Find Datasets in SPSS from this page.
B) Bear neck/weight (TRE-58-26)
i) Make a single scatterplot with the regression lines for the 2 sexes. (Get rid of the Total line.)
ii) Then do a graph with Panel variable Sex and regression lines; it is better to see them separately.
iii) Describe any bears which are outliers and/or influential points, and any ways in which the data are not well modeled by the straight lines.
15, 16 Gestation (note, these are summarized data) |
Read, to discuss. Important! Ch. 9 7a-d Reading |
Optional: Use the ActivStats Least Squares tool (see below) and play with datasets; especially, drag points around and see what they do to the line. |
Regression line: D&V Ch 8&9, AS8&9, "Regressing y ON x"
Formula: yhat = b0 + b1 x
b1 = r times (s.d. of y)/(s.d. of x) = r sy / sx; b1 is in y-units per (/) x-unit
b0 = ybar - b1(xbar), from ybar = b0 + b1(xbar) (the line passes through the point (xbar, ybar)).
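The slope and intercept formulas can be tried out in a few lines of Python. This is only a sketch: the five data points are made up for illustration (they are not one of the course datasets), and the function names are hypothetical.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation r: sum of products of z-scores, divided by n - 1."""
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    return sum((x - xbar) / sx * (y - ybar) / sy for x, y in zip(xs, ys)) / (n - 1)

def regression_line(xs, ys):
    """Return (b0, b1) using b1 = r * sy / sx and b0 = ybar - b1 * xbar."""
    r = pearson_r(xs, ys)
    b1 = r * stdev(ys) / stdev(xs)   # slope, in y-units per x-unit
    b0 = mean(ys) - b1 * mean(xs)    # intercept
    return b0, b1

# Made-up illustration data (an assumption, not a course dataset)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = regression_line(x, y)
print(f"yhat = {b0:.2f} + {b1:.2f} x")
```

For these points the line works out to yhat = 2.2 + 0.6 x, and plugging in xbar = 3 gives back ybar = 4.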
Residual: Residual = observed - predicted
"Least squares" (D&V p. 144, AS8-3 Activity 1&2): The regression line is the line that minimizes the sum of the squared residuals. See Day 16
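To see "least squares" in action, the sketch below computes the sum of squared residuals for the regression line and for two nearby lines. The data are made up, the function names are hypothetical, and the slope is computed from sums of deviations, which is algebraically the same as r sy / sx.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of (observed - predicted)^2 for the line yhat = b0 + b1 x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustration data (an assumption, not a course dataset)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x, y)
best = sum_squared_residuals(x, y, b0, b1)
shifted = sum_squared_residuals(x, y, b0 - 0.2, b1)   # same slope, line moved down
steeper = sum_squared_residuals(x, y, b0, b1 + 0.1)   # same intercept, steeper slope
print(best, shifted, steeper)
```

Any other line you try gives a larger total than the regression line does, which is what dragging the line around in the Least Squares tool shows graphically.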
R-squared: The line formula
yhat = b0 + b1 x
tells us our best prediction or estimate of a response (y) value for a particular value of the explanatory (x) variable. It says NOTHING about how good that "best" is; that is, it says nothing about how tight or scattered the data are around the line. R-squared does that job.
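A small sketch of that point: two made-up data sets built to share exactly the same regression line, one tightly clustered and one scattered. R-squared is computed here as 1 - (sum of squared residuals)/(sum of squared deviations of y from ybar), which for a straight-line fit equals the square of r. The data and names are assumptions for illustration only.

```python
from statistics import mean

def fit_and_r_squared(xs, ys):
    """Return (b0, b1, R-squared) for the least-squares line."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # scatter around the line
    sst = sum((y - ybar) ** 2 for y in ys)                       # total variation in y
    return b0, b1, 1 - sse / sst

x = [1, 2, 3, 4, 5]
# Made-up scatter, chosen so that it does not change the fitted line
scatter = [-0.8, 0.6, 1.0, -0.6, -0.2]
y_loose = [2.2 + 0.6 * xi + e for xi, e in zip(x, scatter)]
y_tight = [2.2 + 0.6 * xi + 0.1 * e for xi, e in zip(x, scatter)]

loose = fit_and_r_squared(x, y_loose)
tight = fit_and_r_squared(x, y_tight)
print("loose:", loose)
print("tight:", tight)
```

Both fits give slope 0.6 and intercept 2.2, but R-squared is about 0.60 for the scattered data and above 0.99 for the tight data.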
Chapter 9: Regression (& correlation) wisdom: What can go wrong, things to watch out for.
Groups (subsets) may benefit from being considered separately. Sometimes analyzing residuals can alert us to important subsets.
SPSS: Fitting lines to groups: Govsal_vs_pay
Put a grouping variable in a Legend Variables box; Insert > FitLine > Regression will then make a line for the whole data set and a line for each group.
In Edit mode: Click on a regression equation; then Edit > Regression Parameters allows eliminating the line for the Total (or the Subgroups lines).
(Using the Panel Variables box puts each group on a separate graph.)
Residuals graphs and variables are only generated for the total group. To do the groups separately, you'd need to use Data > Select Cases in the editor and work with one group at a time.
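Outside SPSS, the same idea (one line for the Total, one per group) can be sketched in Python. This is not SPSS syntax; the group labels and numbers are made up, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean

def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Made-up rows of (group, x, y); an assumption, not a course dataset
rows = [("F", 1, 2), ("F", 2, 4), ("F", 3, 6),
        ("M", 1, 5), ("M", 2, 6), ("M", 3, 7)]

# Collect the points for each group, like selecting cases one group at a time
groups = defaultdict(list)
for g, x, y in rows:
    groups[g].append((x, y))

# One line for everything, then one line per group
lines = {"Total": least_squares_line([r[1] for r in rows], [r[2] for r in rows])}
for g, pts in groups.items():
    xs, ys = zip(*pts)
    lines[g] = least_squares_line(xs, ys)

for name, (b0, b1) in lines.items():
    print(f"{name}: yhat = {b0:.2f} + {b1:.2f} x")
```

Here the Total line (slope 1.5) describes neither group well: one group has slope 2 and the other slope 1.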
Shape after linear trend removed: discussed with Patterns in Graphs of residuals.
Extrapolation p. 163: last class also. Linear approximations may be good for short-term segments, lousy in the long term.
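A quick sketch of the extrapolation danger, using a made-up example: an amount growing 5% per period, observed for periods 0 through 5. A straight line fits those six points closely, predicts period 6 well, and misses period 50 badly. The growth rate and names are assumptions chosen for illustration.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def true_value(t):
    """Made-up curved relationship: 100 growing by 5% per period."""
    return 100 * 1.05 ** t

t = list(range(6))                    # observed periods 0..5
y = [true_value(ti) for ti in t]
b0, b1 = least_squares_line(t, y)

def line_error(ti):
    """Size of the gap between the true value and the line's prediction."""
    return abs(true_value(ti) - (b0 + b1 * ti))

err_near = line_error(6)    # just beyond the data
err_far = line_error(50)    # far beyond the data
print(f"error at t=6: {err_near:.1f}, error at t=50: {err_far:.1f}")
```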
Outliers and Influential points: pp. 165-7 (Use Moore http://www.whfreeman.com/scc, or ASLeastSquaresTool)
(HW: "Read" not to hand in, but important.)
A point (or more) outside of the pack, an outlier, can:
-- Weaken or strengthen r (& r2): If it's in the same direction as the general trend, it strengthens r. Against the trend, it weakens r.
-- Affect the slope of the regression line a lot (it has high "leverage" = is an "influential point"), if it's an outlier in the x measurement. (Teeter-totter principle) We won't calculate leverage.
-- Affect the slope little, but:
   -- strengthen r2 if it's along the main trend but farther out.
   -- pull the whole line up or down a bit, if it's in the center of the data on the x-measurement and an outlier in y. (Not an "influential point")
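The cases above can be checked numerically by adding one extra point to a small made-up data set and refitting. This is a sketch in place of dragging points in the Least Squares tool; the data, the added points, and the function name are all assumptions for illustration.

```python
from math import sqrt
from statistics import mean

def summarize(xs, ys):
    """Return (b0, b1, r) for the least-squares line and correlation."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1, sxy / sqrt(sxx * syy)

# Made-up base data (an assumption, not a course dataset)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

base = summarize(x, y)
# Far out in x and against the trend: high leverage, slope changes a lot
against = summarize(x + [15], y + [2])
# Far out in x but right on the trend line: slope unchanged, r strengthened
along = summarize(x + [15], y + [11.2])
# At the center of the x values (xbar = 3) but far out in y: line just shifts up
center = summarize(x + [3], y + [10])

for name, (b0, b1, r) in [("base", base), ("against", against),
                          ("along", along), ("center", center)]:
    print(f"{name:8s} b0 = {b0:6.2f}  b1 = {b1:6.2f}  r = {r:6.3f}")
```

The point against the trend flips the slope from 0.6 to a negative value; the point along the trend leaves the line alone but raises r; the point at the center of x leaves the slope at 0.6 and lifts the whole line by 1.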
Two clusters with little internal trend could look like a strong association when "combined." (Trend could even be reversed!)
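Here is a sketch of the two-cluster effect with made-up numbers. In the first pair of clusters r is 0 within each cluster; in the second pair the trend within each cluster is perfectly negative. In both cases the combined data show a strong positive r. The data and function name are assumptions for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Correlation r from sums of deviations."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up clusters with no internal trend
xa, ya = [1, 2, 3], [2, 1, 2]
xb, yb = [11, 12, 13], [12, 11, 12]
r_a = pearson_r(xa, ya)
r_b = pearson_r(xb, yb)
r_combined = pearson_r(xa + xb, ya + yb)

# Made-up clusters whose internal trend is negative (reversed when combined)
xc, yc = [1, 2, 3], [3, 2, 1]
xd, yd = [11, 12, 13], [13, 12, 11]
r_c = pearson_r(xc, yc)
r_reversed = pearson_r(xc + xd, yc + yd)

print(f"no-trend clusters: r = {r_a:.2f} and {r_b:.2f}, combined r = {r_combined:.2f}")
print(f"negative-trend clusters: r = {r_c:.2f}, combined r = {r_reversed:.2f}")
```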
Anscombe's Quartet--4 (made-up) data sets with identical summary statistics (AS9HW: MRB127-46 Always Plot your Data!)
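The "identical summary statistics" can be verified directly. The sketch below uses the standard published values of Anscombe's four data sets and computes the regression line and r for each; the function name is hypothetical. The numbers agree to about two decimal places, yet the four scatterplots look completely different.

```python
from math import sqrt
from statistics import mean

def summarize(xs, ys):
    """Return (b0, b1, r) for the least-squares line and correlation."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1, sxy / sqrt(sxx * syy)

# Anscombe's quartet: sets I-III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (x4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (b0, b1, r) in results.items():
    print(f"{name:3s} yhat = {b0:.2f} + {b1:.2f} x   r = {r:.3f}")
```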
Summary values p. 169: If the x and/or y data have already been averaged or summarized, the relationship you plot and/or use correlation/regression to describe will look stronger than it would if you used raw data (you've already gotten rid of much of the variability).
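A sketch of why summarized data look stronger, using made-up numbers: five x values with four individual observations at each, where the individuals scatter evenly above and below the trend. Correlating the raw points gives a moderate r; correlating the five group averages gives r = 1. The data and function name are assumptions for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Correlation r from sums of deviations."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up raw data: y = 2x plus individual variation of -3, -1, +1, +3
x_values = [1, 2, 3, 4, 5]
variation = [-3, -1, 1, 3]
raw_x = [x for x in x_values for _ in variation]
raw_y = [2 * x + e for x in x_values for e in variation]

# Summarized data: one average y per x value (the variation averages out)
mean_y = [mean(2 * x + e for e in variation) for x in x_values]

r_raw = pearson_r(raw_x, raw_y)
r_means = pearson_r(x_values, mean_y)
print(f"raw data r = {r_raw:.3f}, averaged data r = {r_means:.3f}")
```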
Watch out for data relating states, nations, groups.
Association does not imply causation. "Lurking" variable (p. 168): one that has an important effect but is not one of the variables studied.
Meatloaf shrinkage vs. placement in oven? (Cooking thermometer or not had the greatest influence.)
Time sequence of observations is a common one. (Learning, tiring, aging)
The trouble with lurking variables is that by definition you don't know they're there. Look behind every tree.
| Sievers home | Math151-Sp06/Daysp19.htm | 4pm | 3/10/06 |