"Regression line of weight on height": height = horizontal (x) axis, weight = vertical (y) axis.
LEAST SQUARES PROPERTY
"Residual at x" = y - yhat = distance between observed y and
predicted y (what's left over after predicting).
The residual keeps its sign: positive if observed is bigger than predicted,
negative if observed is smaller than predicted.
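A tiny sketch of the definition above. The line (yhat = 1 + 2x) and the three points are made up purely for illustration; they are not from the book or from class data.

```python
# Hypothetical illustration: both the line and the points are invented.
def predict(x, intercept, slope):
    """Predicted y (yhat) read off a straight line."""
    return intercept + slope * x

intercept, slope = 1.0, 2.0               # assumed line: yhat = 1 + 2x
points = [(1, 4.0), (2, 4.5), (3, 7.0)]   # (x, observed y)

# residual = observed y - predicted y
residuals = [y - predict(x, intercept, slope) for x, y in points]

for (x, y), res in zip(points, residuals):
    print(f"x={x}: observed={y}, predicted={predict(x, intercept, slope)}, residual={res:+.1f}")
```

The first point sits above the line (positive residual), the second below it (negative), and the third exactly on it (zero).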
Least squares principle: Find the line that minimizes
the sum of the squared residuals. (Here,
or in Mac 101, ClassMaterials\Math151\RegressionDemos\RegressionLine.xls,
Squares tab.)
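To see the principle in numbers, here is a minimal sketch. It uses the standard closed-form least squares formulas (slope = Sxy/Sxx, intercept = ybar - slope * xbar), which may be written differently in the book. The data values and the function names are invented for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Add up (observed - predicted)^2 over all points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Illustrative (made-up) data, roughly along y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
print(f"least squares line: yhat = {a:.2f} + {b:.2f} x, sum of squares = {best:.3f}")

# Nudge the line a little in each direction: the sum of squares only goes up.
nudged = [sum_squared_residuals(xs, ys, a + da, b + db)
          for da, db in [(0.2, 0), (-0.2, 0), (0, 0.1), (0, -0.1)]]
print("after nudging:", [round(s, 3) for s in nudged])
```

This is the same experiment as dragging the line around in the Squares tab: any other line makes the total area of the squares bigger.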
This method of finding a "best fit" straight line for predicting y's from x's was derived
mathematically to work well with "joint normal" data (elliptical clouds).
For data of this sort, the line does give the mean of the
y's for each given x (at least in the abstract).
Drawback if the data is not the "elliptical cloud" type:
Outliers get their residual distance squared, so they may be very
influential in determining where the line sits.
Especially if at the lowest or highest x-values, an outlier may change the slope
of the line a lot.
(Activstats Least Squares tool: p. 9-2, Show residuals, show #points.)
(Outliers toward the middle x's may not change the slope, but may affect r and r^2.)
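A rough sketch of both effects, using invented data: nine points lying exactly on y = x, plus one added outlier, first at a high x and then at a middle x. The helper name `slope_and_r` is just for this example.

```python
from math import sqrt

def slope_and_r(points):
    """Return (least squares slope, correlation r) for a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Made-up base data: exactly on the line y = x, so slope = 1 and r = 1.
base = [(x, float(x)) for x in range(1, 10)]

slope_base, r_base = slope_and_r(base)
slope_end, r_end = slope_and_r(base + [(12, 2.0)])   # outlier at a high x
slope_mid, r_mid = slope_and_r(base + [(5, 12.0)])   # outlier at the middle x

print(f"no outlier:     slope={slope_base:.2f}  r={r_base:.2f}")
print(f"outlier at end: slope={slope_end:.2f}  r={r_end:.2f}")
print(f"outlier in mid: slope={slope_mid:.2f}  r={r_mid:.2f}")
```

The outlier at x = 12 drags the slope well below 1. The outlier at x = 5 (the mean of the x's) leaves the slope where it was but still pulls r down.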
Education and mortality in cities (ACT p. 8-3, bottom). Outliers?
Plotting residuals: This amounts to making the regression
line into a new x-axis. If you plot the residuals themselves vs.
the original x values, without the distraction of the slanted line, outliers
and patterns other than the linear (if any) can emerge.
(Here or ClassMaterials\Math151\RegressionDemos\ResidualsRSquared.xls,
Graph of Residuals tab.)
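A small sketch of why the residuals are worth looking at on their own. The data here are generated from the curve y = x^2 (an assumption chosen so the pattern is obvious), and a straight line is fit to them anyway.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Curved data on purpose: y = x^2 for x = 0..6.
xs = list(range(7))
ys = [x ** 2 for x in xs]

a, b = least_squares_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# "Residual plot" as a table: residual vs. the original x.
for x, res in zip(xs, residuals):
    print(f"x={x}  residual={res:+.1f}")
```

The residuals run positive, then negative, then positive again: a U shape that a straight line cannot account for, and that is easy to miss when looking at the slanted line itself.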
SPSS can make a new variable of residuals. Optional HW.
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Will do all of 2.4 next time.
Cautions Sec. 2.4
Plot the data: Summary formulas and numbers
don't tell the whole story. (Anscombe's quartet, p. 127, 2.46-7; also
in ACT HW ch. 9.)
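The point can be checked directly. The sketch below uses the published Anscombe (1973) values; the `summarize` helper is just a name for this example. All four data sets come out with nearly the same line and the same r, even though their scatterplots look nothing alike.

```python
from math import sqrt

def summarize(xs, ys):
    """Return (intercept, slope, r) for the least squares line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / sqrt(sxx * syy)

# Anscombe's quartet: sets 1-3 share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "4": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (a, b, r) in summaries.items():
    print(f"set {name}: yhat = {a:.2f} + {b:.3f} x,  r = {r:.3f}")
```

Only a plot shows that set 2 is a curve, set 3 is a straight line with one outlier, and set 4 gets its whole slope from a single point at x = 19.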
Extrapolation (extra = outside, polation = putting a point): Using the line to predict outside the range of x's you have data for. Unavoidable if x is time, but always dangerous: nothing says the mechanism you see will persist in a wider range.
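A sketch of how this goes wrong, under a deliberately artificial assumption: the "true mechanism" is y = sqrt(x), but we only have data for x between 1 and 16 and fit a straight line to it.

```python
from math import sqrt

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Assumed mechanism: y = sqrt(x), observed only for x in 1..16.
xs = [1, 4, 9, 16]
ys = [sqrt(x) for x in xs]
a, b = least_squares_line(xs, ys)

inside = a + b * 9      # prediction inside the range of the data
outside = a + b * 100   # prediction far outside the range

print(f"x=9:   line says {inside:.2f}, truth is {sqrt(9):.2f}")
print(f"x=100: line says {outside:.2f}, truth is {sqrt(100):.2f}")
```

Inside the range the line is off by a little. At x = 100 it predicts roughly double the true value, because the curve flattens out in a way the data in hand never showed.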
Averaged data will produce a stronger relationship (higher correlation, r^2) than the merged raw data from individuals, because the averaging hides much of the variability. You did a problem on height vs. age; those were averaged values.
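A sketch with invented numbers: five age groups, three individuals in each, with individual values scattered 3 units below, at, and 3 units above a group trend of y = 2 * age. Compare r for the raw individuals against r for the group averages.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r between two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Made-up individuals: within each age group, values of 2*age - 3, 2*age, 2*age + 3.
ages = [1, 2, 3, 4, 5]
offsets = [-3, 0, 3]

raw_x = [age for age in ages for _ in offsets]
raw_y = [2 * age + off for age in ages for off in offsets]

# One averaged value per age group.
avg_x = ages
avg_y = [sum(2 * age + off for off in offsets) / len(offsets) for age in ages]

r_raw = correlation(raw_x, raw_y)
r_avg = correlation(avg_x, avg_y)
print(f"raw individuals: r = {r_raw:.3f}, r^2 = {r_raw ** 2:.3f}")
print(f"group averages:  r = {r_avg:.3f}, r^2 = {r_avg ** 2:.3f}")
```

The averages fall exactly on a line here, so r = 1, while the individuals give r of about 0.76. The trend is the same in both; the averaging just threw away the person-to-person scatter.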
Lurking variables and association/causation: next time.
| I'll finish Moore 2.4, so read it. I
probably won't start Ch. 3 till
Monday, but here's (all or most of) the preclass work for it.
Activstats, Sample surveys:
10-1: Know Sample/Population, Simple Random Sample. (Don't get bogged down in taking your own potato sample. If it's confusing, skip it.) Do the last activity on p. 10-1 (pop. size doesn't matter).
10-2: Know Bias, Voluntary Response bias, Nonresponse, Undercoverage. Do the 2nd activity; write down the Literary Digest prediction and the actual vote percentages. |
HW assignment Day 17, Wednesday March 6,
ACT: From Activstats Homework, Moore: From The Basic
Practice of Statistics
Reading: Finish 2.3, read 2.4. Skip 2.5. Ahead in
Ch. 3.
| Hand in
A. Use ResidualsRSquared to graph these data sets, along with a graph of the residuals. Print the results, and describe the shape of the residuals (it may help to connect the dots with pencil, to see the pattern).
a) x: 1 2 8 4 6 9
   y: 1 3 6 6 7 5
b) x: 1 2 7 4 6 9
   y: 7 6 2 4 2 1
Moore p. 122, 2.36, speed & gas again: a, b, c, d. There is a data file for problem 2.36, and its third column is the residuals (check them against the book).
Moore p. 123, 2.38, Gesell first word (point in middle of x range). Get the data into SPSS, delete child 19, graph and get the regression line and r^2. Use the formula on p. 117 and graph the line for the full data set by hand on your printout. r^2 for the full data set is on p. 122.
Moore p. 122, 2.37, Calories. (This data set is in ACT ch 8 HW MRB-3, or, from Moore's files, in TA02-04.) Graph and get lines in SPSS with and without the outliers. Graph the line for "without outliers" by hand on the printout for "with outliers" so you can compare them better. Print one more graph (with outliers) and keep it for problem B below.
|
Read, | Optional
SPSS will make residuals:
Do Analyze>Regression>Linear (a new menu for us).
Click your variables into Independent (X) and Dependent (Y).
Hit the button "Save...", then check the box Residuals: Unstandardized. (Also here is Distance: Leverage values, as in ACT 9-4.)
Continue, then OK out of the menus.
You'll get output; ignore it. You'll get a new variable, the residuals (and another, the leverages, if you do that).
Try it with the data file for problem 2.36, with speed and gas. You'll get a fourth variable that should be the same as the residuals variable.
|
| Sievers home | Math151-Sp02/Day17.htm | 11pm | 3/05/02 |