SPSS: Residuals and DFFITS. In Analyze>Regression>Linear, the Save button
adds columns of these values to your data file; then you can analyze
them however you want.
SPSS gives 5 choices for residuals. Unstandardized
is the raw residual. What Moore calls "Studentized" (p. 163) is called
"Deleted" in SPSS. "Deleted" will emphasize outliers, since it compares
each value with the rest of the pack, not including itself. (SPSS's Studentized
uses a "standard deviation" that factors in how far a point is from
the middle of the x values. It also tends to emphasize outliers.)
In practice, it doesn't seem to make a lot of difference which you use.
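To see what separates these choices, here is a minimal Python sketch for a one-x regression line. It uses standard textbook formulas: the leverage h, and the usual leave-one-out shortcut, so the line is fitted only once. The dictionary keys are hypothetical names. That they match what SPSS saves under the similar labels is an assumption and has not been checked against SPSS's own algorithms.

```python
import math

def residual_types(x, y):
    """Residuals of the least-squares line y = a + b*x, five ways.

    Returns a dict of lists (one value per case). Key names are hypothetical,
    chosen to echo SPSS's labels; the correspondence is assumed, not verified.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    raw = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in raw)
    s = math.sqrt(sse / (n - 2))  # residual standard deviation, n - 2 df
    # Leverage h grows with the distance of x from the middle of the x values.
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    # s recomputed with case i left out (shortcut formula; no refitting needed).
    s_del = [math.sqrt((sse - e ** 2 / (1 - hi)) / (n - 3)) for e, hi in zip(raw, h)]
    return {
        "unstandardized": raw,
        "standardized": [e / s for e in raw],
        "studentized": [e / (s * math.sqrt(1 - hi)) for e, hi in zip(raw, h)],
        # y_i minus its prediction from the line fitted WITHOUT case i
        "deleted": [e / (1 - hi) for e, hi in zip(raw, h)],
        "studentized_deleted": [e / (sd * math.sqrt(1 - hi))
                                for e, hi, sd in zip(raw, h, s_del)],
    }
```

For example, take the hypothetical data x = 1, 2, 3, 4, 5, 10 and y = 2.1, 3.9, 6.2, 7.8, 10.1, 12.0. The last point's raw residual is only about -1.30, but its deleted residual is -7.95, because the line fitted without it would predict 19.95. That is the "emphasize outliers" effect.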
Perhaps start with the normal probability plots to spot oddballs. If you use Analyze>Desc.Stats.>Explore (Set Markers By, then Plots: Normality plots), you can identify the oddballs with Point ID. But Explore's normal probability plot sometimes leaves off the smallest value. Bad bug! Use Graphs>QQ plot instead (at least to check that the two plots agree). The downside is that it can't label points except with row numbers. But if you ask for Outliers under Explore's Statistics button, it will give you a list with row numbers, and labels too if you chose labels in the main box.
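A QQ plot is just each value plotted against the normal score expected for its rank. The Python sketch below computes those pairs so that every case, smallest value included, stays in the list, labeled either by your labels or by row number. It assumes Blom's plotting positions (r - 3/8)/(n + 1/4), which is one common choice. This may not be exactly what SPSS uses. `normal_scores` is a hypothetical helper name.

```python
from statistics import NormalDist

def normal_scores(values, labels=None):
    """Pair each value with its expected standard-normal score.

    Returns (label, value, z) triples sorted by value. Every case is kept,
    including the smallest. Uses Blom's plotting positions (an assumption).
    """
    n = len(values)
    if labels is None:
        labels = list(range(1, n + 1))  # fall back to row numbers
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    result = []
    for rank, i in enumerate(order, start=1):
        p = (rank - 3 / 8) / (n + 1 / 4)  # Blom's formula
        result.append((labels[i], values[i], std_normal.inv_cdf(p)))
    return result

if __name__ == "__main__":
    # Hypothetical residuals; an oddball sits far off the straight-line pattern.
    for label, value, z in normal_scores([3.0, -1.0, 2.0, 10.0, 0.5], list("abcde")):
        print(f"{label}: value {value:6.2f}  normal score {z:6.3f}")
```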
Other profitable explorations are plots of the residuals
vs. the independent variable ("detrending" the values) and of DFFITS vs. the independent
variable, to see where the outliers came from.
Also plot either one vs. the order of observation (Graphs>Sequence), looking
for a "fatigue" or "running in" factor.
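DFFITS measures how much a case's own predicted value moves when that case is left out of the fit, in standard-error units. Here is a minimal Python sketch for one-x regression, using the standard leave-one-out shortcut so the line is fitted only once. A listing in data order stands in for the sequence plot, and a listing sorted by x stands in for the plot against the independent variable. The flagging cutoff 2*sqrt(2/n) is a commonly quoted rule of thumb (2 coefficients: intercept and slope), not something from the text. We are assuming this DFFITS matches what SPSS saves as the standardized DfFit. `dffits` and `print_listing` are hypothetical names.

```python
import math

def dffits(x, y):
    """DFFITS for each case of the least-squares line y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in resid)
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx                     # leverage
        s_del = math.sqrt((sse - e ** 2 / (1 - h)) / (n - 3))  # s with this case left out
        t = e / (s_del * math.sqrt(1 - h))                     # studentized deleted residual
        out.append(t * math.sqrt(h / (1 - h)))
    return out

def print_listing(x, y):
    """Text stand-in for the plots: DFFITS in data order, then sorted by x."""
    d = dffits(x, y)
    cutoff = 2 * math.sqrt(2 / len(x))  # rule of thumb, 2 coefficients
    rows = [(i + 1, xi, di) for i, (xi, di) in enumerate(zip(x, d))]
    for title, ordered in (("by order", rows), ("by x", sorted(rows, key=lambda row: row[1]))):
        print(title)
        for case, xi, di in ordered:
            flag = "  <-- look at this one" if abs(di) > cutoff else ""
            print(f"  case {case:3d}  x = {xi:8.2f}  DFFITS = {di:8.3f}{flag}")

if __name__ == "__main__":
    # Hypothetical data: five points near a line plus one far-out x value.
    print_listing([1, 2, 3, 4, 5, 10], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
```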
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CAUTIONS:
Association does not prove causation! (Sec. 2.7)
Correlation and regression capture only linear association (lots of things are almost linear over a short interval).
Extrapolation (a relationship that looks linear over a short interval may not stay linear over a longer one).
Restricted-range problem (the range of the data is not wide enough to uncover the true relationship).
Lurking variables.
Influential points and outliers (squaring the errors makes least squares very non-resistant).
Mixing 2 (or more) groups can diffuse or even reverse an association (pp. 167-8, "Simpson's Paradox").
Averaged data give a stronger correlation than nonaveraged data (e.g., the country data).
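The last two cautions are easy to see with tiny made-up numbers. They are purely illustrative and not from the text. This Python sketch computes the correlation r by hand. `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(x, y):
    """Correlation r computed from the usual sums of squares and products."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Mixing groups (made-up numbers): within each group, y FALLS as x rises.
a_x, a_y = [1, 2, 3], [10, 9, 8]
b_x, b_y = [6, 7, 8], [20, 19, 18]
r_a = pearson_r(a_x, a_y)                   # -1.0
r_b = pearson_r(b_x, b_y)                   # -1.0
r_mixed = pearson_r(a_x + b_x, a_y + b_y)   # about +0.89: the sign flips

# Averaging (made-up numbers): two individuals at each x value.
ind_x = [1, 1, 2, 2, 3, 3]
ind_y = [0, 4, 1, 5, 2, 6]
r_individual = pearson_r(ind_x, ind_y)      # about 0.38

groups = {}
for xi, yi in zip(ind_x, ind_y):
    groups.setdefault(xi, []).append(yi)
mean_x = sorted(groups)
mean_y = [sum(groups[g]) / len(groups[g]) for g in mean_x]  # [2.0, 3.0, 4.0]
r_averaged = pearson_r(mean_x, mean_y)      # 1.0: the scatter within groups is gone

print(r_a, r_b, round(r_mixed, 2), round(r_individual, 2), r_averaged)
```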
Day 11, Monday Sept 24: finishing text sections
2.3 and 2.4. SPSS manual sec. 2.2, pp. 62-66 (top).
Next: proceed onward through ch. 2, with 2.5
next, then 2.6 and 2.7.
Hand in:
p. 151, 2.42 degree days: predict both ways, and graph both. If you have the computer skills, bring both graphs into Word and use the drawing tools to flip one around the diagonal so that both have the same axes. This is (more or less) what was done for Fig. 2.16 (Hubble).
Residuals/influential points:
p. 171, 2.54 gas chromatography: plot the residuals.
p. 176, 2.64 particulates. For part c, also find the DFFITS values, plot them (a histogram and a QQ plot from the Graphs menu), and see whether the results match your eyeball judgment.
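On "predict both ways" in 2.42: the line for predicting y from x and the line for predicting x from y are different lines unless the correlation is perfect. Both pass through (x-bar, y-bar), and their slopes multiply to r squared. Flipping the x-on-y graph around the diagonal puts its line on the usual axes with slope 1/b_xy. Here is a Python sketch with hypothetical numbers, not the degree-days data. `fit` is a hypothetical helper name.

```python
import math

def fit(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical numbers, NOT the degree-days data from 2.42.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a_yx, b_yx = fit(x, y)  # line for predicting y from x
a_xy, b_xy = fit(y, x)  # line for predicting x from y

# Correlation from the same sums.
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
r = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / math.sqrt(
    sum((a - xbar) ** 2 for a in x) * sum((b - ybar) ** 2 for b in y))

# The "flip": x = a_xy + b_xy*y rewritten as y = (x - a_xy) / b_xy,
# so on the usual axes the x-on-y line has slope 1/b_xy.
print(f"y on x:          y = {a_yx:.3f} + {b_yx:.3f} x")
print(f"x on y, flipped: y = {-a_xy / b_xy:.3f} + {1 / b_xy:.3f} x")
print(f"b_yx * b_xy = {b_yx * b_xy:.3f}   r^2 = {r ** 2:.3f}")
```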
Read, discuss:
2.67 mean stride rates/raw
2.68 Baseball salaries (residuals)
Optional:
2.53 golf. Use SPSS to find the DFFITS for the 11 points, and see how they pick out the outlier/influential point.
Sievers home; Math251-Fall01/DayP11.htm; 10pm, 9/23/01