Handout (log transformation, 1-variable) + IPS 5th ed. pp. 143-5 + Sec. 2.6: 1 copy outside my door, + IPS 4e on reserve.
Was Sec. 2.6 in IPS 4th ed. (pp. 187-203 for text. Figures are -2 from download: fig. 2.30 in 4th ed. is 2.32 in 5th), or download it. (Website, "Supplemental Material", or it may be on your CD. Mine was missing all the figures and tables.) We want pp. 2-18 for the text. Download Acrobat file of pp. 2-25.
I'm giving you the HW problems I'm asking for, in the Handout. 2.118 on are in 2.6.
|
Hand in: Preview for Transforming:
- - - - - Postpone the rest. - - - - -
A. (SPSS) Table 1.5 (tornado damage) and Table 1.8 (guinea pig survival) gave histograms highly skewed right. For each of these data sets: make a histogram, take the log of the data, and make a new histogram. Tell whether this transformation makes a "nicer" (more symmetric) graph.
Problems are on handout. SPSS files are linked from here. (The .sav files are now on the website. They don't seem to want to open directly into SPSS, at least on my office machine, though they should. You'll probably need to download them, then open them with SPSS. The .por files are still there on the website, but are not directly linked any more.) |
Read, discuss:
2.78 Applet exploration of outlier. Watch also r, and think about r-squared.
2.67 grade inflation
2.69 fidgeting or BMR? Look in the back for the numbers.
2.76 mean stride rates/raw
2.83 baseball pay: reading residuals, p. 179 |
Optional Postpone: For problem A, if the log transformation didn't do a good job, work through the ladder of powers and look for one that does better. On handout: |
R-squared? Day 11
Residuals (2.4): "DEtrend" the data by graphing residuals; then the y = 0 line replaces the slanted regression line. Residuals should show no clear patterns if the regression line is a good fit. By "detrending" the data set, sometimes subtle characteristics (like a curve) are uncovered.
Excel Residuals
SPSS: Analyze > Regression > Linear. Move the horizontal-axis variable to the Independent box and the vertical-axis variable to the Dependent box. The Save button adds columns of values to your data file, which you can then analyze however you want: choose Residuals: Unstandardized and Predicted Values: Unstandardized.
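A rough sketch of what those two saved columns contain, worked by hand in Python rather than SPSS. The data values and the helper name `fit_line` are made up for illustration; they are not from IPS or the handout.

```python
# Hypothetical (x, y) data: a downward trend with a slight curve in it.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [17.0, 16.5, 15.5, 14.0, 12.0, 9.5, 6.5, 3.0]

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(x, y)

# "Predicted values: Unstandardized" = points on the regression line.
predicted = [intercept + slope * a for a in x]
# "Residuals: Unstandardized" = observed y minus predicted y.
residuals = [b - p for b, p in zip(y, predicted)]

for a, p, r in zip(x, predicted, residuals):
    print(f"x={a}  predicted={p:6.2f}  residual={r:+.2f}")
```

With these made-up numbers the residuals are negative at both ends and positive in the middle: the curve that was hard to see against the slanted line shows up once the data are detrended.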
See Scatterplot handout, bottom of pp. 4 and 3. The Plots button graphs residuals against the predicted y values, not against the x variable as IPS shows. This doesn't matter much, since predicted y is a linear transformation of x, but if the slope is negative, the plot will look "backward".
"Anscombe's quartet:" summary numbers are not sufficient to describe relationship! (Data p. 169, ex. 2.80)
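A quick check of that claim, as a sketch. The numbers below are the values as published by Anscombe (1973), typed in from memory; I am assuming they match the data printed in the text, so compare against p. 169 before relying on them. The helper name `summarize` is made up here.

```python
# Anscombe's quartet (published values; assumed to match the text's table).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "A": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "B": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "C": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "D": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(xs, ys):
    """Return (correlation r, least-squares slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    slope = sxy / sxx
    return sxy / (sxx * syy) ** 0.5, slope, mean_y - slope * mean_x

summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (r, slope, intercept) in summaries.items():
    print(f"{name}: r = {r:.3f}, y-hat = {intercept:.2f} + {slope:.3f} x")
```

All four data sets give nearly the same r and the same fitted line, yet their scatterplots look nothing alike, which is the reason to graph first.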
2.5 Causation:
Association (correlation) does not imply causation!
Association diagrams: dotted lines = association, solid = causation. Good tool (p. 174).
x causes y? Maybe y causes x.
Common response to another variable (lurker)?
Confounding: 2 or more "explanatory" variables are strongly associated; can't sort out which one the response is "due to". (And they may be lurkers.)
How to establish causation (x causes y)?
Experiment: control all variables except the potential explanatory variables; randomize out uncontrollable factors (Ch. 3).
Otherwise: p. 178, criteria:
Strong association; consistent in different contexts.
Higher "dose" of x --> stronger response of y.
x precedes y.
Plausible "mechanism" why x should cause y.
- - - - - - - - Start here Fri. - - - - - - - -
Transforming variables (handout, plus Sec. 2.6)
Exponential growth (growth by percentages).
-- In an actual "growth" situation, taking logarithms often turns the growth curve into a straight line, or at least does the "growth" analog of "detrending" and makes deviations from the expected percentage growth more visible.
-- Many other kinds of data benefit from log transformations:
> Where numbers are all > 0, and larger values can be thought of naturally as multiples of smaller ones.
> Where the histogram distribution is J-shaped, with many observations at small values and fewer and fewer at larger and larger values. E.g. earthquake severity (Richter scale is already log of amplitude), populations of all nations.
> Other times...
-- We usually use log base 10, for ease in interpretation. Then the leading log digit tells what place the leading raw digit takes.
raw value     log
1-10          0-1
10-100        1-2
100-1000      2-3
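A small sketch of how to read that table, in Python. The function name `leading_place` and the sample values are made up for illustration.

```python
import math

def leading_place(value):
    """Whole-number part of log10(value): 0 for 1-10, 1 for 10-100, 2 for 100-1000."""
    return math.floor(math.log10(value))

for raw in [7, 42, 365]:
    print(f"raw = {raw:>3}   log10 = {math.log10(raw):.3f}   leading log digit = {leading_place(raw)}")
```

So a log of 2.something says the raw value is in the hundreds, before you look at any other digit.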
Other transformations: powers, reciprocals.
Need a monotonic transformation to retain the order of data points. If necessary, shift the data by adding a constant so all values are > 0.
"Ladder of Powers" (Fig. 2.36): $x^p$. log x lies at p = 0. (If p is negative, $x^p$ reverses the order of the data, < to >. Use $-x^p$.)
p > 1: will pull in the left tail of a
distribution and stretch out the right tail. (Making a left skewed
distribution more symmetrical.) Stronger for higher p.
p < 1: will stretch out the left tail of
a distribution and pull in the right tail. (Making a right skewed
distribution more symmetrical.) Stronger for lower p.
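One way to try the ladder on a right-skewed data set, sketched in Python. The data are made-up values (not Table 1.5 or 1.8), and the moment-based `skewness` function is my choice of a numerical stand-in for "looks symmetric" in a histogram; the names `ladder` and `skewness` are hypothetical helpers.

```python
import math

# Hypothetical right-skewed data (NOT Table 1.5 or 1.8), listed in increasing order.
data = [1, 2, 2, 3, 3, 3, 4, 5, 7, 10, 15, 25, 60, 150]

def ladder(x, p):
    """Ladder-of-powers transform: log10 at p = 0, and -x**p when p is negative."""
    if p == 0:
        return math.log10(x)
    return x ** p if p > 0 else -(x ** p)

def skewness(values):
    """Moment-based skewness: near 0 if symmetric, positive if skewed right."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

skew_by_power = {}
for p in [1, 0.5, 0, -0.5, -1]:
    transformed = [ladder(x, p) for x in data]
    # The sign flip for negative p keeps the data in the same order.
    assert transformed == sorted(transformed)
    skew_by_power[p] = skewness(transformed)
    print(f"p = {p:>4}: skewness = {skew_by_power[p]:+.2f}")
```

Step down the ladder until the skewness is closest to 0, then check the histogram; going too far down can overshoot and leave the data skewed left.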
Relationships
Exponential growth $y = a b^x$ becomes $\log y = \log a + x \log b$.
(x, log y) values have a linear relationship. Fit with regression, solve back for y. (Can use log10 or ln.)
e.g. $\log_{10} y = 2 + 3x$ --> $y = 10^{2 + 3x} = 10^2 \cdot 10^{3x} = 100\,(1000^x)$
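The "fit with regression, solve back for y" recipe, sketched in Python. The data are invented to follow y = 3(2^x) exactly, so the fit should hand back a = 3 and b = 2; `fit_line` is a hypothetical helper, not an SPSS or IPS function.

```python
import math

# Hypothetical data following y = 3 * 2**x exactly (a = 3, b = 2).
x = [0, 1, 2, 3, 4, 5]
y = [3, 6, 12, 24, 48, 96]

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(xs, ys))
    sxx = sum((u - mean_x) ** 2 for u in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Regress log10(y) on x: intercept estimates log a, slope estimates log b.
log_y = [math.log10(v) for v in y]
slope, intercept = fit_line(x, log_y)

# Solve back for y = a * b**x.
a = 10 ** intercept
b = 10 ** slope
print(f"a = {a:.3f}, b = {b:.3f}")

# The worked example above: log y = 2 + 3x is the same as y = 100 * 1000**x.
for t in [0, 1, 2]:
    assert math.isclose(10 ** (2 + 3 * t), 100 * 1000 ** t)
```

With real data the points will not sit exactly on the line, so check the residuals of the (x, log y) fit before trusting the solved-back curve.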
Powers: $y = a x^p$ becomes $\log y = \log a + p \log x$.
(log x, log y) values have a linear relationship, and the fitted slope p "is" the power.
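The same idea for a power relationship, as a sketch: take logs of both variables and read the power off the slope. The data are invented to follow y = 2x^1.5 exactly; `fit_line` is again a hypothetical helper.

```python
import math

# Hypothetical data following y = 2 * x**1.5 exactly (a = 2, p = 1.5).
x = [1, 2, 4, 8, 16]
y = [2 * v ** 1.5 for v in x]

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(xs, ys))
    sxx = sum((u - mean_x) ** 2 for u in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Regress log10(y) on log10(x): slope estimates p, intercept estimates log a.
log_x = [math.log10(v) for v in x]
log_y = [math.log10(v) for v in y]
p, log_a = fit_line(log_x, log_y)
a = 10 ** log_a
print(f"p = {p:.3f}, a = {a:.3f}")
```

If the (x, log y) plot is straight, think exponential; if the (log x, log y) plot is straight, think power.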
| Sievers home | Math251-Fall07/Day2s12.htm | 1:46pm | 9/20/07 |