MATH 251, Probability and Statistics I, Fall 2005, Sept. 21, Day 12
After class, downloads fixed!

Reading:  Sections 2.5, Causality.

Handout (log transformation, 1-variable). + IPS 5th ed. pp. 143-5
 + Sec. 2.6: 1 copy outside my door, 1 on reserve.  This was Sec. 2.6 in IPS 4th ed. (pp. 187-203 for the text; figure numbers there are 2 lower than in the download, e.g. fig. 2.30 in the 4th ed. is 2.32 in the 5th).  Or download the Acrobat file (from the website, or it may be on your CD under "Supplemental Material"; my copy was missing all the figures and tables).  We want pp. 2-18 for the text.  The HW problems I'm asking for are in the Handout.
Hand in: 
Sec. 2.5 causality
2.88  health and wealth
2.89 music
2.91  miscarriage and transistors. What information would be helpful to study/eliminate the confounding variable of standing up?
2.92 hospital stay/size
- - - - - - - - - - - - - - - - - - 
Transforming:  For the following you may need to Transform your x or y-data to a new variable in SPSS.  Use Transform>compute:  Use the function LG10( ) for the log base 10, LN( ) for natural log,  x^3 for x cubed.  Use  log base 10  unless told otherwise; but it really doesn't matter much. 
A. (SPSS)  Table 1.5 (tornado damage) and Table 1.8 (guinea pig survival) gave histograms highly skewed right.  For each of these data sets:  Make a histogram, take the log of the data and make a new histogram.  Tell if this transformation makes a "nicer" (more symmetric) graph.

Postpone the rest:  Will be assigned next time.
Problems are on handout. SPSS files will be linked to from here when I get them tracked down and relabeled, this afternoon.  Solutions
(You'll need to download them, then open with SPSS.)
2.118 (not SPSS)  b, d Monotonic
2.123 (SPSS) fish weight
2.124 (SPSS) fish width (above file)
2.129 (SPSS) American population
2.121 (SPSS)  isotope decay
2.136 heart rate
2.131 tree biomass
2.138 (SPSS) tree seeds

Read, discuss 
p. 179
2.85 marriage
- - - - - 
2.118 a, d
2.119 sin
Postpone:
2.134, 2.135 strength, weight. 
 
Optional
For problem A, if the log transformation didn't do a good job, work through the ladder of powers and look for one that does better.

Postpone:
2.120 transistors , Moore's law

Thank you for taking care of yourselves Monday.
HW Questions?
Some leftovers, 2.4:
An outlier may or may not be "influential", in the sense of changing the regression line.
   May increase r-squared (if "in line" and outlying  in x direction.)
   May decrease r-squared (if outlying in y-direction)
Restricted-range problem (range not enough to uncover true relationship, which could be more strongly linear if x's had a bigger range (IPS).  OR:  it might be curved--Extrapolation.)
Lurking variables.  Check residuals, x, y, against time or order of observation (timeplot)--(looking for a "fatigue" or "running in" lurking variable.)
"Anscombe's quartet:"  summary numbers are not sufficient to describe relationship! (Data p. 169, ex. 2.80)

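The outlier remarks above can be checked numerically. This is only a sketch in Python with NumPy (my choice for illustration, not the SPSS used in class), and the ten base points and the two added points are made up, not taken from IPS. The y-direction outlier is placed at the mean of x, which is why it lowers r-squared without changing the slope at all.

```python
import numpy as np

def r_squared(x, y):
    """Squared correlation between x and y."""
    return np.corrcoef(x, y)[0, 1] ** 2

# Made-up base data: roughly linear with some scatter (illustration only).
x = np.arange(1, 11, dtype=float)
y = np.array([2.1, 2.9, 4.8, 4.1, 6.3, 5.6, 8.2, 7.4, 9.9, 9.2])
slope, intercept = np.polyfit(x, y, 1)

# Outlier in the x direction that sits exactly on the fitted line.
x_in = np.append(x, 30.0)
y_in = np.append(y, intercept + slope * 30.0)

# Outlier in the y direction, placed at the mean of x.
x_out = np.append(x, 5.5)
y_out = np.append(y, 20.0)

r2_base = r_squared(x, y)
r2_inline = r_squared(x_in, y_in)
r2_yout = r_squared(x_out, y_out)
slope_yout = np.polyfit(x_out, y_out, 1)[0]

print(f"base data:          r^2 = {r2_base:.3f}, slope = {slope:.3f}")
print(f"in-line x outlier:  r^2 = {r2_inline:.3f}")
print(f"y outlier:          r^2 = {r2_yout:.3f}, slope = {slope_yout:.3f}")
```

Moving the y outlier away from the mean of x (say to x = 10) would also tilt the line, which is the "influential" case.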
2.5 Causation:
Association (correlation) does not imply causation!
     Association diagrams:  dotted lines= association, solid = causation.  Good tool.
 x causes y?  Maybe y causes x.
 Common response to another variable (lurker)?
 Confounding:  2 or more "explanatory" variables are associated strongly; can't sort out which one response is "due to".  (And they may be lurkers.)
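
A small simulation can make "common response" concrete. In this sketch (Python with NumPy, simulated data, all names hypothetical) a lurking variable z drives both x and y, and x never enters the formula for y. The two are still strongly correlated, and the correlation nearly vanishes once the straight-line effect of z is removed from each.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical lurking variable z drives both x and y;
# x and y have no direct link to each other.
z = rng.normal(size=n)
x = z + rng.normal(scale=0.5, size=n)
y = z + rng.normal(scale=0.5, size=n)

def residuals(v, z):
    """What is left of v after removing its least-squares line on z."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (intercept + slope * z)

r_xy = np.corrcoef(x, y)[0, 1]
r_given_z = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]

print(f"correlation of x and y:            {r_xy:.3f}")
print(f"after removing the effect of z:    {r_given_z:.3f}")
```

This only works here because z was recorded. A true lurker is not in the data set, which is the whole difficulty.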

How to establish causation? x causes y
   Experiment; control all variables except the potential explanatory variables; randomize out uncontrollable factors (Ch. 3)
   Otherwise: p. 178, criteria:
       Strong association; consistent in different contexts.  Higher "dose" of x--> stronger response of y.
       x precedes y.  Plausible "mechanism" why x should cause y.

- - - - - - - - - - - - - - - - - - - - - - - - -
Transforming variables (handout, plus Sec. 2.6)

Exponential growth.  (growth by percentages)
--  In an actual "growth" situation, taking logarithms often turns the growth curve into a straight line, or at least does the "growth" analog of "detrending" and makes deviations from the expected percentage growth more visible.

--Many other kinds of data benefit from log transformations:
>Where 0 is the "bottom" and larger values can be thought of naturally as multiples of smaller ones.
>Where the histogram distribution is J-shaped, many observations at small values and fewer and fewer at larger and larger values.  E.g. earthquake severity (Richter scale is already log of amplitude), populations of all nations.
> Other times...

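To see the effect of a log on a J-shaped distribution, here is a sketch in Python with NumPy on simulated data (lognormal values, chosen only because they are known to be right-skewed; they stand in for something like the tornado or guinea pig tables, which are not reproduced here). The `skewness` helper is my own: it is the usual third-moment measure, near 0 for a symmetric distribution and positive for a long right tail.

```python
import numpy as np

def skewness(v):
    """Third standardized moment: about 0 if symmetric, > 0 if skewed right."""
    d = v - v.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

rng = np.random.default_rng(1)
raw = rng.lognormal(mean=3.0, sigma=1.0, size=5000)  # simulated, all values > 0
logged = np.log10(raw)

skew_raw = skewness(raw)
skew_log = skewness(logged)

print(f"skewness of raw data:   {skew_raw:.2f}")
print(f"skewness of log10 data: {skew_log:.2f}")
```

With real data the log will rarely land exactly on symmetric; a histogram before and after is still the first thing to look at.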
--We usually use log base 10, for ease in interpretation.  Then
   raw value   log
     1-10      0-1
    10-100     1-2
   100-1000    2-3
   The leading log digit tells what place the leading raw digit takes.
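
A quick check of the table, as a Python sketch (the four raw values are arbitrary examples): the whole-number part of the base-10 log is one less than the number of digits before the decimal point.

```python
import math

values = [7, 42, 365, 8200]          # arbitrary example values
leading = [math.floor(math.log10(v)) for v in values]

for v, lead in zip(values, leading):
    print(f"{v:>5}  log10 = {math.log10(v):.3f}  "
          f"leading log digit {lead} -> {lead + 1} digit(s) in the raw value")
```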

Other transformations: powers, reciprocals.
Need monotonic  transformation to retain the order of data points.  If necessary, shift the data by adding a constant so all values are > 0.
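
A sketch of the shift-then-log step in Python with NumPy, on made-up values. The particular constant used here (one that moves the smallest value to exactly 1) is just one convenient choice, not something the text prescribes; any constant that makes every value positive keeps the order.

```python
import numpy as np

data = np.array([-3.0, 0.0, 1.5, 4.0, 40.0, 400.0])  # made-up values

shift = 1.0 - data.min()     # assumed choice: smallest shifted value becomes 1
shifted = data + shift       # now every value is > 0
logged = np.log10(shifted)

print("shift added:", shift)
print("log10 of shifted data:", np.round(logged, 3))
```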

"Ladder of Powers" (Fig. 2.36): x^p; log x lies at p = 0.  (If p is negative, x^p reverses the order of the data, < to >; use -x^p instead.)
  p > 1:  will pull in  the left tail of a distribution and stretch out the right tail. (Making a left skewed distribution more symmetrical.)  Stronger for higher p.
  p < 1:  will stretch out  the left tail of a distribution and pull in the right tail. (Making a right skewed distribution more symmetrical.)  Stronger for lower p.
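
The ladder can be walked in a few lines. A sketch in Python with NumPy on simulated right-skewed data (lognormal, used only as a convenient example); `ladder` and `skewness` are helper names made up for this illustration. Note the minus sign for negative p, so that the order of the data is kept.

```python
import numpy as np

def skewness(v):
    """Third standardized moment: about 0 if symmetric, > 0 if skewed right."""
    d = v - v.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

def ladder(x, p):
    """Ladder-of-powers transform for positive x; keeps the order of the data."""
    if p == 0:
        return np.log10(x)                 # log sits at p = 0
    return x ** p if p > 0 else -(x ** p)  # flip sign when p < 0

rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=1.0, size=2000)  # simulated, right-skewed, > 0

powers = [2, 1, 0.5, 0, -1]
skews = {p: skewness(ladder(x, p)) for p in powers}

for p in powers:
    print(f"p = {p:>4}: skewness = {skews[p]:7.2f}")
```

Going down the ladder pulls the right tail in more and more; for this data set p = -1 overshoots and leaves the result skewed left.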

Start here Fri:
Relationships
Exponential growth  y = a b^x  becomes  log y = log(a) + x log(b).
   (x, log y) values have a linear relationship.  Fit with regression, solve back for y.  (can use log10 or ln.)
       e.g. log y = 2 + 3 x   -->   y = 10^(2 + 3x)  =  10^2 * 10^(3x)  =  100(1000^x)
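
The fit-then-solve-back recipe for exponential growth, as a sketch in Python with NumPy. The data are simulated from a hypothetical curve y = 100(1.5^x) with a little multiplicative noise; `a_true`, `b_true`, and `predict` are names invented for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.arange(0, 10, dtype=float)
a_true, b_true = 100.0, 1.5                 # hypothetical growth curve
noise = 10 ** rng.normal(scale=0.01, size=x.size)
y = a_true * b_true ** x * noise            # simulated data

# Regress log10(y) on x: slope estimates log10(b), intercept estimates log10(a).
slope, intercept = np.polyfit(x, np.log10(y), 1)
a_hat, b_hat = 10 ** intercept, 10 ** slope

def predict(x_new):
    """Solve back for y on the original scale."""
    return a_hat * b_hat ** x_new

print(f"fitted: log10 y = {intercept:.3f} + {slope:.3f} x")
print(f"solved back: y = {a_hat:.1f} * {b_hat:.3f}^x")

# The worked example: log y = 2 + 3x is the same curve as y = 100(1000^x).
check = np.isclose(10 ** (2 + 3 * 0.5), 100 * 1000 ** 0.5)
print("10^(2 + 3x) equals 100(1000^x) at x = 0.5:", check)
```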
Powers  y = a x^p  becomes  log y = log a + p log x.
   (log x, log y) values have a linear relationship, and the fitted slope p "is" the power.
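
The same idea for a power law, again as a sketch in Python with NumPy on simulated data (a hypothetical y = 2 x^3 with slight multiplicative noise; `a_true` and `p_true` are illustration-only names). Both variables are logged this time, and the slope of the fitted line estimates the power.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(1.0, 50.0, 25)
a_true, p_true = 2.0, 3.0                   # hypothetical power law
noise = 10 ** rng.normal(scale=0.01, size=x.size)
y = a_true * x ** p_true * noise            # simulated data

# Regress log10(y) on log10(x): slope estimates p, intercept estimates log10(a).
p_hat, log_a_hat = np.polyfit(np.log10(x), np.log10(y), 1)
a_hat = 10 ** log_a_hat

print(f"fitted: log10 y = {log_a_hat:.3f} + {p_hat:.3f} log10 x")
print(f"solved back: y = {a_hat:.2f} * x^{p_hat:.2f}")
```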


Sievers home  Math251-Fall05/Dayps12.htm     11:45am    9/22/05
This page belongs to Sally Sievers who is solely responsible for its content.