Math 151, Fall 2006, Wednesday, Day 18, Oct. 4 (repaired)

HW:  Reading:  For Wednesday:  (Re) read pp. 133-136.  Skip Chapter 6.  Read Chapter 8.  Check p. 206, 8.17-22, 26 at first.
Hand in  Wednesday (have a great break!)

A. Use the Excel RSquared page. (R-Squared, or RSquared.xls: ClassMaterial\Math151BPS4e\RegressionDemosExcel BPS4eLab version is difficult to use. If R-Squared doesn't have "repaired 10/3/06" in cell 2-o, use this link: R-Squared2.) Shift points around and get an r^2 close to .8 (80%); anything between .75 and .85 is good enough. Note that if r = +.9, then r^2 = .81. Now shift the points so that r is negative and r^2 is again close to .8. Print the resulting page (data and graph) to hand in.

Read, to discuss 


Optional 


Exam 2 Friday Oct. 6, Day 19, Today. One sheet of notes; I will give you normal tables.
Sign up today on signup sheet to start early Friday.  Contact me today at the latest for any other arrangements.
 Sample exam handed out, & Solutions available outside my office  & on reserve.  You should be able to do all the parts on the sample exam.
   Covers Chapter 3 (Normal distribution, with tables) and Chapters 4 & 5 (Scatterplots, Correlation, Regression), to p. 132 (Extrapolation).
You will not be asked how to CREATE SPSS output.  You may be asked to USE and INTERPRET SPSS output.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Pick a digit (from 0,1,2,3,4,5,6,7,8,9).  Write it down.  Write it to the left of your name on the sign-in sheet.
Income depends on height?!
    a) What is "$789", and what kind of analysis did they do? b) Why should short people not despair?

HW questions?
--Don't trust just summary data. 
Need to see the scatterplot to see how suitable the summary numbers are.
--Extrapolation.   
Watch out for it.
--Residuals plot: 
Takes away the "linear" part of the relationship; sometimes other structure can be seen.
--Examples from HW, involving extrapolation and residuals plots:  ex 7-28 Soap data, output
     ex 5-9 Farm population data, output
 

Cautions  pp. 132-136
Plot the data: Summary formulas and numbers don't tell the whole story.  In particular, correlation and regression line only describe a linear relationship properly.
Correlation and regression are not resistant to outliers or influential points. ("Anscombe's quartet", Moore p. 142, 5.34)
(Overhead slide.   You can reconstruct these pictures using SPSS and Moore's problem, if you like.)
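For anyone who would rather check the numbers than redraw the pictures, here is a minimal Python sketch (Python is an assumption here; the course itself uses SPSS and Excel). The data are Anscombe's published quartet as commonly reproduced, typed in from memory rather than copied from Moore, so check them against problem 5.34 before relying on them. The helper names are made up for this illustration.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def correlation(x, y):
    # Pearson correlation r from sums of squares and cross-products.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def least_squares(x, y):
    # Slope and intercept of the regression of y on x.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

summaries = {}
for name, (x, y) in quartet.items():
    slope, intercept = least_squares(x, y)
    summaries[name] = (correlation(x, y), slope, intercept)
    print(f"Set {name}: r = {summaries[name][0]:.3f}, "
          f"line: y = {intercept:.2f} + {slope:.3f} x")
```

All four sets give nearly the same r and nearly the same line, yet their scatterplots look completely different. That is the reason to plot before trusting the summary numbers.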

Extrapolation -- extra (outside) + polation (putting a point): using the line to predict outside the range of x's you have data for.  Linear relationships don't go on forever; a straight line is often only a first approximation to a more complicated relationship.
Government projections of national budget surplus/deficit:  (www.cbo.gov publications>search)
 Jan. 2001 http://www.cbo.gov/showdoc.cfm?index=2727&sequence=6  Projection used to justify Bush tax cuts.
Jan. 2002   http://www.cbo.gov/showdoc.cfm?index=3277&sequence=6
August 2006 http://www.cbo.gov/ftpdocs/74xx/doc7492/08-17-BudgetUpdate.pdf  
     Pdf p. 19, single line projection--10 years, p. 36, uncertainty--6 years.
 June 2000, conservative think tank analysis http://www.policyreview.org/jun00/oneill.html,
      Fig 1, budget surplus/deficit 1901 on.  Notice only previous longterm surplus is 1920's,
      Fig. 6 --1960 on, & projections
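
A small numerical illustration of why extrapolation is risky, as a Python sketch. The curve below is invented for the example (it is not budget data, and not from the text): y levels off toward 100 as x grows, but we only "observe" x = 1 to 10 and fit a straight line to that range.

```python
def true_curve(x):
    # Hypothetical relationship that flattens out; y never exceeds 100.
    return 100 * x / (x + 5)

xs = list(range(1, 11))            # the only x's we have data for
ys = [true_curve(x) for x in xs]

# Least-squares line fitted to the observed range.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def predict(x):
    return intercept + slope * x

inside_x, outside_x = 5, 40        # 5 is within the data; 40 is far outside
inside_error = abs(predict(inside_x) - true_curve(inside_x))
outside_error = abs(predict(outside_x) - true_curve(outside_x))

print(f"x = {inside_x}: line says {predict(inside_x):.1f}, "
      f"curve says {true_curve(inside_x):.1f}")
print(f"x = {outside_x}: line says {predict(outside_x):.1f}, "
      f"curve says {true_curve(outside_x):.1f}")
```

Inside the observed range the line is a decent approximation; far outside it, the line keeps climbing while the curve has flattened, so the prediction is badly off.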

Some more comments: (Optional)
Fact 4: R^2 (= r^2 = "Coefficient of Determination") = proportion of variability in y-values explained/accounted for by knowing x and using the regression line model.

  Unaccounted-for variability = (1 - r^2) = variance of residuals / total variance of y's
More: R-Squared (ClassMaterials\Math151 D&V\ RegressionDemosExcel for D&V\RSquared.xls)
(Optional: Further explanation of r^2)
r^2 is the square of the correlation coefficient r!  (The +/- sign gets lost.)
If r = .7, about half (.49) of the variability in the y's is accounted for by using the regression line model to predict y from x. (If weight and height have a correlation of .7, then half of the variability in weight can be accounted for by height.)
NOTE:  The standard deviation doesn't say anything about the distance of any individual point from the mean; it's only about a kind of "average" variability.  R^2 doesn't say anything about the line and any particular (x,y) pair -- just about a kind of "average" goodness of the fit of the line and the data.
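
The two statements above (unaccounted-for variability = 1 - r^2, and the sign of r getting lost) can be checked with a few lines of Python. This is only a sketch: the heights and weights are made-up numbers, not class data, and the function names are invented for the example.

```python
from math import sqrt

# Hypothetical heights (inches) and weights (pounds), for illustration only.
heights = [60, 62, 64, 66, 68, 70, 72, 74]
weights = [115, 130, 128, 150, 145, 170, 160, 185]

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def correlation(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def residuals(x, y):
    # Observed y minus the y predicted by the least-squares line.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

r = correlation(heights, weights)
r_squared = r ** 2
unexplained = variance(residuals(heights, weights)) / variance(weights)

# Reverse the direction of the relationship: r changes sign, r^2 does not.
r_flipped = correlation(heights, [-w for w in weights])

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
print(f"variance of residuals / variance of y = {unexplained:.3f}")
print(f"1 - r^2 = {1 - r_squared:.3f}")
print(f"flipped: r = {r_flipped:.3f}, r^2 = {r_flipped ** 2:.3f}")
```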

Line is not symmetric (Fact 1): The regression of weight on height uses a different line from the regression of height on weight.  (Minimizing vertical residuals pulls the line "flatter" than the line that just goes through the middle of the cloud, which would rise 1 s.d. for each 1 s.d. of run.  This is related to the idea of "regression to the mean", p. 124.)
   Demonstration on overhead projector; flip transparency to exchange axes.
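
The same asymmetry can be seen in numbers. The Python sketch below uses made-up heights and weights (not class data) and compares three slopes, all drawn on the same axes with height horizontal: the regression of weight on height, the "s.d. line" through the middle of the cloud, and the regression of height on weight.

```python
from math import sqrt

# Hypothetical heights (inches) and weights (pounds), for illustration only.
heights = [60, 62, 64, 66, 68, 70, 72, 74]
weights = [115, 130, 128, 150, 145, 170, 160, 185]

n = len(heights)
mean_h = sum(heights) / n
mean_w = sum(weights) / n
shh = sum((h - mean_h) ** 2 for h in heights)
sww = sum((w - mean_w) ** 2 for w in weights)
shw = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights))

r = shw / sqrt(shh * sww)

# Regression of weight on height: minimizes vertical distances.
slope_w_on_h = shw / shh

# Regression of height on weight: minimizes horizontal distances.
slope_h_on_w = shw / sww
# The same line redrawn with height on the horizontal axis.
slope_h_on_w_same_axes = 1 / slope_h_on_w

# The s.d. line rises one s.d. of weight per one s.d. of height.
sd_line_slope = sqrt(sww / shh)

print(f"r = {r:.3f}")
print(f"weight on height:       {slope_w_on_h:.2f} lb per inch")
print(f"s.d. line:              {sd_line_slope:.2f} lb per inch")
print(f"height on weight, same axes: {slope_h_on_w_same_axes:.2f} lb per inch")
```

With 0 < r < 1, the weight-on-height line is flatter than the s.d. line and the height-on-weight line is steeper; the two regression lines coincide only when r is exactly +1 or -1.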


Monday, after exam
A "lurking" variable has an important effect but is not one of the variables studied.
    Meatloaf shrinkage vs. placement in oven?  (Whether or not a cooking thermometer was used had the greatest influence.)
    Time sequence of observations is a common one.  (Learning, tiring, aging)
    The trouble with lurking variables is that by definition you don't know they're there.  Look behind every tree.

Association does not imply causation
Strong association/correlation between A and B could be:
     A causes B/   B causes A/  C causes both A and B (lurking C)/  just Chance that they go together in this data set.    
Direction?  Rooster causes sun to rise by crowing?
Both variables "caused" by a lurking variable?   Lurking variable can be part of the cause
--Women with a history of heavy antibiotic use have higher rates of breast cancer.
--Baby rats whose mothers licked and groomed them more   grew up to be more exploratory, social, less timid.
            Cause? Effect?  How to tell?
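
The "C causes both A and B" case can be simulated. In the Python sketch below, everything is invented for illustration: C is a lurking variable, and A and B each equal C plus independent noise, so neither one affects the other. They still come out clearly correlated, and the correlation essentially disappears once the part explained by C is removed from each.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable
n = 2000

# Simulated (not real) data: C drives both A and B; A and B never interact.
c = [random.gauss(0, 1) for _ in range(n)]
a = [ci + random.gauss(0, 1) for ci in c]
b = [ci + random.gauss(0, 1) for ci in c]

def mean(values):
    return sum(values) / len(values)

def correlation(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / sqrt(sxx * syy)

def residuals(x, y):
    # What is left of y after removing the least-squares line on x.
    mx, my = mean(x), mean(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [q - (intercept + slope * p) for p, q in zip(x, y)]

r_ab = correlation(a, b)
r_ab_given_c = correlation(residuals(c, a), residuals(c, b))

print(f"correlation of A and B:                {r_ab:.2f}")
print(f"correlation after removing C's effect: {r_ab_given_c:.2f}")
```

In real data the catch is that C is usually not measured, so this check is not available; that is why an experiment that controls x directly is the better route.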

Establishing that x "causes" y:  difficult:
    Best: Do an experiment in which we change x, keep lurking variables under control. (E.g.   Rats.  Ch.9)
    Otherwise: Strong association. Consistent over many studies. Higher x-->stronger y.  X precedes y in time.  A plausible mechanism exists (parallel studies?)
                Generalize rat grooming to humans?

         E.g. Partially hydrogenated oils --> heart disease?  Homocysteine --> heart disease?


Math151-Fall06/Daym18.htm  8:30 pm 10/5/06
This page belongs to Sally Sievers, who is solely responsible for its content.