| Hand in Wednesday (have a great break!): A. Use the Excel RSquared page (R-Squared). | Read, to discuss | Optional |
Cautions (pp. 132-136)
Plot the data:
Summary formulas and numbers don't tell the whole story.
In particular, correlation and the regression line properly describe only a linear relationship.
Correlation and regression are not resistant to outliers or influential points. ("Anscombe's quartet", Moore p. 142, 5.34.) (Overhead slide. You can reconstruct these pictures using SPSS and Moore's problem, if you like.)
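If you would rather check the numbers than the pictures, here is a minimal Python sketch. It assumes the four data sets as they are commonly published for Anscombe's quartet; verify the values against Moore's problem before relying on them. The helper name `fit` is just for this illustration. All four sets should come out with nearly the same line (about y = 3 + 0.5x) and nearly the same r, even though the scatterplots look completely different.

```python
from statistics import mean

# Anscombe's four data sets as commonly published (assumption: check the
# values against Moore's problem before relying on them).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of y on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / (sxx * syy) ** 0.5


results = {name: fit(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (slope, intercept, r) in results.items():
    print(f"set {name}: y = {intercept:.2f} + {slope:.3f} x, r = {r:.3f}")
```

The summary numbers cannot tell the four sets apart; only a plot shows the curve in set II and the single outlying or influential point in sets III and IV.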
Extrapolation (extra = outside, polation = putting a point): using the line to predict outside the range of x's you have data for. Linear relationships don't go on forever; a straight line is often a first approximation to a more complicated relationship.
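A small sketch of why extrapolation is risky, using made-up data: suppose the true relationship is y = x squared, but we only observe x = 1 to 5 and fit a straight line. The data, the variable names, and the `predict` helper are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
from statistics import mean

# Hypothetical data: the true relationship is y = x**2, seen only for x = 1..5.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

# Least-squares line of y on x.
mx, my = mean(xs), mean(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx


def predict(x):
    """Prediction from the fitted straight line."""
    return intercept + slope * x


# Inside the data range the line is a fair approximation; outside it is not.
for x in (3, 5, 10, 20):
    print(f"x = {x:2d}: line predicts {predict(x):6.1f}, true value {x ** 2}")
```

Within the observed range the line misses by a couple of units at most; at x = 20 it predicts 113 when the true value is 400.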
Government projections of the national budget surplus/deficit (www.cbo.gov, publications > search):
Jan. 2001: http://www.cbo.gov/showdoc.cfm?index=2727&sequence=6
  Projection used to justify the Bush tax cuts.
Jan. 2002: http://www.cbo.gov/showdoc.cfm?index=3277&sequence=6
August 2006: http://www.cbo.gov/ftpdocs/74xx/doc7492/08-17-BudgetUpdate.pdf
  Pdf p. 19: single-line projection, 10 years. P. 36: uncertainty, 6 years.
June 2000, conservative think tank analysis: http://www.policyreview.org/jun00/oneill.html
  Fig. 1: budget surplus/deficit, 1901 on. Notice that the only previous long-term surplus is in the 1920s.
  Fig. 6: 1960 on, and projections.
Some more comments (optional):
Fact 4: R² (= r² = "coefficient of determination") = the proportion of variability in the y-values explained (accounted for) by knowing x and using the regression line model.
Unaccounted-for variability = 1 - r² = (variance of residuals) / (total variance of the y's)
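To see Fact 4 as arithmetic, here is a short Python sketch that computes r, the regression line, and the residuals, then compares 1 - r² with the ratio of variances. The height and weight values are made up for illustration only.

```python
from statistics import mean, pvariance

# Made-up (height in inches, weight in pounds) pairs, for illustration only.
heights = [62, 64, 65, 67, 68, 70, 71, 73]
weights = [120, 135, 128, 150, 160, 155, 175, 180]

mx, my = mean(heights), mean(weights)
sxx = sum((x - mx) ** 2 for x in heights)
syy = sum((y - my) ** 2 for y in weights)
sxy = sum((x - mx) * (y - my) for x, y in zip(heights, weights))

slope = sxy / sxx                 # least-squares slope of weight on height
intercept = my - slope * mx
r = sxy / (sxx * syy) ** 0.5      # correlation coefficient

# Residual = observed y minus the y predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(heights, weights)]
unexplained = pvariance(residuals) / pvariance(weights)

print(f"r^2                     = {r ** 2:.4f}")
print(f"1 - r^2                 = {1 - r ** 2:.4f}")
print(f"var(residuals) / var(y) = {unexplained:.4f}")
```

The last two printed numbers agree (up to rounding), which is the identity stated above. Both variances use the same divisor, so it does not matter whether you divide by n or n - 1.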
More: R-Squared (ClassMaterials\Math151 D&V\ RegressionDemosExcel for D&V\RSquared.xls)
(Optional: further explanation of r²)
r² is the square of the correlation coefficient r! (The sign, - or +, gets lost.)
If r = .7, about half (.49) of the variability in the y's is accounted for by using the regression line model to predict y from x. (If weight and height have a correlation of .7, then about half of the variability in weight can be accounted for by height.)
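A quick check that the sign really does get lost, sketched in Python with made-up numbers: flip a data set upside down and r changes sign, but r² stays the same. The `correlation` helper is written out here just for illustration.

```python
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Made-up data: ys_up tends to rise with x; ys_down is the same pattern flipped.
xs = [1, 2, 3, 4, 5, 6]
ys_up = [2, 1, 4, 3, 6, 5]
ys_down = [-y for y in ys_up]

r_up = correlation(xs, ys_up)
r_down = correlation(xs, ys_down)

print(f"r = {r_up:+.3f}, r^2 = {r_up ** 2:.3f}")
print(f"r = {r_down:+.3f}, r^2 = {r_down ** 2:.3f}")
print(f"r = 0.7 or r = -0.7 both give r^2 = {0.7 ** 2:.2f}")
```

So r² tells you how much variability the line accounts for, but you need r itself (or the plot) to know whether the relationship goes up or down.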
NOTE: The standard deviation doesn't say anything about the distance of any individual point from the mean; it's only about a kind of "average" variability. Likewise, R² doesn't say anything about the line and any particular (x, y) pair, just about a kind of "average" goodness of fit between the line and the data.
The line is not symmetric (Fact 1): The regression of weight on height uses a different line from the regression of height on weight. (Minimizing vertical residuals pulls the line "flatter" than the line that just goes through the middle of the cloud, which would rise 1 s.d. in y for each 1 s.d. run in x. This is related to the idea of "regression to the mean", p. 124.)
Demonstration on overhead projector: flip the transparency to exchange axes.
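The transparency flip can also be done with numbers. This Python sketch uses made-up height and weight data and the standard fact that the least-squares slope of y on x is r times (s.d. of y) / (s.d. of x). It compares three slopes, all drawn on the same axes (height horizontal, weight vertical); the variable names are only for this illustration.

```python
from statistics import mean, pstdev

# Made-up (height in inches, weight in pounds) pairs, for illustration only.
heights = [62, 64, 65, 67, 68, 70, 71, 73]
weights = [120, 135, 128, 150, 160, 155, 175, 180]

mh, mw = mean(heights), mean(weights)
s_h, s_w = pstdev(heights), pstdev(weights)
r = (sum((h - mh) * (w - mw) for h, w in zip(heights, weights))
     / (len(heights) * s_h * s_w))

# All three slopes are in pounds per inch, on weight-vs-height axes.
slope_w_on_h = r * s_w / s_h      # regression of weight on height
slope_sd_line = s_w / s_h         # middle of the cloud: 1 s.d. up per 1 s.d. over
slope_h_on_w = s_w / (r * s_h)    # regression of height on weight, axes flipped back

print(f"r = {r:.3f}")
print(f"weight on height:        {slope_w_on_h:.2f} lb/in")
print(f"s.d. line:               {slope_sd_line:.2f} lb/in")
print(f"height on weight (flip): {slope_h_on_w:.2f} lb/in")
```

Whenever r is strictly between 0 and 1, the weight-on-height line is flatter than the s.d. line and the flipped height-on-weight line is steeper, so the two regressions are different lines through the same point of means.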
Association does not imply causation
A strong association/correlation between A and B could mean:
A causes B / B causes A / C causes both A and B (lurking C) / just chance that they go together in this data set.
Direction? Does the rooster cause the sun to rise by crowing?
Both variables "caused" by a lurking variable?
A lurking variable can be part of the cause:
--Women with a history of heavy antibiotic use have higher rates of breast cancer.
--Baby rats whose mothers licked and groomed them more grew up to be more exploratory, more social, and less timid.
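A simulated illustration of the "C causes both A and B" case, as a Python sketch. Everything here is made up: C is a random lurking variable, and A and B are each built from C plus their own independent noise, so neither one has any effect on the other. The seed, sample size, and noise levels are arbitrary choices.

```python
import random
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(0)  # fixed seed so the run is repeatable
n = 2000
c = [random.gauss(0, 1) for _ in range(n)]     # lurking variable C
a = [ci + random.gauss(0, 0.5) for ci in c]    # A = C + its own noise
b = [ci + random.gauss(0, 0.5) for ci in c]    # B = C + its own noise (A never enters)

r_ab = correlation(a, b)

# Take C's contribution out of each variable and correlate what is left.
a_rest = [ai - ci for ai, ci in zip(a, c)]
b_rest = [bi - ci for bi, ci in zip(b, c)]
r_rest = correlation(a_rest, b_rest)

print(f"correlation of A and B:             {r_ab:.2f}")
print(f"correlation after taking out C:     {r_rest:.2f}")
```

A and B come out strongly correlated even though neither causes the other; once C is accounted for, the association essentially disappears. In real data the lurking variable is usually not measured, which is why an experiment is the better tool.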
Cause? Effect? How to tell?
Establishing that x "causes" y is difficult:
Best: Do an experiment in which we change x and keep lurking variables under control. (E.g. rats, Ch. 9.)
Otherwise: Strong association. Consistent over many studies. Higher x --> stronger y. X precedes y in time. A plausible mechanism exists (parallel studies?).
Generalize rat grooming to humans?
| Sievers home | Math151-Fall06/Daym18.htm | 8:30 pm | 10/5/06 |