MATH 251, Probability and Statistics I, Fall 2001, Friday Sept. 21, Day 10

Linear regression, cont.
--Vertical distance from a point to the regression line: "Error" = "Residual" = "Deviation" = (y_i - yhat_i)
   The regression line minimizes the "Sum of Squared Errors", also called the "Sum of squared deviations":  Sum (y_i - yhat_i)^2

--"Regressing height on weight":  Weight on the x axis, predicting height from weight.

--You do NOT get the same line when you predict height from weight as when you predict weight from height, because the deviations from the line are measured in different directions (vertically in one case, horizontally in the other)! (picture p. 144)

--R^2 is also called "the coefficient of determination"
    See, in Macmillan 101, Class Material\Math 151\Regression Demos\ResidualsRSquared,
        the RSquared tab.  You can change the positions of the 4 points.
The formula Moore gives, p. 147, is the same as the formula I use here (divide top & bottom by n-1):

    R^2 = (variance of predicted values yhat) / (variance of observed values y)
        = [Sum of explained squared variation/(n-1)] / [Sum of observed (total) squared variation/(n-1)]
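
The points above can be checked numerically. What follows is only a sketch in Python with NumPy (an assumption on my part, not the Excel demo from class), and the four data points are made up for illustration. It fits the line both ways (y from x, then x from y), computes the residuals and their sum of squares, and computes R^2 as a ratio of variances, as a ratio of sums of squares, and as the square of r.

```python
import numpy as np

# Hypothetical data: four (x, y) points, invented for illustration only.
x = np.array([1.0, 2.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 3.0, 6.0])

def least_squares_line(x, y):
    """Return (a, b) for yhat = a + b*x, minimizing squared vertical errors."""
    b, a = np.polyfit(x, y, 1)  # polyfit returns the highest power first
    return a, b

# Regress y on x: residuals are vertical distances y_i - yhat_i.
a_yx, b_yx = least_squares_line(x, y)
yhat = a_yx + b_yx * x
residuals = y - yhat
sse = np.sum(residuals ** 2)  # the "Sum of Squared Errors"

# Regress x on y: this minimizes horizontal distances instead.
a_xy, b_xy = least_squares_line(y, x)  # xhat = a_xy + b_xy * y
slope_xy_in_y_form = 1.0 / b_xy        # same line rewritten as y = ... + slope * x

# R^2 three ways.
r = np.corrcoef(x, y)[0, 1]
r2_variances = np.var(yhat, ddof=1) / np.var(y, ddof=1)  # divides by n-1
r2_sums = np.sum((yhat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

print("y-on-x slope:", b_yx, "  x-on-y slope (in y-vs-x form):", slope_xy_in_y_form)
print("sum of squared errors:", sse)
print("R^2 from variances:", r2_variances)
print("R^2 from sums of squares:", r2_sums)
print("r squared:", r ** 2)
```

The two slopes differ unless the points fall exactly on a line, and the three versions of R^2 agree because the (n-1) cancels top and bottom.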

Reading: finish 2.3, read 2.4, Cautions/residuals/influentials (I'll demonstrate graphing residuals and DFFITS in class.  Focus on uses tonight.)
Hand in: 
Problems A and B from class (see below) 
2.50 Better predictor of GPA?
2.39 manatees: extrapolation
2.43 Julie's exam (formula and R^2)
p. 216, 2.110 beta.
p. 179, 2.66 (This is a continuous-data version of "Simpson's Paradox", pp. 199-200)
Read, be able to discuss 
2.62 heart attacks: Do a mental median trace for part b.  Make a rule of thumb for choosing a hospital for your heart attack. (As if one had a choice--closer is better, and most people don't get to decide.)
p. 214, 2.108 diet--explain
2.109 heating deg. days, solar
Optional
A. (Not hard) If you know the means, standard deviations, and r for a pair of variables, you can calculate the equation of the regression line  yhat = a + bx.  Memorizing 2 facts is enough: "b = r (s_y / s_x)", and "the pair of means (xbar, ybar) lies on the line".  Show that these are enough; that is, show how to get the formula for a if you know these facts.
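
Once you have your formula for a, you can check it against a computed fit. This is only a sketch in Python with NumPy (an assumption, not class software), using made-up data. It confirms the two facts numerically and prints the least-squares intercept for you to compare with your own formula.

```python
import numpy as np

# Hypothetical data, invented for illustration only.
x = np.array([1.0, 2.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 3.0, 6.0])

xbar, ybar = x.mean(), y.mean()
sx, sy = x.std(ddof=1), y.std(ddof=1)  # sample standard deviations (n-1)
r = np.corrcoef(x, y)[0, 1]

# Fact 1: the slope is r times (s_y / s_x).
b_from_facts = r * sy / sx

# Least-squares fit for comparison; polyfit returns (slope, intercept).
b_fit, a_fit = np.polyfit(x, y, 1)

# Fact 2: the fitted line passes through (xbar, ybar).
height_at_xbar = a_fit + b_fit * xbar

print("slope from r, s_x, s_y:", b_from_facts)
print("slope from least squares:", b_fit)
print("fitted line at xbar:", height_at_xbar, "  ybar:", ybar)
print("intercept to compare with your formula for a:", a_fit)
```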


B.  The least-squares best-fit line is the line yhat = a + bx that minimizes the sum of the squared residuals (the residuals are the vertical distances from each y_i point to the line).  Two things can vary--the slope b, and how high the line sits on the page (given by a, the intercept).

You might ask (I know, you wouldn't--but you should...) what is the best single point w to describe all the y-values, using the criterion that the sum of the squared distances of the y_i values to w is the smallest possible? (Another way of thinking of this, in the scatterplot setting, is what horizontal line best summarizes all the y's, if we can't use the x-information.)
Find w:  That is, find the w that makes f(w) = Sum (y_i - w)^2 a minimum (I can't make sigmas here: "Sum" = big sigma, sum from i = 1 to n).  (How?  Find the derivative f'(w) and set it equal to 0.)
If you aren't comfortable with big sigma sums, let n = 3:  f(w) = (y_1 - w)^2 + (y_2 - w)^2 + (y_3 - w)^2
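
After you have done the calculus, you can check your answer numerically. This is only a sketch in Python with NumPy (an assumption, not class software), and the three y-values are made up. It evaluates f(w) on a fine grid of candidate w values and reports where f is smallest; it is a brute-force check, not a substitute for the derivative argument.

```python
import numpy as np

# Hypothetical y-values with n = 3, invented for illustration only.
y = np.array([2.0, 3.0, 7.0])

def f(w):
    """Sum of squared distances from each y_i to the single value w."""
    return np.sum((y - w) ** 2)

# Evaluate f on a fine grid of candidate w values between min(y) and max(y).
grid = np.linspace(y.min(), y.max(), 5001)
values = np.array([f(w) for w in grid])
w_best = grid[np.argmin(values)]

print("grid minimizer of f(w):", w_best)
print("f at that w:", f(w_best))
```

Compare the printed minimizer with the w your formula gives for these three values.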


This page belongs to Sally Sievers, who is solely responsible for its content.