Math 151, Spring '08, Mon. Day 16, March 3. Hit reload. After class: corrections to A.

Reading:  Ch. 5, Regression, thru p. 125   (check p. 137:  5.14 through 20, basic line and regression line facts and tools.  21 r and slope signs, 22 is harder--changing units--don't worry about it. 23 If you sketch the graph and draw a line thru the points, you should be able to guesstimate the slope well enough to choose among the 3 answers.)   Next, the equation of the least-squares line (p. 120) & Fact 2, p.123. Next, Continuing regression, p. 126-137.

 Hand in: Regression   Problems in this color were given also on F. Day 15, to work ahead.  Repeated here.
 
p. 118, 5.2 equation from info.   As written, this is an algebra problem, not too hard, but not  in the main focus of the course.  I will tell you that the intercept a is -50, and  now the question is in the main focus of the course.    That is, what is the slope, and what is the equation?

p. 148, 5.54 (Applet) regression suitability
p. 140, 5.26 (SPSS) sisters & brothers
p. 146, 5.42 (SPSS) A computer circle game  The last part of the last question, "Give numerical measures that describe the success of the two regressions,"  is asking for you to use Fact 4 p.124.

A.  Use the Excel RSquared page. (Using Excel 2007 (in the labs)? Use R-Squared07. Using an older Excel? Use R-Squared, or RSquared.xls: ClassMaterial\Math151BPS4e\RegressionDemosExcel BPS4e.) Shift the points around, by typing new values (Excel 07) or dragging points (older Excel), and get an r^2 close to .8 (80%). (Between .75 and .85 is good enough. Try same x's, and y's 4, 17, 15, 32.)  Note that if r = +.9, then r^2 = .81.  Now shift the points so that r is negative and r^2 is close to .8. (Try same x's, and y's going high to low. Adjust.)  Print the resulting page to hand in. (Data and graph)

Income depends on height?! Read the article at the link and answer this.
If your browser doesn't get the link, it's at http://aurora.wells.edu/~srs/Math151-Sp07/tallpeoplewin.htm 
  a)What is "$789", and what kind of analysis did they do? 
  b)What does my footnote at the end tell you about the data that the article did not?

 B. With  the Governors' Salaries handout, now do #10 also. (keep them all till we do #12) Governors' Salaries HW, accompanying  Scatterplot Handout 

p. 142, 5.32 going to class, proportion explained
p. 140, 5.28 social rejection, reading other software  (Read the text pp.120-122  for how to read software)
- - - - - - - - -
Postpone the rest??  YES!:
Line formula:
p. 122, 5.3b only. verify formula  Find the means, s.d.'s and r in the answers in the back of the book, and use them to calculate a and b and write the formula for the regression line.
p. 141, 5.30 husbands and wives  (Note, you have to find the equation of the line to draw the graph, tho it doesn't explicitly tell you to...)

Read, to discuss

Optional

 

= = = = = = = = = = = = = = = = = = = = = =
Exam 2 this Friday: Day 18 (March 7).    Let me know Right Away if you can't take the exam Fri. at class time (or 10:30). Starts with Ch. 3, Densities, Normal distribution, tables.  Thru Ch. 4, and what we cover of Ch.5 through  Today. (All questions on the sample exam have been covered.)  Sample exam (handout), solutions (link) Normal probability practice   One sheet of notes: I will give you paper copies of the Normal table.
Bring exam questions Wed.!

Are you having trouble seeing which variable goes on the x axis?  If there is any sense that one is the cause of the other, or can/will be used to predict or estimate the other,   that's the explanatory (x) variable.  The other one is the response (y) variable.  (Sometimes you can choose the x-values and see the response for that x, in the corresponding y:  like the corn plant density problem (It's an experiment, Ch.9.)  Sometimes you can only observe.)  Language: Regress  heating oil ON temperature:  Temperature = x = horizontal, Heating oil = y = vertical.

HW questions? 
Correlation  Day 15
   Leftover:  Timeplots:  are scatterplots, where the x axis shows time. (Time is often a lurking variable: plot data against order of taking observations)
- - - - - - - - - - -
Regression line: Ch. 5, Predicts or estimates a y (vertical) value for a given x (horizontal) value:   Straight line!
     "Regressing y ON x" .
  p. 110, 4.28, corn plant density.  Made a regression CURVE!  (Well, broken line...)
"Regression" with no other description means "Least squares best fit line"--STRAIGHT line.

Experimenting  http://www.whfreeman.com/bps4e,  Correlation and Regression Applet.
SPSS--back of handout.  Govsal on avgpay

Formula yhat = a + b x.    Govsal = a + b avgpay   Govsal = 28,569.69 + 2.709*avgpay
     yhat =  28,569.69 + 2.709* x        
         To predict or estimate a y-value for a given x-value, plug the x value into the formula and calculate.
                To do it graphically, use the Up-and-Over method (Fig. 5.1, p.116):
                    Find the x, go straight up to the line, then go over to the y-axis; that y-value is the predicted y.
         Calculating:  Montana (17,895, 55,502)   Govsal = 28,569.69 + 2.709*avgpay
           Predicted Govsal = 28,569.69 + 2.709*17,895 = 28,569.69 + 48,477.56 = 77,047.25 (higher than actual)
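
The Montana calculation can be repeated in a few lines of Python. This is only a sketch: the function name predict_govsal and the variable names are made up for illustration, while the intercept, slope, and Montana figures are the ones in the notes. The last step computes the residual (actual minus predicted), which comes out negative because the prediction is higher than the actual salary.

```python
# Fitted line from the notes: Govsal = 28,569.69 + 2.709 * avgpay
A = 28569.69   # intercept a, in dollars
B = 2.709      # slope b: dollars of governor's salary per dollar of average pay

def predict_govsal(avgpay):
    """Return yhat = a + b*x for a given average pay x (hypothetical helper)."""
    return A + B * avgpay

# Montana, from the notes: average pay 17,895; actual governor's salary 55,502
montana_avgpay = 17895
montana_actual = 55502

predicted = predict_govsal(montana_avgpay)
residual = montana_actual - predicted   # actual minus predicted

print(f"predicted Govsal = {predicted:,.2f}")
print(f"residual         = {residual:,.2f}")
```

Plugging in any other state's average pay works the same way; only the x-value changes.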

(Graphing a straight line:  pick an x-value at one end of the useful range.  Plug in to the formula and calculate the corresponding y.  Graph the (x,y) pair.  Repeat with an x value at the other end of the range.  Connect the 2 dots with a line (see pretest).  Insurance:  Pick a third x and calculate the y.  This point must also lie on the line, if you did it right.)

 a is the y-intercept. b is the slope:  If x increases one unit, yhat increases b units.  
  If you know that yhat increases 12 units for every one that x increases, you know that the slope of the line b = 12. 
            Governor's salaries increase (on the average across the states)  $2.71 for every increase of  $1 of average pay.
     This is a summary  of the linear relationship, in the same way that the mean of a distribution is one summary of the distribution.  Particular states won't match this exactly.

 (In a straight-line relationship, the amount that y increases for one unit increase in x is the same no matter what value of x you start with)  RegressionSlope.xls or in ClassMaterial\Math151-BPS4e \RegressionDemos Excel BPS4e

Income depends on height?!
    What is "$789", and what kind of analysis did they do?  (HW)

We all get the same line from a batch of data because we use the "least-squares best fit" criterion (p. 119): we'll investigate this more closely later.

Facts:  1, 2 lite, 3 first.  Then 4.  Then 2 & Formulas p. 120, from 2 & 3.

Facts (Moore pp. 123-125)

  1. Which is explanatory, which is response, is crucial for regression!  The Regression line is trying to predict the "average y" for a given x (with the added requirement that it is a straight line).  See "residual"(deviation) lines for govsal on avgpay.
    Unless the data lies perfectly on a straight line, the line for predicting weight from height -- "regressing weight on height" -- (for example) will NOT be the same line as that for predicting height from weight -- "regressing height on weight".  (In-class demonstration, soon, on overhead projector.) (Example 5.3, Fig. 5.4, pp. 123-4, is about this.)
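
Here is a rough Python sketch of Fact 1. The height and weight numbers are made up for illustration (they are not from the text), and the helper names correlation and least_squares are hypothetical. It fits weight ON height and height ON weight, then turns the second line around so both slopes are in pounds per inch. Unless the points lie exactly on a line, the two slopes differ.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: average product of standardized values."""
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    return sum((x - xbar) / sx * (y - ybar) / sy
               for x, y in zip(xs, ys)) / (n - 1)

def least_squares(xs, ys):
    """Return (a, b) for the line yhat = a + b*x, regressing ys ON xs."""
    b = correlation(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

# Made-up data, for illustration only
height = [60, 62, 65, 68, 70, 72]          # inches
weight = [115, 120, 150, 145, 170, 180]    # pounds

a1, b1 = least_squares(height, weight)     # weight ON height
a2, b2 = least_squares(weight, height)     # height ON weight

# Turn the second line around so it also reads "pounds per inch":
# height = a2 + b2*weight  ->  weight = (height - a2) / b2
slope_if_flipped = 1 / b2

print(f"weight on height:          {b1:.3f} pounds per inch")
print(f"height on weight, flipped: {slope_if_flipped:.3f} pounds per inch")
```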
     
  2. Lite:  The correlation coefficient r and the slope b of the regression line have the same sign!  + or - .
       Negative/positive:  trend=slope ~association~correlation
    Heavy: A change of one standard deviation in x corresponds to a change of r standard deviations in y, along the regression line.  We'll return to this.

  3. The regression line goes through the point given by the two means, (xbar, ybar)
    Applet
    http://www.whfreeman.com/bps4e
    We'll return to this.

  4. r^2 ("Coefficient of Determination") = fraction of the variation in y-values explained/predicted by knowing x and using the least squares regression line.  SPSS writes "R Square", or "R Sq Linear".  (Exactly what that means mathematically is hard.  Just get used to it as a measurement.)
    Closer to 0, more scatter around the line. Closer to 1, tighter clustering around the line. R-Squared (or RSquared.xls: ClassMaterial\Math151-BPS4e\RegressionDemosExcelBPS4e) (Optional:  Further explanation of r^2)

  5. r^2 is the square of the correlation coefficient r!  (The -, + sign gets lost.)
    If r = .7, about half (.49) of the variation in the y's is explained by using the regression line relationship to predict y from x. (If weight and height have a correlation of .7, then half of the variability in weight can be explained by knowing height. Or vice versa.)
    NOTE:  The standard deviation doesn't say anything about the distance of any individual point from the mean; it's only about a kind of "average" variability.
    r^2 doesn't say anything about the line and any particular (x,y) pair -- just about a kind of "average" goodness of the explanatory power of the line for the data.
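
Optional: one common way to make "fraction of variation explained" concrete is to compare the squared deviations of the y's from the line with their squared deviations from ybar. The notes do not spell this out, so treat the sketch below as an illustration under that assumption. The data are made up. The fraction it computes should agree with the square of r.

```python
from statistics import mean, stdev

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 7, 5]

n = len(xs)
xbar, ybar = mean(xs), mean(ys)
sx, sy = stdev(xs), stdev(ys)

# Correlation r, then the least-squares line yhat = a + b*x
r = sum((x - xbar) / sx * (y - ybar) / sy for x, y in zip(xs, ys)) / (n - 1)
b = r * sy / sx
a = ybar - b * xbar

# Variation of the y's around their mean, and what is left around the line
total_variation = sum((y - ybar) ** 2 for y in ys)
leftover = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

fraction_explained = 1 - leftover / total_variation

print(f"r                  = {r:.4f}")
print(f"r squared          = {r * r:.4f}")
print(f"fraction explained = {fraction_explained:.4f}")
```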
Got to here Mon.
Facts 2 & 3 give the line formula!  (Moore pp. 123-125)

2.   A change of one standard deviation in x corresponds to a change of r standard deviations in y, along the regression line.
The slope b expresses change in y-units per x-unit. (Suppose x is inches, y is pounds. Then b is in pounds per inch.) You can find b by multiplying r by the standard deviation of the y's (that's in pounds)  and dividing by the standard deviation of the x's (that's in inches)
In "algebra", b = r times (s.d. of y)/(s.d. of x)  (Equation p. 120)
       If we standardize both the x-values and the y-values, the slope will just = r !
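
A quick way to see this is to try it on numbers. The sketch below uses made-up heights and weights, and the helpers standardize and slope_and_r are hypothetical names. So that the check is not circular, the slope is computed directly from the least-squares sums (sum of cross-products over sum of squared x-deviations) rather than from the b = r sy/sx form in the notes.

```python
from math import sqrt
from statistics import mean, stdev

def standardize(values):
    """Convert values to z-scores: subtract the mean, divide by the s.d."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def slope_and_r(xs, ys):
    """Least-squares slope b and correlation r, from sums of deviations."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Made-up data, for illustration only
inches = [60, 62, 65, 68, 70, 72]
pounds = [115, 120, 150, 145, 170, 180]

b, r = slope_and_r(inches, pounds)
zb, zr = slope_and_r(standardize(inches), standardize(pounds))

print(f"original units: b = {b:.4f} pounds per inch, r = {r:.4f}")
print(f"standardized:   b = {zb:.4f}, r = {zr:.4f}")
```

In the original units the slope is in pounds per inch and is not equal to r. After standardizing, the slope and r should print as the same number.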
        
govsalstd.sav Govsalstd2.doc    RegressionSlope.xls

3.   The regression line goes through the point given by the two means, (xbar, ybar). http://www.whfreeman.com/bps4e 
--If you know this, you know ybar = a + b (xbar).  You can solve this for a: a = ybar - b (xbar). (Other equation, p. 120)
--So knowing 2 and 3 give you the equation of the line from the means, s.d.'s, and r.
--And if you draw the two lines, y on x and x on y, they will intersect at (xbar, ybar)

The line formula yhat = a + bx  from xbar, ybar, sx , sy , r:
     Find b:   b = r  sy / sx
                (Fact 2: r is the slope if x and y are standardized. Equation p. 120)
      Find a:  Solve  ybar = a + b xbar for a:  a = ybar - b xbar
               (Fact 3:  (xbar, ybar) lies on the regression line(s).  Equation p. 120)
 Example.  
x is measured in Rangs, y in Zobs
 xbar = 5 Rangs,   ybar = 8 Zobs,    sx = 10 Rangs,  sy = 6 Zobs ,   r = -.3:   
        b = -.3×6/10 (Zobs/Rang) = - 0.18  Zobs/Rang.  
         8 = a + (-0.18)×5             
8 Zobs = a Zobs + (-0.18)(Zobs/Rang) ×5  Rangs
         8 = a - 0.90    a = 8.90 Zobs      yhat = 8.90 - 0.18x  Zobs
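
The same two steps can be checked by machine. This is a minimal sketch: the function name line_from_summary is made up for illustration, and the inputs are the Rangs/Zobs values from the example.

```python
def line_from_summary(xbar, ybar, sx, sy, r):
    """Return (a, b) for yhat = a + b*x from the means, s.d.'s, and r."""
    b = r * sy / sx        # Fact 2: slope = r times (s.d. of y)/(s.d. of x)
    a = ybar - b * xbar    # Fact 3: the line goes through (xbar, ybar)
    return a, b

# Values from the Rangs/Zobs example
a, b = line_from_summary(xbar=5, ybar=8, sx=10, sy=6, r=-0.3)

print(f"b = {b:.2f} Zobs/Rang")
print(f"a = {a:.2f} Zobs")

# Insurance: plugging xbar into the line should give back ybar
print(f"yhat at xbar = {a + b * 5:.2f} Zobs")
```

The last line is the same kind of insurance as picking a third point when graphing: if the result is not ybar, something went wrong in finding a.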

Sievers home  Math151-Sp08/Days16.htm  12noon 3/3/08
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.