Math 151, Fall '08, Day 15, Wed. Oct. 1.  Hit reload... After class.

HW Day 15: Ch. 5, Regression, thru p. 125. (Check p. 137: 5.14 through 5.20, basic line and regression line facts and tools. 5.21: r and slope. 5.22 is harder--changing units--don't worry about it. 5.23: If you sketch the graph and draw a line thru the points, you should be able to guesstimate the slope well enough to choose among the 3 answers. It's also the InClass.)  Next, the equation of the least-squares line (p. 120) & Fact 2, p. 123. Next, continuing regression, pp. 126-137. Ahead:  Skip Ch. 6 for now.  Ch. 7 review.  (Then Ch. 8 and 9.)

Hand in next class:  (also hand in Day 13 if you haven't)
Regression (Ch. 5): 
A.  Tidy up your Inclass sheet, hand in next time. Solutions
B. Use the SPSS Scatterplot handout and graph the regression line for govsal on avgpay (as shown, p. 2), and also the lines for the 4 separate groups (either on one graph or on panels). Print them out and keep them.  Find the formula for the regression line (p. 4 of handout).  Answer questions 7-9 and 11 on the Governors' Salaries handout. Keep the handout answers till you can answer all 12 questions.

p. 118, 5.1  IQ and reading scores. Graph, slope, predict.  Notice we don't have a scatterplot of the data, only this straight-line summary.
p. 139, 5.24 Penguins diving  Again, we don't have a scatterplot, only the summary.

p. 122, 5.4 (SPSS) Sparrowhawk colonies  Use SPSS to make the scatterplot, with the line, and find r.  Do (c),  and compute (d) by hand.    Now use the "up and over" method of Fig. 5.1, p.116, with a pencil and straightedge to mark the predicted value from (d) on the y-scale. Write down your computed answer next to it.  Make sure the two  methods give consistent answers.

p. 118, 5.2 equation from info.   As written, this is an algebra problem, not too hard, but not  in the main focus of the course.  I will tell you that the intercept a is -50, and  now the question is in the main focus of the course.    That is, what is the slope, and what is the equation?

p. 148, 5.54 (Applet) regression suitability
 p. 140, 5.26 (SPSS) sisters & brothers

More regression
p. 146, 5.42 (SPSS) A computer circle game  The last part of the last question, "Give numerical measures that describe the success of the two regressions,"  is asking for you to use Fact 4.  BTW, I found the graphs to be somewhat deceptive.  Can you see why?


A.  Use the Excel RSquared page. (Using Excel 2007 (in the labs)? R-Squared07 (ClassMaterial\Math151BPS4e\RegressionDemosExcel07).  Using an older Excel? R-Squared (or RSquared.xls: ClassMaterial\Math151BPS4e\RegressionDemos OlderExcel).) Shift points around, by typing new values (Excel 07) or dragging points (older Excel), and get an r² close to .8 (80%). (Between .75 and .85 is good enough. Try the same x's, and y's 4, 17, 15, 32.)  Note that if r = +.9, then r² = .81.  Now shift the points so that r is negative and r² is close to .8. Try the same x's, and y's going high to low.  Adjust.  Print the resulting page to hand in. (Data and graph.)

Income depends on height?! Read the article at the link and answer this.
If your browser doesn't get the link, it's at http://aurora.wells.edu/~srs/Math151-F08/tallpeoplewin.htm 
  a) What is "$789", and what kind of analysis did they do?
  b) What does my footnote at the end tell you about the data that the article did not?

 B. With  the Governors' Salaries handout, now do #10 also. (keep them all till we do #12) Governors' Salaries HW, accompanying  Scatterplot Handout 

p. 142, 5.32 going to class, proportion explained
p. 140, 5.28 social rejection, reading other software  (Read the text pp.120-122  for how to read software)

Read to discuss

Regression:

Use
http://www.whfreeman.com/bps4e
Correlation and Regression applet  to do p. 148, 5.55 , guessing lines

 Look at this, especially with reference to the r standard deviations in y for every 1 standard deviation in x: A. Open the Excel file. Using Excel 2007 (in the labs)?  RegressionSlope07.
   Using an older Excel?  RegressionSlope (or in the folders RegressionDemosExcel07, or RegressionDemos OlderExcel in ClassMaterial\Math151-BPS4e).
   Change x-y values in the yellow boxes and watch the line change.  Change x-values in col. F and watch the "run" (red line) change, in the rightmost 2 graphs. Notice the slope = the coefficient of x = the rise/run = increase in y per unit increase in x.  Fix it so the increase in x (the "run") is exactly 1.  Also, look at the leftmost graph, where the lengths of the standard deviations are shown, and note that in standard-deviation units, the rise is r s.d.'s in y for each s.d. run in x.  (Fact 2)
Optional 

p. 179, 7.27 (review Normal)
Exam 1 returned Monday. Get yours if you were absent.  Solutions  Comments on selected problems.
Exam 2 a week from Friday: Day 19 (Oct. 10).  Day before break.  Let me know Right Away if you can't take the exam Friday.  Starts with Ch. 3,  Normal distribution, tables.  Thru Ch. 4, and what we cover of Ch.5 (&7)  through  Monday.

= = = = Continue today. = = =
Regression line:
Ch. 5,
Predicts or estimates a y (vertical) value for a given x (horizontal) value: Straight line!  Details Day 13
Experimenting  http://www.whfreeman.com/bps4e,  Correlation and Regression Applet.
SPSS--graph line, p. 2 top of handout link.
     formula p. 4 of handout link.  Read off "coefficients" (intercept and slope) from table.
      Govsal on avgpay
Formula yhat = a + b x.  (yhat means we're finding a sort of average y for each particular x).
  Govsal = a + b avgpay  
a is the y-intercept.
b is the slope:  If x increases one unit, yhat increases b units.   (b multiplies the x-variable.)
Predicting a y for a given x: 
Plug in to formula.  (Graphical:  "Up and over")
Graphing line:  Pick an (easy) x-value at one end of the useful range.  Plug in to the formula and calculate the corresponding y.  Graph the (x,y) pair.  Repeat with an x-value at the other end of the range.  Connect the 2 dots with a line (see pretest).  (Insurance:  Pick a third x and calculate the y.  This point must also lie on the line, if you did it right.)  A table of x  |  y  is a good way to organize the work.
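The plug-in and graphing steps above can be sketched in a few lines of Python. This is only an illustration: the intercept and slope below are hypothetical round numbers, not the SPSS coefficients for govsal on avgpay, and the function name predict is made up for this sketch.

```python
# Hypothetical coefficients, chosen only for illustration
# (NOT the SPSS output for govsal on avgpay).
a = 10.0   # y-intercept
b = 2.0    # slope: yhat goes up b units when x goes up 1 unit

def predict(x):
    """Plug x into the formula yhat = a + b*x."""
    return a + b * x

# One x at each end of the useful range, plus a third x as "insurance".
xs = [0, 4, 2]
table = [(x, predict(x)) for x in xs]

print(" x | yhat")
for x, yhat in table:
    print(f"{x:2d} | {yhat:5.1f}")

# Insurance check: the third point must lie on the line through the first two.
(x1, y1), (x2, y2), (x3, y3) = table
slope_from_ends = (y2 - y1) / (x2 - x1)
on_line = abs((y1 + slope_from_ends * (x3 - x1)) - y3) < 1e-9
print("third point on the line:", on_line)
```

The two end points are all you need to draw the line; the third point only guards against an arithmetic slip.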

In-class work:   Handout,  Kneeheight.doc:  Do 1 thru 5.  Each gets a paper, but work together. Put your name; put your co-workers' name(s) in parentheses ( ).

Facts:  1 & 2 lite, 3 first.  Then 4.  Then 2 & Formulas p. 120, from 2 & 3.

Facts (Moore pp. 123-125)

  1. Which is explanatory, which is response, is crucial for regression!  The Regression line is trying to predict the "average y" for a given x (with the added requirement that it is a straight line).  See "residual"(deviation) lines for govsal on avgpay.
    Unless the data lie perfectly on a straight line, the line for predicting weight from height -- "regressing weight on height" -- (for example) will NOT be the same line as that for predicting height from weight -- "regressing height on weight".  (In-class demonstration on the overhead projector soon!) (Example 5.3, Fig. 5.4, pp. 123-4, is about this.)
     
  2. Lite:  The correlation coefficient r and the slope b of the regression line have the same sign!  + or - .
       Negative/positive:  trend=slope ~association~correlation
    Heavy: A change of one standard deviation in x corresponds to a change of r standard deviations in y, along the regression line.  We'll return to this.

  3. The regression line goes through the point given by the two means, (xbar, ybar)
    Applet
    http://www.whfreeman.com/bps4e
    We'll return to this.    In-class work:   Handout,  Kneeheight.doc:  Do 6.

  4. r² ("Coefficient of Determination") = fraction of the variation in y-values explained/predicted by knowing x and using the least squares regression line.  SPSS writes "R Square", or "R Sq Linear".  (Exactly what that means mathematically is hard.  Just get used to it as a measurement.)
    Closer to 0, more scatter around the line. Closer to 1, tighter clustering around the line. R-Squared (or RSquared.xls: ClassMaterial\Math151-BPS4e\RegressionDemosOlderExcel) (Excel07?  RSquared07.xls) (Optional:  Further explanation of r²)

  5. r² is the square of the correlation coefficient r!  (The - or + sign gets lost.)
    If r = .7, about half (.49) of the variation in the y's is explained by using the regression line relationship to predict y from x. (If weight and height have a correlation of .7, then half of the variability in weight can be explained by knowing height. Or vice versa.)
    NOTE:  The standard deviation doesn't say anything about the distance of any individual point from the mean; it's only about a kind of "average" variability.
    r² doesn't say anything about the line and any particular (x,y) pair -- just about a kind of "average" goodness of the explanatory power of the line for the data.
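A small numerical check of Facts 1, 2, 3 and 5 may help. The data below are made up purely for illustration (not from the text or the handouts), and the sketch assumes the usual least-squares formulas, slope b = r·sy/sx and intercept a = ybar - b·xbar.

```python
from statistics import mean, stdev

# Made-up data for illustration only (not from the text or the handouts).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)   # sample standard deviations (divide by n - 1)

# Correlation r: sum of products of standardized values, divided by n - 1.
r = sum(((xi - xbar) / sx) * ((yi - ybar) / sy)
        for xi, yi in zip(x, y)) / (n - 1)

# Regressing y on x: yhat = a + b*x
b = r * sy / sx
a = ybar - b * xbar

# Fact 2 (heavy): moving 1 s.d. in x moves yhat by r s.d.'s in y.
rise_per_sd = (a + b * (xbar + sx)) - (a + b * xbar)

# Fact 3: the line passes through (xbar, ybar).
yhat_at_xbar = a + b * xbar

# Fact 1: regressing x on y gives xhat = c + d*y, a different line.
d = r * sx / sy
slope_other_line = 1 / d   # its slope when drawn on the same axes (y vertical)

print(f"r = {r:.4f}, r squared = {r * r:.4f}")
print(f"y on x: yhat = {a:.2f} + {b:.2f} x")
print(f"rise per s.d. of x = {rise_per_sd:.4f}, r * sy = {r * sy:.4f}")
print(f"yhat at xbar = {yhat_at_xbar:.2f}, ybar = {ybar}")
print(f"x on y line has slope {slope_other_line:.2f} on the same axes")
```

With these numbers the y-on-x line has slope 0.6, while the x-on-y line drawn on the same axes has slope 1.0. The two lines would agree only if r were exactly +1 or -1.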
Pizza prices example for r²: $10 plain, $2 extra per topping.  5 people buy 0, 1, 2, 3, 4 toppings.
x = # of toppings:    0     1     2     3     4
y = Price ($):       10    12    14    16    18
The points lie on a straight line:  y = 10 + 2x.
Correlation coefficient r = 1.   r² = 1.
100% of the variation in price is explained by (knowing and regressing on) the number of toppings.  NOT 100% of the price!
          (Example is not my invention, but I've lost track of whose...)
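The pizza numbers can be checked directly. The sketch below fits the least-squares line and computes r² as 1 minus (leftover variation around the line) / (total variation around the mean price). That is one common way to compute it, used here as an assumption rather than as the text's own definition.

```python
from statistics import mean

toppings = [0, 1, 2, 3, 4]       # x
price = [10, 12, 14, 16, 18]     # y, in dollars

xbar, ybar = mean(toppings), mean(price)

# Least-squares slope and intercept from sums of squares and products.
sxx = sum((x - xbar) ** 2 for x in toppings)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(toppings, price))
b = sxy / sxx
a = ybar - b * xbar

# Total variation in price around the mean price ...
total_variation = sum((y - ybar) ** 2 for y in price)
# ... and what is left over around the regression line.
leftover = sum((y - (a + b * x)) ** 2 for x, y in zip(toppings, price))

r_squared = 1 - leftover / total_variation

print(f"line: y = {a:.0f} + {b:.0f} x")
print(f"mean price = {ybar}, total variation = {total_variation}")
print(f"leftover variation = {leftover}, r squared = {r_squared}")
```

Note what is being explained: the spread of the prices around their mean of $14, not the $14 itself. That is why it is 100% of the variation in price, not 100% of the price.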
In-class work:   Handout,  Kneeheight.doc:  Do 7, 8.   

More time?  Look at what outliers do to line: http://www.whfreeman.com/bps4e, Correlation & Regression applet.

Sievers home  Math151-F08/Dayf15.htm  11:30am 10/1/08
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.