Math 151, Day 14, Monday, Oct. 1, 2001

Questions on Homework:

Section 2.3, review. Regression line: predicts or estimates a y-value for a given x-value.
    Formula: yhat = a + b x.
         To predict a y-value for a given x-value, plug the x into the formula and calculate.
                To do it graphically, use the Up-and-Over method (Fig. 2.10, p.107):
                    Find the x, go straight up to the line, then go over to the y-axis; that y-value is the predicted y.
        a is the y-intercept. b is the slope: if x increases by one unit, yhat increases by b units.
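Plugging x into the formula is a one-line calculation. A minimal Python sketch; the intercept and slope values (a = 100, b = 5) are made up for illustration and do not come from any textbook data set.

```python
def predict(a: float, b: float, x: float) -> float:
    """Return yhat = a + b*x for the regression line with intercept a and slope b."""
    return a + b * x

# Hypothetical line, for illustration only: a = 100, b = 5.
a, b = 100.0, 5.0
print(predict(a, b, 10.0))                          # the predicted y at x = 10
print(predict(a, b, 11.0) - predict(a, b, 10.0))    # one more unit of x adds b to yhat
```

The first line prints 150.0 and the second prints 5.0, which is the slope b, as the slope interpretation above says.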

Using SPSS to:
        DRAW line(s): in the Chart Editor (Chart/Options: Fit Line: Linear Regression).
        Calculate the formula and r:
            Statistics button: Regression coefficients: check Estimates & Model Fit (Descriptives is nice too).
            Scroll the output down to "Coefficients". In the B column, the (Constant) row gives a and the number under it gives b.
              The number in the Beta column is the correlation coefficient r.
        Regression coefficients are covered in SPSS manual 2.1, p. 62.
 Subgroups (handout)--defined by a categorical "grouping" variable.
        Graph:  Put the grouping variable into Set Markers By, as you make the scatterplot.
            Then in Chart/Options: Fit Line, check Subgroups.  You get all the lines.
        Calculate line formula and correlation coefficient:
            Analyze/Regression/Linear,  move the grouping variable into Selection Variable box.
            Then click Rule... Choose ONE subgroup and enter its EXACT value (e.g. F, not f).
                To do other subgroups, repeat.
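To see what the Selection Variable steps amount to, here is a rough Python analogue (not SPSS itself): keep only the rows whose grouping value matches exactly, fit the least-squares line to those rows, and repeat for each subgroup. The function names fit_line and fit_subgroup and the data rows are hypothetical, invented for illustration; the slope formula used is the standard least-squares one.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for yhat = a + b*x."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    return ybar - b * xbar, b

def fit_subgroup(rows, group_value):
    """Fit a line using only the rows whose grouping value equals group_value.

    The comparison is exact and case-sensitive, like the value typed into
    the Rule box: "F" selects the rows labeled F, "f" selects none.
    """
    chosen = [(x, y) for g, x, y in rows if g == group_value]
    xs = [x for x, _ in chosen]
    ys = [y for _, y in chosen]
    return fit_line(xs, ys)

# Hypothetical (sex, mass in kg, metabolic rate) rows, invented for illustration.
rows = [("F", 40, 1000), ("F", 50, 1120), ("F", 60, 1180),
        ("M", 50, 1300), ("M", 60, 1520), ("M", 70, 1680)]

for group in ("F", "M"):          # one fit per subgroup, as in "repeat"
    a, b = fit_subgroup(rows, group)
    print(f"{group}: yhat = {a:.1f} + {b:.1f} x")
```

With these made-up rows it should print yhat = 650.0 + 9.0 x for F and yhat = 360.0 + 19.0 x for M. Asking for "f" selects no rows, and statistics.mean then raises an error, which is the Python version of getting nothing back from a mistyped value.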

Regression formula: yhat = a + b x. Predicts or estimates a y-value for a given x-value.
We all get the same line from a batch of data because we use the "least-squares best fit" criterion (pp. 107-8): we'll investigate this more closely later.
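A small preview of the least-squares criterion, as a Python sketch with made-up data: a line is scored by the sum of squared vertical errors (y - yhat)², and the least-squares line is the one with the smallest score. The sketch computes that line with the standard formulas b = Σ(x - xbar)(y - ybar) / Σ(x - xbar)² and a = ybar - b·xbar, then checks that shifting or tilting it makes the score larger. The names sse and least_squares_line are hypothetical.

```python
from statistics import mean

def sse(a, b, xs, ys):
    """Least-squares criterion: sum of squared vertical errors (y - yhat)^2."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Intercept a and slope b of the line that makes sse() as small as possible."""
    xbar, ybar = mean(xs), mean(ys)
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    return ybar - b * xbar, b

# Made-up data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sse(a, b, xs, ys)
print(f"least-squares line: yhat = {a:.2f} + {b:.2f} x, sum of squared errors = {best:.2f}")

# Moving the line up or down, or tilting it, only makes the criterion worse.
for da, db in [(0.3, 0.0), (-0.3, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    print(f"a{da:+.1f}, b{db:+.1f}: larger error? {sse(a + da, b + db, xs, ys) > best}")
```

For these points the line is yhat = 2.20 + 0.60 x with a sum of squared errors of 2.40, and every nudged line reports True. Because the criterion is fixed, everyone who uses it on the same data gets the same line.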
Facts (pp. 112-14)

  1. The Regression line is trying to predict the "average y" for a given x (with the added requirement that it is a straight line).

  2. Unless the data lie perfectly on a straight line, the line for predicting weight from height, for example ("regressing weight on height"), will NOT be the same as the line for predicting height from weight ("regressing height on weight"). (In-class demonstration)
     
  3. A change of one standard deviation in x corresponds to a change of r standard deviations in y, along the regression line.  
     The slope b expresses change in y-units per x-unit. (Suppose x is in inches and y is in pounds. Then b is in pounds per inch.) You can find b by multiplying r by the standard deviation of the y's (in pounds) and dividing by the standard deviation of the x's (in inches).

  4. In symbols: b = r times (s.d. of y)/(s.d. of x)  (equation, p. 104).
           If we standardize both the x-values and the y-values, the slope is just r!
               That's why r (the Beta column) is next to b in SPSS.
     
  5. The regression line goes through the point given by the two means, (xbar, ybar).

  6. --If you know this, you know ybar = a + b (xbar). You can solve this for a: a = ybar - b (xbar). (Other equation, p. 104)
    --So knowing facts 4 and 5 gives you the equation of the line from the means, s.d.'s, and r.
    --And if you draw the two lines, y on x and x on y, they will intersect at (xbar, ybar)
     
  7. r² ("coefficient of determination") = proportion of the variability in the y-values explained (predicted) by knowing x and using the least-squares regression line. (Exactly what that means mathematically is hard. Just get used to it as a measurement.)

  8. If r = .7, about half (.49) of the variability in the y's is explained by using the regression line to predict y from x. (If weight and height have a correlation of .7, then about half of the variability in weight can be explained by knowing height.) In SPSS this is labeled R Square (scroll down).
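The facts above can be checked numerically. A Python sketch on made-up height/weight numbers (not textbook data): it builds each regression line only from the means, standard deviations, and r (facts 4 and 6), shows that weight-on-height and height-on-weight give different lines (fact 2), that each passes through the point of means (fact 5), and that the standardized slope equals r. It assumes the sample standard deviation (n - 1 in the denominator), which is what statistics.stdev computes; corr and line_from_summaries are hypothetical helper names.

```python
from statistics import mean, stdev

def corr(xs, ys):
    """Correlation coefficient r, using sample s.d.'s (n - 1 in the denominator)."""
    xbar, ybar, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum((x - xbar) / sx * (y - ybar) / sy for x, y in zip(xs, ys)) / (len(xs) - 1)

def line_from_summaries(xbar, ybar, sx, sy, r):
    """Fact 4: b = r * sy / sx.  Fact 6: a = ybar - b * xbar."""
    b = r * sy / sx
    a = ybar - b * xbar
    return a, b

# Made-up data: heights in inches, weights in pounds.
height = [60, 62, 65, 68, 70, 72]
weight = [115, 120, 140, 150, 165, 170]

r = corr(height, weight)
hbar, wbar = mean(height), mean(weight)
sh, sw = stdev(height), stdev(weight)

# Weight on height: x = height, y = weight, so b_w is in pounds per inch (fact 3).
a_w, b_w = line_from_summaries(hbar, wbar, sh, sw, r)
# Height on weight: x = weight, y = height. A different line unless |r| = 1 (fact 2).
a_h, b_h = line_from_summaries(wbar, hbar, sw, sh, r)

print(f"r = {r:.3f}, r^2 = {r * r:.3f}")   # fact 7: r^2 is the proportion explained
print(f"weight on height: yhat = {a_w:.2f} + {b_w:.3f} x")
print(f"height on weight: yhat = {a_h:.2f} + {b_h:.3f} x")

# Fact 5: each line passes through the point of means (up to rounding error).
print(abs(a_w + b_w * hbar - wbar) < 1e-9, abs(a_h + b_h * wbar - hbar) < 1e-9)

# Fact 4: after standardizing both variables, the least-squares slope is just r.
zx = [(h - hbar) / sh for h in height]
zy = [(w - wbar) / sw for w in weight]
slope_z = sum(u * v for u, v in zip(zx, zy)) / sum(u * u for u in zx)
print(f"standardized slope = {slope_z:.3f}")
```

This is the same calculator route as in exercise 2.30: means, s.d.'s, and r in; slope and intercept out.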
Practice fitting "least squares best fit" lines: on the text website, http://www.whfreeman.com/bps, scroll down to Select a Category. (Click the Netscape toolbars to minimize them, if needed.)
  Choose "Statistical Applets", Correlation/Regression. Check the "Show least-squares line" box and put in some data points. Check the "Show Mean X & Mean Y lines" box; see if fact 5 above holds. Repeat for a few data sets.
Try fitting the line yourself: put in some data points. Now click Draw Line. Click and drag in the picture and you'll get a line with 3 blobs. Drag the center and the line moves up and down; drag an end and the slope changes. Put the line in the best place for predicting y's from x's. If you do well by the "least squares" criterion, the green bar will shrink close to 0. Check the "Show Mean X & Mean Y lines" box; adjust your line. Check the "Show least-squares line" box and see how you did.
-------------------
Day 14: (Re)read 2.3 to p. 116. Read ahead: rest of 2.3 + 2.4. Next time: least-squares criterion + the rest of 2.3; start 2.4.
Hand in:  Everything needs SPSS unless otherwise noted!
Use SPSS to find the correlation coefficients. Hand in the scatterplots, and write the r's and other info on your printout.
p. 106, 2.28 speed, gas (real) 
p. 103, 2.23 calories (manual, sec. 0.10, tells how to delete. Save both data files.)
Below: Be sure to write down the Regression line equation! Raw SPSS output isn't enough.
Subgroups:
For the data of p. 103, 2.22 (metabolism), print out a graph with the regression line for all the people, and another with 2 separate lines (M and F: Fit Line: Subgroups). Use the "up and over" method of Fig. 2.10, p. 107, with a pencil and straightedge, to predict (graphically) the metabolic rate for
    a) a person of mass 45 kg.
    b) a female of mass 45 kg.
    c) a male of mass 45 kg.    (Write down your numerical answers, estimated from the graph scale.)
Now do:
 2.22 metabolism M/F (finding separate r's)  Also find the equations of the two (M, F) regression lines.

With "facts":
2.33 prof. swims: two lines, x->y and y->x. Also make both graphs, each with its regression line.
2.35  beavers (prop. explained.)
p. 111, 2.30 heating degree days,  checking formulas on p. 104. Use SPSS to get the formula in part a (again), and the mean, s.d., and correl. coeff. in part b.  Then use your calculator to calculate the slope and intercept.  Compare with SPSS's.  
p. 128, 2.47  Julie's grade (Not SPSS, just calculator) 
p. 129, 2.51 "regression"  (Not SPSS, just calculator)   Hint below*

A: Practice fitting lines: use the text website (as above) and try to fit at least 4 different data sets. Write down on your paper what you discovered (were your judgment errors consistent in any way? did you have any surprises?).

Read,
to discuss
Optional
*Hint:  ybar = 46.6 + .41xbar [why?].  Let c be the amount Octavio's final is predicted to exceed the mean.
Then (ybar +c) = 46.6 + .41(xbar + 10) [why?].  Use the two equations and solve for c.  If your algebra skills are not strong enough, don't get upset; this calculation is not central to the course.  Read the answer!
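The algebra in the hint works for any regression line. A general version, where a and b are the intercept and slope (46.6 and .41 in the hint) and d is how far x lies above xbar (10 in the hint):

```latex
\begin{aligned}
\bar{y} + c &= a + b\,(\bar{x} + d) && \text{(predicted } y \text{ at } x = \bar{x} + d\text{)}\\
\bar{y}     &= a + b\,\bar{x}       && \text{(the line passes through the means)}\\
c           &= b\,d                 && \text{(subtract the second line from the first)}
\end{aligned}
```

In words: being d units above the mean in x puts the prediction b times d units above the mean in y.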

Sievers home  Math151-Fall01/DayS14.htm  10 pm  10/01/01
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.