Math 151, Fall 2002, Monday, Sept. 30, Day 14. Hit reload to get most current version. Corrected after class.

Exams not finished yet.
HW Day 14: (Re)read 2.2. Read Moore 2.3 (Regression) through p. 114. Ahead: rest of 2.3.
 Hand in next class, correlation:  
I meant these to be due today but was not clear on the webpage.  If you didn't do them for today, do them now; no penalty.
A. Go to the text website, http://www.whfreeman.com/scc (see above), and play with the Correlation/Regression Applet.  Create a data set of around 10-15 points with r close to -.65.  Add the Mean X & Mean Y lines, and make a sketch of your result on your paper to hand in. (Or you can print it out like this: Hit the PrintScreen key while holding down the Alt key.  This puts the image of the active window on the Clipboard.  Open Word, do Edit>Paste.  Then you can print the Word document.)

Using SPSS to find the correlation coefficient:  Hand in the scatterplots; write the correlation values and other info on your printouts.
B. Use the file mortal_vs_educ.sav.  This is median education level and mortality rate for 60 American cities.  Make a scatterplot showing mortality on the y (vertical) axis vs. education on the x axis, with the two outliers (lower left) labeled with their cities.  Find r for the full data set, then delete the two outliers and find r again.  Write the two r's on your printed graph.
p. 106, 2.28 (SPSS) speed, gas (real)
p. 103, 2.23 (SPSS) calories.  To delete a case, click on the gray case number.  The whole row should show black (selected), except for the first column.  Then the Delete key deletes it. (Edit has an undo.) Save both data files, original and with deletions, to your disk.

Sec. 2.2 Correlation (no SPSS).
p. 102 2.18  thinking about correlation.
2.19 men two years older
2.20 r = 0, strong assoc. (By hand is fine.) Graph the data (speed on the x-axis). Draw a horizontal line at the mean of the y's (26.8 MPG) and a vertical line at the mean of the x's (40 mph).  For each data point, draw a dotted line from the point horizontally to the 40 mph line, and another line vertically to the 26.8 MPG line.  Use this picture to explain as best you can why the correlation is 0. (Think about each point's contribution to r, as in the lecture.)
p. 105 2.26  newspaper
p. 157 2.90 education/age

Read, to discuss 
Moore p. 99: Use the data of 2.17. You graphed this by hand for Sec. 2.1.  Guess what r is; look in the back of the book to see how close you got.
p. 106 2.29 blunders

C.  Many communities find a strong positive correlation between the amount of ice cream sold in a given month and the number of drownings that occur in that month.  Does this mean that ice cream causes drowning?  If not, can you think of an alternative explanation for the strong association?

D. Explain why one would expect to find a positive correlation between the number of fire engines that respond to a fire and the amount of damage done in the fire.  Does this mean that the damage would be less extensive if only fewer fire engines were dispatched?  Explain. 


Hand in next class, regression
Review of straight lines: 
p. 124, 2.39, 2.40. Most people did fine on lines on the pretest. If these are a problem, ask someone NOW! Any MathClinic assistant can help with these.  Also Just the Basics on reserve covers it.

A. Postpone this problem to next asst. Open the Excel file RegressionSlope (or in the folder RegressionDemos in ClassMaterial\Math151).  Change x-y values in the yellow boxes and watch the line change.  Change x-values in col. F and watch the "run" (red line) change. Notice the slope = the coefficient of x = the rise/run = increase in y per unit increase in x.  Fix it so the increase in x (the "run") is exactly 1.  Print the page to hand in.

B. Practice fitting lines:  Use the text website ("Do this" below) and try to fit at least 4 different data sets. Write down on your paper what you discovered (were your judgment errors consistent in any ways--did you have any surprises?)

Moore p. 111, 2.31 acid rain No data, therefore no SPSS (draw the line by hand)
C. Use the SPSS handout and graph  the regression line for govsal on avgpay (as shown), also the lines for the 4 separate groups (either on one graph or on panels.)

Moore p. 111, 2.32 (Manatees): Import the dataset into SPSS (Class Materials\Math151). In SPSS, print the plain graph, and one with the regression line. Draw the regression line BY HAND as best you can on the plain graph. Check with the other one. For part b, pencil in the new points on the graph with the printed line. Find the mean by hand (calculator)...
 p. 126, 2.44 p. 129, 2.48 Sarah grows.... Use SPSS for parts a and b, calculator for the rest.

D. For the data of Moore, p. 103, 2.22 (metabolism), print out a graph with the regression line for all the
 people, and another with 2 separate lines (M and F). Use the equations  to calculate the predicted
 metabolic rate for 
     a) a person of mass 45 kg. 
     b) a female of mass 45 kg. 
     c) a male of mass 45 kg. 
 Now use the "up and over" method of Fig. 2.10 p. 107, with a pencil and straightedge to mark the
 predicted values on the y-scale. Write down your computed answers next to them.  Make sure the two
 methods give consistent answers. 

Read, to discuss
Optional

A) ActivStats 9-2 and 9-3. Must know: what residuals are, what the least squares criterion is.  Formulas, and r^2 (facts 3 through 8 in the Facts list below) = 2nd Activity, p. 4. Exploring regression lines: ACT, p. 9-3, 1st and 2nd activities, and ACT, p. 9-4, 1st activity.  Or use the LeastSquares Tool from the menu bar.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  - - -
Homework questions? 2.13 corn plant density
Correlation: Day 12
--You won't have to calculate a correlation coefficient by hand. This formula is a bad one for hand computation (roundoff error); if you must do one by hand, find the computational formula in an old textbook.
--Eyeballing:  sketch xbar and ybar lines, see how much data is in + quadrants, how much in - quadrants.
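
The quadrant idea can be checked numerically. Here is a minimal Python sketch (not part of the SPSS work), assuming the usual formula for r as the sum of products of standardized x and y values, divided by n - 1. The data values are made up for illustration. Points in the + quadrants (upper right, lower left of the xbar and ybar lines) add to r; points in the - quadrants subtract.

```python
import math

# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Standard deviation with the n - 1 divisor."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def contributions(xs, ys):
    """Each point's term z_x * z_y / (n - 1); the terms add up to r."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    return [((x - xbar) / sx) * ((y - ybar) / sy) / (n - 1)
            for x, y in zip(xs, ys)]

terms = contributions(xs, ys)
r = sum(terms)

for x, y, t in zip(xs, ys, terms):
    quadrant = "+" if t > 0 else "-" if t < 0 else "on a mean line"
    print(f"({x}, {y}): contribution {t:+.3f} ({quadrant})")
print(f"r = {r:.3f}")
```

Most of these made-up points sit in + quadrants, so r comes out positive; the two points in - quadrants pull it down a little.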

Graphing Straight lines? p. 124, 2.39, 2.40

Regression line (Section 2.3): predicts or estimates a y (vertical) value for a given x (horizontal) value.
    Formula yhat= a + b x.
         To predict a y-value for a given x-value, plug the x value into the formula and calculate.
                To do it graphically, use the Up-and-Over method (Fig. 2.10, p.107):
                    Find the x, go straight up to the line, then go over to the y-axis; that y-value is the predicted y.
Start here Wednesday
        a is y-intercept. b  is slope:  If x increases one unit, yhat increases b units.
    Demo: RegressionSlope.xls (or in ClassMaterial\Math151\RegressionDemos)
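
The "plug in and calculate" step, and the meaning of the slope, can be sketched in a few lines of Python. The intercept and slope here are hypothetical numbers chosen only for illustration; they do not come from any data set in the text.

```python
def predict(a, b, x):
    """Predicted y (yhat) on the line yhat = a + b*x."""
    return a + b * x

# Hypothetical intercept and slope, for illustration only.
a, b = 10.0, 2.5

yhat_at_4 = predict(a, b, 4.0)
yhat_at_5 = predict(a, b, 5.0)

print("yhat at x = 4:", yhat_at_4)
print("yhat at x = 5:", yhat_at_5)
# Increasing x by one unit (a "run" of 1) changes yhat by exactly b.
print("change in yhat:", yhat_at_5 - yhat_at_4)
```

This is the same thing the Up-and-Over method does on the graph: start at x, go to the line, read off the y.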

We all get the same line from a batch of data because we use the "least-squares best fit" criterion (pp. 107-8): we'll investigate this more closely later.
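
As a preview of that criterion, here is a rough Python sketch, assuming "least squares" means choosing the line that makes the sum of squared vertical errors (observed y minus predicted y) as small as possible. The data and the guessed lines are made up for illustration.

```python
# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def sse(a, b):
    """Sum of squared errors (observed y minus predicted y) for yhat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line():
    """Intercept and slope of the least-squares line for xs, ys."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

best_a, best_b = least_squares_line()
print(f"least-squares line: yhat = {best_a:.2f} + {best_b:.2f} x, "
      f"SSE = {sse(best_a, best_b):.2f}")

# A few hand-picked guesses (intercept, slope); each has a larger SSE.
guesses = [(2.0, 0.7), (3.0, 0.4), (1.0, 1.0)]
for a, b in guesses:
    print(f"guess: yhat = {a:.2f} + {b:.2f} x, SSE = {sse(a, b):.2f}")
```

Any other straight line you try will also give a sum of squared errors at least as large as the least-squares line's, which is why everyone using this criterion ends up with the same line.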
Facts (Moore pp. 112-14)

  1. The Regression line is trying to predict the "average y" for a given x (with the added requirement that it is a straight line).

  2. Unless the data lies perfectly on a straight line, the line for predicting weight from height -- "regressing weight on height" -- (for example) will NOT be the same line as that for predicting height from weight -- "regressing height on weight".  (In-class demonstration.) (The picture on p. 113 is about this.)
     
  3. A change of one standard deviation in x corresponds to a change of r standard deviations in y, along the regression line.

  4.  The slope b expresses change in y-units per x-unit. (Suppose x is inches, y is pounds. Then b is in pounds per inch.) You can find b by multiplying r by the standard deviation of the y's (that's in pounds)  and dividing by the standard deviation of the x's (that's in inches)
    In "algebra", b = r times (s.d. of y)/(s.d. of x)  (Equation p. 104)
           If we standardize both the x-values and the y-values, the slope will just = r !
     
  5. The regression line goes through the point given by the two means, (xbar, ybar).

  6. --If you know this, you know ybar = a + b (xbar).  You can solve this for a: a = ybar - b (xbar). (Other equation, p. 104)
    --So facts 4 and 5 give you the equation of the line from the means, s.d.'s, and r.
    --And if you draw the two lines, y on x and x on y, they will intersect at (xbar, ybar).
     
  7. r^2 ("Coefficient of Determination") = Proportion of variability in y-values explained/predicted by knowing x and using the least squares regression line.  (Exactly what that means mathematically is hard.  Just get used to it as a measurement.)

  8. If r = .7, about half (.49) of the variability in the y's is explained by using the regression line relationship to predict y from x. (If weight and height have a correlation of .7, then about half of the variability in weight can be explained by knowing height.)
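
The slope and intercept formulas, the point of means, the two different regression lines, and r^2 can all be checked on a small example. This is a sketch in Python with made-up data, assuming s.d.'s computed with the n - 1 divisor and taking "proportion of variability explained" to mean 1 - (sum of squared errors about the line)/(sum of squared deviations of y about ybar).

```python
import math

# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

xbar = sum(xs) / n
ybar = sum(ys) / n
sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
r = sum(((x - xbar) / sx) * ((y - ybar) / sy)
        for x, y in zip(xs, ys)) / (n - 1)

# Slope and intercept from r, the s.d.'s, and the means.
b = r * sy / sx
a = ybar - b * xbar
print(f"y on x: yhat = {a:.3f} + {b:.3f} x")

# The line goes through (xbar, ybar).
print("line at xbar:", round(a + b * xbar, 6), " ybar:", ybar)

# Regressing x on y gives a different line (roles of x and y swapped).
b_xy = r * sx / sy            # change in predicted x per unit of y
a_xy = xbar - b_xy * ybar
slope_on_same_axes = 1 / b_xy  # slope of that line drawn on the usual x-y plot
print(f"x on y: xhat = {a_xy:.3f} + {b_xy:.3f} y "
      f"(slope {slope_on_same_axes:.3f} on the x-y plot, vs. {b:.3f})")

# r^2 as proportion of variability in y explained by the line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - ybar) ** 2 for y in ys)
explained = 1 - sse / sst
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, 1 - SSE/SST = {explained:.4f}")
```

With these numbers the two lines have visibly different slopes on the same plot but both pass through (xbar, ybar), and r^2 matches the proportion computed from the squared errors.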
Do this: Practice fitting "least squares best fit" lines:  Author's website, http://www.whfreeman.com/scc.  (Click Netscape toolbars to minimize them, if needed.)
  Choose "Statistical Applets", Correlation/Regression.  Check the "Show least-squares line" box and put in some data points.   Check the "Show Mean X & Mean Y lines" box; see if fact 5 above holds (the line goes through (xbar, ybar)).  Repeat for a few data sets.
--Try fitting the line yourself:  Put in some data points.  Now click Draw Line.  Click and drag in the picture and you'll get a line with 3 blobs. Drag the center and it will go up and down. Drag an end and the slope will change. Put the line in the best place for predicting y's from x's.  If you do well by the "least squares" criterion, the green bar up top will shrink close to 0.   Check the "Show Mean X & Mean Y lines" box; adjust your line.  Check the "Show least-squares line" box and see how you did.


Sievers home  Math151-Fall02/Day-14.htm  3:00pm 9/30/02
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.