Math 151, Fall 2004, Monday, Sept 27, Day 14 Hit reload ...

Exams not finished yet.  Solutions are available outside my door and on reserve.
-------------------------------------------------------------------------------
HW Day 14: 2.2 (correlation). You do not have to be able to calculate r by hand.  You should be able to guess roughly at an r for a swarm of data, as on p. 101, Fig. 2.9; know and be able to use facts 1 thru 7, p. 100; and find r using SPSS.  Start Moore 2.3, pp. 106-112, then onward in 2.3.
Hand in Wednesday:
A. Go to the text website, http://www.whfreeman.com/scc (or http://bcs.whfreeman.com/bps3e/), and play with the Correlation/Regression Applet.  Create a data set of around 10-15 points with r = -.65 (close to it).  Add the Mean X & Mean Y lines, and make a sketch of your result on your paper to hand in. (Or you can print it out like this: Hit the PrintScreen key while holding down the Alt key.  This puts the image of the active window on the Clipboard.  Open Word, do Edit>Paste.  Then you can print the Word document.)

 Using SPSS to find correl. coeff.  (Back page of Scatterplot handout: Analyze>Correlate>Bivariate)
Hand in the scatterplots, write the correlation values, other info on your printout.
B. Use the file educ-v-mortality.sav.  This is median education level and mortality rate for 60 American cities.  Make a scatterplot showing mortality on the y (vertical) axis vs. education on the x axis, with the two outliers (lower left) labeled with their cities.  Find r for the data with the outliers, then delete** the two outliers and find r again.  Write the two r's on your printed graph.
p. 106, 2.28 (SPSS) speed, gas (real)
 p. 103, 2.23 (SPSS) calories
**To delete a case, click on the gray case number.  The whole row should show black (selected), except for the first column.  Then the Delete key deletes it. (Edit has an undo.) Save both data files, original and with deletions, to your disk.

Sec. 2.2 Correlation (no SPSS ).
p. 102 2.18  thinking about correlation.
2.19 men two years older
2.20 r = 0, strong assoc. (By hand is fine.) Graph the data (speed on the x-axis). Draw a horizontal line at the mean of the y's (26.8 MPG) and a vertical line at the mean of the x's (40 mph).  For each data point, draw a dotted line from the point horizontally to the 40 mph line, and another line vertically to the 26.8 MPG line.  Use this picture to explain as best you can why the correlation is 0. (Think about each point's contribution to r, as in the lecture.)
p. 105 2.26  newspaper
p. 157 2.90 education/age

Read, to discuss 
Moore p. 99: Use the data of 2.17. You graphed this by hand for Sec. 2.1.  Guess what r is; look in the back of the book to see how close you got.
p. 106 2.29 blunders

C.  Many communities find a strong positive correlation between the amount of ice cream sold in a given month and the number of drownings that occur in that month.  Does this mean that ice cream causes drowning?  If not, can you think of an alternative explanation for the strong association?

D. Explain why one would expect to find a positive correlation between the number of fire engines that respond to a fire and the amount of damage done in the fire.  Does this mean that the damage would be less extensive if only fewer fire engines were dispatched?  Explain. 

Regression prep. 
Review of straight lines: 
p. 124, 2.39, 2.40. Optional, but be able to do. Most people did fine on lines on the pretest. If these are a problem, ask someone NOW! Any MathClinic assistant can help with these.  Also Just the Basics on reserve covers it.

A. Open the Excel file RegressionSlope (or in the folder RegressionDemos in ClassMaterial\Math151).  Change x-y values in the yellow boxes and watch the line change.  Change x-values in col. F and watch the "run" (red line) change. Notice the slope = the coefficient of x = the rise/run = increase in y per unit increase in x.  Fix it so the increase in x (the "run") is exactly 1.  Print the page to hand in.

B. Practice fitting lines:  Use the text website ("Do this" below) and try to fit at least 4 different data sets. Write down on your paper what you discovered (were your judgment errors consistent in any ways--did you have any surprises?) 

Moore p. 111, 2.31 acid rain.  No data, therefore no SPSS (draw the line by hand).

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  - - -
Hand in Later! Regression with SPSS (you can get your printouts now and answer the other questions later if you like).
 C. Use the SPSS Scatterplot handout and graph the regression line for govsal on avgpay (as shown, back page), and also the lines for the 4 separate groups (either on one graph or on panels). Print them out and keep them.  Start answering questions 6-11, on p. 3 of the handout.  Keep them till you can answer all the questions.

 Moore p. 111, 2.32 (Manatees), all parts. Import the dataset into SPSS (Class Materials\Math151). In
 SPSS, print the plain graph, and one with the regression line. Draw the regression line BY HAND as
 best you can on the plain graph. Check with the other one. For part b, pencil in the new points on the
 graph with the printed line. Find the mean by hand (calculator)...
  p. 126, 2.44; p. 129, 2.48 Sarah grows.... Use SPSS for parts a and b, calculator for the rest.

 D. For the data of Moore, p. 103, 2.22 (metabolism), (SPSS): Print out a graph with the regression line
 for all the people, and another with 2 separate lines (M and F). Use the equations  to calculate the
 predicted metabolic rate for 
      a) a person of mass 45 kg. 
      b) a female of mass 45 kg. 
      c) a male of mass 45 kg. 
  Now use the "up and over" method of Fig. 2.10 p. 107, with a pencil and straightedge to mark the 
  predicted values on the y-scale. Write down your computed answers next to them.  Make sure the
 two  methods give consistent answers. 

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  - - -
Homework questions? 2.13 corn plant density: "Curve" can predict/estimate a yield for a given planting density.

Correlation: Day 11
--You won't have to calculate a correlation coefficient by hand. This formula is a bad one for hand computation (roundoff error); if you must do one by hand, find the computational formula in an old textbook.
--Eyeballing:  sketch xbar and ybar lines, see how much data is in + quadrants, how much in - quadrants.
--Strength of correlation says NOTHING about causality!  Strong correlation could be:
         A causes B/  B causes A/ C causes both A and B/ just chance that they go together in this data set.
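
The eyeballing idea can be checked numerically. Here is a minimal sketch, assuming the usual textbook formula in which r is the sum of products of standardized x and y values divided by n - 1. The data set is hypothetical, chosen so that the + and - quadrants cancel, and the function name is made up for illustration.

```python
from statistics import mean, stdev

def correlation_contributions(xs, ys):
    """Return (r, per-point contributions); r is the sum of the contributions."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample s.d.'s (divide by n - 1)
    contribs = [((x - xbar) / sx) * ((y - ybar) / sy) / (n - 1)
                for x, y in zip(xs, ys)]
    return sum(contribs), contribs

# Hypothetical data: a strong curved pattern, symmetric about xbar.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 5, 4, 1]

r, contribs = correlation_contributions(xs, ys)
for x, y, c in zip(xs, ys, contribs):
    # Points in the upper-right or lower-left quadrant contribute +,
    # points in the upper-left or lower-right quadrant contribute -.
    sign = "+" if c > 0 else "-" if c < 0 else "0"
    print(f"({x}, {y}): contribution {c:+.3f}  [{sign} quadrant]")
print(f"r = {r:.3f}")
```

Points to the left of xbar and below ybar add to r, while their mirror images on the right subtract the same amount, so the total comes out at 0 even though the points follow a clear curve.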

Graphing Straight lines? p. 124, 2.39, 2.40

Regression line: Section 2.3, Predicts or estimates a y (vertical) value for a given x (horizontal) value: Straight line!
    Formula yhat = a + b x.
         To predict a y-value for a given x-value, plug the x value into the formula and calculate.
                To do it graphically, use the Up-and-Over method (Fig. 2.10, p.107):
                    Find the x, go straight up to the line, then go over to the y-axis; that y-value is the predicted y.

        a is y-intercept. b  is slope  (b multiplies x, the horizontal value):  If x increases one unit, yhat increases b units.
    Demo: RegressionSlope.xls (or in ClassMaterial\Math151\RegressionDemos)
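
Plugging into the formula looks like this in code. This is only a sketch: the intercept and slope values are hypothetical, not taken from any data set in the book, and the function name is made up.

```python
def predict(a, b, x):
    """Predicted y from the line yhat = a + b*x."""
    return a + b * x

a, b = 10.0, 2.5  # hypothetical y-intercept and slope

y_at_4 = predict(a, b, 4)
y_at_5 = predict(a, b, 5)

print("yhat at x = 4:", y_at_4)
print("yhat at x = 5:", y_at_5)
# Increasing x by one unit increases yhat by exactly b units.
print("change in yhat:", y_at_5 - y_at_4)
```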

We all get the same line from a batch of data because we use the "least-squares best fit" criterion (pp. 107-8): we'll investigate this more closely later.
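
To see what the criterion measures, here is a rough sketch that compares a guessed line with the least-squares line by adding up the squared vertical misses. The data and the guessed line are hypothetical, and the function names are made up for illustration.

```python
from statistics import mean

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to yhat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Return (a, b) for the line that minimizes the sum of squared errors."""
    xbar, ybar = mean(xs), mean(ys)
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = ybar - b * xbar
    return a, b

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 6]

a_guess, b_guess = 1.0, 1.0  # a line drawn "by eye"
a_best, b_best = least_squares_line(xs, ys)

sse_guess = sum_squared_errors(xs, ys, a_guess, b_guess)
sse_best = sum_squared_errors(xs, ys, a_best, b_best)
print(f"guess: yhat = {a_guess} + {b_guess} x, sum of squares = {sse_guess:.2f}")
print(f"best:  yhat = {a_best:.2f} + {b_best:.2f} x, sum of squares = {sse_best:.2f}")
```

No other straight line gives a smaller sum of squares for these points than the least-squares line, which is why everyone who uses the criterion ends up with the same line.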

Do this: Practice fitting "least squares best fit" lines:  Author's website, http://www.whfreeman.com/scc  (Click Netscape toolbars to minimize them, if needed.  If line drawing doesn't work, try the newer version at http://bcs.whfreeman.com/bps3e/ )
  Choose "Statistical Applets", Correlation/Regression.  Check the "Show least-squares line" box and put in some data points.   Check the "Show Mean X & Mean Y lines" box; see if fact 5 below holds (the line goes through the point of the two means).  Repeat for a few data sets.
--Try fitting the line yourself:  (Uncheck the "Show ..." boxes.) Put in some data points.  Now click Draw Line.  Click and drag in the picture and you'll get a line with 3 blobs. Drag the center and it will go up and down; drag an end and the slope will change. Put the line in the best place for predicting y's from x's.  If you do well by the "least squares" criterion, the green bar up top will shrink close to 0 (but in the newer version you have to be really good.  Dumb.)   Check the "Show Mean X & Mean Y lines" box; adjust your line.  Check the "Show least-squares line" box and see how you did.
 

Facts (Moore pp. 112-14)

  1. The Regression line is trying to predict the "average y" for a given x (with the added requirement that it is a straight line).

  2. Unless the data lie perfectly on a straight line, the line for predicting weight from height -- "regressing weight on height" -- (for example) will NOT be the same line as that for predicting height from weight -- "regressing height on weight".  (In-class demonstration.) (The picture on p. 113 is about this.)
     
  3. A change of one standard deviation in x corresponds to a change of r standard deviations in y, along the regression line.

  4.  The slope b expresses change in y-units per x-unit. (Suppose x is inches, y is pounds. Then b is in pounds per inch.) You can find b by multiplying r by the standard deviation of the y's (that's in pounds)  and dividing by the standard deviation of the x's (that's in inches)
    In "algebra", b = r times (s.d. of y)/(s.d. of x)  (Equation p. 109)
           If we standardize both the x-values and the y-values, the slope will just = r !
     
  5. The regression line goes through the point given by the two means, (xbar, ybar).

  6. --If you know this, you know ybar = a + b (xbar).  You can solve this for a: a = ybar - b (xbar). (Other equation, p. 109)
    --So knowing 4 and 5 gives you the equation of the line from the means, s.d.'s, and r.
    --And if you draw the two lines, y on x and x on y, they will intersect at (xbar, ybar)
     
  7. r^2 ("Coefficient of Determination") = Proportion of variability in y-values explained/predicted by knowing x and using the least squares regression line.  (Exactly what that means mathematically is hard.  Just get used to it as a measurement.)

  8. If r = .7, about half (.49) of the variability in the y's is explained by using the regression line relationship to predict y from x. (If weight and height have a correlation of .7, then about half of the variability in weight can be explained by knowing height.)
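
Several of these facts can be checked on a small data set. The sketch below uses hypothetical heights (inches) and weights (pounds), not data from the book, and all variable names are made up for illustration. It computes r, builds the line from the means, s.d.'s, and r, and then compares against direct calculations.

```python
from statistics import mean, stdev

# Hypothetical data: heights in inches (x) and weights in pounds (y).
xs = [60, 62, 65, 68, 70, 72]
ys = [115, 130, 128, 155, 160, 182]

n = len(xs)
xbar, ybar = mean(xs), mean(ys)
sx, sy = stdev(xs), stdev(ys)  # sample s.d.'s (divide by n - 1)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
r = sxy / ((n - 1) * sx * sy)

# Fact 4: slope from r and the s.d.'s, in pounds per inch.
b = r * sy / sx
# Direct least-squares slope, for comparison with fact 4.
b_direct = sxy / sum((x - xbar) ** 2 for x in xs)
# Facts 5 and 6: the line goes through (xbar, ybar), so a = ybar - b*xbar.
a = ybar - b * xbar

# Fact 3: moving one s.d. in x along the line changes yhat by r s.d.'s of y.
rise_per_sd = (a + b * (xbar + sx)) - (a + b * xbar)

# Fact 2: regressing x on y gives a different line. Drawn on the same axes
# (x horizontal, y vertical) its slope is sy / (r * sx), not r * sy / sx.
slope_x_on_y_same_axes = sy / (r * sx)

# Fact 7: r^2 as the proportion of variability in y explained by the line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - ybar) ** 2 for y in ys)
explained = 1 - sse / sst

print(f"r = {r:.3f}, line: yhat = {a:.2f} + {b:.3f} x")
print(f"slope from fact 4 = {b:.4f}, direct least-squares slope = {b_direct:.4f}")
print(f"rise per s.d. of x = {rise_per_sd:.3f}, r * sy = {r * sy:.3f}")
print(f"slope of x-on-y line on the same axes = {slope_x_on_y_same_axes:.3f}")
print(f"r^2 = {r ** 2:.4f}, explained proportion = {explained:.4f}")
```

The two slopes on the same axes agree only when r is exactly 1 or -1, which matches fact 2.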

Sievers home  Math151-Fall04/Dayf14.htm  11:30pm 9/26/04 
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.