Math 151, Sp. '08, Day 15, Fri. Feb. 29. Hit reload.

HW Day 15. (Re)read Ch. 4, new pp. 99-105 (correlation). Check 4.14 through 4.20. You do not have to be able to calculate r by hand. You should be able to guess roughly at an r for a swarm of data, as on p. 102, eg. 4.6, and know and be able to use facts 1-4, p. 101, and cautions 1-4, p. 103. Ch. 5, Regression, through p. 125. Check p. 137: 5.14 through 5.20, basic line and regression line facts and tools; 5.21, r and slope; 5.22 is harder (changing units), so don't worry about it; 5.23, if you sketch the graph and draw a line through the points, you should be able to guesstimate the slope well enough to choose among the 3 answers. Ahead: continuing regression, pp. 126-137.
Hand in Mon.

Correlation (thinking):
p. 112, 4.36 and 4.37 Applet explorations
p. 112, 4.34 and 4.35 correlation meaning

4.26 date heights again  You graphed this by hand.  r = .5653. Now answer the questions in the text.

p. 109 4.25 b  running records again.  It's a little complicated in SPSS to get the r's for the separate groups, so get them by looking at the answers in the back of the book.  Answer the question.

A.  If women always married men who were exactly  two years older than themselves, what would be the correlation between the ages of husband and wife? (Hint: make  a data table and the corresponding scatterplot for 4 or 5 couples with different x's, and look at it.)

Correlation (computing & thinking)
SPSS Scatterplot Handout:
Do problem 6, p. 3.  Keep this with the previous work.

p. 104, 4.11 (SPSS) gas, speed: association but 0 correlation.  Find the means and draw the mean lines on your graph (by hand) to help explain the 0 correlation.

p. 104, 4.10 (SPSS) bird colonies again.  To add a data pair in SPSS just type them in a new row at the bottom.  To delete, click on the case number, which highlights the whole row, hit delete.

(This problem looks forward to Ch. 5, sort of.)
p. 110, 4.28 corn plant density (SPSS). Notice how the data is entered for SPSS: not as displayed here, but with the first column giving Plants per acre and the second giving Yield. Make a scatterplot. Use your calculator to find the mean yields, and write these on your paper. (Or you can find means for the separate groups in SPSS: in Explore, move Plants to the Factor list.) Graph the means by hand with a pencil on your printed plot, and connect the means dots.
- - - - - - - - - - - - - - - - - - - -
Regression (Ch. 5):  ..
B. Use the SPSS Scatterplot handout and graph the regression line for govsal on avgpay (as shown, p. 2), and also the lines for the 4 separate groups (either on one graph or on panels). Print them out and keep them. Find the formula for the regression line (p. 4 of handout). Answer questions 7-9 and 11 on the Governors' Salaries handout. Keep the handout answers till you can answer all 12 questions.

p. 118, 5.1  IQ and reading scores. Graph, slope, predict.  Notice we don't have a scatterplot of the data, only this straight-line summary.
p. 139, 5.24 Penguins diving  Again, we don't have a scatterplot, only the summary.

p. 122, 5.4 (SPSS) Sparrowhawk colonies  Use SPSS to make the scatterplot, with the line, and find r.  Do (c),  and compute (d) by hand.    Now use the "up and over" method of Fig. 5.1, p.116, with a pencil and straightedge to mark the predicted value from (d) on the y-scale. Write down your computed answer next to it.  Make sure the two  methods give consistent answers.

Some more with SPSS--as long as you're at the computer, get r's, the graphs and lines, line formulas:  keep for next assignment
p. 140, 5.26 (SPSS) sisters & brothers
p. 146, 5.42 (SPSS) A computer circle game 
The last part of the last question, "Give numerical measures that describe the success of the two regressions,"  is asking for you to use Fact 4.

Read to discuss

Correlation:
p. 112, 4.33  Do a rough sketch for yourself.

Look at all the graphs you make, and guesstimate the correlation coefficient (before you read or calculate it.)

Regression: 
Use
http://www.whfreeman.com/bps4e
Correlation and Regression applet  to do p. 148, 5.55 , guessing lines


Look at this especially, with reference to the rise of r standard deviations in y for every 1 standard deviation in x: A. Open the Excel file RegressionSlope (or find it in the folder RegressionDemosExcelBPS4e in ClassMaterial\Math151-BPS4e). Change x-y values in the yellow boxes and watch the line change. Change x-values in col. F and watch the "run" (red line) change, in the rightmost 2 graphs. Notice the slope = the coefficient of x = the rise/run = increase in y per unit increase in x. Fix it so the increase in x (the "run") is exactly 1. Also, look at the leftmost graph, where the lengths of the standard deviations are shown, and note that in standard-deviation units, the rise is r s.d.'s in y for each s.d. run in x. (Fact 2)

Optional 
Do now (for ch. 5 ) if you need the practice:
Straight line graphing practice:
A.  y = -10 + 3x, graph for 2<x<10.
B.  y = 500 - 20x, graph for 0<x<10.
 


Correlation: Use http://www.whfreeman.com/bps4e (see below for details) to make different scatterplot patterns, and observe their r's.

4.28, I said to draw the line by hand.
SPSS can plot the line connecting means on your graph: In the Chart Editor, do Elements>Interpolation Line. If it doesn't look right, in the Properties window, Interpolation Line tab, choose Line Type: Straight.


Exam 2 a week from today: Day 18 (March 7). Let me know Right Away if you need to take the exam early (Wed. or Th.). Starts with Ch. 3: Densities, Normal distribution, tables. Through Ch. 4, and what we cover of Ch. 5 through Monday. Sample exam (handout), solutions (link) available Today. One sheet of notes; I will give you paper copies of the Normal table.

Relationships:
  (BPS4e, Ch. 4) Day 14
Timeplots are scatterplots where the x axis shows time. (Time is often a lurking variable: plot data against the order in which the observations were taken.)

Handout on SPSS Scatterplots etc. (link), showing subgroups and labeling individual points.
govsal_vs_pay.sav  is the file used for most of the handout. (In SPSS for Class BPS folder)
Homework questions?   Day 14


Correlation:  The (Pearson) correlation coefficient r is a numerical measure for how strongly linear (and in what direction) the relationship is.  Doesn't substitute  for a scatterplot.
Use if data is:  2 quantitative variables, & "nice":
    One cluster/cloud/band.
   Pretty straight.
   Outlier(s)? Do with/without & be cautious.

Correlation experiments: Website, http://www.whfreeman.com/bps4e, "Statistical Applets", Correlation/Regression. Play with data points, observing the Correlation Coefficient. Check the "Show Mean X & Mean Y lines" box. See how much of the data is in each quadrant. Compare with the correlation coefficient.

Using SPSS (p.4, Scatterplot handout) Analyze>Correlate>Bivariate

Properties (p. 101):

  1. Measures relationship--same whichever variable is on the x-axis
  2. "Unitless"--original measurement units (cm., inches) are "standardized out"
  3. Sign of correlation coefficient matches direction of relationship. + positive, - negative.
  4. Between -1 and +1.   0: no linear relationship,   +1 or -1: perfect straight line.

Cautions (p. 103):

  1. Between two quantitative variables only!
  2. Does NOT give info about curved relationships (only measures linear part of relationship).
  3. NOT resistant to outliers--quite sensitive.
  4. Not a complete summary, even for nice linear data.  Need means, s.d.'s too.
(Figure: correlation graph)
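A small sketch for checking properties 1 and 2 and caution 2 numerically. The heights are made-up numbers (not from the text), and `pearson_r` is a helper written here for illustration; it uses the standardized-values form of r with an n - 1 divisor.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation as the sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - xbar) / sx) * ((y - ybar) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical heights in inches for five couples (made-up data).
women_in = [64, 65, 66, 68, 70]
men_in = [68, 70, 69, 72, 73]

r_xy = pearson_r(women_in, men_in)
r_yx = pearson_r(men_in, women_in)          # property 1: swap x and y
women_cm = [2.54 * h for h in women_in]
r_cm = pearson_r(women_cm, men_in)          # property 2: change inches to cm

# Caution 2: a perfectly curved (U-shaped) pattern can have r = 0.
xs = [-2, -1, 0, 1, 2]
r_curve = pearson_r(xs, [x * x for x in xs])

print(round(r_xy, 4), round(r_yx, 4), round(r_cm, 4), round(r_curve, 4))
```

The first three printed values should agree, since neither swapping the variables nor changing units changes r; the last should be essentially zero even though y is completely determined by x.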


--You won't have to calculate a correlation coefficient by hand. This formula is a bad one for hand computation (roundoff error); if you must do one by hand, find the computational formula in an old textbook.
--Eyeballing:  sketch xbar and ybar lines, see how much data is in + quadrants, how much in - quadrants.
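The eyeballing idea can be sketched as a count: draw the xbar and ybar lines, then tally points in the "+" quadrants (upper right, lower left) against the "-" quadrants (upper left, lower right). The data below are made up for illustration, and `quadrant_counts` is a hypothetical helper name.

```python
from statistics import mean

def quadrant_counts(xs, ys):
    """Count points in the + quadrants and the - quadrants around (xbar, ybar)."""
    xbar, ybar = mean(xs), mean(ys)
    plus = minus = 0
    for x, y in zip(xs, ys):
        product = (x - xbar) * (y - ybar)
        if product > 0:
            plus += 1      # above both means or below both means
        elif product < 0:
            minus += 1     # above one mean, below the other
    return plus, minus

# Made-up data with a generally upward trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
plus, minus = quadrant_counts(xs, ys)
print(plus, minus)   # more points in + quadrants suggests a positive r
```

This is only a rough guide: r also depends on how far each point sits from the mean lines, not just on which quadrant it falls in.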

Strength of correlation says NOTHING about causality!  Strong correlation could be:
     A causes B/   B causes A/  C causes both A and B (lurking C)/  just Chance that they go together in this data set.

= = = = = = = = = = = = = = = = = = =
Regression line:
Ch. 5,
Predicts or estimates a y (vertical) value for a given x (horizontal) value: Straight line!
     "Regressing y ON x" .
(Graphing a straight line:  pick an x-value at one end of the useful range.  Plug in to the formula and calculate the corresponding y.  Graph the (x,y) pair.  Repeat with an x value at the other end of the range.  Connect the 2 dots with a line (see pretest).  Insurance:  Pick a third x and calculate the y.  This point must also lie on the line, if you did it right.)
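The two-points-plus-insurance recipe can be sketched in a few lines, here using optional practice line A from this page (y = -10 + 3x for 2 < x < 10). The choice of x = 6 as the third, "insurance" point is arbitrary.

```python
def line(a, b):
    """Return the function y = a + b*x for intercept a and slope b."""
    return lambda x: a + b * x

f = line(-10, 3)                 # practice line A: y = -10 + 3x

x_lo, x_hi, x_check = 2, 10, 6   # two ends of the range, plus a check point
p1 = (x_lo, f(x_lo))
p2 = (x_hi, f(x_hi))
p3 = (x_check, f(x_check))

# Insurance: the third point is on the line through the first two
# if the slope from p1 to p3 equals the slope from p1 to p2.
slope_12 = (p2[1] - p1[1]) / (p2[0] - p1[0])
slope_13 = (p3[1] - p1[1]) / (p3[0] - p1[0])

print(p1, p2, p3, slope_12 == slope_13)
```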

Experimenting  http://www.whfreeman.com/bps4e,  Correlation and Regression Applet.
SPSS--graph line, p. 2 top
     
Govsal on avgpay

Formula yhat = a + b x.  (yhat means we're finding a sort of average y for each particular x).
  Govsal = a + b avgpay  

SPSS-- formula p. 4. Read off "coefficients" (intercept and slope) from table.
a is y-intercept.
b is slope:  If x increases one unit, yhat increases b units.   (b multiplies the x-variable.)
Govsal = 28,569.69 + 2.709*avgpay 
   yhat =  28,569.69 + 2.709* x
     To predict or estimate a y-value for a given x-value, plug the x value into the formula and calculate.
                To do it graphically, use the Up-and-Over method (Fig. 5.1, p.116):
                    Find the x, go straight up to the line, then go over to the y-axis; that y-value is the predicted y.

         Calculating:  Montana (17,895, 55,502)   y = 28,569.69 + 2.709*x
           Predicted y = 28,569.69 + 2.709*17,895 = 28,569.69 + 48,477.56 = 77,047.25  (higher than actual)
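The Montana calculation can be repeated by machine as a check. This sketch uses the coefficients and the Montana data pair given above; the function name `predict_govsal` is made up for illustration.

```python
A = 28569.69   # intercept a, read from the SPSS coefficients table
B = 2.709      # slope b

def predict_govsal(avgpay):
    """Predicted governor's salary (yhat) for a given average pay (x)."""
    return A + B * avgpay

montana_avgpay, montana_govsal = 17895, 55502
predicted = predict_govsal(montana_avgpay)
difference = montana_govsal - predicted   # actual minus predicted

print(f"predicted = {predicted:,.2f}, actual - predicted = {difference:,.2f}")
```

A negative difference means the line predicts higher than the actual salary, which matches the note above.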

 a is y-intercept. b is slope:  If x increases one unit, yhat increases b units.   (b multiplies the x-variable.)
  If you know that yhat increases 12 units for every one that x increases, you know that the slope of the line b = 12.
            Governors' salaries increase (on the average across the states) $2.71 for every increase of $1 of average pay.
     This is a summary  of the linear relationship, in the same way that the mean of a distribution is one summary of the distribution.  Particular states won't match this exactly.

 (In a straight-line relationship, the amount that y increases for one unit increase in x is the same no matter what value of x you start with)  RegressionSlope.xls or in ClassMaterial\Math151-BPS4e \RegressionDemos Excel BPS4e
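A quick numerical check of that statement, using the govsal line from above. The starting x-values are arbitrary choices for illustration.

```python
A, B = 28569.69, 2.709   # intercept and slope of the govsal-on-avgpay line

def yhat(x):
    return A + B * x

# Increase x by one unit from several different starting values.
starts = [15000, 20000, 25000, 30000]
increases = [yhat(x + 1) - yhat(x) for x in starts]

for x, inc in zip(starts, increases):
    print(x, round(inc, 3))   # each increase matches the slope b
```

Whatever the starting value, the increase in yhat comes out equal to the slope (up to floating-point rounding).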

We all get the same line from a batch of data because we use the "least-squares best fit" criterion (p. 119): we'll investigate this more closely later. At first, let SPSS or the text find the line for us; then we'll learn a way to calculate it from the data.
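As a preview, here is a minimal sketch of one standard way to get the line from data, assuming the usual textbook formulas: slope b = r * s_y / s_x (the "r s.d.'s in y per s.d. in x" idea of Fact 2) and intercept a = ybar - b * xbar. The data are made up, and the helper names are hypothetical; this is not the SPSS procedure.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation as the sum of products of standardized values, divided by n - 1."""
    xbar, ybar, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum(((x - xbar) / sx) * ((y - ybar) / sy)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

def least_squares_line(xs, ys):
    """Intercept a and slope b, assuming b = r*sy/sx and a = ybar - b*xbar."""
    b = pearson_r(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

def sum_sq_errors(a, b, xs, ys):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
best = sum_sq_errors(a, b, xs, ys)

# Nudging the line in any direction should make the squared error larger.
nudged = [sum_sq_errors(a + da, b + db, xs, ys)
          for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]]
print(round(a, 3), round(b, 3), all(best < n for n in nudged))
```

The comparison with the nudged lines illustrates the "least-squares best fit" criterion: no nearby line has a smaller sum of squared vertical distances.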

Sievers home  Math151-Sp08/Days15.htm  8pm 2/28/08
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.