MATH 251, P&S I, Fall 2005, Sept. 14, Day 9

Day 9, Wednesday, Sept. 14
 Read Correlation and Regression, Sections 2.2 and 2.3.
Memorize the formulas for r (p. 124) and for the slope b (p. 137).
SPSS Scatterplot handout: Correlation, top of p. 4; Linear Regression, rest of p. 4.
 
Hand in: 
Correlation, 2.2, p. 127ff. (SPSS for all):
2.20 dates' heights.
2.32 speed/fuel again,
and 2.40 a, b (Transform/Compute will make your new variables; SPSS Intro handout, p. 8).
2.33 brand and mileage--outlier.
2.28 bio vs. physics. Do 2.10 also. To get separate correlations for the two icicle groups, you need to select each subgroup (see Scatterplot handout, top of p. 4; SPSS Intro handout, bottom of p. 5).
Governors' Salaries HW: add #6 to #1 through #5; keep it.

Regression, 2.3, on material through about p. 143. HW p. 145ff.
p. 146, 2.37 IQ/reading: NO SPSS, just graph a straight line.
2.43, 2.44 river, perch (NO SPSS): rate, prediction, intercept.
Governors' Salaries HW: add #7, #8, #9, #11; keep it.
2.49 icicles again (SPSS).
2.46 pipe defects (SPSS).

Do, but keep your results for the next assignment:
2.42 a, b (c next time) basketball NO SPSS
2.47a, b (c next time) social distress (SPSS)
Read and discuss:
Correlation, 2.2, p. 127ff.
2.22a perch. Look at the bottom of the assignment table for the actual r.
 
2.31, 2.34 (applet on CD or website). Use the mean-x and mean-y lines to help "see" r.
2.35 (Marriage ages)
2.37 Teach/research
2.38 blunders
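Items 2.31 and 2.34 suggest using the mean-x and mean-y lines to "see" r. Below is a minimal Python sketch of that idea, using made-up height/weight numbers (hypothetical, not textbook data). The two mean lines cut the plot into four regions: points in the upper-right and lower-left regions have a positive product (x - mean x)(y - mean y) and pull r up; points in the other two regions have a negative product and pull r down.

```python
from statistics import mean

def quadrant_counts(xs, ys):
    """Count points in each region formed by the lines x = mean x and y = mean y."""
    x_bar, y_bar = mean(xs), mean(ys)
    counts = {"upper right": 0, "lower left": 0,
              "upper left": 0, "lower right": 0, "on a mean line": 0}
    for x, y in zip(xs, ys):
        dx, dy = x - x_bar, y - y_bar
        if dx == 0 or dy == 0:
            counts["on a mean line"] += 1   # product dx*dy is 0: no pull either way
        elif dx > 0 and dy > 0:
            counts["upper right"] += 1      # product positive: pulls r up
        elif dx < 0 and dy < 0:
            counts["lower left"] += 1       # product positive: pulls r up
        elif dx < 0:
            counts["upper left"] += 1       # dx < 0, dy > 0: product negative
        else:
            counts["lower right"] += 1      # dx > 0, dy < 0: product negative
    return counts

# Hypothetical data (made up for illustration, not from the textbook)
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 140, 150, 165, 170]
print(quadrant_counts(heights_in, weights_lb))
# {'upper right': 3, 'lower left': 3, 'upper left': 0, 'lower right': 0, 'on a mean line': 0}
```

All six points fall in the "positive product" regions, which is what a strong positive r looks like.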


Regression 2.3
p. 168, 2.77 (applet). Also, add the mean-x and mean-y lines after you have experimented.

Optional 
  More r practice
p. 128 2.23
 
 


Play with RegressionSlope (or in the folder RegressionDemosExcel in ClassMaterial\Math251). 

2.22b: r = .6821.
HW questions? Check with your neighbor first.
--LaReina suggests we review the normal distribution: P(X > x).
--Also, that pregnancy lasting 310 days: "Dear Reader: The average gestation period is 266 days. Some babies come early. Others come late. Yours was late. The question here is not whether the baby was late. That fact is already known. At issue is the credibility of the length of the delay. Ten months and five days is approximately 310 days, which means that the pregnancy exceeded the norm by 44 days." [How unusual is that?] --What proportion of pregnancies last 310 days or more? Using a standard deviation of 16 days, z = (310 - 266)/16 = 44/16 = 2.75. Area above 2.75 = .0030.
        About 3 in a thousand pregnancies last that long. Pretty rare. Is "San Diego Reader" one of the 3 in a thousand, or is she lying? (This is the kind of question we deal with in Significance Testing, part 3 of the course.)*
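The table lookup above can be checked with a few lines of Python, as a sketch. The only inputs are the numbers from the calculation (mean 266 days, standard deviation 16 days, cutoff 310 days); the helper name upper_tail_area is made up here, and it gets P(Z > z) from the standard library's complementary error function instead of a printed table.

```python
import math

def upper_tail_area(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mean_days = 266   # average gestation period
sd_days = 16      # standard deviation used in the z calculation
x = 310           # length of the pregnancy in question

z = (x - mean_days) / sd_days
p = upper_tail_area(z)
print(z)            # 2.75
print(round(p, 4))  # 0.003, i.e. about 3 in a thousand
```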
- - - - - - - - - - - - - -

Section 2.2
The correlation coefficient r is a numerical measure of how strong a linear relationship is, and of its direction. It does not substitute for a scatterplot.

  1. Treats the two variables symmetrically: r is the same whichever variable is on the x-axis.
  2. "Correlation" applies only to two quantitative variables.
  3. "Unitless": the original measurement units are "standardized out."
  4. The sign of the correlation coefficient matches the direction of the relationship.
  5. Always between -1 and +1. r = 0: no linear relationship; r = +1 or -1: the points lie on a perfect straight line.
  6. Does NOT give information about curved relationships.
  7. NOT resistant to outliers--quite sensitive to them.
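Several of these properties can be checked numerically. The sketch below assumes the usual textbook definition, r = (1/(n-1)) times the sum of the products of the standardized x and y values (check it against the formula on p. 124), and uses made-up height/weight data. It illustrates properties 1, 3, 4, and 5.

```python
import math

def correlation(xs, ys):
    """r = (1/(n-1)) * sum of (standardized x) * (standardized y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 values each")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data (made up for illustration, not from the textbook)
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 140, 150, 165, 170]

r = correlation(heights_in, weights_lb)
print(round(r, 4))

# Property 1: swapping the axes gives the same r
print(round(correlation(weights_lb, heights_in), 4))
# Property 3: unitless, so heights in centimeters give the same r
heights_cm = [2.54 * h for h in heights_in]
print(round(correlation(heights_cm, weights_lb), 4))
# Property 4: reversing the direction of y flips the sign of r
print(round(correlation(heights_in, [-w for w in weights_lb]), 4))
# Property 5: points exactly on a line give r = 1 (up to floating-point rounding)
print(correlation([1, 2, 3, 4], [3, 5, 7, 9]))
```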

Observe some correlations with the applet at http://www.whfreeman.com/scc or http://www.whfreeman.com/ips.
Regression line (Section 2.3): a straight line that predicts or estimates a y (vertical) value for a given x (horizontal) value.
    Formula: yhat = a + bx.
         To predict a y-value for a given x-value, plug the x-value into the formula and calculate.
                To do it graphically, use the Up-and-Over method (Fig. 2.12, p. 134):
                    Find the x, go straight up to the line, then go over to the y-axis; that y-value is the predicted y.

        a is the y-intercept. b is the slope (b multiplies x, the horizontal value): if x increases by one unit, yhat increases by b units.
    RegressionSlope.xls or in ClassMaterial\Math251\RegressionDemosExcel
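Here is a minimal sketch of the plug-in method and of what the slope means. The line itself is hypothetical (a = -200 and b = 5, made up for illustration, as if predicting weight in pounds from height in inches), and predict is a helper name introduced here.

```python
def predict(a, b, x):
    """Predicted y from the line yhat = a + b*x."""
    return a + b * x

# Hypothetical line (made up for illustration, not from the textbook)
a = -200.0  # y-intercept
b = 5.0     # slope: predicted pounds per extra inch of height

print(predict(a, b, 66))                      # 130.0
print(predict(a, b, 67) - predict(a, b, 66))  # 5.0: one more unit of x adds b to yhat
```

The intercept here is just where the line would cross x = 0, far outside any realistic heights, so it has no practical meaning on its own.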

We all get the same line from a batch of data because we use the "least-squares best fit" criterion (pp. 135-6): we'll investigate this more closely later.
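Because the criterion is fixed, the line comes out of formulas, so everyone gets the same answer from the same data. The sketch below assumes the standard least-squares formulas b = r * s_y / s_x and a = mean y - b * mean x (compare them with the slope formula on p. 137) and uses made-up data; least_squares_line is a helper name introduced here.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b) for yhat = a + b*x, using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    r = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data (made up for illustration, not from the textbook)
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 140, 150, 165, 170]

a, b = least_squares_line(heights_in, weights_lb)
print(f"yhat = {a:.2f} + {b:.2f} x")

# The least-squares line always passes through the point (mean x, mean y)
print(abs((a + b * mean(heights_in)) - mean(weights_lb)) < 1e-9)  # True
```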

**The regression line tries to predict the "average y" for a given x (with the added requirement that it be a straight line).
--For a particular (xo, yo) pair in the data, yo is the observed y. The y-value you get by plugging xo into the regression line formula is the predicted y. The error = observed y - predicted y (positive if the observed y is above the line); see IPS p. 135.
--The line is chosen to minimize the sum of the squares of these vertical errors. Practice fitting "least-squares best fit" lines with the applet at http://www.whfreeman.com/scc or http://www.whfreeman.com/ips.
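A small experiment in the spirit of the applet, sketched in Python with made-up data: compute each error (observed y minus predicted y), add up their squares, then nudge the intercept or slope and see the total get bigger. The line is computed with the same assumed formulas b = r * s_y / s_x and a = mean y - b * mean x; all helper names are introduced here for illustration.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Intercept a and slope b of the least-squares line (b = r*s_y/s_x, a = y_bar - b*x_bar)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (len(xs) - 1)
    b = r * s_y / s_x
    return y_bar - b * x_bar, b

def residuals(xs, ys, a, b):
    """Error for each point: observed y minus predicted y (positive above the line)."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def sum_of_squared_errors(xs, ys, a, b):
    return sum(e ** 2 for e in residuals(xs, ys, a, b))

# Hypothetical data (made up for illustration, not from the textbook)
xs = [60, 62, 65, 68, 70, 72]
ys = [115, 120, 140, 150, 165, 170]

a, b = least_squares_line(xs, ys)
best = sum_of_squared_errors(xs, ys, a, b)
print(round(a, 2), round(b, 2), round(best, 2))

# Nudging the intercept or the slope makes the total squared error larger
for da, db in [(1, 0), (-1, 0), (0, 0.1), (0, -0.1)]:
    print(da, db, sum_of_squared_errors(xs, ys, a + da, b + db) > best)  # True each time
```

A side effect worth noticing: for the least-squares line the errors above and below the line cancel, so they add up to zero (apart from rounding).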

--Unless the data lie perfectly on a straight line, the line for predicting weight from height ("regressing weight on height," for example) will NOT be the same line as the one for predicting height from weight ("regressing height on weight"). (In-class demonstration; the picture on p. 140 is about this.)
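A numerical version of the in-class demonstration, as a sketch with made-up data. Regressing y on x gives slope r * s_y / s_x. Regressing x on y gives xhat = c + d*y with d = r * s_x / s_y; solving that line for y puts it on the same (x horizontal, y vertical) picture with slope 1/d = s_y / (r * s_x). The two slopes agree only when r is +1 or -1 (their ratio is r squared). The sketch assumes r is not 0, since the second slope divides by r.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (len(xs) - 1)

def slopes_on_y_vs_x_plot(xs, ys):
    """Slopes, drawn on the same (x horizontal, y vertical) plot, of
    1) the line regressing y on x:              r * s_y / s_x
    2) the line regressing x on y, solved for y: s_y / (r * s_x)   (assumes r != 0)"""
    r = correlation(xs, ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return r * s_y / s_x, s_y / (r * s_x)

# Hypothetical data (made up for illustration, not from the textbook)
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 140, 150, 165, 170]
print(slopes_on_y_vs_x_plot(heights_in, weights_lb))       # two different slopes

# Points exactly on a line: both regressions give the same line
print(slopes_on_y_vs_x_plot([1, 2, 3, 4], [3, 5, 7, 9]))   # both slopes are 2 (up to rounding)
```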

Math251-Fall05/Dayps9.htm      11pm    9/13/05
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.
  *Bear in mind that there were around 400,000 births in California in 1970. (I'm guesstimating: there were 605,694 births in 1990, and the population of California in 1970 was about 2/3 of that in 1990.) So a 3-in-a-thousand event would occur in about 3 x 400 = 1,200 births: there would be about 1,200 women in San Diego Reader's position (many of whom wouldn't know it). Rare events DO happen; it's not really fair to notice and question them only AFTER the fact.
Note--pregnancy in 1970 usually didn't involve the level of medical intervention (ultrasound, inducement of labor, etc.) it often gets now.