MATH 251, Probability and Statistics I, Fall 2005, Sept. 19, Day 11 (hit reload)

In honor of Constitution Day, September 17, and in partial fulfillment of the Federal Government's requirement that all schools and colleges receiving any Federal aid  teach the Constitution on Constitution Day:   My favorite amendment, Amendment 9 of the Bill of Rights:
 The enumeration in the Constitution, of certain rights, shall not be construed to deny or disparage others retained by the people. 
Look it up.

Day 11, Monday Sept 19, finishing text 2.3, 2.4.
Next:  Proceed onward through ch. 2: 2.5 next, then return to transforming relationships (pp. 143-145 + virtual text)

Hand in:
2.63 infant growth (SPSS) The residuals are already in the problem's file.
(Draw the horizontal line at 0 on your residuals plot by hand. SPSS will do it with difficulty; don't bother.)
Also, follow the directions on the Scatterplot handout (p. 4 bottom, cont'd p. 3 bottom) &/or below to create a new variable containing the residuals.  Check that it duplicates the given residuals column.
2.65 infant growth is averages

Governors' Salaries HW:  do 10, 12, which completes the questions. (Create the residuals and graph them vs. average pay.  Note your graph is the reverse of that on p. 4 of the handout.)  Hand everything in.

2.68  income changes  (Recall 2.79 from Day 10)
2.73, 2.75  mileage again. (SPSS) The easiest thing to do with the unwanted cases is to delete them and save the data file under a different name.  (To use Data>Select cases: if... with string variables, put the string in single quotes.)
p. 186, 2.106 speed/stride M/F (SPSS) (Regress stride on speed.)
2.107  bacteria death (SPSS) (Read pp. 143-5 with this.) (For b: use Transform: Compute: lncount = ln(count) to make a new variable holding the natural logarithm of count.  You can paste in the formula from the Functions box.  To check this is the right one, do Help: get Computing variables; pick Functions, then Arithmetic Functions, and read.)  We'll do the Supplementary section on transformation of variables soon.
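Why the log helps in a problem like 2.107: if a count falls by a constant percentage per time step, the count itself curves, but ln(count) is a straight line in time. Here is a minimal Python sketch of that idea. The data are hypothetical (generated from an exact exponential decay, NOT the numbers in the text), and the helper `least_squares` is a name made up for this illustration.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical data (an assumption for illustration, not the 2.107 data):
# a count that starts at 1000 and decays exponentially with rate 0.5.
times = [float(t) for t in range(8)]
counts = [1000 * math.exp(-0.5 * t) for t in times]

# Same idea as Transform: Compute lncount = ln(count) in SPSS.
# math.log is the natural logarithm.
lncount = [math.log(c) for c in counts]

intercept, slope = least_squares(times, lncount)
print("slope of ln(count) on time:", round(slope, 4))
print("intercept:", round(intercept, 4), " ln(1000):", round(math.log(1000), 4))
```

On the log scale the fitted slope recovers the decay rate and the intercept recovers ln of the starting count. Real data will only be approximately linear after the transformation, so check the residuals.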
Read, to discuss
2.78 Applet exploration of outlier.  Watch also r, and think about r-squared.

2.67 grade inflation
2.69 fidgeting or BMR? look in the back for the numbers.
2.76 mean stride rates/raw
2.83 baseball pay--reading residuals

 

Optional 
 
 
 
 

--Answer to problem B, Day 10:  the w that minimizes Sum (y_i - w)^2  is ybar, the mean of the y's.
(Note that with ybar in place of w, Sum (y_i - w)^2 is the numerator of the variance formula.)
So in the context of mean/s.d., the least squares criterion for the line fits right in.
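One way to see the answer to problem B: Sum (y_i - w)^2 = Sum (y_i - ybar)^2 + n (ybar - w)^2, and the second term is smallest (zero) when w = ybar. The Python sketch below checks this numerically by brute force. The four data values and the grid of candidate w's are made up for illustration.

```python
# Hypothetical data, chosen only for illustration.
ys = [2.0, 3.0, 5.0, 10.0]

def sum_sq(w, ys):
    """Sum of squared deviations of the y's from a candidate value w."""
    return sum((y - w) ** 2 for y in ys)

ybar = sum(ys) / len(ys)

# Try a grid of candidate values: 0.00, 0.01, ..., 12.00.
candidates = [i / 100 for i in range(0, 1201)]
best_w = min(candidates, key=lambda w: sum_sq(w, ys))

# The minimized sum is the numerator of the variance formula;
# dividing by n - 1 gives the sample variance.
variance = sum_sq(ybar, ys) / (len(ys) - 1)

print("ybar =", ybar, " best w on the grid =", best_w)
print("minimized sum of squares =", sum_sq(ybar, ys), " variance =", variance)
```

The grid search is only a check, not a proof; the identity above is what shows ybar is the minimizer for any data set.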

SPSS residuals:  Analyze > Regression > Linear. Move the horizontal-axis variable to the Independent box and the vertical-axis variable to the Dependent box.   The Save button adds columns of the chosen values to your data file; then you can analyze them however you want.  Choose Residuals: Unstandardized and Predicted values: Unstandardized.
See the Scatterplot handout, bottom of pp. 4 and 3. The Plots button plots residuals against the predicted y values, not against the x-variable as IPS shows.  This doesn't matter much, since y-predicted is a linear transformation of x, but if the slope is negative, the plot will look "backward".
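For anyone who wants to see what those saved columns contain, here is a rough Python sketch of the same computation: fit the least-squares line, then form predicted values and residuals (observed minus predicted). The data are hypothetical, picked to have a negative slope, and `least_squares` is a helper name invented for this sketch. It is not how SPSS does it internally, just the textbook formulas.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical data with a negative slope (not from the text).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [9.8, 8.1, 6.2, 3.9, 2.0]

a, b = least_squares(xs, ys)
predicted = [a + b * x for x in xs]                  # "Predicted values: Unstandardized"
residuals = [y - p for y, p in zip(ys, predicted)]   # "Residuals: Unstandardized"

# Order of the cases along the horizontal axis of each kind of residual plot.
order_by_x = sorted(range(len(xs)), key=lambda i: xs[i])
order_by_pred = sorted(range(len(xs)), key=lambda i: predicted[i])

print("slope:", round(b, 3))
print("left-to-right order when plotted against x:        ", order_by_x)
print("left-to-right order when plotted against predicted:", order_by_pred)
```

With a negative slope, the case with the smallest x has the largest predicted y, so the two plots are mirror images of each other left to right. The residuals themselves are identical in both.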

Residuals should show no clear patterns if the line is a good fit.  By "detrending" the data set, a residual plot sometimes uncovers subtle characteristics (like a slight curve).
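A small illustration of the "detrending" idea, sketched in Python. The data are hypothetical: a straight line plus a slight curve, built by formula rather than taken from the text. The correlation is very high, so the scatterplot looks nearly straight, but the signs of the residuals give the curve away. The helper `least_squares` is a name made up for this sketch.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical data: a line with a slight upward bend (not from the text).
xs = [float(x) for x in range(9)]
ys = [x + 0.05 * (x - 4) ** 2 for x in xs]

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Correlation r, computed from the usual sums of squares.
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)
r = sxy / math.sqrt(sxx * syy)

# Sign of each residual, in order of x.
signs = "".join("+" if e > 0 else "-" for e in residuals)
print("r =", round(r, 3))
print("residual signs in order of x:", signs)
```

The residuals are positive at both ends and negative in the middle, a clear pattern that says the line is missing a curve even though r is above 0.99.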
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CAUTIONS:
Correlation/regression only capture linear association (lots of things are almost linear over a short interval)
   Extrapolation is dangerous-- (maybe not linear over a longer interval)
   Restricted-range problem (range not enough to uncover true relationship:  it may be curved, or it may be more strongly linear!)
Influential points, outliers (squaring the errors makes least squares very non-resistant).   Explore with Applet.
Lurking variables.  Check residuals, x, y, against time or order of observation (timeplot)--
(looking for a "fatigue" or "running in" lurking variable.)
Mixing 2 (or more) groups can diffuse or even reverse association (pp. 167-8--"Simpson's Paradox")
Averaged data will show a stronger correlation than the nonaveraged (individual) data.  (e.g. country data)
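Two of these cautions are easy to see with tiny made-up data sets. The Python sketch below is only an illustration: all the numbers are hypothetical, and `correlation` is a helper written for this sketch using the usual formula for r.

```python
import math

def correlation(xs, ys):
    """Correlation r between two lists of equal length."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Caution 1: mixing groups can reverse the association.
# Hypothetical groups; within each one, y goes DOWN as x goes up.
xa, ya = [1.0, 2.0, 3.0], [6.0, 5.0, 4.0]
xb, yb = [6.0, 7.0, 8.0], [11.0, 10.0, 9.0]
r_a = correlation(xa, ya)
r_b = correlation(xb, yb)
r_all = correlation(xa + xb, ya + yb)
print("within group A:", round(r_a, 3), " within group B:", round(r_b, 3))
print("groups mixed together:", round(r_all, 3))

# Caution 2: averages correlate more strongly than individuals.
# Hypothetical individuals in three groups of three (think "countries").
groups = [
    ([0.0, 1.0, 2.0], [2.0, 0.0, 1.0]),
    ([3.0, 4.0, 5.0], [5.0, 3.0, 4.0]),
    ([6.0, 7.0, 8.0], [8.0, 6.0, 7.0]),
]
all_x = [x for gx, _ in groups for x in gx]
all_y = [y for _, gy in groups for y in gy]
mean_x = [sum(gx) / len(gx) for gx, _ in groups]
mean_y = [sum(gy) / len(gy) for _, gy in groups]
r_individuals = correlation(all_x, all_y)
r_means = correlation(mean_x, mean_y)
print("individuals:", round(r_individuals, 3), " group averages:", round(r_means, 3))
```

In the first example the association is negative inside each group but positive when the groups are pooled. In the second, averaging wipes out the scatter within each group, so the group means line up much more tightly than the individuals do.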

Math251-Fall05/Dayps11.htm      10pm    9/18/05
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.