Math 151, Spring 2004, Friday, Day 16, Oct. 1.  After class: hit reload...

- - - - - - - - - - - - - - - - - - - - - - -
Exams:  Solutions are outside my door, on reserve, and on electronic reserve!
         total #1 #2 #3 #4 #5  #6   #7 #8  #9   9|11123
possible 100    9 10 13 9 26    8   12 8    5    8|6669
      max 93    9 10 13 9 26    8   12 8    5    8|024
       Q3 90.5  9 10 13 9 24.75 8   12 8    3.75 7|79
      Med 86    8  9 13 9 23    6.5 12 8    2    7|
       Q1 79.25 7  8 11 7 17.25 5   11 4.25 0   6|7
      min 37    2  4  7 3  2    4    8 1    0    6|4   lo 37
Generally very good, gives good foundation to go forward.  Don't coast--it gets harder!
Come see me!! if you're in the lower reaches.

2) Pull the tail on the  square root sign down all the way!
3a) A stemplot is supposed to be quick.  The quick way is to read the numbers as they come and put the leaves on in that order, as if you're tallying.  Trying to order on the first pass is slow and inaccurate; defeats the purpose.
5c-d) Had the same answer:  the number with 15% above it = the number with 85% below it = 85th  percentile.
6a)  Since the mean is the total for 100 days divided by 100, to estimate the total for 30 days, multiply the mean by 30.
 b)   "Average" is used to refer not only to the mean, but to the usual, the typical, the most common.  In the expression "better than average" the sense is better than usual or common.  ("better than THE average" might be taken to refer to the mean specifically.)  52 is certainly better than usual, since the median is down at 46.
7c)  The Value was meant to be the value of the land.  I didn't take off if you interpreted it as crop value.
9)  TIME plot, it says.  I thought this would be easy, after the long discussion of the steel bars timeplot on the pretest in the class just before the exam.  She should plot the vitamin C results against the order she analyzed them in, or better, the time since picking (or since buying, if she doesn't know when they were picked).  Nutrients tend to deteriorate after plants are picked.
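
A small sketch of the "tallying" idea from 3a), in case it helps to see it spelled out.  The scores below are made up for illustration (they are not exam data), and the function name quick_stemplot is just a label chosen here.  Each leaf goes on in the order the numbers are read; nothing is sorted on the first pass.

```python
from collections import defaultdict

def quick_stemplot(values):
    """Build a stemplot by appending each leaf in arrival order (no sorting)."""
    stems = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)   # tens digit is the stem, ones digit the leaf
        stems[stem].append(leaf)     # tally the leaf as it comes
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # keep empty stems too
        leaves = "".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem}|{leaves}")
    return lines

# Hypothetical two-digit scores, in the order they were read.
scores = [86, 91, 79, 93, 64, 88, 77, 92]
stemplot_lines = quick_stemplot(scores)
print("\n".join(stemplot_lines))
```

If an ordered stemplot is wanted, the leaves can be sorted in a second pass once everything is tallied.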

HW assignment Day 16
Reading:  Finish 2.3, read 2.4.   Skip 2.5. Ahead in Ch. 3.
Hand in Monday:
Exercises with four facts, from Day 15: See details there. 
  C.  govsal on avgpay  (if not handed in already)
  2.33, 2.30, 
  2.35--Note: the Text and Excel files are put in order, so they look different, and the Text file is MISSING the 23rd point, (5,56).  You can just type it in.
  2.47, 2.51 
  E. RSquared 
 POSTPONE the rest:= = = = = = = = = = = = = = = = = = 
A.  Use ResidualsRSquared from the website or the lab to graph these data sets, along with a graph of the residuals.  Print the results, and describe the shape of the residuals (it may help to connect the dots with pencil, to see the pattern.) 
a)  x 1 2 8 4 6 9 
    y 1 3 6 6 7 5 
b) x 1 2 7 4 6 9
   y 7 6 2 4 2 1
Moore p. 122, 2.36 speed&gas again a, b, c, d.   There is a data file for problem 2.36, and its third column is the residuals (check them against the book). 

B. Use Author's website, http://www.whfreeman.com/scc, ...Correlation/regression.   Make a cloud of data (about 15 points), put in the regression line.  Play with an outlier: drag a point to the far left (right) and drag it up and down.  Try it again with the point in the middle range of x's.  Write your answer: Where is it most influential?  Now add a bunch more points (50 is max.)  Play with an outlier again.  Does the outlier have more or less influence with a larger data set?

Moore p. 123, 2.38 Gesell first word-point in middle of x range. Get the data into SPSS, delete child 19, graph and get the regression line and r^2.  Use the formula on p.117 and graph the line for the full data set by hand on your printout.   r^2 for the full data set is on p. 122.

Moore p. 122, 2.37 Calories (You saved these, I think--or, from Moore's files, in  TA02-04) Graph and get lines in SPSS with and without the outliers.  Graph the line for "without outliers" by hand on the printout for "with outliers" so you can compare them better.  Print one more graph (with outliers) and keep it for problem C below.

Read,  Optional 
 
 
 
 

Postpone;==== = = = = = = 
SPSS will make residuals:  Do Analyze>Regression>Linear (a new menu for us) 
Click your variables into Independent (X) and Dependent(Y). 
Hit the Button "Save...": Checkbox Residuals: Unstandardized. Continue, Ok out of the menus.  You'll get output; ignore it. 
You'll get a new variable, the residuals. 
Try it with the data file for problem 2.36, with speed and gas.  You'll get a fourth variable that should be the same as the residuals variable. 
 

 

Regression-- Review comments
ANY straight line y = a + bx  (or bx + a):  b, the coefficient of x, is the slope of the line.  If x changes one unit, y changes b units, so b is the rate of change of y with respect to x.  (If y is weight in pounds, and x is height in inches, b is the number of pounds we expect to see weight go up by, per inch that height goes up by.)
"Regression line of weight on height":  height = horizontal (x) axis, weight = vertical (y) axis.
Four Facts: Day 15

The line formula yhat = a + bx  from xbar, ybar, sx, sy, r:
     Find b:   b = r sy / sx
                (Fact 2:  r is the slope if x and y are standardized.  Equation p. 109)
      Find a:  Solve  ybar = a + b xbar for a:  a = ybar - b xbar
               (Fact 3:  (xbar, ybar) lies on the regression line(s).  Equation p. 109)
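
Here is a minimal sketch of the two steps above (find b, then solve for a).  The summary numbers are hypothetical height/weight values chosen only for illustration, and regression_line is a name made up for this sketch.

```python
def regression_line(xbar, ybar, sx, sy, r):
    """Return (a, b) for yhat = a + b*x from the five summary numbers."""
    b = r * sy / sx          # slope: b = r sy / sx
    a = ybar - b * xbar      # intercept: makes (xbar, ybar) lie on the line
    return a, b

# Hypothetical summary numbers: x = height (inches), y = weight (pounds).
xbar, ybar = 68.0, 160.0
sx, sy = 3.0, 30.0
r = 0.7

a, b = regression_line(xbar, ybar, sx, sy, r)
print(f"yhat = {a:.1f} + {b:.1f} x")

# One inch above the mean height: the prediction sits b pounds above ybar.
prediction_at_69 = a + b * 69
print(f"prediction at x = 69: {prediction_at_69:.1f}")
```

With these made-up numbers the slope is about 7 pounds per inch, so the prediction one inch above xbar is about 7 pounds above ybar.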

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
The line formula yhat = a + bx tells us our best prediction or estimate of a response (y) value for a particular value of the explanatory (x) variable.  It says NOTHING about how good that "best" is--that is, it says nothing about how tight or scattered the data is around the line.  R-squared does that job.

    r^2 is the square of the correlation coefficient r!  (The -, + sign gets lost.)
    If r = .7, about half (.49) of the variability in the y's is explained by using the regression line relationship to predict y from x. (If weight and height have a correlation of .7, then about half of the variability in weight can be explained by knowing height.)
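
A rough check of the "fraction of variability explained" reading, on made-up data (the heights and weights below are hypothetical, not from the text).  The sketch computes r directly, then compares r^2 with 1 minus (leftover variation around the line) / (total variation in the y's).

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def correlation(xs, ys):
    """Correlation coefficient r, computed from sums of deviations."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: heights in inches, weights in pounds.
heights = [62, 64, 66, 68, 70, 72]
weights = [120, 136, 140, 155, 158, 180]

r = correlation(heights, weights)

# Least-squares line yhat = a + b*x for the same data.
xbar, ybar = mean(heights), mean(weights)
b = (sum((x - xbar) * (y - ybar) for x, y in zip(heights, weights))
     / sum((x - xbar) ** 2 for x in heights))
a = ybar - b * xbar

total_variation = sum((y - ybar) ** 2 for y in weights)
leftover_variation = sum((y - (a + b * x)) ** 2 for x, y in zip(heights, weights))
explained_fraction = 1 - leftover_variation / total_variation

print(f"r = {r:.3f}   r^2 = {r * r:.3f}   explained fraction = {explained_fraction:.3f}")
```

The last two printed numbers should agree, which is the sense in which r^2 measures how much of the scatter in y the line accounts for.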
Start here Monday
LEAST SQUARES PROPERTY
"Residual at x" = (y - yhat)  = vertical difference between observed y and predicted y (= what's left over after predicting)
    ( Positive if observed is bigger than predicted, negative if observed is smaller than predicted)
Least squares principle:  Find the line that minimizes the sum of the squared residuals.  (Here, or in Mac 101, ClassMaterials\Math151\RegressionDemos\RegressionLine.xls, Squares tab)
       This method of finding a "best fit" straight line for predicting y's from x's was derived mathematically to work well with "joint normal" data--elliptical clouds.  For data of this sort, the line does  give the mean of the y's for each given x (at least in the abstract.)
Residuals drawn to line, govsal data (download OK in lab)
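
To see the least squares property numerically, here is a small sketch on made-up data (not the govsal data).  It fits the line, adds up the squared residuals, and then nudges the intercept and slope a little in each direction; every nudged line should come out with a larger sum.  The function names are chosen just for this sketch.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line yhat = a + b*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = ybar - b * xbar
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Add up (y - yhat)^2 over all the points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical, roughly linear data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.3, 2.9, 4.1, 4.8, 6.2, 6.6, 8.1, 8.7]

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(a, b, xs, ys)

# Try lines with the intercept and/or slope shifted slightly.
nudged = [
    sum_squared_residuals(a + da, b + db, xs, ys)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.1, 0.0, 0.1)
    if (da, db) != (0.0, 0.0)
]

print(f"least-squares line: sum of squared residuals = {best:.3f}")
print(f"smallest sum among the nudged lines          = {min(nudged):.3f}")
```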

Drawback if the data is not the "elliptical cloud" type:
     Outliers get their residual distance squared:  May be very influential  in determining where line sits.
             Especially if at lowest or highest x-values, may change slope of line a lot.
            Author's website, http://www.whfreeman.com/scc, ...Correlation/regression.   Play with an outlier.
 (Outliers toward the middle x's may not change the slope, but may affect r, and r^2.)
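
The same experiment can be tried without the applet.  This is only a sketch on an invented data cloud: one outlier is dropped in at the middle of the x range (x = 5.5, which is the mean of the x's here), another at the far right, and the slope and r are compared with the original cloud.

```python
from math import sqrt

def slope_and_r(points):
    """Return (least-squares slope, correlation r) for a list of (x, y) points."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    syy = sum((y - ybar) ** 2 for _, y in points)
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Hypothetical cloud: x = 1..10, y close to a straight line.
cloud = list(zip(range(1, 11),
                 [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]))

base_slope, base_r = slope_and_r(cloud)
mid_slope, mid_r = slope_and_r(cloud + [(5.5, 30.0)])   # outlier in the middle x's
far_slope, far_r = slope_and_r(cloud + [(20.0, 5.0)])   # outlier at the far right

print(f"original cloud:     slope = {base_slope:.3f}  r = {base_r:.3f}")
print(f"outlier in middle:  slope = {mid_slope:.3f}  r = {mid_r:.3f}")
print(f"outlier far right:  slope = {far_slope:.3f}  r = {far_r:.3f}")
```

In this setup the middle outlier leaves the slope essentially where it was but pulls r down, while the far-right outlier drags the slope a long way.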
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Plotting residuals:  This amounts to making the regression line into a new x-axis.  If you plot the residuals themselves vs. the original x values, without the distraction of the slanted line, outliers and patterns other than the linear (if any) can emerge.  (Here, or ClassMaterials\Math151\RegressionDemos\ResidualsRSquared.xls, Graph of Residuals tab; it doesn't have the tiny unlined graph.)
SPSS can make a new variable of residuals, which you then can use to make a scatterplot. Optional HW.
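
For anyone who wants to see what the residuals variable holds, here is a small sketch outside SPSS.  The data are invented to be curved on purpose (y = x squared), so the residuals from the straight line should show a pattern: positive at both ends, negative in the middle.  The crude text "plot" is just a stand-in for a scatterplot of residuals vs. x.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line yhat = a + b*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    a = ybar - b * xbar
    return a, b

# Hypothetical curved data: y = x^2.
xs = [1, 2, 3, 4, 5, 6]
ys = [x * x for x in xs]

a, b = least_squares_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]   # observed minus predicted

# Residuals vs. x, one row per point; the sign column shows the pattern.
for x, res in zip(xs, residuals):
    sign = "+" if res > 0 else "-"
    print(f"x = {x}   residual = {res:7.3f}   {sign}")

print(f"sum of residuals = {sum(residuals):.6f}")
```

The residuals from a least-squares line always add up to zero (apart from rounding), so it is the pattern against x, not the total, that tells you whether a straight line is the right shape.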


Sievers home  Math151-Sp04/Dayf16.htm  3:30pm 10/1/04
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.