2) Pull the tail on the square root sign down all the way!
3a) A stemplot is supposed to be quick. The quick way is to read
the numbers as they come and put the leaves on in that order, as if you're
tallying. Trying to order on the first pass is slow and inaccurate;
it defeats the purpose.
5c-d) These had the same answer: the number with 15% above it = the
number with 85% below it = the 85th percentile.
6a) Since the mean is the total for 100 days divided by 100,
to estimate the total for 30 days, multiply the mean by 30.
b) "Average" is used to refer not only to the mean,
but to the usual, the typical, the most common. In the expression
"better than average" the sense is better than usual or common. ("better
than THE average" might be taken to refer to the mean specifically.)
52 is certainly better than usual, since the median is down at 46.
7c) "Value" was meant to be the value of the land. I
didn't take points off if you interpreted it as crop value.
9) TIME plot, it says. I thought this would be easy, after
the long discussion of the steel bars timeplot on the pretest in the class
just before the exam. She should plot vitamin C results against the order
she analyzed them in, or better, against the time since picking (or since buying, if she
doesn't know when they were picked). Nutrients tend to deteriorate after plants
are picked.
HW assignment Day 16
Reading: Finish 2.3, read 2.4. Skip 2.5. Read ahead in Ch. 3.
| Hand in Monday:
Exercises with four facts, from Day 15: See details there.
C. govsal on avgpay (if not handed in already)
2.33, 2.30, 2.35. Note: the Text & Excel files are put in order, so they look different, and the Text file is MISSING the 23rd point, (5,56). You can just type it in.
2.47, 2.51
E. RSquared
POSTPONE the rest: = = = = = = = = = = = = = = = = = =
A. Use ResidualsRSquared from the website or the lab to graph these data sets, along with a graph of the residuals. Print the results, and describe the shape of the residuals (it may help to connect the dots with pencil, to see the pattern).
a) x: 1 2 8 4 6 9
   y: 1 3 6 6 7 5
b) x: 1 2 7 4 6 9
   y: 7 6 2 4 2 1
Moore p. 122, 2.36 speed & gas again: a, b, c, d. There is a data file for problem 2.36, and its third column is the residuals (check them against the book).
B. Use Author's website, http://www.whfreeman.com/scc, ...Correlation/regression. Make a cloud of data (about 15 points), put in the regression line. Play with an outlier: drag a point to the far left (right) and drag it up and down. Try it if it's in the middle range of x's. Write answer: Where is it most influential? Now add a bunch more points (50 is max). Play with an outlier again. Does the outlier have more or less influence with a larger data set?
Moore p. 123, 2.38 Gesell first word; point in middle of x range. Get the data into SPSS, delete child 19, graph and get the regression line and r2. Use the formula on p. 117 and graph the line for the full data set by hand on your printout. r2 for the full data set is on p. 122.
Moore p. 122, 2.37 Calories (You saved these, I think; or, from Moore's files, in TA02-04). Graph and get lines in SPSS with and without the outliers. Graph the line for "without outliers" by hand on the printout for "with outliers" so you can compare them better. Print one more graph (with outliers) and keep it for problem C below. |
Read, | Optional
Postpone;==== = = = = = =
|
Regression-- Review comments
ANY straight line y = a + bx (or bx + a): b,
the coefficient of x, is the slope of the line. If
x changes one unit, y changes b units, so b is the rate of change of
y with respect to x. (If y is weight in pounds and x is height
in inches, b is the number of pounds we expect
weight to go up per inch that height goes up.)
"Regression line of weight on height":
height = horizontal (x) axis, weight = vertical (y) axis.
Four Facts: Day 15
The line formula yhat = a + bx
from xbar, ybar, sx, sy, r:
Find b: b = r sy / sx
(Fact 2: r is the slope if x and y are standardized. Equation p. 109)
Find a: Solve ybar = a + b xbar for a: a = ybar - b xbar
(Fact 3: (xbar, ybar) lies on the regression line(s). Equation p. 109)
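Here is a small sketch of the two steps above in Python, in case it helps to see them run. It uses data set a) from the ResidualsRSquared exercise; the function names (mean, sample_sd, correlation, line_from_summary) are made up for this illustration and are not from the text or from SPSS.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Standard deviation with n - 1 in the denominator.
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    # r = average product of standardized x and y (dividing by n - 1).
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    products = (((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)

def line_from_summary(xbar, ybar, sx, sy, r):
    # Illustrative helper: slope first, then intercept.
    b = r * sy / sx        # Find b: b = r sy / sx
    a = ybar - b * xbar    # Find a: a = ybar - b xbar
    return a, b

# Data set a) from the homework.
x = [1, 2, 8, 4, 6, 9]
y = [1, 3, 6, 6, 7, 5]

xbar, ybar = mean(x), mean(y)
sx, sy, r = sample_sd(x), sample_sd(y), correlation(x, y)
a, b = line_from_summary(xbar, ybar, sx, sy, r)

print(f"yhat = {a:.3f} + {b:.3f} x")
# Fact 3 check: plugging in xbar should give back ybar.
print(a + b * xbar, ybar)
```

For this data set the slope works out to 0.5, and the last line shows that (xbar, ybar) sits on the line, up to rounding.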
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
The Line formula yhat = a + bx
tells us our best prediction or estimate of a response (y) value
for a particular value of the explanatory (x) variable. It says NOTHING
about how good that "best" is--that is, it says nothing about how tight
or scattered the data is around the line. R-squared does that
job.
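To see R-squared doing that job, here is a rough Python sketch that computes it as 1 minus (sum of squared residuals) / (sum of squared deviations of y from ybar) for homework data sets a) and b). The helper names fit_line and r_squared are invented for this example; in class you would read r2 off SPSS or the spreadsheet instead.

```python
def fit_line(xs, ys):
    # Least-squares intercept a and slope b for yhat = a + bx.
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

def r_squared(xs, ys):
    # Fraction of the variation in y accounted for by the line.
    a, b = fit_line(xs, ys)
    ybar = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - ybar) ** 2 for y in ys)
    return 1 - sse / sst

# Data sets a) and b) from the homework.
x_a, y_a = [1, 2, 8, 4, 6, 9], [1, 3, 6, 6, 7, 5]
x_b, y_b = [1, 2, 7, 4, 6, 9], [7, 6, 2, 4, 2, 1]

r2_a = r_squared(x_a, y_a)
r2_b = r_squared(x_b, y_b)
print(f"r2 for a) = {r2_a:.3f}")
print(f"r2 for b) = {r2_b:.3f}")
```

Both data sets get a "best" line, but the r2 values differ a good deal: the points in b) hug their line more tightly than the points in a) do.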
Drawback if the data is not the "elliptical cloud" type:
Outliers get their residual distance squared, so they may be very
influential in determining where the line sits.
Especially at the lowest or highest x-values, they may change the slope of
the line a lot.
Author's website, http://www.whfreeman.com/scc,
...Correlation/regression. Play with an outlier.
(Outliers toward the middle x's may not change the slope, but may affect r and r2.)
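If you can't get to the applet, the same experiment can be roughed out in Python. This sketch starts from homework data set a) and adds one extra point two ways. Both extra points, (20, 0) and (5, 15), are hypothetical values chosen just for this demonstration, and slope_and_r is a made-up helper name.

```python
from math import sqrt

def slope_and_r(points):
    # Least-squares slope and correlation r for a list of (x, y) pairs.
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    syy = sum((y - ybar) ** 2 for _, y in points)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Data set a) from the homework.
base = list(zip([1, 2, 8, 4, 6, 9], [1, 3, 6, 6, 7, 5]))

# Hypothetical outliers, invented for this sketch only.
far_outlier = (20, 0)      # far to the right of the other x's
middle_outlier = (5, 15)   # at the middle of the x's, but far above the line

slope_base, r_base = slope_and_r(base)
slope_far, r_far = slope_and_r(base + [far_outlier])
slope_mid, r_mid = slope_and_r(base + [middle_outlier])

print(f"no outlier:      slope = {slope_base:.3f}, r = {r_base:.3f}")
print(f"far-right point: slope = {slope_far:.3f}, r = {r_far:.3f}")
print(f"middle point:    slope = {slope_mid:.3f}, r = {r_mid:.3f}")
```

With these particular points, the far-right outlier drags the slope from positive to negative, while the middle one leaves the slope where it was and only weakens r.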
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Plotting residuals: This amounts to making the regression
line into a new x-axis. If you plot the residuals themselves vs.
the original x values, without the distraction of the slanted line, outliers
and patterns other than the linear (if any) can emerge.
(Here, or in ClassMaterials\Math151\RegressionDemos\ResidualsRSquared.xls,
Graph of Residuals tab. It doesn't have the tiny unlined graph.)
SPSS can make a new variable of residuals, which you then can use
to make a scatterplot. Optional HW.
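For anyone who wants to see what the "new variable of residuals" looks like outside SPSS, here is a minimal Python sketch using homework data set a). It only lists residual = y - yhat against x; the function name residuals_vs_x is invented here, and you would still graph the pairs (by hand or in SPSS) to see the shape.

```python
def residuals_vs_x(xs, ys):
    # Fit yhat = a + bx by least squares, then return (x, residual) pairs.
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return [(x, y - (a + b * x)) for x, y in zip(xs, ys)]

# Data set a) from the homework.
x = [1, 2, 8, 4, 6, 9]
y = [1, 3, 6, 6, 7, 5]

pairs = sorted(residuals_vs_x(x, y))   # sort by x so any pattern is easier to spot
for x_value, residual in pairs:
    print(f"x = {x_value}: residual = {residual:+.2f}")

# Least-squares residuals always add up to zero (apart from rounding).
total = sum(residual for _, residual in pairs)
print(f"sum of residuals = {total:.10f}")
```

Plot these pairs with x on the horizontal axis and a flat line at residual = 0; that flat line is the regression line "made into a new x-axis."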
| Sievers home | Math151-Sp04/Dayf16.htm | 3:30pm | 10/1/04 |