Day 3 (Wed., Aug. 29) Assigned:
(Re)Read: 1.2 thru p. 53. Read for next class: pp. 53-55
(linear transformations). We'll also start 1.3, Normal distributions.
Hand in Friday: p. 56 ff.
With Applet (http://www.whfreeman.com/ips5e): 1.55, 1.56, 1.57
1.48 (0's effect)
1.41 (tuitions, boxplot)
1.62, 1.63, 1.64 (income, cf. boxplots)
1.60 (logging). Make side-by-side stemplots, 5-number summaries, and side-by-side boxplots. Discuss.
p. 63: Read 1.71, 1.77 (guinea pigs, trimmed mean, cf. 1.36). Make a boxplot of the data, with or without outliers (you choose). We'll do trimmed means in SPSS, soon.
1.75: quintiles by hand. Method p. 45, middle.
Can do now: Save to be part of Day 4: p. 58, 1.50 (Do xbar and s by hand. Then put them in SPSS Handout & do them.)

Read, discuss: 1.47 & 1.49 (salary)
A. If a distribution is skewed right, the mean will be on the /right?/left?/ of the median. (Check with the Mean & Median Applet.)
p. 95, 1.131 (mode, median)
p. 95, 1.133
B. Forbes magazine reported (1995) that the "average" household wealth of its readers was either about $800,000, or $2.2 million, depending on what "average" it used. Which is mean/median?

Optional: Guinea pigs, 1.36, Table 1.8. Use the One Variable Statistical Calculator Applet at http://www.whfreeman.com/ips5e to get the histogram. Compare boxplot (HW) with histogram: longer boxplot sections mean lower histogram height and vice versa.
Handout: SPSS mean and standard deviation, quickly.
Homework questions? (Homework is to be handed in Friday. Sorry for the confusion.) #s on board:
Note: p. 53, fig. 1.20 shows a stemplot with negative numbers. Need two "0" stems!
Section 1.2: Summarizing distribution info with numbers
Measures of Middle (central tendency)
--Colloquially "average" can refer to any measure of middle, so watch out; be more specific.
Mean (most common "average"), "x-bar":
Take sum (aggregate) of all observations and divide by how many (n).
(Formula p. 41)
Metaphors:
1) Center of gravity, balance point of histogram.
2) Slice off bits from the big and add to the little till everyone has the same.
(Or "aggregate"--total--it all and portion it out evenly.)
Outlier or long tail will pull mean in that direction (think seesaw balancing).
"Sensitive" to outliers, skewness.
Especially useful:
1) For symmetric, tidy distributions.
2) When metaphor 2 makes sense--looking for "fair share" of a total.
Median: half are bigger, half are smaller.
Point on histogram with half the area to the left, half to the right.
Calculating:
Put observations in numerical order (stemplot!).
Take the middle one if n is odd, or average the 2 middle ones if n is even.
Formula: Count in how far? (n+1)/2 places. (7 1/2 places? Go halfway: average the 7th and 8th observations.)
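The counting rule can be sketched in Python. This is only an illustration, not something from the text; the function name `median_by_counting` is made up here, and the two data sets are the n=8 and n=9 examples used in the quartile discussion in these notes.

```python
def median_by_counting(observations):
    """Median via the (n+1)/2 counting rule."""
    data = sorted(observations)          # put observations in numerical order
    n = len(data)
    position = (n + 1) / 2               # how far to count in (1-based)
    if position == int(position):        # n odd: a single middle observation
        return data[int(position) - 1]
    # n even: position is k + 1/2, so go halfway between the k-th and (k+1)-th
    lower = data[int(position) - 1]
    upper = data[int(position)]
    return (lower + upper) / 2


print(median_by_counting([1, 3, 5, 6, 8, 8, 11, 20]))     # count in 4 1/2 places
print(median_by_counting([1, 3, 5, 6, 6, 8, 8, 11, 20]))  # count in 5 places
```

For the first set the rule averages the 4th and 5th observations (6 and 8); for the second it picks the 5th observation.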
"Resistant
to skewness and outliers"--trimming off ends will make little difference in median value; changing a few values has little effect on the measure.
More "typical" than mean, if there is skewness or outliers.
(Badly bimodal distribution--"middle" doesn't mean much.)
Symmetric distribution: mean = median
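A quick way to see "sensitive" versus "resistant" is to change one value and watch both measures. The sketch below uses Python's standard `statistics` module; the numbers are hypothetical (made up for illustration, not from the text).

```python
from statistics import mean, median

# Hypothetical, symmetric data: deviations from 35 are -5, -3, 0, 3, 5.
tidy = [30, 32, 35, 38, 40]

# Same data, but the largest value is replaced by an outlier.
with_outlier = [30, 32, 35, 38, 400]

print(mean(tidy), median(tidy))                  # symmetric: mean and median agree
print(mean(with_outlier), median(with_outlier))  # mean is dragged up, median stays put
```

One extreme value moves the mean from 35 to 107, while the median does not change at all.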
Website (http://www.whfreeman.com/ips5e) or CD: "Statistical Applets", Mean & Median.
Check out symmetric distributions, skewed distributions, and distributions with outliers.
Measures of Spread (dispersion)
Quartiles: (Q1 = 25th, Q3 = 75th percentile)
This quick-and-dirty method is from Tukey, who called them "hinges".
Just take the median of each "half" of the data.
(Detail: IF you had an even number of observations to start with, the data divide evenly into an upper and a lower half. IF you had an odd number to start with, you have one in the middle, the median. In this case only, you throw the median away, and use the remaining halves.)
1 3 5 6 8 8 11 20 are n = 8 observations.
Median at the (8+1)/2 = 9/2 = 4 1/2th place: 1 3 5 6 | 8 8 11 20, M = (6+8)/2 = 7.
8/2 = 4 in each half: halves are 1 3 5 6 and 8 8 11 20.
The quartiles are the medians of each half; count in (4+1)/2 = 2 1/2.
1 3 5 6: Q1 = (3+5)/2 = 4.
8 8 11 20: Q3 = (8+11)/2 = 9.5.
1 3 | 5 6 | 8 8 | 11 20
1 3 5 6 6 8 8 11 20 are n = 9 observations.
Median at the (9+1)/2 = 10/2 = 5th place: 1 3 5 6 [6] 8 8 11 20 (median in brackets), M = 6.
Throw away the median. Now we have an even number again, 8 numbers.
8/2 = 4 in each half: halves are 1 3 5 6 and 8 8 11 20.
Continue as before. (This is a dirty method because it gives the same quartiles for both these data sets. Quick because computation is minimal and simple.)
1 3 | 5 6 [6] 8 8 | 11 20
Annoying detail: Some books do this differently: for odd n only, they keep the middle value with each half. Then the halves are 1 3 5 6 6 and 6 8 8 11 20. Do it Moore's way this term, please.
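The "median of each half" method can be sketched in Python as below. The function name `quartiles_moore` is made up for this illustration; it assumes at least 2 observations and follows the convention described above (for odd n the median is thrown away before splitting).

```python
from statistics import median


def quartiles_moore(observations):
    """Q1 and Q3 as the medians of the lower and upper halves.

    For odd n the overall median is left out of both halves.
    """
    data = sorted(observations)
    if len(data) < 2:
        raise ValueError("need at least 2 observations")
    half = len(data) // 2        # size of each half (middle one dropped if n is odd)
    lower = data[:half]          # the 'half' smallest observations
    upper = data[-half:]         # the 'half' largest observations
    return median(lower), median(upper)


print(quartiles_moore([1, 3, 5, 6, 8, 8, 11, 20]))     # n = 8
print(quartiles_moore([1, 3, 5, 6, 6, 8, 8, 11, 20]))  # n = 9
```

Both calls give Q1 = 4 and Q3 = 9.5, which is the "dirty" part: the two data sets are not distinguished. Note that statistical software may use a different quartile convention and give slightly different values.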
INTERQUARTILE RANGE = IQR = Q3 - Q1. (9.5 - 4 = 5.5 for both sets above.)
= The range of the middle half of the observations. Resistant to outliers!
Box (and whisker) plot: Graphical form of five-number summary.
Boxplots, modified, showing outliers as dots. The outlier rule p. 47 is good to know about but don't bother to memorize it. If you're doing a boxplot by hand just use your judgment about what's a suspected outlier.
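For the curious, here is a sketch of how software typically flags suspected outliers. These notes do not state the rule, so the code ASSUMES the common 1.5 x IQR criterion: flag anything more than 1.5 x IQR below Q1 or above Q3. The function name and the `k` parameter are made up for illustration.

```python
def suspected_outliers(observations, q1, q3, k=1.5):
    """Flag observations beyond k * IQR from the quartiles (assumed rule)."""
    iqr = q3 - q1
    low_fence = q1 - k * iqr     # anything below this is suspect
    high_fence = q3 + k * iqr    # anything above this is suspect
    return [x for x in observations if x < low_fence or x > high_fence]


data = [1, 3, 5, 6, 6, 8, 8, 11, 20]
print(suspected_outliers(data, q1=4, q3=9.5))
```

With Q1 = 4 and Q3 = 9.5 the fences are -4.25 and 17.75, so under this assumed rule only the observation 20 is flagged.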
"Plain vanilla--Moore" Draw
and label the numerical scale first. Then mark the five numbers. Finish the picture.
The box spreads over the middle half (Q1 to Q3), the whiskers over the lowest and highest quarters (Min to Q1, Q3 to Max). Each section shows the spread of 1/4 of the data: the longer the section, the thinner the data must be spread in there.
Can "read" skewness.
Demonstration with set of 9: 1 3 | 5 6 [6] 8 8 | 11 20 (median in brackets)
5#summ: 1, 4, 6, 9.5, 20
Direction of boxplot? Vertical or horizontal is a matter of taste. I do horizontal, usually.
  |-----[   |      ]--------------------|
0·········5·········10········15········20
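The plain-vanilla boxplot can also be drawn as a line of text from the five-number summary. This is a rough sketch, not a plotting tool: the function name `text_boxplot` and the `chars_per_unit` parameter are made up here, and it assumes nonnegative values on a scale where each character is half a unit (as in the scale line above).

```python
def text_boxplot(minimum, q1, med, q3, maximum, chars_per_unit=2):
    """Draw a horizontal boxplot as one line of text (nonnegative values only)."""
    def col(value):
        # Column on the text scale for a given data value.
        return round(value * chars_per_unit)

    line = [" "] * (col(maximum) + 1)
    for i in range(col(minimum), col(q1)):   # lower whisker: Min to Q1
        line[i] = "-"
    for i in range(col(q3), col(maximum)):   # upper whisker: Q3 to Max
        line[i] = "-"
    line[col(minimum)] = "|"                 # Min
    line[col(maximum)] = "|"                 # Max
    line[col(q1)] = "["                      # left edge of box
    line[col(q3)] = "]"                      # right edge of box
    line[col(med)] = "|"                     # median inside the box
    return "".join(line)


print(text_boxplot(1, 4, 6, 9.5, 20))
print("0·········5·········10········15········20")
```

The long right whisker shows the skewness to the right: the top quarter of the data is spread over more than half of the scale.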
Other percentiles: 70th percentile: 70% of observations are at or below the 70th percentile. M&M give a quick & dirty method at the end of Example 1.14, p. 45: take 0.70×n, round; count to that item. (More exact methods exist, but there is not universal acceptance of any. The practical differences are small.)
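The quick & dirty percentile method can be sketched as follows. The function name `quick_percentile` is made up, and the rounding convention is an ASSUMPTION: the notes just say "round", so the sketch rounds to the nearest whole number with halves going up, and keeps the count between 1 and n.

```python
import math


def quick_percentile(observations, p):
    """Quick & dirty percentile: take p * n, round, count to that item."""
    data = sorted(observations)
    n = len(data)
    k = math.floor(p * n + 0.5)   # round to nearest, halves up (assumed convention)
    k = min(max(k, 1), n)         # stay within the 1st..n-th item
    return data[k - 1]            # count in k places (1-based)


data = [1, 3, 5, 6, 6, 8, 8, 11, 20]
print(quick_percentile(data, 0.70))   # 0.70 * 9 = 6.3, so count to the 6th item
```

For the set of 9, the 6th item in order is 8, so 8 serves as the 70th percentile under this method.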
Spread, cont.
Standard deviation (goes with mean)
Variance: (almost) average of squared deviations from the mean.
(Deviations sum to 0.)
(Divide by (n-1), the "degrees of freedom": the dimension of the vector space in which the deviations from the mean lie.)
s: Standard deviation is the square root of the variance. Formula pp. 49-50.
Computation: I will require you to know how to do it by hand for up to 7 observations (use a table). Example.
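Here is a sketch of the by-hand table, done in Python so the arithmetic can be checked. The six observations are hypothetical (chosen so that the mean comes out even); they are not from the text. The last line cross-checks against `statistics.stdev`, which also divides by n-1.

```python
import math
from statistics import stdev

data = [2, 4, 4, 5, 7, 8]        # hypothetical observations, n = 6
n = len(data)
xbar = sum(data) / n             # mean

# One table row per observation: x, deviation, squared deviation.
print(f"{'x':>4} {'x - xbar':>9} {'(x - xbar)^2':>13}")
sum_sq = 0.0
for x in data:
    dev = x - xbar
    sum_sq += dev ** 2
    print(f"{x:>4} {dev:>9.2f} {dev ** 2:>13.2f}")

variance = sum_sq / (n - 1)      # divide by n-1, not n
s = math.sqrt(variance)          # standard deviation
print(f"xbar = {xbar}, sum of squares = {sum_sq}, variance = {variance}, s = {s:.3f}")
print(f"check with statistics.stdev: {stdev(data):.3f}")
```

Here the deviations are -3, -1, -1, 0, 2, 3 (they sum to 0), the squared deviations sum to 24, the variance is 24/5 = 4.8, and s is about 2.19.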
Physics: angular momentum (spinning ice skater).
Not so weird: High school geometry?
Remember Pythagorean theorem: c^2 = a^2 + b^2: hypotenuse of right triangle is also the square root of a sum of squares. Length of a vector.
Very sensitive to outliers (squared deviations do it).
s > 0 unless all observations are identical (then s = 0).
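The "length of a vector" idea can be made concrete: treat the deviations from the mean as a vector, take its length by Pythagoras, and divide by the square root of n-1. The sketch below is one way to write that; the function name `sd_as_length` and the small data sets are made up for illustration.

```python
import math


def sd_as_length(observations):
    """Standard deviation as (length of deviation vector) / sqrt(n - 1)."""
    n = len(observations)
    xbar = sum(observations) / n
    deviations = [x - xbar for x in observations]
    length = math.sqrt(sum(d * d for d in deviations))  # Pythagoras in n dimensions
    return length / math.sqrt(n - 1)


print(sd_as_length([5, 5, 5, 5]))   # all identical: every deviation is 0, so s = 0
print(sd_as_length([4, 5, 6]))      # tidy data: small s
print(sd_as_length([4, 5, 60]))     # one outlier: s blows up
```

Changing a single value from 6 to 60 takes s from 1 to about 32, because the large deviations get squared.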
Mean/standard deviation pair useful for symmetric, unimodal (one-humped) distributions with no outliers. ("Normal" dist.)
SPSS to find mean and s.d.: Handout.
| Sievers home | Math251-Fall07/Day2s3.htm | 6pm | 8/28/07 |