Day 3 (Wed. Aug. 31) Assigned:
(Re)read: 1.2 through p. 53. Read for next class: pp. 53-55 (linear transformations). We'll also start 1.3, Normal distributions.
Hand in: p. 56 ff. With Applet: 1.55, 56, 57; 1.48 (0's effect); 1.41 (tuitions, boxplot); 1.62, 63, 64 (income, cf. boxplots); 1.60 (logging). Make side-by-side stemplots, 5-number summaries, and side-by-side boxplots. Discuss. Read 1.71, 1.77 (guinea pigs, trimmed mean, cf. 1.36). Make a boxplot of the data, with or without outliers (you choose). We'll do trimmed means with SPSS next week. Will assign next class: 1.50 (Do x̄ and s by hand. Then put the data in SPSS and do them there.)
Read, discuss:
1.47 & 1.49 (salary). A. If a distribution is skewed right, will the mean be to the right or to the left of the median? (Check with the Mean & Median Applet.) p. 95, 1.131 (mode, median); p. 95, 1.133. B. Forbes magazine reported (1995) that the "average" household wealth of its readers was either about $800,000 or $2.2 million, depending on which "average" it used. Which is the mean and which is the median?
Optional
Section 1.2: Summarizing distribution info with numbers
Measures of Middle (central tendency)
Colloquially, "average" can refer to any measure of middle, so watch out; be more specific.
Mean (the most common "average"):
Take the sum (aggregate) of all observations and divide by how many there are (n).
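A minimal Python sketch of the recipe above (the function name `mean` is just for illustration), using the eight observations from the quartile example later in this section:

```python
def mean(values):
    """Sum (aggregate) all the observations, then divide by how many there are (n)."""
    return sum(values) / len(values)


data = [1, 3, 5, 6, 8, 8, 11, 20]
print(mean(data))  # 62 / 8 = 7.75
```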
Metaphors:
1) Center of gravity: the balance point of the histogram.
2) Slice off bits from the big values and add them to the little ones until everyone has the same. (Or "aggregate" (total) it all and portion it out evenly.)
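Both metaphors can be checked numerically. In this sketch (same eight observations; the variable names are illustrative), the "fair share" is the total portioned out evenly, and the deviations from it cancel exactly, which is what makes the mean the balance point:

```python
data = [1, 3, 5, 6, 8, 8, 11, 20]

# Metaphor 2: aggregate it all, then portion it out evenly.
total = sum(data)
fair_share = total / len(data)

# Metaphor 1: at the balance point, the pulls above and below cancel.
deviations = [x - fair_share for x in data]

print(fair_share)       # 7.75
print(sum(deviations))  # 0.0
```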
An outlier or a long tail will pull the mean in its direction (think of balancing a seesaw). The mean is "sensitive" to outliers and skewness.
Especially useful:
1) For symmetric, tidy distributions.
2) When metaphor 2 makes sense: looking for a "fair share" of a total.
Median: half the observations are bigger, half are smaller.
It is the point on the histogram with half the area to the left and half to the right.
Calculating:
Put the observations in numerical order (stemplot!).
Take the middle one if n is odd, or average the 2 middle ones if n is even.
Formula: how far in do you count? (n+1)/2 places. (For n = 14 that is 7 1/2 places: go halfway, i.e. average the 7th and 8th observations.)
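The counting rule translates directly into code. A sketch (the function name `median` is illustrative) that counts in (n+1)/2 places and, when that lands on a half, goes halfway to the next observation:

```python
def median(values):
    """Median found by counting in (n+1)/2 places in the sorted data."""
    ordered = sorted(values)            # put the observations in numerical order
    n = len(ordered)
    position = (n + 1) / 2              # e.g. 4.5 when n = 8, 5.0 when n = 9
    lower = ordered[int(position) - 1]  # Python counts from 0, so shift by one
    if position.is_integer():           # odd n: a single middle observation
        return lower
    upper = ordered[int(position)]      # even n: go halfway to the next one
    return (lower + upper) / 2


print(median([1, 3, 5, 6, 8, 8, 11, 20]))     # 7.0
print(median([1, 3, 5, 6, 6, 8, 8, 11, 20]))  # 6
```

It assumes at least one observation; for n = 14 it averages the 7th and 8th values, as described above.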
"Resistant
to skewness and outliers": trimming off the ends makes little difference in the median's value, and changing a few values has little effect on it.
More "typical" than the mean if there is skewness or there are outliers.
(For a badly bimodal distribution, "middle" doesn't mean much.)
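A quick check of "sensitive" versus "resistant", using Python's standard `statistics` module. The outlier 200 is made up for illustration; it replaces the largest observation, 20:

```python
from statistics import mean, median

data = [1, 3, 5, 6, 8, 8, 11, 20]
with_outlier = data[:-1] + [200]  # swap the largest value for a wild one

print(mean(data), median(data))                  # 7.75 7.0
print(mean(with_outlier), median(with_outlier))  # 30.25 7.0
```

The mean roughly quadruples, while the median does not move at all.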
Symmetric distribution: mean = median.
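A tiny made-up symmetric data set (it mirrors around 3) shows the two measures landing in the same place; this sketch again uses Python's `statistics` module:

```python
from statistics import mean, median

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]    # hypothetical data, mirror image around 3
print(mean(symmetric), median(symmetric))  # both are 3
```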
Website or CD: "Statistical Applets", Mean & Median.
Check out symmetric and skewed distributions, and distributions with outliers.
Measures of Spread (dispersion)
Quartiles (Q1 = 25th percentile, Q3 = 75th percentile):
This quick-and-dirty method is from Tukey, who called them "hinges".
Just take the median of each "half" of the data.
(Detail: IF you had an even number of observations to start with, the data divide evenly into an upper and a lower half. IF you had an odd number to start with, you have one in the middle, the median. In this case only, you throw the median away and use the remaining halves.)
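A sketch of this hinge method (Moore's way) in Python; the function names are illustrative. Taking n // 2 values from each end automatically leaves out the median when n is odd:

```python
def median_sorted(ordered):
    """Median of a list that is already in numerical order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # odd n: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle two


def quartiles(values):
    """Q1, Q3 as the medians of the lower and upper halves (Moore's way).

    For odd n the median itself is left out of both halves.
    Assumes at least 2 observations.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2                       # size of each half; drops the middle value when n is odd
    lower = ordered[:half]
    upper = ordered[n - half:]
    return median_sorted(lower), median_sorted(upper)


print(quartiles([1, 3, 5, 6, 8, 8, 11, 20]))     # (4.0, 9.5)
print(quartiles([1, 3, 5, 6, 6, 8, 8, 11, 20]))  # (4.0, 9.5)
```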
Example: 1 3 5 6 8 8 11 20 are n = 8 observations.
Median at (8+1)/2 = 9/2 = 4 1/2th place: 1 3 5 6 | 8 8 11 20, M = (6+8)/2 = 7.
8/2 = 4 in each half: the halves are 1 3 5 6 and 8 8 11 20.
The quartiles are the medians of each half; count in (4+1)/2 = 2 1/2 places.
1 3 5 6: Q1 = (3+5)/2 = 4.
8 8 11 20: Q3 = (8+11)/2 = 9.5.
1 3 | 5 6 | 8 8 | 11 20
Example: 1 3 5 6 6 8 8 11 20 are n = 9 observations.
Median at (9+1)/2 = 10/2 = 5th place: 1 3 5 6 [6] 8 8 11 20, M = 6.
Throw away the median. Now we have an even number again, 8 numbers, so 8/2 = 4 in each half: the halves are 1 3 5 6 and 8 8 11 20.
Continue as before. (This is a dirty method because it gives the same quartiles for both these data sets. It is quick because the computation is minimal and simple.)
1 3 | 5 6 [6] 8 8 | 11 20
Annoying detail: some books (this year's 151) do this differently: for odd n only, they keep the middle value with each half. Then the halves are 1 3 5 6 6 and 6 8 8 11 20. Do it Moore's way this term, please.
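For comparison only (the course uses Moore's way), a sketch of the other convention; the function names are hypothetical. On the n = 9 data it gives Q1 = 5 and Q3 = 8 instead of 4 and 9.5, while for even n the two conventions agree:

```python
def median_sorted(ordered):
    """Median of a list that is already in numerical order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles_keep_median(values):
    """Q1, Q3 when, for odd n, the median is kept in BOTH halves (not Moore's way)."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2                 # odd n: each half also gets the middle value
    lower = ordered[:half]
    upper = ordered[n - half:]
    return median_sorted(lower), median_sorted(upper)


print(quartiles_keep_median([1, 3, 5, 6, 6, 8, 8, 11, 20]))  # (5, 8)
```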
INTERQUARTILE RANGE = IQR = Q3 - Q1 = the range of the middle half of the observations. (9.5 - 4 = 5.5 for both sets above.) Resistant to outliers!
Box (and whisker) plot: graphical form of the five-number summary (minimum, Q1, M, Q3, maximum).
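A sketch of the five-number summary and IQR, with quartiles found Moore's way (function names are illustrative), for the n = 8 data:

```python
def median_sorted(ordered):
    """Median of a list that is already in numerical order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """(minimum, Q1, M, Q3, maximum), with quartiles found Moore's way."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median_sorted(ordered[:half])
    q3 = median_sorted(ordered[n - half:])
    return ordered[0], q1, median_sorted(ordered), q3, ordered[-1]


summary = five_number_summary([1, 3, 5, 6, 8, 8, 11, 20])
iqr = summary[3] - summary[1]  # Q3 - Q1
print(summary)  # (1, 4.0, 7.0, 9.5, 20)
print(iqr)      # 5.5
```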
Modified boxplots show outliers as dots. The outlier rule on p. 47 is good to know about, but don't bother to memorize it. If you're doing a boxplot by hand, just use your judgment about what's a suspected outlier.
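For the curious, a sketch of how software might flag suspected outliers. It assumes the rule on p. 47 is the familiar 1.5 × IQR rule (flag anything more than 1.5 × IQR below Q1 or above Q3); check the text before relying on it. The function names are hypothetical:

```python
def median_sorted(ordered):
    """Median of a list that is already in numerical order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def suspected_outliers(values):
    """Observations beyond the 1.5 x IQR fences (assumed form of the p. 47 rule)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median_sorted(ordered[:half])
    q3 = median_sorted(ordered[n - half:])
    step = 1.5 * (q3 - q1)
    low_fence, high_fence = q1 - step, q3 + step
    return [x for x in ordered if x < low_fence or x > high_fence]


print(suspected_outliers([1, 3, 5, 6, 8, 8, 11, 20]))  # [20]
```

Here IQR = 5.5, so the upper fence is 9.5 + 8.25 = 17.75, and 20 lies beyond it.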
Other percentiles: 70% of observations are at or below the 70th percentile. M&M give a quick-and-dirty method at the end of Example 1.14, p. 45: take 0.70 × n and round; count in to that item. (More exact methods exist, but none is universally accepted. The practical differences are small.)
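A sketch of that quick-and-dirty count (the function name is hypothetical). How ties at exactly one half are rounded is an assumption here (halves round up); integer arithmetic avoids floating-point surprises. For the n = 8 data, 0.70 × 8 = 5.6 rounds to 6, and the 6th observation is 8:

```python
def quick_percentile(values, p):
    """Quick-and-dirty p-th percentile (p a whole number from 1 to 100).

    Take p% of n, round to the nearest whole number (halves round up here,
    an assumption), and count in that many places in the sorted data.
    """
    ordered = sorted(values)
    n = len(ordered)
    count = (p * n + 50) // 100    # integer version of round(p/100 * n), halves up
    count = min(max(count, 1), n)  # keep the count inside the data
    return ordered[count - 1]      # count is 1-based; Python indexes from 0


print(quick_percentile([1, 3, 5, 6, 8, 8, 11, 20], 70))  # 8
```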
NEXT CLASS: Standard deviation and variance.
I'll expect you to memorize the formula and to be able to calculate it by hand for up to 7 numbers.
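A preview for checking hand work (e.g. on 1.50): a sketch assuming the usual sample formula s^2 = Σ(x - x̄)^2 / (n - 1), compared against Python's `statistics.stdev`, which uses the same n - 1 divisor. The seven numbers are just the first seven observations of the quartile example above, and `sample_sd` is an illustrative name:

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation s, with divisor n - 1."""
    n = len(values)
    xbar = sum(values) / n
    squared_deviations = [(x - xbar) ** 2 for x in values]
    variance = sum(squared_deviations) / (n - 1)  # s squared
    return math.sqrt(variance)


data = [1, 3, 5, 6, 8, 8, 11]     # x-bar = 42 / 7 = 6
s = sample_sd(data)
print(s, statistics.stdev(data))  # both about 3.366
```

By hand: the deviations are -5, -3, -1, 0, 2, 2, 5; their squares sum to 68, so s^2 = 68/6.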
Sievers home | Math251-Fall05/Dayps3.htm | 10pm, 8/30/05