MATH 251, Probability and Statistics I, Fall 2005, Wed. Aug.31, Day 3

Unless otherwise noted, all assignments are in Moore & McCabe, "IPS".
Italics are notes to myself--which problem is it?

Day 3 (Wed. Aug.31) Assigned:
   (Re)Read:  1.2 through p. 53. Read for next class: pp. 53-55 (linear transformations). We'll also start 1.3, Normal distributions.
Hand in:  p. 56 ff.
With Applet: 1.55, 56, 57
1.48 (0's effect)
 1.41 (tuitions, boxplot)
1.62, 63, 64 (income, cf boxplots)
1.60 (logging)  Make side-by-side stemplots, 5-number summaries, and side-by-side boxplots. Discuss.

Read 1.71, 1.77 (guinea pigs, trimmed mean, cf. 1.36) 
      Make a boxplot of the data, with or without outliers (you choose).
       We'll do trimmed means with SPSS, next week.

Will assign next class:
1.50 (Do xbar and s by hand.  Then enter the data in SPSS and compute them there.)
Read, discuss 
1.47 & 1.49 (salary)
A.  If a distribution is skewed right,
the mean will be on the /right?/left?/ of the median. (Check with the Mean&Median Applet)
p. 95, 1.131 (mode, median)
p.  95, 1.133
B.  Forbes magazine reported (1995) that the "average" household wealth of its readers was either about $800,000, or $2.2 million, depending on what "average" it used.  Which is mean/median?
Optional

Math clinic hours
Applets work? 
Author's website (different book) http://www.whfreeman.com/scc

Section 1.2:  Summarizing distribution info with numbers
Measures of Middle (central tendency)
        --Colloquially "average" can refer to any measure of middle, so watch out; be more specific.
    Mean (most common "average"):  Take sum (aggregate) of all observations and divide by how many (n)
        Metaphors.  1) Center of gravity, balance point of histogram.
                2) Slice off bits from the big and add to the little till everyone has the same.
                    (Or "aggregate"--total-- it all and portion it out evenly.)
        An outlier or long tail will pull the mean in that direction (think seesaw balancing).  The mean is "sensitive" to outliers and skewness.
        Especially useful: 1) For symmetric, tidy distributions
            2) When metaphor 2 makes sense--looking for "fair share" of a total.
    Median: half are bigger, half are smaller
        Point on histogram with half the area to the left, half to the right.
        Calculating:  Put observations in numerical order (stemplot!).
                          Middle one if n is odd, or average the 2 middle  if n is even.
                Formula:  Count in how far?  (n+1)/2 places.  (7 1/2 places? Go halfway between: average the 7th and 8th observations.)
        "Resistant to skewness and outliers"--trimming off ends will make little difference in median value
               --changing a few values has little effect on the measure.

        More "typical" than mean, if there is skewness or outliers.
     (Badly bimodal distribution--"middle" doesn't mean much.)
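
The counting rule for the median can be sketched in Python. This is only an illustration, not part of the text: the function name and the 14-value data set are made up here so that the position comes out to 7 1/2 places.

```python
def median_by_counting(observations):
    """Median via the (n + 1) / 2 counting rule."""
    ordered = sorted(observations)          # put observations in numerical order
    n = len(ordered)
    position = (n + 1) / 2                  # 1-based place to count in to
    lower = ordered[int(position) - 1]      # value at the whole-number place
    if position == int(position):
        return lower                        # odd n: the single middle value
    upper = ordered[int(position)]          # even n: the next value up
    return (lower + upper) / 2              # go halfway between the two


# Hypothetical data, n = 14, so the position is (14 + 1) / 2 = 7.5
data = [2, 3, 3, 5, 7, 8, 9, 10, 12, 12, 15, 18, 21, 40]
print(median_by_counting(data))             # averages the 7th and 8th values
```

With these values the 7th and 8th observations are 9 and 10, so the sketch returns 9.5.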
    Symmetric distribution: mean = median
Website or CD:  "Statistical Applets", Mean & Median. Check out symmetric distributions, skewed distributions, and distributions with outliers.
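
If the applet is not handy, a few lines of Python show the same effect. This is a rough sketch with made-up numbers: replacing the largest value by an outlier drags the mean toward it but leaves the median where it was.

```python
from statistics import mean, median

# Hypothetical data sets, chosen only to illustrate the point
symmetric = [2, 4, 6, 8, 10]
with_outlier = [2, 4, 6, 8, 100]   # same data, largest value replaced by an outlier

for label, data in [("symmetric", symmetric), ("with outlier", with_outlier)]:
    print(label, "mean =", mean(data), "median =", median(data))
```

For the symmetric set the mean and median are both 6. With the outlier the mean jumps to 24 while the median stays at 6.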

Measures of Spread (dispersion)
Quartiles:  Q1 = 25th percentile, Q3 = 75th percentile.
   (This quick-and-dirty method is from Tukey, who called them "Hinges".
   Just take the median of each "half" of the data.
   Detail.  IF you had an even number of observations to start with, the data divides evenly into an upper and a lower half.  IF you had an odd number to start with, you have one in the middle, the median.  In this case only, you throw the median away, and use the remaining halves.)
1 3 5 6 8 8 11 20 are n=8 observations.
    Median at the (8+1)/2 = 9/2 = 4 1/2th place: 1 3 5 6 | 8 8 11 20, M = (6+8)/2 = 7
    8/2 = 4 in each half: Halves are 1 3 5 6, and 8 8 11 20.  The quartiles are the medians of each half; count in (4+1)/2 = 2 1/2 places.
    1 3 | 5 6:  Q1 = (3+5)/2 = 4.
    8 8 | 11 20:  Q3 = (8+11)/2 = 9.5
    All together: 1 3 | 5 6 | 8 8 | 11 20

1 3 5 6 6 8 8 11 20 are n=9 observations.
     Median at the (9+1)/2 = 10/2 = 5th place: 1 3 5 6 6 8 8 11 20, M = 6
 Throw away the median.  Now we have an even number again, 8 numbers.
8/2 = 4 in each half: Halves are 1 3 5 6, and 8 8 11 20.  Continue as before. (This is a dirty method because it gives the same quartiles for both these data sets.  Quick because computation is minimal and simple.)
1 3 | 5 6 6 8 8 | 11 20
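
Here is a minimal Python sketch of the median-of-each-half method, run on the two data sets above. The function name is invented for this note, and it assumes the rule exactly as described: for odd n the median is thrown away before splitting.

```python
from statistics import median

def quartiles_moore(observations):
    """Q1 and Q3 as medians of the lower and upper halves.

    For odd n the middle observation (the median) is left out of both halves.
    """
    ordered = sorted(observations)
    n = len(ordered)
    half = n // 2                      # size of each half
    lower_half = ordered[:half]
    upper_half = ordered[n - half:]    # skips the middle value when n is odd
    return median(lower_half), median(upper_half)


set_of_8 = [1, 3, 5, 6, 8, 8, 11, 20]
set_of_9 = [1, 3, 5, 6, 6, 8, 8, 11, 20]
print(quartiles_moore(set_of_8))
print(quartiles_moore(set_of_9))
```

Both calls give Q1 = 4 and Q3 = 9.5, which is the "dirty" part: the extra 6 in the second set makes no difference.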
Annoying detail:  Some books (this year's 151) do this but (odd n only) keep the middle value with each half: then halves are 1 3 5 6 6, and 6 8 8 11 20.   Do it Moore's way this term, please.
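
For comparison only, here is a sketch of the keep-the-middle-value variant. The function name is hypothetical, and this is NOT the method to use this term; it just shows that the two conventions can disagree when n is odd.

```python
from statistics import median

def quartiles_keep_median(observations):
    """Variant: for odd n, the middle value is included in BOTH halves."""
    ordered = sorted(observations)
    n = len(ordered)
    half = (n + 1) // 2                # rounds up, so odd n shares the middle value
    lower_half = ordered[:half]
    upper_half = ordered[n - half:]
    return median(lower_half), median(upper_half)


set_of_9 = [1, 3, 5, 6, 6, 8, 8, 11, 20]
print(quartiles_keep_median(set_of_9))   # halves: 1 3 5 6 6 and 6 8 8 11 20
```

For the set of 9 this variant gives Q1 = 5 and Q3 = 8, where Moore's way gives 4 and 9.5. For even n the two methods agree.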

Five-number summary:  min, Q1, Median, Q3, max.  (1, 4, 7, 9.5, 20  for the set of 8 above)
INTERQUARTILE RANGE = IQR= Q3 - Q1. (9.5 - 4 = 5.5 for both sets above)
      =The range of the middle half of the observations.  Resistant to outliers!
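
The five-number summary and IQR for the set of 8 can be checked with a short Python sketch. The function name is made up here, and the quartiles use the median-of-each-half method from these notes.

```python
from statistics import median

def five_number_summary(observations):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of halves."""
    ordered = sorted(observations)
    n = len(ordered)
    half = n // 2                          # for odd n the middle value is dropped
    q1 = median(ordered[:half])
    q3 = median(ordered[n - half:])
    return ordered[0], q1, median(ordered), q3, ordered[-1]


data = [1, 3, 5, 6, 8, 8, 11, 20]
low, q1, m, q3, high = five_number_summary(data)
iqr = q3 - q1                              # range of the middle half
print(low, q1, m, q3, high)
print("IQR =", iqr)
```

This reproduces 1, 4, 7, 9.5, 20 and IQR = 5.5.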
Box (and whisker) plot:  Graphical form of five number summary.
    Especially good for comparing sets of data.
"Plain vanilla":  Draw and label the numerical scale first.  Then mark the five numbers.  Finish the picture.
The box spreads over the middle half, the whiskers over the smallest and largest quarters.  Each section shows the spread of 1/4 of the data: the longer the section, the more thinly the data are spread in there.
Demonstration.  Direction of boxplot?  Vertical or horizontal is a matter of taste.  I do horizontal, usually.

Boxplots, modified, showing outliers as dots.  The outlier rule p. 47 is good to know about but don't bother to memorize it.  If you're doing a boxplot by hand just use your judgment about what's a suspected outlier.
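
For reference, a sketch of an outlier check in Python. It ASSUMES the rule meant here is the common 1.5 × IQR criterion (flag anything more than 1.5 × IQR below Q1 or above Q3); check p. 47 for the book's exact wording. The function name and the `multiplier` parameter are invented for this illustration.

```python
from statistics import median

def suspected_outliers(observations, multiplier=1.5):
    """Values beyond Q1 - multiplier*IQR or Q3 + multiplier*IQR.

    Assumes the 1.5 x IQR criterion; quartiles are medians of each half.
    """
    ordered = sorted(observations)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[n - half:])
    iqr = q3 - q1
    low_fence = q1 - multiplier * iqr
    high_fence = q3 + multiplier * iqr
    return [x for x in ordered if x < low_fence or x > high_fence]


print(suspected_outliers([1, 3, 5, 6, 8, 8, 11, 20]))
```

For the set of 8 used earlier, the fences are 4 - 8.25 = -4.25 and 9.5 + 8.25 = 17.75, so only 20 is flagged.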

Other percentiles: 
70th Percentile: 70% of observations are at or below the 70th percentile.  M&M give a quick & dirty method at the end of example 1.14, p. 45:  Take 0.70×n, round; count to that item.  (More exact methods exist, but none is universally accepted.  The practical differences are small.)
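
The quick & dirty percentile method might be sketched like this in Python. The function name is hypothetical, and two details are assumptions not stated in the notes: ties at exactly .5 are rounded up, and the position is kept between 1 and n.

```python
def quick_percentile(observations, percent):
    """Quick method: take (percent/100) * n, round, count to that item.

    `percent` is a whole number such as 70. Rounding half up and clamping
    the position to 1..n are assumptions made for this sketch.
    """
    ordered = sorted(observations)
    n = len(ordered)
    position = (percent * n + 50) // 100    # percent/100 * n, rounded half up
    position = max(1, min(n, position))     # stay inside the data
    return ordered[position - 1]            # count to that item (1-based)


data = [1, 3, 5, 6, 8, 8, 11, 20]
print(quick_percentile(data, 70))           # 0.70 * 8 = 5.6, rounds to the 6th item
```

Here 0.70 × 8 = 5.6 rounds to 6, and the 6th item in order is 8.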

NEXT CLASS: Standard deviation and variance:  I'll expect you to memorize the formula, and to be able to calculate this by hand for up to 7 numbers.
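
As a preview, here is a sketch of the by-hand calculation in Python, so you can check your arithmetic. It assumes the usual sample formulas with n - 1 in the denominator: variance = (sum of squared deviations from xbar) / (n - 1), and s = square root of the variance. The function name and the 7 numbers are made up for illustration.

```python
from math import sqrt

def variance_and_sd(observations):
    """Sample variance and standard deviation, using the n - 1 divisor."""
    n = len(observations)
    xbar = sum(observations) / n                       # the mean
    squared_deviations = [(x - xbar) ** 2 for x in observations]
    variance = sum(squared_deviations) / (n - 1)       # divide by n - 1, not n
    return variance, sqrt(variance)


# Hypothetical data: 7 numbers, mean 4, squared deviations sum to 28
variance, sd = variance_and_sd([1, 2, 3, 4, 5, 6, 7])
print(round(variance, 3), round(sd, 3))
```

For these numbers the variance is 28/6, about 4.667, and s is about 2.160.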


Sievers home  Math251-Fall05/Dayps3.htm    10pm    8/30/05
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.