MATH 251, Probability and Statistics I, Fall 2007, Wed. Aug.29, Day 3

Hit reload for most current version

Unless otherwise noted, all assignments are in Moore & McCabe,"IPS"
Italics are notes to myself--which problem is it?

Day 3 (Wed. Aug.29) Assigned:
   (Re) Read:  1.2 thru p. 53.  Read for next class: pp. 53-55 (linear transformations).  We'll also start 1.3, Normal distributions.
Hand in Friday:  p. 56 ff.
With Applet http://www.whfreeman.com/ips5e: 1.55, 56, 57
1.48 (0's effect)
 1.41 (tuitions, boxplot)
1.62, 63, 64 (income, cf boxplots)
1.60 (logging)  Make side-by-side stemplots, 5-number summaries, and side-by-side boxplots. Discuss.

P. 63, Read 1.71, 1.77 (guinea pigs, trimmed mean, cf. 1.36) 
      Make a boxplot of the data, with or without outliers (you choose).   We'll do trimmed means in SPSS, soon.

1.75 quintiles by hand. Method p. 45 middle

Can do now:  Save to be part of Day 4:
p. 58, 1.50 (Do xbar and s by hand.  Then put them in SPSS Handout & do them.)
Read, discuss 
1.47 & 1.49 (salary)
A.  If a distribution is skewed right,
the mean will be on the /right?/left?/ of the median. (Check with the Mean&Median Applet)
p. 95, 1.131 (mode, median)
p.  95, 1.133
B.  Forbes magazine reported (1995) that the "average" household wealth of its readers was either about $800,000, or $2.2 million, depending on what "average" it used.  Which is mean/median?
Optional

Guinea pigs, 1.36, Table 1.8 Use  http://www.whfreeman.com/ips5e
One Variable Statistical Calculator Applet
to get the histogram.   Compare boxplot (HW) with histogram:  longer boxplot sections mean lower histogram height and vice versa.

Handout SPSS mean and standard deviation quickly.
Homework questions? (Homework is to be handed in Friday.  Sorry for the confusion.) #s on board:
Note p. 53, fig. 1.20 shows a stemplot with negative numbers.  Need two "0" stems!

Section 1.2:  Summarizing distribution info with numbers

Measures of Middle (central tendency)
        --Colloquially "average" can refer to any measure of middle, so watch out; be more specific.
    Mean (most common "average"), "x-bar":  Take the sum (aggregate) of all observations and divide by how many there are (n).  (Formula p. 41)
        Metaphors.  1) Center of gravity, balance point of histogram.
                2) Slice off bits from the big and add to the little till everyone has the same.
                    (Or "aggregate"--total-- it all and portion it out evenly.)
        Outlier or long tail will pull mean in that direction (think seesaw balancing)  "Sensitive" to outliers, skewness.
        Especially useful: 1) For symmetric, tidy distributions
            2) When metaphor 2 makes sense--looking for "fair share" of a total.
    Median: half are bigger, half are smaller
        Point on histogram with half the area to the left, half to the right.
        Calculating:  Put observations in numerical order (stemplot!).
                          Middle one if n is odd, or average the 2 middle  if n is even.
                Formula:  Count in how far?  (n+1)/2 places.  (7 1/2 places? Go halfway: average the 7th and 8th observations.)
        "Resistant to skewness and outliers"--trimming off ends will make little difference in median value
               --changing a few values has little effect on the measure.

        More "typical" than mean, if there is skewness or outliers.
     (Badly bimodal distribution--"middle" doesn't mean much.)
    Symmetric distribution: mean = median
Website (http://www.whfreeman.com/ips5e) or CD:  "Statistical Applets", Mean & Median. Check out symmetric distributions, skewed distributions, and distributions with outliers.
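To see the "sensitive" vs. "resistant" contrast without the applet, here is a minimal Python sketch. The two data sets are made up for illustration (they are not from IPS), and `median_by_hand` is just a name chosen here for the "count in (n+1)/2 places" rule described above.

```python
from statistics import mean

def median_by_hand(values):
    """Median by the 'count in (n+1)/2 places' rule from these notes."""
    ordered = sorted(values)          # put observations in numerical order
    n = len(ordered)
    if n % 2 == 1:
        # odd n: (n+1)/2 is a whole number, take that observation
        return ordered[(n + 1) // 2 - 1]
    # even n: (n+1)/2 ends in 1/2, so average the two middle observations
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Hypothetical data: a tidy symmetric set, then the same set with one huge value
tidy = [30, 32, 35, 38, 40]
with_outlier = [30, 32, 35, 38, 400]

print("tidy:        ", mean(tidy), median_by_hand(tidy))
print("with outlier:", mean(with_outlier), median_by_hand(with_outlier))
```

In the tidy set the mean and median agree at 35. Changing one value to 400 drags the mean up to 107 while the median stays at 35.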

Measures of Spread (dispersion)
Quartiles:  Q1 = 25th percentile, Q3 = 75th percentile.  This quick-and-dirty method is from Tukey, who called them "Hinges".
   Just take the median of each "half" of the data.
   (Detail.  IF you had an even number of observations to start with, the data divides evenly into an upper and a lower half.  IF you had an odd number to start with, you have one in the middle, the median. In this case only, you throw the median away, and use the remaining halves.)
1 3 5 6 8 8 11 20 are n=8 observations.
    Median at (8+1)/2 = 9/2 = 4 1/2th place:  1 3 5 6 | 8 8 11 20,  M = (6+8)/2 = 7
    8/2 = 4 in each half: Halves are 1 3 5 6, and 8 8 11 20.  The quartiles are the medians of each half; count in (4+1)/2 = 2 1/2.
    1 3 | 5 6:  Q1 = (3+5)/2 = 4.
    8 8 | 11 20:  Q3 = (8+11)/2 = 9.5
    Altogether:  1 3 | 5 6 | 8 8 | 11 20

1 3 5 6 6 8 8 11 20 are n=9 observations.
     Median at (9+1)/2 = 10/2 = 5th place:  1 3 5 6 6 8 8 11 20,  M = 6
     Throw away the median.  Now we have an even number again, 8 numbers.
     8/2 = 4 in each half: Halves are 1 3 5 6, and 8 8 11 20.  Continue as before. (This is a dirty method because it gives the same quartiles for both these data sets.  Quick because computation is minimal and simple.)
     1 3 | 5 6 6 8 8 | 11 20
Annoying detail:  Some books do this but (odd n only) keep the middle value with each half: then halves are 1 3 5 6 6, and 6 8 8 11 20.   Do it Moore's way this term, please.
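The quartile recipe above (Moore's way: for odd n, throw the median away before taking halves) can be sketched in Python as a check on hand work. The function names here are made up for this sketch. Statistical software may use a different quartile rule and give slightly different answers.

```python
def median_of(ordered):
    """Median of a list that is already in numerical order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def moore_quartiles(values):
    """Return (Q1, M, Q3), dropping the median from the halves when n is odd."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2                     # size of each half
    lower = ordered[:half]            # the lowest n//2 observations
    upper = ordered[n - half:]        # the highest n//2 observations
    return median_of(lower), median_of(ordered), median_of(upper)

set_of_8 = [1, 3, 5, 6, 8, 8, 11, 20]
set_of_9 = [1, 3, 5, 6, 6, 8, 8, 11, 20]

print(moore_quartiles(set_of_8))      # Q1 = 4, M = 7, Q3 = 9.5
print(moore_quartiles(set_of_9))      # Q1 = 4, M = 6, Q3 = 9.5
```

Both sets give the same Q1 and Q3, which is the "dirty" part of the method noted above.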

Five-number summary:  min, Q1, Median, Q3, max.  (1, 4, 7, 9.5, 20  for the set of 8 above)
INTERQUARTILE RANGE = IQR= Q3 - Q1. (9.5 - 4 = 5.5 for both sets above)
      =The range of the middle half of the observations.  Resistant to outliers!
Box (and whisker) plot:  Graphical form of five number summary.
    Especially good for comparing sets of data.

Boxplots, modified, showing outliers as dots.  The outlier rule p. 47 is good to know about but don't bother to memorize it.  If you're doing a boxplot by hand just use your judgment about what's a suspected outlier.
"Plain vanilla--Moore"
Draw and label the numerical scale first.  Then mark the five numbers. Finish the picture.

The box spreads over the middle half (Q1 to Q3), the whiskers over the lowest and highest quarters (Min to Q1, Q3 to Max).  Each section shows the spread of 1/4 of the data: the longer the section the thinner the data must be spread in there.   Can "read" skewness.
Demonstration with set of 9. 1 3 | 5 6 6 8 8 | 11 20    5#summ: 1, 4, 6, 9.5, 20
Direction of boxplot?  Vertical or horizontal is a matter of taste. I do horizontal, usually.

  |-----[   |      ]--------------------|
0·········5·········10········15········20

Showing outliers p.47ff  Outliers can make a boxplot whisker extend deceptively beyond the bulk of the data.
      Make the whiskers to the last item in the "main mass" of the data.
       Put a dot or a star for each outlier,  beyond the whisker end.
   How do we decide what's an outlier?  By hand; use your judgement.
     (Rule of thumb, p. 48: knowing the rule is optional; it is what computers use.)  Define "outlier" as a value farther out than 1.5 IQR from the quartiles.
          (Q1 - 1.5 IQR is lower "fence", Q3 + 1.5 IQR is upper "fence".)
                For the set of 9, 1.5 IQR = 1.5×5.5 = 8.25. Fences are 4 - 8.25 = -4.25, and 9.5 + 8.25 = 17.75.
                   So 20 lies outside the fence, and the whiskers & box should go from 1 to 11 (largest inside the fence).
        (Dot or *?  Tukey:  Dot ·between 1.5 and 3 IQR's out, * if more than 3 IQR's out. By hand, I don't care. Here a * because it shows better.)

  |-----[   |      ]--|                 *
0·········5·········10········15········20 
   This is the same as we would have done without the rule, probably.
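For anyone who wants to check the fence arithmetic, here is a short Python sketch of the 1.5 IQR rule applied to the set of 9. The quartiles Q1 = 4 and Q3 = 9.5 are typed in from the hand calculation rather than computed, and the variable names are chosen just for this sketch.

```python
data = [1, 3, 5, 6, 6, 8, 8, 11, 20]   # the set of 9
q1, q3 = 4, 9.5                        # quartiles found by hand (Moore's way)

iqr = q3 - q1                          # 5.5
step = 1.5 * iqr                       # 8.25
lower_fence = q1 - step                # 4 - 8.25 = -4.25
upper_fence = q3 + step                # 9.5 + 8.25 = 17.75

# Outliers lie beyond a fence; whiskers stop at the last value inside the fences
outliers = [x for x in data if x < lower_fence or x > upper_fence]
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_ends = (min(inside), max(inside))

print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
print("whiskers run from", whisker_ends[0], "to", whisker_ends[1])
```

Only 20 falls outside the fences, so the upper whisker stops at 11 and 20 gets its own symbol.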

Other percentiles:  70th Percentile: 70% of observations are at or below the 70th Percentile.   M&M give a quick & dirty method at the end of example 1.14, p. 45:  Take 0.70×n, round; count to that item.  (More exact methods exist, but there is not universal acceptance of any.    The practical differences are small. )
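A minimal Python sketch of the quick method ("take 0.70×n, round, count to that item"). Two things here are assumptions of the sketch, not something the text specifies: a fraction of exactly one half rounds up, and the position is kept between 1 and n. The function name is made up.

```python
def quick_percentile(values, percent):
    """Quick-and-dirty percentile: round (percent/100)*n, count in that far.

    `percent` is a whole number such as 70. Halves round up (an assumption).
    """
    ordered = sorted(values)
    n = len(ordered)
    # Integer arithmetic avoids floating-point surprises when rounding
    position = (percent * n + 50) // 100
    position = max(1, min(n, position))   # stay inside the data (assumption)
    return ordered[position - 1]          # lists count from 0, people from 1

set_of_9 = [1, 3, 5, 6, 6, 8, 8, 11, 20]

# 0.70 * 9 = 6.3, which rounds to 6; the 6th observation in order is 8
print(quick_percentile(set_of_9, 70))
```

Other software will often interpolate between observations instead, so expect small differences.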

Spread, cont.
Standard deviation (goes with mean)
           Variance:  (almost) average of squared deviations from the mean.
                  (deviations sum to 0)
                 (Divide by (n-1), the "degrees of freedom": since the deviations sum to 0, only n-1 of them are free to vary. This is the dimension of the vector space spanned by the deviations from the mean.)
       s : Standard deviation  is the square root of the variance.  Formula p. 49-50.
                Computation:  I will require you to know how to do it by hand for up to 7 observations (use a table). Example.
             Physics: angular momentum (spinning ice skater)
             Not so weird: High school geometry?
                Remember the Pythagorean theorem: c² = a² + b², so c = √(a² + b²):
                the hypotenuse of a right triangle is also the square root of a sum of squares. Length of a vector.
        Very sensitive to outliers (squared deviations do it)
        s > 0 unless all observations are identical (then s = 0).
     Mean/standard deviation pair useful for symmetric, unimodal (one-humped) distributions with no outliers. ("Normal" dist.)
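The by-hand table for s can be mimicked in Python as a way to check your work. The five observations below are made up for illustration (chosen so the numbers come out evenly) and are not from the text.

```python
import math

data = [3, 3, 5, 7, 7]                 # hypothetical observations
n = len(data)
xbar = sum(data) / n                   # mean: 25 / 5 = 5

# One row per observation: value, deviation from the mean, squared deviation
deviations = [x - xbar for x in data]
squared = [d ** 2 for d in deviations]

print("  x   x - xbar   (x - xbar)^2")
for x, d, sq in zip(data, deviations, squared):
    print(f"{x:3d}   {d:8.1f}   {sq:12.1f}")

print("sum of deviations:", sum(deviations))     # always 0, a handy check
variance = sum(squared) / (n - 1)                # divide by n-1, not n
s = math.sqrt(variance)                          # standard deviation
print("variance:", variance, " s:", s)
```

Here the squared deviations sum to 16, so the variance is 16/4 = 4 and s = 2.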
SPSS to find mean and s.d.   Handout


Sievers home  Math251-Fall07/Day2s3.htm    6pm    8/28/07
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.