Math 151 , Spring 2006, Day 6  Fri. Feb. 8 Hit reload...After class

What's due today?  The Chapter 3 HW (Day 4's assignment).
Day 5's assignments (SPSS) are due on or before Wednesday.  Hand them in as you finish them. You already knew all the statistics except for the added questions in #E.  If you read the book and the handout (page 3) you can get the desired numbers off the output without thinking much about what they mean. (IQR = InterQuartile Range).  Write the discussion after we cover the things in class.

Errors in my book: Ch4p43 middle; "for men...a narrower broader peak..."
    Ch5p65top: Needs square root sign over formula (later printings are ok(?))
Day 6 (Wed. Sept 7): Reading:  D&V Ch5, AS Ch5 (D&V, and I, will do medians, quartiles, boxplots first, then mean/s.d.  AS does middles, then spreads, then boxplots.)
Hand in Monday (bring any remaining questions on these; I'll review boxplots.  I don't care if you don't know the "fences" rule--just draw a "whisker" line from each quartile all the way to the min or max.)
Ch5, p. 72 
#3,  also make a boxplot.  ("No calculator" means no statistical calculator) 
15 Wines
16 Ozone (note, this is a sort of "time plot" using boxplots instead of dots)
28 Population growth 
p. 107 (review) 18&19 Old Faithful
- - - - - - - - - - 
Postpone Mean/Median. 
p. 72, 7a,b,c, Payroll.  Also, with c: What measure would be most useful if you wanted to use it to figure the total weekly payroll cost? 
6 Sick days

+ + + + + + + + + + 
Start now on a separate page, do the parts that you can; keep for the next assignment :  p. 72
19, 20 (no computations needed.  19 d may not be decidable from pictures.  Don't worry about it.)
5 Mistake 
9 Standard deviation Tonight, make (by hand) dot plots of each pair on axes with the same unit size, find the mean of each set and mark it with a little ^ (like fig. 5.6 p. 64).  Notice this looks like a good balance point. Leave space to calculate  some standard deviations next time.  Also, make a dot plot of  #10b set 2 (10, 50, 60, 70, 110).  Which of the data sets in problem 9 does it most resemble?
Read, be able to discuss:
Read Circle questions: email me any more: sievers@wells.edu
Ch.5: 25 Caffeine
41 Eye & Hair color
31 Reading scores (f is harder; optional)
- - - - - - - - - - - 
Postpone:
http://www.whfreeman.com/scc   or http://bcs.whfreeman.com/ips  Under Student Categories or Student tools,  choose "Statistical Applets", Mean & Median. (50 points max.) Check out symmetric, skewed, distributions with outliers. How far apart can you get the mean and median? 

13 Marriage age.  Ithaca Journal Jan 22, '05 had quiz answers: "How old is the average bride? 24.5 years.... How old is the average groom? 26.5 years." Give some reasons that could account for the big difference between these numbers and the graphed numbers in D&V.

Optional 
ActivStats  lessons on SPSS, in Mac 102: 
on AS pp.1-2, 3-1, 3-2, are a gentle introduction (using raw data).  4-2, 4-3 do continuous data.
Email list: Math151@wells.edu   Math clinic schedule:  From Helpers page.
Cluster:  Tell everyone your name, even if you think they know it.
   Check for Homework questions? Remaining #s on board.
SPSS problems?  Don't postpone...
Leftover HW questions?   What did you see, comparing the speed of your hands?    List of Circle questions.

Two way table questions?
  
Error in description

Ch.5 Summarizing distribution info with numbers 

Measures of middle (center)
        --Colloquially "average" can refer to any measure of middle, so watch out; be more specific.
   Mean (most common "average"):  Take sum of all observations & divide by how many (n) p. 63
    (Midrange:  Average the maximum & minimum values.  Very sensitive to outliers.)
  Median: half are bigger, half are smaller
      Point on histogram with half the area to the left, half to the right.

Calculating:  Put observations in numerical order (stemplot!).
      Middle one if n is odd, or average the 2 middle  if n is even.
Formula:  Count in how far?  (n+1)/2 places.  (7 1/2 places? Go halfway: average the 7th and 8th observations. Book's method, p.58, is more complicated, same result.)
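For checking hand work, here is a minimal Python sketch of the "count in (n+1)/2 places" rule, with the mean and midrange alongside. The function names are made up for illustration, and the two data sets are the small sets of 8 and 9 used in the quartile examples in these notes. SPSS and the book may arrange the computation differently.

```python
from statistics import mean

def median_by_counting(values):
    """Median via the 'count in (n+1)/2 places' rule."""
    data = sorted(values)                 # put observations in numerical order
    n = len(data)
    position = (n + 1) / 2                # 1-based place to count in to; may end in .5
    lower = data[int(position) - 1]       # observation at the whole-number place
    if position == int(position):         # n odd: a single middle observation
        return lower
    upper = data[int(position)]           # n even: go halfway to the next one
    return (lower + upper) / 2

def midrange(values):
    """Average of the maximum and minimum values."""
    return (min(values) + max(values)) / 2

set_of_8 = [1, 3, 5, 6, 8, 8, 11, 20]
set_of_9 = [1, 3, 5, 6, 6, 8, 8, 11, 20]

print(mean(set_of_8))                 # 7.75
print(midrange(set_of_8))             # 10.5
print(median_by_counting(set_of_8))   # 7.0  (average of the 4th and 5th)
print(median_by_counting(set_of_9))   # 6    (the 5th observation)
```

Notice how far the midrange (10.5) sits from the median (7): the single large value 20 drags it upward.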
Spread (dispersion)
   (Standard Deviation s, p. 64.  Next lecture.)
  Range:  Max - Min.  (one number)  Very sensitive to outliers.
  Interquartile range IQR.
     Quartiles Divide data into quarters: 1st quartile Q1: 1/4 below, 3/4 above. = 25th percentile.
             (2nd quartile= median = 50th percentile.  Percentiles divide into hundredths)
        3rd quartile Q3: 3/4 below, 1/4 above.  = 75th percentile.
Computation of quartiles:  Different texts, packages use different methods.
By hand: quick and somewhat dirty:
Take the two halves of the data you got from finding the median.  Find the median of each half, using the same rule as before.  (Detail.  IF you had an even number of observations to start with, the data divides evenly into an upper and a lower half.  IF you had an odd number to start with, you have one in the middle, the median. In this case only, you use the median as part of both halves)
1 3 5 6 8 8 11 20, are n=8 observations.
    Median at (8+1)/2 = 9/2 = 4 1/2 places in: 1 3 5 6 | 8 8 11 20, M = (6+8)/2 = 7
 8/2 = 4 in each half: Halves are 1 3 5 6, and 8 8 11 20.  The quartiles are the medians of each half; count in (4+1)/2 = 2 1/2.
 
1 3 | 5 6, Q1 = (3+5)/2 = 4.       8 8 | 11 20, Q3 = (8+11)/2 = 9.5.              1 3 | 5 6 | 8 8 | 11 20

1 3 5 6 6 8 8 11 20, are n=9 observations.
     Median at (9+1)/2=10/2=5th ; 1 3 5 6 6 8 8 11 20, M = 6
  The median joins both halves. Each half has (n+1)/2 values.
(9+1)/2 = 5 in each half: Halves are 1 3 5 6 6, and 6 8 8 11 20.  Quartiles are middle values of each half.
Q1=5, Q3= 8                                                                      1 3 5 6 6 8 8 11 20
(This is a dirty method because it doesn't "exactly" divide the data into quarters.  Quick? Yes.  Tukey did a variation on this, throwing away the median instead of giving it to each half.  He called them "Hinges" to avoid fights over the "quartile" name.  People who took the course out of Moore, Basic Practice, a year and a half ago, learned that method.)
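The quick-and-dirty quartile rule above can be sketched in Python as follows. This is only an illustration of the hand method in these notes, not what SPSS or any particular package does. The function names and the share_median switch are made up; setting the switch to False gives the variation that throws the median away instead of giving it to each half.

```python
def median_sorted(data):
    """Median of a list that is already in numerical order."""
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                      # odd count: the middle one
    return (data[mid - 1] + data[mid]) / 2    # even count: average the 2 middle

def quartiles_by_hand(values, share_median=True):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 0:
        lower, upper = data[:half], data[half:]          # splits evenly
    elif share_median:
        lower, upper = data[:half + 1], data[half:]      # median joins both halves
    else:
        lower, upper = data[:half], data[half + 1:]      # median thrown away
    return median_sorted(lower), median_sorted(upper)

set_of_8 = [1, 3, 5, 6, 8, 8, 11, 20]
set_of_9 = [1, 3, 5, 6, 6, 8, 8, 11, 20]

print(quartiles_by_hand(set_of_8))                      # (4.0, 9.5)
print(quartiles_by_hand(set_of_9))                      # (5, 8)
print(quartiles_by_hand(set_of_9, share_median=False))  # (4.0, 9.5)
```

For the set of 9 the two variations disagree (5 and 8 versus 4 and 9.5), which is the sort of difference to expect between texts and packages.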

Read the following:  I'll review Monday.
Five-number summary:
  min, Q1, Median, Q3, max.

     (1, 4, 7, 9.5, 20  for the set of  8 above, 1, 5, 6, 8, 20  for the set of  9 )
INTERQUARTILE RANGE = IQR= Q3 - Q1.
=The range of the middle half of the observations.  Resistant to outliers!
       9.5 - 4 = 5.5 for the set of 8.   8 - 5 = 3 for the set of 9.
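To check the numbers above by computer, here is a small Python sketch of the five-number summary, range, and IQR. It uses the quick by-hand quartile rule from these notes (for odd n the median is counted in both halves); the function names are hypothetical, and statistical packages may report slightly different quartiles.

```python
def _median(data):
    """Median of a list already in numerical order."""
    n = len(data)
    mid = n // 2
    return data[mid] if n % 2 == 1 else (data[mid - 1] + data[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the by-hand quartile rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half + n % 2]   # for odd n the median joins the lower half
    upper = data[half:]           # ...and the upper half as well
    return (data[0], _median(lower), _median(data), _median(upper), data[-1])

for data in ([1, 3, 5, 6, 8, 8, 11, 20], [1, 3, 5, 6, 6, 8, 8, 11, 20]):
    low, q1, med, q3, high = five_number_summary(data)
    print((low, q1, med, q3, high), "range =", high - low, "IQR =", q3 - q1)
# (1, 4.0, 7.0, 9.5, 20) range = 19 IQR = 5.5
# (1, 5, 6, 8, 20) range = 19 IQR = 3
```

Both sets have the same range, 19, because of the 20, while the IQRs (5.5 and 3) describe only the middle half.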
Box (and whisker) plot:  Graphical form of five number summary.
    Especially good for comparing sets of data, conditioned on a categorical variable.
"Plain vanilla" Draw and label the numerical scale first.  Then mark the five numbers. Finish the picture.
The box spreads over the middle half (Q1 to Q3), the whiskers over the lowest and highest quarters (Min to Q1, Q3 to Max).  Each section shows the spread of 1/4 of the data: the longer the section, the more thinly the data must be spread in there.
Demonstration with set of 9. 1 3 5 6 6 8 8 11 20  Direction of boxplot?  Vertical or horizontal is a matter of taste. I do horizontal, usually.

  |-------[ |   ]-----------------------|
0·········5········10········15·········20

"Showing outliers" Outliers can make a boxplot whisker extend deceptively beyond the bulk of the data.
      Make the whiskers to the last item in the "main mass" of the data.
       Put a dot or a star for each outlier,  beyond the whisker end.
   How do we decide what's an outlier?  (Rule of thumb; esp. for computers.)
      Fence:  (Knowing rule is optional) Define "outlier" as a value farther out than 1.5 IQR  from the Quartiles.
          (Q1 - 1.5 IQR is lower fence, Q3 + 1.5 IQR is upper fence.)
                For the set of 9, 1.5 IQR = 4.5.  Fences are 5 - 4.5 = .5, and 8 + 4.5 = 12.5.
                   So 20 lies outside the fence, and the whiskers & box  should go from 1 to 11 (largest inside the fence)
        (Dot or *?  Tukey:  Dot ·between 1.5 and 3 IQR's out, * if more than 3 IQR's out. By hand, I don't care.)

  |-------[ |   ]-----|                 *

0·········5········10········15·········20
Boxplots shine at comparing distributions conditioned on several categories .
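The fence rule for outliers (optional, as noted above) can be sketched in Python like this. The function name and the returned dictionary are made up for illustration. Q1 and Q3 are passed in, since different texts and packages compute them differently. The dot/star marking follows the 1.5 and 3 IQR convention described above.

```python
def fence_report(values, q1, q3):
    """Apply the 1.5 IQR fence rule, given quartiles q1 and q3."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations still inside the fences.
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    outliers = []
    for x in values:
        if x < lower_fence or x > upper_fence:
            # '*' if more than 3 IQRs beyond the quartile, '.' otherwise
            far_out = x < q1 - 3 * iqr or x > q3 + 3 * iqr
            outliers.append((x, "*" if far_out else "."))
    return {
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }

set_of_9 = [1, 3, 5, 6, 6, 8, 8, 11, 20]
report = fence_report(set_of_9, q1=5, q3=8)
print(report["fences"])     # (0.5, 12.5)
print(report["whiskers"])   # (1, 11)
print(report["outliers"])   # [(20, '*')]
```

The 20 gets a star because it is more than 3 IQRs (9 units) above Q3 = 8.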

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
New material Monday:
Mean vs. Median
Mean (most common "average"):  Take sum (aggregate) of all observations & divide by how many (n)
        Metaphors.  1) Center of gravity, balance point of histogram.
                2) Slice off bits from the big and add to the little till everyone has the same.
                    (Or "aggregate"--total-- it all and portion it out evenly.)
        Outlier or long tail will pull mean in that direction (think seesaw balancing)  "Sensitive" to outliers, skewness.
        Especially useful: 1) For symmetric, tidy distributions
            2) When metaphor 2 makes sense--looking for "fair share" of a total.
    Median: half are bigger, half are smaller
        Point on histogram with half the area to the left, half to the right.
                "Resistant to skewness and outliers"--trimming off ends will make little difference in median value.
        More "typical" than mean, if there is skewness or outliers.
     (Badly bimodal distribution?--"middle" doesn't mean much. Give values at modes.
         Extremely skewed or J-shaped?  Mode (value at peak) might better tell most typical)
    Symmetric distribution: mean = median.  Skewed: mean pulled to long-tail side of median.
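A small Python illustration of "sensitive" versus "resistant", using the set of 9 from the quartile examples (1 3 5 6 6 8 8 11 20). It is only a sketch with made-up variable names: compare the mean and median of the full set, of the set with the one large value removed, and of the set with one value trimmed from each end.

```python
from statistics import mean, median

data = [1, 3, 5, 6, 6, 8, 8, 11, 20]
without_20 = [x for x in data if x != 20]   # drop the one large value
trimmed = sorted(data)[1:-1]                # trim one value off each end

for label, values in [("all 9", data),
                      ("without 20", without_20),
                      ("ends trimmed", trimmed)]:
    print(label, "mean =", round(mean(values), 2), "median =", median(values))
```

The mean moves from about 7.56 down to 6 when the 20 is dropped, while the median stays at 6 in all three cases. In the full set the mean sits above the median, on the long-tail side.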
Investigate differences:  Activstats 5-3.   See ActivStats&SPSS Info for details of use.
David S. Moore's websites http://www.whfreeman.com/scc or http://bcs.whfreeman.com/ips5e.  Under Student Categories or Student tools,  choose "Statistical Applets", Mean & Median. Check out symmetric, skewed, distributions with outliers.

Next, Standard Deviation

Sievers home  Math151-Sp06/Daysp6.htm  2:30pm 2/10/06
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.