Math 151, Fall '08, Day 3, Wed., Sept. 3.  After class, Th. 1:00.  Hit reload to get the most current version.

HW assignment  Day3  (From Moore unless otherwise noted.)

Reading:  Finish Ch.1, + stemplot  handout, "Check" problems p. 24 1.14, 16 thru 22.
Timeplot, p.22-3.  You will need to be able to recognize cycles and trends, not make a timeplot by hand. (We'll make them in SPSS later.)
Read Ch.2 thru p. 43, then thru p. 47.  Do "check" p. 56, 2.13, 14, 16 (mean/median)  15,17,18 (5#summary/boxplot) Further: Finish Ch. 2.
Do the means and medians required here by hand (with a calculator).  

Hand in Friday
p.14, 1.7 Revisiting histogram bins: Use the applet at  http://www.whfreeman.com/bps4e, as in class last time.  Help.

p. 35, 1.43 Orange prices timeplot
- - - - -
p.39, 2.1 Wood, mean Punch the 20 actual values into your calculator, adding and dividing by 20. 
A.  Find the (approximate) median for the data on Wood breakage, using the numbers in the stemplot on p. 21 (Fig.1.10.)  It's approximate because the stemplot data is rounded--Quick and Dirty is often sufficient!  Keep a copy for #2.5, which will be assigned soon.

p. 41, 2.4  Bonds Home runs Make a stemplot to put the numbers in order to find the medians.  For the means, just punch them in. (You can shorten the work by finding the sum of the 18 years excluding the 73, writing that down, and then adding the 73 to get the total for the 19 years.  Then divide the appropriate sums by 19 and 18.)
A.  Using the Handout: Wages in our region:  How are the occupations organized (Clearly not alphabetically). For what occupations is the wage scale probably skewed left?  (Marker:  mean is less than median.)   For what other occupations is the skewness not too extreme (Mean no more than about $1000 greater than median)?  
p. 41, 2.3,  p. 57, 2.23, 2.24  mean or median?

= = = Postpone the rest= = = = =
p. 58, 2.29 fruit eating
p. 58, 2.30 newborns.  (I said I wouldn't make you make a histogram, but the data's already pre-binned, so do it here.) Also Describe the distribution--symmetric, skewed?
p. 58, 2.28 U. endowments.  They mean: how far do you have to count in, in the ordered list, to locate the median and quartiles?

p. 59, 2.34 guinea pigs survival:  For a) use the One Variable Statistical Calculator Applet at  http://bcs.whfreeman.com/bps4e   or on your text's CD (If you have an older, used book, it may be  in the datasets as if for BPS3e; ex02-23.dat).  Just observe the skewness.  For b), find the 5-number summary (easy since they're in order in the book), check your answers with the Applet results.  Draw the boxplot and compare with the histogram on your screen.  (with or without outliers, I don't care.)
p. 45, 2.5 Wood again. Go ahead and use the stemplot figures to find the quartiles.  Also make a boxplot.
p.58, 2.27 Flower length: Find the 5-number summary for bihai, from the stemplot p. 55. If you want more practice, do the other 2 by hand also, but you may just use the numbers from the answers in the back of the book.  Use them to make 3 side by side boxplots, and finish the problem as written.

"Read," to discuss (be able to answer in class)

p.34,1.40 coins (skewed left)

- - - -
Ch. 2
p. 57-8, 2.25 Dr's salaries.  Look at the answers in the back for the mean and median.
 
 

p. 58, 2.26 Resistance, with Applet
 http://bcs.whfreeman.com/bps4e   Or use CD from book.  Choose "Statistical Applets", Mean & Median. Also, add more points (up to 50 total). Check out symmetric and skewed distributions, and distributions with outliers.

Optional 

 P. 59, 2.32 (mean/median play, with Applet)
p. 63, 2.42, 43 (more play, with pencil)

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Introduce yourself to at least 2 other people in the class.  Check HW with neighbors.  Tell me which to discuss!
Sign in.  Note Class members is now up.  Check that you're correct.
Handing back Pretests
MathClinic (Mac 120) hours: Helpers

Turning in HW out of class:  NOT Campus Mail!  Into 151 box outside my door, into yellow folder if it's there.
   (Other papers--for me--under my door, please!)

Ch. 1, Review.
Data:  Numbers (usually) in context:  What, Who (how many), Why?  When and Where? How?
            When?  Class has changed already since this compilation:  ../StudatF08.xls
Distribution of one variable:  what values, how many (or what proportion) of each.
Graphical summaries of data: Area represents proportion.
       Quantitative: Shape: symmetric, or skewed (think smeared, or sliding) right or left.
               (&& bell-curve (Ch. 3); J-shaped is really skewed (fig. 1.15a, p. 31).)
             Humps:  uni-, bi-, or multi-modal.  (Two peaks = two "causes"?)   Outliers (giraffe with the zebras?)
              Center, spread--rough for now; specific measures next.  Hand around:  "Living Histograms"
   Lurking variable:  One that affects your data but perhaps you didn't think/know to measure!
             (pulse rate--running, stairs, nervousness, sex?  Height--sex)   Missing data? (why?)

HW questions? (nonstemplot) Day2 

Stemplots (Stem-and-Leaf) are a powerful hand tool.  Tally, with value added.
     !!Unordered first,!! then ordered if necessary.  By tens, then split?  
         Splitting stem 5 ways--2 leaves on each:  optional labels *  t  f  s  .
               0-1 * (start).  2-3 t (two three). 4-5 f (four five).  6-7 s (six seven).   8-9 . ( period/end)
         Truncate is faster! (corresponds to bin edges at "whole numbers")
        Back to back, comparing two groups.   (or side-by-side on same scale, cf. p55 fig. 2.5):
                 
       ../StudatSp08plus.xls (scroll down)
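
For anyone who wants to check a hand stemplot by machine, here is a minimal Python sketch. It is not from the text: the function name stemplot, the leaf_unit parameter, and the height values are all made up for illustration. It truncates rather than rounds, as recommended above, assumes nonnegative data, and goes straight to the ordered version of the plot.

```python
from collections import defaultdict

def stemplot(values, leaf_unit=1):
    """Return the lines of an ordered stemplot.

    Values are truncated (not rounded) to the leaf unit.
    Assumes nonnegative data.
    """
    stems = defaultdict(list)
    for v in values:
        t = int(v / leaf_unit)         # truncate: drop the extra digits
        stems[t // 10].append(t % 10)  # stem = tens and up, leaf = last digit
    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(d) for d in sorted(stems.get(stem, [])))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# Hypothetical heights in inches (made up, not the class data).
heights = [61.5, 64.2, 66.9, 70.1, 62.0, 68.7, 65.5, 71.3, 59.8, 67.4]
for line in stemplot(heights):
    print(line)
```

By hand you would still tally the leaves unordered first; the sort here just skips that step.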
Data source? Lurking variables? 
Heights-- several past classes.   Variability happens.   Things settle down on average, BUT inferences are never certain.
HW questions? (stemplot)  
    Statistics will give us a language for talking about uncertainty.

Choosing a display (by hand)
    A dot plot (Day 2) is most useful for n = 3 to about 15-20, or when the data only fall on a few values (just stack the dots up).
    A stemplot is good for continuous data, smeared around; you can do 100 values in 3-5 minutes.


Time plot. (pp. 17-19) Time on horiz. axis, values on vertical.  trend? (general slope up or down). Cyclic?

  --Beware of extrapolation --predicting a time trend into the future.
  -- Research data: time, or order of taking measurements, is often a lurking variable.  Always do a time plot.

Ch. 2:  Summarizing distribution info with numbers
Measures of middle (central tendency)
        --Colloquially "average" can refer to any measure of middle, so watch out; be more specific.
    Mean (most common "average") "x-bar":  Take sum (aggregate) of all observations and divide by how many (n) . Formula p. 38. 
        Metaphors.  1) Center of gravity, balance point of histogram.
                2) Slice off bits from the big and add to the little till everyone has the same.
                    (Or "aggregate"--total-- it all and portion it out evenly.)
        Outlier or long tail will pull mean in that direction (think seesaw balancing)  "Sensitive" to outliers, skewness.
        Especially useful: 1) For symmetric, tidy distributions
            2) When metaphor 2 makes sense--looking for "fair share" of a total.
                    (1,1,2,4 cookies eaten by 4 people, mean = 2.   But 1,1,2,4, 12 (n=5):  mean =4.)
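
The cookie arithmetic above, as a small Python sketch (the function name mean and the variable names are just for illustration). It shows how one big value drags the mean toward it.

```python
def mean(xs):
    """Sum of all observations divided by how many there are (n)."""
    return sum(xs) / len(xs)

four = [1, 1, 2, 4]      # cookies eaten by 4 people
five = four + [12]       # a fifth person eats 12

print(mean(four))   # 2.0
print(mean(five))   # 4.0
```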

     Median: half are bigger, half are smaller
        Point on histogram with half the area to the left, half to the right.
        Calculating:  Put observations in numerical order (stemplot!).   
                    (For our hand calculations:  Accept the small variation caused by truncation or rounding in the stemplot.  (Quick and dirty!))
                          Middle one if n is odd, or average the 2 middle  if n is even.
                Formula:  Count in how far?  (n+1)/2 places.  ( 14 items--> 7 1/2 places? go halfway =average the 7th and 8th observations)                         

        "Resistant to skewness and outliers"--trimming off ends will make little difference in median value.
        More "typical" than mean, especially if there is skewness or outliers.
     (Badly bimodal  distribution--"middle" doesn't mean much.)
    Symmetric distribution: mean = median
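
A sketch of the "count in (n+1)/2 places" rule in Python (the function name median is illustrative, not from the text). Run on the cookie data from the mean example, it shows the median barely moving when the 12 is added, while the mean went from 2 to 4.

```python
def median(xs):
    """Median by counting in (n+1)/2 places from the low end."""
    s = sorted(xs)             # put observations in numerical order
    n = len(s)
    pos = (n + 1) / 2          # 1-based location; ends in .5 when n is even
    lo = int(pos)
    if pos == lo:              # n odd: the middle observation
        return s[lo - 1]
    return (s[lo - 1] + s[lo]) / 2   # n even: average the two middle ones

print(median([1, 1, 2, 4]))       # 1.5
print(median([1, 1, 2, 4, 12]))   # 2
```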
Author's website http://bcs.whfreeman.com/bps4e, or Applets on your CD.   "Statistical Applets", Mean & Median. Check out symmetric and skewed distributions, and distributions with outliers.
     Handout:  Wages in our region (the Southern Tier) 
START ABOUT HERE Friday:
Almost all income, wage, wealth data is skewed right. Less so if category very narrow (Sometimes higher pay goes only with a new job category in the same place--e.g. Food prep worker, manager of same. ).
&& Other "averages"/ middles:   (Many:  e.g. trimmed mean: throw away, say top and bottom 5%, take mean of rest.)
 Midrange:  Point midway on the ruler scale between smallest and largest:  Min = 5, Max = 15, Midrange = (5+15)/2= 10.
      Highly sensitive, non-resistant, not too useful, but quick!
 The Mode/modal class: (Mode: most "popular")  Group with the most individuals; peak of the histogram "curve".
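
A rough Python sketch of two of these other "middles". Everything here is illustrative: the data are made up to match Min = 5, Max = 15, and the trimmed mean drops a fixed count k from each end instead of a percentage, which is an assumption made to keep the example short.

```python
def midrange(xs):
    """Point midway between the smallest and largest observation."""
    return (min(xs) + max(xs)) / 2

def trimmed_mean(xs, k):
    """Mean after throwing away the k smallest and k largest values.

    k is a count, not a percentage (an assumption for this sketch).
    """
    s = sorted(xs)
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)

data = [5, 7, 8, 9, 15]        # made-up values with Min = 5, Max = 15

print(midrange(data))          # 10.0
print(trimmed_mean(data, 1))   # 8.0
```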


Measures of spread (dispersion, variability):  distributions with different spreads
    Range:  largest - smallest.   Resistant?  NO!  Two observations carry all the info; the rest could be anywhere.

Dot plots of 3 distributions, all with same range:

.        .
.        .
.        .
.        .
__________

..........
__________

    ..
    ..
.   ..   .
__________

We need measures of spread that will better take into account all the observations:
    Quartiles, five-number summaries, boxplot, InterQuartile Range.
    (Variance), Standard deviation.
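
To see the point with numbers, here is a small Python sketch. The three lists are my reading of the three dot plots above (positions 0 to 9 along the axis), so treat the exact values as an assumption. All three have the same range, but the standard deviation (introduced later) tells them apart.

```python
from statistics import stdev

# Assumed values read off the three dot plots (positions 0..9).
ends_only = [0] * 4 + [9] * 4              # everything piled at the two ends
even = list(range(10))                     # one dot at each position
clumped = [0, 4, 4, 4, 5, 5, 5, 9]         # most dots in the middle

for name, xs in [("ends_only", ends_only), ("even", even), ("clumped", clumped)]:
    data_range = max(xs) - min(xs)         # same for all three
    print(name, "range =", data_range, "st. dev. =", round(stdev(xs), 2))
```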

Quartiles Divide data into quarters: 1st quartile Q1: 1/4 below, 3/4 above. = 25th percentile.
             (2nd quartile= median = 50th percentile)
            3rd quartile Q3: 3/4 below, 1/4 above.  = 75th percentile.

Computation of quartiles:  Different texts, packages use different methods. (different last year!)
By hand: We'll use Tukey's quick and dirty: (he called them "hinges")
Take the two halves of the data you got from finding the median.  Find the median of each half, using the same rule as before.  (Detail.  IF you had an even number of observations to start with, the data divides evenly into an upper and a lower half. No problem.  IF you had an odd number to start with, you have one in the middle, the median. In this case only, you throw the median away, and use the remaining halves.)
1 3 5 6 8 8 11 20, are n=8 observations.
    Median at (8+1)/2 = 9/2 = 4 1/2th place:  1 3 5 6 | 8 8 11 20,  M = (6+8)/2 = 7
 8/2 = 4 in each half: Halves are 1 3 5 6, and 8 8 11 20.  The quartiles are the medians of each half; count in (4+1)/2 = 2 1/2.
1 3 | 5 6:  Q1 = (3+5)/2 = 4.         8 8 | 11 20:  Q3 = (8+11)/2 = 9.5
1 3 | 5 6 | 8 8 | 11 20

1 3 5 6 6 8 8 11 20, are n=9 observations.
     Median at (9+1)/2 = 10/2 = 5th place:  1 3 5 6 6 8 8 11 20,  M = 6
 Throw away the median.  Now we have an even number again, 8 numbers.
8/2 = 4 in each half: Halves are 1 3 5 6, and 8 8 11 20.  Continue as before. (This is a dirty method because it gives the same quartiles for both these data sets.  Quick because computation is minimal and simple.)
1 3 | 5 6 6 8 8 | 11 20
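
The quick and dirty rule above can be sketched in Python as follows. The function names are illustrative, and this follows only the by-hand rule described here; statistical packages may use other methods and give slightly different quartiles.

```python
def median_sorted(s):
    """Median of an already sorted list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(xs):
    """Return (Q1, M, Q3) using the halves-of-the-data rule.

    If n is odd, the median itself is thrown away before
    taking the median of each half.
    """
    s = sorted(xs)
    n = len(s)
    half = n // 2
    lower = s[:half]        # the lowest n//2 observations
    upper = s[n - half:]    # the highest n//2 observations
    return median_sorted(lower), median_sorted(s), median_sorted(upper)

print(quartiles([1, 3, 5, 6, 8, 8, 11, 20]))      # (4.0, 7.0, 9.5)
print(quartiles([1, 3, 5, 6, 6, 8, 8, 11, 20]))   # (4.0, 6, 9.5)
```

Both data sets give the same Q1 and Q3, which is the "dirty" part.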

Five-number summary:  min, Q1, Median, Q3, max.  (1, 4, 7, 9.5, 20  for the set of 8 above)
    INTERQUARTILE RANGE = IQR = Q3 - Q1.  (9.5 - 4 = 5.5 for both sets above)
      = The range of the middle half of the observations.  Resistant to outliers!
How to put numbers in order?  Stemplot is good! 
  StudatSp08plus.xls (scroll down)

Box (and whisker) plot: 
Graphical form of five number summary.
    Especially good for comparing sets of data, conditioned on a categorical variable.
"Plain vanilla--Moore" Draw and label the numerical scale first.  Then mark the five numbers. Finish the picture.
The box spreads over the middle half (Q1 to Q3), the whiskers over the lowest and highest quarters (Min to Q1, Q3 to Max).  Each section shows the spread of 1/4 of the data: the longer the section the thinner the data must be spread in there.   Can "read" skewness.
Demonstration with set of 9. 1 3 | 5 6 6 8 8 | 11 20    5#summ: 1, 4, 6, 9.5, 20
Direction of boxplot?  Vertical or horizontal is a matter of taste. I do horizontal, usually.

  |-----[   |      ]--------------------|
0·········5·········10········15········20

"Showing outliers" p.45ff. Outliers can make a boxplot whisker extend deceptively beyond the bulk of the data.
      Make the whiskers to the last item in the "main mass" of the data.
       Put a dot or a star for each outlier,  beyond the whisker end.
   How do we decide what's an outlier?  By hand; use your judgement.
     (Rule of thumb: Knowing the rule is optional--used by computers.) Define "outlier" as a value farther out than 1.5 IQR from the quartiles.
          (Q1 - 1.5 IQR is lower "fence", Q3 + 1.5 IQR is upper "fence".)
                For the set of 9, 1.5 IQR = 1.5×5.5 = 8.25. Fences are 4 - 8.25 = -4.25, and 9.5 + 8.25 = 17.75.
                   So 20 lies outside the fence, and the whiskers & box should go from 1 to 11 (largest inside the fence).
        (Dot or *?  Tukey:  Dot ·between 1.5 and 3 IQR's out, * if more than 3 IQR's out. By hand, I don't care. Here a * because it shows better.)

  |-----[   |      ]--|                 *
0·········5·········10········15········20 
   This is the same as we would have done without the rule, probably.
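
The 1.5 IQR rule of thumb, sketched in Python for the set of 9. The function name fences and the parameter k are illustrative; Q1 = 4 and Q3 = 9.5 are taken from the five-number summary 1, 4, 6, 9.5, 20.

```python
def fences(q1, q3, k=1.5):
    """Lower and upper fences: k IQRs beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [1, 3, 5, 6, 6, 8, 8, 11, 20]
q1, q3 = 4, 9.5                      # from the five-number summary

low, high = fences(q1, q3)
outliers = [x for x in data if x < low or x > high]
inside = [x for x in data if low <= x <= high]
whisker_ends = (min(inside), max(inside))   # whiskers stop at the last value inside

print(low, high)        # -4.25 17.75
print(outliers)         # [20]
print(whisker_ends)     # (1, 11)
```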

Example:  p.60, 2.34 Guinea pig survival: (redo for hw)
Use the One Variable Statistical Calculator Applet at 
http://bcs.whfreeman.com/bps4e
   Compare boxplot with histogram:  longer boxplot sections mean lower histogram height and vice versa.

Sievers home  Math151-Fall08/Dayf3.htm  1:00pm 9/4/08
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.