Math 151, Fall 2007, Day 3, Wed., Aug. 29 (After Class). Hit reload to get the most current version.

HW assignment, Day 3 (from Moore unless otherwise noted)
Needed for HW: Stemplot rounding when there are more than 2 decimal places? The handout says truncate (round down = trim); the Moore text says round to nearest. Tukey, the inventor, said truncate: throw away the trailing digits. I agree! A stemplot is supposed to be fast, and rounding to nearest slows it down. I encourage truncating, but you can do it either way and be right. If you truncate, your stemplot may look a little different from the text answers, but it will look like a histogram whose bin edges are at the whole numbers. (A stemplot is hard for a computer to do, but some packages do it. For them, rounding to nearest is easiest. SPSS truncates, which is hard for a computer.)
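To see why truncating and rounding can give different stemplots, here is a small Python sketch (the function name stem_leaf is hypothetical, not from the handout or any package). It splits a value into stem and leaf at a chosen leaf unit, either throwing away trailing digits or rounding to nearest with halves going up (an assumption; texts differ on ties). Values are passed as strings and handled with Python's decimal module, because a binary float division such as 14.7 / 0.1 can land just below 147 and silently truncate to the wrong leaf.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def stem_leaf(value: str, leaf_unit: str = "0.1", truncate: bool = True) -> tuple:
    """Split a nonnegative value into (stem, leaf) at the given leaf unit.

    truncate=True throws away the trailing digits (Tukey's rule);
    truncate=False rounds to the nearest leaf unit, halves going up.
    Values are strings so Decimal keeps them exact.
    """
    scaled = Decimal(value) / Decimal(leaf_unit)      # e.g. 19.96 tons -> 199.6 tenths
    mode = ROUND_DOWN if truncate else ROUND_HALF_UP
    whole = int(scaled.quantize(Decimal(1), rounding=mode))
    return divmod(whole, 10)                          # (stem, last digit as leaf)

for v in ["9.87", "19.96"]:
    print(v, "truncate:", stem_leaf(v), "round:", stem_leaf(v, truncate=False))
```

With tenths as leaves, 9.87 gets leaf 8 when truncated and 9 when rounded; 19.96 stays on stem 19 when truncated but moves to stem 20 (leaf 0) when rounded.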
Do you need to put the leaves in order?  NO, not if you just want the shapes.
Outliers: if they're quite far out, just write the numbers at the bottom (labeled High) or top (labeled Low). E.g., for p. 20, Fig. 1.9, I might write "High 44.2" and stop the stemplot at stem 35.

Reading: Finish Ch. 1 and the stemplot handout; "Check" problems p. 24, 1.14, 16 thru 22.
Time plot, pp. 22-3. You will need to be able to recognize cycles and trends, not make a time plot by hand. (We'll make them in SPSS later.)
Read Ch. 2 thru p. 43, then thru p. 47. Do "Check" p. 56, 2.13, 14, 16 (mean/median). Ahead? 15, 17, 18 (5-number summary/boxplot). Further: finish Ch. 2.
Do the means and medians required here by hand (with a calculator).  
Hand in Friday.
p. 14, 1.7 histogram bins: use the applet at http://www.whfreeman.com/bps4e, as in class.

p. 31, 1.35 CO2 stemplot.  I would use whole tons as stems, tenths as leaves, see how it looks.  Truncate, don't round, for speed.   Don't bother to put leaves in order.
p. 31, 1.34 doctors.   Do a stemplot, not a histogram.  Use hundreds as stems, and split them as on p. 21.
p. 33, 1.37 study time: back to back, or side by side on the same scale, like Fig. 2.5, p. 55. (Good stems: maybe split by 2's, 80-90, 100-110, 120-130, etc., so the stems are 0*, 0t, 0f, 0s, 0., 1*, 1t, 1f, 1s, 1., 2*, 2t, etc. Then 140 goes on the 1f stem as a 4, 210 goes on the 2* stem as a 1, 30 goes on the 0t stem as a 3, and 0 goes on the 0* stem as a 0, or is "low". Splitting by 5's (p. 21) might be good enough.) Notice the mental rounding of the responses, to quarter hours if not to ten minutes. This makes the data "granular."
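The split-stem rule for 1.37 can be written as a short Python sketch (split_stem and SPLITS are hypothetical names). It assumes whole-number minutes, hundreds as stems, tens as leaves (units digit truncated), and the five sub-stems *, t, f, s, . holding the leaf pairs 0-1, 2-3, 4-5, 6-7, 8-9.

```python
SPLITS = "*tfs."  # sub-stems for leaf pairs 0-1, 2-3, 4-5, 6-7, 8-9

def split_stem(minutes: int) -> tuple:
    """Place a study time (in minutes) on a stemplot split by 2's.

    Stems are hundreds, leaves are tens; the units digit is truncated.
    Assumes a nonnegative whole number of minutes.
    """
    tens = minutes // 10                 # truncate: drop the units digit
    stem, leaf = divmod(tens, 10)        # hundreds digit -> stem, tens digit -> leaf
    return f"{stem}{SPLITS[leaf // 2]}", leaf

for t in [140, 210, 30, 0]:
    print(t, "->", *split_stem(t))
```

It reproduces the placements in the note: 140 on 1f as a 4, 210 on 2* as a 1, 30 on 0t as a 3, and 0 on 0* as a 0.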

p. 35, 1.43 Orange prices timeplot

Postpone Ch. 2 to Day 4
p. 39, 2.1 Wood, mean: punch the 20 actual values into your calculator, add them, and divide by 20.
A. Find the (approximate) median for the data on wood breakage, using the numbers in the stemplot on p. 21 (Fig. 1.10). It's approximate because the stemplot data are rounded. Quick and dirty is often sufficient! Keep a copy for #2.5, next assignment.

p. 41, 2.4 Bonds home runs: make a stemplot to put the numbers in order to find the medians. For the means, just punch them in. (You can shorten the work by finding the sum of the 18 years excluding the 73, writing that down, and then adding the 73 to get the total for the 19 years. Then divide the appropriate sums by 19 and 18.)
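The sum shortcut for 2.4 looks like this in Python. The 18 "ordinary" values below are made up for illustration; they are not the textbook's Bonds data.

```python
# Hypothetical season totals, NOT the textbook data: 18 ordinary years plus one 73.
ordinary = [20, 22, 24, 26, 28, 30, 30, 32, 34, 36, 38, 40, 25, 35, 27, 33, 29, 31]
outlier = 73

subtotal = sum(ordinary)             # write this down once: sum of the 18 years
total = subtotal + outlier           # then add the 73 to get all 19 years

mean_18 = subtotal / len(ordinary)        # mean without the outlier
mean_19 = total / (len(ordinary) + 1)     # mean with the outlier
print(f"mean of 19 = {mean_19:.2f}, mean of 18 = {mean_18:.2f}")
```

Only one long addition is needed; the single outlier then pulls the mean of 19 up by a couple of units in this made-up example.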
p. 41, 2.3,  p. 57, 2.23, 2.24  mean or median?


Part (only) of Day 4 HW: 
p. 45, 2.5 Wood again. Go ahead and use the stemplot figures.  Also make a boxplot.
p. 58, 2.28 U. endowments. The question means: how far into the ordered list do you have to count to locate the median and quartiles?
p. 58, 2.29 fruit eating
p. 58, 2.30 newborns. 
(I said I wouldn't make you make a histogram, but the data are already pre-binned, so do it here.) Also describe the distribution: symmetric, skewed?
"Read," to discuss (be able to answer in class)


p. 34, 1.40 coins (skewed left)



Postpone Ch. 2 to Day 4
Ch. 2
p. 57-8, 2.25 Doctors' salaries. Look at the answers in the back for the mean and median.
 
 

p. 58, 2.26 Resistance, with Applet
 http://bcs.whfreeman.com/bps4e, or use the CD from the book. Choose "Statistical Applets," Mean & Median. Also, add more points (up to 50 total). Check out symmetric and skewed distributions, and distributions with outliers.

Optional 
Postpone Ch. 2 to Day 4
 P. 59, 2.32 (mean/median play, with Applet)
p. 63, 2.42, 43 (more play, with pencil)

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Introduce yourself to at least 2 other people in the class.  Check HW with neighbors.  #s on board?
Sign in. Note that the Class members page is up; check that your entry is correct.
Handing back Pretests, Muslims
MathClinic (Mac 120) hours this week only:  Matthew Thurs 3:30pm-5:30pm, Fri 1:30-3:30pm

Turning in HW out of class:  NOT Campus Mail!  Into 151 box outside my door, into yellow folder if it's there.

Ch. 1, Review.
Data:  Numbers (usually) in context:  What, Who (how many), Why?  When and Where? How?
            When? The class may already have changed since this compilation (../StudatFall07.xls).
Distribution of one variable:  what values, how many (or what proportion) of each.
Graphical summaries of data: Area represents proportion.
       Quantitative: Shape: symmetric, or skewed (think smeared, or sliding) right or left.
               (&& Bell curve (Ch. 3); J-shaped is really just skewed (Fig. 1.15a, p. 31).)
               Humps: unimodal or bimodal (multimodal). Two peaks = two "causes"?
               Outliers (a giraffe with the zebras?)
               Center, spread: rough for now; specific measures next. Hand around: "Living Histograms".
Pretest: Restate #5 as a histogram of 100 "5-volt" batteries tested for actual voltage.
   The proportion with voltage < 1 is 20 (out of 100), i.e. 20%.
   The proportion with voltage < 3 is 60. That includes the ones < 1. So each rectangle represents 10.
        a) What proportion have voltage between 1 and 3? Count rectangles, OR subtract the part below 1 from the part below 3: 60 - 20 = 40. 40%
        b) What proportion have voltage > 3? Count rectangles, OR note that this is the whole 100 minus the part < 3: 100 - 60 = 40. 40%
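The pretest arithmetic, written out as a small Python sketch (the variable names are hypothetical), uses only the two cumulative counts given above: subtract to get a band, or subtract from the whole to get the upper tail.

```python
TOTAL = 100      # batteries in the restated pretest histogram
below_1 = 20     # number with voltage < 1
below_3 = 60     # number with voltage < 3 (already includes the 20 below 1)

between_1_and_3 = below_3 - below_1   # part below 3 minus part below 1
above_3 = TOTAL - below_3             # the whole minus the part below 3

print(f"between 1 and 3 volts: {between_1_and_3 / TOTAL:.0%}")
print(f"above 3 volts: {above_3 / TOTAL:.0%}")
```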

HW questions? (nonstemplot)  
Histogram can change somewhat depending on intervals you choose.
  Moore applet (http://www.whfreeman.com/bps4e, or use the disk in the book): One Variable Statistical Calculator, text pp. 11-13, Table 1.1, % degreed. (Drag the histogram bars right/left to change the "bins." No "Data Sets" tab? Try a different browser.)
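Here is a Python sketch of how the bin choice changes a histogram, using made-up data and a hypothetical bin_counts helper (not the applet's code). Each bin includes its left edge and excludes its right edge, which is one common convention but an assumption here.

```python
import math
from collections import Counter

def bin_counts(data, width, start=0):
    """Count observations in bins [start + k*width, start + (k+1)*width)."""
    counts = Counter(math.floor((x - start) / width) for x in data)
    ks = range(min(counts), max(counts) + 1)          # include empty bins too
    return {start + k * width: counts[k] for k in ks}

# Hypothetical data, for illustration only.
data = [12, 15, 17, 18, 21, 22, 22, 24, 25, 27, 29, 31, 34, 38, 41]

for width, start in [(10, 0), (5, 0), (10, 5)]:
    print(f"width {width}, starting at {start}:")
    for edge, n in bin_counts(data, width, start).items():
        print(f"  {edge:>3} | {'*' * n}")
```

With width 10 starting at 0 the counts are 4, 7, 3, 1; shifting the same width to start at 5 gives 1, 7, 5, 2, which makes the right side look heavier even though the data have not changed.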

Stemplots (Stem-and-Leaf) are a powerful hand tool.  Tally, with value added.  Handout
     !!Unordered first,!! then ordered if necessary.  By tens, then split?   Truncate is faster!
        Back to back, for comparing two groups (or side by side on the same scale; cf. p. 55, Fig. 2.5):
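A stemplot is easy to sketch in Python too. This hypothetical stemplot function uses tens as stems and units as leaves, truncates any decimals, and lists the leaves in the order the data arrive unless ordered=True; the data are made up and assumed nonnegative.

```python
from collections import defaultdict

def stemplot(values, ordered=False):
    """Stems = tens, leaves = units; decimals truncated. Assumes nonnegative data."""
    rows = defaultdict(list)
    for x in values:
        stem, leaf = divmod(int(x), 10)   # int() throws the decimals away
        rows[stem].append(leaf)
    lines = []
    for stem in range(min(rows), max(rows) + 1):   # keep empty stems as gaps
        leaves = sorted(rows[stem]) if ordered else rows[stem]
        lines.append(f"{stem:>3} | " + "".join(str(d) for d in leaves))
    return "\n".join(lines)

data = [23.7, 41.2, 28.9, 35.0, 22.4, 47.8, 31.5, 26.1]   # hypothetical
print(stemplot(data))                # quick first pass: leaves as they come
print(stemplot(data, ordered=True))  # ordered version, only if needed
```

The unordered pass already shows the shape; sorting the leaves only matters when you need the values in order, e.g. for a median.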
                 
       ../StudatFall07.xls
Data source? Lurking variables?  (pulse: stair climb: last term.  Missing data?)
Heights--two classes, Living histogram.  Variability happens.   Things settle down on average, BUT inferences are never certain.

    Statistics will give us a language for talking about uncertainty.

Choosing a display (by hand)
    A dot plot (Day 2) is most useful for n = 3 to about 15-20, or when the data fall on only a few values (just stack the dots up).
    A stemplot is good for continuous data, smeared around; you can do 100 values in 3-5 minutes.
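A stacked dot plot like the ones drawn in class can be sketched in Python for small integer data. dot_plot is a hypothetical helper, and using one character column per integer value is an assumption that only works when the values cover a narrow range.

```python
from collections import Counter

def dot_plot(values):
    """Text dot plot for small integer data: one column per value, dots stacked up."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    rows = []
    for level in range(max(counts.values()), 0, -1):   # top row first
        rows.append("".join("." if counts[v] >= level else " " for v in range(lo, hi + 1)))
    rows.append("_" * (hi - lo + 1))                    # the axis
    return "\n".join(rows)

print(dot_plot([2, 3, 3, 5, 5, 5]))
```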


Time plot (pp. 17-19): time on the horizontal axis, values on the vertical. Trend? (general slope up or down) Cyclic?

  --Beware of extrapolation --predicting a time trend into the future.
  -- Research data: time, or order of taking measurements, is often a lurking variable.  Always do a time plot.

Start here on Friday:
Ch. 2:  Summarizing distribution info with numbers
Measures of middle (central tendency)
        --Colloquially "average" can refer to any measure of middle, so watch out; be more specific.
    Mean (most common "average"), "x-bar": take the sum (aggregate) of all observations and divide by how many there are (n). Formula p. 38.
        Metaphors.  1) Center of gravity, balance point of histogram.
                2) Slice off bits from the big and add to the little till everyone has the same.
                    (Or "aggregate"--total-- it all and portion it out evenly.)
        Outlier or long tail will pull mean in that direction (think seesaw balancing)  "Sensitive" to outliers, skewness.
        Especially useful: 1) For symmetric, tidy distributions
            2) When metaphor 2 makes sense--looking for "fair share" of a total.
                    (1,1,2,4 cookies eaten by 4 people, mean = 2.     1,1,2, 12:  mean =4.)
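Both metaphors for the mean can be checked on the cookie numbers. In this Python sketch (function names hypothetical), mean computes x-bar = (x1 + ... + xn)/n, i.e. it portions out the total evenly, and deviations shows the balance-point property: the distances above and below the mean cancel, so they sum to zero.

```python
def mean(xs):
    """x-bar: aggregate everything, then portion it out evenly among n."""
    return sum(xs) / len(xs)

def deviations(xs):
    """Distances from the mean; they balance, so they sum to zero."""
    m = mean(xs)
    return [x - m for x in xs]

cookies = [1, 1, 2, 4]
cookies_outlier = [1, 1, 2, 12]   # one big eater pulls the mean up

print(mean(cookies), deviations(cookies))
print(mean(cookies_outlier), deviations(cookies_outlier))
```

The second list balances at 4 only because the single 8 above the mean offsets the three values below it, which is the seesaw picture of an outlier pulling the mean.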

     Median: half are bigger, half are smaller
        Point on histogram with half the area to the left, half to the right.
        Calculating:  Put observations in numerical order (stemplot!).   
                    (For our hand calculations:  Accept the small variation caused by truncation or rounding in the stemplot.  (Quick and dirty!))
                          Middle one if n is odd, or average the 2 middle  if n is even.
                Formula: count in how far? (n+1)/2 places. (14 items --> 7 1/2 places? Go halfway, i.e. average the 7th and 8th observations.)

        "Resistant to skewness and outliers"--trimming off ends will make little difference in median value.
        More "typical" than mean, especially if there is skewness or outliers.
     (Badly bimodal  distribution--"middle" doesn't mean much.)
    Symmetric distribution: mean = median
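The counting rule for the median, (n+1)/2 places into the ordered data, can be sketched in Python (median and depth are hypothetical names; Python's own statistics.median should give the same answers). The cookie data show resistance: changing 4 to 12 moved the mean from 2 to 4 but leaves the median at 1.5.

```python
import statistics

def median(xs):
    """Count in (n+1)/2 places in the ordered data; a half place means
    averaging the two neighbours."""
    s = sorted(xs)                     # by hand, the stemplot does this
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # 1-based position (n+1)/2 = mid + 1
    return (s[mid - 1] + s[mid]) / 2   # 1-based positions n/2 and n/2 + 1

def depth(n):
    """How far to count in: (n+1)/2 places."""
    return (n + 1) / 2

print(depth(14), median(range(1, 15)))              # 14 items: average the 7th and 8th
print(median([1, 1, 2, 4]), median([1, 1, 2, 12]))  # the big eater doesn't move it
print(median([1, 2, 3, 4, 5]), statistics.mean([1, 2, 3, 4, 5]))  # symmetric: equal
```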
Author's website http://bcs.whfreeman.com/bps4e, or Applets on your CD. "Statistical Applets," Mean & Median. Check out symmetric and skewed distributions, and distributions with outliers.
&& Other "averages"/ middles:   (Many:  e.g. trimmed mean: throw away, say top and bottom 5%, take mean of rest.)
 Midrange:  Point midway on the ruler scale between smallest and largest:  Min = 5, Max = 15, Midrange = (5+15)/2= 10.
      Highly sensitive, non-resistant, not too useful, but quick!
 The Mode/modal class: (Mode: most "popular")  Group with the most individuals; peak of the histogram "curve".
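The other middles can be sketched in Python on made-up data. The trimmed_mean helper is hypothetical and drops int(proportion × n) values from each end; texts and packages round that count differently. statistics.multimode (Python 3.8+) returns the most common value(s).

```python
import statistics

def trimmed_mean(xs, proportion=0.05):
    """Drop int(proportion * n) observations from EACH end, average the rest."""
    s = sorted(xs)
    k = int(proportion * len(s))
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)

def midrange(xs):
    """Point midway between the smallest and largest observations."""
    return (min(xs) + max(xs)) / 2

data = [5, 7, 8, 8, 9, 10, 11, 12, 13, 15]   # hypothetical
wild = data[:-1] + [150]                     # same data, largest value made an outlier

for d in (data, wild):
    print(midrange(d), trimmed_mean(d, 0.10), statistics.multimode(d))
```

Turning 15 into 150 drags the midrange from 10 to 77.5, while the 10% trimmed mean stays at 9.75. With only 10 values, the default 5% trims nothing, since 5% of 10 rounds down to 0.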


Measures of Spread (dispersion, variability): distributions with different spreads
    Range:  largest - smallest.   Resistant?  NO!  Two observations carry all the info; the rest could be anywhere.

Dot plots of 3 distributions, all with the same range:

.        .
.        .
.        .
.        .
__________

..........
__________

    ..
    ..
.   ..   .
__________

We need measures of spread that better take into account all the observations:
           Quartiles, five-number summaries, boxplot, InterQuartile Range.
           (Variance), Standard deviation.

Quartiles Divide data into quarters: 1st quartile Q1: 1/4 below, 3/4 above. = 25th percentile.
             (2nd quartile= median = 50th percentile)
            3rd quartile Q3: 3/4 below, 1/4 above.  = 75th percentile.

Computation of quartiles:  Different texts, packages use different methods. (different last year!)
By hand: We'll use Tukey's quick and dirty: (he called them "hinges")
Take the two halves of the data you got from finding the median.  Find the median of each half, using the same rule as before.  (Detail.  IF you had an even number of observations to start with, the data divides evenly into an upper and a lower half. No problem.  IF you had an odd number to start with, you have one in the middle, the median. In this case only, you throw the median away, and use the remaining halves.)
1 3 5 6 8 8 11 20 are n = 8 observations.
    Median at (8+1)/2 = 9/2 = 4 1/2th place:   1 3 5 6 | 8 8 11 20,   M = (6+8)/2 = 7.
    8/2 = 4 in each half: the halves are 1 3 5 6 and 8 8 11 20. The quartiles are the medians of each half; count in (4+1)/2 = 2 1/2.
    1 3 | 5 6:   Q1 = (3+5)/2 = 4.          8 8 | 11 20:   Q3 = (8+11)/2 = 9.5
    All together:   1 3 | 5 6 | 8 8 | 11 20

1 3 5 6 6 8 8 11 20 are n = 9 observations.
     Median at (9+1)/2 = 10/2 = 5th place:   1 3 5 6 [6] 8 8 11 20,   M = 6.
 Throw away the median. Now we have an even number again, 8 numbers.
 8/2 = 4 in each half: the halves are 1 3 5 6 and 8 8 11 20. Continue as before: Q1 = 4, Q3 = 9.5. (This method is "dirty" because it gives the same quartiles for both of these data sets, and "quick" because the computation is minimal and simple.)
 1 3 | 5 6 6 8 8 | 11 20
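Tukey's quick-and-dirty hinges can be sketched in Python (the function names are hypothetical). It follows the rule in these notes: split the ordered data into halves, leaving out the median itself when n is odd, and take the median of each half. Packages use other quartile rules, so their Q1 and Q3 may differ slightly from these.

```python
def median_sorted(s):
    """Median of an already-sorted list."""
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def five_number_summary(xs):
    """min, Q1, median, Q3, max, with quartiles as medians of the two halves."""
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]
    upper = s[n // 2 + n % 2:]      # skip the middle value when n is odd
    return s[0], median_sorted(lower), median_sorted(s), median_sorted(upper), s[-1]

eight = [1, 3, 5, 6, 8, 8, 11, 20]
nine = [1, 3, 5, 6, 6, 8, 8, 11, 20]
for d in (eight, nine):
    mn, q1, m, q3, mx = five_number_summary(d)
    print((mn, q1, m, q3, mx), "IQR =", q3 - q1)
```

Both example sets give Q1 = 4 and Q3 = 9.5, so IQR = 5.5; only the median differs (7 versus 6).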

Five-number summary:  min, Q1, Median, Q3, max.  (1, 4, 7, 9.5, 20  for the set of 8 above)
INTERQUARTILE RANGE = IQR= Q3 - Q1. (9.5 - 4 = 5.5 for both sets above)
      =The range of the middle half of the observations.  Resistant to outliers!
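To see why the range is not enough, here is a Python sketch computing the range and the IQR (by the same hinge rule) for values read off the three dot plots of equal range, assuming the dot columns stand for 0 through 9. The helper names are hypothetical.

```python
def median_sorted(s):
    """Median of an already-sorted list."""
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def spread(xs):
    """Return (range, IQR); quartiles are medians of the lower and upper halves."""
    s = sorted(xs)
    n = len(s)
    q1 = median_sorted(s[: n // 2])
    q3 = median_sorted(s[n // 2 + n % 2:])
    return s[-1] - s[0], q3 - q1

# Values read off the three dot plots, taking the columns as 0..9 (an assumption).
ends = [0, 0, 0, 0, 9, 9, 9, 9]
even = list(range(10))
middle = [0, 4, 4, 4, 5, 5, 5, 9]

for name, d in [("ends", ends), ("even", even), ("middle", middle)]:
    print(name, spread(d))
```

All three have range 9, but the IQR separates them: 9 for the plot clumped at the ends, 5 for the evenly spread one, and 1 for the one clumped in the middle.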

Sievers home  Math151-Fall07/Dayf3.htm  9pm 8/30/07
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.