Math 151, Fall 2005, Day 4, Fri. Sept. 2 (After class)

  Day 4 (Fri. Sept. 2): Reading: D&V Ch. 3, pp. 18-22, 23-24 (Simpson's Paradox optional); ActivStats 3-2. D&V Ch. 5, AS Ch. 5. (D&V, and I, will do medians, quartiles, and boxplots first, then mean/s.d.; AS does middles, then spreads, then boxplots.)

We'll start using SPSS next time, Monday; we'll have class in the computer lab. If you're computer-phobic, coming into Mac 101 and trying some ActivStats SPSS exercises ahead of time might help. No downside except the time spent. ActivStats only shows how to handle raw data, and the Ch. 3 text hw problems all involve already-tabulated data, which is harder to handle in SPSS.
 
Hand in Monday (copied from Day 3):
Ch. 3, p. 31 (two categorical variables)
17 Canadian languages
14 Cars (for f, do a segmented bar graph of the conditional distributions from part e, as part of your discussion.)
24 Obesity (part a only)
26 Pet ownership (Also: what's most startling about these percents?)
20 Prisons (include one or more graphs)
Read, be able to discuss in class
Ch. 3 25 Family planning
21 Working Parents (What's "wrong" with the graph in the back of the book?)
Optional 
ActivStats lessons on SPSS, in Mac 102:
AS pp. 1-2, 3-1, and 3-2 are a gentle introduction (using raw data); 4-2 and 4-3 do continuous data.
Ch.3 
31 Simpson's paradox, UC

The rest of Day 4's original page is postponed!
Hand in Monday Later: the Ch. 3, p. 31 problems listed above (17, 14, 24a, 26, 20).
== == = = = = = = = = = = =
Ch. 5, p. 72:   Start these; keep them for the Day 6 assignment.
#3; also make a boxplot. ("No calculator" means no statistical calculator.)
15 Wines
16 Ozone
28 Population growth
p. 107 (review) 18 & 19 Old Faithful
- - - - - - - - - - 
Mean/Median (assignment date to be announced):
p. 72 #7 a, b, c Payroll. Also, with c: which measure would be most useful if you wanted to figure the total weekly payroll cost?
6 Sick days
+ + + + + + + + + + 
Start now on a separate page; keep for the next assignments:
p. 72 19, 20 (No computations needed. 19 d may not be decidable from the pictures; don't worry about it.)
5 Mistake (You can do parts now)
9 Standard deviation. Tonight, make dot plots of each pair on axes with the same unit size, find the mean of each set, and mark it with a little ^ (like Fig. 5.6, p. 64). Notice that the mean looks like a good balance point. Leave space to calculate some standard deviations next time.
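Why does the mean look like a balance point? The deviations from the mean always add up to exactly zero, so the dot plot balances there like a seesaw. A quick Python check, using made-up numbers (not one of the pairs in #9) and exact fractions so rounding can't hide anything; mean and deviations_from_mean are just illustrative names:

```python
from fractions import Fraction

def mean(xs):
    """Exact mean of integer data, as a fraction."""
    return Fraction(sum(xs), len(xs))

def deviations_from_mean(xs):
    """How far each observation sits from the mean (negative = left of it)."""
    m = mean(xs)
    return [x - m for x in xs]

data = [2, 3, 3, 9]                        # hypothetical data, invented for this check
print(mean(data))                          # 17/4, i.e. 4.25
print(sum(deviations_from_mean(data)))     # 0: the left and right pulls cancel
print(sum(x - 3 for x in data))            # 5, not 0: a pivot at 3 tips toward the large values
```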
Read, be able to discuss Monday Later: Ch. 3 #25 Family planning and #21 Working Parents, listed above.
=== == == = = = = = = = = 
Start these; keep them for the Day 6 assignment.
25 Caffeine
41 Eye & Hair color
31 Reading scores (f is harder; optional)
- - - - - - - - - - - 
http://www.whfreeman.com/scc or http://bcs.whfreeman.com/ips: under Student Categories or Student Tools, choose "Statistical Applets," then Mean & Median. (50 points max.) Check out symmetric distributions, skewed distributions, and distributions with outliers. How far apart can you get the mean and median?

13 Marriage age. The Ithaca Journal, Jan. 22, '05, had quiz answers: "How old is the average bride? 24.5 years.... How old is the average groom? 26.5 years." Give some reasons that could account for the big difference between these numbers and the graphed numbers.

Monday: Come to the Computer Lab, Mac 101. Bring your text and a disk to save on (containing your circle data, if possible).
Email list: Math151@wells.edu. If you didn't get the welcome message, email lists@wells.edu.
Cluster:  Tell everyone your name, even if you think they know it.
   Check for homework questions? Remaining #s are on the board.
HW: PLEASE label with the Day #. Please paperclip or staple (paperclips are in the envelope; reuse them).
Errors in my book: Ch. 4, p. 43, middle: "for men...a narrower broader peak..."
    Ch. 5, p. 65, top: the formula needs a square root sign over it (later printings are OK?).
HW questions? What did you see when comparing the speed of your hands? Results of Color vs. Hand:
Much variability in the proportions of Red, Blue, Green. BUT
With larger numbers of observations, proportions settle down.

Categorical vs. Categorical (Color vs. Hand) Ch. 3, pp. 18-22
   Day 3
    From The New Yorker, traditionally the most literary and error-free magazine of all, Feb. 14/21, '05:
CORRECTION: The Mail of January 3rd contained the incorrect statistic that four-fifths of Bush voters identified moral values as the most important factor in their decision.  In fact, four-fifths of those identifying moral values as the most important factor of their decision were Bush voters.
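The correction turns entirely on which group you condition on. A minimal Python sketch, with HYPOTHETICAL counts invented only so that one conditional proportion comes out to four-fifths (these are not the real poll numbers), computes both conditional proportions from the same two-way table; conditional_proportion is an illustrative name:

```python
from fractions import Fraction

# HYPOTHETICAL counts, invented for illustration only (not real poll data).
# Key: (vote, factor named as most important) -> number of voters.
counts = {
    ("Bush", "moral values"): 80,
    ("Bush", "other"): 320,
    ("not Bush", "moral values"): 20,
    ("not Bush", "other"): 380,
}

def conditional_proportion(counts, given_pos, given, target_pos, target):
    """Among cells whose key[given_pos] == given, the fraction with key[target_pos] == target."""
    total = sum(c for key, c in counts.items() if key[given_pos] == given)
    hits = sum(c for key, c in counts.items()
               if key[given_pos] == given and key[target_pos] == target)
    return Fraction(hits, total)

# What the Mail said: among Bush voters, the share naming moral values.
p_moral_given_bush = conditional_proportion(counts, 0, "Bush", 1, "moral values")
# What the correction said: among those naming moral values, the share voting Bush.
p_bush_given_moral = conditional_proportion(counts, 1, "moral values", 0, "Bush")

print(p_moral_given_bush)   # 1/5
print(p_bush_given_moral)   # 4/5
```

Same table, same 80 people in the corner cell; dividing by the row total (400) versus the column total (100) is the whole difference.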

Start here Wednesday
Ch.5 Summarizing distribution info with numbers

Measures of middle (center)
        --Colloquially "average" can refer to any measure of middle, so watch out; be more specific.
   Mean (the most common "average"): Take the sum of all observations and divide by how many there are (n). p. 63
    (Midrange:  Average the maximum & minimum values.  Very sensitive to outliers.)
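A short Python sketch of the mean and the midrange, using the set of 9 observations worked below; replacing the 20 with 200 is a made-up change, just to show how both measures chase an outlier:

```python
def mean(xs):
    """Sum of all observations divided by how many there are (n)."""
    return sum(xs) / len(xs)

def midrange(xs):
    """Average of the maximum and the minimum."""
    return (max(xs) + min(xs)) / 2

data = [1, 3, 5, 6, 6, 8, 8, 11, 20]   # the set of 9 used later in these notes
wild = data[:-1] + [200]               # hypothetical: the 20 replaced by 200

print(mean(data), midrange(data))      # about 7.56 and 10.5
print(mean(wild), midrange(wild))      # about 27.56 and 100.5
```

One wild value moves the midrange by 90 and the mean by 20.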
  Median: half the observations are bigger, half are smaller.
      Point on histogram with half the area to the left, half to the right.

Calculating: Put the observations in numerical order (a stemplot helps!).
      Take the middle one if n is odd, or average the 2 middle ones if n is even.
Formula: How far in do you count? (n+1)/2 places. (7 1/2 places? Go halfway, i.e., average the 7th and 8th observations. The book's method, p. 58, is more complicated but gives the same result.)
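A Python sketch of the "count in (n+1)/2 places" rule (median here is an illustrative function; Python's own statistics.median gives the same values). The two integer positions coincide when (n+1)/2 is whole and straddle the middle when it ends in 1/2:

```python
def median(xs):
    """Median by counting in (n+1)/2 places from the bottom of the sorted data."""
    s = sorted(xs)                        # put the observations in numerical order
    n = len(s)
    lower_middle = s[(n + 1) // 2 - 1]    # 1-based position: (n+1)/2 rounded down
    upper_middle = s[(n + 2) // 2 - 1]    # 1-based position: (n+1)/2 rounded up
    return (lower_middle + upper_middle) / 2

print(median([1, 3, 5, 6, 8, 8, 11, 20]))       # n = 8: average of 4th and 5th -> 7.0
print(median([1, 3, 5, 6, 6, 8, 8, 11, 20]))    # n = 9: the 5th value -> 6.0
print(median([1, 3, 5, 6, 6, 8, 8, 11, 200]))   # a wild outlier barely matters -> 6.0
```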
Spread (dispersion)
   (Standard Deviation s, p. 64.  Next lecture.)
  Range:  Max - Min.  (one number)  Very sensitive to outliers.
  Interquartile range IQR.
     Quartiles divide the data into quarters: 1st quartile Q1: 1/4 below, 3/4 above = 25th percentile.
             (2nd quartile = median = 50th percentile. Percentiles divide the data into hundredths.)
        3rd quartile Q3: 3/4 below, 1/4 above = 75th percentile.
Computation of quartiles:  Different texts, packages use different methods.
By hand: quick and somewhat dirty:
Take the two halves of the data you got from finding the median. Find the median of each half, using the same rule as before. (Detail: IF you had an even number of observations to start with, the data divide evenly into an upper and a lower half. IF you had an odd number to start with, there is one in the middle, the median; in this case only, you use the median as part of both halves.)
1 3 5 6 8 8 11 20 are n = 8 observations.
    Median at (8+1)/2 = 9/2 = the 4 1/2th place:  1 3 5 6 | 8 8 11 20, so M = (6+8)/2 = 7.
    8/2 = 4 in each half: the halves are 1 3 5 6 and 8 8 11 20. The quartiles are the medians of each half; count in (4+1)/2 = 2 1/2 places.
       1 3 | 5 6:    Q1 = (3+5)/2 = 4.
       8 8 | 11 20:  Q3 = (8+11)/2 = 9.5.
    All together:  1 3 | 5 6 | 8 8 | 11 20

1 3 5 6 6 8 8 11 20 are n = 9 observations.
    Median at (9+1)/2 = 10/2 = the 5th place:  1 3 5 6 6 8 8 11 20, so M = 6.
    The median joins both halves, so each half has (n+1)/2 = (9+1)/2 = 5 values: the halves are 1 3 5 6 6 and 6 8 8 11 20. The quartiles are the middle (3rd) values of each half.
       Q1 = 5, Q3 = 8.   1 3 5 6 6 8 8 11 20  (Q1 is the 3rd value, M the 5th, Q3 the 7th)
(This is a dirty method because it doesn't "exactly" divide the data into quarters. Quick? Yes. Tukey's "hinges," as he called them to avoid fights over the "quartile" name, come out the same as this method: for odd n he also gives the median to each half. The variation that throws away the median instead of giving it to each half is the method that people who took the course out of Moore, Basic Practice, before this term learned.)
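To see how much the choice of method matters, here is a Python sketch of the halves method, with a share_median switch (a hypothetical parameter name) for the variation that throws the median away when n is odd. It is compared with Python's standard-library statistics.quantiles (Python 3.8+), which interpolates between observations and so is a different method again:

```python
import statistics

def median(xs):
    """Median by counting in (n+1)/2 places in the sorted data."""
    s = sorted(xs)
    n = len(s)
    return (s[(n + 1) // 2 - 1] + s[(n + 2) // 2 - 1]) / 2

def hand_quartiles(xs, share_median=True):
    """(Q1, Q3) as the medians of the lower and upper halves of the sorted data.

    share_median=True:  for odd n the median goes into both halves.
    share_median=False: for odd n the median is thrown away.
    For even n the two options agree: the data split evenly.
    """
    s = sorted(xs)
    n = len(s)
    half = n // 2
    if n % 2 == 1 and share_median:
        lower, upper = s[:half + 1], s[half:]    # median included in both halves
    else:
        lower, upper = s[:half], s[n - half:]    # even split, or odd n minus the median
    return median(lower), median(upper)

eight = [1, 3, 5, 6, 8, 8, 11, 20]
nine = [1, 3, 5, 6, 6, 8, 8, 11, 20]
print(hand_quartiles(eight))                       # (4.0, 9.5)
print(hand_quartiles(nine))                        # (5.0, 8.0)
print(hand_quartiles(nine, share_median=False))    # (4.0, 9.5)

# Python's statistics.quantiles returns [Q1, median, Q3] by interpolation:
print(statistics.quantiles(eight, n=4))                       # [3.5, 7.0, 10.25]
print(statistics.quantiles(eight, n=4, method="inclusive"))   # [4.5, 7.0, 8.75]
```

For the set of 8, three methods give three different Q1's (4, 3.5, 4.5): one more reason to say which method you used.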

Five-number summary:  min, Q1, Median, Q3, max.
     (1, 4, 7, 9.5, 20 for the set of 8 above; 1, 5, 6, 8, 20 for the set of 9.)
INTERQUARTILE RANGE = IQR= Q3 - Q1.
=The range of the middle half of the observations.  Resistant to outliers!
       9.5 - 4 = 5.5 for the set of 8.   8 - 5 = 3 for the set of 9.
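A Python sketch of the five-number summary and the IQR, using the halves method described above (median shared by both halves when n is odd). The outlier 200 is made up, to show that the IQR ignores it while the range does not; the function names are illustrative:

```python
def median(xs):
    """Median by counting in (n+1)/2 places in the sorted data."""
    s = sorted(xs)
    n = len(s)
    return (s[(n + 1) // 2 - 1] + s[(n + 2) // 2 - 1]) / 2

def five_number_summary(xs):
    """(min, Q1, median, Q3, max), quartiles found as medians of the two halves."""
    s = sorted(xs)
    n = len(s)
    half = n // 2
    lower = s[:half + 1] if n % 2 == 1 else s[:half]   # odd n: the median joins both halves
    upper = s[half:]                                   # odd n: starts at the median; even n: upper half
    return s[0], median(lower), median(s), median(upper), s[-1]

def iqr(xs):
    """Interquartile range Q3 - Q1: the range of the middle half."""
    _, q1, _, q3, _ = five_number_summary(xs)
    return q3 - q1

nine = [1, 3, 5, 6, 6, 8, 8, 11, 20]
wild = nine[:-1] + [200]        # hypothetical: the 20 replaced by 200

print(five_number_summary(nine))            # (1, 5.0, 6.0, 8.0, 20)
print(iqr(nine), max(nine) - min(nine))     # 3.0 19
print(iqr(wild), max(wild) - min(wild))     # 3.0 199
```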
Box (and whisker) plot:  Graphical form of five number summary.
    Especially good for comparing sets of data, conditioned on a categorical variable.
"Plain vanilla": Draw and label the numerical scale first. Then mark the five numbers. Finish the picture.
The box spreads over the middle half (Q1 to Q3), the whiskers over the lowest and highest quarters (Min to Q1, Q3 to Max).  Each section shows the spread of 1/4 of the data: the longer the section the thinner the data must be spread in there.
Demonstration with the set of 9: 1 3 5 6 6 8 8 11 20. Direction of boxplot? Vertical or horizontal is a matter of taste; I usually do horizontal.

  |-------[ |   ]-----------------------|
0·········5········10········15·········20
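A rough Python sketch of a "plain vanilla" text boxplot like the one above. The scale of 2 columns per unit, with the value 0 at the left edge (so it assumes no negative values), is an assumption chosen to match the hand-drawn picture; text_boxplot is a hypothetical helper, not from any package:

```python
def text_boxplot(five, cols_per_unit=2):
    """Draw a plain-vanilla horizontal boxplot as a single line of text.

    five: (min, Q1, median, Q3, max). Column 0 stands for the value 0.
    """
    lo, q1, med, q3, hi = five

    def col(value):
        # Convert a data value to a character position on the line.
        return round(value * cols_per_unit)

    line = [" "] * (col(hi) + 1)
    for c in range(col(lo), col(hi) + 1):   # dashes from min to max (the whiskers)
        line[c] = "-"
    for c in range(col(q1) + 1, col(q3)):   # blank out the inside of the box
        line[c] = " "
    line[col(lo)] = "|"                     # whisker ends
    line[col(hi)] = "|"
    line[col(q1)] = "["                     # box edges at the quartiles
    line[col(q3)] = "]"
    line[col(med)] = "|"                    # median line inside the box
    return "".join(line)

print(text_boxplot((1, 5, 6, 8, 20)))       # five-number summary for the set of 9
```

For the set of 9, the box edges land at columns 10 and 16 (values 5 and 8) and the median line at column 12 (value 6); each of the four sections holds 1/4 of the data.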

"Showing outliers" Outliers can make a boxplot whisker extend deceptively beyond the bulk of the data.
      Draw the whiskers only to the last item in the "main mass" of the data.
       Put a dot or a star for each outlier,  beyond the whisker end.
   How do we decide what's an outlier?  (Rule of thumb; esp. for computers.)
      Fences: Define "outlier" as a value farther out than 1.5 IQR from the quartiles.
          (Q1 - 1.5 IQR is the lower fence; Q3 + 1.5 IQR is the upper fence.)
                For the set of 9, IQR = 3, so 1.5 IQR = 4.5. The fences are 5 - 4.5 = 0.5 and 8 + 4.5 = 12.5.
                   So 20 lies outside the fence, and the box and whiskers together should run from 1 to 11 (the largest value inside the fence).
        (Dot or *? Tukey: a dot · for values between 1.5 and 3 IQRs out, * for more than 3 IQRs out. By hand, I don't care.)
  |-------[ |   ]-----|                 *
0·········5········10········15·········20
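The fence rule of thumb as a Python sketch. The quartiles are passed in (here Q1 = 5, Q3 = 8, from the hand method for the set of 9), since packages compute them differently; fences_and_outliers and tukey_symbol are hypothetical names for illustration, and the sketch assumes Q3 > Q1:

```python
def fences_and_outliers(xs, q1, q3):
    """1.5 IQR rule of thumb: (lower fence, upper fence, whisker ends, outliers)."""
    spread = q3 - q1                          # the IQR
    lower_fence = q1 - 1.5 * spread
    upper_fence = q3 + 1.5 * spread
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    outliers = sorted(x for x in xs if x < lower_fence or x > upper_fence)
    whiskers = (min(inside), max(inside))     # whiskers stop at the last values inside the fences
    return lower_fence, upper_fence, whiskers, outliers

def tukey_symbol(x, q1, q3):
    """'·' if 1.5 to 3 IQRs beyond a quartile, '*' if more than 3, '' otherwise."""
    spread = q3 - q1
    beyond = max(q1 - x, x - q3) / spread     # how many IQRs outside the box
    if beyond > 3:
        return "*"
    if beyond > 1.5:
        return "·"
    return ""

nine = [1, 3, 5, 6, 6, 8, 8, 11, 20]
print(fences_and_outliers(nine, q1=5, q3=8))   # (0.5, 12.5, (1, 11), [20])
print(tukey_symbol(20, q1=5, q3=8))            # prints * (20 is 4 IQRs above Q3)
```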
Boxplots shine at comparing distributions conditioned on a categorical variable with several categories.


Math151-Fall05/Dayf4.htm  2:15pm 9/2/05
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.