Math 151, Fall 2005, Day 3, Wed. Aug 3. Hit reload to get the most current version.

Day 3 (Fri. Feb. 4): Reading:  Reread D&V Ch. 4 thru p. 46 (Re-expressing, p. 44, optional); ActivStats 3-1, 4 all.  New: D&V Ch. 3 pp. 18-22, 23-4 (Simpson's Paradox optional); ActivStats 3-2.  Ahead: D&V Ch. 5, AS Ch. 5.  (D&V, and I, will do medians, quartiles, boxplots first, then mean/s.d.  AS does middles, then spreads, then boxplots.)

We'll start using SPSS Monday--have class in the computer lab that day.  Everything by "hand" till then!
Needed for HW: Stemplot.  Rounding when there are more than 2 decimal places?  The handout says truncate (round down); the D&V text says round to nearest.  Tukey, the inventor, said truncate: throw away the trailing digits.  I agree.  This is supposed to be fast, and rounding to nearest slows it down.  I encourage truncating, but you can do it either way and be right.  If you truncate, your stemplot may look a little different from the text answers.  (A stemplot is hard for a computer to do, but some packages do it.  For them, rounding to nearest is easiest.  SPSS truncates, which is hard for a computer.)
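To see how the truncate-or-round choice changes a stemplot, here is a minimal Python sketch. It is only an illustration (homework is still by hand): the function name `stemplot`, the `leaf_unit` parameter, and the six sample values are made up for this example and do not come from the handout or the text. It assumes nonnegative whole-number data.

```python
from collections import defaultdict

def stemplot(values, leaf_unit=1, truncate=True):
    """Return the lines of a stemplot for nonnegative integers.

    Each leaf is one digit worth `leaf_unit`; digits below that are
    either thrown away (truncate) or rounded to the nearest leaf.
    """
    stems = defaultdict(list)
    for v in values:
        if truncate:
            n = v // leaf_unit                      # throw away trailing digits
        else:
            n = (v + leaf_unit // 2) // leaf_unit   # round to nearest
        stem, leaf = divmod(n, 10)
        stems[stem].append(leaf)
    lines = []
    # Walk every stem from lowest to highest so empty stems (gaps) still show.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(d) for d in sorted(stems[stem]))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

data = [238, 241, 256, 259, 263, 307]   # made-up values
truncated = stemplot(data, leaf_unit=10, truncate=True)
rounded = stemplot(data, leaf_unit=10, truncate=False)
print("\n".join(truncated))
print("\n".join(rounded))
```

With these values the truncated plot has leaves 34556 on stem 2 and the rounded plot has 44666: the same shape, slightly different leaves, and either is acceptable.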
Hand in (all from D&V text)  
Ch4 p 50  (repeated from day 2)
Creating:
 12 bird species (10's as leaves, split 5 leaves per stem is good.  Big outliers)
 18 Marijuana (stem & leaf)
Describing: 
 5 Heart attack stays
 9 Wineries: Make a) "under 60 acres". Book's answer to b is screwy, why?
 14 Pop. growth
More Ch4: #4 more shapes
A. Use your circle data and make a back-to-back stemplot of Time (first column) for your two hands.  Write a few sentences comparing the speed performance of your hands.

Postpone to Day 4: Ch3 p. 31, 2 cat. variables
17 Canadian languages
14  Cars  (for f, do a segmented bar graph of the cond. dist's of part e, as part of your discussion.)
24 a only Obesity 
26 Pet ownership  (Also:  what's most startling about these percents?) 
20 Prisons  (include one or more graphs)

Read, be able to discuss in class
Ch4
Creating: 
 17 Acid rain Look at answer, note stems used 
Describing:
 7 Cereal sugar
 6 Emails (I think the answer book does a crummy job)
 19  Hosp. stays Do a only.  Read answer to c.  Most mothers & babies go home in 2 days now.  What W's are crucially omitted here? 
 
 

Postpone to Day 4: Ch3  25 Family planning
21 Working Parents (What's "wrong" with the graph in the back of the book?)

Optional 
 
 
 
 
 

Postpone to Day 4: Ch.3
31 Simpson's paradox, UC

Sign in.  Sign your picture.  Fill in Day 2's attendance diagram.
Friday: here in classroom.  Monday:  Come to Computer Lab, Mac 101.  Bring text; disk or usb to save on (containing your circle data if possible.)
Class Members are posted (link off main page too).  Check that your entry is right; email me if it is not.
Cluster  in 3's, 4's or 5's. Check for Homework questions? Remaining #s on board.
Each group fill in summary sheet of Circle colors/hand.  Pool separate results. (Hand Summaries in)
    (Also--hand in collected questions-you'd-like-to-answer about circle exp't, from last time)
Pretests:  Mixed:  Order of op's-- Please Excuse My Dear Aunt Sally:  Parentheses rule; Exponents,  x, /, +, -.
   Take it to math clinic, anyone, ask for problems like the ones you missed.
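A quick way to check the order-of-operations rule is to let a computer apply it. The sketch below uses Python, whose arithmetic follows the same convention; the particular expressions are made up for illustration and are not the pretest problems. Note that x and / share one level (worked left to right), and so do + and -.

```python
# Exponents first, then multiply/divide left to right, then add/subtract left to right.
a = 2 + 3 * 4 ** 2      # 4**2 = 16, then 3*16 = 48, then 2+48 = 50
b = (2 + 3) * 4 ** 2    # parentheses rule: 5 * 16 = 80
c = 24 / 4 * 2          # left to right: 6 * 2 = 12, not 24 / 8
d = 10 - 4 + 3          # left to right: 6 + 3 = 9, not 10 - 7
print(a, b, c, d)
```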

Distribution of one variable:  Area represents proportion.

    Quantitative: Histogram, Stem-and-leaf (Stemplot), Dotplot
      (I will only require you to read, not make, histograms by hand.  You'll make stemplots and dotplots by hand.)
       Pretest:  Restate #5 as histogram of 100 "5-volt" batteries tested for actual voltage.
              The proportion with voltage < 1 is 20%.  The proportion with voltage < 3 is 60%.
               a) What proportion have voltage between 1 and 3?  b) What proportion have voltage > 3?
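Since area represents proportion, both parts come down to adding and subtracting areas. A small Python sketch of that arithmetic follows; the variable names are illustrative, and it assumes no battery reads exactly 1 or exactly 3 volts, so that "less than" and "greater than" split the 100 batteries cleanly.

```python
total_pct = 100          # the whole histogram is 100% of the area
pct_below_1 = 20         # given: proportion with voltage < 1
pct_below_3 = 60         # given: proportion with voltage < 3

# a) The area between 1 and 3 is the area below 3 minus the area below 1.
pct_between_1_and_3 = pct_below_3 - pct_below_1

# b) The area above 3 is whatever is left of the whole.
pct_above_3 = total_pct - pct_below_3

print(pct_between_1_and_3, pct_above_3)
```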

   Stem-and-Leafs are a powerful hand tool.  Handout
            Unordered first, then ordered if necessary.  By tens, then split?  (Ex.:Class data)
        Back to back, comparing two groups. (p.51, #14)
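A back-to-back stemplot shares one column of stems, with one group's leaves running left and the other's running right. The Python sketch below shows the layout, assuming nonnegative whole numbers with tens as stems. The function name `back_to_back` and the two lists of times are hypothetical, not real class data.

```python
from collections import defaultdict

def back_to_back(left, right):
    """Back-to-back stemplot lines: tens are stems, ones are leaves."""
    l, r = defaultdict(list), defaultdict(list)
    for v in left:
        l[v // 10].append(v % 10)
    for v in right:
        r[v // 10].append(v % 10)
    stems = range(min(min(l), min(r)), max(max(l), max(r)) + 1)
    width = max(len(l[s]) for s in stems)   # widest left-hand row
    lines = []
    for s in stems:
        # Left leaves read outward from the stem, so largest goes first.
        left_leaves = "".join(str(d) for d in sorted(l[s], reverse=True))
        right_leaves = "".join(str(d) for d in sorted(r[s]))
        lines.append(f"{left_leaves:>{width}} | {s} | {right_leaves}")
    return lines

dominant = [12, 15, 18, 21, 23]   # made-up times in seconds
other = [19, 24, 26, 31, 35]      # made-up times in seconds
plot = back_to_back(dominant, other)
print("\n".join(plot))
```

In this made-up example the left-hand leaves pile up on the lower stems, which is the kind of comparison the few sentences for problem A should describe.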

Choosing a display (by hand):
    A dot plot is most useful for n = 3 to about 15-20, or when the data only fall on a few values (just stack the dots up).
    A stemplot is good for continuous data, smeared around; you can do 100 values in 3-5 minutes.
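For data that fall on only a few values, a dotplot is just a stack of dots per value. Here is a minimal Python sketch, drawn sideways (one row per value) to keep the code short; the function name `dotplot` and the sample values are invented for the example, and whole-number data are assumed.

```python
from collections import Counter

def dotplot(values):
    """One row per integer value from min to max, one 'o' per observation."""
    counts = Counter(values)
    rows = []
    for x in range(min(values), max(values) + 1):
        rows.append(f"{x:>3} | {'o' * counts[x]}".rstrip())
    return rows

sample = [2, 3, 3, 5, 5, 5, 2, 3]   # made-up values
rows = dotplot(sample)
print("\n".join(rows))
```

The empty row at 4 is kept on purpose: gaps are part of the picture.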

Describing:  Pattern-- and deviations from it
   Shape (symmetric, or skewed (think smeared, or sliding) right or left),
        (Humps: uni- or bi- modal (multi-)   Two humps = two "causes"?)
        Some special shapes:  uniform (p. 40), J-shaped (#6 p.50), bell-shaped (Ch 6)
   Center, Spread (roughly now, Ch.5 numerically)
   Outliers, gaps? (different groups, sources?)   Look at pulse data.  "Lurking variable"

What do we see?  What can we infer? (Introduction)
    Data source? Lurking variables?
    Variability happens.  Things settle down on average  (Pooled data on colors)
       BUT conclusions are never certain.
    Statistics will give us a language for talking about uncertainty.
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Start here Friday
So far:  One Categorical variable. One Quantitative variable.
  A Quantitative vs. Categorical with 2 values (back-to-back stemplot, parallel histo's or dots)
Categorical vs. Categorical (Color vs. Hand) Ch3, pp. 18-22

 "Two way table"   "Contingency table"   "Crosstab(ulation)s" (color vs. hand)
A thousand people are interviewed by the census bureau, and the results tabulated in this two way table.
Working Status vs. Sex.
                      Women    Men   Total
In Labor Force          350    450     800
Not in Labor Force      150     50     200
Total                   500    500    1000

What is the "Percent of women in the labor force" ?
Calculate it Now. Write your answer down on a scrap of paper.
When you write or see percents, be clear what is on the  bottom of the fraction (even if it takes longer to say)!!.
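The phrase "percent of women in the labor force" can be read with three different bottoms of the fraction. The Python sketch below computes all three from the table above; the dictionary layout and variable names are just one convenient choice for illustration.

```python
table = {
    ("In Labor Force", "Women"): 350, ("In Labor Force", "Men"): 450,
    ("Not in Labor Force", "Women"): 150, ("Not in Labor Force", "Men"): 50,
}

women_in_lf = table[("In Labor Force", "Women")]
all_women = sum(n for (status, sex), n in table.items() if sex == "Women")
all_in_lf = sum(n for (status, sex), n in table.items() if status == "In Labor Force")
everyone = sum(table.values())

# Same top of the fraction each time; only the bottom changes.
pct_of_women = 100 * women_in_lf / all_women      # of women, % in the labor force
pct_of_labor_force = 100 * women_in_lf / all_in_lf  # of the labor force, % women
pct_of_everyone = 100 * women_in_lf / everyone    # of all people, % who are women in the labor force

print(pct_of_women, pct_of_labor_force, pct_of_everyone)
```

The three answers are 70%, 43.75% and 35%, which is why the bottom of the fraction has to be stated.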

Marginal distribution:  Distribution of one variable, ignoring (summing over) the other.

Working Status
In Labor Force         800     80%
Not in Labor Force     200     20%
Total                 1000    100%

Sex
Women     Men    Total
  500     500     1000
  50%     50%     100%
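The marginal distributions are just the row totals and column totals turned into percents of the grand total. A short Python sketch of that bookkeeping, using the same counts (the nested-dictionary layout and names are assumptions made for the example):

```python
counts = {
    "In Labor Force": {"Women": 350, "Men": 450},
    "Not in Labor Force": {"Women": 150, "Men": 50},
}
grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of working status: sum across each row, ignoring sex.
status_marginal = {status: 100 * sum(row.values()) / grand_total
                   for status, row in counts.items()}

# Marginal distribution of sex: sum down each column, ignoring working status.
sex_totals = {}
for row in counts.values():
    for sex, n in row.items():
        sex_totals[sex] = sex_totals.get(sex, 0) + n
sex_marginal = {sex: 100 * n / grand_total for sex, n in sex_totals.items()}

print(status_marginal)
print(sex_marginal)
```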

Conditional distribution:  Distribution of one variable, with the individuals being only those which satisfy a condition on the other variable.
For women, their conditional distribution as to working status.  For men, their conditional distribution as to working status.
            "Column %s"--columns add to 100%:  "conditional distributions of working status by sex".
                      Women             Men               Total
In Labor Force        350/500 = 70%     450/500 = 90%     80%
Not in Labor Force    150/500 = 30%     50/500 = 10%      20%
Total                 500/500 = 100%    500/500 = 100%    100%

For those in the labor force, conditional distribution as to sex.
    For those not in the labor force, conditional distribution as to sex.
           "Row %s"--rows add to 100%:  "conditional distributions of sex by working status."
                      Women              Men                Total
In Labor Force        350/800 = 43.8%    450/800 = 56.2%    800/800 = 100%
Not in Labor Force    150/200 = 75%      50/200 = 25%       200/200 = 100%
Total                 50%                50%                100%
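Column percents and row percents use the same cell counts with different bottoms of the fraction. The Python sketch below computes both sets of conditional distributions from the labor-force counts; the names `column_pct` and `row_pct` are chosen for this illustration only.

```python
counts = {
    "In Labor Force": {"Women": 350, "Men": 450},
    "Not in Labor Force": {"Women": 150, "Men": 50},
}
sexes = ["Women", "Men"]

# Column %s: divide each cell by its column total (condition on sex).
col_totals = {sex: sum(row[sex] for row in counts.values()) for sex in sexes}
column_pct = {status: {sex: 100 * row[sex] / col_totals[sex] for sex in sexes}
              for status, row in counts.items()}

# Row %s: divide each cell by its row total (condition on working status).
row_pct = {status: {sex: 100 * n / sum(row.values()) for sex, n in row.items()}
           for status, row in counts.items()}

print(column_pct)
print(row_pct)
```

The code leaves 43.75 and 56.25 unrounded; the table above shows them to one decimal place.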

Graphs to compare proportions:  parallel pies, see text.
  Segmented (stacked) bar charts,  of  % (so total length the same)
 % Women O            % Men X
OOOOOOOOOOOOOOXXXXXXXXXXXXXXXXXX  In Labor Force
OOOOOOOOOOOOOOOOOOOOOOOOXXXXXXXX  Not in Labor Force

Can do segmented bars of raw numbers, which conveys different info:
 Each O = 25 women     Each X = 25 men
OOOOOOOOOOOOOOXXXXXXXXXXXXXXXXXX  In Labor Force
OOOOOOXX                          Not in Labor Force
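Both kinds of text bar can be generated from the counts. This Python sketch assumes a 32-character bar for the percent version and 25 people per symbol for the raw-number version; the function names are invented for the example.

```python
def percent_bar(women, men, width=32):
    """Fixed-length bar: the O share of the width equals the women's percent."""
    n_women = round(width * women / (women + men))
    return "O" * n_women + "X" * (width - n_women)

def count_bar(women, men, per_symbol=25):
    """Bar whose length grows with the group size: one symbol per 25 people."""
    return "O" * (women // per_symbol) + "X" * (men // per_symbol)

for label, women, men in [("In Labor Force", 350, 450),
                          ("Not in Labor Force", 150, 50)]:
    print(f"{percent_bar(women, men):<32}  {label}")
for label, women, men in [("In Labor Force", 350, 450),
                          ("Not in Labor Force", 150, 50)]:
    print(f"{count_bar(women, men):<32}  {label}")
```

At 25 people per symbol, the 150 women and 50 men not in the labor force come to 6 O's and 2 X's. The percent bars show the split within each group; the count bars also show how much smaller the second group is.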

Independence:  two variables are independent when the (conditional) distribution of one is the same for all categories of the other.  Working status is clearly not independent of sex.
Circle experiment: Is color independent of hand?  (Usually: Do we have enough data to tell whether it's true in general?)
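The definition of independence can be checked mechanically: compute the conditional distribution within each column and see whether they all match. The Python sketch below does this with exact fractions. The function names are illustrative, and the second table is a made-up example constructed to be independent.

```python
from fractions import Fraction

def conditional_distributions(counts):
    """For each column category, the distribution of the row variable within it."""
    columns = list(next(iter(counts.values())))
    result = {}
    for col in columns:
        col_total = sum(row[col] for row in counts.values())
        result[col] = {r: Fraction(row[col], col_total) for r, row in counts.items()}
    return result

def looks_independent(counts):
    """True when every column has exactly the same conditional distribution."""
    dists = list(conditional_distributions(counts).values())
    return all(d == dists[0] for d in dists)

labor = {
    "In Labor Force": {"Women": 350, "Men": 450},
    "Not in Labor Force": {"Women": 150, "Men": 50},
}
made_up = {   # hypothetical counts: 80% in the labor force for both sexes
    "In Labor Force": {"Women": 400, "Men": 400},
    "Not in Labor Force": {"Women": 100, "Men": 100},
}
print(looks_independent(labor), looks_independent(made_up))
```

Real samples almost never match exactly, so an exact check like this is only the definition; deciding whether a difference is larger than chance variation needs the tools that come later in the course.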


Sievers home  Math151-Fall05/Dayf3.htm  1pm 8/31/05
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.
 
                      Women    Men   Total
In Labor Force          350    450     800
Not in Labor Force      150     50     200
Total                   500    500    1000
Of people in the labor force, what percent are women?  350/800 = 43.75%
Of women, what percent are in the labor force?  350/500 = 70%
Of people, what percent are women in the labor force?  350/1000 = 35%