MATH 251, P&S I, Fall 2007, Mon. Sept. 24, Day 14. After class.

Added problem, two way table solutions

Reading: Two-way Tables for Categorical Variables (used to be in Ch. 2): Sec. 9.1, pp. 582-591, 9.2 pp. 591-93 (examples 9.12, 13, 14 only), and 9.3 pp. 601-3 only. (Data analysis issues only:  How to summarize what the given data tell us.)  Next: back to Ch. 3
Hand in: 

(Review) Income depends on height?! Read the article and answer this.
If your browser doesn't get the link, it's at http://aurora.wells.edu/~srs/Math251-Fall07/tallpeoplewin.htm
  a) What is "$789", and what kind of analysis did they do?
  b) What does my footnote at the end tell you about the data that the article did not?

(Review) p. 128, 2.29 dates' heights. Added after class
- - - - - - - - - - - - - - - - 
Two-way tables.  Do these all by hand.  pp. 612ff.   Solutions now downloadable HERE
For 9.1, 9.2, 9.3 students, do the following parts by hand.  KEEP your results for the next section of HW, when you will use SPSS to do this and the rest of the parts. 
   9.1 Compute joint distribution (a),  and marginal distribution of age. (No graph.)
   9.2  Of 15-19 year olds, what is the proportion  who are full-time?
   9.3  Of full-time students, what is the proportion who are 15-19? Of part-time students, what is the proportion who are 15-19? 

9.9 a, b, c, d nonresponse
9.11 cocaine
9.7   Also, do the same for a 2 by 3 table.
9.23 gambling  Only this:  Turn the table here into a two-way table of Division vs. "wager/not wager" by doing the appropriate calculations
9.13 volunteer  Only this:  Can you extract from this table the proportions--or the counts--of men and women studied?
9.10 b only   career plans

Two-way  tables with SPSS.  pp. 612ff. 
SPSS Intro Handout, p. 6.  Re-create the bargraph shown on the top of the handout for pp.8-9.  File is in Class Material/Math251-IPS5e/SPSS for Class/Edup6IPSp8.sav, or here.  

IPS give no raw data sets to practice crosstabs on; all the data are pre-tallied. 
For 9.1, 9.2, 9.3 students, do all the parts as written, using SPSS. (The file is mislabeled Eg_09_001.  Or here. This is the example on the handout.) ADD your hand work from above, checking the SPSS results against your hand calculations.

9.26 Web ref's (SPSS) Do everything they ask for except for the "significance test."
9.27 pet owners (SPSS) Do everything they ask for except for the "significance test."

9.24 a, b mutations  (SPSS) Fill in the blank row, then type the data into SPSS in the appropriate form, and do part b.

Postpone this problem!  9.15 applicants (Simpson's paradox) (SPSS) See how much you can get SPSS to do.  (Hint. For c, use school as your "layer" variable)

Read, discuss 


Optional 
 


HW Questions?   Day 13
Quiz Wednesday: Possible items: Matching Normal quantile plot with histogram. Scatterplot stuff: description, regression line: finding a residual, r^2 as proportion of explained variability, calculating a and b from means, s.d.'s and r; facts and cautions. NOT Transformations.
Data analysis project, in pairs.  I will assign you to a pair, by Wednesday.  Email me any concerns about potential partners; also what evenings and afternoons etc. you CAN'T meet with a partner to work; I will try to achieve the greatest good for the greatest number. Handout

- - - - - - - - - - - - - - - - - - - - -
Relationships:  We know how to analyze/summarize quantitative vs. quantitative (scatterplot), and categorical vs. quantitative (side-by-side histograms, stemplots, boxplots).  Now
Categorical vs. Categorical  Sec. 9.1 "Two way tables"

 "Two way table"   "Contingency table"   "Crosstab(ulation)"  Hair color vs. Class year.
A thousand people are interviewed by the census bureau, and the results tabulated in this two way table.
Working Status vs. Sex.

                     Women    Men    Total
In Labor Force         350    450      800
Not in Labor Force     150     50      200
Total                  500    500     1000

What is the "Percent of women in the labor force"?
Calculate it now. Write your answer down on a scrap of paper.  (The answer is worked out at the end of this page.)
When you write or see percents, be clear what is on the bottom of the fraction (even if it takes longer to say)!
From the New Yorker magazine, traditionally the most literary and error-free of all, Feb.14/21, '05:

CORRECTION: The Mail of January 3rd contained the incorrect statistic that four-fifths of Bush voters identified moral values as the most important factor in their decision.  In fact, four-fifths of those identifying moral values as the most important factor of their decision were Bush voters.
Marginal distribution:  Distribution of one variable, ignoring/summing over the other.

Working Status        Count    Percent
In Labor Force          800        80%
Not in Labor Force      200        20%
Total                  1000       100%

Sex        Women    Men    Total
Count        500    500     1000
Percent      50%    50%     100%
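
The two marginal distributions above can be checked with a few lines of Python. This is only a sketch using the counts from the Working Status vs. Sex table; the variable names are my own, and in class the same work is done by hand or in SPSS.

```python
# Counts from the Working Status vs. Sex table in these notes.
counts = {
    "In Labor Force":     {"Women": 350, "Men": 450},
    "Not in Labor Force": {"Women": 150, "Men": 50},
}

# Everyone in the table: the denominator for both marginal distributions.
grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of working status: sum each row over sex.
status_totals = {status: sum(row.values()) for status, row in counts.items()}
status_marginal = {s: 100 * n / grand_total for s, n in status_totals.items()}

# Marginal distribution of sex: sum each column over working status.
sex_totals = {}
for row in counts.values():
    for sex, n in row.items():
        sex_totals[sex] = sex_totals.get(sex, 0) + n
sex_marginal = {s: 100 * n / grand_total for s, n in sex_totals.items()}

for label, pct in {**status_marginal, **sex_marginal}.items():
    print(f"{label:20s} {pct:5.1f}%")
```

Both distributions divide by the same grand total of 1000, which is what "ignoring the other variable" amounts to.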

Conditional distribution:  Distribution of one variable, with the individuals being only those which satisfy a condition in the other variable.
For women, their conditional distribution as to working status.  For men, their conditional distribution as to working status.
            "Column %s"--columns add to 100%:  "conditional distributions of working status by sex".

                     Women            Men              Total
In Labor Force       350/500 = 70%    450/500 = 90%    80%
Not in Labor Force   150/500 = 30%    50/500 = 10%     20%
Total                500/500 = 100%   500/500 = 100%   100%

For those in the labor force, conditional distribution as to sex.
    For those not in the labor force, conditional distribution as to sex.
           "Row %s"--rows add to 100%:  "conditional distributions of sex by working status."

                     Women             Men               Total
In Labor Force       350/800 = 43.8%   450/800 = 56.2%   800/800 = 100%
Not in Labor Force   150/200 = 75%     50/200 = 25%      200/200 = 100%
Total                50%               50%               100%
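
The only difference between column %s and row %s is which total goes on the bottom of the fraction. A short Python sketch of both, using the same counts (the names `column_pct` and `row_pct` are mine, chosen for illustration):

```python
# Counts from the Working Status vs. Sex table in these notes.
counts = {
    "In Labor Force":     {"Women": 350, "Men": 450},
    "Not in Labor Force": {"Women": 150, "Men": 50},
}
statuses = list(counts)
sexes = ["Women", "Men"]

col_totals = {sex: sum(counts[st][sex] for st in statuses) for sex in sexes}
row_totals = {st: sum(counts[st].values()) for st in statuses}

# Column %s: condition on sex, so divide each cell by its column total.
column_pct = {
    st: {sex: 100 * counts[st][sex] / col_totals[sex] for sex in sexes}
    for st in statuses
}

# Row %s: condition on working status, so divide each cell by its row total.
row_pct = {
    st: {sex: 100 * counts[st][sex] / row_totals[st] for sex in sexes}
    for st in statuses
}

for st in statuses:
    for sex in sexes:
        print(f"{st:20s} {sex:6s} column %: {column_pct[st][sex]:6.2f}   "
              f"row %: {row_pct[st][sex]:6.2f}")
```

The cell for women in the labor force comes out as 70% of women but 43.75% of the labor force: the same count of 350 over two different denominators.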

Graphs to compare proportions:  parallel sets of bar graphs, see text, p. 603.
  Segmented (stacked) bar charts,  of  % (so total length the same)   (Redundant if there are only 2 segments)
 % Women O            % Men X
OOOOOOOOOOOOOOXXXXXXXXXXXXXXXXXX  In Labor Force
OOOOOOOOOOOOOOOOOOOOOOOOXXXXXXXX  Not in Labor Force

Can do segmented bars of raw numbers; this conveys different info:
 O = 25 women            X = 25 men
OOOOOOOOOOOOOOXXXXXXXXXXXXXXXXXX  In Labor Force
OOOOOOXX                          Not in Labor Force
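
As a rough check on the two kinds of segmented bars, here is a small Python sketch that builds them from the counts. The function names, the 32-symbol width for the percent bars, and the 25-people-per-symbol scale for the raw bars are assumptions made for illustration.

```python
def percent_bar(women, men, width=32):
    """Bar of fixed total length: O for the women's share, X for the men's."""
    n_women = round(width * women / (women + men))
    return "O" * n_women + "X" * (width - n_women)


def count_bar(women, men, per_symbol=25):
    """Bar whose length reflects raw counts: one symbol per `per_symbol` people."""
    return "O" * round(women / per_symbol) + "X" * round(men / per_symbol)


# Counts from the Working Status vs. Sex table in these notes.
rows = {
    "In Labor Force": (350, 450),
    "Not in Labor Force": (150, 50),
}

print("Percent bars (same total length):")
for label, (women, men) in rows.items():
    print(f"{percent_bar(women, men):34s}{label}")

print("Count bars (length shows group size):")
for label, (women, men) in rows.items():
    print(f"{count_bar(women, men):34s}{label}")
```

The percent bars hide the fact that the "Not in Labor Force" group is only a quarter the size of the other; the count bars show it but make the proportions harder to compare.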

Categorical data with SPSS:
(p. 6, Intro handout)
   Pre-tallied?  Data> Weight Cases>Count to Frequency box.
   Analyze>Descriptive Statistics>Crosstabs.  Cells button.  (3-way? Third to Layer box)
  Graph>Interactive> Bar:   100% box for stacked percents,  one variable to horiz. axis, other to legend box, stacked or clustered .  Third to panel.
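
Why the Weight Cases step matters for pre-tallied data can be sketched outside SPSS. The Python below is not SPSS syntax; it is only an illustration (with names of my own choosing) of tallying the same four data rows with and without using the count column as a weight.

```python
from collections import defaultdict

# Pre-tallied data: one row per cell of the table, with its count,
# which is how the data would be typed into the data editor.
tallied = [
    ("Women", "In Labor Force", 350),
    ("Men", "In Labor Force", 450),
    ("Women", "Not in Labor Force", 150),
    ("Men", "Not in Labor Force", 50),
]

# Weighted tally: each row stands for `count` individuals.
weighted = defaultdict(lambda: defaultdict(int))
for sex, status, count in tallied:
    weighted[status][sex] += count

# Unweighted tally: each row is treated as a single individual.
unweighted = defaultdict(lambda: defaultdict(int))
for sex, status, _count in tallied:
    unweighted[status][sex] += 1

for status in weighted:
    print(status, dict(weighted[status]), "unweighted:", dict(unweighted[status]))
```

Without the weights, every cell of the crosstab is 1 and every percent is meaningless, which is the symptom to look for if the Weight Cases step was skipped.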
Start here Wed.
Simpson's paradox:  An association or comparison that holds for all or several subgroups can reverse direction  when the data are combined into a single group.
Example from text.  p. 588 example 9.10
   SPSS output
Parallel continuous situation: Cars.sav, like the econ graduates problem (Ch. 2).  (X = weight, Y = time to accelerate to 60.  Heavier car should be slower? Oops. Panel with # of cylinders, or color with horsepower.)
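
The reversal in Simpson's paradox can be seen in a tiny Python sketch. The counts below are HYPOTHETICAL, invented only to show the effect; they are not the data from example 9.10, and the labels "A", "B", "easy cases" and "hard cases" are placeholders.

```python
# HYPOTHETICAL (successes, total) counts, made up for illustration only.
data = {
    "easy cases": {"A": (90, 100),   "B": (800, 1000)},
    "hard cases": {"A": (300, 1000), "B": (20, 100)},
}


def rate(successes, total):
    """Success rate as a proportion."""
    return successes / total


# Within each subgroup, compare A with B.
subgroup_rates = {
    group: {option: rate(*cell) for option, cell in options.items()}
    for group, options in data.items()
}

# Combine the subgroups by adding counts (not by averaging the rates).
combined_rates = {}
for option in ("A", "B"):
    successes = sum(data[group][option][0] for group in data)
    total = sum(data[group][option][1] for group in data)
    combined_rates[option] = rate(successes, total)

for group, rates in subgroup_rates.items():
    print(f"{group}: A = {rates['A']:.1%}, B = {rates['B']:.1%}")
print(f"combined:   A = {combined_rates['A']:.1%}, B = {combined_rates['B']:.1%}")
```

In these made-up numbers A does better than B in both subgroups, yet B does better once the groups are combined, because most of A's cases are hard ones and most of B's are easy ones. The subgroup variable plays the role of the "layer" variable in Crosstabs.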


Sievers home  Math251-Fall07/Day2s14.htm   6pm    9/25/07
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.
 

                     Women    Men    Total
In Labor Force         350    450      800
Not in Labor Force     150     50      200
Total                  500    500     1000
Of people in the labor force, what percent are women?  350/800 = 43.75%
Of women, what percent are in the labor force? 350/500 = 70%
Of people, what percent are women in the labor force? 350/1000 = 35%