Day 3 (Fri. Feb. 4): Reading: Reread D&V Ch.4 thru p. 46 (Re-expressing p. 44 optional), ActivStats 3-1, 4 all.
New: D&V Ch.3 pp. 18-22, 23-4 (Simpson's Paradox optional), ActivStats 3-2.
Ahead: D&V Ch5, AS Ch5.
(D&V, and I, will do medians, quartiles, boxplots first, then mean/s.d. AS does middles, then spreads, then boxplots.)
We'll start using SPSS Wednesday--have class in the computer lab.
| Hand in (all from D&V text)
Ch4 p. 50 (repeated from day 2)
Creating:
12 Bird species (10's as leaves; split 5 leaves per stem is good. Big outliers)
18 Marijuana (stem & leaf)
Describing:
5 Heart attack stays
9 Wineries: Make a) "under 60 acres". Book's answer to b is screwy, why?
14 Pop. growth
More Ch4:
#4 More shapes
A. Use your circle data and make a back-to-back stemplot of Time (first column) for your two hands. Write a few sentences comparing the speed performance of your hands.
Ch3 p. 31, 2 cat. variables
17 Canadian languages
14 Cars (for f, do a segmented bar graph of the cond. dist's of part e, as part of your discussion.)
24 Obesity (a only)
26 Pet ownership (Also: what's most startling about these percents?)
20 Prisons (include one or more graphs) |
Read, be able to discuss in class
Ch4
Creating:
17 Acid rain. Look at answer, note stems used.
Describing:
7 Cereal sugar
6 Emails (I think the answer book does a crummy job)
19 Hosp. stays. Do a only. Read answer to c. Most mothers & babies go home in 2 days now. What W's are crucially omitted here?
Ch3
25 Family planning
|
Optional
Ch.3 31 Simpson's paradox, UC |
Pretests:
Mostly good. Order of operations: Please Excuse My Dear Aunt Sally: Parentheses rule; then Exponents; then x and / (left to right); then + and - (left to right).
Anyone can take the pretest to the math clinic and ask for problems like the ones you missed.
Distribution of one variable: Area represents proportion.
Quantitative: Histogram, Stem-and-leaf (Stemplot), Dotplot
(I will only require you to read histograms, not make them by hand. You'll make stemplots and dotplots by hand.)
Pretest:
Restate #5 as histogram of 100 "5-volt" batteries tested for actual voltage.
The proportion with voltage < 1 is 20%. The proportion
with voltage < 3 is 60%.
a) What proportion have voltage between 1 and 3? b) What proportion have voltage > 3?
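A quick arithmetic check of the battery question, as a sketch: because area represents proportion, the share in an interval is a difference of cumulative shares. The variable names are made up for illustration.

```python
# Cumulative proportions read off the histogram (from the pretest problem).
below_1 = 0.20   # proportion with voltage < 1
below_3 = 0.60   # proportion with voltage < 3

# a) Between 1 and 3: area to the left of 3, minus area to the left of 1.
between_1_and_3 = below_3 - below_1

# b) Above 3: whatever area is not to the left of 3
#    (assumes no battery sits exactly at 3).
above_3 = 1.0 - below_3

print(f"between 1 and 3: {between_1_and_3:.0%}")
print(f"above 3: {above_3:.0%}")
```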
Stem-and-leafs are a powerful hand tool. (Handout)
Unordered first, then ordered if necessary. By tens, then split?
Back to back, comparing two groups. (p.51, #14)
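For checking a hand-made stemplot, here is a minimal sketch of an ordered stemplot with tens as stems and ones as leaves. The function name `stemplot` and the pulse values are made up for illustration (not class data), and it assumes nonnegative whole numbers.

```python
from collections import defaultdict

def stemplot(values):
    """Return stem-and-leaf lines: stems are tens, leaves are ones."""
    stems = defaultdict(list)
    for v in values:
        stems[v // 10].append(v % 10)
    lines = []
    # Walk every stem from smallest to largest so empty stems (gaps) still show.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in sorted(stems.get(stem, [])))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# Made-up pulse rates, for illustration only.
pulses = [72, 68, 81, 64, 77, 90, 66, 75, 70, 88, 59, 73]
print("\n".join(stemplot(pulses)))
```

Sorting the leaves corresponds to the "ordered" step; by hand you would write them unordered first.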
Choosing a display (by hand):
A dot plot is most useful for n = 3 to about 15-20, or when the data only fall on a few values (just stack the dots up).
A stemplot is good for continuous data, smeared around; you can do 100 values in 3-5 minutes.
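"Stacking the dots up" can be sketched in text form as below. The function name `dotplot` and the data are made up for illustration, and the sketch assumes whole-number data on a short axis.

```python
from collections import Counter

def dotplot(values):
    """Return text lines for a dot plot: one column per value, dots stacked upward."""
    counts = Counter(values)
    lo, hi = min(counts), max(counts)
    axis = list(range(lo, hi + 1))        # integer axis; assumes whole-number data
    width = len(str(hi)) + 1              # column width wide enough for the labels
    lines = []
    for level in range(max(counts.values()), 0, -1):   # top row first
        row = "".join(("." if counts[x] >= level else " ").rjust(width) for x in axis)
        lines.append(row.rstrip())
    lines.append("".join(str(x).rjust(width) for x in axis))
    return lines

# Made-up data falling on only a few values, for illustration only.
siblings = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2, 5]
print("\n".join(dotplot(siblings)))
```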
Describing: Pattern, and deviations from it
Shape (symmetric, or skewed (think smeared, or sliding) right or left)
(Humps: uni-, bi-, or multi-modal. Two humps = two "causes"?)
Some special shapes: uniform (p. 40), J-shaped (#6 p.50), bell-shaped (Ch 6)
Center, Spread (roughly now, Ch.5 numerically)
Outliers, gaps? (different groups, sources?)
Look at pulse data. "Lurking variable"
What do we see?
What can we infer? (Introduction)
Data source? Lurking variables?
Variability happens.
Things settle down on average (Pooled data on colors)
BUT conclusions are never certain.
Statistics will give us a language for talking about uncertainty.
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
~ ~ ~ ~
So far: One Categorical variable.
One Quantitative variable.
A Quantitative vs. Categorical with 2 values (back-to-back stemplot, parallel histograms or dotplots)
Categorical vs. Categorical (Color vs. Hand)
Ch3, pp. 18-22
"Two-way table", "Contingency table", "Crosstab(ulation)s" (color vs. hand)
A thousand people are interviewed by the census bureau, and the results
tabulated in this two way table.
Working Status vs. Sex.
| | Women | Men | Total |
| In Labor Force | 350 | 450 | 800 |
| Not in Labor Force | 150 | 50 | 200 |
| Total | 500 | 500 | 1000 |
What is the "Percent of women in the labor force" ?
Write your
answer down on a scrap of paper.
When you write or see percents, be clear what is on the bottom of the fraction (even if it takes longer to say)!!
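To see why the bottom of the fraction matters, here is a small sketch that reads the same count (350 women in the labor force) against three different denominators from the table. The descriptions are my own wording of the three readings.

```python
# Counts from the two-way table above.
women_in_lf = 350
women_total = 500
in_lf_total = 800
grand_total = 1000

# Same numerator, three different denominators, three different percents.
readings = {
    "of all women, percent in the labor force": women_in_lf / women_total,
    "of the labor force, percent who are women": women_in_lf / in_lf_total,
    "of everyone, percent who are women in the labor force": women_in_lf / grand_total,
}
for description, proportion in readings.items():
    print(f"{proportion:6.1%}  {description}")
```

All three could be called the "percent of women in the labor force", which is why the phrase alone is ambiguous.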
Marginal distribution: Distribution of one variable, ignoring/summing over the other.
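A minimal sketch of the two marginal distributions for the working status vs. sex table: sum over the other variable, then divide by the grand total. The nested-dictionary layout is just one convenient way to hold the counts.

```python
# Counts from the working status vs. sex table (rows: status, columns: sex).
counts = {
    "In Labor Force":     {"Women": 350, "Men": 450},
    "Not in Labor Force": {"Women": 150, "Men": 50},
}
grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of working status: sum across sex (row totals).
status_marginal = {status: sum(row.values()) / grand_total
                   for status, row in counts.items()}

# Marginal distribution of sex: sum across working status (column totals).
sex_marginal = {sex: sum(row[sex] for row in counts.values()) / grand_total
                for sex in ("Women", "Men")}

print(status_marginal)
print(sex_marginal)
```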
Conditional distribution: Distribution of one variable,
with the individuals being only those which satisfy a condition in the
other variable.
For women, their conditional distribution as to working status; for men, their conditional distribution as to working status.
"Column %s"--columns add to 100%: "conditional distributions
of working status by sex".
| | Women | Men | Total |
| In Labor Force | 350/500 = 70% | 450/500 = 90% | 80% |
| Not in Labor Force | 150/500 = 30% | 50/500 = 10% | 20% |
| Total | 500/500=100% | 500/500=100% | 100% |
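The column percents can be reproduced with a short sketch: the bottom of each fraction is the column total for that sex. The data layout and names are illustrative.

```python
# Counts from the working status vs. sex table.
counts = {
    "In Labor Force":     {"Women": 350, "Men": 450},
    "Not in Labor Force": {"Women": 150, "Men": 50},
}
sexes = ("Women", "Men")

# Bottom of each fraction: the column total for that sex.
column_totals = {sex: sum(row[sex] for row in counts.values()) for sex in sexes}

# Conditional distribution of working status, given sex.
column_pct = {
    status: {sex: row[sex] / column_totals[sex] for sex in sexes}
    for status, row in counts.items()
}

for status, row in column_pct.items():
    print(f"{status:<20}" + "".join(f"{row[sex]:>8.0%}" for sex in sexes))
```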
For those in the labor force, conditional distribution as to sex.
For those not in the labor force, conditional distribution as to sex.
"Row %s"--rows add to 100%: "conditional distributions of sex by working status."
| | Women | Men | Total |
| In Labor Force | 350/800 = 43.8% | 450/800 = 56.2% | 800/800=100% |
| Not in Labor Force | 150/200 = 75% | 50/200 = 25% | 200/200=100% |
| Total | 50% | 50% | 100% |
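For the row percents, the bottom of each fraction is the row total instead. A sketch with the same illustrative layout; note that the exact values in the first row are 43.75% and 56.25%, so the displayed tenths depend on the rounding convention.

```python
# Counts from the working status vs. sex table.
counts = {
    "In Labor Force":     {"Women": 350, "Men": 450},
    "Not in Labor Force": {"Women": 150, "Men": 50},
}

# Conditional distribution of sex, given working status:
# divide each count by its row total.
row_pct = {
    status: {sex: n / sum(row.values()) for sex, n in row.items()}
    for status, row in counts.items()
}

for status, row in row_pct.items():
    print(status, {sex: f"{p:.2%}" for sex, p in row.items()})
```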
Graphs to compare proportions: parallel pies, see text.
Segmented (stacked) bar charts, of % (so total length the same)
% Women O
% Men X
OOOOOOOOOOOOOOXXXXXXXXXXXXXXXXXX In Labor Force
OOOOOOOOOOOOOOOOOOOOOOOOXXXXXXXX Not in Labor Force
Can also do segmented bars of raw numbers, which conveys different info:
Each O = 25 Women
Each X = 25 Men
OOOOOOOOOOOOOOXXXXXXXXXXXXXXXXXX In Labor Force
OOOOOOXX Not in Labor Force
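Both kinds of text bar can be generated with a small sketch. The function name `segmented_bar` is made up; the 25-people-per-symbol scale and the 32-symbol bar length are taken from the bars drawn in these notes. For other data, rounding could make a percent bar one symbol longer or shorter.

```python
def segmented_bar(women, men, per_symbol):
    """One text bar: an O per `per_symbol` women, then an X per `per_symbol` men."""
    return "O" * round(women / per_symbol) + "X" * round(men / per_symbol)

rows = {"In Labor Force": (350, 450), "Not in Labor Force": (150, 50)}

print("Raw counts (each symbol = 25 people):")
for label, (women, men) in rows.items():
    print(f"{segmented_bar(women, men, 25):<32} {label}")

print("Percents (each bar scaled to 32 symbols):")
for label, (women, men) in rows.items():
    scale = (women + men) / 32      # people per symbol, so every bar has length 32
    print(f"{segmented_bar(women, men, scale):<32} {label}")
```

The raw-count bars show that far fewer people are out of the labor force; the percent bars hide that but make the two splits easy to compare.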
Independence: two variables are independent when the
(conditional)
distribution of one is the same for all categories of the other.
Working status is clearly not independent of sex.
Circle experiment: Is color independent of hand? (Do we have
enough data to tell whether it's true in general?)
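The definition of independence can be checked mechanically on a table of counts, as in this sketch. The function names, the `tolerance` parameter, and the `balanced` table are all made up for illustration. Real sample data will rarely match exactly, so an exact comparison like this describes the table at hand, not what is true in general.

```python
def conditional_distributions(counts):
    """For each column category, the distribution over the row categories."""
    columns = next(iter(counts.values())).keys()
    result = {}
    for col in columns:
        total = sum(row[col] for row in counts.values())
        result[col] = {r: row[col] / total for r, row in counts.items()}
    return result

def looks_independent(counts, tolerance=1e-9):
    """True when every column has the same conditional distribution (within tolerance)."""
    dists = list(conditional_distributions(counts).values())
    first = dists[0]
    return all(abs(d[r] - first[r]) <= tolerance for d in dists[1:] for r in first)

labor = {
    "In Labor Force":     {"Women": 350, "Men": 450},
    "Not in Labor Force": {"Women": 150, "Men": 50},
}
print(looks_independent(labor))      # 70% of women vs. 90% of men in the labor force

# Hypothetical table where both sexes are 80% in the labor force.
balanced = {
    "In Labor Force":     {"Women": 400, "Men": 400},
    "Not in Labor Force": {"Women": 100, "Men": 100},
}
print(looks_independent(balanced))
```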
| Sievers home | Math151-Sp05/Days3.htm | 10:40am | 2/4/05 |