Math 151, Spring 2007, Wednesday, Day 20, March 14
Hit reload.... After class: corrected 3/15
HW: (Re)read pp. 133-136. Skip Chapter 6. Read p. 186. Read Chapter 8. Read p. 200 (Other designs) last. Check p. 206, 8.17-22, 26 at first, then 8.23-25 with Table B. Ahead, Chapter 9.
Hand in Friday:
p. 136, 5.13 hospitals: big = bad?
p. 192, 8.1, 8.2, 8.3 experiment, observation
p. 207, 8.27 Alcohol & heart attacks
p. 194, 8.4, 5, 6 population/sample
Postpone the rest: (These will probably be Friday day 21's assignment)
p. 195, 8.7 Sampling badly on campus
- - - -
p. 199 8.9 Apartment living, SRS. Use Table B.
p. 209, 8.36 Area code sample, SRS Use Table B.
p. 211, 8.45 random digit dialing
p. 210, 8.41 random digit characteristics
p.209-10, 8.38 b only Traffic lights
p. 208, 8.30 movie viewing
Read, to discuss:
p. 136, 5.12 lurking variables
p. 208, 8.29 safety of anesthetics
p. 192 8.3 TV & aggression (lurking)
Postpone the rest:
p.195, 8.8 more Sampling badly on campus
p. 211, 8.47 guns
- - - -
p. 204, 8.14, 8.15 biases.
p. 208, 8.31 world affairs
p. 211, 8.46 wording survey questions
Optional:
p. 136, 5.11, lurking variables
- - -
Postpone the rest:
p. 209, 8.35 Use Table B (more practice)
p. 209, 8.34 seat belt use
Exams not finished. Friday I hope(!!).
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Pick a digit (from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9). Write it down. Write it to the left of your name on the sign-in sheet.
HW questions?
--Don't trust just summary data. Need to see the scatterplot to see how suitable the summary numbers are. ("Anscombe's quartet", Moore p. 142, 5.34) (Overhead slide. You can reconstruct these pictures using SPSS and Moore's problem, if you like.)
--Extrapolation. Watch out for it.
--Residuals plot: Takes away the "linear" part of the
relationship; sometimes other structure can be seen.
--Examples from HW, involving extrapolation and residuals plots:
ex 7-28 Soap data, output
ex 5-9 Farm population data, output
Your computation of the predicted value for year 2000 may differ quite a bit from the book's. It's roundoff error: this happens because the x-values are so big, in the thousands, that the roundoff error can be in the tens.
Ours: 1166.93 - .59*2000 = 1166.93 - 1180 = -13.07.
Theirs: 1166.93 - .5868*2000 = 1166.93 - 1173.6 = -6.67
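The effect of rounding the slope can be checked directly. This is a minimal sketch that uses only the intercept and the two slope values quoted above; the function name `predict` is made up for illustration.

```python
# Predicted value from a least-squares line: y-hat = intercept + slope * x.
def predict(intercept, slope, x):
    return intercept + slope * x

intercept = 1166.93        # intercept quoted above
slope_rounded = -0.59      # slope rounded to two decimal places ("ours")
slope_book = -0.5868       # slope kept to four decimal places ("theirs")
year = 2000

ours = predict(intercept, slope_rounded, year)
theirs = predict(intercept, slope_book, year)

# The two slopes differ by only 0.0032, but multiplying by x = 2000
# turns that into a difference of 0.0032 * 2000 = 6.4 in the prediction.
print(round(ours, 2), round(theirs, 2), round(theirs - ours, 2))
```

Keeping more decimal places in the slope before multiplying by a large x avoids most of the discrepancy.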
Finishing Ch. 5:
"Lurking" variable: has an important effect, but is not one of the variables studied.
Meatloaf shrinkage vs. placement in oven? (Cooking thermometer or not had the greatest influence.)
Time sequence of observations is a common one. (Learning, tiring, aging)
The trouble with lurking variables is that by definition you don't know they're there. Look behind every tree.
Association does not imply causation
Strong association/correlation between A and B could be:
A causes B / B causes A / C causes both A and B (lurking C) / just chance that they go together in this data set.
Direction? Rooster causes sun to rise by crowing?
Both variables "caused" by a lurking variable?
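A small simulation can make the "C causes both A and B" case concrete. This is only a sketch with made-up numbers: the variables, the noise levels, and the seed are all assumptions, not data from the course. A and B are each built from C plus independent noise, so neither one influences the other.

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient r, computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(151)  # fixed seed so the run is repeatable

# Hypothetical lurking variable C. A and B each depend on C plus their own noise.
# Neither A nor B appears in the formula for the other.
c = [rng.gauss(0, 1) for _ in range(2000)]
a = [ci + rng.gauss(0, 0.5) for ci in c]
b = [ci + rng.gauss(0, 0.5) for ci in c]

r_ab = correlation(a, b)
print(round(r_ab, 2))  # strong positive r, even though A does not cause B
```

With these assumed noise levels the correlation between A and B comes out near 0.8, and it is produced entirely by C.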
Lurking variable can be part of the cause
--Women with a history of heavy antibiotic use have higher rates of breast cancer.
--Baby rats whose mothers licked and groomed them more grew up to be more exploratory, social, less timid.
Cause? Effect? How to tell?
Establishing that x "causes" y is difficult:
Best: Do an experiment in which we change x, keep lurking variables under control. (E.g. Rats. Ch. 9)
Otherwise: Strong association. Consistent over many studies. Higher x --> stronger y. X precedes y in time. A plausible mechanism exists (parallel studies?)
Generalize rat grooming to humans?
E.g. Partially hydrogenated oils = "trans fats" --> heart disease?
Homocysteine --> heart disease?
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Chapters 1 through 5 have covered analyzing data that was given to us--what it said about itself. Informally, develop guesses, suspicions, hypotheses about the world the data came from.
From Exploration to Inference p. 186
Ch. 8 & 9: Producing Data:
Aim: create data sets that will allow us to make inferences to a larger world than just the data we have.
Observational Study: Observes individuals, measures variables, does not influence the responses. (Ch. 8)
Sometimes observe individuals who are (more or less) conveniently at hand, or, better, take a sample from a population, examine it.... (Ch. 8)
Experiment: Imposes treatment on individuals, to see how the treatment influences the response. (Ch. 9)
Confounding: Two variables (explanatory or lurking) are confounded when you can't sort out their effects on a response variable. (Rats: Mothers' grooming causes sociability, or inherited sociability from mothers who like to groom?)
Ch. 8 p. 192ff. Sampling
>>Population: Entire group that we want information about.
>>Sample: The part of the population we actually examine.
Hope: Sample will be representative of the population.
>>Sampling design: Describes exactly how sample is to be chosen from population.
(SAMPLING) BIAS: The design of a study is biased if it systematically favors certain outcomes.
Start here Friday:
Check our "sample" of digits
Sample survey: (attempt to) choose a representative sample from a large, varied population. Not easy!
Some issues: What population do we want to understand? What exactly do we want to measure?
Non-probability samples (sampling badly):
- Voluntary response sample (from a general appeal--Ann Landers, Cosmo, Hite Report, call-in and "instant polls"): biased toward strong opinions, esp. negative. (Contenteds don't bother.) www.vote.com
- Convenience sample (whatever/whoever looks good, is handy): Unlikely to be representative. Digits. Math 151 is a convenience sample from Wells for heights, shoe size, major...
Simple Random Sample (SRS) of size n: n individuals chosen in such a way that every possible set of n individuals has an equal chance of being chosen.
A probability sample (p. 200).
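The phrase "every possible set of n individuals" can be made concrete with a tiny population. In this sketch the five names are hypothetical. `itertools.combinations` lists every possible set of size n, and `random.sample` from Python's standard library draws n distinct individuals without replacement.

```python
import itertools
import math
import random

population = ["Ann", "Bo", "Cy", "Di", "Ed"]  # hypothetical individuals
n = 2

# Every possible set of n individuals from this population.
all_samples = list(itertools.combinations(population, n))
number_of_samples = math.comb(len(population), n)  # 5 choose 2 = 10

# Draw one SRS: n distinct individuals, no one chosen twice.
rng = random.Random()
chosen = rng.sample(population, n)

print(number_of_samples, chosen)
```

With 5 people and n = 2 there are 10 possible samples, so under an SRS each pair has a 1 in 10 chance of being the one chosen.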
HOW? A chance mechanism: Cards, dice, computer program, or
Table of random digits (Simulates rolling a die with 0,1,....9,
over and over...) (Table B, p.686)
Every digit, every sequence of digits, is equally
likely to be "next" in any direction.
To use: label everyone in the population with a number.
Important: Every labeling number needs the same number of digits.
To label 9 people, use the labels 1, 2, 3, ..., 9 (1-digit chunks)
To label 15 people, use the labels 01, 02, ..., 10, 11, ..., 15 (2-digit chunks)
To label 125 people, use the labels 001, 002, ..., 124, 125 (3-digit chunks)
Pick a place (at random) in the table, start reading across in that size chunk. Get n eligible numbers (discard repeats).
Read Row 150: 07511 88915 41267 16853 84569 79367 ...
From 9 people, a sample n = 5: 0, 7, 5, 1, 1, 8, 8, 9 (0 is not a label; the second 1 and the second 8 are repeats; sample is individuals 7, 5, 1, 8, 9)
From 15 people, a sample: 07, 51, 18, 89, 15, 41, 26, 71, 68, 53, 84, 56, 97, 93, 67 ... keep reading, go to next line (or back to top line) if you need more. Individuals 7, 15, ... are chosen using this line.
From 125 people, a sample: 075, 118, 891, 541, 267, 168, 538, 456, 979, 367 ... keep reading. Individuals 75, 118, ...
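The table-reading procedure can be sketched as a short function. The name `sample_from_digits` and its parameters are made up for this illustration; the digit string is Row 150 as quoted above. Labels are assumed to run from 1 up to the population size, padded with leading zeros to a common width.

```python
def sample_from_digits(digits, population_size, n):
    """Choose up to n labels from 1..population_size by reading a digit string
    in equal-size chunks, skipping ineligible chunks and repeats."""
    digits = "".join(ch for ch in digits if ch.isdigit())  # ignore the spaces
    width = len(str(population_size))                      # digits per label
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    # If fewer than n labels were found, the digits ran out:
    # in practice, keep reading on the next row of the table.
    return chosen

row_150 = "07511 88915 41267 16853 84569 79367"
print(sample_from_digits(row_150, 9, 5))    # [7, 5, 1, 8, 9]
print(sample_from_digits(row_150, 15, 5))   # [7, 15]
print(sample_from_digits(row_150, 125, 5))  # [75, 118]
```

For 15 and 125 people this one row supplies only two eligible labels, which is why the notes say to keep reading.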
Why the same number of digits in each label?
Each individual 3-digit chunk is as likely as any other 3-digit chunk. But a 1- or 2-digit chunk is more likely than any 3-digit chunk. So 2 will come up more often than 12, but 02 will come up just as often as 12.
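The chances behind this can be counted exactly, assuming (as the table is designed to ensure) that every digit is equally likely and independent of the others. The helper name `chunk_probability` is made up for this sketch.

```python
from fractions import Fraction
from itertools import product

def chunk_probability(chunk):
    """Chance that the next len(chunk) table digits spell exactly `chunk`,
    found by listing every equally likely chunk of that length."""
    all_chunks = ["".join(p) for p in product("0123456789", repeat=len(chunk))]
    return Fraction(all_chunks.count(chunk), len(all_chunks))

print(chunk_probability("2"))    # 1/10
print(chunk_probability("12"))   # 1/100
print(chunk_probability("02"))   # 1/100
```

Read as a 1-digit chunk, 2 has ten times the chance that 12 has as a 2-digit chunk. Padding it to 02 puts the two labels on equal footing.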
Why across? For consistency on HW, go the way they say (so you get the answer in the book). In practice, you can read up, down, backwards, as long as you decide beforehand, and don't change in the middle of choosing the sample.
+ + + + + + + + + + + + + + + + + +
Some more sources of bias, even in probability samples (p. 201-3):
**Undercoverage: Some groups in the population are left out, or slighted, in the process of choosing the sample.
One possible source of undercoverage: Sampling frame (Moore p. 211, problem 8.45): the group from which the sample is actually chosen, as distinct from the "population", the group you want information about. The sampling frame is often, unfortunately, smaller than the population. (Often a "list" that already exists.) The sample is (usually much) smaller than the sampling frame.
**Nonresponse: "Chosen" sample may not turn out to be actual sample, if some individuals don't respond.
**Response bias: Lies, bad memory, pleasing interviewer (nutrition surveys). Interview technique.
**Wording of questions: Confusing? Leading? Limiting choices?
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.