MATH 251, Probability and Statistics I, Fall 2011, Wed. Sept. 14
Day 9.After Class.corrected
HW numbers, twice
HW Day 9, Ch. 2, Intro, then
2.1(scatterplots)--postpone Transformation pp.89-1, Next, 2.2
(Correlation) Memorize formula
for r (p. 102). Then Normal quantile plots, 65-67.
Handouts SPSS
Scatterplots, Scatterplots (Governor's
Salary) HW (Optional,
Normal Practice)
QUIZ on Normal
Distribution MONDAY
= = = = = = = =
Reading and questions: due Wed. Day 12 (a week)
(Why does the mean of an IQ test trend upward over the years? Cf.
"Old" IQ test, mean 110 (Moore ed. 1) & IPS7e 1.30-31, WAIS, mean
100.)
"None of the above" article by Malcolm Gladwell, on reserve or
via the PDF link or html link.
Questions: 1) What is the Flynn effect?
2) What is a likely reason for it?
= = = = = = = = = =
Chapter 2:
pp. 81-2, 2.2 & 2.3 categorical <-->
quantitative
p. 82, 2.4 explanatory?
response? What are you after?
Scatterplots (ch2.1) p. 94ff,
mostly
Continue to watch for data variables with the wrong Measure in
SPSS.
Using SPSS: Handout: Scatterplots,
and Scatter HW sheet
* On a separate sheet: Begin the Governors'
Salaries HW. You can do 1-5 now. KEEP till all
questions have been answered. File for handout: govsal_vs_pay.sav
p. 88, 2.9 coffee drinks BY HAND, just this
one, for refresher.
p. 88, 2.10 (SPSS) debt (NOT MAKING a scatterplot here--other
questions).
p. 88, 2.11 (SPSS) bigger debtors too. (This
was just before the Great Recession; the U.S. debt shown was accumulated
largely after the Bush tax cuts of 2001.) The answer book shows the
bigger ones with different symbols. Don't bother with that, but
label the 5 added countries.
2.35 (SPSS) body mass M/F (put sex in the Set Markers by
box). Turn the page for (b).
2.36 (SPSS) icicles.
You'll come back to this dataset.
2.31 (SPSS) merlin falcons. To
plot the mean response: in Chart Editor, Elements > Interpolation
Line (Big SPSS handout p. 10 top,
"Timeplot") gives the means line. By hand it is sometimes more convenient
to use medians instead of means; they are easy to estimate in a picture (the middle
dot, or halfway between the 2 middle dots). BY HAND, mark the
medians for each "pairs" level and connect them with a dotted line.
How different are the two lines?
POSTPONE THE REST
Correlation 2.2 p. 101ff. (top
of Scatterplots
handout p. 4 )
Governors' Salaries Scatter
HW sheet: add #6 to 1 thru 5, keep it.
Hand in the rest:
2.53 (SPSS) dates' heights.
2.42 (SPSS) strong assoc., no correlation
2.54 (SPSS) unsuitable for correlation
2.52 (SPSS) bio vs. physics. Do 2.32 (Arabidopsis)
also. (You did 2.36 (icicles). To get the separate correlations for the
2 icicle groups, you need to select each subgroup. See Scatterplot
handout p. 4 top, SPSS intro p. 5 bottom.)
2.59 teacher ratings--misuse of concept
Read, discuss
2.29 reading/IQ
2.30 estimated/actual reading ability. (Note "granularity"
because of limited estimate choices.)
POSTPONE THE REST
Correlation; using Applet:
Important!
2.55, 2.56
2.60 wrong uses
Optional
Questions on HW?
SPSS? Day 6
Normal distribution? Day 8.
Links for more Normal Table problems (optional):
Templates, Practice (like the questions I like to
ask)
C) Surprising difference in tails? Writeup
D) Also, that pregnancy lasting 310 days: "Dear
Reader: The average gestation period is 266 days.
Some babies come early. Others come late. Yours was late. The question here is not whether the baby was late.
That fact is already known. At issue is the credibility of the length
of the delay. Ten months and five days is approximately 310 days, which
means that the pregnancy exceeded the norm by 44 days. [How unusual is
that?]" What proportion of pregnancies last 310 days or
more? With mean 266 and s.d. 16 days, z = (310-266)/16 = 44/16 = 2.75.
Area above 2.75 = .0030.
About 3 in a thousand pregnancies
last that long. Pretty rare. Is "San
Diego Reader" one of the 3-in-a-thousand, or is she lying?
(This is the kind of question we deal with in Significance Testing,
part 3 of the course.)*
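For anyone who wants to check the table lookup by computer, here is a minimal Python sketch of the same calculation. It assumes the Normal model used above (mean 266 days, s.d. 16 days); the function name upper_tail is made up for this illustration, and the quiz still requires the by-hand work with Table A.

```python
import math

def upper_tail(x, mean, sd):
    """Return (z, area above x) for a Normal(mean, sd) model."""
    z = (x - mean) / sd
    # For a standard Normal Z, P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    area_above = 0.5 * math.erfc(z / math.sqrt(2))
    return z, area_above

# Pregnancy example from the notes: 310 days, mean 266, s.d. 16.
z, p = upper_tail(310, 266, 16)
print(f"z = {z:.2f}, area above = {p:.4f}")  # z = 2.75, area above = 0.0030
```

The result agrees with Table A, which gives .9970 below z = 2.75 and so .0030 above.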
Quiz MONDAY: Normal distribution and
tables. 68-95-99.7% rule, and problems like those on
the Normal Probability Practice Handout .
I will give you copies of Table A; if you have a calculator that does
this type of problem, you must show all the work (x <-->z,
numbers from the paper table needed) to demonstrate that you can do the
problem by hand.
Postpone
till after Sec. 2.1:
How do you know if it's safe to treat a data set as if it comes
from a Normal Density model?
Let SPSS draw a normal curve with the same mean and s.d. over its
histogram; or use a Normal quantile
plot: Handout (is notes) forthcoming.
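Until the handout arrives, here is a rough Python sketch of what a Normal quantile plot is built from: each sorted data value is paired with the z-score that a Normal distribution would put at the same percentile. The plotting position (i - 0.5)/n is one common convention and is an assumption here; SPSS may use a slightly different one. The data values are made up for illustration.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Pair each sorted observation with its expected standard Normal z-score."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, s.d. 1
    # Assumed plotting position: the i-th smallest value sits at percentile (i - 0.5)/n.
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(zs, xs))

# Made-up sample (not course data).
sample = [61, 64, 65, 66, 68, 70, 71]
points = normal_quantile_points(sample)
for z, x in points:
    print(f"{z:6.2f}  {x}")
```

If the points (z, x) fall close to a straight line, treating the data as roughly Normal is reasonable; systematic curvature or points far off the line at the ends suggest skewness or outliers.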
= = = = = = = = = = = = = = = =
Relationships:
(Ch 2 Intro and
Sec. 2.1)
Two variables recorded on the same cases:
"Associated" = knowing the value of one variable (the "explanatory"
one) tells you something about the other (the "response"
variable)
Nurses' salaries, Workplace
(hospital/office)
Quantitative on Categorical: Done: back-to-back (side by
side) stemplots, boxplots together, histograms on same axes...
Categorical on Categorical: Sec. 2.5
Handout: Scatterplots,
and Scatter HW sheet
(mostly repeating handout output. Do first.)
file for handout: govsal_vs_pay.sav
Two Related quantitative variables
"Just Related" or "explanatory &
response?"
(scatterplots)
explanatory = independent = "x" = horizontal axis (= "cause", sometimes but not always)
response = dependent = "y" = vertical axis (= "effect")
(Living histograms: Height vs. weight, Height vs. gpa)
Discussing Scatterplot:
General
Pattern
Deviations
Clusters?
Outliers? (label if possible)
Shape (linear, curved, ...?)
Strength of relationship (how unfuzzy): "Weak, moderate, strong"
Direction
Positively associated: y increases
as x increases (generally).
Negatively associated: y decreases
as x increases.
Mark subgroups differently to do comparisons. (Subgroups defined
by a categorical variable, like Sex, Region of country)
Some scatterplot data: educ-v-mortality.sav
govsal_vs_pay.sav is the file used for most of the handout.
Got to here Wednesday
Correlation (Sec.2.2)
CD or Website, http://bcs.whfreeman.com/ips7e,
Choose "Statistical Applets",
Correlation and Regression.
Play with data points, observing the Correlation Coefficient.
Check the "Show Mean X & Mean Y lines" box.
See how much is in each quadrant.
SPSS: back page (p4) top, Scatterplot handout.
Analyze>Correlate>Bivariate, move both variables
Section 2.2
The correlation coefficient r is a numerical measure of how strongly linear
(and in what direction) the relationship is. It doesn't substitute
for a scatterplot.
- Measures relationship--same whichever variable is on the x-axis
- "Correlation"--only for 2 quantitative variables
- "Unitless"--original measurement units are "standardized out"
- Sign of correlation coefficient matches direction of relationship
- Between -1 and +1. 0: no linear relationship; +1 or -1: perfect straight line.
- Does NOT give info about curved relationships.
- NOT resistant to outliers--quite sensitive.
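Several of the properties listed above can be checked numerically. The sketch below computes r as the sum of products of standardized x and y values divided by n - 1, which is the usual textbook form (presumably the one on p. 102, but check it against the book). The data are made up for illustration, and the function name correlation is not from SPSS or the handouts.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = 1/(n-1) * sum of (standardized x) * (standardized y)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample s.d. (divides by n - 1)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

# Made-up illustration data, not from the textbook or handouts.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
r_swapped = correlation(y, x)                       # swap the axes
r_rescaled = correlation([2.54 * v for v in x], y)  # change the units of x
r_curved = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x^2 exactly

print(round(r, 3), round(r_swapped, 3), round(r_rescaled, 3), round(r_curved, 3))
```

Swapping the variables or changing units leaves r the same, while the perfectly curved y = x^2 data give r of essentially zero even though y is completely determined by x. That is why r doesn't substitute for a scatterplot.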
**Bear in mind that there were around 400,000 births in California in
1970. (I'm guesstimating. There were 605,694 births
in 1990, and the population of California in 1970 was 2/3 of that in 1990.)
So a 3-in-a-thousand event would occur in 3 x 400 = 1200 births--there
would be 1200 women in San Diego Reader's position (many of whom wouldn't
know it). Rare events DO happen--it's not really fair to only notice
and question them AFTER the fact.
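The guesstimate can be redone in a few lines of Python. This only repeats the arithmetic in the note, under its own assumption that 1970 births were about 2/3 of the 1990 figure; the variable names are made up for this illustration.

```python
births_1990 = 605_694                      # figure quoted in the note
births_1970 = round(births_1990 * 2 / 3)   # note's assumption: 2/3 of the 1990 level
rounded_births = 400_000                   # the note's round number

# 3 in a thousand of 400,000 births
expected_cases = rounded_births * 3 // 1000

print(births_1970)     # 403796, i.e. "around 400,000"
print(expected_cases)  # 1200
```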
Note--pregnancy in 1970 usually didn't involve the level of medical
intervention (ultrasound, planned inducement of labor or Caesarian,
etc.) it often gets now.
This page belongs to Sally Sievers who is solely
responsible
for its content. Please see our statement
of responsibility.