### MATH 251, Probability and Statistics I, Fall 2011, Wed. Sept. 14, Day 9. After Class. Corrected HW numbers, twice

HW Day 9: Ch. 2 Intro, then 2.1 (scatterplots)--postpone Transformation pp.89-1. Next, 2.2 (Correlation). Memorize formula for r (p. 102). Then Normal quantile plots, pp. 65-67.

Handouts   SPSS Scatterplots,   Scatterplots (Governor's Salary) HW   (Optional, Normal Practice)

Questions on HW?
SPSS Day 6
Normal distribution? Day 8. Links for more Normal Table problems (optional): Templates, Practice (like the questions I like to ask)
C)  Surprising difference in tails?  Writeup
D) --Also, that pregnancy lasting 310 days: "Dear Reader: The average gestation period is 266 days. Some babies come early. Others come late. Yours was late. The question here is not whether the baby was late. That fact is already known. At issue is the credibility of the length of the delay. Ten months and five days is approximately 310 days, which means that the pregnancy exceeded the norm by 44 days." [How unusual is that?] --What proportion of pregnancies last 310 days or more? With a standard deviation of 16 days, z = (310 - 266)/16 = 44/16 = 2.75. Area above 2.75 = .0030.
3 in a thousand pregnancies last that long.  Pretty rare.  Is "San Diego Reader" one of the 3-in-a-thousand, or is she lying?  (this is the kind of question we deal with in Significance Testing, part 3 of the course).*
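
The Table A lookup above can be double-checked in software. This is a minimal sketch, assuming the divisor 16 in the z-score is the standard deviation in days; it uses Python's standard-library `statistics.NormalDist`, which is not part of the course materials.

```python
from statistics import NormalDist

# Values from the notes: mean gestation 266 days; the divisor 16 in the
# z-score is assumed to be the standard deviation in days.
mean_days = 266
sd_days = 16
observed_days = 310

# Standardize: how many standard deviations above the mean is 310 days?
z = (observed_days - mean_days) / sd_days

# Upper-tail area under the standard Normal curve: 1 minus the area
# to the left of z (the area to the left is what Table A lists).
upper_tail = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"Area above z = {upper_tail:.4f}")
```

On the quiz the same steps still have to be shown by hand: x to z, then the number read from the paper table.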

Quiz MONDAY:  Normal distribution and tables.  68-95-99.7% rule, and problems like those on the Normal Probability Practice Handout .  I will give you copies of Table A; if you have a calculator that does this type of problem, you must show all the work (x <-->z, numbers from the paper table needed) to demonstrate that you can do the problem by hand.

Postpone till after Sec. 2.1:  How do you know if it's safe to treat a data set as if it comes from a Normal Density model?
Let SPSS draw a normal curve with the same mean and s.d. over its histogram; or use a Normal quantile plot: Handout (is notes) forthcoming.
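
Until the handout arrives, here is a rough sketch of what goes into a Normal quantile plot: each sorted data value is paired with the z-score that a Normal sample would be expected to have at that rank. The plotting position (i - 0.5)/n is one common convention and is an assumption here (SPSS and the textbook may use a slightly different one), and the sample data are invented for illustration.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (z, x) pairs for a Normal quantile plot.

    Assumption: the i-th smallest value gets plotting position (i - 0.5) / n.
    Software packages use slightly different conventions.
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, s.d. 1
    points = []
    for i, x in enumerate(ordered, start=1):
        p = (i - 0.5) / n            # fraction of the data at or below x
        z = std_normal.inv_cdf(p)    # z-score with that area to its left
        points.append((z, x))
    return points


# Hypothetical sample, invented for illustration only.
sample = [61, 64, 65, 66, 67, 68, 68, 69, 70, 71, 72, 75]
for z, x in normal_quantile_points(sample):
    print(f"{z:6.2f}  {x}")
```

If the (z, x) points fall close to a straight line, treating the data as Normal is reasonable; systematic bends or stray points at the ends suggest skewness or outliers.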
= = = = = = = = = = = = = = = =

Relationships: (Ch 2 Intro and Sec. 2.1)
Two variables recorded on the same cases:
"Associated" = knowing the value of one variable (the "explanatory" one) tells you something about the other  (the "response" variable)
Nurses' salaries,  Workplace (hospital/office)
Quantitative on Categorical:  Done: back-to-back (side by side) stemplots, boxplots together, histograms on same axes...
Categorical on Categorical:  Sec. 2.5

Handout:  Scatterplots, and  Scatter HW sheet (mostly repeating handout output. Do first.)
file for handout: govsal_vs_pay.sav
Two Related quantitative variables
"Just Related" or "explanatory & response?"
(scatterplots)
explanatory = independent = "x" = horizontal axis ( = "cause", sometimes but not always)
response = dependent = "y" = vertical axis ( = "effect")

(Living histograms:  Height vs. weight, Height vs. gpa)

Discussing Scatterplot:
General Pattern                                      Deviations
Clusters?                                                      Outliers? (label if possible)
Shape (linear, curved, ...?)
Strength of relationship (how unfuzzy)  "Weak, moderate, strong"
Direction
Positively associated:  y increases as x increases (generally).
Negatively associated:  y decreases as x increases.

Mark subgroups differently to do comparisons. (Subgroups defined by categorical variable, like Sex, Region of country)
Some scatterplot data:  educ-v-mortality.sav
govsal_vs_pay.sav  is the file used for most of the handout.
Got to here Wednesday

Correlation (Sec.2.2)
CD or Website,  http://bcs.whfreeman.com/ips7e,
Choose "Statistical Applets",  Correlation and Regression.  Play with data points, observing the Correlation Coefficient.
Check the "Show Mean X & Mean Y lines" box.  See how much is in each quadrant.
SPSS: back page (p. 4), top, of the Scatterplot handout.  Analyze > Correlate > Bivariate, then move both variables into the Variables box.
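
The quadrant idea from the applet can also be tried without it. The sketch below counts how many points fall in each quadrant formed by the mean-x and mean-y lines; the function name and the data are hypothetical, made up for illustration.

```python
from statistics import mean


def quadrant_counts(xs, ys):
    """Count points in each quadrant formed by the mean-x and mean-y lines."""
    x_bar, y_bar = mean(xs), mean(ys)
    counts = {"upper right": 0, "upper left": 0,
              "lower left": 0, "lower right": 0}
    for x, y in zip(xs, ys):
        if x == x_bar or y == y_bar:
            continue  # a point exactly on a mean line is in no quadrant
        if x > x_bar:
            key = "upper right" if y > y_bar else "lower right"
        else:
            key = "upper left" if y > y_bar else "lower left"
        counts[key] += 1
    return counts


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
print(quadrant_counts(xs, ys))
```

Points in the upper-right and lower-left quadrants have deviations from the means with the same sign, so they push the correlation in the positive direction; points in the other two quadrants push it in the negative direction.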

Section 2.2
The correlation coefficient r is a numerical measure for how strongly linear (and in what direction) the relationship is.  Doesn't substitute  for a scatterplot.

1. Measures relationship--same whichever variable is on the x-axis
2. "Correlation" --only for 2 quantitative variables
3. "Unitless"--original measurement units are "standardized out"
4. Sign of correlation coefficient matches direction of relationship
5. Between -1 and +1.  0: no linear relationship, + or -1: perfect straight line.
6. Does NOT give info about curved relationships.
7. NOT resistant to outliers--quite sensitive.
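
Several of these properties can be checked numerically. The sketch below assumes the usual standardized-products form of r, that is, r = (1/(n-1)) times the sum of (standardized x)(standardized y); compare it against the formula on p. 102 before relying on it. All data values are invented for illustration.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """r = (1/(n-1)) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample s.d. (divides by n - 1)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


# Hypothetical data, invented for illustration only.
heights_in = [62, 64, 66, 68, 70, 72]
weights_lb = [120, 136, 140, 155, 160, 181]

r = correlation(heights_in, weights_lb)

# Property 1: swapping which variable is "x" does not change r.
r_swapped = correlation(weights_lb, heights_in)

# Property 3: changing units (inches to centimeters) does not change r.
heights_cm = [2.54 * h for h in heights_in]
r_rescaled = correlation(heights_cm, weights_lb)

# Property 6: a perfect curved relationship (y = x squared) can have r near 0.
xs_sym = [-2, -1, 0, 1, 2]
r_curved = correlation(xs_sym, [x * x for x in xs_sym])

# Property 7: a single outlier (tall but very light) changes r a lot.
r_outlier = correlation(heights_in + [74], weights_lb + [100])

print(f"r = {r:.3f}, swapped = {r_swapped:.3f}, rescaled = {r_rescaled:.3f}")
print(f"|r| for the curved data = {abs(r_curved):.3f}")
print(f"r with one outlier added = {r_outlier:.3f}")
```

None of this replaces looking at the scatterplot: the curved example has a perfectly predictable pattern that r does not detect.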

*Bear in mind that there were around 400,000 births in California in 1970. (I'm guesstimating.  There were 605,694 births in 1990, and the population of California in 1970 was 2/3 of that in 1990.)  So a 3-in-a-thousand event would occur in 3 x 400 = 1200 births (3 per thousand, times 400 thousand births)--there would be 1200 women in San Diego Reader's position (many of whom wouldn't know it).  Rare events DO happen--it's not really fair to only notice and question them AFTER the fact.
Note--pregnancy in 1970 usually didn't involve the level of medical intervention (ultrasound, planned inducement of labor or Caesarian, etc.) it often gets now.
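
The guesstimate in the footnote can be spelled out step by step. This sketch uses only the numbers quoted above and keeps the footnote's own assumption that births scale with population; the variable names are made up for illustration.

```python
births_1990 = 605_694       # California births in 1990, as quoted in the notes
population_ratio = 2 / 3    # 1970 population relative to 1990, per the notes
tail_probability = 0.0030   # area above z = 2.75, from the notes

# Assumption from the footnote: births scale with population.
births_1970_estimate = births_1990 * population_ratio

# Expected number of pregnancies lasting 310 days or more.
expected_long_pregnancies = tail_probability * births_1970_estimate

# The rounded version used in the footnote: 3 per thousand of 400,000 births.
expected_rounded = tail_probability * 400_000

print(f"Estimated 1970 births: {births_1970_estimate:,.0f}")
print(f"Expected pregnancies of 310+ days: {expected_long_pregnancies:,.0f}")
print(f"Using the rounded 400,000 births: {expected_rounded:,.0f}")
```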
 Sievers home Math251-Fall11/Dayq9.htm 10pm 9/14/11
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.