MATH 251, Probability and Statistics I, Fall 2007, Sept. 10, Day 8. After class.

HW Day 8: Please read ahead: Normal quantile plots, pp. 80-84; then Ch. 2, Sec. 2.1 (Scatterplots), then 2.2 (Correlation).

Handouts: Normal Quantile, SPSS Scatterplots (Optional: Normal templates, Practice)
Hand in: Nothing! Postpone all! (Quiz on Normal probably Friday)
Normal quantiles:  Normal quantile plot Handout
p. 92ff.
1.121 (distances: granularity)
1.122  (match the quantile plots)
Use SPSS to make histograms or stemplots, Q-Q plots, and Method 2 Normal quantile plots (like IPS's) for the following.  Comment on what you see.
1.125 (logging) To use each group of data separately, Data>Select Cases (SPSS handout p.5 bottom)     
1.127 To create the data, put a number in the 100th row of a data file (so SPSS will create 100 numbers in your new variable). Transform>Compute: RV.Uniform(0,1) (SPSS handout p. 8 bottom)
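For comparison outside SPSS, here is a minimal Python sketch of the same idea as RV.Uniform(0,1): generate 100 Uniform(0,1) numbers and tally them in a rough text histogram. The seed and the five bins are arbitrary choices for illustration, not part of the exercise.

```python
import random

random.seed(251)  # arbitrary seed, only so reruns give the same numbers

# 100 draws from Uniform(0, 1), like filling 100 rows with RV.Uniform(0,1)
u = [random.uniform(0, 1) for _ in range(100)]

# Tally into 5 equal-width bins: [0, .2), [.2, .4), ..., [.8, 1]
bins = [0] * 5
for value in u:
    bins[min(int(value * 5), 4)] += 1  # min() guards the rare value 1.0

for k, count in enumerate(bins):
    print(f"{k / 5:.1f}-{(k + 1) / 5:.1f}: {'*' * count}")
```

With only 100 numbers the bars come out uneven even though the model is flat; keep that in mind when judging the histogram and the quantile plot.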

Scatterplots (Ch. 2.1), mostly pp. 112ff.
Continue to watch for data variables with the wrong Measure in SPSS.
Using SPSS: Handout: Scatterplots, pp. 1-3, and Regression, p. 4
2.6 Muslim literacy (note, table 1.2)
2.14 speed/fuel  Also Insert>FitLine>Smoother for this set.
2.13 body mass, M/F (use Sex as the Legend Variable)
2.16 icicles
2.18 nematodes. Use Dot-line (Scatterplot handout p. 2 top) to get a line through the group means. By hand it's sometimes convenient to use medians instead of means, since they are easy to estimate from the picture. BY HAND, mark the median for each nematode level and connect the medians with a dotted line. How different are the two lines?
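If you want to check your by-hand medians, here is a small Python sketch that computes the mean and the median of the response at each level of the explanatory variable. The numbers below are made up for illustration; they are NOT the nematode data from the exercise.

```python
from collections import defaultdict
from statistics import mean, median

def centers_by_level(levels, responses):
    """Return {level: (mean, median)} of the responses at each level."""
    groups = defaultdict(list)
    for level, y in zip(levels, responses):
        groups[level].append(y)
    return {level: (mean(ys), median(ys)) for level, ys in sorted(groups.items())}

# Hypothetical data: 3 observations at each of two levels
levels = [0, 0, 0, 1000, 1000, 1000]
growth = [10, 11, 20, 8, 9, 9]

for level, (m, med) in centers_by_level(levels, growth).items():
    print(f"level {level}: mean = {m:.2f}, median = {med}")
```

At level 0 the one large value (20) pulls the mean well above the median; that is the kind of gap between the means line and the medians line the exercise asks about.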

On a separate sheet: Begin the Governors' Salaries HW (p. 3, Scatterplot handout). You can do 1-5 now. KEEP it until all questions have been answered.

Read, discuss 
p. 112, 2.1, 2.2 
 
  Normal quantiles: 
p. 90, 1.119, 1.120
 
 
 
 

 

Optional 
2.7 breeding merlins,  Make the scatterplot by hand if you need the practice.

Questions on HW? Comments on the HW handed back last time: CO2 emissions: the distribution shows not only the J-shaped beginning (the "poor" countries) but also a small "hump" for the developed countries, and then the outliers (US, Canada, Australia) at about twice the level of the other developed countries.
SPSS: Day 7
Normal distribution? Day 6 handouts/links for Normal Table problems (optional): Templates, Practice
A) Also, that pregnancy lasting 310 days: "Dear Reader: The average gestation period is 266 days. Some babies come early. Others come late. Yours was late. The question here is not whether the baby was late. That fact is already known. At issue is the credibility of the length of the delay. Ten months and five days is approximately 310 days, which means that the pregnancy exceeded the norm by 44 days." [How unusual is that?] What proportion of pregnancies last 310 days or more? z = (310 - 266)/16 = 44/16 = 2.75. Area above 2.75 = .0030.
        About 3 in a thousand pregnancies last that long. Pretty rare. Is "San Diego Reader" one of the 3 in a thousand, or is she lying? (This is the kind of question we deal with in Significance Testing, Part 3 of the course.)*
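The same calculation in Python, as a check on the table lookup. This sketch uses the standard library's statistics.NormalDist with the mean of 266 days and the s.d. of 16 days used in the z calculation above.

```python
from statistics import NormalDist

mu, sigma = 266, 16   # gestation model used above: mean 266 days, s.d. 16 days
x = 310

z = (x - mu) / sigma                          # standardize: 44/16
p_upper = 1 - NormalDist(mu, sigma).cdf(x)    # area above 310 days

print(f"z = {z:.2f}, P(X >= 310) = {p_upper:.4f}")
# prints: z = 2.75, P(X >= 310) = 0.0030
```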

Continue here WED: How do you know if it's safe to treat a data set as if it comes from a Normal Density model?
Let SPSS draw a Normal curve with the same mean and s.d. over the histogram, or use a Normal quantile plot (Handout).
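A minimal Python sketch of what a Normal quantile plot graphs: sort the data and pair each value with a standard Normal quantile. The plotting positions (i - 0.5)/n are one common convention and an assumption here; the handout's Method 2 and SPSS may use slightly different ones. The data values are hypothetical.

```python
from statistics import NormalDist

def normal_quantile_pairs(data):
    """Pair each sorted observation with the standard Normal quantile at (i - 0.5)/n."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, s.d. 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

# Hypothetical small data set, just to show the pairing
for z, x in normal_quantile_pairs([12, 7, 9, 15, 10]):
    print(f"z = {z:+.2f}   x = {x}")
```

Plot the pairs with x on one axis and z on the other. If the points lie close to a straight line, treating the data as Normal is reasonable; systematic curvature (for example, an S shape for the uniform numbers in 1.127) says it is not.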

Relationships (Ch. 2 Intro and Sec. 2.1). Handout: Scatterplots, pp. 1-3, and Regression, p. 4
    file for handout: govsal_vs_pay.sav
Related quantitative variables
    "Just Related" or "explanatory & response?"
(scatterplots)
explanatory = independent = "x" = horizontal axis (= "cause", sometimes but not always)
  response =    dependent = "y" = vertical axis   (= "effect")

(Living histograms:  Height vs. weight, Height vs. gpa)

Discussing a Scatterplot
General pattern:
    Clusters?
    Shape (linear, curved, ...?)
    Strength of relationship (how unfuzzy): "weak, moderate, strong"
    Direction
        Positively associated: y increases as x increases (generally).
        Negatively associated: y decreases as x increases.
Deviations:
    Outliers? (label if possible)

Mark subgroups differently to do comparisons. (Subgroups defined by categorical variable, like Sex, Region of country)
  Some scatterplot data:  educ-v-mortality.sav
govsal_vs_pay.sav  is the file used for most of the handout.



Ahead: Correlation (2.2)
CD or Website,  http://bcs.whfreeman.com/ips5e,
  Choose "Statistical Applets", then Correlation/Regression. Play with the data points, watching the correlation coefficient.
    Check the "Show Mean X & Mean Y lines" box. See how many points fall in each quadrant.
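To see the connection the applet is showing, here is a minimal Python sketch that counts the points in each quadrant formed by the mean lines and computes the correlation r as the sum of products of standardized values divided by n - 1. Points in the upper-right and lower-left quadrants push r up; points in the other two pull it down. The four data points are hypothetical.

```python
from statistics import mean, stdev

def quadrant_counts(xs, ys):
    """Count points in each quadrant formed by the lines x = mean(x) and y = mean(y)."""
    mx, my = mean(xs), mean(ys)
    counts = {"upper right": 0, "upper left": 0, "lower left": 0,
              "lower right": 0, "on a mean line": 0}
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        if dx == 0 or dy == 0:
            counts["on a mean line"] += 1
        elif dx > 0 and dy > 0:
            counts["upper right"] += 1
        elif dx < 0 and dy > 0:
            counts["upper left"] += 1
        elif dx < 0 and dy < 0:
            counts["lower left"] += 1
        else:
            counts["lower right"] += 1
    return counts

def correlation(xs, ys):
    """r = sum of (standardized x) * (standardized y), divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical points: one in each quadrant, but the two "agreeing" ones are farther out
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
print(quadrant_counts(xs, ys))
print(f"r = {correlation(xs, ys):.2f}")
```

Here each quadrant holds one point, yet r comes out positive (0.8) because the upper-right and lower-left points sit farther from the means, so their products are larger.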
   *Bear in mind that there were around 400,000 births in California in 1970. (I'm guesstimating: there were 605,694 births in 1990, and the population of California in 1970 was 2/3 of that in 1990.) So a 3-in-a-thousand event would occur in about 3 x 400 = 1,200 births; there would be about 1,200 women in San Diego Reader's position (many of whom wouldn't know it). Rare events DO happen; it's not really fair to notice and question them only AFTER the fact.
Note: pregnancy in 1970 usually didn't involve the level of medical intervention (ultrasound, induction of labor, etc.) that it often gets now.
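The footnote's arithmetic, written out as a Python sketch: expected number of such pregnancies = (number of births) x (probability of lasting 310 days or more). The 2/3 scaling from 1990 births to 1970 births is the page's own guesstimate, not an official count, and the N(266, 16) model is the one used in part A).

```python
from statistics import NormalDist

births_1990 = 605_694
births_1970_est = births_1990 * 2 / 3               # guesstimate: scale 1990 births by 2/3

p_310_or_more = 1 - NormalDist(266, 16).cdf(310)    # about .0030, as in part A)

expected = births_1970_est * p_310_or_more
print(f"estimated 1970 births: {births_1970_est:,.0f}")
print(f"expected pregnancies of 310+ days: {expected:,.0f}")
```

Without rounding to 400,000 and .003, the answer still comes out at roughly 1,200 women.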

Sievers home  Math251-Fall07/Day2s8.htm      3:20pm    9/10/07
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.