MATH 251, Probability and Statistics I, Fall 2011, Wed. Sept. 14, Day 9.  After class: corrected HW numbers, twice

HW Day 9,  Ch. 2, Intro, then 2.1(scatterplots)--postpone Transformation pp.89-1, Next, 2.2 (Correlation) Memorize formula for r (p. 102). Then Normal quantile plots, 65-67.

Handouts   SPSS Scatterplots,   Scatterplots (Governor's Salary) HW   (Optional, Normal Practice)

QUIZ on Normal Distribution MONDAY
= = = = = = = =
Reading and questions:  due Wed. Day 12 (a week) (Why does the mean of an IQ test trend upward over the years?  Cf. "Old" IQ test, mean 110 (Moore ed. 1)& IPS7e 1.30-31, WAIS, mean 100.)
"None of the above" article by Malcolm Gladwell
  on reserve, or PDF link, html link.
Questions: 1) What is the Flynn effect?
2) What is a likely reason for it?
= = = = = = = = = =

Chapter 2: 
p. 81-2, 2.2 & 2.3  categorical <--> quantitative
p. 82, 2.4  explanatory? response?  What are you after?

Scatterplots (ch2.1) p. 94ff, mostly
Continue to watch for data variables with the wrong Measure in SPSS.
Using SPSS:  Handout:  Scatterplots, and  Scatter HW sheet *
*On a separate sheet:  Begin the Governors' Salaries HW.  You can do 1-5 now.  KEEP till all questions have been answered.  File for handout: govsal_vs_pay.sav

p. 88, 2.9 coffee drinks BY HAND, just this one, for refresher.
p. 88, 2.10 (SPSS) debt (NOT MAKING a scatterplot here--other questions).
p. 88, 2.11 (SPSS) bigger debtors too.  (This was just before the Great Recession; U.S. debt shown was accumulated largely after the Bush tax cuts of 2001 and 2003.)  Answer book shows bigger ones with different symbols.  Don't bother with that, but label the 5 added countries.
2.35 (SPSS) body mass M/F (put sex in the Set Markers by box).  Turn the page for (b).
2.36 (SPSS) icicles .  You'll come back to this dataset.
2.31 (SPSS) merlin falcons   To plot Mean response:  In Chart Editor, Elements> Interpolation Line (Big SPSS handout p.10 top, "Timeplot") gives means line.  Sometimes by hand it's convenient to use medians instead of means; easy to estimate in a picture (middle dot, or half way between the 2 middle dots).  BY HAND, Mark the medians for each "pairs" level and connect with a dotted line.  How different are the two lines?
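The means-line versus medians-line comparison in 2.31 can be sketched in a few lines of Python.  This is only an illustration: the (x, y) pairs below are made up, not the merlin data, and the names points and by_level are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (x, y) pairs, made up for illustration only
points = [(10, 30), (10, 40), (10, 50),
          (20, 20), (20, 30),
          (30, 10), (30, 14), (30, 40)]

# Group the y values by their x level
by_level = defaultdict(list)
for x, y in points:
    by_level[x].append(y)

# One mean and one median per x level; connecting each set gives a line
summary = {level: (mean(ys), median(ys)) for level, ys in sorted(by_level.items())}
for level, (m, med) in summary.items():
    print(f"x = {level}: mean = {m:.1f}, median = {med:.1f}")
```

Where the two summaries differ a lot at some level (here x = 30), look for an outlying point pulling the mean.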
POSTPONE THE REST
Correlation 2.2  p. 101ff. (top of  Scatterplots  handout  p. 4  )
Governors' Salaries Scatter HW sheet:  add #6 to 1 thru 5, keep it.
Hand in the rest:
2.53 (SPSS)dates' heights.   
2.42 (SPSS) strong assoc., no correlation
2.54 (SPSS) unsuitable for correlation
2.52 (SPSS) bio vs. physics.  Do 2.32 (Arabidopsis) also.  (You did 2.36 (icicles).)  To get the separate correlations for the 2 icicle groups, you need to select each subgroup (see Scatterplot handout p. 4 top, SPSS intro p. 5 bottom).
2.59 teacher ratings--misuse of concept

Read, discuss 
  2.29reading/IQ
2.30 estimate/ actual reading ability.
  (Note "granularity" because of limited estimate choices.)

POSTPONE THE REST 
  Correlation; using Applet:  Important!
2.55, 2.56
2.60 wrong uses
 
 
 

 

Optional 

Questions on HW? 
  SPSS: Day 6
 Normal distribution? Day 8.  Links for more Normal Table problems (optional):  Templates,  Practice (like the questions I like to ask)
C)  Surprising difference in tails?  Writeup
D) --Also, that pregnancy lasting 310 days:  "Dear Reader: The average gestation period is 266 days. Some babies come early. Others come late. Yours was late.  The question here is not whether the baby was late. That fact is already known. At issue is the credibility of the length of the delay. Ten months and five days is approximately 310 days, which means that the pregnancy exceeded the norm by 44 days."  [How unusual is that?] --What proportion of pregnancies last 310 days or more?  With s.d. 16 days, z = (310-266)/16 = 44/16 = 2.75.  Area above 2.75 = .0030.
        3 in a thousand pregnancies last that long.  Pretty rare.  Is "San Diego Reader" one of the 3-in-a-thousand, or is she lying?  (this is the kind of question we deal with in Significance Testing, part 3 of the course).*
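The z-score arithmetic above can be checked without Table A.  A minimal sketch in Python, assuming the mean of 266 days and the s.d. of 16 days used in the calculation; the variable names are just for illustration.

```python
from statistics import NormalDist

mean_days = 266   # average gestation, from the column quoted above
sd_days = 16      # standard deviation used in the z calculation above
x = 310           # the reported pregnancy length

# Standardize: how many s.d.'s above the mean is 310 days?
z = (x - mean_days) / sd_days

# Area under the standard Normal curve to the right of z
upper_tail = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"proportion lasting {x} days or more = {upper_tail:.4f}")
```

On the quiz you still need to show the x <--> z step and the Table A entry by hand; this is only a check.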

Quiz MONDAY:  Normal distribution and tables.  68-95-99.7% rule, and problems like those on the Normal Probability Practice Handout .  I will give you copies of Table A; if you have a calculator that does this type of problem, you must show all the work (x <-->z, numbers from the paper table needed) to demonstrate that you can do the problem by hand.
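For self-checking before the quiz, the 68-95-99.7% rule can be reproduced from the standard Normal curve.  A small sketch in Python; the helper name area_within is made up for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, s.d. 1

def area_within(k):
    """Area under the standard Normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# The rule rounds these three areas to 68%, 95%, and 99.7%
for k in (1, 2, 3):
    print(f"within {k} s.d. of the mean: {area_within(k):.4f}")
```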

Postpone till after Sec. 2.1:  How do you know if it's safe to treat a data set as if it comes from a Normal Density model?
Let SPSS draw a normal curve with the same mean and s.d. over its histogram; or use a Normal quantile plot: Handout (is notes) forthcoming.
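To see what a Normal quantile plot is made of, here is a rough sketch in Python that computes the points to plot.  It assumes the plotting position (i - 0.5)/n for the i-th smallest value, which is one common convention; SPSS may use a slightly different one.  The data and the function name normal_quantile_points are made up for illustration.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (z, x) pairs for a Normal quantile plot.

    Assumes the plotting position (i - 0.5) / n for the i-th smallest
    value; other conventions exist and software may differ.
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(xs, start=1)]

# Hypothetical data, made up for illustration only
sample = [61, 64, 65, 66, 67, 68, 68, 70, 72, 75]
for z, x in normal_quantile_points(sample):
    print(f"{z:6.2f}  {x}")
```

Plot x against z: if the points lie close to a straight line, treating the data as coming from a Normal model is reasonable.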
= = = = = = = = = = = = = = = =

Relationships: (Ch 2 Intro and Sec. 2.1) 
Two variables recorded on the same cases: 
"Associated" = knowing the value of one variable (the "explanatory" one) tells you something about the other  (the "response" variable)
     Nurses' salaries,  Workplace (hospital/office) 
Quantitative on Categorical:  Done: back-to-back (side by side) stemplots, boxplots together, histograms on same axes...
Categorical on Categorical:  Sec. 2.5

 Handout:  Scatterplots, and  Scatter HW sheet (mostly repeating handout output. Do first.)
    file for handout: govsal_vs_pay.sav
Two Related quantitative variables
    "Just Related" or "explanatory & response?"
(scatterplots)
explanatory = independent = "x" = horizontal axis ( = "cause", sometimes but not always)
  response =    dependent= "y" = vertical axis      = ("effect ")

(Living histograms:  Height vs. weight, Height vs. gpa)

Discussing Scatterplot:
General Pattern                                      Deviations
Clusters?                                                      Outliers? (label if possible)
Shape (linear, curved, ...?)
    Strength of relationship (how unfuzzy)  "Weak, moderate, strong"
Direction
    Positively associated:  y increases as x increases (generally).
    Negatively associated:  y decreases as x increases.

Mark subgroups differently to do comparisons. (Subgroups defined by categorical variable, like Sex, Region of country)
  Some scatterplot data:  educ-v-mortality.sav
govsal_vs_pay.sav  is the file used for most of the handout.
Got to here Wednesday



Correlation (Sec.2.2)
CD or Website,  http://bcs.whfreeman.com/ips7e,
  Choose "Statistical Applets",  Correlation and Regression.  Play with data points, observing the Correlation Coefficient.
    Check in the "Show Mean X &Mean Y lines" box.  See how much is in each quadrant.
SPSS: back page (p4) top, Scatterplot handout.  Analyze>Correlate>Bivariate, move both variables
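The "how much is in each quadrant" idea from the applet can also be tried by counting.  A sketch in Python with made-up data; the function name quadrant_counts is hypothetical.  Points in the upper right and lower left push r toward positive, points in the other two quadrants push it toward negative.

```python
from statistics import mean

def quadrant_counts(xs, ys):
    """Count points in the quadrants formed by the mean-x and mean-y lines."""
    x_bar, y_bar = mean(xs), mean(ys)
    counts = {"upper right": 0, "upper left": 0,
              "lower left": 0, "lower right": 0, "on a line": 0}
    for x, y in zip(xs, ys):
        if x == x_bar or y == y_bar:
            counts["on a line"] += 1      # exactly on a mean line
        elif x > x_bar and y > y_bar:
            counts["upper right"] += 1
        elif x < x_bar and y > y_bar:
            counts["upper left"] += 1
        elif x < x_bar and y < y_bar:
            counts["lower left"] += 1
        else:
            counts["lower right"] += 1
    return counts

# Hypothetical data, made up for illustration only
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
print(quadrant_counts(xs, ys))
```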

Section 2.2
The correlation coefficient r is a numerical measure for how strongly linear (and in what direction) the relationship is.  Doesn't substitute  for a scatterplot.

  1. Measures relationship--same whichever variable is on the x-axis
  2. "Correlation" --only for 2 quantitative variables
  3. "Unitless"--original measurement units are "standardized out"
  4. Sign of correlation coefficient matches direction of relationship
  5. Between -1 and +1.  0: no linear relationship, + or -1: perfect straight line.
  6. Does NOT give info about curved relationships.
  7. NOT resistant to outliers--quite sensitive.
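Several of these properties can be tried out directly.  The sketch below, in Python, computes r as the sum of products of standardized x and y values divided by n - 1, then checks properties 1, 3, 6, and 7.  All data are made up for illustration, and the function name correlation is just a label for this example, not an SPSS command.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = (1/(n-1)) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)   # sample s.d. (divides by n - 1)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

r = correlation(x, y)
print("r:", round(r, 3))

# Property 1: same r whichever variable is called x
print("swapped:", round(correlation(y, x), 3))

# Property 3: changing units (say inches to cm) leaves r unchanged
print("rescaled:", round(correlation([2.54 * v for v in x], y), 3))

# Property 6: a perfect curved relationship (parabola) can have r = 0
x_sym = [-2, -1, 0, 1, 2]
print("parabola:", round(abs(correlation(x_sym, [v * v for v in x_sym])), 3))

# Property 7: adding one outlier can change r drastically
print("with outlier:", round(correlation(x + [20], y + [0]), 3))
```

This is why the scatterplot comes first: r by itself cannot tell you whether a small value means "no relationship" or "a curved one", or whether a single point is driving it.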

 



   *Bear in mind that there were around 400,000 births in California in 1970. (I'm guesstimating.  There were 605,694 births in 1990, and the population of California in 1970 was 2/3 of that in 1990.)  So a 3-in-a-thousand event would occur in about 3 x 400 = 1200 births--there would be 1200 women in San Diego Reader's position (many of whom wouldn't know it).  Rare events DO happen--it's not really fair to only notice and question them AFTER the fact.
Note--pregnancy in 1970 usually didn't involve the level of medical intervention (ultrasound, planned inducement of labor or Caesarian, etc.) it often gets now.
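The guesstimate in the note can be redone step by step.  A rough sketch in Python, using only the numbers given above and assuming (as the note does) that births scale with population; the variable names are made up for this example.

```python
births_1990 = 605_694        # California births in 1990, from the note above
population_ratio = 2 / 3     # 1970 population relative to 1990, from the note above
tail_probability = 0.0030    # area above z = 2.75

# Rough estimate only: assumes births scale with population
births_1970_estimate = births_1990 * population_ratio
expected_long_pregnancies = births_1970_estimate * tail_probability

print(round(births_1970_estimate))       # about 400,000
print(round(expected_long_pregnancies))  # about 1200
```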

Sievers home  Math251-Fall11/Dayq9.htm    10pm    9/14/11
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.