Math 151 , Day 12, Friday, Feb. 17, 2012  Hit Reload...After class..

HW Day 12: Read Ch. 4 (Scatterplots and correlation) to p. 104. Check p. 112: 4.14, 15, 16. Then read pp. 104-112 (correlation). Check 4.16 thru 4.22.  You do not have to be able to calculate r by hand.  You should be able to guess roughly at an r for a swarm of data, as on pp. 108-9, and know and be able to use Facts 1-4, p. 107, and Cautions 1-4, pp. 108, 110.
Please also read ahead: Ch. 5, Regression, thru p. 135. (Check, p. 137: 5.17 through 5.23, basic line and regression line facts and tools. 5.18: those are not very satisfactory answers, but you should be able to eliminate at least one. 5.24: r and slope. 5.25: r^2 is the square of r. 5.26: Don't calculate! If you sketch the graph by hand and draw a line thru the points, you should be able to guesstimate the slope well enough to choose among the 3 answers.)  Then continue with regression, pp. 126-147.

Hand In Next class; please Read Chapter 4 and as far as you can stand in 5. 

= ="Approximately" Normal = =
p. 91, 3.47  ACT scores (whole numbers)
p. 92, 3.50 (Use SPSS )  Monsoon rains  If you use Graphs menu to make the histogram, you can have it put a Normal curve over your histogram.

Postpone Ch. 4, but please read!
- - - - - - - Chapter 4, intro- - - - - - - -
p. 97, 4.1 explanatory/response or just association
   
4.2 expl/ resp in an experiment (coral)
   
4.3 beer and blood alcohol, other variables
p. 115, 4.26 date heights.  Make the scatterplot by hand.  Answer these questions instead of the ones given:  Describe the relationship--form, direction, strength.  (With only 6 points there's not enough data to talk about outliers.)  Is there any female dating a male shorter than she is?  (Keep a copy of the graph, to use in the "correlation" hw.)
p. 115 4.25 reading ability
 - - - - - - - - -
Link to SPSS Graphs and correlation coefficients for this HW page, if you can't make SPSS work. ..

- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Scatterplots using SPSS. Get the Scatterplot handout outside my door, or use the link. Please email me with any SPSS difficulties or discoveries!
---From now on, make all scatterplots on SPSS!  Don't forget to check Measure, and to add Labels.  (Trouble printing? Try copy/paste into Word, and print from Word.  If you print from Word on a computer without SPSS, symbols may look funny.  That's OK.)
A) Governors' Salaries HW, accompanying the Scatterplot Handout and the govsal_vs_pay.sav data file. Use SPSS and answer questions 1-5.   Do these questions on a separate page, and keep it till we have finished all 13 questions!

Hand in:
p. 99, 4.4 and p. 102, 4.6 (Use SPSS or Applet: Two-variable Statistical Calculator ) lean vs. metab. rate (Save your SPSS file; you'll use it again for 4.12)
p.102, 4.8 (SPSS) Ford gas mileage
p. 117 4.31 (SPSS) icicle growth. Data is in table 4.2. Be sure to write on your graph which group is slow water, which fast.
pp. 121-2, 4.43, (SPSS) heating/solar panels.
p. 104, 4.9,(SPSS) lean vs. metab. rate, Men added

B. If not done in class (it wasn't): Do it!
Use   educ-v-mortality.sav  (in SPSS for class BPS5e folder). Identify the two outlier cities at left, and speculate as to why they are different from the pack of data, having very low mortality rates compared with the "typical" for their education level. Ask others, till you get a satisfying answer.

- - - -.. - - - - - - - -
Correlation: you can get the SPSS output now if you want, even if the problems aren't yet assigned. (back page, top, of SPSS handout)
Correlation
(thinking):

p. 120-1, 4.40 and 4.41 Applet explorations
p. 120, 4.38 and 4.39 correlation meaning

p. 115, 4.26 date heights again  You graphed this by hand.  r = .5653. Now answer the questions in the text.

p. 113, 4.21 Husbands 2 years older  (Hint: make  a data table and the corresponding scatterplot for 4 or 5 couples with different x's, and look at it.)

Correlation (computing & thinking)
Governors' Salaries HW:
Do problem 6.  Keep this with the previous work.

p. 111, 4.13 (SPSS) gas, speed (made-up data): association but 0 correlation.  Find the means and draw the mean lines on your graph (by hand) to help explain the 0 correlation.
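If you want to see numerically why a clear association can still give r = 0, here is a small Python sketch. The speed and mileage numbers are made up for illustration (they are not copied from the text's table), and pearson_r is just a helper name chosen here. It uses the "average product of standardized values" form of r with an n - 1 divisor.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation as the sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - xbar) / sx) * ((y - ybar) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

# Made-up illustration: mileage rises, peaks in the middle, then falls.
speed = [20, 30, 40, 50, 60]
mpg = [24, 28, 30, 28, 24]

r = pearson_r(speed, mpg)
print("mean speed:", mean(speed), " mean mpg:", mean(mpg))
print("r is zero up to roundoff:", abs(r) < 1e-9)
```

The points to the left of the mean speed pull r one way exactly as hard as the points to the right pull it the other way, so the products cancel even though the curved pattern is obvious on the graph.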

p. 116, 4.28 (SPSS) Sparrowhawk colonies. 

p. 110, 4.12 (SPSS or Applet) Lean vs metab. rate again (Women).  To add a data pair in SPSS, just type it in a new row at the bottom.  To delete a case, click on the case number, which highlights the whole row, then hit Delete.

(This problem looks forward to Ch. 5, sort of.)
 p. 118, 4.32 corn plant density. (SPSS)  Notice how the data is entered for SPSS--not as displayed here, but with the first column giving Plants per acre and the second giving Yield.  Make a scatterplot.  Use your calculator to find the mean yield for each planting density, and write these on your paper.  (Or you can find means for the separate groups in SPSS: in Explore, move Plants to the Factor list.)  Graph the means by hand with a pencil on your printed plot, and connect the means dots.
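For anyone who prefers to check the group means outside SPSS, here is a rough Python sketch of the same idea: one row per plot, first value Plants per acre, second value Yield. The numbers below are hypothetical placeholders, not the data from the text, and group_means is a helper name invented for this sketch.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (plants per acre, yield) pairs, one row per plot,
# laid out the same way the SPSS file is: x in column 1, y in column 2.
rows = [
    (12000, 150.0), (12000, 140.0),
    (16000, 165.0), (16000, 155.0),
    (20000, 160.0), (20000, 170.0),
]

def group_means(pairs):
    """Return {x value: mean of the y values recorded at that x}."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in sorted(groups.items())}

means = group_means(rows)
for density, avg_yield in means.items():
    print(density, avg_yield)
```

These are the points you would mark on the printed scatterplot and connect with line segments.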
Read, to discuss 
 
..Postpone.
Correlation:
p. 119-20, 4.35 changing units.  Do a rough sketch for yourself.

p. 126, 4.37 investing

Look at all the graphs you make, and guesstimate the correlation coefficient (before you read or calculate it.)


 

 

Optional 
Do now (for Ch. 5) if you need the practice:
Straight line graphing practice:
A.  y = -10 + 3x, graph for 2<x<10.
B.  y = 500 - 20x, graph for 0<x<10.
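If you want to check your hand-drawn lines, a few plotted points are enough. This is only a sketch in Python (the function name line is made up here). It evaluates each equation at the ends and the middle of the stated range.

```python
def line(intercept, slope, x):
    """Value of y = intercept + slope * x."""
    return intercept + slope * x

# A.  y = -10 + 3x on 2 < x < 10
for x in (2, 6, 10):
    print("A: x =", x, " y =", line(-10, 3, x))

# B.  y = 500 - 20x on 0 < x < 10
for x in (0, 5, 10):
    print("B: x =", x, " y =", line(500, -20, x))
```

Two points determine the line; the third is a check that your ruler didn't slip.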

..Postpone.
More practice reading graph:
p. 114, 4.24 Masters scores
....
Correlation:  Use http://www.whfreeman.com/bps5e (see below for details) to make different scatterplot patterns, and observe their r's.

p. 118, 4.32 corn yield: I said to draw the line by hand.  SPSS can plot the line connecting means on your graph:  In the Chart Editor, do Elements>Interpolation Line. If it doesn't look right, in the Properties window, Interpolation Line tab, choose Line Type: Straight.

Exam 1 returned Comments  Solutions
Sample exam solutions

I haven't been mentioning Science Colloquium, every Friday 12:40-1:20, but the talks are often fascinating, and often have Statistics in Action (unpredictably, unfortunately).  This spring, mostly student theses. Today, "The Independence Option; Business Knowledge for Science Majors," Prof. Ellis.  Please come!

HW questions--  Going from x to area (proportion), & backward--area to x: Day 11,   Normal probability practice

 What happens further out in normal tails?  Almost (but not quite) 0.  Rounds to .0000.
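To see "almost but not quite 0" in numbers, here is a short Python sketch. It assumes only the standard Normal curve and uses the complementary error function from Python's math module; upper_tail is a name made up for this sketch.

```python
from math import erfc, sqrt

def upper_tail(z):
    """Area under the standard Normal curve above z."""
    return 0.5 * erfc(z / sqrt(2))

for z in (3, 4, 5, 8):
    area = upper_tail(z)
    # Scientific notation shows the tiny nonzero area;
    # four decimal places is what a printed table would show.
    print(f"z = {z}: area above = {area:.3e}  (table-style: {area:.4f})")
```

From z = 4 outward the four-decimal version reads .0000, even though the area itself never reaches exactly 0.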
p. 90, 3.43:  Difference in tails, M/F math.  Other evidence relevant to the question:  Across countries, the difference in math scores M/F is related to the level of gender equality in the country--the more equal the sexes are in general, the smaller the differential in math scores, and vice versa. (would be good on a scatterplot but I don't have the data in that form).  Evidence for nurture not nature.

A. What proportion of pregnancies last 310 days or more? Find Mean and s.d. in p.87, 3.19: N(266,16)
        z = (310-266)/16 = 44/16= 2.75.  Area above 2.75 = .0030.  3 in a thousand! Pretty rare!
      Why do I ask?  (see "San Diego Reader" below )
   Is "San Diego Reader" one of the 3-in-a-thousand, or is she lying?  (this is the kind of question we deal with in Significance Testing, part 3 of the course)  Discussion.
 These days a DNA test could be done to determine paternity; not then.
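The same calculation can be checked without a table. This is a sketch in Python using the N(266, 16) model quoted above; the variable names are chosen just for this example.

```python
from math import erfc, sqrt

mean_days, sd_days = 266, 16   # N(266, 16) model for pregnancy length
x = 310                        # ten months and five days, roughly

z = (x - mean_days) / sd_days            # standardize
area_above = 0.5 * erfc(z / sqrt(2))     # area under the Normal curve above z

print(f"z = {z:.2f}")
print(f"proportion lasting {x} days or more = {area_above:.4f}")
```

This agrees with the table value: z = 2.75 and an area of about .0030.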

New today:
Normal distribution mechanism:
Thing measured is the result of many small independent influences.
"Real" data may not be perfectly normal:
-- just because of natural variation in a particular data set, especially in small data sets.
-- Data falls only on (a lot of) integers, not really continuous.  In a Continuous Model, no individual value has Area above it--only intervals have area above them.  (So the proportion who are exactly = 27 is 0 by the model, and Proportion >= 27 = Proportion > 27.)  (A fix for whole-number data, if you need a better approximation:  Prop. > 27 = Prop. > 27.5.  Proportion >= 27 = Proportion > 26.5.  We won't bother.)
-- Model may not hold for extreme values.  The Normal Model says there is still a (tiny!) proportion of individuals out at 4, 5, 8 standard deviations away from the mean. These may not even make sense in your real world situation (off the scale). Tails
-- The model may just not be quite right; the mechanism is not quite the Normal one.
But Normal may be a good enough approximation.
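We won't use the half-unit fix in this course, but if you are curious how much it matters, here is a sketch in Python. The mean of 21 and s.d. of 5 are hypothetical values picked only for illustration (they are not from the text), and area_above is a helper name invented here.

```python
from math import erfc, sqrt

def area_above(x, mu, sigma):
    """Area under the N(mu, sigma) curve above x."""
    return 0.5 * erfc((x - mu) / (sigma * sqrt(2)))

mu, sigma = 21.0, 5.0   # hypothetical whole-number scores, NOT from the text

plain = area_above(27, mu, sigma)           # model treats ">= 27" and "> 27" alike
at_least_27 = area_above(26.5, mu, sigma)   # fix for "27 or more" (includes 27)
more_than_27 = area_above(27.5, mu, sigma)  # fix for "more than 27" (28 and up)

print(f"plain model:        {plain:.4f}")
print(f"27 or more (26.5):  {at_least_27:.4f}")
print(f"more than 27 (27.5): {more_than_27:.4f}")
```

The plain answer always sits between the two corrected ones; whether the gap matters depends on how coarse the whole-number scale is compared with the s.d.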

= = = = = = = = = = = = = = = = = = = =
Start here Monday

Relationships: (BPS5e Ch.4, at first to p. 104)  
Two Related quantitative variables  (We used side by side stemplots, boxplots, histograms to relate a quantitative variable to a categorical variable)
    "Just Related" or "explanatory & response?"
(Scatterplots)
explanatory = independent = "x" = horizontal axis ( = "cause", sometimes but not always)= predictOR
  response =    dependent = "y" = vertical axis      = ("effect ") =predictED

(Living histograms:  Height vs. weight, Height vs. gpa)

Discussing Scatterplot
General Pattern                                      Deviations
Clusters?                                                      Outliers? (label if possible)
Form (linear, curved, ...?)
    Strength of relationship (how unfuzzy)  "Weak, moderate, strong"
Direction
    Positively associated:  y increases as x increases (generally).
    Negatively associated:  y decreases as x increases.

Mark subgroups differently to do comparisons. (Subgroups defined by categorical variable, like Sex, Region of country)

Get the SPSS Scatterplot handout and the Governors' Salaries HW sheet by link, or outside my door, if you missed class. (BPS Ch. 4&5)
SPSS:   Graphs>Legacy Dialogs>Scatter/Dot > Simple Scatterplot.  Move variables from the lefthand  list to the X-axis (horizontal)  and Y-axis (vertical) boxes. See Handout for more.  Files from text? Don't forget to check Measure, and to add Labels.

  Some scatterplot data:  educ-v-mortality.sav.  The file used for the handout is govsal_vs_pay.sav.
(BPS Ch. 4&5) 


....
Correlation:
(pp. 104-112)  The (Pearson) correlation coefficient r is a numerical measure for how strongly linear (and in what direction) the relationship is.  Doesn't substitute  for a scatterplot.
Use if data is:  2 quantitative variables, & "nice":
    One cluster/cloud/band.
   Pretty straight.
   Outlier(s)? Do with/without & be cautious.
Correlation experiments:
  Website,  http://www.whfreeman.com/bps5e, "Statistical Applets", Correlation/Regression.  Play with data points, observing the Correlation Coefficient.   Check the "Show Mean X & Mean Y lines" box.  See how much is in each quadrant. Compare with correlation coefficient.

Using SPSS (p. 4 top, Scatterplot Handout):  Analyze>Correlate>Bivariate, move both variables across.

Properties (p. 107) and Cautions (pp. 108, 110):

Properties:
  1. Measures relationship--same whichever variable is on the x-axis
  2. "Unitless"--original measurement units (cm., inches) are "standardized out"
  3. Sign of correlation coefficient matches direction of relationship. + positive, - negative.
  4. Between -1 and +1.   0: no linear relationship,   +1 or -1: perfect straight line.
Cautions:
  1. Between two quantitative variables only!
  2. Does NOT give info about curved relationships (only measures linear part of relationship).
  3. NOT resistant to outliers--quite sensitive.
  4. Not a complete summary, even for nice linear data.  Need means, s.d.'s too.
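The first few properties are easy to try out for yourself. This Python sketch uses made-up heights (not data from the text) and a helper, pearson_r, written for this example with the standardized-values form of r. You are still not expected to compute r by hand.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation as the sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - xbar) / sx) * ((y - ybar) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

# Made-up heights in inches, for illustration only.
women = [64, 65, 66, 67, 70]
men = [68, 67, 70, 72, 71]

r_xy = pearson_r(women, men)                 # women on the x-axis
r_yx = pearson_r(men, women)                 # axes swapped
women_cm = [2.54 * h for h in women]         # change units: inches to cm
r_cm = pearson_r(women_cm, men)
r_line = pearson_r([1, 2, 3, 4], [10, 8, 6, 4])  # points exactly on a falling line

print(f"r (x, y) = {r_xy:.4f},  r (y, x) = {r_yx:.4f},  r (cm) = {r_cm:.4f}")
print(f"perfect falling line: r = {r_line:.4f}")
```

Swapping the axes or changing inches to centimeters leaves r alone, and points exactly on a falling line give r = -1.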
(Figure: correlation graph)


--You won't have to calculate a correlation coefficient by hand. This formula is a bad one for hand computation (roundoff error); if you must do one by hand, find the computational formula in an old textbook.
--Eyeballing:  sketch xbar and ybar lines, see how much data is in + quadrants, how much in - quadrants.
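The eyeballing trick can be written out as a count. This is only a sketch in Python with made-up heights (not data from the text); quadrant_counts is a name invented here. A point counts as "+" when it is on the same side of both mean lines (upper right or lower left), and "-" otherwise.

```python
from statistics import mean

def quadrant_counts(xs, ys):
    """Count points in the + quadrants and the - quadrants
    formed by the xbar and ybar lines."""
    xbar, ybar = mean(xs), mean(ys)
    plus = minus = 0
    for x, y in zip(xs, ys):
        product = (x - xbar) * (y - ybar)
        if product > 0:
            plus += 1        # upper right or lower left
        elif product < 0:
            minus += 1       # upper left or lower right
    return plus, minus

# Made-up heights in inches, for illustration only.
women = [64, 65, 66, 67, 70]
men = [68, 67, 70, 72, 71]

plus, minus = quadrant_counts(women, men)
print("points in + quadrants:", plus, "  points in - quadrants:", minus)
```

Mostly "+" points suggests a positive r, mostly "-" points a negative r. The count ignores how far out each point is, so it only gives the sign and a rough sense of strength.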

Strength of correlation says NOTHING about causality!  Strong correlation could be:
     A causes B/   B causes A/  C causes both A and B (lurking C)/  just Chance that they go together in this data set.



2/17/12

This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement  of responsibility.

**[In 1973] the following item appeared in Dear Abby's column:

     Dear Abby: You wrote in your column that a woman is pregnant for 266 days. Who said so? I carried my baby for ten months  and five days, and there is no doubt about it because I know the exact date my baby was conceived. My husband is in the Navy  and it couldn't have possibly been conceived any other time because I saw him only once for an hour, and I didn't see him again  until the day before the baby was born. I don't drink or run around, and there is no way this baby isn't his, so please print a retraction about that 266-day carrying time because otherwise I am in a lot of trouble.
                                                                               San Diego Reader
Abby's answer was consoling and gracious but not very statistical:

     Dear Reader: The average gestation period is 266 days. Some babies come early. Others come late. Yours was late.

The question here is not whether the baby was late. That fact is already known. At issue is the credibility of the length of the delay. Ten months and five days is approximately 310 days, which means that the pregnancy exceeded the norm by 44 days. [How unusual is that?]
A. What proportion of pregnancies last 310 days or more? Find Mean and s.d. in p.74, 3.7
        z = (310-266)/16 = 44/16= 2.75.  Area above 2.75 = .0030.  3 in a thousand! Pretty rare!
      Why do I ask?  (see "San Diego Reader" just above )
   Is "San Diego Reader" one of the 3-in-a-thousand, or is she lying?  (this is the kind of question we deal with in Significance Testing, part 3 of the course)

*Bear in mind that there were around 400,000 births in California in 1970. (I'm guesstimating.  There were 605,694 births in 1990, and the population of California in 1970 was 2/3 of that in 1990). 
So a 3-in-a-thousand event would occur in 3x400 = 1200 births--there would be 1200 women in San Diego Reader's position (many of whom wouldn't know it).
Rare events DO happen--it's not really fair to only notice and question them AFTER the fact.
Note--pregnancy in 1970 usually didn't involve the level of medical intervention (ultrasound, inducement of labor, Caesarian, etc.) it often gets now.
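The guesstimate above is simple arithmetic, sketched here in Python so you can change the inputs. The 1990 birth count, the 2/3 population ratio, and the 3-in-a-thousand rate are the figures quoted above; the variable names are made up for this sketch.

```python
births_1990 = 605_694
population_ratio = 2 / 3                      # 1970 population relative to 1990

births_1970_estimate = births_1990 * population_ratio
rounded_births = 400_000                      # the round figure used above

rate_per_thousand = 3                         # area above z = 2.75 is about .0030
expected_long_pregnancies = rate_per_thousand * rounded_births // 1000

print("estimated 1970 births:", round(births_1970_estimate))
print("expected pregnancies of 310+ days:", expected_long_pregnancies)
```

So even a 3-in-a-thousand event is expected to turn up about 1200 times in a year's worth of births.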