MATH 251, P & S I, Fall 2011, F Sept 2, Day 4. After class, hit reload.

Meet in classroom Monday; in Mac 101 lab Wednesday, probably, for big SPSS intro.  Bring a disk or flash drive.
Unless otherwise noted, all assignments are in IPS.
Day 4 Assigned: (Re)read: 1.2 thru p. 42 (spreads). Then 43-45 (linear transformations).
Ahead, 1.3, pp. 50-4 (densities), then ahead (Normal distributions), up to Normal quantile plots p. 65.
Use SPSS  Handout for computation of mean and std. dev, unless it says do it by hand.

Day 4 Hand in: 
Sec. 1.2, p. 46 ff: 5# summary, boxplots, repeated HW,

B.  The boxplots shown at
http://www.stata.com/support/faqs/graphics/gph/graphdocs/box1.html
compare blood pressures for different age groups (these are not healthy people!)
Find, approximately:
a) medians of all groups. 
b) 5 number summary of 60+ group
c) maximum of youngest group (careful!  include outliers!)
d) Write a general description comparing the 3 groups.

Making boxplots:  Plain vanilla is fine, or use judgment about outliers. DON'T do 1.5 IQR rule--for computers only...

C.  Matching Histo's & Boxplots: Use http://www.whfreeman.com/ips7e:  One-variable calculator.  If it doesn't work, try http://www.whfreeman.com/ips6e, the datasets are there by name (table/problem numbers changed.)
For Acid Rain,  Carbon dioxide emissions,  Blood proteins/New guinea (For this one do 1.69 below, instead), sketch on your paper a histogram (adjust the bar widths to get it smoothish). On the same scale (using the Statistics tab to get the 5# summary) make a boxplot.  Note how  they correspond:  longer boxplot sections mean lower histogram height and vice versa.

1.86 Hummingbirds  By hand!  To get to the 5# summaries & boxplots, you need to make stemplots for each.  (You're trimming --rounding down-- so your numbers may be a little different from the answer book.)

Separate page: Yes. 1.67 a, b only (potatoes).  Use a stemplot, 5 # summary, and boxplot.  STILL KEEP this page for doing part c later.


p.46ff, mean and s.d. 
1.78 Metabolic rates (Do xbar and s by hand.  A table is good.  Then type the data into SPSS (Handout) and compute them there. Or try using the data file METABOLIC from Morganstore, and send me clarifications/corrections, please!)
1.69 Blood protein... New Guinea  Use the  One-variable Statistical Calculator in the Applets to do a, b, and c; instead of a histogram, you can copy the stemplot from there.  Put the boxplot vertically  next to it, on the same scale as much as possible. Answer c. ALSO:  get the mean and s.d. from  the Calculator. 
   How far is the mean from the median?  Why the difference? 
    Find the mean minus one standard deviation.  This is negative, below any of the data!  A good sign that mean/s.d. are not a good system for this data set.

1.94  (computational accuracy)  Use SPSS and the One-variable Statistical Calculator in the Applets (tell which, which applets version) 

DO all the rest: Postpone the rest, BUT do part a of Problem B below, keep it to hand in as part of Day 5.

1.97, 1.98 (Linear transformations)
A.  The mean August temperature in a certain Asian city is 25° C, with standard deviation 5° C.  What are these values in degrees Fahrenheit?  (f = 32 + 1.8 c)
p.77 1.176abc (ed scores "transformed to a standard scale")
   (hint for a: make 2 equations, one for means & 1 for s.d.'s, and solve for a and b)
Problem B below , linear transformations algebra

Read, discuss
 

C. In problem B below, you need b > 0. 
Where does this come into the computation? What would happen if you used a b that was negative?

Optional 
 


- - - - - - - -
Do 1.94 (computational accuracy) in Excel, if you're an Excel user.

B.  Linear transformations algebra: You have a data set x1, x2, ..., xn, which has mean xbar and standard deviation s.
a) We noted that the sum of all the deviations from the mean, sum(xi - xbar), should always equal 0.  Prove this is true by algebra. (If you are not skilled at working with big sigmas, do it for n = 3 (x1, x2, x3) and write out all the sums with +'s.)   (This is ex. 1.92 in IPS.)
b) You make a linear transformation xi* = a + b xi on each data point.  (The book uses xnew instead of x*, p. 43.)
a can be + or -, but b should be positive.  (In practical terms, negative b would "flip" the data, reversing the order.)
Show that the mean xbar* of the transformed data set = a + b xbar,
and that the standard deviation of the transformed data set, s* = b s. (The text shies at making formulas...)
(If you are not skilled at working with big sigmas, do it for n = 3 and write out all the sums with +'s.)
Do the proof by starting with the formula for the mean expressed in the xi*'s, e.g.  xbar*= (x1*+x2*+x3*)/3.
Plug in  xi* = a+b xi, and work the algebra to arrive at the desired expression (a+b xbar) involving the mean of the xi's.  Repeat for the  standard deviation formula.  Hint:  xbar* appears in the standard deviation expression:  substitute a +b xbar for it, since you already proved they were equal.


   Check for Homework questions? Day 3 Especially 1.76 (0's). "Read, to discuss" problems?  Remaining #s on board.
HW:  PLEASE Label with Day #.  Please paperclip/staple.
Helpers more or less up to date.  Class members posted.  Math251@wells.edu working.

Wednesday, probably, day for SPSS in Mac 101, at class time.   First SPSS Handout,   Morganstore instructions on back.
Coming Friday (probably) Quiz:  In class, closed book:  Stemplot, 5#summary and boxplot.  Mean & s.d. by hand, showing all steps.

- - - - - - - - - - - - - - - - - - - - - - - - - - -
Revisit or meet  mean/median, 5#summary, boxplot , Day 3
Example from Connexions (Collaborative Statistics, by Barbara Illowsky, Ph.D., and Susan Dean): http://cnx.org/content/m17103/latest/Ch2_boxplot_4.png
Compare boxplot with histogram:  longer boxplot sections mean lower histogram height and vice versa.

Some other "averages"/ measures of middle:   (Many exist)
  Trimmed mean: throw away, say top and bottom 5%, take mean of rest.  Resistant, but hard to work with.  (SPSS, later)
  Midrange:  Point midway on the ruler scale between smallest and largest:  Min = 5, Max = 15, Midrange = (5+15)/2= 10.
      Highly sensitive, non-resistant, not too useful, but quick!
  The Mode/modal class: (Mode: most "popular")  Group with the most individuals; point of peak of the histogram "curve".
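
A minimal Python sketch of these three "other averages", for checking hand work. The function names, the 20% trimming fraction, and the sample data are illustrative assumptions, not from IPS or SPSS; SPSS may round the number of trimmed observations differently.

```python
from collections import Counter
from statistics import mean

def trimmed_mean(data, fraction=0.05):
    """Mean after dropping `fraction` of the observations from each end."""
    xs = sorted(data)
    k = int(len(xs) * fraction)      # number dropped from each end, rounded down
    kept = xs[k:len(xs) - k]
    return mean(kept)

def midrange(data):
    """Point midway between the smallest and largest observation."""
    return (min(data) + max(data)) / 2

def modes(data):
    """All values tied for the highest count (there may be more than one)."""
    counts = Counter(data)
    top = max(counts.values())
    return [x for x, c in counts.items() if c == top]

data = [1, 1, 2, 4, 12]
print(trimmed_mean(data, 0.20))   # drops the 1 at the bottom and the 12 at the top
print(midrange([5, 15]))          # the Min = 5, Max = 15 example above
print(modes(data))
```

Note how the trimmed mean ignores the 12, while the midrange depends on it entirely.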

Spread, cont.
Standard deviation (goes with mean) . Square root of:
           Variance:  (almost) average of squared deviations from the mean.
                  (deviations sum to 0)
                 (Divide by (n-1), the "degrees of freedom": since the deviations sum to 0, the vector of deviations lies in a space of dimension n-1)
Demo:  1,1,2,4, mean = 2, sum of squared deviations = 6, variance = 2, s = 1.41 (table is good)
1,1,2,4,12, mean = 4, sum of squared deviations = 86, variance = 21.5, s = 4.64.
(Midcomputation check:  Sum of deviations from the mean (before squaring each) always = 0 )
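
The two demos can be re-done step by step in a few lines of Python, as a check on the hand table. This is only a sketch; the function name sd_by_hand is made up for illustration.

```python
from math import sqrt

def sd_by_hand(data):
    """Follow the hand-computation table: deviations, squares, variance, s."""
    n = len(data)
    xbar = sum(data) / n
    deviations = [x - xbar for x in data]
    sum_sq = sum(d * d for d in deviations)   # sum of squared deviations
    variance = sum_sq / (n - 1)               # divide by n - 1, not n
    return xbar, sum(deviations), sum_sq, variance, sqrt(variance)

for data in ([1, 1, 2, 4], [1, 1, 2, 4, 12]):
    xbar, dev_sum, sum_sq, variance, s = sd_by_hand(data)
    print(data, xbar, dev_sum, sum_sq, variance, round(s, 2))
```

The second number printed for each data set is the midcomputation check (sum of deviations). With data whose mean is not a round number, the computer may show a tiny nonzero value there because of rounding.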

--s is always ≥ 0  (0 only if all observations are equal)
--s has the same units as the observations (squared, then square-rooted).
     

Very sensitive to outliers (the outliers  contribute much more than their share to the Sum of Squared Deviations from the Mean)

Mean and Standard Deviation are for Symmetric Unimodal  distributions without big outliers.
   (ideally "Bell-shaped" = Normal)

SPSS to find mean and s.d.   Handout

We've been looking at SHAPE of distributions, and the ways irregularities can point us to knowledge about the data. (Living histograms.)  Note p.39 middle:  Statistical [summary] measures and methods based on them are generally meaningful only for distributions of sufficiently regular shape. ... [Q]uickly resorting to fancy calculations is the mark of a statistical amateur.  Look, think, and choose your calculations selectively.

Summaries of Middle & Spread "Systems:"
-- (Midrange, Range  Very sensitive to outliers--they use only the max and min!)
-- Median, IQR  (+ Quartiles Q1, Q3, 5-number summary), based on percentiles (the j-th percentile is the value with about j% of the data at or below it)
-- Mean, Standard Deviation "x-bar" (or "y-bar", etc.), "s"  (good for symmetric unimodal, no outliers)
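
A small Python sketch of the Median/IQR system, assuming the hand rule that Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the middle observation when n is odd. The function name is made up, it needs at least 2 observations, and SPSS or other software may use a slightly different quartile rule.

```python
from statistics import median

def five_number_summary(data):
    """Min, Q1, median, Q3, max, with quartiles as medians of the two halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                                # observations below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]    # skip the middle one when n is odd
    return min(xs), median(lower), median(xs), median(upper), max(xs)

summary = five_number_summary([1, 1, 2, 4, 12])
print(summary)
print("IQR =", summary[3] - summary[1])
```

For the demo data 1, 1, 2, 4, 12 the median stays at 2 even though the 12 pulls the mean up to 4.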

... --------------------------  -----------------------------------------
Linear transformations (pp. 43-5) do not change the shape of a distribution:   A "good" measure of center or spread should "act naturally" if you change units of measurement by shifting (translating) (everyone eats one more cookie) or by stretching or shrinking (changing scale) (all cookies are broken in half; count half-cookies).
Fahrenheit<--> Celsius.
  (Community Handbook?)
     New x* = a + bx, for each observation.
Measures of spread are unaffected by the shifting! They are affected only by the scale change.
Page 44 gives the rules explicitly.  Problem B has you prove them for mean and standard deviation.
(Shifting is often done to put numbers in a nice range, with 0 not too far away.  E.g. years from 1970)
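
A numerical illustration (not a proof) of how the mean and standard deviation respond to x* = a + bx, sketched in Python. The data set is the demo data from the spread notes, and a = 32, b = 1.8 are borrowed from the Fahrenheit formula; both choices are just for illustration.

```python
from statistics import mean, stdev

a, b = 32, 1.8                      # shift and scale (b > 0)
x = [1, 1, 2, 4, 12]
x_new = [a + b * xi for xi in x]    # transform each observation

# Mean: shifted and scaled.
print(mean(x_new), a + b * mean(x))

# Standard deviation: scaled only; the shift a drops out.
print(stdev(x_new), b * stdev(x))
```

On each printed line the two numbers should agree, up to computer rounding in the last decimal places.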


Sievers home  Math251-Fall11/Dayq4.htm      11am    9/2/11
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.
- - - - - - - - - - - - - - - - - - - - -
Table for calculating sum of squared deviations, for n = 4 observations.
x                       | x - xbar = x - 2  | (x - xbar)^2
1                       | -1                | +1
1                       | -1                | +1
2                       | 0                 | 0
4                       | 2                 | 4
8 = Sum.  xbar = 8/4=2  | 0 = sum (always!) | 6 = sum of squared deviations