Day 4 Hand in:
B. The boxplots shown at
Making boxplots: plain vanilla is fine, or use
judgment about outliers. DON'T apply the 1.5 × IQR rule; that one is for computers only...
C. Matching histograms & boxplots: Use the
One-variable calculator. If it doesn't work, try http://www.whfreeman.com/ips6e;
the datasets are there by name (table/problem numbers have changed).
By hand! To get the 5-number summaries & boxplots, you need to
make a stemplot for each. (You're trimming, i.e. rounding down, so
your numbers may be a little different from the answer book.)
DO all the rest:
1.97, 1.98 (Linear transformations)
- - - - - - - -
B. Linear transformations: algebra. You have a data set x1, x2, ..., xn, which has mean xbar and standard deviation s.
a) We noted that the sum of all the deviations from the mean, sum(xi - xbar), always equals 0. Prove this by algebra. (If you are not skilled at working with big sigmas, do it for n = 3 (x1, x2, x3) and write out all the sums with +'s.) (This is Ex. 1.92 in IPS.)
b) You apply a linear transformation xi* = a + b xi to each data point. (The book writes xnew instead of x*, p. 43.)
a can be + or -, but b should be positive. (In practical terms, a negative b would "flip" the data, reversing the order.)
Show that the mean of the transformed data set is xbar* = a + b xbar,
and that the standard deviation of the transformed data set is s* = b s. (The text shies away from giving formulas...)
(If you are not skilled at working with big sigmas, do it for n = 3 and write out all the sums with +'s.)
Start the proof with the formula for the mean written in terms of the xi*'s, e.g. xbar* = (x1* + x2* + x3*)/3.
Plug in xi* = a + b xi and work the algebra until you arrive at the desired expression, a + b xbar, involving the mean of the xi's. Repeat for the standard deviation formula. Hint: xbar* appears in the standard deviation expression; substitute a + b xbar for it, since you have already proved they are equal.
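A quick numerical sanity check of the two rules in part b) can be sketched in Python. It is not a substitute for the algebraic proof; it only confirms the rules on one example. Assumptions: the data are the demo values 1, 1, 2, 4, 12 used later in these notes, a = 32 and b = 1.8 are arbitrary illustrative choices, and the helper name transform is hypothetical.

```python
from statistics import mean, stdev
import math

def transform(data, a, b):
    """Hypothetical helper: apply x* = a + b*x to every observation."""
    return [a + b * x for x in data]

# Demo data from these notes; a and b are illustrative choices (b > 0).
data = [1, 1, 2, 4, 12]
a, b = 32, 1.8

new = transform(data, a, b)

# Rule 1: the new mean equals a + b * (old mean).
assert math.isclose(mean(new), a + b * mean(data))
# Rule 2: the new standard deviation equals b * (old s); the shift a drops out.
assert math.isclose(stdev(new), b * stdev(data))

print(round(mean(new), 2), round(stdev(new), 2))  # prints: 39.2 8.35
```

With a negative b the spread rule would become |b| s, since a standard deviation can never be negative.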
Probably a day for SPSS in Mac 101, at class time. First SPSS.
Coming Friday (probably): Quiz. In class, closed book: stemplot, 5-number summary and boxplot; mean & s.d. by hand, showing all steps.
- - - - - - - - - - - - - - - - - - - - - - - - - - -
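Revisit (or meet) mean/median, 5-number summary, boxplot (from Day 3).
Compare boxplot with histogram: each of the four boxplot sections (whisker, half-box, half-box, whisker) holds about a quarter of the data, so a longer section means lower histogram bars over that stretch, and vice versa.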
[Example figure, not reproduced here, from Connexions: Barbara Illowsky, Ph.D., Susan Dean.]
Some other "averages" / measures of the middle: (Many ...)
Trimmed mean: throw away, say, the top and bottom 5% and take the mean of the rest. Resistant, but hard to work with. (SPSS, later.)
Midrange: Point midway on the ruler scale between smallest and largest: Min = 5, Max = 15, Midrange = (5+15)/2= 10.
Highly sensitive, non-resistant, not too useful, but quick!
The mode / modal class (mode: most "popular"): the value or group with the most individuals; the peak of the histogram "curve".
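The three "other averages" above can be sketched in Python. This is a minimal illustration under stated assumptions: the data list is hypothetical (chosen so that Min = 5 and Max = 15, as in the midrange example), trimmed_mean and midrange are hypothetical helper names, and trimmed_mean simply rounds the number of values cut from each end down, which may not match exactly how SPSS computes its 5% trimmed mean. multimode comes from Python's statistics module (3.8+) and returns every most-frequent value; it finds the mode of raw values, not the modal class of a grouped histogram.

```python
from statistics import mean, multimode

def trimmed_mean(data, proportion=0.05):
    """Hypothetical helper: drop `proportion` of the values from EACH end, average the rest."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    values = sorted(data)
    k = int(len(values) * proportion)  # number cut from each end, rounded down
    kept = values[k:len(values) - k]
    return mean(kept)

def midrange(data):
    """Hypothetical helper: the point midway between the smallest and largest values."""
    return (min(data) + max(data)) / 2

data = [5, 7, 8, 8, 9, 10, 11, 15]  # hypothetical data
print(trimmed_mean(data, 0.125))  # cuts 1 value from each end; about 8.83
print(midrange(data))             # (5 + 15) / 2 = 10.0
print(multimode(data))            # [8]
```

Replacing 15 with 150 would move the midrange from 10 to 77.5 but leave this trimmed mean unchanged, which is the resistant vs. non-resistant contrast in the list above.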
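Standard deviation (goes with the mean): square root of the
variance: (almost) the average of the squared deviations from the mean, s^2 = sum((xi - xbar)^2) / (n - 1).
(The deviations sum to 0.)
(Divide by n - 1, the "degrees of freedom": because the n deviations must sum to 0, only n - 1 of them are free, so the vector of deviations lies in an (n - 1)-dimensional subspace.)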
Demo: 1, 1, 2, 4: mean = 2, sum of squared deviations = 6, variance = 6/3 = 2, s = 1.41 (a table works well).
1, 1, 2, 4, 12: mean = 4, sum of squared deviations = 86, variance = 86/4 = 21.5, s = 4.64.
(Mid-computation check: the sum of the deviations from the mean, before squaring, always = 0.)
--s is always ≥ 0 (= 0 only if all observations are equal).
--s has the same units as the observations (they are squared, then square-rooted).
Very sensitive to outliers (outliers contribute much more than their share to the sum of squared deviations from the mean).
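The hand computation in the demo can be followed step by step in Python. A minimal sketch: the function name sd_by_hand is hypothetical, and it returns the mean, the sum of the deviations (the mid-computation check), the sum of squared deviations, the variance (dividing by n - 1) and s. Python's statistics.stdev also divides by n - 1, so it should give the same s.

```python
import math

def sd_by_hand(data):
    """Hypothetical helper mirroring the hand steps: mean, deviations, squares, variance, s."""
    n = len(data)
    xbar = sum(data) / n
    deviations = [x - xbar for x in data]
    squared = [d * d for d in deviations]
    ss = sum(squared)            # sum of squared deviations from the mean
    variance = ss / (n - 1)      # divide by n - 1 degrees of freedom
    s = math.sqrt(variance)
    return xbar, sum(deviations), ss, variance, s

for data in ([1, 1, 2, 4], [1, 1, 2, 4, 12]):
    xbar, dev_sum, ss, var, s = sd_by_hand(data)
    # dev_sum is the mid-computation check: it should be 0.
    print(data, xbar, dev_sum, ss, var, round(s, 2))
# [1, 1, 2, 4] 2.0 0.0 6.0 2.0 1.41
# [1, 1, 2, 4, 12] 4.0 0.0 86.0 21.5 4.64
```

In the second data set the single value 12 contributes (12 - 4)^2 = 64 of the 86 in the sum of squared deviations, which is why s jumps from 1.41 to 4.64.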
SPSS to find mean and s.d. Handout
We've been looking at the SHAPE of distributions, and the ways it
can point us to knowledge about the data. (Living histograms.)
Note p. 39, middle: Statistical [summary] measures and methods [based]
on them are generally meaningful only for distributions of sufficiently
regular shape. ... [Q]uickly resorting to fancy calculations is the [mark]
of a statistical amateur. Look, think, and choose your ...
of Middle & Spread "Systems:"
-- Midrange, range (very sensitive to outliers: they use only the max and min!)
-- Median, IQR (+ quartiles Q1, Q3, 5-number summary), based on percentiles (the jth percentile has j% of the data at or below it)
-- Mean, standard deviation: "x-bar" (or "y-bar", etc.), "s" (good for symmetric, unimodal distributions with no outliers)
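To compare the three systems side by side, here is a minimal Python sketch on hypothetical data, before and after replacing the largest value with an outlier. Assumptions: quartiles_by_hand and three_systems are hypothetical helper names, and the quartile rule is the "median of each half" rule used for 5-number summaries by hand (the overall median is left out of both halves when n is odd). Software, including Python's statistics.quantiles, may use a different quartile rule and give slightly different values.

```python
from statistics import mean, median, stdev

def quartiles_by_hand(data):
    """Hypothetical helper: Q1, median, Q3 via the 'median of each half' rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]            # values below the median's position
    upper = values[half + n % 2:]    # values above it (median excluded when n is odd)
    return median(lower), median(values), median(upper)

def three_systems(data):
    """Hypothetical helper: (center, spread) under each of the three systems."""
    q1, m, q3 = quartiles_by_hand(data)
    lo, hi = min(data), max(data)
    return {
        "midrange, range": ((lo + hi) / 2, hi - lo),
        "median, IQR": (m, q3 - q1),
        "mean, s": (round(mean(data), 2), round(stdev(data), 2)),
    }

typical = [2, 3, 3, 4, 5, 5, 6, 7, 8]   # hypothetical data
with_outlier = typical[:-1] + [80]      # largest value replaced by an outlier
for data in (typical, with_outlier):
    print(three_systems(data))
```

The median and IQR come out the same for both data sets, while the midrange, range, mean and s all move, some of them dramatically.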
Linear transformations (pp. 43-5) do not change the shape of a distribution. A "good" measure of center or spread should "act naturally" if you change units of measurement by shifting (translating: everyone eats one more cookie)
or by stretching or shrinking (changing scale: all cookies are broken in half, and we count half-cookies).
Fahrenheit<--> Celsius. (Community Handbook?)
New x* = a + bx, for each observation.
Measures of spread are unaffected by the shift a! Only the scale change b affects them.
Page 44 gives the rules explicitly. Problem B has you prove them for mean and standard deviation.
(Shifting is often done to put numbers in a nice range, with 0 not too far away. E.g. years from 1970)
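The "act naturally" behavior can be checked numerically. A minimal Python sketch, assuming hypothetical Celsius readings and the conversion F = 32 + 1.8 C (so a = 32, b = 1.8); c_to_f and iqr are hypothetical helper names. iqr uses statistics.quantiles with its default method, which is not the by-hand quartile rule, but either rule shifts and scales along with the data when b > 0.

```python
from statistics import median, quantiles, stdev
import math

def c_to_f(celsius):
    """Hypothetical helper: x* = a + b*x with a = 32, b = 1.8 (Celsius to Fahrenheit)."""
    return 32 + 1.8 * celsius

def iqr(data):
    """Hypothetical helper: Q3 - Q1 using Python's default quartile method."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1

celsius = [12.0, 15.5, 18.0, 21.0, 22.5, 25.0, 30.0]  # hypothetical daily highs
fahrenheit = [c_to_f(c) for c in celsius]

# Center: both the shift and the scale act on it.
assert math.isclose(median(fahrenheit), c_to_f(median(celsius)))
# Spread: the shift of 32 drops out; only the scale factor 1.8 remains.
assert math.isclose(iqr(fahrenheit), 1.8 * iqr(celsius))
assert math.isclose(stdev(fahrenheit), 1.8 * stdev(celsius))

# A pure shift ("years since 1970") leaves spread untouched.
years = [1972, 1975, 1981, 1990, 1994]  # hypothetical years
since_1970 = [y - 1970 for y in years]
assert math.isclose(iqr(since_1970), iqr(years))
```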
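Demo table (data 1, 1, 2, 4):
  x   | x - xbar | (x - xbar)^2
  1   |   -1     |    1
  1   |   -1     |    1
  2   |    0     |    0
  4   |    2     |    4
  8   |    0     |    6     <- sums: xbar = 8/4 = 2; deviations sum to 0 (always!); sum of squared deviations = 6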