|
Day 4 Hand in:
B. The boxplots shown at Making boxplots: plain vanilla is fine, or use judgment about outliers. DON'T use the 1.5 × IQR rule; that is for computers only.
C. Matching Histo's & Boxplots: use the One-variable calculator at http://www.whfreeman.com/ips7e. If it doesn't work, try http://www.whfreeman.com/ips6e; the datasets are there by name (table/problem numbers changed).
1.86 Hummingbirds: by hand! To get to the 5# summaries & boxplots, you need to make stemplots for each. (You're trimming, i.e. rounding down, so your numbers may be a little different from the answer book.)
DO all the rest: 1.97, 1.98 (Linear transformations) |
Read, discuss .. |
Optional
- - - - - - - - |
B. Linear transformations algebra. You have a data set x1, x2, ..., xn, which has mean xbar and standard deviation s.
a) We noted that the sum of all the deviations from the mean, sum(xi - xbar), should always equal 0. Prove this by algebra. (If you are not skilled at working with big sigmas, do it for n = 3 (x1, x2, x3) and write out all the sums with +'s.) (This is ex. 1.92 in IPS.)
b) You make a linear transformation xi* = a + b xi on each data point. (The book uses xnew instead of x*, p. 43.) a can be + or -, but b should be positive. (In practical terms, a negative b would "flip" the data, reversing the order.)
Show that the mean xbar* of the transformed data set = a + b xbar, and that the standard deviation of the transformed data set, s* = b s.
(The text shies away from making formulas...)
(If you are not skilled at working with big sigmas, do it for n = 3 and write out all the sums with +'s.)
Do the proof by starting with the formula for the mean expressed in the xi*'s, e.g. xbar* = (x1* + x2* + x3*)/3. Plug in xi* = a + b xi, and work the algebra to arrive at the desired expression (a + b xbar) involving the mean of the xi's. Repeat for the standard deviation formula. Hint: xbar* appears in the standard deviation expression: substitute a + b xbar for it, since you already proved they are equal.
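Before starting the algebra in Problem B, it can help to see the three claims hold on actual numbers. The sketch below is a numerical check only, not a proof. The data set (1, 1, 2, 4) is the one from the class demo; the constants a = 10 and b = 3 and all function names are arbitrary choices made for illustration.

```python
import math

def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Standard deviation with the (n - 1) divisor."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

xs = [1, 1, 2, 4]   # demo data from class
a, b = 10, 3        # illustrative constants (assumed); b must be positive

xbar = mean(xs)
s = sample_sd(xs)

# Part (a): the deviations from the mean add up to 0.
deviation_sum = sum(x - xbar for x in xs)

# Part (b): transform every point, then recompute mean and s.d.
x_star = [a + b * x for x in xs]
xbar_star = mean(x_star)
s_star = sample_sd(x_star)

print("sum of deviations:", deviation_sum)
print("xbar* =", xbar_star, " a + b*xbar =", a + b * xbar)
print("s*    =", s_star, " b*s =", b * s)
```

One example agreeing with the formulas does not establish them for every data set; that is what the algebra is for.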
Wednesday (probably): SPSS day in Mac 101, at class time. FirstSPSS Handout, Morganstore instructions on back.
Coming Friday (probably): Quiz, in class, closed book: stemplot, 5# summary and boxplot. Mean & s.d. by hand, showing all steps.
- - - - - - - - - - - - - - - - - - - - - - - - - - -
Revisit or meet mean/median, 5# summary, boxplot, Day 3.
Compare boxplot with histogram: longer boxplot sections mean lower histogram height, and vice versa.
(Example from Connexions: Collaborative Statistics, Barbara Illowsky, Ph.D., Susan Dean.)
Some other "averages"/measures of middle: (Many exist)
Trimmed mean: throw away, say, the top and bottom 5%, and take the mean of the rest. Resistant, but hard to work with. (SPSS, later)
Midrange: the point midway on the ruler scale between smallest and largest: Min = 5, Max = 15, Midrange = (5+15)/2 = 10. Highly sensitive, non-resistant, not too useful, but quick!
The Mode/modal class: (Mode: most "popular") Group with the most individuals; point of the peak of the histogram "curve".
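The three measures above can be sketched in a few lines. This is a minimal illustration, not how SPSS computes them. The data set is made up (chosen so that Min = 5 and Max = 15, as in the midrange example), the trimming fraction is 10% rather than 5% so that one value comes off each end of ten, and the mode is taken over raw values rather than histogram classes.

```python
from collections import Counter

def trimmed_mean(xs, fraction):
    """Drop `fraction` of the values from each end, then average the rest."""
    ordered = sorted(xs)
    k = int(len(ordered) * fraction)      # how many to drop at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def midrange(xs):
    """Point midway between the smallest and largest values."""
    return (min(xs) + max(xs)) / 2

def mode(xs):
    """Most frequent raw value (ties go to the value seen first)."""
    return Counter(xs).most_common(1)[0][0]

# Hypothetical data, invented for this example.
data = [5, 7, 8, 8, 9, 10, 10, 10, 11, 15]

print("mean         :", sum(data) / len(data))
print("trimmed mean :", trimmed_mean(data, 0.10))
print("midrange     :", midrange(data))
print("mode         :", mode(data))
```

Replacing the 15 with 150 moves the midrange a long way but leaves the trimmed mean and the mode where they were, which is the resistance contrast described above.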
Spread, cont.
Standard deviation (goes with mean): the square root of the variance.
Variance: (almost) the average of the squared deviations from the mean. (The deviations themselves sum to 0.)
(Divide by (n-1), the "degrees of freedom": the dimension of the vector space spanned by the deviations from the mean.)
Demo: 1,1,2,4: mean = 2, sum of squared deviations = 6, variance = 2, s = 1.41 (table is good)
1,1,2,4,12: mean = 4, sum of squared deviations = 86, variance = 21.5, s = 4.64.
(Midcomputation check: the sum of the deviations from the mean (before squaring each) always = 0.)
-- s is always >= 0 (= 0 only if all observations are equal)
-- s has the same units as the observations (squared, then square-rooted).
-- Very sensitive to outliers (the outliers contribute much more than their share to the Sum of Squared Deviations from the Mean)
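The two demo data sets can be run through the same steps as the by-hand computation. The sketch below uses the (n - 1) divisor described above; the function names are illustrative. The last lines show how much of the sum of squared deviations comes from the single observation 12.

```python
import math

def sum_sq_dev(xs):
    """Sum of squared deviations from the mean."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs)

def sample_variance(xs):
    """Sum of squared deviations divided by (n - 1)."""
    return sum_sq_dev(xs) / (len(xs) - 1)

def sample_sd(xs):
    """Square root of the variance."""
    return math.sqrt(sample_variance(xs))

small = [1, 1, 2, 4]
with_outlier = [1, 1, 2, 4, 12]

for xs in (small, with_outlier):
    print(xs, "SS =", sum_sq_dev(xs),
          "variance =", sample_variance(xs),
          "s =", round(sample_sd(xs), 2))

# Share of the sum of squares contributed by the observation 12
# (the mean of the second data set is 4).
outlier_share = (12 - 4) ** 2 / sum_sq_dev(with_outlier)
print("share from the 12:", round(outlier_share, 2))
```

One observation out of five accounts for roughly three quarters of the sum of squares, which is why s more than triples when the 12 is added.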
SPSS to find mean and s.d.: Handout
We've been looking at the SHAPE of distributions, and the ways irregularities can point us to knowledge about the data. (Living histograms.)
Note p. 39, middle: Statistical [summary] measures and methods based on them are generally meaningful only for distributions of sufficiently regular shape. ... [Q]uickly resorting to fancy calculations is the mark of a statistical amateur. Look, think, and choose your calculations selectively.
Summaries of Middle & Spread "Systems:"
-- Midrange, Range (very sensitive to outliers: they use only the max and min!)
-- Median, IQR (+ Quartiles Q1, Q3, 5-number summary), based on percentiles (the j'th percentile is the value with about j% of the data at or below it)
-- Mean, Standard Deviation: "x-bar" (or "y-bar", etc.), "s" (good for symmetric, unimodal distributions with no outliers)
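As a rough illustration of the median/IQR system, the sketch below computes a 5-number summary for the demo data 1, 1, 2, 4, 12. It assumes the quartile convention in which Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when n is odd. Software packages, SPSS included, may use a different rule and report slightly different quartiles. The function names are illustrative.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(xs):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: Q1 and Q3 are the medians of the lower and
    upper halves, excluding the overall median when n is odd.
    """
    ordered = sorted(xs)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])

data = [1, 1, 2, 4, 12]
low, q1, med, q3, high = five_number_summary(data)
print("5-number summary:", (low, q1, med, q3, high))
print("IQR:", q3 - q1)
print("mean:", sum(data) / len(data), " median:", med)
```

The 12 pulls the mean up to 4 while the median stays at 2, which is the resistance difference between the two systems.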
... --------------------------
-----------------------------------------
Linear transformations (pp. 43-5) do not change the shape of a distribution: A "good" measure of center or spread should "act naturally" if you change units of measurement by shifting (translating) (everyone eats one more cookie) or by stretching or shrinking (changing scale) (all cookies are broken in half; count half-cookies).
Fahrenheit <--> Celsius. (Community Handbook?)
New x* = a + bx, for each observation.
Measures of spread are unaffected by the shifting! They are affected only by the scale change.
Page 44 gives the rules explicitly. Problem B has you prove them for mean and standard deviation.
(Shifting is often done to put numbers in a nice range, with 0 not too far away. E.g. years from 1970.)
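The cookie picture can be tried out directly. In the sketch below the cookie counts are invented for illustration; "one more cookie" is the shift x* = 1 + x, and counting half-cookies is the scale change x* = 2x. It uses Python's standard `statistics` module, whose `stdev` divides by n - 1.

```python
import statistics

def data_range(xs):
    """Largest value minus smallest value."""
    return max(xs) - min(xs)

# Hypothetical cookie counts for five people.
cookies = [2, 3, 3, 5, 7]

shifted = [1 + x for x in cookies]   # everyone eats one more cookie
halves = [2 * x for x in cookies]    # count half-cookies instead

for label, xs in (("original", cookies),
                  ("shifted", shifted),
                  ("half-cookies", halves)):
    print(label,
          "median =", statistics.median(xs),
          "range =", data_range(xs),
          "s =", round(statistics.stdev(xs), 3))
```

The median moves with both changes, while the range and s ignore the shift and double under the scale change.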
| Sievers home | Math251-Fall11/Dayq4.htm | 11am | 9/2/11 |
| x | x - xbar = x - 2 | (x - xbar)^2 |
| 1 | -1 | +1 |
| 1 | -1 | +1 |
| 2 | 0 | 0 |
| 4 | 2 | 4 |
| 8 = sum; xbar = 8/4 = 2 | 0 = sum (always!) | 6 = sum of squared deviations |