MATH 251, Probability and Statistics I, Fall 2007, F Aug. 31, Day 4

Meet here Monday; in Mac 101 lab Wednesday for big SPSS intro.  Bring a disk or usb.
Unless otherwise noted, all assignments are in IPS
Day 4 Assigned: ( std. dev., linear transformations, densities)  Covers end of ch. 2, + pp. 66-9
Read for Wednesday's class, rest of 1.3 (Normal distributions), up to Normal quantile plots.
Use SPSS  Handout for computation of mean and std. dev, unless it says do it by hand.
Day 4 Hand in: 
Sec. 1.2  p.56ff. 
1.75 quintiles by hand. Method p. 45 middle
1.50 (Do xbar and s by hand.  Then put them in SPSS & do them.)
1.43 abc (you did the stemplot Day 2) Use SPSS for c (Table1.5)
1.77 (SPSS) Trimmed mean.  Do it like this: Load the guinea pig file (Table 1.8) into SPSS. Find the mean.  Then delete the highest 10% and lowest 10% of the observations (Click on the row, hit the Delete key). Find the mean of these = 10% trimmed mean.  Similarly find the 20% trimmed mean. (Median = 102.5, to do the comparisons. )
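For anyone who wants to see the trimming procedure outside SPSS, here is a minimal Python sketch. The function name `trimmed_mean` and the data values are hypothetical (they are NOT the Table 1.8 guinea pig data), and the sketch assumes that the number of observations dropped from each end is the stated percentage of n, rounded down.

```python
def trimmed_mean(values, fraction):
    """Mean after dropping `fraction` of the observations from each end.

    Assumption: the count dropped per end is fraction * n, rounded down.
    """
    ordered = sorted(values)
    k = int(len(ordered) * fraction)      # observations to drop on each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical right-skewed data (not the guinea pig values from Table 1.8).
data = [2, 3, 3, 4, 4, 5, 5, 6, 7, 41]

plain_mean = sum(data) / len(data)   # pulled upward by the single value 41
trim10 = trimmed_mean(data, 0.10)    # drops the lowest 1 and highest 1
trim20 = trimmed_mean(data, 0.20)    # drops the lowest 2 and highest 2

print(plain_mean, trim10, trim20)
```

With this made-up data the ordinary mean is 8.0, while the trimmed means (4.625 and 4.5) sit close to the median of 4.5, which is the kind of comparison the exercise asks for.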
1.70 (SPSS) (computational accuracy)

1.72, 1.76 (Linear transformations)
A.  The mean August temperature in a certain Asian city is 25°C, with standard deviation 5°C.  What are these values in degrees Fahrenheit?  (f = 32 + 1.8 c)
p.98 1.141abc(ed scores "transformed to a standard scale")
   (hint for a: make 2 equations and solve for a and b)
Problem B below 

Sec. 1.3,
Density Handout:
complete the tables by counting squares.  (Look for patterns, to stay entertained...)
 p.84 1.80, 1.81, 1.82, (unif. density)
1.83 (mean, median, mode)

Read, discuss
 

C. In problem B below, you need b > 0. 
Where does this come into the computation? What would happen if you used a b that was negative?

Optional 
 

Do 1.70 (computational accuracy) in Excel, if you're an Excel user.

B.  You have a data set x1, x2,... , xn,  which has mean xbar and standard deviation s.
You make a linear transformation xi*= a+b xi, on each data point.  (The book uses xnew  instead of x*, p. 54)
a can be + or - , but b should be positive.  (In practical terms, negative b would "flip" the data, reversing the order.)
Show that the mean xbar* of the transformed data set = a + b xbar,  (last sentence of p. 55)
and that the standard deviation of the transformed data set,  s* = bs .
(If you are not skilled at working with big sigmas, do it for n = 3 and write out all the sums with +'s.)
Do the proof by starting with the formula for the mean expressed in the xi*'s, e.g.  xbar* = (x1*+x2*+x3*)/3.
Plug in  xi* = a+b xi, and work the algebra to arrive at the desired expression (a+b xbar) involving the mean of the xi's.  Repeat for the  standard deviation formula.  Hint:  xbar* appears in the standard deviation expression:  substitute a +b xbar for it, since you already proved they were equal.

   Check for Homework questions? Especially 1.48, 1.64, "Read, to discuss" problems. Remaining #s on board. HW:  PLEASE Label with Day #.  Please paperclip/staple.

Monday Quiz:  In class, closed book:  Stemplot, 5#summary and boxplot.  Mean & s.d. by hand, showing all steps.
- - - - - - - - - - - - - - - - - - - - - - - - - - -
Handout (for Sec. 1.3):  Density,  Density (Solutions)
(Re)visit mean/median, 5#summary, boxplot 

Some other measures of middle:
    Mode (modal class) (peak, most popular), trimmed mean (throw away a % on each end), midrange (midway between min and max)

Spread, cont.
Standard deviation (goes with mean). Square root of:
           Variance:  (almost) average of squared deviations from the mean.
                  (deviations sum to 0)
                 (Divide by (n-1), the "degrees of freedom": the dimension of the vector space spanned by the deviations from the mean)
Demo:  1,1,2,4, mean = 2, sum of squared deviations = 6, variance = 2, s = 1.41
1,1,2,4,12, mean = 4, sum of squared deviations = 86, variance = 21.5, s = 4.64.
(Midcomputation check:  Sum of deviations from the mean (before squaring each) always = 0 )
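
The by-hand steps in the demo can be mirrored in a short Python sketch. The function name `describe` is just an illustrative choice; it returns each intermediate quantity so the steps can be compared with a hand computation.

```python
import math


def describe(data):
    """Return (mean, deviations, sum of squared deviations, variance, s)."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]
    sum_sq = sum(d * d for d in deviations)
    variance = sum_sq / (n - 1)          # divide by n - 1, not n
    return mean, deviations, sum_sq, variance, math.sqrt(variance)


for data in ([1, 1, 2, 4], [1, 1, 2, 4, 12]):
    mean, deviations, sum_sq, variance, s = describe(data)
    # Mid-computation check: deviations from the mean sum to 0.
    assert abs(sum(deviations)) < 1e-9
    print(data, mean, sum_sq, variance, round(s, 2))
```

Running it reproduces the demo values: variance 2 and s = 1.41 for the first data set, variance 21.5 and s = 4.64 once the outlier 12 is added.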

--s is always >= 0  (s = 0 only if all observations are equal)
--s has the same units as the observations (the deviations are squared, then square-rooted).
     

Very sensitive to outliers (the outliers  contribute much more than their share to the Sum of Squared Deviations from the Mean)

Mean and Standard Deviation are for Symmetric Unimodal  distributions without big outliers.
   (ideally "Bell-shaped" = Normal)

SPSS to find mean and s.d.   Handout

We've been looking at SHAPE of distributions, and the ways irregularities can point us to knowledge about the data. (Living histograms.)  Note p. 49, middle:  "Statistical [summary] measures and methods based on them are generally meaningful only for distributions of sufficiently regular shape. ... [Q]uickly resorting to fancy calculations is the mark of a statistical amateur.  Look, think, and choose your calculations selectively."

Summaries of Middle & Spread "Systems:"
-- (Midrange, Range  Very sensitive to outliers--they use only the max and min!)
-- Median, IQR  (+ Quartiles Q1, Q3, 5-number summary), based on percentiles (the j'th percentile is the value with j% of the data at or below it)
-- Mean, StandardDeviation "y-bar" (or "x-bar"), "s"  (good for symmetric unimodal, no outliers)

--------------------------------------------  -----------------------------------------
Linear transformations do not change the shape of a distribution:   A "good" measure of center or spread should "act naturally" if you change units of measurement by shifting (translating) (everyone eats one more cookie)
 or by stretching or shrinking (changing scale) (all cookies are broken in half; count half-cookies).
Fahrenheit<--> Celsius.

     New x* = a + bx, for each observation.
Measures of spread are unaffected by the shifting! Only affected by the scale change.
Page 55 gives the rules explicitly.  Problem B has you prove them for mean and standard deviation.
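
A numerical spot check of these rules (not a proof) can be sketched in Python. The data and the constants `a` and `b` are hypothetical values chosen only for illustration; `statistics.stdev` is the sample standard deviation, which divides by n - 1.

```python
import math
import statistics

# Hypothetical data and constants, chosen only for illustration.
x = [3.0, 5.0, 8.0, 12.0]
a, b = 10.0, 2.5                 # shift a (any sign), scale b > 0

x_star = [a + b * xi for xi in x]

xbar, s = statistics.mean(x), statistics.stdev(x)
xbar_star, s_star = statistics.mean(x_star), statistics.stdev(x_star)

# Rules being checked: xbar* = a + b * xbar and s* = b * s.
mean_rule_holds = math.isclose(xbar_star, a + b * xbar)
sd_rule_holds = math.isclose(s_star, b * s)

# Pure shift ("everyone eats one more cookie"): spread is unchanged.
shifted = [xi + 1 for xi in x]
shift_leaves_spread = math.isclose(statistics.stdev(shifted), s)

print(mean_rule_holds, sd_rule_holds, shift_leaves_spread)
```

A check like this on one data set only illustrates the rules; Problem B asks for the algebra that shows they hold for every data set.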
-------------------------------------------     ------------------------------------------
  1.3   Density function or curve: idealized histogram.
Area = relative frequency.

Any curve that is on or above the x-axis and has area exactly 1 under it can be thought of as the idealization of some set of observations, and can be called a Density curve.  We carry over our terms for shape, and our summary measures.
Densities
(When values can take on any of a continuous interval of numbers)
Example:  Spinner:  Label edge with continuous values from 0 to 1, divided into 10 equal colored sectors. Spinning should produce 1/10 of all spins in each colored sector.  Simulations of 500 and 3000 spins show this is roughly true. More spins would come closer to the Uniform shape.
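
The spinner can be imitated with a short simulation. This is a sketch under stated assumptions: the spinner is modeled by Python's uniform random number generator on [0, 1), the function name `spin_proportions` and the fixed seed are illustrative choices, and the 100,000-spin run is added here only to show the trend.

```python
import random


def spin_proportions(n_spins, n_sectors=10, seed=0):
    """Proportion of simulated spins landing in each equal-width sector of [0, 1)."""
    rng = random.Random(seed)            # fixed seed so runs are repeatable
    counts = [0] * n_sectors
    for _ in range(n_spins):
        value = rng.random()             # uniform on [0, 1)
        sector = min(int(value * n_sectors), n_sectors - 1)
        counts[sector] += 1
    return [c / n_spins for c in counts]


for n in (500, 3000, 100_000):
    props = spin_proportions(n)
    print(n, [round(p, 3) for p in props])
```

Each printed list is the set of histogram bar heights (as relative frequencies); with more spins the bars should settle closer to 0.1 each, the flat Uniform shape.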

Abstraction, idealized histogram ("Probability Model") =
Density curve. Describes a theoretical distribution of data.
Any such model is a curve
   --always on or above the horizontal axis
   --has area exactly 1 underneath it.

Many, many models are possible, modeling many phenomena.  (Histograms of data for some models)
Median, mean, percentiles, standard deviation are defined for a density model in analogy to those for a histogram:
-- median has half of area below and half above.
-- mean is balance point.  On the long-tail side of median if distribution is skewed. Same as median if symmetric.
--First quartile has 1/4 of area below, 3/4 above. Etc. for others.
--Greek labels "mu" for mean and "sigma" for std. dev. of a Density.
Complex models require tables to find proportions.  Make some tables: Handout Density  (Solutions)
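
Counting squares on the handout is a way of approximating area, and the same idea can be sketched numerically. The density below, f(x) = 2x on [0, 1], is a hypothetical example (not one from the handout), and the function names are illustrative; areas are approximated by adding up thin rectangles.

```python
def density(x):
    """Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area(lo, hi, steps=2000):
    """Approximate area under the density between lo and hi (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(density(lo + (i + 0.5) * width) for i in range(steps)) * width


def percentile_point(p):
    """Find the x with area p to its left, by repeated halving of [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if area(0.0, mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


total = area(0.0, 1.0)             # a density must have total area 1
median = percentile_point(0.50)    # half of the area below, half above
q1 = percentile_point(0.25)        # one quarter of the area below

# Balance point (mean): area-weighted average of x over thin slices.
steps = 2000
mean = sum(((i + 0.5) / steps) * density((i + 0.5) / steps)
           for i in range(steps)) / steps

print(round(total, 4), round(q1, 4), round(median, 4), round(mean, 4))
```

This density has its long tail on the left, and the computed mean (about 0.667) falls below the median (about 0.707), on the long-tail side as described above.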


Sievers home  Math251-Fall07/Day2s4.htm      10pm    8/30/07
This page belongs to Sally Sievers who is solely responsible for its content.