NO professor today
(ice!) If you make it to the classroom, I encourage you to share
your HW, see what different ways there
are to get the same answer. The material below will be for
Monday.
I also will not be in for the afternoon meetings! Sorry!
We'll try again next week...
HW assignment Day3 (From Moore unless otherwise
noted.)
Reading: Finish Ch.1,
+ stemplot handout, "Check"
problems p. 24 1.14, 16 thru 22.
Timeplot, p.22-3. You will
need to be able to recognize cycles and trends, not
make a timeplot by hand. (We'll make them in SPSS later.)
Read Ch.2 thru p. 43, then thru p. 47.
Do "check" p. 56, 2.13, 14, 16 (mean/median) 15,17,18
(5#summary/boxplot) Further: Finish Ch.
2.
Do the means and medians required here by hand (with a
calculator).
Hand in:
p. 14, 1.7 Revisiting histogram bins: Use the applet at http://www.whfreeman.com/bps4e, as in class last time. Help.
p. 35, 1.43 Orange prices timeplot
- - - - -
p. 39, 2.1 Wood, mean: Punch the 20 actual values into your calculator, adding and dividing by 20.
A. Find the (approximate) median for the data on Wood breakage, using the numbers in the stemplot on p. 21 (Fig. 1.10). It's approximate because the stemplot data is rounded--Quick and Dirty is often sufficient! Keep a copy for #2.5, which will be assigned soon.
p. 41, 2.4 Bonds home runs: Make a stemplot to put the numbers in order to find the medians. For the means, just punch them in. (You can shorten the work by finding the sum of the 18 years excluding the 73, writing that down, and then adding the 73 to get the total for the 19 years. Then divide the appropriate sums by 19 and 18.)
A. Using the Handout: Wages in our region: How are the occupations organized? (Clearly not alphabetically.) For what occupations is the wage scale probably skewed left? (Marker: mean is less than median.) For what other occupations is the skewness not too extreme (mean no more than about $1000 greater than median)?
p. 41, 2.3; p. 57, 2.23, 2.24 mean or median?
p. 58, 2.29 fruit eating
p. 58, 2.30 newborns. (I said I wouldn't make you make a histogram, but the data's already pre-binned, so do it here.) Also describe the distribution--symmetric, skewed?
p. 58, 2.28 U. endowments. They mean: how far do you have to count in, in the list, to locate the mean and quartiles?
p. 59, 2.34 guinea pigs survival: For a) use the One Variable Statistical Calculator Applet at http://bcs.whfreeman.com/bps4e or on your text's CD. (If you have an older, used book, it may be in the datasets as if for BPS3e; ex02-23.dat.) Just observe the skewness. For b), find the 5-number summary (easy since they're in order in the book) and check your answers with the Applet results. Draw the boxplot (with or without outliers, I don't care) and compare with the histogram on your screen.
p. 45, 2.5 Wood again. Go ahead and use the stemplot figures to find the quartiles. Also make a boxplot.
p. 58, 2.27 Flower length: Find the 5-number summary for bihai, from the stemplot p. 55. If you want more practice, do the other 2 by hand also, but you may just use the numbers from the answers in the back of the book. Use them to make 3 side by side boxplots, and finish the problem as written.
"Read," to discuss (be able to answer in class):
p. 34, 1.40 coins (skewed left)
- - - -
p. 58, 2.26 Resistance, with Applet
Optional:
p. 59, 2.32 (mean/median play, with Applet)
p. 63, 2.42, 43 (more play, with pencil)
Turning in HW out of class: NOT Campus
Mail! Into 151 box outside my door, into yellow folder if it's
there.
(Other papers--for me--under my door, please!)
Ch.
1,
Review.
Data: Numbers
(usually)
in context: What, Who (how
many),
Why? When and Where? How?
When? Class has changed already since this
compilation: ../StudatSp08.xls
Distribution of one
variable: what values, how many (or what proportion) of each.
Graphical summaries of data: Area
represents proportion.
Quantitative:
Shape: symmetric, or skewed (think smeared, or sliding) right or left.
(&& bell-curve (Ch. 3); J-shaped is really skewed (fig. 1.15a, p. 31).)
Humps: uni- or bimodal (or multimodal). Two peaks = two "causes"?
Outliers (giraffe with the zebras?)
Center, spread: rough for now, specific measures next.
Hand around: "Living Histograms"
Lurking variable: One that affects your data but
perhaps you didn't think/know to measure!
(pulse rate--running, stairs, nervousness.
Height--sex) Missing data? (why?)
p. 10, #1.4,
Weekend birthrates? What's happening?
HW questions?
(nonstemplot)
Stemplots
(Stem-and-Leaf)
are a powerful hand tool. Tally, with
value
added.
!!Unordered first,!! then ordered if
necessary. By tens, then
split?
Truncate is faster! (corresponds to bin edges at "whole numbers")
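For checking a hand stemplot, here is a minimal Python sketch of the same recipe: stems by tens, leaves truncated (not rounded) to the units digit. The function name `stemplot` and the height values are made up for illustration, and the sketch assumes non-negative data. It sorts the leaves, which by hand you would only do in a second pass if necessary.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines: stems are tens, leaves are truncated units digits."""
    leaves = defaultdict(list)
    for v in values:
        whole = int(v)                      # truncate, do not round
        leaves[whole // 10].append(whole % 10)
    lines = []
    # Walk every stem from smallest to largest so empty stems still show up.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in sorted(leaves.get(stem, [])))
        lines.append(f"{stem:>2} | {row}")
    return lines

# Made-up heights in inches, for illustration only.
heights = [61.5, 70.2, 64.9, 58.3, 67.0, 72.8, 66.1, 69.4, 63.7, 65.5]
for line in stemplot(heights):
    print(line)
```

Note that 64.9 lands on leaf 4, not 5: truncating puts the bin edges at the whole numbers.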
Back
to back, comparing two groups. (or side-by-side on same
scale, cf. p55 fig. 2.5):
../StudatSp08v2.xls (scroll down)
Data source?
Lurking
variables?
HW questions?
(stemplot)
Heights-- several past
classes.
Variability happens.
Things settle down on average, BUT inferences are never certain.
Statistics will give us a
language
for talking about uncertainty.
Choosing a display (by hand):
A dot plot (Day 2) is
most useful for n = 3 to about 15-20, or when the data only fall on a
few values (just stack the dots up).
A stemplot is
good for continuous data, smeared around; you can do 100 values in 3-5
minutes.
Time plot. (pp. 17-19) Time
on horiz. axis, values on vertical. trend? (general
slope up or down). Cyclic?
--Beware of extrapolation
--predicting a time trend into the future.
-- Research data: time, or order of
taking measurements, is often a lurking variable. Always
do
a time plot.
Ch.
2: Summarizing distribution info with numbers
Measures of middle
(central tendency)
--Colloquially
"average" can refer to any measure of middle, so watch
out; be
more
specific.
Mean (most common
"average") "x-bar":
Take sum (aggregate) of all observations and divide by how many (n).
Formula p. 38.
Metaphors.
1) Center of gravity, balance
point
of histogram.
2) Slice off bits from the big and add to
the little till everyone has the same.
(Or "aggregate"--total-- it all and portion it out evenly.)
Outlier
or long tail will pull mean in that direction (think seesaw
balancing)
"Sensitive" to outliers, skewness.
Especially
useful: 1) For symmetric, tidy distributions
2) When metaphor 2 makes sense--looking for "fair share" of a total.
(1,1,2,4 cookies eaten by 4 people, mean = 2. 1,1,2,4,
12 (n=5): mean =4.)
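The cookie arithmetic can be checked with a few lines of Python. This is only a sketch of "sum it all and portion it out evenly"; the helper name `plain_mean` is made up here.

```python
def plain_mean(values):
    """Aggregate everything, then portion it out evenly."""
    return sum(values) / len(values)

cookies = [1, 1, 2, 4]           # cookies eaten by 4 people
with_big_eater = cookies + [12]  # a fifth person eats 12

print(plain_mean(cookies))         # 2.0
print(plain_mean(with_big_eater))  # 4.0
```

One big eater doubles the "fair share", even though nobody else ate more than 4: that is what "sensitive to outliers" looks like.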
Median: half
are
bigger,
half are smaller
Point
on histogram with half the area to the left, half to the right.
Calculating:
Put observations in numerical order (stemplot!).
(For our hand calculations: Accept the small variation caused by
truncation or rounding in the stemplot. (Quick and dirty!))
Middle one if n is odd, or average the 2 middle if n
is
even.
Formula: Count in how far? (n+1)/2 places.
(14 items --> 7 1/2 places? Go halfway: average the 7th and 8th observations.)
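The counting rule can be sketched in Python as follows. The function name `median_by_counting` is made up, and the data are the cookie counts from the mean example.

```python
def median_by_counting(values):
    """Order the data, then count in (n + 1) / 2 places."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2                # 1-based place to count in to
    if n % 2 == 1:
        return ordered[int(position) - 1]  # odd n: the middle observation
    # Even n: position ends in 1/2, so average the two neighbours.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

print(median_by_counting([1, 1, 2, 4, 12]))  # n = 5, count in 3 places
print(median_by_counting([1, 1, 2, 4]))      # n = 4, count in 2 1/2 places
```

With the big eater included, the median is 2 cookies while the mean is 4.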
Got somewhere about here Monday, Day 4
"Resistant
to skewness and outliers"--trimming off ends will make little
difference
in median value.
More
"typical" than mean, especially if there is skewness or outliers.
(Badly bimodal
distribution--"middle"
doesn't mean much.)
Symmetric
distribution:
mean
= median
Author's website http://bcs.whfreeman.com/bps4e,
or Applets on your CD. "Statistical Applets",
Mean & Median.
Check out symmetric, skewed, distributions with outliers.
Handout: Wages
in our region (the Southern Tier) Almost all
income, wage, wealth data is skewed right. Less so if category very
narrow (Sometimes higher pay goes only with a new job category in the
same place--e.g. Food prep worker, manager of same. ).
&& Other "averages"/ middles: (Many:
e.g. trimmed mean: throw away, say top and bottom 5%, take mean of
rest.)
Midrange: Point midway on the ruler scale
between smallest and largest: Min = 5, Max = 15, Midrange =
(5+15)/2= 10.
Highly sensitive, non-resistant, not too
useful, but quick!
The Mode/modal class: (Mode: most "popular") Group
with
the most individuals; peak of the histogram "curve".
Measures of Spread (dispersion,
variability) distributions
with different spreads
Range: largest
- smallest. Resistant? NO! Two observations
carry
all the info; the rest could be anywhere.
Dot plots of 3 distributions, all with the same range: [three dot plots on a common scale, not reproduced here]
We need measures of spread that will better take into account all
the observations:
Quartiles, five-number summaries, boxplot, InterQuartile Range.
(Variance), Standard deviation.
Quartiles Divide
data into quarters: 1st quartile Q1: 1/4 below, 3/4 above. = 25th
percentile.
(2nd quartile= median = 50th percentile)
3rd quartile Q3: 3/4
below, 1/4 above. = 75th percentile.
Computation of quartiles: Different texts, packages use different methods. (different last year!)
Five-number summary: min, Q1, Median, Q3, max. (1, 4, 7, 9.5, 20 for the set of 8 below)
By hand: We'll use Tukey's quick and dirty: (he called them "hinges")
Take the two halves of the data you got from finding the median. Find the median of each half, using the same rule as before. (Detail. IF you had an even number of observations to start with, the data divides evenly into an upper and a lower half. No problem. IF you had an odd number to start with, you have one in the middle, the median. In this case only, you throw the median away, and use the remaining halves.)
1 3 5 6 8 8 11 20 are n=8 observations.
Median at (8+1)/2 = 9/2 = 4 1/2: average the 4th and 5th observations. 1 3 5 6 | 8 8 11 20, M = (6+8)/2 = 7
8/2 = 4 in each half: Halves are 1 3 5 6, and 8 8 11 20. The quartiles are the medians of each half; count in (4+1)/2 = 2 1/2.
1 3 5 6, Q1 = (3+5)/2 = 4. 8 8 11 20, Q3 = (8+11)/2 = 9.5
1 3 | 5 6 | 8 8 | 11 20
1 3 5 6 6 8 8 11 20 are n=9 observations.
Median at (9+1)/2 = 10/2 = 5th: 1 3 5 6 [6] 8 8 11 20, M = 6
Throw away the median. Now we have an even number again, 8 numbers.
8/2 = 4 in each half: Halves are 1 3 5 6, and 8 8 11 20. Continue as before. (This is a dirty method because it gives the same quartiles for both these data sets. Quick because computation is minimal and simple.)
1 3 | 5 6 [6] 8 8 | 11 20
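Here is a short Python sketch of the quick and dirty hinge rule, for checking hand work. The function names are made up for this example. Statistical packages generally use other quartile rules, so their answers may differ slightly.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the halves left over from finding the median."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Odd n: throw the median away before taking the upper half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

set_of_8 = [1, 3, 5, 6, 8, 8, 11, 20]
set_of_9 = [1, 3, 5, 6, 6, 8, 8, 11, 20]
print(five_number_summary(set_of_8))  # (1, 4.0, 7.0, 9.5, 20)
print(five_number_summary(set_of_9))  # (1, 4.0, 6, 9.5, 20)
```

Both sets get the same quartiles (4 and 9.5); only the median changes.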
"Plain vanilla--Moore" Draw and label the numerical scale first. Then mark the five numbers. Finish the picture.
The box spreads over the middle half (Q1 to Q3), the whiskers over the lowest and highest quarters (Min to Q1, Q3 to Max). Each section shows the spread of 1/4 of the data: the longer the section the thinner the data must be spread in there. Can "read" skewness.
Demonstration with set of 9. 1 3 | 5 6 [6] 8 8 | 11 20 (median in brackets). 5#summ: 1, 4, 6, 9.5, 20
Direction of boxplot? Vertical or horizontal is a matter of taste. I do horizontal, usually.
  |-----[   |      ]--------------------|
0·········5·········10········15········20
"Showing outliers" p.45ff. Outliers can make a boxplot whisker extend deceptively beyond the bulk of the data.
Make the whiskers to the last item in the "main mass" of the data.
Put a dot or a star for each outlier, beyond the whisker end.
How do we decide what's an outlier? By hand; use your judgement.
(Rule of thumb: Knowing rule is optional--used by computers) Define "outlier" as a value farther out than 1.5 IQR from the Quartiles.
(Q1 - 1.5 IQR is lower "fence", Q3 + 1.5 IQR is upper "fence".)
For the set of 9, IQR = 9.5 - 4 = 5.5, so 1.5 IQR = 1.5×5.5 = 8.25. Fences are 4 - 8.25 = -4.25, and 9.5 + 8.25 = 17.75.
So 20 lies outside the fence, and the whiskers & box should go from 1 to 11 (largest inside the fence)
(Dot or *? Tukey: Dot ·between 1.5 and 3 IQR's out, * if more than 3 IQR's out. By hand, I don't care. Here a * because it shows better.)
  |-----[   |      ]--|                 *
0·········5·········10········15········20
This is the same as we would have done without the rule, probably.
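For the curious, the optional 1.5 IQR rule can be sketched in Python. The helper name `fences` is made up, and Q1 = 4 and Q3 = 9.5 are taken from the five-number summary of the set of 9.

```python
def fences(q1, q3):
    """Lower and upper fences, 1.5 IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [1, 3, 5, 6, 6, 8, 8, 11, 20]
low, high = fences(4, 9.5)

outliers = [x for x in data if x < low or x > high]
inside = [x for x in data if low <= x <= high]
whisker_ends = (min(inside), max(inside))  # whiskers stop at the last value inside the fences

print(low, high)
print(outliers)       # [20]
print(whisker_ends)   # (1, 11)
```

The fences themselves are never drawn; they only decide which points get a dot or a star.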
Example: p.60, 2.34 Guinea pig survival: (redo for hw)
Use the One Variable Statistical Calculator Applet at http://bcs.whfreeman.com/bps4e
Compare boxplot with histogram: longer boxplot sections mean lower histogram height and vice versa.
| Sievers home | Math151-Sp08/Days3.htm | 3pm | 2/4/08 |