HW assignment Day 3 (From Moore unless otherwise noted.)
Needed for HW:
Stemplot: rounding when there are more than 2 decimal places? The handout says truncate (round down = trim); the Moore text says round to nearest. Tukey, the inventor, said truncate: throw away the trailing digits. I agree! This is supposed to be fast--rounding to nearest slows it down. I encourage truncating, but you can do it either way and be right. If you truncate, your stemplot may look a little different from the text answers, but it will look like a histogram whose bin edges are at the whole numbers. (A stemplot is hard for a computer to do, but some packages do it. For them, rounding to nearest is easiest. SPSS truncates, which is hard for a computer.)
Do you need to put the leaves in order? NO, not if you just want the shape.
Outliers--if they're quite far out, just write the numbers at the bottom (labeled High) or top (labeled Low). E.g. for p. 20, fig. 1.9, I might write "High 44.2" and stop the stemplot at stem 35.
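If you want to see the truncate-vs-round difference without doing it by hand, here is a rough Python sketch. It is not how SPSS or the Moore applet does it; the function name `stemplot`, its parameters, and the four data values are made up for illustration, and it assumes non-negative data.

```python
from collections import defaultdict

def stemplot(values, leaf_unit=0.1, round_to_nearest=False):
    """Return the lines of a stemplot for non-negative data.

    Each value is cut to a whole number of leaf units; the last digit
    is the leaf and the digits before it are the stem.
    """
    rows = defaultdict(list)
    for v in values:
        scaled = v / leaf_unit
        if round_to_nearest:
            n = round(scaled)
        else:
            # Truncate: throw away the trailing digits. The tiny nudge
            # guards against float error (4.3 / 0.1 can come out as 42.999...).
            n = int(scaled + 1e-9)
        stem, leaf = divmod(n, 10)
        rows[stem].append(leaf)  # leaves stay in data order, unsorted
    return [
        f"{stem:>3} | {''.join(str(d) for d in rows[stem])}"
        for stem in range(min(rows), max(rows) + 1)
    ]

data = [4.27, 4.31, 5.08, 4.99]  # made-up values, not from the text
truncated = stemplot(data)
rounded = stemplot(data, round_to_nearest=True)
print("\n".join(truncated))
print("\n".join(rounded))
```

Truncating puts 4.99 on the 4 stem as a 9; rounding to nearest moves it up to the 5 stem as a 0. Same data, slightly different picture, both acceptable.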
Reading: Finish Ch. 1, + stemplot handout. "Check" problems p. 24: 1.14, 16 thru 22.
Timeplot, pp. 22-3. You will need to be able to recognize cycles and trends, not make a timeplot by hand. (We'll make them in SPSS later.)
Read Ch. 2 thru p. 43, then thru p. 47.
Do "check" p. 56: 2.13, 14, 16 (mean/median). Ahead? 15, 17, 18 (5# summary/boxplot). Further: Finish Ch. 2.
Do the means and medians required here by hand (with a calculator).
Hand in Friday:
p. 14, 1.7 histogram bins: Use applet at http://www.whfreeman.com/bps4e, as in class. Help.
p. 31, 1.35 CO2 stemplot. I would use whole tons as stems, tenths as leaves, and see how it looks. Truncate, don't round, for speed. Don't bother to put leaves in order.
p. 31, 1.34 doctors. Do a stemplot, not a histogram. Use hundreds as stems, and split them as on p. 21.
p. 33, 1.37 study time. Do back to back, or side by side on the same scale, like fig. 2.5, p. 55. (Good stems: maybe by 2's: 80-90, 100-110, 120-130, etc., so stems are 0*, 0t, 0f, 0s, 0., 1*, 1t, 1f, 1s, 1., 2*, 2t etc., and 140 goes on the 1f stem as a 4, 210 goes on the 2* stem as a 1, 30 goes on the 0t stem as a 3, 0 goes on the 0* stem as 0, or is "low". Splitting by 5's (p. 21) might be good enough.) Notice the mental rounding of the responses, to quarter hours if not to ten minutes. Makes "granular" data.
p. 35, 1.43 Orange prices timeplot.
Postpone Ch. 2 to Day 4:
p. 39, 2.1 Wood, mean. Punch the 20 actual values into your calculator, adding and dividing by 20.
A. Find the (approximate) median for the data on Wood breakage, using the numbers in the stemplot on p. 21 (Fig. 1.10). It's approximate because the stemplot data is rounded--Quick and Dirty is often sufficient! Keep a copy for #2.5, next assignment.
p. 41, 2.4 Bonds home runs. Make a stemplot to put the numbers in order to find the medians. For the means, just punch them in. (You can shorten the work by finding the sum of the 18 years excluding the 73, writing that down, and then adding the 73 to get the total for the 19 years. Then divide the appropriate sums by 19 and 18.)
p. 41, 2.3; p. 57, 2.23, 2.24 mean or median?
Part (only) of Day 4 HW:
p. 45, 2.5 Wood again. Go ahead and use the stemplot figures. Also make a boxplot.
p. 58, 2.28 U. endowments. They mean, how far do you have to count in to the list to locate the median and quartiles?
p. 58, 2.29 fruit eating.
p. 58, 2.30 newborns. (I said I wouldn't make you make a histogram, but the data's already pre-binned, so do it here.) Also describe the distribution--symmetric, skewed?
"Read," to discuss (be able to answer in class):
p. 34, 1.40 coins (skewed left)
Postpone Ch. 2 to Day 4: p. 58, 2.26 Resistance, with Applet
Optional:
Postpone Ch. 2 to Day 4:
p. 59, 2.32 (mean/median play, with Applet)
p. 63, 2.42, 43 (more play, with pencil)
Turning in HW out of class: NOT Campus Mail! Into the 151 box outside my door, into the yellow folder if it's there.
Ch. 1, Review.
Data: Numbers (usually) in context: What, Who (how many), Why? When and Where? How?
When? Class may have changed already since this compilation: ../StudatFall07.xls
Distribution of one variable: what values, how many (or what proportion) of each.
Graphical summaries of data: Area represents proportion.
Quantitative:
Shape: symmetric, or skewed (think smeared, or sliding) right or left. (&& bell-curve (Ch. 3); J-shaped is really skewed (fig. 1.15a, p. 31).)
Humps: uni- or bi-modal (multi-). Two peaks = two "causes"?
Outliers (giraffe with the zebras?)
Center, spread--rough for now; specific measures next.
Hand around: "Living Histograms"
Pretest: Restate #5 as a histogram of 100 "5-volt" batteries tested for actual voltage.
The proportion with voltage < 1 is 20%.
The proportion with voltage < 3 is 60%. That includes < 1. So each rectangle represents 10%.
a) What proportion have voltage between 1 and 3? Count rectangles, OR subtract the part below 1 from the part below 3: 60 - 20 = 40. 40%
b) What proportion have voltage > 3? Count rectangles, OR note that this is the whole 100 minus the part < 3: 100 - 60 = 40. 40%
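The subtract-the-areas reasoning in (a) and (b) is just arithmetic on cumulative percentages. A tiny sketch, with variable names of my own choosing:

```python
# Cumulative percentages read off the pretest histogram.
below_1 = 20   # percent of batteries with voltage < 1
below_3 = 60   # percent with voltage < 3 (this already includes < 1)

# (a) Between 1 and 3: the part below 3 minus the part below 1.
between_1_and_3 = below_3 - below_1

# (b) Above 3: the whole 100% minus the part below 3.
above_3 = 100 - below_3

print(between_1_and_3, above_3)
```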
HW questions? (nonstemplot)
Histogram can change somewhat depending on the intervals you choose.
Moore Applet (http://www.whfreeman.com/bps4e, or use disk in book): One Variable Statistical Calculator, text pp. 11-13, Table 1.1, % degreed. (Drag histogram bars R/L to change "bins." No "Data Sets" tab? Try a different browser.)
Stemplots (Stem-and-Leaf) are a powerful hand tool. Tally, with value added. Handout
!!Unordered first,!! then ordered if necessary. By tens, then split?
Truncate is faster!
Back to back, comparing two groups (or side-by-side on same scale, cf. p. 55, fig. 2.5): ../StudatFall07.xls
Data source? Lurking variables? (pulse: stair climb: last term. Missing data?)
Heights--two classes, Living histogram.
Variability happens. Things settle down on average, BUT inferences are never certain.
Statistics will give us a language for talking about uncertainty.
Choosing a display (by hand):
A dot plot (Day 2) is most useful for n = 3 to about 15-20, or when the data only fall on a few values (just stack the dots up).
A stemplot is good for continuous data, smeared around; you can do 100 values in 3-5 minutes.
Time plot (pp. 17-19): Time on horiz. axis, values on vertical. Trend? (general slope up or down). Cyclic?
--Beware of extrapolation: predicting a time trend into the future.
--Research data: time, or order of taking measurements, is often a lurking variable. Always do a time plot.
Start here on Friday:
Ch. 2: Summarizing distribution info with numbers
Measures of middle (central tendency)
--Colloquially "average" can refer to any measure of middle, so watch out; be more specific.
Mean (most common "average"), "x-bar": Take the sum (aggregate) of all observations and divide by how many (n). Formula p. 38.
Metaphors.
1) Center of gravity, balance point of histogram.
2) Slice off bits from the big and add to the little till everyone has the same. (Or "aggregate"--total--it all and portion it out evenly.)
Outlier or long tail will pull mean in that direction (think seesaw balancing). "Sensitive" to outliers, skewness.
Especially useful: 1) For symmetric, tidy distributions. 2) When metaphor 2 makes sense--looking for "fair share" of a total. (1, 1, 2, 4 cookies eaten by 4 people: mean = 2. 1, 1, 2, 12: mean = 4.)
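The cookie example can be checked in a few lines. This is only a sketch of "total it all and portion it out evenly"; the helper name `mean` is mine, not from the text.

```python
def mean(values):
    """Aggregate (total) everything, then portion it out evenly."""
    return sum(values) / len(values)

cookies = [1, 1, 2, 4]          # four people, 8 cookies in all
cookies_outlier = [1, 1, 2, 12]  # one person eats 12 instead of 4

print(mean(cookies))          # fair share of 8 cookies
print(mean(cookies_outlier))  # the one big eater pulls the mean up
```

Three of the four values are identical in both lists, yet the mean doubles. That is what "sensitive to outliers" looks like in numbers.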
Median: half are bigger, half are smaller.
Point on histogram with half the area to the left, half to the right.
Calculating: Put observations in numerical order (stemplot!). (For our hand calculations: Accept the small variation caused by truncation or rounding in the stemplot. (Quick and dirty!))
Middle one if n is odd, or average the 2 middle if n is even.
Formula: Count in how far? (n+1)/2 places. (14 items --> 7 1/2 places? Go halfway = average the 7th and 8th observations.)
"Resistant to skewness and outliers"--trimming off ends will make little difference in median value.
More "typical" than mean, especially if there is skewness or outliers.
(Badly bimodal distribution--"middle" doesn't mean much.)
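Here is the count-in-(n+1)/2-places rule as a short sketch. The function name `median_by_counting` is made up, and the cookie lists are reused from the mean discussion to show resistance.

```python
import math

def median_by_counting(values):
    """Order the data, then count in (n + 1) / 2 places.

    A whole-number position is the middle observation; a half position
    (like 7 1/2) means average the observations on either side of it.
    """
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2                     # 1-based place to count in to
    lower = ordered[math.floor(position) - 1]  # e.g. the 7th observation
    upper = ordered[math.ceil(position) - 1]   # e.g. the 8th observation
    return (lower + upper) / 2

print(median_by_counting([1, 1, 2, 4]))
print(median_by_counting([1, 1, 2, 12]))
print(median_by_counting(range(1, 15)))  # 14 items: average the 7th and 8th
```

Swapping the 4 for a 12 leaves the median where it was, while the mean moved from 2 to 4.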
Symmetric distribution: mean = median
Author's website http://bcs.whfreeman.com/bps4e, or Applets on your CD: "Statistical Applets", Mean & Median. Check out symmetric, skewed, distributions with outliers.
&& Other "averages"/middles: (Many: e.g. trimmed mean: throw away, say, top and bottom 5%, take mean of rest.)
Midrange: Point midway on the ruler scale between smallest and largest: Min = 5, Max = 15, Midrange = (5+15)/2 = 10. Highly sensitive, non-resistant, not too useful, but quick!
The Mode/modal class: (Mode: most "popular") Group with the most individuals; peak of the histogram "curve".
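A sketch of two of these other middles. The function names and the 20 test scores are invented for illustration, and the way I count how many observations make up "5%" (whole observations, rounded down) is my assumption; packages differ on that detail.

```python
def midrange(values):
    """Point midway between the smallest and largest observations."""
    return (min(values) + max(values)) / 2

def trimmed_mean(values, percent=5):
    """Throw away the top and bottom `percent` of observations, average the rest."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100   # observations dropped from EACH end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

scores = list(range(1, 20)) + [1000]   # 1..19 plus one wild outlier (made up)

print(midrange([5, 8, 15]))            # Min = 5, Max = 15
print(sum(scores) / len(scores))       # ordinary mean, dragged up by the 1000
print(trimmed_mean(scores))            # drops the 1 and the 1000
print(midrange(scores))                # decided entirely by the two extremes
```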
Measures of Spread (dispersion, variability) distributions with different spreads
Range: largest - smallest. Resistant? NO! Two observations carry all the info; the rest could be anywhere.
Dot plots of 3 distributions, all with same range: [dot plots not reproduced]
We need measures of spread that will better take into account all the observations:
Quartiles, five-number summaries, boxplot, InterQuartile Range.
(Variance), Standard deviation.
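To see why the range alone is not enough, here is a small sketch. The two data sets are made up (they are not the ones in the dot plots), and `data_range` is my own helper name.

```python
def data_range(values):
    """Largest minus smallest: only two observations matter."""
    return max(values) - min(values)

piled_at_ends = [0, 0, 0, 10, 10, 10]    # made-up: everything at the extremes
piled_in_middle = [0, 5, 5, 5, 5, 10]    # made-up: mostly in the center

print(data_range(piled_at_ends))
print(data_range(piled_in_middle))
print(data_range(piled_in_middle + [50]))  # one outlier sets the range by itself
```

The first two sets look nothing alike, but the range cannot tell them apart, and a single outlier changes it completely.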
Quartiles divide data into quarters: 1st quartile Q1: 1/4 below, 3/4 above = 25th percentile.
(2nd quartile = median = 50th percentile)
3rd quartile Q3: 3/4 below, 1/4 above = 75th percentile.
Computation of quartiles: Different texts, packages use different methods. (different last year!)
Five-number summary: min, Q1, Median, Q3, max. (1, 4, 7, 9.5, 20 for the set of 8 below)
By hand: We'll use Tukey's quick and dirty: (he called them "hinges")
Take the two halves of the data you got from finding the median. Find the median of each half, using the same rule as before. (Detail: IF you had an even number of observations to start with, the data divides evenly into an upper and a lower half. No problem. IF you had an odd number to start with, you have one in the middle, the median. In this case only, you throw the median away, and use the remaining halves.)
1 3 5 6 8 8 11 20 are n = 8 observations.
Median at (8+1)/2 = 9/2 = 4 1/2th place; 1 3 5 6 | 8 8 11 20, M = (6+8)/2 = 7
8/2 = 4 in each half: Halves are 1 3 5 6, and 8 8 11 20. The quartiles are the medians of each half; count in (4+1)/2 = 2 1/2.
1 3 5 6, Q1 = (3+5)/2 = 4. 8 8 11 20, Q3 = (8+11)/2 = 9.5
1 3 | 5 6 | 8 8 | 11 20
1 3 5 6 6 8 8 11 20 are n = 9 observations.
Median at (9+1)/2 = 10/2 = 5th place; 1 3 5 6 [6] 8 8 11 20, M = 6
Throw away the median. Now we have an even number again, 8 numbers.
8/2 = 4 in each half: Halves are 1 3 5 6, and 8 8 11 20. Continue as before. (This is a dirty method because it gives the same quartiles for both these data sets. Quick because computation is minimal and simple.)
1 3 | 5 6 [6] 8 8 | 11 20
INTERQUARTILE RANGE = IQR = Q3 - Q1. (9.5 - 4 = 5.5 for both sets above)
= The range of the middle half of the observations. Resistant to outliers!
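The by-hand rule above can be written out as a sketch. This follows the class rule (drop the median when n is odd, then take the median of each half); it is not necessarily what SPSS, the text, or any other package computes, and the function names are my own.

```python
def median(ordered):
    """Median of an already-sorted list: middle one, or average the 2 middle."""
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """min, Q1, Median, Q3, max using the quick-and-dirty halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + n % 2:]   # when n is odd, skip (throw away) the median
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

def iqr(values):
    """Interquartile range: Q3 - Q1, the range of the middle half."""
    _, q1, _, q3, _ = five_number_summary(values)
    return q3 - q1

eight = [1, 3, 5, 6, 8, 8, 11, 20]
nine = [1, 3, 5, 6, 6, 8, 8, 11, 20]

print(five_number_summary(eight), iqr(eight))
print(five_number_summary(nine), iqr(nine))
```

Both sets give Q1 = 4 and Q3 = 9.5, so the IQR is 5.5 either way; only the median differs (7 versus 6). That is the "dirty" part of the method showing up.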
| Sievers home | Math151-Fall07/Dayf3.htm | 9pm | 8/30/07 |