Math 151, Day 40, Friday, December 1, 2006 (after class; corrected 11 am Dec. 4)

HW Day40:
Reading Ch. 18: We'll repeat the CI and test work, only with s instead of sigma and t instead of z. Read first up to p. 441, then the rest; read it all for Monday! Check p. 451, 18.15, 15, 17, 18, 19, 20, 21, 22 first, then 23, 24. Chapter 18, and possibly a brief look into Chapter 19, will be the last work of the course.
Hand in Monday.
p. 434, 18.1 and 2,  s<-->standard error
p. 436, 18.3 Critical values:  Use Table C and also the Excel t-procedures sheet; be sure your answers are consistent.
p. 436, 18.4 Critical values: Use Table C. For (b), make a sketch. Note that the decimal place is different in (a) and (b).
p. 437 18.5 Critical values for CI.
p. 437 18.7 Ancient air CI: Make a dotplot or stemplot to examine the data. It will look somewhat skewed, but with so little data this kind of scatter can easily happen in samples from a normal distribution. We should report that the skewness may make our CI only approximately accurate. Xbar = 59.5889% and s = 6.2553% are what you would get if you calculated from the data; use these to make your CI. Optional: Check with Excel t-procedures.
p. 453 18.29 absenteeism CI
p. 455 18.36 a. Calcium and blood pressure CI. (To use Table C when the degrees of freedom you need aren't listed, go to the row with the next lower degrees of freedom, here 50. You give up a little sharpness rather than overstate your case.) Optional: to see how much difference the "correct" t* would make, use Excel t-procedures.
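The "next lower row" rule for Table C can be written as a small lookup. This is a sketch, not part of the assignment: the function name `conservative_t_star` is hypothetical, the dictionary holds only the large-df rows of the 95% column (copied by hand, so check them against your own table), and df = 57 in the usage line is a made-up example.

```python
# Excerpt of Table C, 95% confidence (two-sided) column, for the large-df rows.
# Copied by hand for illustration; check against your own copy of Table C.
T_STAR_95 = {40: 2.021, 50: 2.009, 60: 2.000, 80: 1.990, 100: 1.984, 1000: 1.962}


def conservative_t_star(df, table=T_STAR_95):
    """Return t* from the largest listed row whose df does not exceed `df`.

    Dropping to the lower-df row gives a slightly larger t*, so the interval
    is a bit wider: a little sharpness is given up rather than overstating the case.
    """
    rows = [k for k in table if k <= df]
    if not rows:
        raise ValueError("df is below the smallest row in this excerpt")
    return table[max(rows)]


print(conservative_t_star(57))  # 2.009, from the df = 50 row
```

The "correct" t* for df = 57 lies between the 50 and 60 rows, so using 2.009 errs slightly on the wide side.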

p. 441 18.8 and 9  is it significant? Also, use Excel t-procedures to find the P-values more exactly.
p. 432 18.25 Read carefully. (One or more t-values were computed incorrectly.)
p. 432 18.10 Ancient air test: See the note to 18.7 for the mean and s.d. Optional: check with Excel t-procedures.

The rest will be assigned Monday. That's all.

Using SPSS for one-sample procedures (with front page of Handout)
A. Redo the example on the handout, getting the result of Example 18.3. Dataset as SPSS file; Dataset as text (.dat) file. (If you import from the text file, remember to check that the Measure is Scale.)
p. 453, 18.27 Sharks (Use SPSS): Use SPSS to find the confidence interval, and also to do the test, for practice.
p. 457, 18.41 Auto crankshafts (Use SPSS)
= = = = = = = = = = = = = = = = = = = = = =
Matched pairs, and robustness (by hand unless it says SPSS)
p. 455, 18.37 measuring placebo effect  Use Table C.  You can check with the Excel t-procedures
p. 446, 18.11 and 12 Newts healing. Find the differences by hand, and make a stemplot by hand. Use SPSS (back page of Handout) to do the test and CI.
p. 448, 18.13 newts with outlier Use SPSS to do the tests.  To eliminate the outlier, you can just delete that row from the data set.
p. 450, 18.14 Reading scores. Use Table C. Also, what IS the standard deviation? You can check with the Excel t-procedures. Also, you may find that the mean is (statistically) significantly below the basic level. Is the difference large enough to be important? (I don't know...)
p. 455, 18.36b calcium/blood pressure conditions
p. 454, 18.34 growing trees faster.  Use SPSS.
Read, 
to discuss

Optional 
& & & Leftover problems from Day 30 & & & & & & & &
          These ideas are related to those in Ch. 15. You can get the answers visually by using the Statistical Significance Applet.
p. 290, 11.39 Pollutants in auto exhausts. For 11.39: You might want to know L so that if you tested your 25 cars and found a high value of x-bar, you could compare it with L. If it was greater than L, you would go back to the manufacturer and say, "I believe you sold me a batch of bad cars, because the chance of getting an average emission level this high if the exhaust system is working properly is only 1 in 100. It is more reasonable to believe the exhaust system is not working than that we 'are' that 1-in-100 possibility."
  p. 290, 11.38 Glucose testing. If we use this cutoff level L to say that people whose mean of 4 tests is over L "have diabetes," then the chance of declaring that someone "has diabetes" when they are really OK (true mean 125 mg/dl) is .05. That .05, or 5%, is the chance of a "false positive" under this protocol when the real mean is 125.
& & & & & & & & & & & & & & & & & &
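Both leftover problems ask for a cutoff L with P(xbar > L) equal to a small probability when the mean is the "OK" value. With sigma known (as in Ch. 11), L is the null mean plus z* standard deviations of xbar. A minimal sketch; `cutoff_L` is a hypothetical name, and sigma = 10 mg/dl below is a placeholder, so substitute the sigma given in the problem.

```python
from statistics import NormalDist


def cutoff_L(mu0, sigma, n, alpha):
    """Cutoff L with P(xbar > L) = alpha when the true mean is mu0.

    Assumes sigma is known and xbar is Normal with standard deviation sigma/sqrt(n).
    """
    z_star = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value z*
    return mu0 + z_star * sigma / n ** 0.5


# 11.38-style use: mean of n = 4 tests, OK mean 125 mg/dl, 5% false-positive rate.
# sigma = 10 is a placeholder, not the book's value.
L = cutoff_L(125, 10, 4, 0.05)
print(round(L, 2))  # 133.22 with this placeholder sigma
```

For 11.39 the same call applies with n = 25 and alpha = 0.01 (the "1 in 100"), using the exhaust problem's own mean and sigma.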
 
Final Exam* Tuesday evening, Dec. 12, 7-10. Alternate exam times? Tues. afternoon 2-5, Th. morning 10-1.
     Sign-in sheet Monday! Your choice.

Your simulation of shoebox results (25 each, for mean of 20 and for mean of 24):
To the circulating pad, add your total # where P > .10, your # of simulations (should be 25), and the proportion (total # with P > .10)/25.

Look at shoeboxes, and simulations.
For the shoeboxes,  the white numbers (where the mean is really 20) rejected H0 : µ = 20  (incorrectly) in favor of Ha : µ > 20
                   at the alpha = .10 level in 3 of 18  samples (16.7%) 
     the yellow numbers (where the mean is really bigger than 20--24 I think)   rejected H0 : µ = 20 (correctly!)
                    at the alpha = .10 level in 16 of 19  samples (84.2%)
    Simulation: 
so far, with mean = 20, reject H0 : µ = 20 (incorrectly) at alpha = .10 in 31/250 or .124 of samples (close to .10)
      
with mean = 24, reject H0 : µ = 20 (correctly!) at alpha = .10 in 181/250 or .724 of samples

IF you use a particular alpha as a "cutoff" between "reject H0" and "fail to reject H0," we can talk about the probability of rejecting H0 when it's true, and alpha is that probability!
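The shoebox and class simulations can be imitated in code. A minimal sketch, assuming Normal populations with a known sigma (so a z test is used) and placeholder values sigma = 5 and n = 10, which are not the shoebox's actual settings; `rejection_rate` is a hypothetical name.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, mu0=20, sigma=5, n=10, alpha=0.10, reps=5000, seed=1):
    """Fraction of simulated samples in which a one-sided z test rejects
    H0: mu = mu0 in favor of Ha: mu > mu0 at level alpha.

    sigma, n, reps and seed are placeholders, not the shoebox's actual settings;
    sigma is treated as known so the z test applies.
    """
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - alpha)
    se = sigma / n ** 0.5
    rejections = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if (xbar - mu0) / se > z_star:  # same as P-value < alpha
            rejections += 1
    return rejections / reps


print(rejection_rate(20))  # H0 true: should land near alpha = .10
print(rejection_rate(24))  # H0 false: the power against mu = 24
```

The first rate should come out close to .10, just as 31/250 = .124 did. The second depends heavily on the made-up sigma and n, so it will not match .724.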

Homework questions?  Day 39
    16.9, p. 395:
"Statistically insignificant": the differences could easily be due just to chance variability. Why is it important to know whether the differences are "small"? Because a large difference could be "real" but "statistically insignificant" just because the sample size was too small to confirm it.

See Day 39 for notes on old problems, Ch. 18.

Ch. 18:  Inference for population mean (realistic)
What is the significance to Statistics of the Guinness Stout bottle?
Standard error of the (sample) mean =    Standard deviation of xbar, estimated from the data.
  "Standard error of the mean":  s/sqrt(n) SEM, SEXbar, etc.
       When you estimate the standard deviation of a statistic, the resulting estimate is called the "standard error" of the statistic.

t-distribution family: like the standard normal, only slightly fatter in the tails (slightly more spread). Mean = 0. Symmetric around 0.
          t(k) is the t distribution with k degrees of freedom.
 Comparison with normal (Excel file)
     Excel t-procedures sheet will find P's from t's.

Standardizing xbar with s instead of sigma results in
   the one-sample t statistic, which has a t distribution with n-1 degrees of freedom.
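In symbols, with \bar{x} for xbar and µ0 for the null-hypothesis value of µ (a restatement of the definitions above, assuming an SRS from a Normal population):

```latex
\mathrm{SE}_{\bar{x}} = \frac{s}{\sqrt{n}}, \qquad
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim t(n-1) \ \text{when } H_0\colon \mu = \mu_0 \text{ is true}, \qquad
\text{level-}C \text{ CI: } \bar{x} \pm t^{*}\,\frac{s}{\sqrt{n}}
```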

Conditions for inference about a mean: 
(p. 434)
    ++ SRS
(or reasonable facsimile)
    ++ Population is Normal. 
(Can relax to symmetric, single-peaked unless n "very small")

"One-sample" t- procedures: SRS of size n.  Use Xbar to estimate µ.
Confidence intervals: Choose t* from Table C (n-1 d.f., confidence level C); the CI is xbar ± t* s/sqrt(n).

Significance tests: State hypotheses as in Ch. 15, then find t from the data by
 calculating the one-sample t statistic, using the null hypothesis value of µ (call it µ0).
Then proceed as if it were a "z", only using the (n-1) d.f. row in Table C:
find the two t* values that t falls between and write "P-value is between ___ and ___", using the tail probabilities at the top of those columns.
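Reading a P-value range off one row of Table C can be sketched as follows. The row below is the df = 9 row for upper-tail probabilities .25 through .005, copied by hand (verify against your table); `bracket_p` is a hypothetical name.

```python
# Upper-tail probabilities heading the Table C columns, and the df = 9 row of t* values.
TAIL_PROBS = [0.25, 0.20, 0.15, 0.10, 0.05, 0.025, 0.01, 0.005]
ROW_DF9 = [0.703, 0.883, 1.100, 1.383, 1.833, 2.262, 2.821, 3.250]


def bracket_p(t, row, probs=TAIL_PROBS):
    """Return (low, high) with low < one-sided P-value < high.

    `t` is the observed statistic, assumed positive (on the side Ha predicts).
    None means the P-value runs past the edge of the table on that side.
    """
    for i, t_star in enumerate(row):
        if t < t_star:
            # t sits just below this column, so P is above this column's tail area
            high = probs[i - 1] if i > 0 else None
            return probs[i], high
    return None, probs[-1]  # t beyond the last column: P is below the smallest tail area


low, high = bracket_p(1.767, ROW_DF9)
print(f"P-value is between {low} and {high}")  # milk example: between 0.05 and 0.1
```

For a 2-sided test, double both ends of the range.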

Example: bacteria per milliliter in 10 specimens of  raw milk from one producer.
  Parameter: actual mean bacteria/ml.
       5370, 4890, 5100, 4500, 5260, 5150, 4900, 4760, 4700, 4870
4|5
4|77
4|889
5|11
5|23
(stem = thousands, leaf = hundreds with the tens dropped; 4|5 means 4500)
 n = 10, xbar = 4950,
s = 268.45, SEM = 268.45/sqrt(10) = 268.45/3.162 = 84.89, degrees of freedom = 9
90% CI: from the t(9) row of Table C, t* = 1.833, so the CI is 4950 ± 1.833 x 268.45/sqrt(10)
     = 4950 ± 1.833 x 84.89, or 4950 ± 155.6 bacteria/ml (from 4794.4 to 5105.6).
If we had KNOWN the population sigma = 268.45,
  we'd have used z* = 1.645 and gotten a narrower CI (but we don't know sigma!).

Test: H0: µ = 4800
      Ha: µ > 4800 (too contaminated)
   t = (4950 - 4800)/SEM = 150/84.89 = 1.767
   t is between 1.383 and 1.833 (d.f. = 9).
   Table C: one-sided P is between .05 and .10. Some evidence for Ha.
(If the test had been 2-sided, P would be between .10 and .20.)
Excel t-procedures: P-value = .05552
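The milk example can be checked end to end in code. A sketch using only Python's standard library: since it has no t-distribution function, the one-sided P-value comes from numerically integrating the t density (Simpson's rule), a stand-in for the Excel t-procedures sheet rather than what Excel does internally. `t_pdf` and `t_upper_tail` are hypothetical helper names.

```python
import math
from statistics import fmean, stdev


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the area from 0 to t (Simpson's rule, even steps)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


milk = [5370, 4890, 5100, 4500, 5260, 5150, 4900, 4760, 4700, 4870]
n = len(milk)
xbar = fmean(milk)       # 4950
s = stdev(milk)          # sample s.d. (divides by n - 1), about 268.45
sem = s / math.sqrt(n)   # about 84.89

t_star = 1.833           # Table C, df = 9, 90% confidence
margin = t_star * sem
print(f"90% CI: {xbar:.0f} +/- {margin:.1f}, i.e. {xbar - margin:.1f} to {xbar + margin:.1f}")

t = (xbar - 4800) / sem  # H0: mu = 4800, Ha: mu > 4800
p_one_sided = t_upper_tail(t, n - 1)
print(f"t = {t:.3f}, one-sided P = {p_one_sided:.4f}")
```

The CI should match 4950 ± 155.6 and t should be 1.767; the P-value should come out close to the Excel value of .05552.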

Monday: SPSS for "raw data". Get the
Handout for SPSS Ch. 18.
 (Milk bacteria: Dataset)
    Analyze>Compare Means>One-Sample T Test: Test value = 4800.
           The P-value is labeled "Sig. (2-tailed)"; divide by 2 for 1 tail (if the observed mean is on the side Ha predicts).
   Analyze>Descriptive Statistics>Explore. Statistics button, set Confidence level.
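The "divide by 2" rule for SPSS's two-tailed P-value, including what happens when the sample falls on the wrong side, can be sketched like this. `one_sided_from_spss` is a hypothetical helper; `mean_diff` is the sample mean minus the test value (SPSS reports this as the "Mean Difference").

```python
def one_sided_from_spss(sig_2tailed, mean_diff, ha_direction):
    """Convert SPSS's "Sig. (2-tailed)" into a one-sided P-value.

    ha_direction is ">" or "<". If the observed difference points the way Ha says,
    halve the two-sided P; otherwise the one-sided P is 1 - (two-sided P)/2.
    """
    if ha_direction not in (">", "<"):
        raise ValueError('ha_direction must be ">" or "<"')
    right_way = mean_diff > 0 if ha_direction == ">" else mean_diff < 0
    half = sig_2tailed / 2
    return half if right_way else 1 - half


# Milk example: Sig. (2-tailed) would be about .111, and the mean is 150 above 4800.
print(one_sided_from_spss(0.111, 150, ">"))  # about .0555
```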

MATCHED PAIRS t procedures-- "Paired samples"(SPSS), "Paired comparisons"
   before--after, left hand--right hand, Drug A vs. Drug B on the same person or on a matched pair.
For each pair, find the difference in the observed values.  Then treat these differences as if they are "the" data set, from a normal population, and do One-sample t procedures.
Usually (always?) the null hypothesis will be " µ = 0", there is "no difference" between the treatments.
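The matched-pairs recipe (subtract within each pair, then run the one-sample t on the differences) can be sketched as below. The before/after numbers are made up for illustration, and `paired_t` is a hypothetical name.

```python
import math
from statistics import fmean, stdev


def paired_t(first, second, mu0=0.0):
    """Matched-pairs t: a one-sample t statistic on the differences (second - first).

    Returns (t, degrees of freedom). mu0 is the null mean difference, usually 0.
    """
    if len(first) != len(second):
        raise ValueError("every pair needs both measurements")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    sem = stdev(diffs) / math.sqrt(n)
    return (fmean(diffs) - mu0) / sem, n - 1


before = [10, 12, 9, 11, 13]  # hypothetical data
after = [12, 13, 9, 14, 14]
t, df = paired_t(before, after)
print(f"t = {t:.3f} with {df} d.f.")  # t = 2.746 with 4 d.f.
```

From here the P-value comes from the (n-1) d.f. row of Table C, exactly as for one sample.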

Example: wax paper sandwich bags: Is the wax layer the same inside and out?
25 bags: measure (wax outside - wax inside) for each (pounds per square foot).
Differences: xbar = .093, s = .723, n = 25, SEM = .723/5 = .1446
H0: µ = 0 (mean difference is 0)
Ha: µ ≠ 0 (there is a difference)
   t = (.093 - 0)/SEM = .093/.1446 = .643
   t is less than .685 (d.f. = 24), which is the right-tail t* for probability .25.
   Because the test is 2-sided, double the tail: .50. The P-value is greater than .50.
   No evidence for a difference.
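The sandwich-bag arithmetic, including the doubling for a 2-sided test, can be checked in a few lines. A sketch using only the summary numbers given in the example; the variable names are ours, not from the text.

```python
import math

# Summary statistics of the 25 differences (wax outside - wax inside), from the example.
xbar, s, n = 0.093, 0.723, 25
sem = s / math.sqrt(n)   # .723 / 5 = .1446
t = (xbar - 0) / sem     # H0: mu = 0

t_star_25 = 0.685        # Table C, df = 24, upper-tail probability .25
if abs(t) < t_star_25:
    # each tail beyond |t| holds more than .25, so two tails hold more than .50
    print(f"t = {t:.3f}: two-sided P-value > {2 * 0.25:.2f}")
```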
- - - - - - - - - - - - - - - - - - - - -
ROBUST procedures: a confidence interval or significance test is called robust if the confidence level or P-value doesn't change very much when the assumptions of the procedure are violated (pp. 447-450). Assumption: population is Normal.
t procedures are quite robust against nonnormality, but sensitive to outliers and strong skewness. Look at the data. Need an SRS!!
 Details: n < 15: t OK if data roughly symmetric, single peak, no outliers. Don't use if skewed or outliers. (How far out is an outlier?)
              n ≥ 15: t OK unless there is strong skewness or outliers.
              n ≥ 40 or so: t OK even if there is skewness. (Outliers? I suggest trying with and without them and seeing what changes.)
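The rule of thumb above can be written as a small decision function. This is only a sketch of these notes' guidelines, taking 15 and 40 as the cutoffs; judging "roughly symmetric", "strong skewness" and "outliers" still means looking at a plot. `t_procedures_verdict` is a hypothetical name, and the data are assumed to be an SRS.

```python
def t_procedures_verdict(n, symmetric_single_peak, strong_skew, outliers):
    """Rule-of-thumb check for one-sample t procedures (data assumed to be an SRS)."""
    if n < 15:
        ok = symmetric_single_peak and not strong_skew and not outliers
    elif n < 40:
        ok = not strong_skew and not outliers
    elif outliers:
        return "try it with and without the outliers and compare"
    else:
        ok = True  # large samples: skewness alone is not a problem
    return "t procedures OK" if ok else "do not use t procedures"


print(t_procedures_verdict(10, True, False, False))   # small, symmetric, clean
print(t_procedures_verdict(30, False, True, False))   # moderate n, strong skewness
```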

Matched-pairs data (differences) are often more normal in shape than the separate variables ("oddness" is often the same for both items in a pair and disappears in subtraction; another reason why this is a nice experimental design).


This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.

DAY AND DATE OF EXAM: Monday, December 11; Tuesday, December 12; Wednesday, December 13; Thursday, December 14

EXAM TIME, with the class meeting times listed for it:
9 am - noon: MWF 10:30 am; H 1:45 pm; TH 11:05 am; T 1:45 pm
2-5 pm: TH 9:40 am; MWF 8:10 am; TH 8:15 am; MWF 8:30 am; W 8:30 am; F 8:30 am; F 1:30 pm; W 1:30 pm; M 1:30 pm
7-10 pm: MWF 11:30 am; MWF 9:30 am; TWH 7:00 pm; M 7:00 pm; W 7:00 pm; T 7:00 pm
Courses for which meeting times are not listed above; make-up exams