Math 151, Day 40, Monday, May 5, 2008 (after class)

HW Day 40: (Re)read Ch. 16, especially Multiple tests, pp. 395-6.
Read Ch. 17, p. 414 and p. 417.

Reading Ch. 18: We'll repeat the CI and test work, only with s instead of sigma, and t instead of z. First read to p. 441, then the rest. Read it all! Check p. 451: 18.15, 16, 17, 18, 19, 20, 21, 22 first, then 23 and 24.

Ahead: review Ch. 9, p. 219 and around (completely randomized experiments, especially with only 2 treatments), and p. 224 (matched pairs experimental design).
Read Ch. 19, pp. 460-61 only!  (Comparing 3 or more independent groups requires Analysis of Variance, Ch. 25.)
This, plus reading SPSS output, will be the last work of the term.

Hand in Wednesday:
More Cautions
p. 397, 16.10 searching for ESP
p. 408 16.40 success of trainees
p. 408 16.41 schizophrenia markers

Review concepts
p. 423, 17.35 brains
p. 424, 17.36 support groups
p. 424, 17.37 CA brush fires, r²

Review meaning of P, significance:
p. 384, 15.48 Cicadas
p. 385, 15.52 P?
p. 385, 15.53 sig. def.?

t procedures (some errors in the before-class version of this list have been corrected):
p. 434, 18.1 and 2,  s<-->standard error
p. 436, 18.3 Critical values:  Use Table C. Check by plugging your t into the Excel t-procedures sheet; be sure your answers are consistent.
p. 436, 18.4 Critical values:  Use Table C.  For (b), make a careful sketch to see what to do.  Note that the decimal place is different in (a) and (b).
p. 437 18.5 Critical values for CI. (sample size to d.f.)
p. 437 18.7 Ancient air CI.  Make a dotplot or stemplot to examine the data.  It will look somewhat skewed, but with so little data this kind of irregularity can easily happen with samples from a normal distribution.  We should report that the skewness may make our CI only approximately accurate.  Xbar = 59.5889% and s = 6.2553% are what you would get if you calculated them from the data; use these to make your CI.  Optional:  Check with Excel t-procedures.
p. 453 18.29 absenteeism CI
p. 455 18.36 (a) Calcium and blood pressure CI.  Sample size is 27.  Check with Excel t-procedures.  ((b) will be assigned Wed.)

p. 441 18.8 and 9  is it significant? Also, use Excel t-procedures to find the P-values more exactly.
p. 452 18.25 "Read carefully."  (One or more t-values were incorrectly computed. Fix them.)
p. 441 18.10 Ancient air test   See note to 18.7,  for mean and s.d.  Optional: check with Excel t-procedures

Read, 
to discuss

Optional (review):
p. 424, 17.6 support groups

+ + ++ + + + + + + +
Final exam: Tues. May 13, 7-10 pm (evening!) 
   Alternatives-- Tuesday afternoon, starting any time after 1:00; finishing by 5:30.  Wednesday morning, 9-12.
        Choose a time (clipboard) so I know when you're coming.
Difficulties? Get in touch with me ASAP!
  Full exam schedule is at   http://www.wells.edu/pdfs/finals.pdf
     Registrar's page with a link to this and other good stuff: http://www.wells.edu/academic/regist.htm  Sign up on the attendance clipboard.

Review exercise (Open book, help from anyone.  Optional.) will count for 50% of Final exam score if you do it.  Due beginning of exam.  Available WED.
Late HW: accepted up to beginning of exam.

+ + + + + + + + + + +

Review of significance testing, in brief:

Before taking data, define
H0: "Null hypothesis" A claim about the population we would like to show is NOT true.  
   A parameter = a particular value.  H0: µ =1000 hrs.  ("Average lightbulb life".)
Ha: "Alternative hypothesis" A claim or statement about the population we are trying to find evidence FOR.
   The parameter is < or > the particular value (one-sided/one-tailed), or ≠ the particular value (two-sided/two-tailed).

Take data. Calculate the test statistic. For µ, the test statistic is the z-score of xbar. (Start with xbar, standardize using the mean from H0.)
    Is it an unlikely result if H0 is true?  Then that is evidence against H0.
Evidence: how strong?  P-value of a test:  The probability, computed assuming that H0 is true, that the observed outcome would take a value as extreme as or more extreme than that actually observed (if we could repeat the data collection).  p. 368.
Small P = strong evidence.
   One-tail alternative:  P = the tail beyond the observed value, in the direction of Ha
  
Two-tail alternative:  P = the sum of both tails, farther out than the observed value in either direction.
Results are significant at level alpha  if P < alpha, not otherwise.  Significance levels are usually "benchmarks."   What's "statistically significant" can vary by field.  (.05 is usually good.)
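A minimal Python sketch of the P-value rules above, assuming the test statistic is a z-score (sigma known, as in Ch. 15). The names `z_test_p_value` and `is_significant` and the example value z = 1.8 are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def z_test_p_value(z, alternative):
    """P-value for an observed z statistic.

    alternative: "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    std_normal = NormalDist()              # mean 0, standard deviation 1
    upper_tail = 1 - std_normal.cdf(z)     # area to the right of z
    lower_tail = std_normal.cdf(z)         # area to the left of z
    if alternative == "greater":
        return upper_tail
    if alternative == "less":
        return lower_tail
    if alternative == "two-sided":
        # both tails, farther out than the observed value in either direction
        return 2 * min(upper_tail, lower_tail)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

def is_significant(p_value, alpha=0.05):
    # significant at level alpha when P falls below the benchmark alpha
    return p_value < alpha

p_one = z_test_p_value(1.8, "greater")
p_two = z_test_p_value(1.8, "two-sided")
print(round(p_one, 4), is_significant(p_one))
print(round(p_two, 4), is_significant(p_two))
```

With z = 1.8, the one-sided P is about .036 (significant at .05), while the two-sided P doubles that to about .072 (not significant at .05).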

Cautions: see Day 39 for details.
   16.9, p. 395:  "Statistically insignificant" means the differences could easily be due just to chance variability. Why is it important to know whether the differences were "small"?  Because a large difference could be "real" but "statistically insignificant" just because the sample size was too small to confirm it.
Other  
Homework questions?  Day 39

New:  Multiple Tests: beware! pp. 395-6
If you do 100 tests and use the alpha = .05 significance level for each, then the structure of testing requires this:
    When all 100 null hypotheses H0 are true, out of your 100, about 5 of the 100 (.05) will give "significant" results by chance alone (falsely indicating the alternative hypothesis is to be preferred.)  Details Day 39
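A small simulation can make the "about 5 of 100" claim concrete. This is a sketch under assumptions of our own: every test is an independent two-sided z test whose H0 is true, so each z statistic is simply a standard Normal draw. The function name and the seed are hypothetical.

```python
import random
from statistics import NormalDist

def count_false_positives(n_tests=100, alpha=0.05, rng=random):
    """Run n_tests two-sided z tests in which every H0 is true;
    count how many come out "significant" by chance alone."""
    std_normal = NormalDist()
    hits = 0
    for _ in range(n_tests):
        z = rng.gauss(0, 1)                      # H0 true: z is standard Normal
        p = 2 * (1 - std_normal.cdf(abs(z)))     # two-sided P-value
        if p < alpha:
            hits += 1                            # a false "discovery"
    return hits

rng = random.Random(151)   # fixed seed so the run is repeatable
batches = [count_false_positives(rng=rng) for _ in range(200)]
print("first batch:", batches[0], "false 'discoveries' out of 100")
print("average over 200 batches:", sum(batches) / len(batches))
```

Any single batch of 100 can land above or below 5; it is the long-run average that settles near 5 per 100.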

Look at the shoeboxes and the simulations so far.  Real shoeboxes from past terms below; your shoeboxes this term: 1 of 15 falsely significant at alpha = .10.
For the real shoeboxes:
   The white numbers (where the mean really is 20) rejected H0 : µ = 20 (incorrectly) in favor of Ha : µ > 20 at the alpha = .10 level in
          3 of 18 samples (16.7%) (Fall '06)
          1 of 16 samples (6.3%) (Sp. '07): 4 of 34 (11.8%) combined
          1 of 13 samples (7.7%) (Fall '07, & 251): 5 of 47 (10.6%) combined
          1 of 15 graphed samples (6.7%) (Sp. '08): 6 of 62 (9.7%) combined
   The yellow numbers (where the mean really is bigger than 20, 24 I think) rejected H0 : µ = 20 (correctly!) at the alpha = .10 level in
          16 of 19 samples (84.2%) (Fall '06)
          13 of 17 samples (76.5%) (Sp. '07): 29 of 36 (80.6%) combined
          11 of 13 samples (84.6%) (Fall '07, 251): 40 of 49 (81.6%) combined
          10 of 17 graphed samples (58.8%) (Sp. '08): 50 of 66 (75.8%) combined
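The "combined" percentages are running totals across terms. As a check on the arithmetic, this short Python sketch recomputes them from the counts listed above. The term labels are shortened, and `running_totals` is a hypothetical helper name.

```python
# (term, rejections, samples), copied from the shoebox tallies above
white = [("Fall '06", 3, 18), ("Sp. '07", 1, 16),
         ("Fall '07", 1, 13), ("Sp. '08", 1, 15)]
yellow = [("Fall '06", 16, 19), ("Sp. '07", 13, 17),
          ("Fall '07", 11, 13), ("Sp. '08", 10, 17)]

def running_totals(rows):
    """Return (term, term %, combined rejections, combined samples, combined %)."""
    total_rej = total_n = 0
    out = []
    for term, rej, n in rows:
        total_rej += rej
        total_n += n
        out.append((term, 100 * rej / n, total_rej, total_n,
                    100 * total_rej / total_n))
    return out

for label, rows in [("white (mu = 20)", white), ("yellow (mu > 20)", yellow)]:
    print(label)
    for term, pct, r, n, cpct in running_totals(rows):
        print(f"  {term}: {pct:.1f}% this term; {r} of {n} = {cpct:.1f}% combined")
```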

Simulation: with mean = 20, reject H0 : µ = 20 (incorrectly) at alpha = .10 in
          31 of 250 samples (12.4%, close to .10) (Fall '06)
          19 of 100 samples (19%) (Sp. '07): 50 of 350 (14.3%) combined
          13 of 100 samples (13%) (Fall '07): 63 of 450 (14%) combined
          12 of 90 samples (13%) (Sp. '08): 75 of 540 (14%) combined
      with mean = 24, reject H0 : µ = 20 (correctly!) at alpha = .10 in
          181 of 250 samples (72.4%) (Fall '06)
          79 of 100 samples (79%) (Sp. '07): 260 of 350 (74.3%) combined
          84 of 100 samples (84%) (Fall '07): 344 of 450 (76%) combined
          75 of 90 samples (83%) (Sp. '08): 419 of 540 (78%) combined
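A computer version of this simulation might look like the sketch below. The notes do not give the shoebox population's standard deviation or the sample size, so SIGMA = 5 and N = 10 are hypothetical values, and the test is assumed to be the one-sided z test of Ch. 15 (sigma known). The rejection rates it prints will therefore not match the class tallies exactly.

```python
import math
import random
from statistics import NormalDist

MU0 = 20        # value in H0: mu = 20
ALPHA = 0.10
SIGMA = 5.0     # hypothetical population s.d. (not given in the notes)
N = 10          # hypothetical sample size (not given in the notes)

z_star = NormalDist().inv_cdf(1 - ALPHA)   # one-sided cutoff, about 1.2816

def rejects_h0(true_mean, rng):
    """Draw one SRS from a Normal population; test H0: mu = 20 vs Ha: mu > 20."""
    sample = [rng.gauss(true_mean, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    z = (xbar - MU0) / (SIGMA / math.sqrt(N))
    return z > z_star

def rejection_rate(true_mean, reps=5000, seed=0):
    rng = random.Random(seed)
    return sum(rejects_h0(true_mean, rng) for _ in range(reps)) / reps

print("mean 20:", rejection_rate(20))   # should hover near alpha = .10
print("mean 24:", rejection_rate(24))   # estimated power against mu = 24
```

With many repetitions the "mean 20" rate settles near alpha; the class totals (14%) are a little higher, which small numbers of samples can easily produce.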

Shoebox simulation, my set of 25 each.
If you use a particular alpha as a "cutoff" between "reject H0" and "fail to reject H0", we can talk about the probability of rejecting H0 when it is true, and alpha is that probability.
And we can talk about the "power" of the test to "detect" an alternative of (say) 24: the probability of (correctly) rejecting H0 when µ = 24.
   Optional:   Applet: "Power" (of a test to detect a difference).   For the shoebox situation, the power is .761.
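Power can also be computed directly instead of by simulation. This sketch assumes a one-sided z test with known sigma; the .761 above comes from the applet with the actual shoebox settings, which are not stated here, so the hypothetical sigma = 5 and n = 10 below give a different number.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu_alt, sigma, n, alpha):
    """Power of the z test of H0: mu = mu0 vs Ha: mu > mu0 at level alpha,
    when the true mean is mu_alt and sigma is known."""
    std_normal = NormalDist()
    z_star = std_normal.inv_cdf(1 - alpha)             # reject H0 when z > z_star
    shift = (mu_alt - mu0) / (sigma / math.sqrt(n))    # center of z when mu = mu_alt
    return 1 - std_normal.cdf(z_star - shift)

# Hypothetical sigma and n; the notes do not give the shoebox values.
print(power_one_sided_z(20, 24, sigma=5, n=10, alpha=0.10))
```

Two checks on the logic: when mu_alt equals mu0 the "power" is just alpha, and a larger n increases the power.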

Assigned new:  16.10, 40, 41:  like the above. 

Ch. 18:  Inference for population mean (realistic)
The most unrealistic of our "simple conditions" for inference (p. 344) was that we knew the population standard deviation sigma.  We remove that condition here.
If we substitute s, the sample standard deviation, for sigma, the population standard deviation, in our Normal distribution formulas:
    If n is quite big, the sample standard deviation s will be close to the population value sigma, and our work is approximately right.
    But if n is smaller, estimating sigma by s adds extra variability!  The problem is solved by modifying the z distribution: we use a t distribution instead.
Standard error of the (sample) mean =    Standard deviation of xbar, estimated from the data.
  "Standard error of the mean":  s/sqrt(n) SEM, SEXbar, etc.
       When you estimate the standard deviation of a statistic, the resulting estimate is called the "standard error" of the statistic.
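The standard error of the mean in code, as a minimal sketch: Python's `statistics.stdev` divides by n - 1, matching the sample standard deviation s. The data set and function name are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(data):
    """SEM = s / sqrt(n): the standard deviation of xbar, estimated from the data."""
    s = statistics.stdev(data)      # sample standard deviation (divides by n - 1)
    return s / math.sqrt(len(data))

# Hypothetical data set of 8 observations, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(statistics.stdev(data), 4), round(standard_error_of_mean(data), 4))
```

Because of the sqrt(n) in the denominator, the same spread of data gives a smaller SEM when the sample is larger.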

t-distribution family:  like standard normal only slightly fatter in the tails, slightly more spread.  Mean = 0. Symmetrical around 0.
          t(k) is the t distribution with k degrees of freedom.
 Comparison with normal (Excel graph)
Lower d.f.--fatter tails.  Higher d.f.--more like standard normal.
    Table C: "critical" t-value in the body, probabilities at top and bottom.  Set up for P-->t.
       Example:  for t(20), the critical value t* = 2.086 corresponds to
                   Confidence level 95% = "middle" probability between -2.086 and +2.086
        One-sided P = .025:  probability in the one tail above +2.086 = probability in the one tail below -2.086
        Two-sided P = .05:  probability in the two tails beyond -2.086 and +2.086.
              (For the z distribution, the corresponding z* is 1.96; notice t* is farther out.)

     Excel t-procedures sheet will find P's from t's.
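Without Excel or Table C, the same lookups can be sketched in Python. The standard library has no t distribution, so this sketch integrates the t density numerically with Simpson's rule and inverts it by bisection; that is a convenience of ours, not how Excel or Table C compute their values. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), by Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(upper_tail, df):
    """t* with the given area to its right (a Table C lookup), by bisection."""
    lo, hi = 0.0, 50.0                 # wide enough for the usual table entries when df >= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid                   # too much area to the right: move outward
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.025, 20), 3))         # compare Table C: 2.086
print(round(NormalDist().inv_cdf(0.975), 2))   # z* for comparison
print(round(2 * (1 - t_cdf(2.086, 20)), 4))    # two-sided P for t = 2.086, d.f. 20
```

The upper limit of 50 in the bisection is a simplification; with 1 degree of freedom some table entries lie far beyond it.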

Standardizing xbar with s instead of sigma gives
   the one-sample t statistic, which has a t distribution with n-1 degrees of freedom.

Conditions for inference about a mean (p. 434):
    ++ SRS (or a reasonable facsimile)
    ++ Population is Normal. (Can relax to symmetric, single-peaked unless n is "very small")

"One-sample" t- procedures: SRS of size n.  Use Xbar to estimate µ.
Confidence intervals:     Choose t* from Table C, n-1 d.f., level C.  The CI is xbar ± t* s/sqrt(n).

Significance tests:  State hypotheses as in Ch. 15, then find t from the data by
 calculating the one-sample t statistic, using the null hypothesis value of µ (call it µ0).
Then proceed as if it were a "z", only using the (n-1) d.f. row in Table C:
find the two t*'s that t falls between and read off their P-values, then write "P-value is between ___ and ___".   (Or use Excel t-procedures.)
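The "P-value is between ___ and ___" step can be sketched as a table lookup. The values below are the standard Table C row for d.f. = 9; any other d.f. would need its own row copied from the table. The function name and the example t = 2.5 are hypothetical.

```python
# One row of Table C: (upper-tail probability, critical value t*), d.f. = 9
TABLE_C_DF9 = [
    (0.25, 0.703), (0.20, 0.883), (0.15, 1.100), (0.10, 1.383),
    (0.05, 1.833), (0.025, 2.262), (0.02, 2.398), (0.01, 2.821),
    (0.005, 3.250), (0.0025, 3.690), (0.001, 4.297), (0.0005, 4.781),
]

def p_value_bracket(t, row, two_sided=False):
    """Return (low, high) with low < P < high, read from one Table C row.

    Assumes t lies on the Ha side (or the test is two-sided), so only the
    size of t matters; a two-sided P doubles both ends of the bracket.
    """
    t = abs(t)
    high, low = 0.5, 0.0      # a tail beyond t >= 0 is at most .5
    for tail, t_star in row:  # row runs from small t* to large t*
        if t < t_star:
            low = tail        # t is not yet out to this t*
            break
        high = tail           # t is beyond this t*, so P < tail
    factor = 2 if two_sided else 1
    return low * factor, high * factor

print(p_value_bracket(2.5, TABLE_C_DF9))                  # one-sided bracket
print(p_value_bracket(2.5, TABLE_C_DF9, two_sided=True))  # two-sided bracket
```

If t is beyond the last entry the bracket becomes (0, .0005), i.e. "P < .0005"; if it is below the first, P is between .25 and .50.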

Example: bacteria per milliliter in 10 specimens of  raw milk from one producer.
  Parameter: actual mean bacteria/ml.
       5370, 4890, 5100, 4500, 5260, 5150, 4900, 4760, 4700, 4870
Stemplot (split stems; leaves are hundreds, truncated; 4|5 means 4500):
4|5 
4|77
4|889 
5|11 
5|23 
 n = 10,   xbar = 4950,   s = 268.45,   SEM = 268.45/sqrt(10) = 268.45/3.162 = 84.89,   deg. of freedom = 9
90% CI:  from the t(9) row of Table C, t* = 1.833.   CI is 4950 ± 1.833 x 268.45/sqrt(10)
                       = 4950 ± 1.833 x 84.89, or 4950 ± 155.6 bacteria/ml (about 4794.4 to 5105.6).
If we had KNOWN the population sigma = 268.45,
  we'd have used z* = 1.645 and gotten a narrower CI.   (But we don't know sigma!)

Test:  H0 : µ = 4800
       Ha : µ > 4800  (too contaminated)
       t = (4950 - 4800)/SEM = 150/84.89 = 1.767   (d.f. = 9)
       t is between 1.383 and 1.833, so from Table C the one-sided P is between .05 and .10.  Some evidence for Ha.
       (If the test had been two-sided, P would be between .10 and .20.)
       Excel t-procedures:  P-value = .05552
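The milk example's numbers can be checked with a short Python sketch. It takes t* = 1.833 and z* = 1.645 from the notes rather than computing them, and it brackets the P-value with the same two Table C entries used above.

```python
import math
import statistics

milk = [5370, 4890, 5100, 4500, 5260, 5150, 4900, 4760, 4700, 4870]  # bacteria/ml
n = len(milk)
xbar = statistics.mean(milk)           # sample mean
s = statistics.stdev(milk)             # sample s.d. (divides by n - 1)
sem = s / math.sqrt(n)                 # standard error of the mean

t_star = 1.833                         # Table C, d.f. = 9, 90% confidence
margin = t_star * sem
ci_t = (xbar - margin, xbar + margin)

z_star = 1.645                         # what we'd use if sigma were known
ci_z = (xbar - z_star * sem, xbar + z_star * sem)

mu0 = 4800                             # H0: mu = 4800 vs Ha: mu > 4800
t = (xbar - mu0) / sem

print(f"xbar = {xbar}, s = {s:.2f}, SEM = {sem:.2f}, t = {t:.3f}")
print(f"90% t interval: {ci_t[0]:.1f} to {ci_t[1]:.1f}")
print(f"(z interval if sigma were known: {ci_z[0]:.1f} to {ci_z[1]:.1f})")
# t lies between the .10 and .05 entries (1.383 and 1.833) of the d.f. = 9 row
print("one-sided P between .05 and .10:", 1.383 < t < 1.833)
```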

optional:  SPSS for "raw data"

Next time:
MATCHED PAIRS t procedures-- "Paired samples"(SPSS), "Paired comparisons"
   before--after, left hand--right hand, Drug A vs. Drug B on the same person or on a matched pair.
For each pair, find the difference in the observed values.  Then treat these differences as if they are "the" data set, from a normal population, and do One-sample t procedures.
Usually (always?) the null hypothesis will be " µ = 0", there is "no difference" between the treatments.
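The matched-pairs recipe in code, as a minimal sketch: subtract within each pair, then do a one-sample t on the differences. The before/after scores and the name `paired_t` are hypothetical, for illustration only.

```python
import math
import statistics

def paired_t(first, second, mu0=0.0):
    """One-sample t statistic on the pairwise differences first - second."""
    if len(first) != len(second):
        raise ValueError("matched pairs need equal-length lists")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    dbar = statistics.mean(diffs)                  # mean difference
    sem = statistics.stdev(diffs) / math.sqrt(n)   # SEM of the differences
    return (dbar - mu0) / sem, n - 1               # t statistic and d.f.

# Hypothetical before/after scores for 5 subjects (not from the notes)
before = [12, 15, 11, 14, 13]
after = [14, 15, 13, 17, 14]
t, df = paired_t(after, before)   # H0: mean difference = 0
print(round(t, 3), df)
```

Once the differences are in hand, everything else (Table C, Excel t-procedures) is the same as for any one-sample t test.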

Example:  wax paper sandwich bags:  Is the wax layer the same inside and out?
25 bags:  measure (wax outside - wax inside) for each.  (pounds per square foot).
Differences:  xbar = .093,  s = .723,  n = 25,  SEM = .723/sqrt(25) = .723/5 = .1446
H0 : µ = 0 (mean difference is 0)
Ha : µ not = 0 (there is a difference)
       t = (.093 - 0)/SEM = .093/.1446 = .643   (d.f. = 24)
       t is less than .685, which is the right-tail t* for probability .25.
       Because the test is two-sided, double the tail: .50.  The P-value is greater than .50.
                                           No evidence for a difference.
       Excel t-procedures:   for t = .643, d.f. 24, two-sided P = .526
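Starting from the summary statistics alone, a Python sketch can reproduce the Excel two-sided P. As before, the standard library has no t distribution, so the upper tail is found by Simpson's-rule integration of the t density; the helper names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0: 0.5 minus the Simpson's-rule area from 0 to x."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics of the 25 differences (wax outside - wax inside), from the notes
xbar, s, n = 0.093, 0.723, 25
sem = s / math.sqrt(n)                          # .723/5
t = (xbar - 0) / sem                            # H0: mean difference = 0
p_two_sided = 2 * t_upper_tail(abs(t), n - 1)   # Ha: mean difference != 0
print(round(sem, 4), round(t, 3), round(p_two_sided, 3))
```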
- - - - - - - - - - - - - - - - - - - - -
ROBUST procedures:  a confidence interval or significance test is called robust if the confidence level or P-value doesn't change very much when the assumptions of the procedure are violated.  pp. 447-450.   Assumption:  Population is Normal.
t-procedures are quite robust against nonnormality, but they are sensitive to outliers and strong skewness. Look at the data.  Need an SRS!!
 Details:  n < 15:  t is OK if the data are roughly symmetric and single-peaked with no outliers.  Don't use it if the data are skewed or have outliers.  (How far out is an outlier?)
              n ≥ 15:  t is OK unless there is strong skewness or there are outliers.
              n ≥ 40 or so:  t is OK even if there is skewness.  (Outliers?  I suggest trying with and without them and seeing what changes.)
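A simulation can illustrate the sample-size guidelines. This sketch uses choices of our own: a strongly right-skewed exponential population with mean 1, sample sizes 5 and 41, and the Table C 95% critical values for d.f. 4 (2.776) and d.f. 40 (2.021). It estimates how often the "95%" t interval actually captures the true mean.

```python
import math
import random
import statistics

# Table C t* values for 95% confidence (upper-tail .025), keyed by d.f.
T_STAR = {4: 2.776, 40: 2.021}

def coverage(n, reps=3000, seed=1):
    """Fraction of 95% t intervals that capture the true mean (1.0)
    of a strongly right-skewed exponential population."""
    rng = random.Random(seed)
    t_star = T_STAR[n - 1]
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]   # true mean is 1
        xbar = statistics.mean(sample)
        sem = statistics.stdev(sample) / math.sqrt(n)
        if xbar - t_star * sem <= 1.0 <= xbar + t_star * sem:
            hits += 1
    return hits / reps

print("n = 5: ", coverage(5))    # small sample, strong skew
print("n = 41:", coverage(41))   # larger sample
```

In runs like this, the n = 5 intervals fall noticeably short of 95%, while the n = 41 intervals come much closer, in line with the guideline that larger samples tolerate skewness.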

Matched-pairs data (differences) are often more normal in shape than the separate variables: "oddness" is often the same for both items in a pair and disappears in the subtraction.  This is another reason why matched pairs is a nice experimental design.


Math151-Sp08/Days40.htm  3:30pm 5/5/08
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.