Math 151, Day 39, Wednesday, November 29, 2006  After class

HW Day39:
(Re)read Ch. 16, especially Multiple tests, pp 395-6
Read Ch. 17, p. 414 and p. 417 I
Reading Ch. 18:  We'll repeat the CI and test work, only with s instead of sigma, and t instead of z. First read to p. 441; next, the rest. Read it all for Friday!  Check p. 451 18.15, 15, 17, 18, 19, 20, 21, 22 first.
Hand in Friday.
p. 397, 16.10 searching for ESP
p. 408 16.40 success of trainees
p. 408 16.41 schizophrenia markers

p. 423, 17.35 brains
p. 424, 17.37 support groups

Review meaning of P, significance:
p. 384, 15.48 Cicadas
p. 385, 15.52 P?
p. 385, 15.53 sig. def.?

Postpone the rest:  Feel free to try them, keep your paper.
p. 434, 18.1 and 2,  s<-->standard error
p. 436, 18.3 Critical values:  Use Table C and also the Excel t-procedures sheet; be sure your answers are consistent.
p. 436, 18.4 Critical values:  Use Table C.  For b, make a sketch.  Note the decimal place is different in (a) and (b).
p. 437 18.5 Critical values for CI.
p. 437 18.7 Ancient air  CI Make a dotplot or stemplot to examine the data.  It will look somewhat skewed, but with so little data this kind of scatteredness can happen easily from a normal distribution.  We should report that the skewness may make our CI only approximately accurate.  Xbar = 59.5889% and s = 6.2553% are what you would get if you calculated from the data; use these to make your CI.  Optional:  Check with Excel t-procedures
p. 453 18.29 absenteeism CI
p. 455 18.36 a. Calcium and blood pressure CI (to use Table C where the degrees of freedom aren't given, go to the row with the lower degrees of freedom, here 50.  You're giving up a little bit of sharpness rather than overstate your case. Optional: To see how much difference the "correct" t* would give, use Excel t-procedures )

p. 441 18.8 and 9  is it significant? Also, use Excel t-procedures to find the P-values more exactly.
p. 432 18.25 read carefully.  (One or more t-values were incorrectly computed.)
p. 432 18.10 Ancient air test   See note to 18.7,  for mean and s.d.  Optional: check with Excel t-procedures

 
& & & Leftover problems from Day 30 & & & & & & & &
          These ideas are related to those in Ch. 15.  You can get the answers visually by using the Statistical Significance Applet
p. 290, 11.39 Pollutants in auto exhausts  For 11.39:  You might want to know L so that if you tested your 25 cars and found a high value of x-bar, you could compare it with L. If it was greater than L, you would go back to the manufacturer and say, "I believe you sold me a batch of bad cars, because the chance of getting an average emission level this high if the exhaust system is working properly is only 1 in 100. It is more reasonable to believe the exhaust system is not working than that we 'are' that 1 in 100 possibility."
  p. 290,  11.38 Glucose testing  If we use this cutoff level L to say that people (with a mean of 4 tests) over L "have diabetes", then the chance of declaring that someone "has diabetes" when they really are OK (with mean 125 mg/dl) is .05.  That .05, or 5%, is the chance of a "false positive" using this protocol when the real mean is 125.
& & & & & & & & & & & & & & & & & &

Read, 
to discuss

Optional (more practice)
 Review meaning of P, significance:
p. 384, 15.47 Rich?
p. 385, 15.51 5%vs.1%?

Your simulation of shoebox results:  (25 each, for mean of 20, mean of 24) 
To the circulating pad: Add your total # where P > .10, (your # of simulations: should be 25), and proportion (total # with P>.10)/25.

Exams returned, to those absent Monday.  Comments
Buffer against one low hour exam:
The final % exam grade minus 10 points will be substituted for the lowest hour exam grade, if it is higher.

Examples:
                      Ex1  Ex2  Ex3  Ex4  final %  final -10
Student 1  Original    85   80   85   60     85       75     (replaces lower 60)
           Treated     85   80   85   75     85              <-- These will be used.
Student 2  Original    85   80   80   70     75       65     (lower than 70, don't replace)
           Treated     85   80   80   70     75
Student 3  Original    85   50   75   55     85       75     (replaces lower 50)
           Treated     85   75   75   55     85              <-- These will be used.
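The substitution rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated above; the function name `apply_buffer` is made up here, and ties for the lowest exam are resolved by replacing the first one.

```python
def apply_buffer(hour_exams, final_pct):
    """Replace the lowest hour exam with (final % - 10) when that is higher."""
    substitute = final_pct - 10
    treated = list(hour_exams)
    lowest = min(treated)
    if substitute > lowest:
        # Swap in the substitute at the position of the lowest hour exam.
        treated[treated.index(lowest)] = substitute
    return treated

print(apply_buffer([85, 80, 85, 60], 85))  # Student 1
print(apply_buffer([85, 80, 80, 70], 75))  # Student 2
print(apply_buffer([85, 50, 75, 55], 85))  # Student 3
```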

This is to encourage  all to try to put it together for the (cumulative!) final.

Ch. 15: "Significance tests use an elaborate vocabulary, but the basic idea is simple: an outcome that would "rarely" happen if a claim were true--is good evidence that the claim is NOT true." (p.363 top)
HW questions?  Day 38

The game:
Before taking data, define
H0: "Null hypothesis"-- looking for evidence against this.
Ha: "Alternative hypothesis"  --looking for evidence for this.  
 Take data.  Calculate test statistic. For µ, test statistic is the z-score of xbar. (Start with xbar, standardize using mean of H0)
    Is it an unlikely result if  H0 is true?  Then that is evidence against H0.

Measuring the strength of the evidence against H0 (a common measuring stick for all distributions and parameters):
P-value of a test:  The probability, computed assuming that H0 is true, that the test statistic would take a value as extreme or more extreme than that actually observed (if we could repeat the data-taking).  p. 368.
    The smaller the P-value, the stronger the data's evidence against H0 (and for Ha).
For a test of µ  , using xbar (sigma known), the P-value is
--the area of the tail beyond the observed xbar, in the direction of Ha (one-sided)
(--or twice that area (two-sided).) Applet:  P-value of a test of significance automates this.             
A "Significance level" alpha is a probability level we decide on in advance as being the "rarely" amount that will push us over into believing (well, sort of) that the H0 claim is not true.  Simple benchmark numbers are usual: .10 (1 in 10), .05 (1 in 20), .01 (1 in 100).
When the P-value is less  than (or equal to) a particular significance level alpha (say .05), we say,
    "The results are significant at the alpha = .05 level," or "The results are significant (P< .05)" .  Giving actual P is better, if you can.
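As a rough sketch of the recipe above (standardize xbar using the H0 mean, take the tail area, compare with alpha), here is a small Python version using only the standard library. The function name and the sample numbers (xbar = 22, µ0 = 20, sigma = 4, n = 4) are hypothetical, chosen only for illustration, and the one-sided case assumes Ha: µ > µ0.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(xbar, mu0, sigma, n, two_sided=False):
    """z statistic and P-value for H0: mu = mu0 with sigma known.

    One-sided version assumes Ha: mu > mu0 (upper tail).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))          # standardize xbar under H0
    if two_sided:
        p = 2 * (1 - NormalDist().cdf(abs(z)))    # both tails
    else:
        p = 1 - NormalDist().cdf(z)               # upper tail only
    return z, p

# Hypothetical numbers, not taken from the text.
z, p = z_test_p_value(xbar=22, mu0=20, sigma=4, n=4)
alpha = 0.05
print(f"z = {z:.2f}, one-sided P = {p:.4f}, significant at alpha = {alpha}? {p <= alpha}")
```

Reporting the computed P alongside the yes/no verdict follows the advice that giving the actual P is better.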

IF you use a particular alpha as a "cutoff" between "reject H0" and "fail to reject H0", we can talk about the probability of rejecting H0 when it's true--and alpha is that probability!

Look at shoeboxes, and simulations.
For the shoeboxes:
    the white numbers (where the mean is really 20) rejected H0 : µ = 20 (incorrectly)
        at the alpha = .10 level in 3 of 18 samples (16.7%);
    the yellow numbers (where the mean is really bigger than 20; 24, I think) rejected H0 : µ = 20 (correctly!)
        at the alpha = .10 level in 16 of 19 samples (84.2%).
Simulation so far:
    with mean = 20, reject H0 : µ = 20 (incorrectly) at alpha = .10 in 16/125 or .128 of samples (close to .10);
    with mean = 24, reject H0 : µ = 20 (correctly!) at alpha = .10 in 91/125 or .728 of samples.
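A computer version of the same experiment might look like the sketch below. It is not the class simulation: the shoebox sigma and sample size are not given in these notes, so sigma = 4 and n = 4 are assumptions chosen only for illustration, as is the one-sided alternative Ha: µ > 20. The point is the pattern: when H0 is true the rejection rate hovers near alpha, and when the real mean is 24 it is much higher.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejection_rate(true_mean, mu0=20, sigma=4, n=4, alpha=0.10, reps=10000, seed=1):
    """Fraction of simulated samples that reject H0: mu = mu0 (Ha: mu > mu0).

    sigma and n are assumed values, not the ones used in class.
    """
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - alpha)   # one-sided cutoff, about 1.28 for alpha = .10
    se = sigma / sqrt(n)                       # standard deviation of xbar
    rejections = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if (xbar - mu0) / se > z_star:
            rejections += 1
    return rejections / reps

rate_h0_true = rejection_rate(true_mean=20)    # should be near alpha = .10
rate_h0_false = rejection_rate(true_mean=24)   # should be much larger
print(f"mean 20: reject in {rate_h0_true:.3f} of samples")
print(f"mean 24: reject in {rate_h0_false:.3f} of samples")
```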


>>Multiple Tests: beware! pp. 395-6
    If you do 100 tests and use the alpha = .05 significance level for each, then the structure of testing guarantees this:
    When all 100 null hypotheses H0 are true, about 5 of the 100 (.05) will give "significant" results by chance alone (falsely indicating the alternative hypothesis is to be preferred).
    Moral: if you use the testing machinery as a screening instrument for many questions, a proportion will give falsely significant results.  You can't accept the results from such multiple tests as good evidence, only as indicating questions requiring further, more specific study. The game gives you one shot, not a hundred shots.    (This is becoming an important issue for developing new statistical techniques, for instance in biology, where microarrays can do a thousand tests at once.)
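The arithmetic behind the 100-test warning is short enough to check directly. The "at least one" line below additionally assumes the 100 tests are independent, which the notes do not state; treat it as a rough illustration.

```python
alpha = 0.05
n_tests = 100

# Expected number of falsely "significant" results when every H0 is true.
expected_false_positives = n_tests * alpha

# Chance that at least one test comes out "significant" by chance alone,
# ASSUMING the tests are independent (an added assumption).
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"expected false positives: {expected_false_positives:.0f}")
print(f"P(at least one false positive): {p_at_least_one:.4f}")
```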
(Not in text) You cannot legitimately test a hypothesis on the same data that first suggested that hypothesis. Every data set will turn up with some unusual pattern if you examine it hard enough.
       (If you must explore and confirm with the same data set, one way is to (randomly) take half the data set, explore and generate hypotheses; then use the other half for confirmatory tests.  You can use P-value to describe unusualness, but be wary of making decisions with it if you didn't expect that particular unusualness.)

>>All the warnings about designing experiments and surveys still apply.

& & & & & & & & &
Today:  Look back at 11.38, p. 297.   "backward normal" problem.  From a proportion/probability, find a z*, from that a raw value (here an x-bar).  We can think of this as a significance testing question.  n = 4, sigma = 10 mg/dl.
     H0: µ =125mg/dl (Sheila is normal),   Ha: µ  > 125 (Sheila has gestational diabetes.) 
    Find the L so that only .05 of random samples of 4 tests would have mean above L, among people (Sheila) whose real mean is 125.
      L is the "cutoff" for doing an alpha = .05 test. 
5% of "healthy" people will be diagnosed diabetic (false positive).
           Doctors like a "decision making rule", and want an alpha cutoff to apply, rather than calculating a P-value for each individual's set of 4 tests.
Note that table C gives us another way to get z*'s for some probabilities!  Bottom row, "one sided P".  The table is set up to go from "tail" probability to z*, without having to calculate "probability to the left."  z* = 1.645. 
Unstandardize z*:  Remember that the standard deviation for xbars from samples of 4 will be sigma/sqrt(4) = 10/2 = 5.
    1.645 s.d.'s above the mean  is (Mean + 1.645× sd) = 125 + 1.645×5 = 125 + 8.225 = 133.225. 
   So L = 133.225, and if doctors use that as a cutoff: "Gestational diabetes if mean of 4 tests > 133.225"  they will call only 5% of healthy people sick. 
   (We haven't calculated what percent of sick people won't be "caught" by this test--we haven't defined "sick" with a number.)
You can check this visually, approximately, using   Statistical Significance Applet   L marks the "cutoff."
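The same cutoff can be checked numerically. The sketch below uses Python's standard-library `NormalDist` in place of Table C, so z* comes out as 1.6449 rather than the rounded 1.645 and L differs from 133.225 only in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 125, 10, 4, 0.05

z_star = NormalDist().inv_cdf(1 - alpha)   # tail probability .05 -> z*, about 1.645
se = sigma / sqrt(n)                       # s.d. of xbar for samples of 4: 10/2 = 5
L = mu0 + z_star * se                      # unstandardize z*

# Check: chance that a healthy person's mean of 4 tests exceeds L.
false_positive_rate = 1 - NormalDist(mu0, se).cdf(L)

print(f"z* = {z_star:.3f}, L = {L:.3f}, false positive rate = {false_positive_rate:.3f}")
```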
& & & & & & & & &

Ch. 18:  Inference for population mean (realistic)
The most unrealistic of our "simple conditions" for inference (p. 344) was that we knew the population standard deviation sigma.  We remove that condition here.
If we substitute s, the sample standard deviation, for sigma, the population standard deviation, in our Normal distribution formulas:
    If n is quite big, the value of the sample standard deviation  will be close to the same as the value from the population, and our work's approximately right.
    But if n is smaller, estimating sigma by s will add in extra variability!   Problem solved by replacing the z-distribution with a modified version of it: the t-distribution.

Standard error of the (sample) mean =    Standard deviation of xbar, estimated from the data.
  "Standard error of the mean":  s/sqrt(n) SEM, SEXbar, etc.
        Just like sigma/sqrt(n), only s from data replaces sigma.
  When you estimate the standard deviation of a statistic,
                the resulting estimate is called the "standard error" of the statistic.

t-distribution family:  like standard normal only slightly fatter in the tails, slightly more spread.  Mean = 0. Symmetrical around 0.
    "Degrees of freedom" tell which member of the t family.
      t(k) is the t distribution with k degrees of freedom.
 Comparison with normal (Excel file)
    Lower d.f.--fatter tails.  Higher d.f.--more like standard normal.
    Table C: "critical" t-value in the body, probabilities at top and bottom.  Set up for P-->t.
       Example.  For t(20), t* = 2.086  corresponds to
                   Confidence level 95% = "middle" probability between -2.086 and +2.086
        one-sided P = .025,  probability in the one tail above +2.086 = probability in the one tail below -2.086
        Two-sided P = .05,  probability in the two tails beyond -2.086 and +2.086.
              (For z distribution, the corresponding z* is 1.96; notice t is further out.)  Excel t-procedures sheet will find P's from t's.

Standardizing xbar with s instead of sigma results in
   the one-sample t statistic,  t = (xbar - µ) / (s/sqrt(n)),
which has the t-distribution with n-1 degrees of freedom.
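One way to see the fatter tails is to simulate. The sketch below draws many samples of size 21 (so 20 degrees of freedom) from a Normal population, standardizes each xbar with its own s, and counts how often the result lands beyond ±2.086 and beyond ±1.96. The population (mean 0, s.d. 1), the number of repetitions, and the seed are arbitrary choices made for illustration. If the t(20) description is right, about .05 should fall beyond ±2.086, and noticeably more than .05 beyond the z cutoff ±1.96.

```python
import random
from math import sqrt

def tail_fractions(n=21, reps=20000, seed=2):
    """Simulate one-sample t statistics from Normal(0, 1) samples of size n.

    Returns the fractions of |t| beyond 2.086 (t* for 20 d.f.) and beyond 1.96 (z*).
    """
    rng = random.Random(seed)
    beyond_t_star = beyond_z_star = 0
    for _ in range(reps):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        xbar = sum(sample) / n
        s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # sample s.d.
        t = (xbar - 0) / (s / sqrt(n))                            # true mean is 0
        if abs(t) > 2.086:
            beyond_t_star += 1
        if abs(t) > 1.96:
            beyond_z_star += 1
    return beyond_t_star / reps, beyond_z_star / reps

frac_t, frac_z = tail_fractions()
print(f"beyond ±2.086: {frac_t:.3f}   beyond ±1.96: {frac_z:.3f}")
```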

We'll now repeat all the stuff from Chapters 14 & 15,  only wherever there was a z, we'll substitute a t.

Here we go....
Conditions for inference about a mean (p. 434):
    ++ SRS (or reasonable facsimile)
    ++ Population is Normal. (Can relax to symmetric, single-peaked unless n "very small")

"One-sample" t- procedures: SRS of size n.  Use Xbar to estimate µ.
Substitute s for sigma in the standardizing formula. We get t instead of z, with n-1 degrees of freedom.
       Check for at least approximate normality in the data set.

Confidence intervals: 
   Choose t* from table C, using the n-1 row, and confidence level C.
    Special case of common pattern:    estimate ± t* × SE(estimate)

Significance tests:  State hypotheses as in Ch. 15, then find t from the data by
 calculating the one-sample t-statistic, using the null hypothesis value of µ (call it µ0).
Then proceed as if it were a "z", only using the (n-1) d.f. row in table C:
find the two t*'s that your t falls between, read off their P-values, and write "P-value is between ___ and ___".
(Or use software, which will find the P-value exactly.)

Example: bacteria per milliliter in 10 specimens of  raw milk from one producer.
  Parameter: actual mean bacteria/ml.
       5370, 4890, 5100, 4500, 5260, 5150, 4900, 4760, 4700, 4870
4|5 
4|77
4|889 
5|11 
5|23 
 n = 10,   xbar = 4950,
s = 268.45   SEM = 268.45/sqrt(10) =268.45/3.162=84.89.  deg. of freedom = 9
90% CI:  from t(9) in table, t* = 1.833   CI is 4950 ± 1.833 × 268.45/sqrt(10)
                                                       4950 ± 1.833 × 84.89, or  4950 ± 155.6 bacteria/ml.
If we had KNOWN Population sigma = 268.45, 
  we'd have used z* = 1.645, gotten a narrower CI.   (but we don't know sigma!)
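The milk-example numbers can be reproduced from the raw data with a short Python sketch. The critical values are the ones quoted above (t* = 1.833 from Table C, z* = 1.645), typed in by hand, since the standard library has no t table; the variable names are just for this illustration.

```python
from math import sqrt
from statistics import mean, stdev

counts = [5370, 4890, 5100, 4500, 5260, 5150, 4900, 4760, 4700, 4870]

n = len(counts)
xbar = mean(counts)
s = stdev(counts)        # sample standard deviation (divides by n - 1)
sem = s / sqrt(n)        # standard error of the mean

t_star = 1.833           # Table C, 9 d.f., 90% confidence (value quoted above)
z_star = 1.645           # what we would use if sigma were known

margin_t = t_star * sem
margin_z = z_star * sem

print(f"xbar = {xbar}, s = {s:.2f}, SEM = {sem:.2f}")
print(f"90% t interval: {xbar - margin_t:.1f} to {xbar + margin_t:.1f}")
print(f"margin with t*: {margin_t:.1f}   margin with z*: {margin_z:.1f}")
```

The z margin is smaller, which is the "narrower CI" we would get only if sigma were really known.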

Test:  H0 : µ = 4800                          t = (4950 - 4800)/SEM = 150/84.89 = 1.767
          Ha : µ > 4800                           t is between 1.383 and 1.833   (d.f. = 9)
             (too contaminated)                Table C: One-sided P is between .10 and .05.  Some evidence for Ha
(If the test had been 2-sided, P would be between .20 and .10)
Excel t-procedures:  P-value = .05552
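For a software check of this test without Excel, here is a minimal Python sketch. Python's standard library has no t distribution, so the tail area is found by numerically integrating the t density with Simpson's rule; that is a stand-in chosen for this illustration, not the method Excel uses, and the helper names `t_pdf` and `t_upper_tail` are made up here. A statistics package would normally do this step.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: one half minus the Simpson's-rule area from 0 to t."""
    h = t / steps                                  # steps must be even
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3

# Summary values from the milk example above.
xbar, mu0, s, n = 4950, 4800, 268.45, 10

t = (xbar - mu0) / (s / math.sqrt(n))              # one-sample t statistic
p_one_sided = t_upper_tail(t, n - 1)               # Ha: mu > 4800
p_two_sided = 2 * p_one_sided

print(f"t = {t:.3f}, one-sided P = {p_one_sided:.4f}, two-sided P = {p_two_sided:.4f}")
```

The one-sided P should land between the Table C brackets .05 and .10, and the two-sided P between .10 and .20.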

Next:  Matched pairs, and SPSS for "raw data"--Get Handout for Ch. 18 Next time.


Sievers home  Math151-Fall06/Daym39.htm  2pm 11/29/06
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.