Math 151, Day 39, Friday, May 4, 2007.  After class: link added 11am 5/4.  Hit reload.

HW Day38   Review  Ch. 15, to p. 376. Then 377-79,Table C. Optional: Two-sided Tests from Confidence intervals pp. 379-80
Reread first part and read rest of Ch. 16, to p. 396. Optional: Lightly, for the words and concepts of power, effect size, type I and II errors  pp. 396 to 402.  Check p. 405, 16.20, 16.24, 16.25 (c you should be able to take for granted.) 16.26 (they say b--but you can't really do probabilities on the nonresponse and other errors, so I don't think this is a well posed answer.  A better answer would talk about how much you can trust the interval) 16.27  Ch 17 outlines this section of the book.
Start reading Ch. 18:  We'll repeat the CI and test work, only with s instead of sigma, and t instead of z. 
Hand in  Monday . 
A.  For 15.18, 19, 37, 38, 42, 43, 44: You wrote down your H's, your xbar, and your P on a separate sheet.  You made a rough sketch of the normal dist. when H0 is true and the direction(s) of evidence for Ha, and marked your z on it.  Use Table C with these z's to find "bracketing" numbers for P:  ___ < P < ___.  Check that the P you calculated last time is actually between these bracketing numbers.
Two more Table C's
p.379 15.21&22  significance, Table C, 1 and 2 sided
p. 379 15.23 significance, Table C, 2 sided

Cautions about significance tests (and CI's)
p. 393, 16.5 Is it significant?
p. 394-5 16.6&7  Acid rain.  Do them by hand, and on the Applet.  You should, of course, get the same answers both ways.
p. 395, 16.8 Acid rain, Confidence intervals.  (there are only 3, not 6, since #6 and #7 are the "same" problem)
p. 396 16.9 rich parents and education

p. 407. 16.32 evidence, pacemakers
p. 407, 16.34 a, b. larger samples
p. 407, 16.35 significance is good for...
p. 408 16.36 sensitive questions (CI)
p. 408 16.37 college degrees (CI)
p. 408 16.39 supermarket shoppers (The data are in order, so a stemplot is easy)

p. 409 16.43 comparing package designs (What did they not tell us that we would want to know?)
p. 409, 16.44 island life (correlation coefficient)
p. 409, 16.45 helping welfare mothers  (The Clinton "welfare reform" depended, probably too much, on studies of this sort)
The last 3 may be a little harder, but try them:
p. 397, 16.10 searching for ESP
p. 408 16.40 success of trainees
p. 408 16.41 schizophrenia markers

Read, 
to discuss
p. 391, 16.3 environment

p. 407,  16.31 sampling at the mall

Optional 

(more practice)
p. 389, 16.1 TV poll

p. 391, 16.2 red lights
Your simulation of shoebox results:  (25 each, for mean of 20, mean of 24) 
To the circulating pad: Add your total # where P < .10, (your # of simulations: should be 25), and proportion (total # with P<.10)/25.   
I did it: My page of results

Exams returned.  Comments   Solutions         
Buffer against one low hour exam:
The final % exam grade minus 10 points will be substituted for the lowest hour exam grade, if it is higher.

Examples:
                      Ex1  Ex2  Ex3  Ex4  final %  final - 10
Student 1  Original    85   80   85   60     85    75, replaces lower 60
           Treated     85   80   85   75     85    <-- These will be used.
Student 2  Original    85   80   80   70     75    65, lower than 70, don't replace.
           Treated     85   80   80   70     75
Student 3  Original    85   50   75   55     85    75, replaces lower 50
           Treated     85   75   75   55     85    <-- These will be used.
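
The buffer rule can also be written out as a few lines of Python. This is only a sketch of the rule as stated above; the function name apply_buffer is made up for illustration, and it assumes that when two hour exams tie for lowest, the first one is the one replaced.

```python
def apply_buffer(hour_exams, final_pct):
    """Return the hour-exam grades after the buffer rule is applied.

    The final % minus 10 replaces the lowest hour exam, but only if
    that substitute is higher than the lowest hour exam.
    """
    substitute = final_pct - 10
    treated = list(hour_exams)                    # leave the original list alone
    lowest_index = treated.index(min(treated))    # first lowest exam (assumption on ties)
    if substitute > treated[lowest_index]:
        treated[lowest_index] = substitute
    return treated


# The three students from the examples above.
print(apply_buffer([85, 80, 85, 60], 85))  # Student 1
print(apply_buffer([85, 80, 80, 70], 75))  # Student 2
print(apply_buffer([85, 50, 75, 55], 85))  # Student 3
```

The three printed lists should match the "Treated" rows for Students 1, 2, and 3.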

This is to encourage  all to try to put it together for the (cumulative!) final.

Effect of sample size on distribution of x-bars:  NormalandXbar.xls

Ch. 15: "Significance tests use an elaborate vocabulary, but the basic idea is simple: an outcome that would "rarely" happen if a claim were true--is good evidence that the claim is NOT true." (p.363 top)
HW questions?  Day 38
Show real shoebox results.

Day37 for other details.  Summary, comments:

The game:
Before taking data, define
H0: "Null hypothesis" A claim or statement about the population we would like to show is NOT true.
   Stated usually as:  A parameter = a particular value.  H0: µ =1000 hrs.  ("Average lightbulb life".)
Ha: "Alternative hypothesis" A claim or statement about the population we are trying to find evidence FOR.
      Stated usually as: The parameter  is >, or <, (one-tail tests) --
                       or NOT = the particular value. (two-tail)
   Some authorities say you should always do two-sided tests.  Others say:  If you have a hope or suspicion, or are only interested in one direction, then do it that way.  What's NOT OK is to look at your data and then decide your alternative hypothesis.

Take data.  Calculate test statistic. For µ, test statistic is the z-score of xbar. (Start with xbar, standardize using mean of H0)
    Is it an unlikely result if  H0 is true?  Then that is evidence against H0.
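
As a small sketch of the "standardize using mean of H0" step: the test statistic is z = (xbar - mu0) / (sigma / sqrt(n)). The numbers below (xbar = 980, sigma = 100, n = 25) are invented for illustration; only mu0 = 1000 hrs comes from the lightbulb example above.

```python
import math


def z_statistic(xbar, mu0, sigma, n):
    """z-score of the sample mean, computed as if H0: mu = mu0 is true."""
    standard_error = sigma / math.sqrt(n)   # sd of xbar for samples of size n
    return (xbar - mu0) / standard_error


# Hypothetical data: 25 bulbs averaging 980 hrs, with sigma known to be 100 hrs.
z = z_statistic(xbar=980, mu0=1000, sigma=100, n=25)
print(z)  # -1.0
```

Here xbar sits one standard error below the H0 mean, which is not an unlikely result if H0 is true.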

Measuring the strength of the evidence against H0 (a common measuring stick for all distributions and parameters):
P-value of a test:  The probability, computed assuming that H0 is true, that the observed outcome would take a value as extreme or more extreme than that actually observed (if we could repeat taking-data again).  p. 368.
    The smaller the P-value, the stronger the data's evidence against H0 ( for Ha).

For a test of µ  , using xbar (sigma known), the P-value is
--the area of the tail beyond the observed xbar, in the direction of Ha (one-sided)
(--or twice that area (two-sided).)
Applet:  P-value of a test of significance automates this.  (Uses the "raw" scale of xbars, rather than z-scores.)

So for a test of a mean, the P-value for one-sided is half that for two sided, IF the result is in the direction of evidence for the alternative.
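
The tail areas just described can be computed without a table. This is a sketch using only Python's math module; the function names and the labels 'greater', 'less', and 'two-sided' are my own choices, not the book's or the applet's.

```python
import math


def upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def p_value(z, alternative):
    """P-value for a z test; alternative is 'greater', 'less', or 'two-sided'."""
    if alternative == "greater":      # Ha: mu > mu0, tail above z
        return upper_tail(z)
    if alternative == "less":         # Ha: mu < mu0, tail below z
        return upper_tail(-z)
    if alternative == "two-sided":    # Ha: mu not = mu0, both tails
        return 2 * upper_tail(abs(z))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


z = 2.111
print(p_value(z, "greater"))     # one-sided, in the direction of Ha
print(p_value(z, "two-sided"))   # exactly twice the one-sided value
print(p_value(z, "less"))        # wrong direction: large P, no evidence for Ha
```

For z = 2.111 the one-sided value comes out between .01 and .02, and the two-sided value is double that. The last line shows why the "half" rule only holds when the result is in the direction of Ha.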
            
A "Significance level" alpha is a probability level we decide on  in advance as being the "rarely" amount that will push us over into believing (well, sort of) that the H0 claim  is not true. (Historically older language than P-value.  Appropriate levels vary by discipline.)
We tend to use simple benchmark numbers for it, like .10 (1 in 10), .05 (1 in 20), .01 (1 in 100).
When the P-value is less  than (or equal to) a particular significance level alpha (say .05), we say,
    "The results are significant at the alpha = .05 level," or "The results are significant (P< .05)" .  Giving actual P is better, if you can.
Applet:  Statistical Significance
You can pick the alpha you desire, and see if your x-bar lies outside the "alpha" barrier(s). (approach of p. 376-79) But P-value is more informative.  (Some people  use "P-value" and "Significance level" to mean the same thing--P-value.)
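
The "alpha barrier" idea can be sketched on the raw xbar scale. This is not the applet's code, just an illustration; the function name is made up, and the example numbers (sigma = 100, n = 25) are invented to go with the mu0 = 1000 hrs lightbulb example.

```python
import math
from statistics import NormalDist


def xbar_barriers(mu0, sigma, n, alpha, alternative):
    """Return (lower, upper) barriers for xbar; None means no barrier on that side.

    An xbar beyond a barrier is significant at level alpha.
    """
    standard_error = sigma / math.sqrt(n)
    std_normal = NormalDist()                      # mean 0, sd 1
    if alternative == "greater":                   # all of alpha in the upper tail
        return None, mu0 + std_normal.inv_cdf(1 - alpha) * standard_error
    if alternative == "less":                      # all of alpha in the lower tail
        return mu0 - std_normal.inv_cdf(1 - alpha) * standard_error, None
    z_star = std_normal.inv_cdf(1 - alpha / 2)     # two-sided: alpha/2 in each tail
    return mu0 - z_star * standard_error, mu0 + z_star * standard_error


print(xbar_barriers(1000, 100, 25, 0.05, "two-sided"))
print(xbar_barriers(1000, 100, 25, 0.05, "greater"))
```

With these made-up numbers the two-sided barriers are about 39 hrs either side of 1000 (that is 1.960 standard errors), while the one-sided barrier is closer in, at about 1.645 standard errors.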

What if you don't have the Z-table but only have the t-table (Table C)?   See Day 38 for details
What if you have a demanded level of significance, alpha?
    Table C: a limited list of probabilities  across the bottom rows:
            = Tail values for the bell curve distribution.   (one sided = one tail, two sided = two symmetrical tails)
        The value in the z* row above  P is the corresponding standard normal value ("critical value"). 
                 Check z* = 1.960: one-sided P = .025 is the area above it (or below -1.960); two-sided P = .05 is the total area farther out than ±1.960.  Corresponds to Table A.
      
  Do this: Find your z from the data. Make a sketch of the normal curve and mark your z on it.  Mark the direction(s) of Ha.
    (If your z is in the direction(s) of Ha, continue.  Otherwise the results are hopelessly not significant: you can quit.)
Find the two z*'s in Table C that bracket your z (ignore minus sign).  Find the corresponding P's.
    e.g. z = 2.111
                              \/
      z*            2.054          2.326
One-sided P  ...     .02            .01
Two-sided P  ...     .04            .02


      So the P-value for your z is: between .02 and .01 (If it's a one sided test)
         &  between double those 2 p's--between .04 and .02 (If it's a two sided test)

    Test is significant at the bigger bracketing probability; not sig. at the smaller.
If you have a specific demanded significance level, compare it with these levels.
            If  a test is significant at level b, then it is significant at every level bigger than b.
            If a test is Not significant at level d, then it is Not significant at every level smaller than d.
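
The bracketing procedure can be sketched as a lookup. The z* values and one-sided tail areas below are typed in from the z* row as it is commonly printed; check them against your own Table C. The code assumes you have already confirmed that z is in the direction of Ha.

```python
# (z*, one-sided tail area) pairs from the z* row of Table C (assumed values).
TABLE_C = [
    (0.674, 0.25), (0.841, 0.20), (1.036, 0.15), (1.282, 0.10),
    (1.645, 0.05), (1.960, 0.025), (2.054, 0.02), (2.326, 0.01),
    (2.576, 0.005), (2.807, 0.0025), (3.091, 0.001), (3.291, 0.0005),
]


def bracket_p(z, two_sided=False):
    """Return (low, high) so that low < P < high.

    None for low means z is beyond the last z* in the table;
    None for high means z is smaller than the first z*.
    """
    z = abs(z)                        # ignore the minus sign
    factor = 2 if two_sided else 1    # two-sided P doubles the tail area
    low = None
    high = None
    for z_star, tail in TABLE_C:
        if z_star <= z:
            high = tail * factor      # last z* at or below z gives the bigger P
        elif low is None:
            low = tail * factor       # first z* above z gives the smaller P
    return low, high


print(bracket_p(2.111))                  # (0.01, 0.02)
print(bracket_p(2.111, two_sided=True))  # (0.02, 0.04)
```

So z = 2.111 is significant at the .02 level but not at the .01 level (one-sided), matching the example above.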
    "Significant at a":  probability of getting my results (again) by chance (if H0 is true) is less than (or =) a. My result is less common than a.

- - - - - - - More NEW STUFF- - - -  - -
back to Ch. 16, cautions:  (SRS, normal pop., sigma known)
>>How small a P is convincing evidence against H0?  (What alpha, to "Reject H0"?)
  
--Is Ha surprising?  (Entrenched opinion is "for" H0 .  )  Need strong evidence (small P).
   --Is rejecting  H0 expensive?  Need strong evidence for Ha
        [May need to repeat experiment for doubters]
No sharp border between "significant" and "not"--though decisions may need to be made.

>>Statistical significance is not the same as practical significance ("clinical significance") 
      Tiny difference can be statistically significant if sample size is large.
       Big difference may not be statistically significant if sample size is too small.
Do confidence intervals:  they estimate the size of the effect, not just the yes/no of a test.
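
A quick numerical sketch of both points, with invented numbers (sigma = 10 throughout; the differences and sample sizes are made up for illustration). It uses a two-sided z test and a 95% confidence interval with z* = 1.960.

```python
import math


def z_and_p(difference, sigma, n):
    """z and two-sided P-value for an observed difference xbar - mu0."""
    z = difference / (sigma / math.sqrt(n))
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))   # area in both tails
    return z, p_two_sided


def confidence_interval(difference, sigma, n, z_star=1.960):
    """Interval estimate for the true difference mu - mu0."""
    margin = z_star * sigma / math.sqrt(n)
    return difference - margin, difference + margin


# Tiny difference, huge sample: statistically significant.
print(z_and_p(0.5, 10, 10000), confidence_interval(0.5, 10, 10000))

# Big difference, tiny sample: not statistically significant.
print(z_and_p(5, 10, 4), confidence_interval(5, 10, 4))
```

The first interval is narrow and sits near a difference of half a unit, so the effect is real but small. The second interval is wide and includes 0, so the data cannot say whether the effect is large or absent.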


>>Multiple Tests: beware! pp. 395-6
    If you do 100 tests and use the alpha = .05 significance level for each, then the structure of testing requires this:
    When all 100 null hypotheses H0 are true, about 5 of the 100 (.05) will give "significant" results by chance alone (falsely indicating that the alternative hypothesis is to be preferred).  (About 10%, that is 2 or 3, of your 25 simulations of the shoebox with mean 20 will give "significant" (P < .10) results even though the mean is the null value of 20.)  (My results)
    Moral: if you use the testing mechanism as a screening instrument for many questions, a proportion will give falsely significant results.  You can't accept the results from such multiple tests as good evidence, only as indicating questions requiring further, more specific study. The game gives you one shot, not a hundred shots.    (This is becoming an important issue for developing new statistical techniques, for instance in biology, where microarrays can do a thousand tests at once.)
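
The "proportion falsely significant" claim can be checked by simulation. This sketch is in the spirit of the shoebox exercise but is not the class procedure: sigma = 5, n = 10, the one-sided alternative (mean > 20), and the seed are all assumptions made up for illustration.

```python
import math
import random


def false_alarm_rate(num_tests, alpha, mu0=20.0, sigma=5.0, n=10, seed=1):
    """Proportion of tests with P < alpha when H0 (mean = mu0) is really true."""
    rng = random.Random(seed)
    standard_error = sigma / math.sqrt(n)
    hits = 0
    for _ in range(num_tests):
        xbar = rng.gauss(mu0, standard_error)    # H0 is true: xbar centered at mu0
        z = (xbar - mu0) / standard_error
        p = 0.5 * math.erfc(z / math.sqrt(2))    # one-sided P, Ha: mean > mu0
        if p < alpha:
            hits += 1
    return hits / num_tests


print(false_alarm_rate(25, 0.10))      # one student's 25 simulations: varies a lot
print(false_alarm_rate(20000, 0.10))   # many simulations: settles near .10
```

With only 25 tests the proportion bounces around from run to run (change the seed to see this), but over many tests it settles close to alpha, even though every null hypothesis is true.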
(not in text) You cannot legitimately test a hypothesis on the same data that first suggested that hypothesis. Every data set will turn up with some unusual pattern if you examine it hard enough. 
       (If you must explore and confirm with the same data set, one way is to (randomly) take half the data set, explore and generate hypotheses; then use the other half for confirmatory tests.  You can use P-value to describe unusualness, but be wary of making decisions with it if you didn't expect that particular unusualness.)
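
The split-half idea can be sketched as follows. The function name is made up, and this is just one simple way to do a random split: shuffle a copy of the data, then cut it in the middle.

```python
import random


def split_for_explore_and_confirm(data, seed=None):
    """Randomly split data into an exploring half and a confirming half."""
    rng = random.Random(seed)
    shuffled = list(data)          # copy, so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


explore, confirm = split_for_explore_and_confirm(range(10), seed=0)
print(explore)   # look for patterns and form hypotheses here
print(confirm)   # test those hypotheses here, and only here
```

Every observation lands in exactly one half, so a hypothesis suggested by the exploring half is tested on data that did not suggest it.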

>>All the warnings about designing experiments and surveys still apply!

& & & & & & & & &
Today Look back
at 11.38, p. 297? Day 38.  


Sievers home  Math151-Sp07/Daysp39.htm  4:45pm 5/4/07
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.