Math 151, Fall 2004, Monday, Day 37, Nov. 22

HW assignment Day 37
(re)Read rest of 6.3.  Read  7.1, at least to p. 374
Hand in:
Table C:   * can be done without Table C; but use Table C now.
p.341, 6.48 CEO pay again (what you would do if you didn't have Table A)
p. 341, *6.46, 6.49 general z statistic, significance. Turn the page: 6.49 continues.
p. 342 *6.50 patent protection; another z.
= = = = = = = = = = 
Fixed significance levels: if you only have table C, what can you say? 
p. 337, 6.37 testing number generator
6.38 nicotine content
= = = = = = = = = = 
Sec. 6.3, (pp. 346-48 is new), and  notes below 
 p. 347 6.58 500 tests for psychic powers

p. 348 6.62, 77 potential schizophrenia markers
* * * * * * * * * * * * * * * * * * * * * * * * *

Read to see the issues: Chapter 7, Sec. 7.1 
Next time:  p. 364 7.1, 7.2, 7.3 "Standard error" & t-distribution family
+++++++++++++++++

Will be assigned next:
p. 373, 7.4  CI  t*

 p. 386, 7.19 Shrimp ATP CI  A common calculational mistake is to divide the SE by 
 square-root-of-n.  But square-root-of-n is already IN SE!  Don't divide by it again!  (I.e. pay 
 attention to the difference between "standard deviation" and "standard error.") 
7.5, 7.6 test, one- & two-sided
7.7 DDT  Find the mean and standard deviation by hand (only 4 points!) and do the rest by
 hand.  Make a note of your results; we will do this in SPSS too, to check the results. 


Optional 

 

"Significance ", using table C, see Day 34
Statistically Significant result vs. convincing evidence,  important difference from H0, see Day 33
"Significance testing" vs. "Hypothesis testing"--gathering evidence vs. making decisions. 
= = = = = = = = = = = = = = = = = = = = = =
 More cautions and limitations--Sec 6.3, cont'd (pp. 346-8)
>>(not in text)You cannot legitimately test a hypothesis on the same data that first suggested that hypothesis. Every data set will turn up with some unusual pattern if you examine it hard enough. 
       (If you must explore and confirm with the same data set, one way is to (randomly) take half the data set, explore and generate hypotheses; then use the other half for confirmatory tests.  You can use P-value to describe unusualness, but be wary of making decisions with it if you didn't expect that particular unusualness.)
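The split-half idea above can be sketched in a few lines of Python. This is only an illustration: the function name `split_half` and the stand-in data are made up here, not taken from the text.

```python
import random

def split_half(data, seed=None):
    """Randomly split data into an exploration half and a confirmation half."""
    rng = random.Random(seed)
    shuffled = list(data)      # copy, so the original order is untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

observations = list(range(1, 21))   # made-up stand-in for a real data set
explore, confirm = split_half(observations, seed=1)

print("explore (hunt for patterns here):", sorted(explore))
print("confirm (test hypotheses here):  ", sorted(confirm))
```

The point of the random shuffle is that whatever oddity you spot in the first half had no chance to influence which observations ended up in the second half.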

All the warnings about designing experiments and surveys still apply.

>>Multiple Tests: beware! pp. 346-7
    If you do 100 tests and use the alpha = .05 significance level for each, then the structure of testing requires this:
    When all 100 null hypotheses H0 are true, about 5 of the 100 tests (.05 x 100) will still give "significant" results by chance alone (falsely indicating that the alternative hypothesis is to be preferred).
    Moral: if you use the testing mechanism as a screening instrument for many questions, a proportion will give falsely significant results.  You can't accept the results from such multiple tests as good evidence, only as indicating questions requiring further, more specific study. The game gives you one shot, not a hundred shots.
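A quick way to see the multiple-tests problem is to simulate it. The sketch below assumes the 100 tests are independent and that each P-value is uniform on (0, 1) when H0 is true; both are assumptions made for the illustration, and the seed and number of repetitions are arbitrary.

```python
import random

ALPHA = 0.05
N_TESTS = 100

# Expected number of falsely "significant" results when every H0 is true.
expected_false_positives = N_TESTS * ALPHA

# Chance that at least one of the independent tests comes out "significant".
p_at_least_one = 1 - (1 - ALPHA) ** N_TESTS

def count_false_positives(n_tests, alpha, rng):
    # Under a true H0 the P-value is (assumed) uniform on (0, 1),
    # so it falls below alpha with probability alpha.
    return sum(rng.random() < alpha for _ in range(n_tests))

rng = random.Random(2004)
runs = [count_false_positives(N_TESTS, ALPHA, rng) for _ in range(2000)]
average = sum(runs) / len(runs)

print("expected false positives per 100 tests:", expected_false_positives)
print("P(at least one false positive):", round(p_at_least_one, 4))
print("simulated average over 2000 batches of 100 tests:", round(average, 2))
```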
= = = = = = = = = = = = = = = = = = = = =

Chapter 7, Inference for Distributions (we'll do 7.1, 7.2, maybe the first segment, to p. 414, of 7.3)

Inference for means, using xbar from a SRS to make inference about µ:

                         |               Large n                   |                 Small n
                         | Sigma known        | Sigma unknown      | Sigma known          | Sigma unknown
Population is normal     | Xbar is normal;    | Xbar is normal;    | Xbar is normal;      | Xbar is normal;
                         | find z using sigma | find z using s     | find z using sigma   | find t using s
Population is not normal | Xbar is normal-ish | Xbar is normal-ish | Unrealistic. sigma's | (See p. 381)
                         | (CLTh);            | (CLTh);            | only "good" for      | If you can't use t,
                         | find z using sigma | find z using s     | normal pop's.        | find a statistician
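The table above amounts to a small decision rule, which can be written out as a function. This is a sketch only: `choose_procedure` is a made-up name, and what counts as "large n" is left to the caller because the notes do not give a cutoff.

```python
def choose_procedure(population_normal, large_n, sigma_known):
    """Return the procedure the table suggests for one combination of conditions."""
    if sigma_known:
        if population_normal or large_n:
            return "z using sigma"
        # Small n, non-normal population, sigma "known": the table calls this unrealistic.
        return "unrealistic case"
    # From here on sigma is unknown, so s from the data stands in for it.
    if large_n:
        return "z using s"
    if population_normal:
        return "t using s"
    return "find a statistician"

for normal in (True, False):
    for large in (True, False):
        for known in (True, False):
            print(f"normal={normal!s:5} large_n={large!s:5} sigma_known={known!s:5} -> "
                  f"{choose_procedure(normal, large, known)}")
```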

t-distribution family:  like standard normal only slightly fatter in the tails.  Mean = 0. Symmetrical around 0.
    "Degrees of freedom" tell which member of the t family.
      t(k) is the t distribution with k degrees of freedom.
 Comparison with normal (Excel file)
    Lower d.f.--fatter tails.  Higher d.f.--more like standard normal.
    Table C:  upper tail:  probability <--> "critical" t-value.
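To see "fatter tails" in numbers, one can compare the upper-tail area beyond the same cutoff for several members of the t family and for the standard normal. The sketch below uses the standard t density and Simpson's rule for the area; the helper names, the cutoff 2.0, and the choice of degrees of freedom are illustrative choices, not from the text.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: one half minus the area from 0 to t (Simpson's rule)."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def normal_upper_tail(z):
    """P(Z > z) for the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

CUTOFF = 2.0
for df in (2, 5, 9, 30, 100):
    print(f"t({df:3d}): P(T > {CUTOFF}) = {t_upper_tail(CUTOFF, df):.4f}")
print(f"normal: P(Z > {CUTOFF}) = {normal_upper_tail(CUTOFF):.4f}")
```

The tail areas shrink toward the normal value as the degrees of freedom grow, which is the "higher d.f., more like standard normal" statement above.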

Start working on green box:
Assume Normal population .  Mean µ, s.d. sigma, both unknown.
Take SRS, size n, find xbar, find s (sample standard dev.)

Standard error of the (sample) mean =    Standard deviation of xbar, estimated from the data.
  "Standard error of the mean":  s/sqrt(n) SEM, SEXbar, etc.
        Just like sigma/sqrt(n), only s from data replaces sigma.
  When you estimate the standard deviation of a statistic,
                the resulting estimate is called the "standard error" of the statistic.
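A small simulation can make the phrase "standard deviation of xbar, estimated from the data" concrete. The population values, sample size, and seed below are made up for the illustration; the sketch draws many samples, looks at how much xbar actually varies, and compares that with s/sqrt(n) computed from a single sample.

```python
import math
import random
import statistics

rng = random.Random(151)
MU, SIGMA, N = 50.0, 10.0, 25      # made-up Normal population and sample size

def draw_sample():
    return [rng.gauss(MU, SIGMA) for _ in range(N)]

# Standard error from ONE sample: s / sqrt(n). This is all you have in practice.
one_sample = draw_sample()
s = statistics.stdev(one_sample)   # sample standard deviation (divides by n - 1)
se = s / math.sqrt(N)

# Actual spread of xbar across many repeated samples (only possible in a simulation).
xbars = [statistics.mean(draw_sample()) for _ in range(5000)]
sd_of_xbar = statistics.stdev(xbars)

print("sigma/sqrt(n)              :", SIGMA / math.sqrt(N))
print("sd of 5000 simulated xbars :", round(sd_of_xbar, 3))
print("s/sqrt(n) from one sample  :", round(se, 3))
```

Note that sqrt(n) is already inside the standard error; dividing `se` by sqrt(n) a second time would understate the spread of xbar.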

Standardizing xbar with s instead of sigma results in
   the one-sample t statistic,  t = (xbar - µ) / (s/sqrt(n)),
which has the t-distribution with n-1 degrees of freedom.

We'll now repeat all the stuff from Chapter 6, only wherever there was a z, we'll substitute a t.
Here we go....
"One-sample" t- procedures: SRS of size n.  Use Xbar to estimate µ.
Substitute s for sigma in the standardizing formula. We get t instead of z, with n-1 degrees of freedom.
        It's a good idea to check for at least approximate normality in the data set.

Confidence intervals: 
   Choose t* from table C, using the n-1 row, and confidence level C.
    Special case of common pattern:    estimate ± t* x SE(estimate)

Significance tests:  State hypotheses as in Ch. 6, then find t from the data by
 calculating the one-sample t-statistic, using the null hypothesis value of µ (call it µ0).
Then proceed as if it were a "z", only using the (n-1) d.f. row in table C:
find the two t*'s that your t falls between, read off their tail probabilities, and write "P-value is between ___ and ___".
(Or use software, which will find the P-value exactly.)

Example: bacteria per milliliter in 10 specimens of  raw milk from one producer.
  Parameter: actual mean bacteria/ml.
       5370, 4890, 5100, 4500, 5260, 5150, 4900, 4760, 4700, 4870
Stemplot (stem = thousands, leaf = hundreds, truncated; split stems):
4|5 
4|77
4|889 
5|11 
5|23 
 n = 10,   xbar = 4950,
s = 268.45   SEM = 268.45/sqrt(10) =268.45/3.162=84.89.  deg. of freedom = 9
90% CI:  from t(9) in table, t* = 1.833   CI is 4950 ± 1.833 x 268.45/sqrt(10)
                                                       = 4950 ± 1.833 x 84.89, or 4950 ± 155.6 bacteria/ml (about 4794.4 to 5105.6).
If we had KNOWN Population sigma = 268.45, 
  we'd have used z* = 1.645, gotten a narrower CI.   (but we don't know sigma!)
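The confidence interval arithmetic above can be checked with a short script. This is a sketch using only the standard library; the critical values 1.833 and 1.645 are typed in from the notes (Table C), not computed, and the variable names are made up here.

```python
import math
import statistics

counts = [5370, 4890, 5100, 4500, 5260, 5150, 4900, 4760, 4700, 4870]

n = len(counts)
xbar = statistics.mean(counts)
s = statistics.stdev(counts)          # sample standard deviation (divides by n - 1)
sem = s / math.sqrt(n)                # standard error of the mean

T_STAR_90_DF9 = 1.833                 # from Table C, 9 d.f., 90% confidence
Z_STAR_90 = 1.645                     # what we would use if sigma were known

t_margin = T_STAR_90_DF9 * sem
z_margin = Z_STAR_90 * sem            # pretending sigma = s, for comparison only

print(f"n = {n}, xbar = {xbar}, s = {s:.2f}, SEM = {sem:.2f}")
print(f"90% t interval: {xbar - t_margin:.1f} to {xbar + t_margin:.1f}")
print(f"90% z interval (if sigma were known): {xbar - z_margin:.1f} to {xbar + z_margin:.1f}")
```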

Test:  H0 : µ = 4800                          t = (4950 - 4800)/SEM = 150/84.89 = 1.767
          Ha : µ > 4800                           t is between 1.383 and 1.833   (d.f. = 9)
             (too contaminated)                P is between .10 and .05.  Some evidence for Ha
(If the test had been 2-sided, P would be between .20 and .10)
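The table lookup for the P-value can also be sketched in code. The t(9) row below holds the two critical values quoted above plus three more standard t-table entries (2.262, 2.821, 3.250) added for the illustration; `bracket_p` is a made-up helper that mimics reading across one row of Table C.

```python
import math
import statistics

counts = [5370, 4890, 5100, 4500, 5260, 5150, 4900, 4760, 4700, 4870]
MU_0 = 4800                                   # null hypothesis value of mu

sem = statistics.stdev(counts) / math.sqrt(len(counts))
t = (statistics.mean(counts) - MU_0) / sem    # one-sample t statistic

# (upper-tail probability, critical value) pairs for 9 d.f., in table order.
T9_ROW = [(0.10, 1.383), (0.05, 1.833), (0.025, 2.262), (0.01, 2.821), (0.005, 3.250)]

def bracket_p(t_value, row):
    """Return (low, high) bounds on the upper-tail P-value; None where the row runs out."""
    low, high = None, None
    for p, crit in row:
        if t_value >= crit:
            high = p          # t is at least this extreme, so P is at most p
        else:
            low = p           # t is less extreme than this, so P is more than p
            break
    return low, high

low, high = bracket_p(t, T9_ROW)
print(f"t = {t:.3f} with 9 d.f.")
print(f"one-sided P-value is between {low} and {high}")
print(f"two-sided P-value would be between {2 * low} and {2 * high}")
```

Doubling the bounds for the two-sided case works here because the t distribution is symmetrical around 0.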

SPSS--Get Handout for 7.1 Next time.  Type in above data and find P-value and CI.  (Dataset)
    Analyze>Compare Means>One-Sample T Test:  Test value = 4800.
           P-value is labeled "Sig (2-tailed)"--divide by 2 for 1 tail (if observed is in correct direction)
   Analyze>Descriptive Statistics>Explore.  Statistics button, set Confidence level.


Sievers home  Math151-Fall04/Dayf37.htm  10pm 11/22/04
This page belongs to Sally Sievers, who is solely responsible for its content.