Math 151 , Spring 2006, Day 33 Friday,  April 21 Hit reload After class

Exam 3 is a week from today (Day 36, April 28).  It covers Part III: experiments (one-factor), diagrams, several designs (Day 23 HW on); Part IV (what we did); and Part V through Monday, Day 34.  Watch this space for sample exam problems.
Day 33: Reading:  Chapter 20+21 thru p. 392 (Activstats is good here too.)  Then continue (Alpha levels) through p.397.  Lightly through Power . Read What can go wrong p. 401 and the rest. (SPSS won't do proportion computations, but some other programs do; it's good to have an idea what you might see, p. 402.)   Skip "a better confidence interval", p. 383.
Hand in (All D&V)

Chapter 20, p. 386: 
1, 2 (change c to "...sure that more than 60% of the people like...",) Hypotheses
3 Negatives (meaning of P)  d is gibberish, though the back of the book says it's correct!
4 Dice (Meaning of P) I think the problem is badly presented!  The null hypothesis they are using is that the die is fair; we want to collect evidence to assess the strength of the seller's claim that it is loaded.  (Saying "we don't believe" his claim could make it look like his claim that it's loaded should be the null.  But actually, we are skeptics who demand evidence against the fair-die null hypothesis before we buy; we also do the test.)
5, 6 Relief, Cars (Conclusions from P), 
9 Dowsing (doing a test)
13 Pollution (doing it)
19 Women executives
22 Acid rain (inc. CI)
A. a) Use your green shoebox result to do a One-sided test against the null hypothesis p = .5, with alternative HA: p < .5.
Postpone the rest
+ + + + + + + + + +  Two sided: For some reason, D&V don't model or assign any 2-sided problems (except #8).  We need to be used to them for later, so here are a few.
A.  b) Use your green shoebox result to do a Two-sided test against the null hypothesis p = .5.
P. 386 #7 Find the mistakes (The first mistake is that both hypotheses are written with incorrect notation.  The second is that the alternative hypothesis is chosen wrongly.  I would write the company's goal as "more than 90% succeed"--I think that makes it clearer what structure to use.)
#8 Find the mistakes
From ActivStats, copied here:
MRA-304-2:  Kerrich Coin Toss  While he was a prisoner of the Germans during World War II, the British statistician John Kerrich tossed a coin 10,000 times.  He got 5067 heads.  Take Kerrich's tosses to be an SRS from the population of all possible tosses of his coin.  If the coin is perfectly balanced, p = 0.5.  Is there reason to think that Kerrich's coin was not balanced?

TRE-396-9:  Store Checkout-Scanner Accuracy (adapted from ActivStats HW):
In a study of store checkout-scanners, 1234 items were checked and 20 of them were found to be overcharges (based on data from "UPC Scanner Pricing Systems: Are They Accurate?" by Goodstein, Journal of Marketing, Vol. 58).  Before scanners were used, the overcharge rate was estimated to be about 1% . Based on these results, do scanners appear to give a different rate of overcharges than the old method of keying in the price?  (All items had to have individual price tags; scanning is much less labor-intensive.)  Do the steps, finding the P-value and stating a conclusion. 

Read, to discuss.
Optional but good: MCS-353-71 (adapted):  Political candidates
To get their names on the ballot of a local election, political candidates often must obtain petitions bearing the signatures of a minimum number of registered voters.  In Pinellas County, Florida, a certain political candidate obtained petitions with 18,200 signatures (St. Petersburg Times, Apr. 7, 1992). To verify that the names on the petitions were signed by actual registered voters, election officials randomly sampled 100 of the names and checked each for authenticity.  Only two were invalid signatures. 
a) Is 98 out of 100 verified signatures sufficient to believe that more than 17,000 of the total 18,200 signatures are valid?  (Restate these as proportions, design a test.)
b) Repeat part (a) if only 16,000 valid signatures are required. 
Based on Statistics, McClave and Sincich, pg. 353
c) Construct a 95% CI for the proportion of valid signatures.   Turn the endpoints of the CI into numbers of valid signatures by multiplying by 18,200.
+ + + + + + +
For all the shoebox proportions of # of 1's (p-hat's) you gave me, on one sheet or another, I compiled the results, and added your dots to a dot plot and your  68% CI on the graph.
  We have 15/20 = 75% of our intervals containing p=.4.  Pretty close to the designed 68% Confidence level.
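For comparison, here is a rough sketch of the long-run coverage we should expect from intervals like these. It assumes n = 30 draws per sample and that the "68% CI" means p-hat ± 1·SE(p-hat); both are assumptions about the setup, and the function name is made up for illustration. Instead of simulating, it adds up the exact binomial chance of every count whose interval would contain p = .4.

```python
import math

def coverage_of_one_se_interval(n, p):
    """Exact chance that p-hat +/- 1*SE(p-hat) contains the true p.

    Assumes the count of 1's is Binomial(n, p) (independent draws).
    """
    total = 0.0
    for x in range(n + 1):
        p_hat = x / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)  # SE uses p-hat, as in a CI
        if abs(p_hat - p) <= se:
            # this count produces an interval that captures p
            total += math.comb(n, x) * p**x * (1 - p)**(n - x)
    return total

coverage = coverage_of_one_se_interval(30, 0.4)
print(round(coverage, 3))
```

Because counts are discrete, the exact figure need not land on 68% exactly, and with only 20 intervals some wobble around it (like our 75%) is to be expected.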

Homework questions? Day 31
Level C confidence interval estimate of population proportion p:
 "One-proportion z-interval":  p-hat ± z* · SE(p-hat), where SE(p-hat) = sqrt(p-hat · (1 - p-hat)/n)


 Sample size for desired ME and C Day 31 
  Why this ME "works". Day 31


We're doing "confirmatory" analysis here:  We've explored, developed ideas, things we want to measure and ways to measure them. 
--Estimating parameter value using statistic from sample:  Confidence interval estimates.  Other big category:

Tests:  (Chapter 20, for proportions) You have a hypothesis about the world. And some data.
Does the data lend support to the hypothesis, or is the data inconsistent with the hypothesis?
      (Retain / fail to reject the hypothesis)                       (Reject the hypothesis)
Easier to reject a hypothesis than to show that it's true.

Lots of machinery and vocabulary:
NULL Hypothesis Ho : (Straw man we collect evidence against.  No change from Status quo.)
Assume Ho is true.  Look at evidence (data).  Is it inconsistent with Ho ? Then Reject Ho .
  (How inconsistent with Ho is the data?  a little, somewhat, very?  how do we measure it?  Turn into numbers---)

Ho : a specific model for the population, with a specific parameter value.
example (suppose I hadn't told you...):  Green shoebox is full of 0's and 1's.  I tell you: equal numbers.
Ho : p = .5 (proportion of 1's is 50%)  po for a general label.
    Is your  sample (n = 30) far enough away from .5 to say that I'm lying? Suppose you believe I undersupplied 1's:

HA : p < .5  (one-sided alternative hypothesis:  What you hope /fear /would like to prove)
How do we measure "far enough away?"
IF Ho  is true: how far out (weird) is your p-hat?
IF  p = .5, how far from  the "real" p is your p-hat?  po = .5
                 Distribution of p-hats is approx.  N(po, sqrt(po·(1 - po)/n)) = N( .5, sqrt(.5 ·.5/30)) = N( .5, .091)  (Usual assumptions.)
    Suppose you got 12 1's.  p-hat = .4. IF  p = .5, p-hat = .4 has a z-score of -.1/.091 = - 1.095 ~ -1.10 Sketch the Normal.
        If you know your z-scores, this is meaningful.  A more universal measure is the
P-value:  The probability, assuming Ho is true, of observing the result we have (or one more extreme)--if we could do the experiment again...  It measures the strength of the evidence against Ho (and thus for HA): the smaller the P-value, the stronger the evidence.
   In our example: The probability of getting a p-hat of .4 or below, IF p = .5. Sketch on the curve.
                   The "tail" below z = -1.10.  From normal table, .1357 ~ 13.6%.  So
    P-value = .136.  Not so unusual; happens more than 1 in ten times (13-14 in a hundred).  Suggestive but not "significant" by most people's standards.
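The shoebox arithmetic above can be checked with a short sketch. This is only an illustration: the helper names are made up, and the Normal tail comes from Python's `math.erf` rather than from the table, so it uses the unrounded z.

```python
import math

def normal_cdf(z):
    """Standard Normal probability below z, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_prop_z(successes, n, p0):
    """Return (p_hat, sd, z); the SD is computed from p0 because we assume Ho."""
    p_hat = successes / n
    sd = math.sqrt(p0 * (1.0 - p0) / n)
    return p_hat, sd, (p_hat - p0) / sd

# Shoebox: 12 ones in n = 30 draws, Ho: p = .5, HA: p < .5
p_hat, sd, z = one_prop_z(12, 30, 0.5)
p_value_lower = normal_cdf(z)  # lower tail, the direction of HA
print(round(p_hat, 3), round(sd, 3), round(z, 2), round(p_value_lower, 3))
```

The table lookup at the rounded z = -1.10 gives .1357; the unrounded z is slightly closer to 0, so the computed tail is a hair larger. The conclusion does not change.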

Example:  U.S. Consumer Product Safety Comm. says 90% of American homes have smoke detector(s). Fire department runs a big publicity campaign to raise awareness and use.  Have they raised the level in this city? (Assume! that our city matched the national beforehand, with 90%)
 Data:  Building inspectors visit 400 (random) homes, find 376 have detectors.   p is the proportion of homes with detectors in the population (the whole city).
--Hypotheses:  Ho : p = .9  (unchanged after campaign),   HA : p > .9  (raised after campaign)
--Model:  Want to do One-proportion z-test.  Independence? Yes, random sample. n = 400. Population (city) > 4000 homes (10% rule).  .9x400 = 360, .1x400 = 40 so success/failure rule met.  Sample proportion p-hat modeled by Normal OK.
--Mechanics:    n = 400, successes x = 376.  p-hat = .940.   SD(p-hat) = sqrt(.9 ·.1/400)= .015 since we're assuming Ho true.  z=(.940-.9)/.015 = .04/.015 = 2.67.   One-sided alternative, evidence for it is to the right.  So P-value is proportion in the tail above z = 2.67; P=P(z> 2.67) = 1 - .9962 = .0038 ~ 0.4%
--Conclusions:  P-value is very low:  Strong evidence that the city proportion has risen.  (Reject Ho)
Detail:  IF Ho : p = .9  is true (the city proportion is really unchanged) THEN the chance of our seeing a proportion as high as .94 (or higher) in another sample of 400 is only about 4 in a thousand.  (We prefer to believe that the alternative hypothesis is true rather than believe that we got such a "lucky" sample.)
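As a check on the smoke-detector mechanics, here is a minimal sketch. The variable names are illustrative, and the cutoff of 10 for the success/failure rule is the usual rule of thumb, assumed here rather than quoted from above.

```python
import math

def upper_tail(z):
    """Standard Normal probability above z, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

n, successes, p0 = 400, 376, 0.9

# Success/failure check uses the null value p0 (assumed cutoff: at least 10 each)
conditions_ok = n * p0 >= 10 and n * (1 - p0) >= 10

p_hat = successes / n
sd = math.sqrt(p0 * (1 - p0) / n)  # SD under Ho
z = (p_hat - p0) / sd
p_value = upper_tail(z)            # HA: p > .9, so the upper tail

print(conditions_ok, round(p_hat, 3), round(sd, 3), round(z, 2), round(p_value, 4))
```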

Suppose (as here) we do Reject Ho in favor of HA.  We are pretty sure the true p is different from po but by how much?
    (The "size of the effect" = ptrue - po.)  Still need an estimate of the true p. Use a CI to estimate ptrue.
--Estimate:  How much has the proportion  changed?  Confidence Interval for "new" p.
Now use SE(p-hat) = sqrt(.94 ·.06/400) = .0119 since we're not assuming the null hypothesis is true.
95% CI's ME = 1.96 · .0119 = 0.023324 ~ .023.
95% CI:  .94 ± .023.  I'm 95% confident the real proportion is between 91.7% and 96.3%.
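The interval can be reproduced with a small sketch; the function name is made up, and z* = 1.96 is passed in by hand rather than looked up.

```python
import math

def one_prop_z_interval(successes, n, z_star):
    """Return (low, high) for p-hat +/- z* * SE(p-hat).

    The SE is built from p-hat, not from a null value, because a
    confidence interval does not assume any hypothesis about p.
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    me = z_star * se
    return p_hat - me, p_hat + me

low, high = one_prop_z_interval(376, 400, 1.96)
print(round(low, 3), round(high, 3))
```

Note the contrast with the test: there the SD came from po = .9, here the SE comes from p-hat = .94.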

Statistical inference in a nutshell:
Am I surprised (if Ho is true)? (Do I reject the null?)
    How surprised? (give P-value)
  What would not surprise me?  (confidence interval)

Start here Monday:
More about the Alternative hypothesis:  Null hypothesis is often a particular parameter value.  Alternative is something "different."
   Why are you doing a test?   Back to shoebox:
       HA : p < .5  You have reason to believe I skimped on the 1's.         One-sided
OR  HA : p > .5   You have reason to believe I put in more 1's than 0's.  One-sided
OR  HA : p not = .5  You believe the 0's and 1's are not equal, but don't know which way. Two-sided.

P-value concept needs refining:
    For One-sided alternatives, P-value is the single "tail" beyond our observed statistic,  in the direction of the alternative hypothesis.
   For a Two-sided alternative, P-value is "double the tail" beyond our observed statistic, because we could be "as or more extreme" in either direction!   (Measuring how weird our observation is, if  Ho is the case.)

Example:  Shoebox, 12/30 1's.  We got z = - 1.10.  If your alternative is HA : p not = .5,
   there is probability .136 below z = - 1.10, and probability .136 above z = + 1.10,
   so the P-value = 2· .136 = .272.  1 in 4?   Not unusual at all.  Can't reject Ho .
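"Double the tail" for the shoebox can be sketched as follows. The helper names are illustrative only, and the tail is computed from the unrounded z, so the last digit may differ a little from the table-based .272.

```python
import math

def normal_cdf(z):
    """Standard Normal probability below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_p_value(z):
    """Probability of a z-score at least as far from 0 as z, in either direction."""
    return 2.0 * normal_cdf(-abs(z))

# Shoebox: 12 ones in 30 draws, Ho: p = .5, HA: p not = .5
z = (12 / 30 - 0.5) / math.sqrt(0.5 * 0.5 / 30)
one_tail = normal_cdf(z)          # area below z = -1.10 (approx.)
p_two_sided = two_sided_p_value(z)
print(round(one_tail, 3), round(p_two_sided, 3))
```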

 Example. Look at Therapeutic touch. Why are you testing?  (Same experiment, different HA's)
      Activstats 20-2, activity 2, HA : p not = .5  The detection is different from chance (better, or worse)
      D&V p. 389-91 Step-by-step,  Ch. 21  HA : p > .5  Can detect "energy field" better than chance.
    n = 150, observed successes 70.  p-hat = .467.   (notice, less able to detect than just chance)
        z-score is - .825 if you round SD to .04(text), -.809 if you round to .0408 (AS).  ( small differences in normal table.)
   HA : p not = .5  There is .2090 to the left of -.81; approximately .21 .  Two-sided P value is .21 + .21 = .42.  No evidence for ability different from chance.
   HA : p > .5   The statistic is actually on the wrong side  of the middle.  So the P-value is still the "up" side; the probability of seeing a p-hat greater than that observed.  So the P-value is 1- left tail value.  Using the above, P = 1-.21 = .79.  The chance of seeing a p-hat greater than what we observed (if they can't detect better than chance) is more than 3 out of 4.   Unsurprising result, no evidence that they can detect energy field.
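To see how the same Therapeutic touch data give different P-values under different alternatives, here is a hedged sketch. The function and its "less" / "greater" / "two-sided" labels are my own naming, not from D&V or ActivStats, and it works from the unrounded z (about -.82), so the results can differ in the second decimal from the rounded figures above.

```python
import math

def normal_cdf(z):
    """Standard Normal probability below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value(z, alternative):
    """P-value for a z statistic; the tail is chosen by HA, never by the data."""
    lower = normal_cdf(z)
    if alternative == "less":        # HA: p < po
        return lower
    if alternative == "greater":     # HA: p > po
        return 1.0 - lower
    if alternative == "two-sided":   # HA: p not = po
        return 2.0 * min(lower, 1.0 - lower)
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# Therapeutic touch: 70 successes in n = 150 trials, Ho: p = .5
n, successes, p0 = 150, 70, 0.5
z = (successes / n - p0) / math.sqrt(p0 * (1 - p0) / n)

p_two_sided = p_value(z, "two-sided")
p_greater = p_value(z, "greater")
print(round(z, 2), round(p_two_sided, 2), round(p_greater, 2))
```

With the statistic on the "wrong" side of .5, the one-sided P-value for HA : p > .5 comes out larger than one half, which is the code's way of saying there is no evidence at all in that direction.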

One or two sided? p. 390   Statisticians differ philosophically.  Some much prefer 2-sided all the time. ("How can we really know which way things are changed/different?")  If it's clear you're expecting to see / looking to prove a particular direction, many (most? Me.) use one-sided. (D differs from V, I think.)  Say up front.  DON'T decide on one-sided/which side after you've seen the data.  That's cheating, statistically.

Skip for now
"A Better Confidence Interval..." p. 383.  We do a lot of approximating to get our CI's. This turns out to give trouble, especially for p's closeish to 0 or 1.  This is a nice "fix;" relatively newly accepted (1998).  Adds nothing to conceptual understanding.

Next:  "statistical significance" and "alpha"


Sievers home  Math151-Sp06/Daysp33.htm  3:30pm 4/21/06
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.