MATH 251, Probability and Statistics I, Fall 2005, Fri. Nov. 18, Day 36
After class

Reading:  Finish 7.1. Error p. 461 bottom: "on moon days it is 1.50 3.02." Also (p.462) their stemplot is on rounded data. The "outliers" don't look so "out" on truncated data.   Read  Inference for nonnormal populations  including Sign test (pp. 465-468). Start 7.2, thru p. 497.   Work on exam!
Hand in: 

A)  Re-create the results on the SPSS handout, for the matched pairs situations. 
B)  For the datasets on the handout, make (by hand) stemplots. 
  b) Matched pairs (full moon) data on p. 460, text. Make stemplots for the 3 variables (aggmoon, aggother, aggdiff).  For diff, you need a -0 and a +0 stem. Compare to p. 462. 

7.31, 7.32 SPSS vit. c, test and CI's; matched pairs +.  For 32b, you have to re-express the 5 "after" numbers as percents (e.g. Sample 1: 20/98 = 20.4%...) and then find a new CI of this data set. 

7.39 SPSS C: Factory to Haiti, matched pairs  The answers weirdly assume you'll do Haiti - Factory, when it seems more natural to do Factory - Haiti; and the SPSS file is set up to do Factory - Haiti.  Do it the natural way.

Using SPSS in lieu of tables: You may use your calculator as an aid.  Sketch the probabilities, and show your computations. If feasible, check with book table. 
C. 1) a) P(t(30) < 1.3)   b) P(t(30) > 1.3)  c) P(-1 <t(30) < 1.3)
    4) a) P(X 14)  b) P(X 10)   c) P ( 11 < X 14)   d) P(X> 10) 
              where X is binomial, B(15, .8)
7.21, 7.22 (SPSS) comparing SPSS with Table D results.

7.47 piano lessons sign test by hand. Also get an exact value for P from SPSS using the Binomial dist.
7.46 vitamin C--sign test by hand, using Table C.  My book has a missing value there: it's not 0.  Note that the values for p = .5 are symmetrical from k = 0 to k = n, and fill in the missing value if you have one.  Also use SPSS to execute a sign test. Analyze>Nonparametric Tests>2 Related Samples. (Be sure you have labels)

7.27, 7.44 TBBMC Read, don't do the problems as written!  Sometimes a quick "sign test" will give an indication of whether there's a significant difference.  For these data (p. 478) just count the number of +'s in the 8 trials.  From your knowledge of flipping coins, will there be a significant difference between the operators?
- - - - -Postpone all the rest (7.2) - - - - - - - -  -
Sec. 7.2, two-sample, by hand.  A table is a good way to organize the work, see example
 For problems involving calculating a CI and/or a test, give the Difference and  SEDiff as well as the answers asked for.
7.83 iron deficiency  (FYI, most US pediatricians recommend iron supplements for both...)
7.82 cocaine & birthweight
7.68&69 Bread Read them and Do only this:  7.69a.  Tell what analysis you would do for 7.69.
7.66 flat screens  Read a, remember how to do it. Do b, c. 
    +7.67 new screens now
7.59 what's wrong?
7.60, 61 short answer questions
7.57 soft drink size--missing information  (A frustrating thing about published info--like this, they often leave out what you need to check up on them, investigate further.)

Read, discuss:

Optional (more practice):
SPSS handout for 7.1, t procedures
Exam 2: Takehome.  Due this coming Monday Nov. 21 (Day 37), 1pm under my door or in my hand.
Quiz back Monday.
Look at Power curve (on overhead).

What is the significance to Statistics of the Guinness Stout bottle?
~~~~~~~~~~~~~~~
SPSS:  Transform/Compute (first handout)
"tables": CDF functions  take value x, give Prob of being less than or equal to x (like our book's Normal table)
             IDF functions take probability p, give value x such that the probability of being less than or equal to x is p. The help in SPSS on these is sloppy, leaves off the "or equal to." Irrelevant for continuous distributions, crucial for discrete ones.
    CDF.BINOM(4, 5, .5) gives the probability that X is less than or equal to 4, in a B(5, .5) distribution.  You will probably want to increase the number of digits after the decimal point (Decimals).
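
For checking a CDF value outside SPSS, here is a minimal Python sketch of the same "less than or equal to" calculation. The function name binom_cdf is made up for this illustration; it is not an SPSS function, just a direct sum of binomial probabilities.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p), summed directly from the binomial formula.
    (Hypothetical helper, meant to mirror what CDF.BINOM(x, n, p) is described as giving.)"""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, x + 1))

# The example from the notes: B(5, .5), probability that X is at most 4.
print(binom_cdf(4, 5, 0.5))                          # 0.96875, i.e. 1 - 1/32

# Why "or equal to" matters for a discrete distribution:
print(binom_cdf(3, 5, 0.5))                          # P(X < 4) = P(X <= 3) = 0.8125
print(binom_cdf(4, 5, 0.5) - binom_cdf(3, 5, 0.5))   # P(X = 4) = 0.15625
```

The gap between P(X < 4) and P(X <= 4) is exactly P(X = 4), which is why the wording is crucial for discrete distributions and irrelevant for continuous ones.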

MATCHED PAIRS t procedures: (get for free!)   Example by hand Day 35
SPSS:  Analyze >Compare Means> Paired-Samples T-test.  handout
        Data in parallel columns--subtracts rightmost from left column. Don't get to choose which way to subtract.
        CI level under Options.
 or  Transform>Compute:  Let Target variable be Difference, Numeric expression be  VarA -VarB.  You can use the Difference to examine for Normality, do one-sample procedures on Difference.

What if t's not suitable?
 Skewness:  Try log or other transformation, work on transformed data.  (Sadly, CI's for the mean can't be transformed back, because  µlog(X) is not equal to log(µX); see last time.)
 Outliers or other nonnormality:  Distribution-free/ nonparametric procedures.  Usually less power than distribution-based. (Uses less information, duh!)  Often based on binomial or similar models.
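
A small numerical illustration of the skewness point above, that the mean of the logs is not the log of the mean. The three data values are made up for this sketch and do not come from the text.

```python
from math import log10
from statistics import mean

# Made-up, right-skewed data (an assumption for illustration only).
data = [1, 10, 100]

mean_of_logs = mean([log10(x) for x in data])   # mean of 0, 1, 2
log_of_mean = log10(mean(data))                 # log10 of 37

print(mean_of_logs)             # 1.0
print(round(log_of_mean, 3))    # 1.568

# Back-transforming the mean of the logs gives 10 (the geometric mean),
# not the ordinary mean of 37.
print(10 ** mean_of_logs, mean(data))
```

So an interval computed on the log scale, when back-transformed, is not an interval for the original mean.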

Sign test is a nice "trick", that turns any paired sample situation into a binomial situation.
For each pair, "success" is that the item from Group A is bigger than the matched item from Group B.  If there are ties, just throw them away (like the flipped coin that balances on its edge).
The null hypothesis is always that the groups are the same, so it is just like a coin-flip, the prob. of success is 1/2 under H0. Then see how likely you are to get at least as many successes as you saw, using the binomial distribution.  That's the p-value, for the alternative  Ha that Group A is bigger on average than Group B.  More specifically, we're testing this:
H0:  (the median of X_(Group A - Group B) is 0) ~ (probability that X_(Group A - Group B) is positive = .5)  ~  ( p = .5) .
Ha:  (the median is above 0) ~ (probability that X_(Group A - Group B) is positive > .5)  ~  ( p > .5) .
Example:  We suspect that students living on campus for their first semester gain weight.  Poll 11 students, asking just the sign of their weight change:
Get these results +  +  +  0  +  -  +  +  +  -  +  (0 means no change)   8 +'s and 2 -'s out of 10.
If there's no weight gain on average (Median gain is 0) we have a B(10, .5) distribution.  One sided alternative, that median gain is higher.  Let X be B(10, .5).  Then the P-value is P(X = 8, 9, or 10) = .0439 + .0098 + .0010 =  .0547, from Table C in the book.
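
The weight-gain example can be checked with a short Python sketch. The function name sign_test_upper_p is hypothetical; it drops ties, counts the +'s, and sums the upper tail of B(n, .5), which is the calculation described above.

```python
from math import comb

def sign_test_upper_p(signs):
    """One-sided sign test: P(X >= number of +'s) for X ~ B(n, .5),
    where n counts only the non-tied pairs. (Hypothetical helper.)"""
    plus = signs.count("+")
    minus = signs.count("-")
    n = plus + minus                      # ties ("0") are thrown away
    p_value = sum(comb(n, k) for k in range(plus, n + 1)) / 2**n
    return plus, n, p_value

# The 11 responses from the example; 0 means no change.
signs = "+ + + 0 + - + + + - +".split()
plus, n, p_value = sign_test_upper_p(signs)
print(plus, n, round(p_value, 4))         # 8 10 0.0547
```

The exact value is 56/1024, which agrees with the Table C sum .0439 + .0098 + .0010.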

     Disadvantage:  You're obviously throwing away a lot of information (how big the differences are).  The result is that the power to detect a difference--if there is one--is much less than that of a t-test, where the t is usable.
     The sign test can be extended to a single data set, where you test the median:  If a is the median, then in the population, half the observations will be above a, and half below. Each data point is then like a coin flip, above or below the median.  (Can you see how this could be extended to test for a particular value of the first quartile, for instance?)
SPSS will do the sign test if you have the two "matched pair" variables.   (Be sure you have descriptive labels)
Analyze>Nonparametric Tests>2 Related Samples.  Get a box where you choose the pair (can't choose direction of subtraction).
Under Test Type, choose Sign.  Get counted results and two-sided P-value.
Start here Monday
Sec. 7.2, Comparing two means: "Two-sample tests".  Two SRS's, independent, from distinct populations. (Populations are normally distributed.)
Often--comparing means from an experiment with two treatments (usually control and "treatment"). Cf. p. 202.
                /--- Group 1, n1---- Treatment 1---\
              /                                    \
 Random asst.                                       Compare results
              \                                    /
               \--- Group 2, n2---- Treatment 2---/
To examine  the difference of the  two means, µ1 - µ2:
Theoretical assumption is normal populations.  Back to back stemplots are good; boxplots will do.
We use the Difference of the two x-bars,  diff = xbar1 - xbar2 .
  The Standard Deviation of this difference is calculated like the hypotenuse of a right triangle (Pythagorean Theorem),  from the individual standard deviations of the two sample means:

      SD(xbar1 - xbar2) = sqrt( σ1^2/n1 + σ2^2/n2 )

Then the "Two-sample z-statistic",
      z = [ (xbar1 - xbar2) - (µ1 - µ2) ] / sqrt( σ1^2/n1 + σ2^2/n2 ),
is N(0,1) (p. 488).
But we don't know the population standard deviations!  We need the Standard Error of the difference  xbar1 - xbar2 , and then we can proceed as before, more or less. As usual, we substitute sample standard deviations for population standard deviations, and our z's are replaced by t's.

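For testing, if H0 is "population means are equal":

      t = (xbar1 - xbar2) / sqrt( s1^2/n1 + s2^2/n2 )        "Two-sample t-statistic"

The denominator is SEdiff, the Standard Error of the difference.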

Unfortunately, this doesn't quite have an exact t-distribution, and its exact distribution is very hard to deal with.

For doing by hand:  df = smaller of (n1- 1) and (n2- 1).
This will give a "conservative" result--slightly wider C.I., slightly less significance--than a "sharper" value.  If your results hinge on the difference between this result and the computer result, they're too close for comfort anyway. In Table D, if the df you want isn't given, go to the next lower df.

From a computer:  df = complicated formula on p. 498.  Produces non-integer degrees of freedom.  Very good approximation to the exact distribution, if both sample sizes are at least 5. Unsuitable for doing by hand.
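
To see how the two df choices compare, here is a Python sketch working from summary statistics. The numbers are invented for illustration. The computer-style df below uses the standard Welch-Satterthwaite approximation, which is assumed (not checked against the text) to be the formula on p. 498.

```python
from math import sqrt

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Difference, SEdiff, t, and both df choices from summary statistics.
    (Hypothetical helper; df_welch is the Welch-Satterthwaite approximation.)"""
    v1, v2 = s1**2 / n1, s2**2 / n2       # squared standard errors of each mean
    diff = xbar1 - xbar2
    se_diff = sqrt(v1 + v2)               # the "hypotenuse"
    t = diff / se_diff
    df_conservative = min(n1 - 1, n2 - 1)
    df_welch = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return diff, se_diff, t, df_conservative, df_welch

# Invented summary statistics, for illustration only.
diff, se_diff, t, df_cons, df_welch = two_sample_t(12.0, 3.0, 9, 10.0, 4.0, 16)
print(diff, round(se_diff, 4), round(t, 4))   # 2.0 1.4142 1.4142
print(df_cons, round(df_welch, 2))            # 8 20.87
```

The approximate df always falls between the conservative value and n1 + n2 - 2, so the by-hand choice can only make the result less significant, never more.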

Once we have (xbar1 - xbar2) , SEdiff and the df, our formulas pattern on the earlier ones. Example by hand.
CI:  estimate ± t* · SEestimate
    CI for µ1 - µ2, difference of means,  is  (xbar1 - xbar2) ± t* · SEdiff
Test:  H0: µ1 - µ2 = 0, same as µ1 = µ2  ("no difference", always)
        Ha: µ1 - µ2 > 0, same as µ1 > µ2.  Be careful with these, that you know which direction you want.
    or Ha: µ1 - µ2 < 0, same as µ1 < µ2.  Often we label our variables "1" and "2" so that we expect µ1 > µ2.
    or Ha: µ1 - µ2 ≠ 0, same as µ1 ≠ µ2  (not equal)
        Calculate t, find P-value (approximate, conservative)
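
A sketch of the by-hand routine in Python, using the conservative df and critical values typed in from a t table rather than computed. The summary statistics are invented, and the variable names are only for this illustration.

```python
from math import sqrt

# Invented summary statistics, for illustration only.
xbar1, s1, n1 = 12.0, 3.0, 9
xbar2, s2, n2 = 10.0, 4.0, 16

diff = xbar1 - xbar2
se_diff = sqrt(s1**2 / n1 + s2**2 / n2)
df = min(n1 - 1, n2 - 1)                  # conservative df = 8

# 95% CI: t* for df = 8 is looked up in a t table, not computed here.
t_star = 2.306
ci = (diff - t_star * se_diff, diff + t_star * se_diff)
print(round(ci[0], 2), round(ci[1], 2))   # -1.26 5.26

# Test of H0: mu1 = mu2 against Ha: mu1 > mu2.
t = diff / se_diff
# Upper-tail critical values for df = 8, again typed in from a t table.
critical = {0.10: 1.397, 0.05: 1.860, 0.025: 2.306}
exceeded = [p for p, c in critical.items() if t > c]
print(round(t, 3), exceeded)              # 1.414 [0.1]
```

Here t is beyond the .10 critical value but not the .05 one, so the one-sided P-value is bracketed between .05 and .10, which is as precise as a table allows.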
Robust? Yes...p. 493  Outliers are bad, as before: Use same guidelines (p. 463) with n = n1 +n2
      Large n's have robustness from the Central Limit Theorem.
   Equal sample sizes help: then robust against non-normality, more so if populations have the same shape, down to n = 5 each.
             In doubt? Use the conservative df!

--SPSS will do our computations when we are given raw data. Next.


Sievers home  Math251-Fall05/Dayps36.htm  3pm   11/18/05
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.