MATH 251, P&S I, Fall 2007, Fri. Nov. 16, Day 36 (after class).

Reading: Finish 7.1. Error p. 461 bottom: it should say "on moon days it is 3.02," not 1.50. Read Inference for nonnormal populations, including the Sign test (pp. 465-468). Start 7.2, thru p. 497. Finish up exam!
Hand in: 

A)  Re-create the results on the SPSS handout, for the matched pairs situations. 
B)  For the datasets on the handout, make (by hand) stemplots. 
  b) Matched pairs (full moon) data on p. 460, text. Make stemplots for the 3 variables (aggmoon, aggother, aggdiff). For diff, you need a -0 and a +0 stem. Compare to p. 462. Their stemplot is on rounded data. The "outliers" don't look so "out" on truncated data.

7.31, 7.32 SPSS, vitamin C: test and CIs; matched pairs +. For 32b, you have to re-express the 5 "after" numbers as percents (e.g. Sample 1: 20/98 = 20.4%...) and then find a new CI for this data set.

7.39 SPSS (vitamin C): Factory to Haiti, matched pairs. The answers weirdly assume you'll do Haiti - Factory, when it seems more natural to do Factory - Haiti; and the SPSS file is set up to do Factory - Haiti. Do it the natural way. (Note something weird: this must not be the same WSB as in 7.31 and 32, because it starts out with half the vitamin C. What's up??)

Using SPSS in lieu of tables: You may use your calculator as an aid.  Sketch the probabilities, and show your computations. If feasible, check with book table. 
C. 1) a) P(t(30) < 1.3)   b) P(t(30) > 1.3)   c) P(-1 < t(30) < 1.3)
    4) a) P(X 14)   b) P(X 10)   c) P(11 < X ≤ 14)   d) P(X > 10)
              where X is binomial, B(15, .8)
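As a cross-check on the SPSS and table values in C, here is a minimal Python sketch (Python is our choice, not part of the course). The standard library has no t distribution, so the hypothetical helper t_cdf integrates the t density numerically with Simpson's rule; t_pdf, t_cdf and binom_cdf are our own names, and this is not how SPSS computes them internally. Only the binomial part whose inequality is clearly legible above (d) is computed; the other parts follow the same pattern with binom_cdf.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): 0.5 plus a Simpson's-rule integral of the density from 0 to x."""
    h = x / steps                        # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), the quantity SPSS's CDF.BINOM(k, n, p) gives."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Item 1: t with 30 degrees of freedom
p1a = t_cdf(1.3, 30)                     # P(t(30) < 1.3)
p1b = 1 - t_cdf(1.3, 30)                 # P(t(30) > 1.3)
p1c = t_cdf(1.3, 30) - t_cdf(-1, 30)     # P(-1 < t(30) < 1.3)

# Item 4d: X ~ B(15, .8)
p4d = 1 - binom_cdf(10, 15, 0.8)         # P(X > 10) = 1 - P(X <= 10)

print(round(p1a, 4), round(p1b, 4), round(p1c, 4), round(p4d, 4))
```

Sketch the areas first anyway: the code only confirms numbers you should be able to set up by hand.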
7.21f, 7.22 c only.  (SPSS) You did these using Table D and the Excel t-procedures for Day 35.  Now use the SPSS "tables" to calculate these P-values, and compare with the Excel t-procedures results.

7.47 piano lessons: sign test by hand. Also get an exact value for P from SPSS using the Binomial dist.
7.46 vitamin C--sign test by hand, using Table C. My book has a missing value in the p = .5, n = 5 column: it's not 0. Note that the values for p = .5 are symmetrical from k = 0 to k = n; fill in the missing value if you have one. Also use SPSS to execute a sign test: Analyze>Nonparametric Tests>2 Related Samples. (Be sure you have labels.)
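To check the symmetry claim for the n = 5, p = .5 column of Table C, a short Python sketch (our own illustration, not from the book) that rebuilds the column from the binomial formula:

```python
from math import comb

n, p = 5, 0.5
# P(X = k) for k = 0, 1, ..., n: the n = 5, p = .5 column of Table C
column = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for k, prob in enumerate(column):
    print(k, f"{prob:.4f}")

# With p = .5 the column reads the same from k = 0 up as from k = n down,
# and no entry is 0
assert column == column[::-1]
```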

7.27, 7.44 TBBMC: Read, don't do the problems as written! Sometimes a quick "sign test" will give an indication of whether there's a significant difference. For these data (p. 478) just count the number of +'s in the 8 trials. From your knowledge of flipping coins, will there be a significant difference between the operators?
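To put numbers on "your knowledge of flipping coins," here is a Python sketch listing the B(8, .5) tail probabilities. No data from p. 478 is used: count the +'s yourself and read off the row. The helper name upper_tail is hypothetical, and the doubled column (two-sided P-value) assumes no advance reason to expect one operator to read higher.

```python
from math import comb

def upper_tail(plus, n=8):
    """P(X >= plus) for X ~ B(n, .5): chance of at least `plus` heads in n fair flips."""
    return sum(comb(n, k) for k in range(plus, n + 1)) / 2**n

for plus in range(4, 9):
    one_sided = upper_tail(plus)
    # Two-sided P-value: double the one-sided tail, capped at 1
    print(plus, round(one_sided, 4), round(min(1.0, 2 * one_sided), 4))
```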
- - - - - Postpone the rest (7.2): YES - - - - - - - - -
Sec. 7.2, two-sample, by hand. A table is a good way to organize the work; see example.
For problems involving calculating a CI and/or a test, give the Difference and SEdiff as well as the answers asked for. You can check your hand calculations using the Excel Two-sample calculator.
7.83 iron deficiency  (FYI, most US pediatricians recommend iron supplements for both...)
7.82 cocaine & birthweight
7.68&69, 70, 71. Bread. Read them and do only this: Do 7.68a, b. (You need to calculate 2 variances "by hand". Since n = 2, it's not too onerous with a simple calculator; but you may use any aid you like.) Tell what analysis you would do for 7.69. Tell whether the test in 70 will be significant, without computing anything. Do 7.71.
7.66 flat screens  Read a, remember how to do it. Do b, c. 
    +7.67 new screens now
7.59 what's wrong?
7.60, 61 short answer questions
7.57 soft drink size--missing information. (A frustrating thing about published info like this: they often leave out what you need to check up on them or investigate further.)

Read, discuss

Optional (more practice):
SPSS handout for 7.1, t procedures
Exam 2: Take-home. Due next class day, Monday Nov. 19 (Day 37), 3pm, under my door or in my hand.
Quiz back Monday for sure.

What is the significance to Statistics of the Guinness Stout bottle?
~~~~~~~~~~~~~~~
Homework questions:  Day 35

MATCHED PAIRS t procedures: (get for free!)   Example by hand, robustness  Day 35
SPSS:  Analyze>Compare Means>Paired-Samples T Test (see handout).
        Data in parallel columns; SPSS subtracts the right column from the left column. You don't get to choose which way to subtract.
        CI level is under Options.
 or  Transform>Compute: let the Target variable be Difference and the Numeric expression be VarA - VarB. You can then use Difference to check for Normality and do one-sample procedures on it.
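The "free" matched-pairs procedure is just a one-sample t on the differences. A minimal Python sketch of that idea, with hypothetical numbers (not from the text or handout) and a hypothetical function name paired_t; it stops at the t statistic and df, since the P-value still comes from Table D or software:

```python
import math
import statistics

def paired_t(var_a, var_b):
    """Matched-pairs t: a one-sample t on Difference = VarA - VarB.
    Returns the differences, their mean, SD, t for H0: mean difference = 0, and df."""
    diff = [a - b for a, b in zip(var_a, var_b)]
    n = len(diff)
    dbar = statistics.mean(diff)
    s_d = statistics.stdev(diff)            # sample SD, divisor n - 1
    t = dbar / (s_d / math.sqrt(n))         # (mean difference - 0) / SE
    return diff, dbar, s_d, t, n - 1

# Hypothetical numbers, for illustration only
var_a = [12.0, 15.0, 11.0, 14.0, 13.0]
var_b = [10.0, 14.0, 11.0, 11.0, 12.0]
diff, dbar, s_d, t, df = paired_t(var_a, var_b)
print(diff, round(dbar, 3), round(s_d, 3), round(t, 3), df)
```

Subtracting in the other order (VarB - VarA) only flips the sign of the differences and of t.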

What if t is not suitable?
 Skewness: Try a log or other transformation, and work on the transformed data. (Sadly, CIs can't be transformed back, because $\mu_{\log(X)}$ is not equal to $\log(\mu_X)$; see last time.)
 Outliers or other nonnormality: Distribution-free/nonparametric procedures. Usually less power than distribution-based ones. (They use less information, duh!) Often based on binomial or similar models.
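A tiny numerical illustration of why the log-scale CI doesn't transform back, using hypothetical, strongly skewed values 1, 10, 100 (sample means stand in for the population means here):

```python
import math
import statistics

x = [1.0, 10.0, 100.0]   # hypothetical, strongly right-skewed values

mean_of_logs = statistics.mean([math.log10(v) for v in x])   # center of log10(X)
log_of_mean = math.log10(statistics.mean(x))                 # log10 of the mean of X

print(mean_of_logs, round(log_of_mean, 3))
# Back-transforming the mean of the logs gives 10 ** 1.0 = 10.0 (a geometric mean),
# not the ordinary mean 37.0
print(10 ** mean_of_logs, statistics.mean(x))
```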

Sign test (p. 465-8) is a nice "trick", that turns any paired sample situation into a binomial situation.
For each pair, "success" is that the item from Group A is bigger than the matched item from Group B.  If there are ties, just throw them away (like the flipped coin that balances on its edge).
The null hypothesis is always that the groups are the same, so it is just like a coin-flip, the prob. of success is 1/2 under H0. Then see how likely you are to get at least as many successes as you saw, using the binomial distribution.  That's the p-value, for the alternative  Ha that Group A is bigger on average than Group B.  More specifically, we're testing this:
$H_0$: (the median of $X_{\text{Group A}} - X_{\text{Group B}}$ is 0) $\iff$ ($P(X_{\text{Group A}} - X_{\text{Group B}} > 0) = .5$) $\iff$ ($p = .5$).
$H_a$: (the median is above 0) $\iff$ ($P(X_{\text{Group A}} - X_{\text{Group B}} > 0) > .5$) $\iff$ ($p > .5$).

Example:  We suspect that students living on campus for their first semester gain weight.  Poll 11 students, asking just the sign of their weight change:
Get these results:  +  +  +  0  +  -  +  +  +  -  +   (0 means no change). Throwing away the tie leaves 8 +'s and 2 -'s out of 10.
If there's no weight gain on average (median gain is 0), the number of +'s has a B(10, .5) distribution. The alternative is one-sided: median gain is above 0. Let X be B(10, .5). Then the P-value is P(X = 8, 9, or 10) = .0439 + .0098 + .0010 = .0547, from Table C in the book.
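The whole sign-test recipe (count signs, drop ties, take a binomial upper tail) fits in a few lines. A minimal Python sketch; the function name sign_test_upper is hypothetical, and it handles only the one-sided alternative p > .5 used in this example:

```python
from math import comb

def sign_test_upper(signs):
    """One-sided sign test, Ha: p > .5 (Group A tends to be bigger).
    signs: sequence of '+', '-', '0'; ties ('0') are thrown away."""
    plus = signs.count("+")
    n = plus + signs.count("-")          # ties dropped
    # P-value = P(X >= plus) for X ~ B(n, .5); each outcome has probability 1 / 2**n
    p_value = sum(comb(n, k) for k in range(plus, n + 1)) / 2**n
    return plus, n, p_value

# Weight-change signs for the 11 students in the example
signs = "+ + + 0 + - + + + - +".split()
plus, n, p = sign_test_upper(signs)
print(plus, n, round(p, 4))   # prints: 8 10 0.0547
```

The exact value is 56/1024 = .0546875, which agrees with the Table C sum above.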
SPSS:  Transform>Compute (pp. 8 & 9, first handout)
    CDF.BINOM(7, 10, .5) gives the probability that X is less than or equal to 7 in a B(10, .5) distribution. You will probably want to increase the number of digits after the decimal point (Decimals). To find P(X = 8, 9, or 10), subtract the SPSS number from 1.
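The same "1 minus the CDF" step, sketched in Python for anyone without SPSS at hand. The function cdf_binom is a hypothetical stand-in meant to return the same quantity as CDF.BINOM(q, n, p); it is not SPSS code:

```python
from math import comb

def cdf_binom(q, n, p):
    """P(X <= q) for X ~ B(n, p), the quantity SPSS's CDF.BINOM(q, n, p) returns."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(q + 1))

p_le_7 = cdf_binom(7, 10, 0.5)    # P(X <= 7)
p_value = 1 - p_le_7              # P(X = 8, 9, or 10)
print(round(p_le_7, 6), round(p_value, 6))
```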

     Disadvantage:  You're obviously throwing away a lot of information (how big the differences are).  The result is that the power to detect a difference--if there is one--is much less than that of a t-test, where the t is usable.
     The sign test can be extended to a single data set, where you test the median:  If a is the median, then in the population, half the observations will be above a, and half below. Each data point is then like a coin flip, above or below the median.  (Can you see how this could be extended to test for a particular value of the first quartile, for instance?)
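A minimal Python sketch of the single-sample version for a median, under the coin-flip reasoning above. The data are hypothetical, the function name median_sign_test is ours, and only the one-sided alternative (median above a) is handled:

```python
from math import comb

def median_sign_test(data, a):
    """One-sided sign test of H0: population median = a vs Ha: median > a.
    Under H0 each observation is above a with probability 1/2, like a coin flip;
    observations exactly equal to a are dropped, like ties."""
    above = sum(1 for x in data if x > a)
    n = sum(1 for x in data if x != a)
    # P-value = P(X >= above) for X ~ B(n, .5)
    return above, n, sum(comb(n, k) for k in range(above, n + 1)) / 2**n

# Hypothetical data (not from the text); is the median bigger than 5?
data = [3, 7, 8, 9, 12, 15, 4, 10, 11, 6, 5]
above, n, p = median_sign_test(data, 5)
print(above, n, round(p, 4))
```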
SPSS will do the sign test if you have the two "matched pair" variables.   (Be sure you have descriptive labels)
Analyze>Nonparametric Tests>2 Related Samples.  Get a box where you choose the pair (can't choose direction of subtraction).
Under Test Type, choose Sign.  Get counted results and two-sided P-value.
Start here Monday Yes
Sec. 7.2, Comparing two means: "Two-sample tests". Two SRSs, independent, from distinct populations. (Populations are normally distributed.)
Often--comparing means from an experiment with two treatments (usually control and "treatment"). Cf. p. 202.
                /--- Group 1, n1---- Treatment 1---\
              /                                    \
 Random asst.                                       Compare results
              \                                    /
               \--- Group 2, n2---- Treatment 2---/
To examine the difference of the two means, µ1 - µ2:
Theoretical assumption is normal populations. Back-to-back stemplots are good; boxplots will do.
We use the difference of the two x-bars, xbar1 - xbar2.
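  The standard deviation of xbar1 - xbar2 is calculated like the hypotenuse of a right triangle (Pythagorean Theorem), from the individual standard deviations:
  $$\sigma_{\bar x_1 - \bar x_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$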

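Then the "two-sample z-statistic"
  $$z = \frac{(\bar x_1 - \bar x_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
is N(0,1) (p. 488).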
But we don't know the population standard deviations! We need the standard error of the difference xbar1 - xbar2, and then we can proceed as before, more or less. As usual, we substitute sample standard deviations for population standard deviations, and our z's are replaced by t's:
  $$SE_{\text{diff}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

For testing, if H0 is "population means are equal", the "two-sample t-statistic" is
  $$t = \frac{\bar x_1 - \bar x_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

Unfortunately, this doesn't have an exact t-distribution, and its exact distribution is very hard to deal with; but if we "adjust" the degrees of freedom, a t-distribution is a good approximation.

For doing by hand: df = the smaller of (n1 - 1) and (n2 - 1).
This gives a "conservative" result: a slightly wider CI and slightly less significance than a "sharper" df would. If your conclusions hinge on the difference between this result and the computer result, they're too close for comfort anyway. Using Table D? Go to the next lower df if the one you want isn't given.

From a computer:  df = complicated formula on p. 498.  Produces non-integer degrees of freedom.  Very good approximation to the exact distribution, if both sample sizes are at least 5. Unsuitable for doing by hand.
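For reference, the usual software formula is the Welch-Satterthwaite approximation, shown below. We are assuming this is the formula meant on p. 498, so check it against the book before relying on it:

$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$

The numerator is $SE_{\text{diff}}^4$, and the result always lies between the conservative df and $n_1 + n_2 - 2$.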

Once we have xbar1 - xbar2, SEdiff and the df, our formulas follow the pattern of the earlier ones.
CI:  estimate ± t* · SE_estimate
    CI for µ1 - µ2, difference of means, is
      $$(\bar x_1 - \bar x_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Test:  H0: µ1 - µ2 = 0, same as µ1 = µ2, "no difference", always.
        Ha: µ1 - µ2 > 0, same as µ1 > µ2. Be careful with these, so that you know which direction you want.
    or Ha: µ1 - µ2 < 0, same as µ1 < µ2. Often we label our variables "1" and "2" so that we expect µ1 > µ2.
    or Ha: µ1 - µ2 ≠ 0, same as µ1 ≠ µ2 (not equal).
        Calculate t, find the P-value (approximate, conservative). Example by hand.
 You can check your by-hand work with the Excel Two-sample calculator.
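A minimal Python sketch of the by-hand steps, as another way to check your table: it returns the difference, SEdiff, t, the conservative df, and the non-integer computer df (the Welch-Satterthwaite formula, which we assume is the one on p. 498). The samples are hypothetical and the function name two_sample_t is ours. P-values and t* still come from Table D or software, since Python's standard library has no t distribution.

```python
import math
import statistics

def two_sample_t(x1, x2):
    """Unpooled two-sample t for H0: mu1 = mu2.
    Returns (difference, SE_diff, t, conservative df, Welch df)."""
    n1, n2 = len(x1), len(x2)
    v1 = statistics.variance(x1) / n1      # s1^2 / n1
    v2 = statistics.variance(x2) / n2      # s2^2 / n2
    diff = statistics.mean(x1) - statistics.mean(x2)
    se = math.sqrt(v1 + v2)                # "hypotenuse" of the two legs
    t = diff / se
    df_conservative = min(n1 - 1, n2 - 1)  # by-hand rule
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return diff, se, t, df_conservative, df_welch

# Hypothetical samples, for illustration only
group1 = [24.0, 26.0, 29.0, 31.0, 30.0]
group2 = [20.0, 22.0, 25.0, 23.0]
diff, se, t, df_c, df_w = two_sample_t(group1, group2)
print(diff, round(se, 4), round(t, 4), df_c, round(df_w, 2))
```

Note that the computer df (about 6.96 here) is larger than the conservative df of 3, which is why the by-hand answer is the more cautious one.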

Robust? Yes... p. 493. Outliers are bad, as before: use the same guidelines (p. 463) with n = n1 + n2.
      Large n's give robustness from the CLT.
   Equal sample sizes help: then the procedures are robust against non-normality, even more so if the populations have the same shape, down to n = 5 each.
             In doubt? Use the conservative df!

--SPSS will do our computations when we are given raw data. Next.


Sievers home  Math251-Fall07/Day2s36.htm  10pm   11/15/07
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.