MATH 251, Probability and Statistics I, Fall 2005, Wed. Nov. 16, Day 35 after class

Reading: 7.1 through p. 465 (one-sample, matched pairs, robustness). Error at the bottom of p. 461: "on moon days it is 1.50" should read 3.02. Also (p. 462), their stemplot is made from rounded data; on truncated data the "outliers" don't look so "out." Read "Inference for nonnormal populations" up to (not including) the Sign test (pp. 465-468). (Sign test and start of 7.2 next time.) Work on exam!
Hand in: 

Do: 7.8 ADD (CI by hand)
Do: 7.12 sales (test by hand)
A) Re-create the results on the SPSS handout. Do: the one-sample situation. Postpone: the matched pairs situation.
B) For the datasets on the handout, make stemplots by hand.
Do: a) Cola--data on handout. Comment on suitability of t-procedures.
Postpone: b) Matched pairs (full moon) data on p. 460 of the text. Make stemplots for the 3 variables (aggmoon, aggother, aggdiff). For diff, you need a -0 and a +0 stem. Compare to p. 462.

Do: 7.5 SPSS luck, CI. Notice the interesting distribution of the sample. What does the CI not tell you?
Do: 7.34, 7.35 SPSS blood phosphate, CI & test
Do: 7.37 SPSS radon detectors, test

Postpone: 7.31, 7.32 SPSS vit. C, test and CIs; matched pairs +. For 32b, you have to re-express the 5 "after" numbers as percents (e.g. Sample 1: 20/98 = 20.4%...) and then find a new CI for this data set.

Postpone: 7.39 SPSS C: Factory to Haiti, matched pairs. The answers weirdly assume you'll do Haiti - Factory, when it seems more natural to do Factory - Haiti, and the SPSS file is set up for Factory - Haiti. Do it the natural way.

Do: 7.50 motor insulation (logarithm)

SPSS handout for 7.1, t procedures
Exam 2: Takehome.  Due Monday Nov. 21 (Day 37), 1pm under my door or in my hand.
Quiz, 10:05 today

Will show Friday: SPSS Transform/Compute (first handout).
"Tables": CDF functions take a value x and give the probability of being less than or equal to x (like our book's Normal table).
             IDF functions take a probability p and give the value x such that the probability of being less than or equal to x is p. The SPSS help on these is sloppy and leaves off the "or equal to." That is irrelevant for continuous distributions but crucial for discrete ones.
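To see why "or equal to" matters only in the discrete case, here is a small Python sketch (not SPSS). The helpers `binom_cdf` and `binom_idf` are hypothetical names written for this illustration; they follow the CDF/IDF conventions described above, with IDF taken as the smallest x whose cumulative probability reaches p. `statistics.NormalDist` is from the Python standard library.

```python
from math import comb
from statistics import NormalDist


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(x: int, n: int, p: float) -> float:
    """CDF-style lookup: P(X <= x), with the 'or equal to' included."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))


def binom_idf(prob: float, n: int, p: float) -> int:
    """IDF-style lookup: smallest x such that P(X <= x) >= prob."""
    for x in range(n + 1):
        if binom_cdf(x, n, p) >= prob:
            return x
    return n


# Continuous: P(Z <= 1.96) and P(Z < 1.96) are the same number.
z = NormalDist()
print(round(z.cdf(1.96), 4))       # 0.975
print(round(z.inv_cdf(0.975), 2))  # 1.96

# Discrete, X ~ Binomial(10, 0.5): "<=" and "<" give different answers.
print(binom_cdf(7, 10, 0.5))       # P(X <= 7) = 0.9453125
print(binom_cdf(6, 10, 0.5))       # P(X < 7) = P(X <= 6) = 0.828125
print(binom_idf(0.9, 10, 0.5))     # 7
```

For the binomial, dropping "or equal to" shifts the answer by a whole P(X = 7) = 120/1024, which is the kind of error the sloppy help text invites.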

One-sample t procedures by hand: see Day 34. Milk bacteria (data also listed at the bottom of this page): 5370, 4890, 5100, 4500, 5260, 5150, 4900, 4760, 4700, 4870
SPSS: Analyze>Compare Means>One-Sample T Test: Test value = 4800.
           The P-value is labeled "Sig (2-tailed)"; divide it by 2 for a one-tailed test (if the observed mean is in the direction Ha predicts).
   For the CI: Analyze>Descriptive Statistics>Explore. Use the Statistics button to set the confidence level.
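As a check on the hand calculation, here is a minimal Python sketch of the one-sample t procedure on the milk bacteria data with test value 4800. The t* values (df = 9) are copied from a standard t table, and `one_tail_bracket` is a hypothetical helper that brackets the P-value the way we do with the table; it assumes the observed mean lies above the test value, as it does here.

```python
from math import sqrt
from statistics import mean, stdev

# Milk bacteria counts; test value mu0 = 4800, as in the SPSS setup above.
milk = [5370, 4890, 5100, 4500, 5260, 5150, 4900, 4760, 4700, 4870]
mu0 = 4800

n = len(milk)
xbar = mean(milk)        # sample mean
s = stdev(milk)          # sample SD, divides by n - 1
sem = s / sqrt(n)        # standard error of the mean
t = (xbar - mu0) / sem   # one-sample t statistic, df = n - 1 = 9

# Right-tail t* values for df = 9, copied from a standard t table.
T_STAR_DF9 = {0.10: 1.383, 0.05: 1.833, 0.025: 2.262}


def one_tail_bracket(t_obs, table):
    """Bounds (lower, upper) on P(T >= t_obs) using only tabled t* values."""
    lower, upper = 0.0, 1.0
    for tail, t_star in table.items():
        if t_obs >= t_star:
            upper = min(upper, tail)   # t_obs is past t*, so P <= tail
        else:
            lower = max(lower, tail)   # t_obs falls short of t*, so P > tail
    return lower, upper


low, high = one_tail_bracket(t, T_STAR_DF9)
print(f"xbar = {xbar}, s = {s:.2f}, SEM = {sem:.2f}, t = {t:.3f}")
# Two-sided P doubles the one-tail value (the reverse of halving "Sig (2-tailed)").
print(f"one-tail P in ({low}, {high}); two-tail P in ({2 * low}, {2 * high})")

# 95% CI: xbar +/- t* * SEM, with t* = 2.262 for df = 9.
margin = T_STAR_DF9[0.025] * sem
print(f"95% CI: ({xbar - margin:.1f}, {xbar + margin:.1f})")
```

Here t is about 1.767, between the tabled 1.383 and 1.833, so the one-tail P is between .05 and .10.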

Will show Friday: SPSS for matched pairs too.
MATCHED PAIRS t procedures (you get them for free!):
   before--after, left hand--right hand, Drug A vs. Drug B on the same person or on a matched pair (Sec. 3.1, pp. 207-8)
For each pair, find the difference in the observed values.  Then treat these differences as if they are "the" data set, from a normal population, and do One-sample t procedures.
Usually the null hypothesis will be "µ = 0", there is "no difference" between the treatments.
Example: wax paper sandwich bags. Is the wax layer the same inside and out?
25 bags: measure (wax outside - wax inside) for each, in pounds per square foot. n = 25
    Differences: xbar = .093, s = .723, SEM = .723/√25 = .723/5 = .1446
    H0: µ = 0 (mean difference is 0)
    Ha: µ ≠ 0 (there is a difference)
    t = (.093 - 0)/SEM = .093/.1446 = .643
    t is less than .685, which is the right-tail t* for probability .25 (d.f. = 24).
    Because the test is 2-sided, double the tail: .50. The P-value is > .50.
    No evidence for a difference.
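The matched-pairs recipe (take differences, then run a one-sample t on them) can be sketched in a few lines of Python. The helper names `differences` and `t_from_summary` are hypothetical; since only summary statistics are given for the wax bags, the demo starts from xbar, s, and n of the differences, and the .685 cutoff is the tabled t* (d.f. = 24) quoted above.

```python
from math import sqrt


def differences(first, second):
    """Matched pairs: one difference per pair (first - second)."""
    if len(first) != len(second):
        raise ValueError("matched pairs need lists of equal length")
    return [a - b for a, b in zip(first, second)]


def t_from_summary(xbar, s, n, mu0=0.0):
    """One-sample t statistic computed from summary statistics of the differences."""
    return (xbar - mu0) / (s / sqrt(n))


# Wax bags: n = 25 differences (outside - inside), xbar = .093, s = .723.
t = t_from_summary(0.093, 0.723, 25)
print(round(t, 3))  # 0.643

# Right-tail t* for probability .25 with d.f. = 24 is .685 (t table).
if t < 0.685:
    print("one tail > .25, so the two-sided P-value is > .50")
```

With raw pairs in hand, `differences(outside, inside)` would produce "the" data set that the one-sample procedure then uses.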

Done: Robustness of t-procedures: A confidence interval or significance test is called robust if the confidence level or P-value doesn't change very much when the assumptions of the procedure are violated. pp. 462-465.
t-procedures are quite robust against nonnormality, but sensitive to outliers. Look at the data. Need an SRS!
 Details:  n < 15: t is OK unless the data are clearly not normal, or there are outliers.
              n ≥ 15: t is OK unless there is strong skewness, or there are outliers.
              n ≥ 40 or so: t is OK even if there is skewness. (Outliers? I suggest trying with and without them and seeing what changes.)
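The "try with and without" suggestion can be sketched in Python as follows. The helper names are hypothetical, and the milk bacteria data (test value 4800) are used only as an illustration: the code drops the single value farthest from the mean, which is not the same as declaring it an outlier.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    return (mean(data) - mu0) / (stdev(data) / sqrt(len(data)))


def with_and_without(data, mu0):
    """Recompute t after dropping the single value farthest from the mean."""
    center = mean(data)
    suspect = max(data, key=lambda x: abs(x - center))
    reduced = list(data)
    reduced.remove(suspect)  # removes one copy only
    return suspect, one_sample_t(data, mu0), one_sample_t(reduced, mu0)


milk = [5370, 4890, 5100, 4500, 5260, 5150, 4900, 4760, 4700, 4870]
suspect, t_all, t_without = with_and_without(milk, 4800)
print(suspect, round(t_all, 3), round(t_without, 3))
```

On these numbers t moves from about 1.77 to about 2.61 when 4500 is dropped, a reminder of how much one value can matter when n = 10.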
Matched-pairs data (differences) are often more normal in shape than the separate variables: "oddness" is often the same for both items in a pair and disappears in subtraction. This is another reason why this is a nice experimental design. (Not true in the full-moon example.)

What if t is not suitable?
 Skewness: Try a log or other transformation, and work on the transformed data. (Sadly, CIs can't be transformed back, because µ of log(X) is not equal to log(µ of X).)
 Outliers or other nonnormality: Distribution-free (nonparametric) procedures. These usually have less power than distribution-based procedures. (They use less information, duh!) Often based on the binomial or similar models.
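Both remedies can be sketched in Python on the milk bacteria data, purely as an illustration. The first part checks that the mean of the logs differs from the log of the mean, which is why a CI for µ of log(X) cannot simply be back-transformed into a CI for µ of X. The second part previews a binomial-based procedure, the sign test from the reading; `sign_test_upper` is a hypothetical helper testing H0: median = mu0 against a one-sided Ha: median > mu0.

```python
from math import comb, log10
from statistics import mean

milk = [5370, 4890, 5100, 4500, 5260, 5150, 4900, 4760, 4700, 4870]

# Remedy 1 (skewness): work on log-transformed data.
mean_of_logs = mean([log10(x) for x in milk])
log_of_mean = log10(mean(milk))
# log is concave, so the mean of the logs falls below the log of the mean
# (unless all values are equal).
print(mean_of_logs, log_of_mean)


def sign_test_upper(data, mu0):
    """Sign test sketch: H0 median = mu0 vs Ha median > mu0.

    Values equal to mu0 are dropped; under H0 the count of values above
    mu0 is Binomial(n, 1/2), so P = P(X >= observed count)."""
    kept = [x for x in data if x != mu0]
    n = len(kept)
    above = sum(1 for x in kept if x > mu0)
    p_value = sum(comb(n, k) for k in range(above, n + 1)) / 2**n
    return above, n, p_value


# Remedy 2 (outliers/nonnormality): only the signs of x - 4800 are used.
print(sign_test_upper(milk, 4800))
```

Here 7 of 10 values exceed 4800, giving P = 176/1024, about .17. The test ignores how far above or below 4800 each value is, which is the "less information" behind its lower power.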


Math251-Fall05/Dayps35.htm  11am   11/16/05
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.
5370
4890
5100
4500
5260
5150
4900
4760
4700
4870