Math 151, Spring 2005, Day 22, Monday March 28

--SPSS sampling quirk.  SPSS gives you the same samples each time if you start from opening SPSS, because it starts from the same fixed "seed" each time (D&V p.217 bottom).  (Like always starting at line 20 of the random number table.)  From there, it seems to keep on reading from where it left off, each time you do a new analysis requiring random numbers.  How to get "different" numbers?
At the beginning of your session:  Do Transform> Random number seed>, make sure Random Seed is selected, and click OK.  This does the equivalent of closing your eyes and putting your finger on the random number table page, to start with.
(Without your clicking OK, it starts with seed 2,000,000.)  Now you (and others) will get a different sequence of random numbers/samples even though you open SPSS and do exactly the same thing.  You only need to do it once, at the start of each SPSS session.
  More here.
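To see why a fixed seed gives everyone the same "random" sample, here is a minimal sketch in Python (not SPSS). Python's generator is a different one from SPSS's, so the numbers will not match SPSS output; the function name draw_sample and the population and sample sizes are made up for illustration. Only the seed value 2,000,000 comes from the note above.

```python
import random

def draw_sample(seed, population_size=100, sample_size=5):
    """Draw a simple random sample of case numbers 1..population_size.

    `seed` fixes where the generator starts, like choosing a starting
    line in the random number table. seed=None asks Python to pick a
    starting point on its own.
    """
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)

# Two "sessions" that both start from the same fixed seed
# get exactly the same sample.
first_session = draw_sample(seed=2000000)
second_session = draw_sample(seed=2000000)
print(first_session == second_session)  # True

# With no fixed seed, each session starts somewhere different,
# so the samples will usually differ from session to session.
fresh_sample = draw_sample(seed=None)
print(fresh_sample)
```

Passing seed=None plays roughly the role of selecting Random Seed in the SPSS dialog: the starting point is no longer the same for everyone.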

 Sample exam 2 available (linked here) & outside my door. (Actual exam would be 5 pages, probably) Solutions outside my door and on reserve (soon.)

Exam 2 this Friday (Day 24, Apr.1).  Covers thru today's HW (but no more than Part III).  Let me know by Wed. if you need a special time to take the exam.
How much computational detail from part II?  You don't need to know the formula for the correlation coefficient, but you should be able to guess roughly the r from a scatterplot, and know and use the properties on pp. 121-2.  You will need to know, among other things:
    how to find b0 and b1 from the means, standard deviations, and r of the x- and y-values, and to give the formula for the regression line (like 17, p.154);
    how to graph the regression line on top of the scatterplot;
    how to find by hand the value that the line predicts for a particular x;
    how to identify and calculate the residual for a particular x-y point as its vertical distance from the line (negative if the point is below the line);
    how to identify and understand potential influential points.
You should know that the regression line goes through the point given by the two means, and that the regression line "rises" r standard deviations in y for each standard deviation increase in x (pp. 137-8); also that the regression line of "weight" on "height" is not the same line as the regression line of "height" on "weight".  You should be able to describe verbally the meaning of R² in the context of a data set.
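The regression facts above can be checked with a few lines of Python. This is only a sketch: it uses the standard relations b1 = r·(sy/sx) and b0 = ybar − b1·xbar, and the summary numbers (heights with mean 64 and s.d. 2.5, weights with mean 130 and s.d. 20, r = 0.5) are invented for illustration, not taken from the text.

```python
def regression_line(x_mean, x_sd, y_mean, y_sd, r):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    b1 = r * y_sd / x_sd          # slope: r standard deviations of y per s.d. of x
    b0 = y_mean - b1 * x_mean     # forces the line through (x_mean, y_mean)
    return b0, b1

def predict(b0, b1, x):
    """Value the line predicts for a particular x."""
    return b0 + b1 * x

def residual(b0, b1, x, y):
    """Observed y minus predicted y (negative if the point is below the line)."""
    return y - predict(b0, b1, x)

# Hypothetical summaries: x = height (inches), y = weight (pounds).
b0, b1 = regression_line(x_mean=64.0, x_sd=2.5, y_mean=130.0, y_sd=20.0, r=0.5)
print(b0, b1)                          # -126.0 4.0

print(predict(b0, b1, 64.0))           # 130.0, the line passes through the two means
print(predict(b0, b1, 66.5))           # 140.0, one s.d. up in x gives r s.d.'s up in y
print(residual(b0, b1, 66.0, 150.0))   # 12.0, the point lies above the line

# Regressing height on weight swaps the roles of x and y.
# Its slope is r * sx / sy, which is not 1 / b1, so it is a different line.
c0, c1 = regression_line(x_mean=130.0, x_sd=20.0, y_mean=64.0, y_sd=2.5, r=0.5)
print(c1, 1 / b1)                      # 0.0625 0.25
```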

Day 22 (Mon. March 28): Reading: D&V Ch 12, 13. Review part III p. 262.  AS13.  Bring questions; Parts II and III 
    Next, D&V Part IV: Ch. 14, Ch.15 thru p.  291 (then Ch. 18 &on.) ActivStats is very good for part IV--Ch11 shows Law of Large Numbers as D&V express it. Ch14, 15 correspond well with the text and present very good examples.
Hand in 
Chapter 13, p257ff.
 1,2,4,5,6,10,11,12 You did the "observational study" ones, and started the "experiment" ones. Finish these for those that are experiments,  add 17, 18
32 Shingles
35 Safety switch
36 Washing clothes

From Review part III, p. 263ff.
26 Laundry
34 Pubs

A. Do the Chart experiment-- ActivStats 13-3, first activity.  Save your data with your name on the file, remembering where you saved it.  Do the next two SPSS activities, on that page. (One error in the tutorials: Your files are NOT of type .txt; they are of type .dat.  Safest--use "all files" to locate them.)  Hand in the graphs you made, writing what results you see and whether you think they are "statistically significant".

B. REDO the assignment on the handout Using SPSS to find a Simple Random Sample, this time first doing Transform> Random number seed> (make sure Random Seed is selected, and click OK).  Also: Find the mean duration, for your sample and for the whole set.  Bring to class to pool your results.     More here on seeds.

= = = = = = = = = = = = = =
ActivStats, Ch.11 HW, ACT 1 and ACT 2
Chapter 14, p. 280ff
1 Roulette
Winter
Crash

9 Spinner
11 a Car repairs
13 a M&M's

Using independence:
11 b Car repairs
13 b M&M's
19 Champion bowler
15 Disjoint or indep?  Read p.290 top, with this.

Read, to discuss

Review Part III, p. 263 ff: 1 thru 17 odds, +12, 18

 

= = = = =
Chapter 14: Read p.283
#25, read answers in back. (a) should have 0.001, not 0.00 for the answer.

Optional

Homework questions? Day 21
Chapter 13: Experiment: Continue Day 21   Brief summary:   All about avoiding BIAS
Principles of designing a comparative experiment (p. 243)

Results:  Measure differences in the response variable for different treatments 
 "Statistically Significant" differences--too big to have plausibly occurred by chance

Block designs: (not "completely randomized")
(Randomized) Block design:  Sort experimental units into "Blocks" = groups homogeneous on potentially confounding variables:     Within each block, randomize the treatments. Compare results  within each block, then summarize all results.

Matched pairs is a special case of block design--each pair is a little "block":
Matched pairs: In an experiment, to compare control and experimental treatments (i.e. 2 levels):
   Sort experimental units into "matching" pairs.   One member of pair gets control, other gets experimental.
                Randomize which.  Compare within pair (find difference), then summarize all comparisons.
  Matched with self is common.  Eliminates extraneous variability.
     (Matching is also often used in observational studies)
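A small sketch of the matched-pairs recipe in Python: randomize which member of each pair gets the experimental treatment, then compare within each pair and summarize the differences. The function names, the unit labels, and the response numbers are all hypothetical, made up only to show the steps.

```python
import random

def assign_within_pairs(pairs, rng):
    """For each matched pair, randomly choose which member gets which treatment."""
    assignments = []
    for first, second in pairs:
        if rng.random() < 0.5:   # the "coin flip" that does the randomizing
            assignments.append({"control": first, "experimental": second})
        else:
            assignments.append({"control": second, "experimental": first})
    return assignments

def mean_difference(responses):
    """responses: one (control_response, experimental_response) tuple per pair.

    Compare within each pair first, then summarize all the differences.
    """
    differences = [experimental - control for control, experimental in responses]
    return sum(differences) / len(differences)

# Hypothetical matched pairs of experimental units.
pairs = [("unit A1", "unit A2"), ("unit B1", "unit B2"), ("unit C1", "unit C2")]
assignments = assign_within_pairs(pairs, random.Random())
for assignment in assignments:
    print(assignment)

# Hypothetical responses, in the order (control, experimental) for each pair.
responses = [(10, 12), (8, 11), (9, 9)]
print(mean_difference(responses))   # average of the differences 2, 3, 0
```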

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Part IV: Randomness and Probability (Why?)
 
We know that a sample from a population will not exactly represent the population.  If we take a random sample, the behavior of individual samples will not be predictable, but there will be a predictable pattern in many random samples from the same population.  Knowing that pattern is as good as we can do, and to describe it we need probability.
Recall (Day 19), p. 227:      Sample  (chosen from a)  Population

Numerical summary:            Statistic (Latin letter)     Parameter (Greek letter)
   
The actual value of the Statistic will vary, depending on the particular sample. "Sampling variability" = "Sampling error"
The Statistic "estimates" the Parameter.  We hope it is close to the parameter.  If we choose simple random samples, we can understand the pattern of values the statistic can take.
Some examples of  statistics:
    Height:   U.S. young women: pop. mean= 64.5", pop. s.d. 2.5"  (text p.66.  Caveat: rounded?)
                                               Math 151, Spring '01,  xbar = 64.2,     s = 3.75.
                                                                        Fall '01,   xbar = 65.01,    s = 3.22.
                                                                     Spring '02,  xbar = 64.53,    s = 2.91.
                                                                       Fall '02,    xbar = 63.89,     s = 2.48.
                                                                   Spring '03,  xbar = 64.98,    s = 3.29
                                                                     Spring '04,  xbar = 65.33,    s = 2.25
    Coin flip: Proportion of heads  p = 1/2 (?)       p-hat =  256/520 = .492  (combined data from many past classes)
    Thumbtack:  Proportion of point-up p =  (??)       p-hat =  441/691 = .6382  (one past class, Math 251)
  
Chance  behavior (a random phenomenon): Unpredictable in the short run,  predictable regular pattern in the long run.   
(Prof. Persi Diaconis (a table magician) can flip a coin so precisely it always comes up the way he wants.  His coin flipping is not a random phenomenon.  Mine is.)

"Probability" of a particular something happening: the proportion of times it would happen in a very long series of independent repetitions (trials) of the phenomenon: "long-run relative frequency".
    (independence:  outcome of one trial must not influence the outcome of any other.)

Law of Large Numbers (LLN):  The relative frequency of an outcome in repeated independent trials gets closer to its "true" relative frequency (its probability) as the number of trials increases.
  (But it may take a long time: Large Numbers of trials. Use  http://www.whfreeman.com/scc -- "Probability", 1 toss at a time--settles down slowly.)
(Another version of LLN says the mean from a sample of size n gets closer and closer to the true = "population" mean, as you take bigger samples (as n increases).  ActivStats presents this, 14-1, and we'll return to this soon.)
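If you want to watch the Law of Large Numbers settle down (slowly), here is a rough simulation sketch in Python. It assumes a fair coin, P(H) = .5, and uses Python's random number generator in place of real flips. The seed 151 is arbitrary, chosen only so the run can be repeated.

```python
import random

def running_proportion_of_heads(n_flips, seed=None):
    """Simulate n_flips fair-coin flips; return the proportion of heads after each flip."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for flips_so_far in range(1, n_flips + 1):
        if rng.random() < 0.5:     # call this outcome "heads"
            heads += 1
        proportions.append(heads / flips_so_far)
    return proportions

proportions = running_proportion_of_heads(100_000, seed=151)

# Early proportions can wander well away from .5; later ones stay close.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(proportions[n - 1], 4))
```

Notice that an early run of heads or tails is never "made up for" later; it just counts for less and less as the number of flips grows.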

Aberrations won't be compensated for; they will only be swamped out.  (Misconception of "law of averages.")

Probability Model:
A Random phenomenon,

    Sample space S:  set of all possible outcomes (no overlap of descriptions) (def. p. 284)
    Event:  any  set of outcomes (including one outcome, & even the set containing no outcomes)
    Probability model: S, and a way of assigning a probability to each event.
Sample space depends on what you want to know:
Phenomenon: Flip coin twice.
    S1 = {HH, HT, TH, TT}     S2 = {0, 1, 2} number of heads   S3 = {Y, N} both are heads?

Probability rules:  pp. 274-5, in words, then in notation.
For A an event in sample space S, P(A) is "the probability that A occurs".
    These rules are all true for proportions in the long run (probabilities), proportions of counts, and proportions of areas.
    1.  0 ≤ P(A) ≤ 1
    2. P(S) = 1
    3. For any event A, P(A does not occur) = 1 - P(A)
    4.  A and B are  disjoint if they have no outcomes in common (can't happen simultaneously.)
        If A and B are disjoint, their probabilities add:  P(A or B) = P(A) + P(B)

Pick one person from U.S. Pop. (Age 25 +)
Sample space       | No HS degree | HS only | 1-3 yrs College | 4 + yrs College |
Proportion in pop. |    18.3%     |  33.9%  |      24.8%      |      23.0%      |
Probability        |     .183     |  .339   |      .248       |      .230       |
P(No 4-year "degree") = ?
P( HS or less) = ?
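One way to check your answers to the two questions: a short Python sketch that applies Rule 3 (complement) and Rule 4 (addition for disjoint events) to the probabilities in the education table. The dictionary name and layout are just one convenient way to hold the table.

```python
# Probabilities from the education table (one person, age 25+).
education = {
    "No HS degree": 0.183,
    "HS only": 0.339,
    "1-3 yrs College": 0.248,
    "4+ yrs College": 0.230,
}

# Rule 2: the probabilities over the whole sample space add to 1.
total = sum(education.values())
print(round(total, 3))            # 1.0

# Rule 3 (complement): P(no 4-year degree) = 1 - P(4+ yrs College).
p_no_4yr_degree = 1 - education["4+ yrs College"]
print(round(p_no_4yr_degree, 3))  # 0.77

# Rule 4 (disjoint events add): a person can't be in both categories.
p_hs_or_less = education["No HS degree"] + education["HS only"]
print(round(p_hs_or_less, 3))     # 0.522
```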

Finite sample spaces (you can list the outcomes)
Assign a probability to each outcome (≥ 0) so they add to 1.   (Sometimes equal values "equally likely" make sense.)
    Prob. of an event is sum of prob's of its outcomes.

Phenomenon: Flip coin twice.
    S1 = {HH, HT, TH, TT}     S2 = {0, 1, 2} number of heads   S3 = {Y, N} both are heads?
Sample space  | HH | HT | TH | TT |
       Prob's | .25| .25| .25| .25|  P(tail followed by head)=?
Sample space  | 2  |    1    | 0  |  P(at least 1 tail)=?   P(1 of each) = ?
       Prob's | .25|   .50   | .25|  P(at least 1 Head)= ?  P(2 Heads) = ?
Sample space  | Y  |       N      |
       Prob's | .25|     .75      |
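The three tables above can be rebuilt from S1 with a little Python, which is also a way to check your answers to the questions. This is a sketch assuming the four outcomes in S1 are equally likely. The helper name p and the use of exact fractions are choices made here, not anything from the text.

```python
from fractions import Fraction
from itertools import product

# S1: every ordered result of two flips, each assumed equally likely.
outcomes = ["".join(flips) for flips in product("HT", repeat=2)]
prob = {outcome: Fraction(1, 4) for outcome in outcomes}
print(outcomes)   # ['HH', 'HT', 'TH', 'TT']

def p(event):
    """Probability of an event = sum of the probabilities of its outcomes."""
    return sum(prob[outcome] for outcome in outcomes if event(outcome))

p_tail_then_head = p(lambda o: o == "TH")
p_at_least_one_tail = p(lambda o: "T" in o)
p_at_least_one_head = p(lambda o: "H" in o)
p_one_of_each = p(lambda o: o.count("H") == 1)
p_two_heads = p(lambda o: o == "HH")
print(p_tail_then_head, p_at_least_one_tail, p_at_least_one_head,
      p_one_of_each, p_two_heads)   # 1/4 3/4 3/4 1/2 1/4

# S2: collapse S1 by number of heads to get the second table.
heads_distribution = {}
for outcome in outcomes:
    n_heads = outcome.count("H")
    heads_distribution[n_heads] = heads_distribution.get(n_heads, 0) + prob[outcome]
print(heads_distribution)
```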

Flipping-coin-twice was built from a simpler phenomenon; flipping coin once: P(H) = .5, P(T) = .5

Rule 5.  If A and B are two independent events, the probability that both A and B occur  is the product of the probabilities of the two events.  P(A and B) = P(A)×P(B), if (and only if) A and B are independent.
 
  Rule 5: Can be used to build probabilities for complex phenomena from simpler ones (Ch. 14); to check structure in existing sample space (Ch. 15.)

e.g. Pick 2 people at random from U.S. pop.  (Pop. is so big that it's hardly changed by removing the first person, so independence is OK.)
   P(First has 4+ yrs college, and 2nd didn't graduate HS) = .230×.183 = .042
   P(First didn't graduate HS, and 2nd has 4+ yrs college) = .183×.230 = .042
   P(one didn't graduate HS, and the other has 4+ yrs college) = .042+.042 = .084
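The same two-person calculation as a Python sketch, using the table values P(4+ yrs College) = .230 and P(No HS degree) = .183 and assuming the two picks are independent. Rule 5 multiplies within each ordering, and Rule 4 adds the two orderings because they are disjoint.

```python
# Probabilities for one randomly chosen person, from the education table.
p_college = 0.230   # 4+ yrs College
p_no_hs = 0.183     # No HS degree

# Rule 5: independent events multiply.
p_college_then_no_hs = p_college * p_no_hs
p_no_hs_then_college = p_no_hs * p_college

# Rule 4: the two orderings can't both happen, so their probabilities add.
p_one_of_each = p_college_then_no_hs + p_no_hs_then_college

print(round(p_college_then_no_hs, 3))   # 0.042
print(round(p_no_hs_then_college, 3))   # 0.042
print(round(p_one_of_each, 3))          # 0.084
```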


Sievers home  Math151-Sp05/Days22.htm 4pm 3/26/05
This page belongs to Sally Sievers who is solely responsible for its content. Please see our statement of responsibility.