\documentclass[landscape]{slides}
\usepackage { hyperref}
\usepackage {color,amsmath}
% amsfonts, amsmath, amsthm, amssymb}
\usepackage {graphicx}
% \usepackage{natbib}
\oddsidemargin 0in
\evensidemargin 0in
\topmargin -0.2in
\topskip 0pt
\footskip 20pt
\textheight 6.5in
\textwidth 8.5in
\newcommand{\EE}{\hbox{I\kern-.1667em\hbox{E}}}
\newcommand{\card}{\mbox{\#}}
\newcommand{\diam}{{\rm diam}}
\newcommand{\sgn}{{\rm sgn}}
\newcommand{\med}{{\rm med}}
\newcommand{\MAD}{{\rm MAD}}
\newcommand{\SD}{{\rm SD}}
\newcommand{\Prob}{{\bf P}}
\newcommand{\cGbar}{{\bar{{\cal G}}}}
\newcommand{\RR}{{\mbox{RR}}}
\newcommand{\cA}{{\cal A}}
\newcommand{\cB}{{\cal B}}
\newcommand{\cC}{{\cal C}}
\newcommand{\cD}{{\cal D}}
\newcommand{\cF}{{\cal F}}
\newcommand{\cG}{{\cal G}}
\newcommand{\cH}{{\cal H}}
\newcommand{\cI}{{\cal I}}
\newcommand{\cJ}{{\cal J}}
\newcommand{\cK}{{\cal K}}
\newcommand{\cL}{{\cal L}}
\newcommand{\cM}{{\cal M}}
\newcommand{\cN}{{\cal N}}
\newcommand{\cP}{{\cal P}}
\newcommand{\cQ}{{\cal Q}}
\newcommand{\cR}{{\cal R}}
\newcommand{\cS}{{\cal S}}
\newcommand{\cT}{{\cal T}}
\newcommand{\cU}{{\cal U}}
\newcommand{\cV}{{\cal V}}
\newcommand{\cW}{{\cal W}}
\newcommand{\cX}{{\cal X}}
\newcommand{\cY}{{\cal Y}}
\newcommand{\cZ}{{\cal Z}}
\newcommand{\bfC}{{\bf C}}
\newcommand{\bfT}{{\bf T}}
\newcommand{\bfQ}{{\bf Q}}
\newcommand{\bfone}{{\bf 1}}
\def\Real{\hbox{I\kern-.17em\hbox{R}}}
\def\Prob{\hbox{I\kern-.17em\hbox{P}}}
\def\Expect{\hbox{I\kern-.17em\hbox{E}}}
\newcommand{\bfR}{\Real}
\newcommand{\bfRn}{\bfR^n}
\newcommand{\bfRN}{\bfR^N}
\newcommand{\bfRm}{\bfR^m}
\def\bfN{\hbox{I\kern-.17em\hbox{N}}}
\newcommand{\bfZ}{{\bf Z}}
\newcommand{\bfX}{{\bf X}}
\newcommand{\bfx}{{\bf x}}
\newcommand{\bfs}{{\bf s}}
\newcommand{\bfe}{{\bf e}}
\newcommand{\bfhats}{{\bf \hat{s}}}
\newcommand{\bfex}{{\langle \bfe , x \rangle }}
\newcommand{\gbar}{{\bar{g}}}
\newcommand{\Fhatn}{{\hat{F}_n}}
\newcommand{\beq}{\begin{equation}}
\newcommand{\eeq}{\end{equation}}
\newcommand{\Tau}{{\bf T}}
\newcommand{\Eta}{{\cal E}}
\newcommand{\sech}{{\rm sech}}
\newcommand{\ip}[2]{{\langle #1 , #2 \rangle }}
%\newcommand{\choose}[2]{{{\left ( \begin{array}{c} #1\\#2 \end{array} \right )}}}
\newcommand{\linSpan}{{\rm span}}
\newcommand{\dom}{{\rm dom}}
\newcommand{\gto}{{\rm G\^{a}teaux }}
\newcommand{\IF}{{\rm IF}}
\newcommand{\supp}{{\rm supp}}
\newcommand{\Pranki}{{P_{(i)}}}
\newcommand{\Prankj}{{P_{(j)}}}
\newcommand{\Prank}[1]{{P_{(#1)}}}
\newcommand{\Hrank}[1]{{H_{(#1)}}}
\newcommand{\Hranki}{{H_{(i)}}}
\newcommand{\Var}{{\bf Var}}
\newcommand{\Bias}{{\bf Bias}}
\newcommand{\jackProb}[1]{{\Prob_{(#1)}}}
\newcommand{\bootProb}{{\Prob^*}}
\newcommand{\rightWarrow}{{\begin{array}{c} {} \\ \rightarrow \\ W \end{array}}}
\newcommand{\Binomial}{{\mbox{ Binomial}}}
\newcommand{\Bernoulli}{{\mbox{ Bernoulli}}}
\newcommand{\heading}[1]{\begin{center}{\large #1}\end{center}}
\renewcommand {\thefootnote} {\fnsymbol{footnote}}
\newcommand{\framed}[1]{%
\begin{center}
\fbox{
\begin{minipage}{6.2in}
#1
\end{minipage}
}
\end{center}
}
\begin{document}
\pagestyle{headings}
\definecolor{one}{rgb}{0.0, 0.6, 0}
\definecolor{two}{rgb}{0.102, 0.153, 0.204}
%----------------------------------------------------------------------------------
\begin{slide}
%\colorbox{blue}{color}
\begin {center}
\textcolor{blue}{\large Statistics 240: Nonparametric and Robust Methods}
\vspace*{0.7in}{Philip B. Stark\\
Department of Statistics\\
University of California, Berkeley\\
{\tt statistics.berkeley.edu/$\sim$stark}
}
ROUGH DRAFT NOTES---WORK IN PROGRESS!
Last edited 10 November 2010
\end{center}
\end{slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Course Website:}}
\url{http://statistics.berkeley.edu/~stark/Teach/S240/F10}
\end{slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Example: Effect of treatment in a randomized controlled experiment}}
11 pairs of rats, each pair from the same litter.
Randomly---by coin tosses---put one of each pair into
``enriched'' environment; other sib gets ``normal'' environment.
After 65 days, measure cortical mass (mg).
{\small
\begin{tabular}{lrrrrrrrrrrr}
treatment & 689& 656& 668& 660& 679& 663& 664& 647& 694& 633& 653 \cr
control & 657& 623& 652& 654& 658& 646& 600& 640& 605& 635& 642 \cr
\hline
difference & 32& 33& 16& 6& 21& 17& 64& 7& 89& -2& 11
\end{tabular}
}
{\textcolor{one}{How should we analyze the data?}}
{\tiny
(Cartoon of \cite{rosenzweigEtal72}. See also \cite{bennettEtal69} and
\cite[pp.~498ff]{freedmanEtal07}.
The experiment had 3 levels, not 2, and there were several trials.)
}
\end {slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Informal Hypotheses}}
{\textcolor{one}{Null hypothesis:}} treatment has ``no effect.''
{\textcolor{one}{Alternative hypothesis:}} treatment increases cortical mass.
Suggests 1-sided test for an increase.
\end{slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Test contenders}}
\begin{itemize}
\item 2-sample Student $t$-test:
$$
\frac{\mbox{mean(treatment) - mean(control)}}
{\mbox{pooled estimate of SD of difference of means}}
$$
\item 1-sample Student $t$-test on the differences:
$$
\frac{\mbox{mean(differences)}}{\mbox{SD(differences)}/\sqrt{11}}
$$
Better, since littermates are presumably more homogeneous.
\item Permutation test using $t$-statistic of differences:
same statistic, different way to calculate $P$-value.
Even better?
\end{itemize}
\end{slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Strong null hypothesis}}
{\textcolor{one}{Treatment has no effect whatsoever---as if cortical mass were
assigned to each rat before the randomization.}}
Then equally likely that the rat with the heavier cortex will be assigned
to treatment or to control, independently across littermate pairs.
Gives $2^{11} = 2,048$ equally likely possibilities:
{\small
\begin{tabular}{lrrrrrrrrrrr}
difference & $\pm$32& $\pm$33& $\pm$16& $\pm$6& $\pm$21& $\pm$17&
$\pm$64& $\pm$7& $\pm$89& $\pm$2& $\pm$11
\end{tabular}
}
For example, just as likely to observe original differences as
{\small
\begin{tabular}{lrrrrrrrrrrr}
difference & -32& -33& -16& -6& -21& -17& -64& -7& -89& -2& -11
\end{tabular}
}
\end{slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Weak null hypothesis}}
{\textcolor{one}{On average across pairs, treatment makes no difference.}}
\end {slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Alternatives}}
{\textcolor{one}{Individual's response depends only on that individual's assignment}}
Special cases: shift, scale, etc.
{\textcolor{one}{Interactions/Interference: my response could depend on whether you are assigned to treatment or control.}}
\end {slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Assumptions of the tests}}
\begin{itemize}
\item 2-sample $t$-test:
{\textcolor{one}{masses are iid sample from normal distribution,
same unknown variance, same unknown mean.}}
Tests weak null hypothesis (plus normality, independence, non-interference, etc.).
\item 1-sample $t$-test on the differences:
{\textcolor{one}{mass differences are iid sample from normal
distribution, unknown variance, zero mean.}}
Tests weak null hypothesis (plus normality, independence, non-interference, etc.)
\item Permutation test:
{\textcolor{one}{Randomization fair, independent across pairs.}}
Tests strong null hypothesis.
\end{itemize}
Assumptions of the permutation test are true by design: That's how treatment
was assigned.
\end {slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Student $t$-test calculations}}
Mean of differences: 26.73mg \\
Sample SD of differences: 27.33mg \\
$t$-statistic: $26.73/(27.33/\sqrt{11}) = 3.24$.\\[1ex]
$P$-value for 1-sided $t$-test: 0.0044
{\textcolor{one}{Why do cortical weights have normal distribution?}}
{\textcolor{one}{Why is variance of the difference between treatment and control
the same for different litters?}}
{\textcolor{one}{Treatment and control are {\em dependent\/} because assigning
a rat to treatment excludes it from the control group, and vice versa.}}
{\textcolor{one}{Does $P$-value depend on assuming differences
are iid sample from a normal distribution? If we reject the null, is that because
there is a treatment effect, or because the other assumptions are wrong?}}
\end {slide}
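%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{\sc Checking the $t$-test in R}}
A quick R check of the calculations above; the vectors {\tt treat} and {\tt control}
hold the masses from the table:
{\small
\begin{verbatim}
treat   <- c(689, 656, 668, 660, 679, 663, 664, 647, 694, 633, 653)
control <- c(657, 623, 652, 654, 658, 646, 600, 640, 605, 635, 642)
d <- treat - control
mean(d)    # 26.73
sd(d)      # 27.33
t.test(d, alternative="greater")  # one-sided, one-sample t-test
\end{verbatim}
}
\end{slide}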
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Permutation $t$-test calculations}}
Could enumerate all $2^{11} = 2,048$ equally likely possibilities.
Calculate $t$-statistic for each.
$P$-value is
$$
P = \frac{\mbox{number of possibilities with $t \ge 3.24$}}{\mbox{2,048}}
$$
(For mean instead of $t$, would be $2/2,048 = 0.00098$.)
For more pairs, impractical to enumerate, but can simulate:
Assign a random sign to each difference. \\
Compute $t$-statistic \\
Repeat 100,000 times
$$
P \approx \frac{\mbox{number of simulations with $t \ge 3.24$}}{\mbox{100,000}}
$$
\end {slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Calculations}}
{\small
\begin{verbatim}
simPermuTP <- function(z, iter) {
# P.B. Stark, statistics.berkeley.edu/~stark 5/14/07
# simulated P-value for 1-sided 1-sample t-test under the
# randomization model.
n <- length(z)
ts <- mean(z)/(sd(z)/sqrt(n)) # observed t statistic
sum(replicate(iter, {zp <- z*(2*floor(runif(n)+0.5)-1); # random +/- signs
tst <- mean(zp)/(sd(zp)/sqrt(n));
(tst >= ts)
}
)
)/iter
}
diffr <- c(32, 33, 16, 6, 21, 17, 64, 7, 89, -2, 11)
simPermuTP(diffr, 100000)
0.0011
\end{verbatim}
(versus 0.0044 for Student's t distribution)
}
\end{slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Other tests: sign test, Wilcoxon signed-rank test}}
{\textcolor{one}{Sign test:}}
Count pairs where treated rat has heavier cortex, i.e., where
difference is positive.
Under strong null, distribution of the number of positive differences
is Binomial(11, 1/2). Like number of heads in 11 independent tosses
of a fair coin. (Assumes no ties w/i pairs.)
$P$-value is chance of 10 or more heads in 11 tosses of a fair coin: 0.0059.
Only uses signs of differences, not information that only the smallest absolute
difference was negative.
{\textcolor{one}{Wilcoxon signed-rank test}} uses information about the
ordering of the
differences: rank the absolute values of the differences, then give them
the observed signs and sum them. Null distribution: assign signs at random
and sum.
\end{slide}
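%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{\sc Computing the sign and signed-rank tests in R}}
Both $P$-values are easy to compute in R; a minimal sketch, with {\tt d} holding the
differences from the table:
{\small
\begin{verbatim}
d <- c(32, 33, 16, 6, 21, 17, 64, 7, 89, -2, 11)
# sign test: chance of 10 or more positive differences
pbinom(9, size=11, prob=0.5, lower.tail=FALSE)   # 0.0059
# Wilcoxon signed-rank test, one-sided
wilcox.test(d, alternative="greater")
\end{verbatim}
}
\end{slide}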
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Still more tests, for other alternatives}}
All the tests we've seen here are sensitive to {\em shifts\/}--the alternative
hypothesis is that treatment
increases response (cortical mass).
There are also nonparametric tests that are sensitive to other
treatment effects, e.g., treatment increases the variability of the
response.
And there are tests for whether treatment has any effect at all on
the distribution of the responses.
You can design a test statistic to be sensitive to any change that
interests you, then use the permutation distribution to get a $P$-value
(and simulation to approximate that $P$-value).
\end{slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Silliness}}
Treat ordinal data (e.g., Likert scale) as if measured on a linear scale;
use Student $t$-test.
Maybe not so silly for large samples$\ldots$
$t$-test asymptotically distribution-free.
How big is big?
\end{slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Back to Rosenzweig et al.}}
Actually had 3 treatments: enriched, standard, deprived.
Randomized 3 rats per litter into the 3 treatments, independently across
$n$ litters.
{\textcolor{one}{How should we analyze these data?}}
\end{slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Test contenders}}
$n$ litters, $s$ treatments (sibs per litter).
\begin{itemize}
\item ANOVA--the $F$-test:
$$
F = \frac{\mbox{BSS}/(s-1)}{\mbox{WSS}/(ns-s)}
$$
\item Permutation $F$-test: use permutation
distribution instead of $F$ distribution to get $P$-value.
\item Friedman test: Rank within litters. Mean rank for treatment $i$
is $\bar{R}_i$.
$$
Q = \frac{12n}{s(s+1)} \sum_{i=1}^s \left ( \bar{R}_i - \frac{s+1}{2} \right )^2.
$$
$P$-value from permutation distribution.
\end{itemize}
\end{slide}
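%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{\sc Computing the contenders in R}}
A sketch, assuming the masses are stored in an $n \times 3$ matrix {\tt mass} with one
row per litter and one column per treatment (a hypothetical layout, not the original data):
{\small
\begin{verbatim}
friedman.test(mass)   # rows = blocks (litters), columns = treatments
# classical one-way ANOVA F-test, ignoring the litter structure:
group <- gl(ncol(mass), nrow(mass))   # treatment labels, column-major
anova(lm(as.vector(mass) ~ group))
\end{verbatim}
}
\end{slide}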
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Strong null hypothesis}}
{\textcolor{one}{Treatment has no effect whatsoever---as if cortical mass were
assigned to each rat before the randomization.}}
Then equally likely that each littermate is assigned to each treatment,
independently across litters.
There are $3! = 6$ assignments of each triple to treatments.
Thus, {\textcolor{one}{$6^n$ equally likely assignments across all litters.}}
For 11 litters, that's 362,797,056 possibilities.
\end{slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Weak null hypothesis}}
{\textcolor{one}{The average cortical weights for the three treatment groups are equal.
On average across triples, treatment makes no difference.}}
\end {slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Assumptions of the tests}}
\begin{itemize}
\item $F$-test:
{\textcolor{one}{masses are iid sample from normal distribution,
same unknown variance, same unknown mean for all litters and treatments.}}
Tests weak null hypothesis.
\item Permutation $F$-test:
{\textcolor{one}{Randomization was as advertised: fair, independent
across triples.}}
Tests strong null hypothesis.
\item Friedman test:
{\textcolor{one}{Ditto}.}
\end{itemize}
Assumptions of the permutation test and Friedman test are true by design:
that's how treatment was assigned.
Friedman test statistic has $\chi^2$ distribution asymptotically. Ties are a complication.
\end {slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc $F$-test assumptions--reasonable?}}
{\textcolor{one}{Why do cortical weights have normal distribution for each
litter and for each treatment?}}
{\textcolor{one}{Why is the variance of cortical weights the same for different
litters?}}
{\textcolor{one}{Why is the variance of cortical weights the same for
different treatments?}}
\end {slide}
%----------------------------------------------------------------------------------
%\begin {slide}
%{\textcolor{blue}{\sc Coding the permutation $F$-test}}
%
%
%
%\end {slide}
%
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc Is $F$ a good statistic for this alternative?}}
{\textcolor{one}{$F$ (and Friedman statistic) sensitive to differences among the
mean responses for each treatment group, no matter what pattern the differences
have.}}
But the treatments and the responses can be ordered: we hypothesize that
more stimulation produces greater cortical mass.
\begin{tabular}{lclcl}
deprived & $\Longrightarrow$ & normal & $\Longrightarrow$ & enriched \cr
low mass & $\Longrightarrow$ & medium mass & $\Longrightarrow$ & high mass
\end{tabular}
{\textcolor{one}{Can we use that to make a more sensitive test?}}
\end {slide}
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc A test against an ordered alternative}}
{\textcolor{one}{Within each litter triple, count pairs of responses
that are ``in order.'' Sum across litters.}}
E.g., if one triple had cortical masses
\begin{tabular}{l|r}
deprived & 640 \cr
normal & 660 \cr
enriched & 650 \\
\end{tabular}
that would contribute 2 to the sum: $660 \ge 640$ and $650 \ge 640$, but $650 < 660$.
Each litter triple contributes between 0 and 3 to the overall sum.
Null distribution for the test based on the permutation distribution: 6
equally likely assignments per litter, independent across litters.
\end {slide}
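%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{\sc Simulating the ordered-alternative test in R}}
A sketch of the test, again assuming a hypothetical $n \times 3$ matrix {\tt mass} with
columns in the order deprived, normal, enriched:
{\small
\begin{verbatim}
# count within-litter pairs that are "in order"
inOrder <- function(m) {
    sum(m[,2] >= m[,1]) + sum(m[,3] >= m[,1]) + sum(m[,3] >= m[,2])
}
obs <- inOrder(mass)
# permutation null: shuffle each litter's triple independently
sim <- replicate(100000, inOrder(t(apply(mass, 1, sample))))
mean(sim >= obs)   # approximate P-value
\end{verbatim}
}
\end{slide}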
%----------------------------------------------------------------------------------
\begin {slide}
{\textcolor{blue}{\sc A different test against an ordered alternative}}
{\textcolor{one}{Within each litter triple, add differences
that are ``in order.'' Sum across litters.}}
E.g., if one triple had cortical masses
\begin{tabular}{l|r}
deprived & 640 \cr
normal & 660 \cr
enriched & 650 \\
\end{tabular}
that would contribute 30 to the sum: $660 - 640 = 20$ and $650 - 640 = 10$, but $650 < 660$,
so that pair contributes zero.
Each litter triple contributes between 0 and $2\times{\mbox{ range }}$ to the sum.
Null distribution for the test based on the permutation distribution: 6
equally likely assignments per litter, independent across litters.
\end {slide}
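%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{\sc Simulating the second ordered-alternative test in R}}
Only the test statistic changes from the previous sketch; {\tt mass} is the same
hypothetical $n \times 3$ matrix:
{\small
\begin{verbatim}
# sum the within-litter differences that are "in order"
inOrderSum <- function(m) {
    sum(pmax(m[,2]-m[,1], 0) + pmax(m[,3]-m[,1], 0) + pmax(m[,3]-m[,2], 0))
}
obs <- inOrderSum(mass)
sim <- replicate(100000, inOrderSum(t(apply(mass, 1, sample))))
mean(sim >= obs)   # approximate P-value
\end{verbatim}
}
\end{slide}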
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{\sc Quick overview of nonparametrics, robustness}}\\
{\textcolor{one}{Parameters: related notions}}
\begin{itemize}
\item Constants that index a family of functions--e.g., the normal curve
depends on $\mu$ and $\sigma$ ($f(x) =
(2 \pi)^{-1/2} \sigma^{-1} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$)
\item A property of a probability distribution, e.g., 2nd moment, a percentile, etc.
\end{itemize}
\end{slide}
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{one}{Parametric statistics:}} assume a functional form for the
probability distribution of the observations; worry perhaps about some parameters
in that function.
{\textcolor{one}{Non-parametric statistics:}} fewer, weaker assumptions about the
probability distribution. E.g., randomization model, or observations are iid.
{\textcolor{one}{Density estimation, nonparametric regression:}}
Infinitely many parameters. Requires regularity assumptions to make inferences.
Plus iid or something like it.
{\textcolor{one}{Semiparametrics:}} Underlying functional form unknown, but relationship
between different groups is parametric. E.g., Cox proportional hazards model.
{\textcolor{one}{Robust statistics:}} assume a functional form for the probability
distribution, but worry about whether the procedure is sensitive to ``small'' departures
from that assumed form.
\end{slide}
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{Groups}}
A {\em group\/} is an ordered pair $(\cG, \times)$, where $\cG$ is
a collection of objects (the elements of the group) and $\times$ is a mapping
from $\cG \bigotimes \cG$ onto $\cG$,
\begin{eqnarray*}
\times : & \cG \bigotimes \cG & \rightarrow \cG \\
& (a, b) & \mapsto a \times b,
\end{eqnarray*}
satisfying the following axioms:
\begin{enumerate}
\item $\exists e \in \cG$ s.t. $\forall a \in \cG$, $e \times a = a$.
The element $e$ is called the {\em identity\/}.
\item For each $a \in \cG$, $\exists a^{-1} \in \cG$ s.t. $a^{-1}\times a = e$.
(Every element has an inverse.)
\item If $a, b, c \in \cG$, then $a \times (b \times c) = (a \times b)\times c$.
(The group operation is associative.)
\end{enumerate}
\end{slide}
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{Abelian groups}}
If, in addition, for every $a, b \in \cG$, $a \times b = b \times a$ (if the group
operation commutes), we say that $(\cG, \times)$ is an {\em Abelian group\/}
or {\em commutative group\/}.
\end{slide}
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{The permutation group}}
Consider a collection of $n$ objects, numbered $1$ to $n$.
A {\em permutation\/} is an ordering of the objects.
We can represent the permutation as a vector.
The $k$th component of the vector is the number of the object
that is $k$th in the ordering.
For instance, if we have $5$ objects, the permutation
\beq
(1, 2, 3, 4, 5)
\eeq
represents the objects in their numbered order, while
\beq
(1, 3, 4, 5, 2)
\eeq
is the permutation that has item~1 first, item~3 second,
item~4 third, item~5 fourth, and item~2 fifth.
A permutation can also be represented as a permutation matrix: an $n \times n$
matrix of zeros and ones with exactly one 1 in each row and in each column.
Composing permutations corresponds to multiplying their matrices, so
associativity of the group operation follows from the associativity of
matrix multiplication.
\end{slide}
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{The permutation group is not Abelian}}
For instance, consider the permutation group on 3 objects.
Let $\pi_1 \equiv (2, 1, 3)$ and $\pi_2 \equiv (1, 3, 2)$.
Then $\pi_1 \pi_2 (1, 2, 3) = (3, 1, 2)$, but $\pi_2 \pi_1 (1, 2, 3) = (2, 3, 1)$.
\end{slide}
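%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{\sc The same example in R}}
Representing each permutation as an index vector and composing by subscripting
reproduces the computation above:
{\small
\begin{verbatim}
p1 <- c(2, 1, 3)   # swap the first two items
p2 <- c(1, 3, 2)   # swap the last two items
v  <- c(1, 2, 3)
v[p2][p1]   # pi_1 pi_2: apply p2, then p1; gives 3 1 2
v[p1][p2]   # pi_2 pi_1: apply p1, then p2; gives 2 3 1
\end{verbatim}
}
\end{slide}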
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{Simulation: pseudo-random number generation}}
Most computers cannot generate truly random numbers, although
there is special equipment that can (usually, these rely on
a physical source of ``noise,'' such as a resistor or a
radiation detector).
Most so-called random numbers generated by computers are really
``pseudo-random'' numbers, sequences generated by a software algorithm
called a pseudo-random number generator (PRNG)
from a starting point, called a {\em seed\/}.
Pseudo-random numbers behave much like random numbers for many purposes.
The seed of a pseudo-random number generator can be thought of as the initial
state of the algorithm.
Each time the algorithm produces a number, it alters its state---deterministically.
If you start a given algorithm from the same seed, you will get
the same sequence of pseudo-random numbers.
\end{slide}
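%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{\sc Seeds and reproducibility in R}}
For example, in R the seed determines the entire pseudo-random sequence
(the seed value here is arbitrary):
{\small
\begin{verbatim}
set.seed(20101110)
runif(3)          # three pseudo-random numbers
set.seed(20101110)
runif(3)          # the same three numbers again
\end{verbatim}
}
\end{slide}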
%----------------------------------------------------------------------------------
\begin{slide}
Each pseudo-random number generator has only finitely many states.
Eventually---after the {\em period\/} of the generator---the generator
gets back to its initial state and the sequence repeats.
If the state of the PRNG is $n$ bits long, the period of the PRNG is at most
$2^n$---but can be substantially shorter, depending on the algorithm.
Better generators have more states and longer periods, but that comes
at a price: speed.
There is a tradeoff between the computational efficiency of a pseudo-random
number generator and the difficulty of telling that its output is not really
random (measured, for example, by the number of bits one must examine).
\end{slide}
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{Evaluating PRNGs}}
See \url{http://csrc.nist.gov/rng/} for a suite of tests of pseudo-random number
generators.
Tests can be based on statistics such as the number of zero and one bits in a
block or sequence, the number of runs in sequences of differing lengths,
the length of the longest run, spectral properties, compressibility (the less
random a sequence is, the easier it is to compress), and so on.
\end{slide}
%----------------------------------------------------------------------------------
\begin{slide}
You should check which PRNG is used by any software package you rely on
for simulations.
Linear Congruential Generators (LCGs) have an integer state that evolves as
$$ x_i = (a x_{i-1} + b) \mod m,$$
with output $x_i/(m-1)$, so the values lie in $[0, 1]$.
They used to be popular but are best avoided.
(They tend to have a short period, and the sequences have underlying
regularity that can spoil performance for many purposes.
For instance, if the LCG is used to generate $n$-dimensional points, those points
lie on at most $m^{1/n}$ hyperplanes in $\bfR^n$.)
For statistical simulations, a particularly good, efficient
pseudo-random number generator
is the Mersenne Twister.
The state of the Mersenne Twister is a vector of 624 32-bit integers,
together with an index into that vector.
It has a period of $2^{19937}-1$, which is on the order of $10^{6001}$.
It is implemented in R (it's the default), Python, Perl, and many other languages.
For cryptography, a higher level of randomness is needed than for most
statistical simulations.
\end{slide}
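%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{\sc A toy LCG in R}}
A sketch just to illustrate the recursion; the constants are arbitrary small values,
not a recommended generator:
{\small
\begin{verbatim}
lcg <- function(n, seed, a=33, b=7, m=64) {
    x <- numeric(n)
    state <- seed
    for (i in 1:n) {
        state <- (a*state + b) %% m    # update the integer state
        x[i] <- state/(m-1)            # scale the output to [0, 1]
    }
    x
}
lcg(10, seed=1)
\end{verbatim}
}
\end{slide}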
%----------------------------------------------------------------------------------
\begin{slide}
No pseudo-random number generator is best for all purposes.
But some are truly terrible.
For instance, the PRNG in Microsoft Excel is a faulty implementation of an
algorithm (the Wichmann-Hill algorithm, which combines three LCGs)
that isn't good in the first place.
\end{slide}
%----------------------------------------------------------------------------------
\begin{slide}
McCullough, B.D., Heiser, David A., 2008. On the accuracy of statistical procedures in Microsoft Excel 2007. {\em Computational Statistics and Data Analysis 52\/} (10), 4570--4578.
\url{http://www.sciencedirect.com/science?_ob=MImg&_imagekey=B6V8V-4S1S6FC-5-3F&_cdi=5880&_user=4420&_orig=mlkt&_coverDate=06\%2F15\%2F2008&_sk=999479989&view=c&wchp=dGLbVzb-zSkWb&md5=85d93a6c0700f2dbc483f5ed6b239db2&ie=/sdarticle.pdf}
Excerpt: Excel 2007, like its predecessors, fails a standard set of intermediate-level accuracy tests in three areas: statistical distributions, random number generation, and estimation. Additional errors in specific Excel procedures are discussed. Microsoft's continuing inability to correctly fix errors is discussed. No statistical procedure in Excel should be used until Microsoft documents that the procedure is correct; it is not safe to assume that Microsoft Excel's statistical procedures give the correct answer. Persons who wish to conduct statistical analyses should use some other package.
If users could set the seeds, it would be an easy matter to compute successive values of the WH RNG and thus ascertain whether Excel is correctly generating WH RNGs. We pointedly note that Microsoft programmers obviously have the ability to set the seeds and to verify the output from the RNG; for some reason they did not do so. Given Microsoft's previous failure to implement correctly the WH RNG, that the Microsoft programmers did not take this easy and obvious opportunity to check their code for the patch is absolutely astounding.
\end{slide}
%----------------------------------------------------------------------------------
\begin{slide}
McCullough, B.D., 2008. Microsoft's `Not the Wichmann-Hill' random number generator.
{\em Computational Statistics and Data Analysis 52\/} (10), 4587--4593.
\url{http://www.sciencedirect.com/science?_ob=MImg&_imagekey=B6V8V-4S21TGC-2-22&_cdi=5880&_user=4420&_orig=search&_coverDate=06\%2F15\%2F2008&_sk=999479989&view=c&wchp=dGLbVtz-zSkzk&md5=38238ccd25a60a408480df345be88e34&ie=/sdarticle.pdf}
\end{slide}
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{Drawing (pseudo-)random samples using PRNGs}}
A standard technique for drawing a pseudo-random sample of size $n$ from
$N$ items is to assign each of the $N$ items a pseudo-random number, then take the
sample to be the $n$ items that were assigned the $n$ smallest pseudo-random numbers.
Note that when $N$ is large and $n$ is a moderate fraction of $N$, PRNGs might
not be able to generate all ${{N}\choose{n}}$ subsets.
Henceforth, we will assume that the PRNG is ``good enough'' that its departure from
randomness does not affect the accuracy of our simulations enough to matter.
\end{slide}
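%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{\sc Drawing a sample in R}}
The technique just described is a couple of lines in R; for instance, a sample of
size $n = 5$ from $N = 100$ items:
{\small
\begin{verbatim}
N <- 100; n <- 5
u <- runif(N)      # one pseudo-random number per item
order(u)[1:n]      # the n items with the smallest numbers
# equivalently, sample(N, n) draws a simple random sample directly
\end{verbatim}
}
\end{slide}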
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{Bernoulli trials}}
A {\em Bernoulli trial\/} is a random experiment with two possible outcomes, success and failure.
The probability of success is $p$; the probability of failure is $1-p$.
Events $A$ and $B$ are {\em independent\/} if $P(AB) = P(A)P(B)$.
A collection of events is independent if the probability of the intersection of every subcollection is equal to the product of the probabilities of the members of that subcollection.
Two random variables are independent if every event determined by the first random variable is independent of every event determined by the second.
\end{slide}
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{Binomial distribution}}
Consider a sequence of $n$ independent Bernoulli trials with the same probability $p$
of success in each trial.
Let $X$ be the total number of successes in the $n$ trials.
Then $X$ has a binomial probability distribution:
\beq
\Pr(X=x) = {{n}\choose{x}} p^x (1-p)^{n-x}.
\eeq
\end{slide}
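%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{\sc Binomial probabilities in R}}
A quick check of the formula against R's built-in binomial pmf, with illustrative
values of $n$, $p$, and $x$:
{\small
\begin{verbatim}
n <- 11; p <- 0.5; x <- 10
choose(n, x) * p^x * (1-p)^(n-x)   # the formula above
dbinom(x, size=n, prob=p)          # built-in; same value
\end{verbatim}
}
\end{slide}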
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{Hypergeometric distribution}}
A {\em simple random sample of size $n$\/} from a finite population of $N$ things is a
random sample drawn without replacement in such a way that each of the
${{N}\choose{n}}$ subsets of size $n$ from the population is equally likely to be the sample.
Consider drawing a simple random sample from a population of $N$ objects of which
$G$ are good and $N-G$ are bad.
Let $X$ be the number of good objects in the sample.
Then $X$ has a hypergeometric distribution:
\beq
P(X=x) = \frac{{{G}\choose{x}} {{N-G}\choose{n-x}}}{{{N}\choose{n}}},
\eeq
for $\max(0, n-(N-G)) \le x \le \min(n, G)$.
\end{slide}
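%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{\sc Hypergeometric probabilities in R}}
The corresponding check with illustrative values; in {\tt dhyper}, the arguments are the
number of good items in the sample, the numbers of good and bad items in the population,
and the sample size:
{\small
\begin{verbatim}
N <- 20; G <- 8; k <- 6; x <- 3    # population, good, sample size, observed
choose(G, x) * choose(N-G, k-x) / choose(N, k)   # the formula above
dhyper(x, m=G, n=N-G, k=k)                       # built-in; same value
\end{verbatim}
}
\end{slide}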
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{Hypothesis testing}}
Much of this course concerns hypothesis tests.
We will think of a test as consisting of a set of possible outcomes (data), called
the {\em acceptance region\/}.
The complement of the acceptance region is the {\em rejection region\/}.
We reject the null hypothesis if the data are in the rejection region.
The {\em significance level\/} $\alpha$ is an upper bound on the
chance that the outcome will turn out to be in the rejection region
if the null hypothesis is true.
\end{slide}
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{Conditional tests}}
The chance is sometimes a conditional probability rather than an unconditional
probability.
That is, we have a rule that generates an acceptance region that depends
on some aspect of the data.
We've already seen an example of that in the rat cortical mass experiment.
There, we conditioned on the cortical masses, but not on the
randomization.
If we test to obtain conditional significance level $\alpha$ (or smaller) no matter what
the data are, then the unconditional significance level is still $\alpha$:
\begin{eqnarray*}
\Pr \{ \mbox{Type I error} \} &=& \int_x \Pr \{ \mbox{Type I error} | X = x \} \mu(dx) \\
& \le & \sup_x \Pr \{ \mbox{Type I error} | X = x \}.
\end{eqnarray*}
\end{slide}
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{$P$-values}}
Suppose we have a family of hypothesis tests for testing a given
null hypothesis at every significance level $\alpha \in (0, 1)$.
Let $A_\alpha$ denote the acceptance region for the test at
level $\alpha$.
Suppose further that the tests {\em nest\/}, in the sense that
if $\alpha_1 < \alpha_2$, then $A_{\alpha_1} \supset A_{\alpha_2}$.
Then the $P$-value of the hypothesis (for data $X$) is
\beq
\inf \{ \alpha : X \notin A_\alpha \}
\eeq
\end{slide}
%----------------------------------------------------------------------------------
\begin{slide}
{\textcolor{blue}{Confidence sets}}
We have a collection of hypotheses $\cH$.
We know that some $H \in \cH$ must
be true---but we don't know which one.
A rule that uses the data to select a subset of $\cH$ is
a {\em $1-\alpha$ confidence procedure\/} if the chance that it selects a subset
that includes $H$ is at least $1-\alpha$.
The subset that the rule selects is called a {\em $1-\alpha$ confidence set\/}.
The {\em coverage probability at $G$\/} is the chance that the rule selects a
set that includes $G$ if $G \in \cH$ is the true hypothesis.
\end{slide}