<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:pref="http://www.w3.org/2002/Math/preference"
pref:renderer="css">
<head>
<script language="JavaScript1.4" type="text/javascript"><!--
pageModDate = "Saturday 1 March 2008 10:15 PST";
// copyright 1997-2008 by P.B. Stark, statistics.berkeley.edu/~stark.
// All rights reserved.
// -->
</script>
<script language="JavaScript1.4" type="text/javascript" src="../../../Java/irGrade.js">
</script>
<script language="JavaScript1.4" type="text/javascript"><!--
var cNum = "240.4";
writeChapterHead('SeEd',cNum,'Statistics 240 Notes, part 4',false,'../../../SticiGui/',false);
// -->
</script>
</head>
<body onload="setApplets()" onunload="killApplets()">
<script language="JavaScript1.4" type="text/javascript"><!--
// writeChapterNav('../..');
writeChapterTitle();
// -->
</script>
<noscript>
You need a browser that supports JavaScript and Java to use this site,
and you must enable JavaScript and Java in your browser.
</noscript>
<meta HTTP-EQUIV="expires" CONTENT="0">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<form method="post">
<ul>
<li>
Ranks. Behavior under the null and under a shift alternative.
</li>
<li>
The Wilcoxon rank-sum test against a shift alternative.
Connection to permutation tests including Fisher's exact test.
</li>
<li>
The Siegel-Tukey test for an alternative that the dispersions differ.
</li>
<li>
The Smirnov test against the omnibus alternative.
</li>
</ul>
<p>
References: Lehmann, E.L., 1998. <em>Nonparametrics: Statistical
Methods Based on Ranks</em>. Upper Saddle River, N.J.: Prentice Hall;
<a href="http://statistics.berkeley.edu/~stark/SticiGui">SticiGui</a>
<a href="http://statistics.berkeley.edu/~stark/SticiGui/Text/ch19.htm">Chapter 19</a>.
</p>
<h2>Ranks</h2>
<p>
The <em>rank</em> of an observation is its position in the list of observations
after sorting the observations from smallest to largest.
The <em>j</em>th smallest observation is the observation with rank <em>j</em>.
If the data are
{ <em>t</em><sub>1</sub>, … <em>t</em><sub><em>N</em></sub> },
then <em>t</em><sub>(<em>j</em>)</sub> is the observation with rank <em>j</em>,
for <em>j</em>=1, 2, … , <em>N</em>.
Thus
</p>
<p class="math">
<em>t</em><sub>(1)</sub> ≤ <em>t</em><sub>(2)</sub> ≤ … ≤
<em>t</em><sub>(<em>N</em>)</sub>.
</p>
<p>
Ties can be broken arbitrarily.
The smallest datum is <em>t</em><sub>(1)</sub> and the largest observation is
<em>t</em><sub>(<em>N</em>)</sub>.
</p>
<p>
A large class of nonparametric methods is based on working with the ranks of
the data instead of the original values of the data.
That is, <em>t</em><sub>(1)</sub> is replaced by 1, <em>t</em><sub>(2)</sub>
is replaced by 2, and so on.
</p>
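<p>
To make the rank transformation concrete, here is a short Python sketch (illustrative
code, not part of the original notes; it assumes the observations are distinct):
</p>

```python
# Replace each observation by its rank: 1 for the smallest, N for the largest.
# Assumes no ties among the observations.
def ranks(data):
    order = sorted(data)
    return [order.index(t) + 1 for t in data]

print(ranks([3.2, 1.5, 9.9, 4.4]))  # -> [2, 1, 4, 3]
```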
<h2>Why Rank-based methods?</h2>
<p>
In some problems, only the ranks of the data are observed.
This often happens in situations where it is possible to make comparative
judgements about outcomes, but not absolute judgements.
For example, a doctor might be able to rank a collection of patients
according to the severity with which they suffer from some disease they share,
but might not be able to rate the severity on an objective quantitative
scale—and certainly not a scale for which
5 is "better than" 4 by the same amount that 2 is "better than" 1.
Similarly, a consumer might be able to order a collection of items by preference:
she prefers A to B, B to C and C to D, for example.
Yet she might not be able to quantify the strength of her preferences.
Having techniques for dealing with ranks is then useful.
</p>
<p>
Working with ranks also gives a useful replacement for the permutation test
we derived in the previous chapter:
The null distribution of <em>Y</em> for the permutation test
depended on the observed responses, but the ranks of <em>N</em>
distinct observations are always the integers from 1 to <em>N</em>.
Thus if we replace each observation by its rank and then perform a
permutation test, the null distribution of the new test statistic depends only on
<em>N</em> and <em>n</em>.
<script language="JavaScript1.4" type="text/javascript"> <!--
var fStr = 'This assumes that there are no ties among the responses—we treat ' +
' ties later.';
writeFootnote(fCtr++, fCtr.toString(), fStr);
// -->
</script>
That lets us tabulate critical values for the test.
(This is less important than it once was, because modern computers allow us to
simulate the null distribution of the test statistic economically.)
Moreover, the normal approximation to the null distribution of the sum of the
ranks of the treated subjects' responses tends to be accurate,
which can be very useful.
</p>
<p>
In the following exposition, the notation and most of the derivations follow
Lehmann, E.L., 1998. <em>Nonparametrics: Statistical
Methods Based on Ranks</em>. Upper Saddle River, N.J.: Prentice Hall.
</p>
<h2><a id="wilcoxon"></a>The Wilcoxon rank-sum test</h2>
<p>
In the randomization model we have been discussing, let
<em>S</em><sub>1</sub>, … <em>S</em><sub><em>n</em></sub> be
the ranks of the <em>n</em> responses of the treatment group among the
<em>N</em> responses of all subjects.
That is, <em>S</em><sub>1</sub> is the rank of the response of the first treated subject,
<em>S</em><sub>2</sub> is the rank of the response of the second treated subject,
and so on.
Each <em>S</em><sub><em>j</em></sub> takes a value between 1 and <em>N</em>, inclusive.
Assume for the moment that no two responses are equal, so that the ranks are not
ambiguous.
Define
</p>
<p class="math">
<em>W</em><sub>s</sub> ≡ <em>S</em><sub>1</sub> + <em>S</em><sub>2</sub> +
<em>S</em><sub>3</sub> +
… + <em>S</em><sub><em>n</em></sub>.
</p>
<p>
The statistic <em>W</em><sub>s</sub> is the <em>Wilcoxon rank-sum</em>.
We can base a test of the strong null hypothesis on <em>W</em><sub>s</sub>.
To test against the alternative that treatment tends to increase responses,
we would reject for large values of <em>W</em><sub>s</sub>.
To test against the alternative that treatment tends to decrease responses,
we would reject for small values of <em>W</em><sub>s</sub>.
The critical value of the test is set using the probability distribution of
<em>W</em><sub>s</sub> on the assumption that the strong null hypothesis is
true.
For a level-&alpha; test against the alternative that treatment increases
responses, we would find the smallest <em>c</em> such that, if the
strong null is true, P(<em>W</em><sub>s</sub> ≥ <em>c</em>) ≤ α.
We would then reject the strong null if the observed value of
<em>W</em><sub>s</sub> is <em>c</em> or greater.
Recall that
</p>
<p class="math">
1 + 2 + … + <em>k</em> = <em>k</em>(<em>k</em>+1)/2.
</p>
<p>
If the treated subjects have the smallest possible ranks, 1 to <em>n</em>,
then
</p>
<p class="math">
<em>W</em><sub>s</sub>= 1 + 2 + … + <em>n</em> = <em>n</em>(<em>n</em>+1)/2.
</p>
<p>
If the treated subjects have the largest possible ranks,
<em>N</em>−<em>n</em>+1 to <em>N</em>, then
</p>
<p class="math">
<em>W</em><sub>s</sub>= (<em>N</em>−<em>n</em>+1) + (<em>N</em>−<em>n</em>+2) +
… + <em>N</em>
</p>
<p class="math">
= (<em>N</em>−<em>n</em>) + 1 + (<em>N</em>−<em>n</em>) + 2 + … +
(<em>N</em>−<em>n</em>) + <em>n</em>
</p>
<p class="math">
= <em>n</em>(<em>N</em>−<em>n</em>) + (1 + 2 + … + <em>n</em>)
</p>
<p class="math">
= <em>n</em>(<em>N</em>−<em>n</em>) + <em>n</em>(<em>n</em>+1)/2.
</p>
<p>
All the integers between <em>n</em>(<em>n</em>+1)/2 and
<em>n</em>(<em>N</em>−<em>n</em>) + <em>n</em>(<em>n</em>+1)/2 are possible values
of <em>W</em><sub>s</sub>.
The null distribution of <em>W</em><sub>s</sub> under the strong null
hypothesis is symmetric about
<em>n</em>(<em>N</em>+1)/2.
<script language="JavaScript1.4" type="text/javascript"> <!--
var fStr = 'In the randomization model, each subset of <em>n</em> of the <em>N</em> ranks is ' +
' equally likely to be the ranks of the treatment group. The probability ' +
'that the treatment ranks are 1, 2, … , <em>n</em> is equal to the ' +
'probability that the treatment ranks are <em>N</em>, <em>N</em>−1, … , ' +
'<em>N</em>−<em>n</em>+1. More generally, consider re-labeling the <em>j</em>th-ranked ' +
'observation to be the <em>N</em>−<em>j</em>+1st-ranked observation. The labels of ' +
'the observations would still be 1, 2, … , <em>N</em>, so the probability ' +
'distribution of the sum of the labels on the treated subjects would be the same ' +
'as the probability distribution of <em>W</em><sub>s</sub> under the strong null ' +
'hypothesis. However, if the sum of the treatment ranks was <em>W</em><sub>s</sub>, ' +
'the sum of the new labels would be </p>' +
'<p class="math"><em>n</em>×(<em>N</em>+1)−<em>W</em><sub>s</sub>.</p>' +
'<p>Thus</p><p class="math">P(<em>W</em><sub>s</sub> = <em>k</em>) = ' +
'P(<em>W</em><sub>s</sub> = <em>n</em>×(<em>N</em>+1) − <em>k</em>)</p>' +
'<p>That is, the probability distribution of <em>W</em><sub>s</sub> ' +
'is symmetric about <em>n</em>×(<em>N</em>+1)/2. This argument is ' +
'essentially that in Lehmann, E.L., 1998. <em>Nonparametrics: Statistical ' +
'Methods Based on Ranks</em>. Upper Saddle River, N.J.: Prentice Hall, ' +
'pp. 12-13.</p>';
writeFootnote(fCtr++, fCtr.toString(), fStr);
// -->
</script>
Thus the expected value of <em>W</em><sub>s</sub> under the null hypothesis
is <em>n</em>×(<em>N</em>+1)/2.
(This also follows from the fact that the expected value of the sample sum of
a simple random sample of size <em>n</em> from a box of <em>N</em> numbers is
equal to <em>n</em> times the mean of the <em>N</em> numbers; the mean of the
integers 1 to <em>N</em> is (<em>N</em>+1)/2.)
The variance of <em>W</em><sub>s</sub> under the strong null hypothesis
is <em>m</em><em>n</em>(<em>N</em>+1)/12, where <em>m</em> &equiv; <em>N</em>&minus;<em>n</em>
is the number of control subjects.
<script language="JavaScript1.4" type="text/javascript"> <!--
var fStr = 'This follows from the fact that the variance of the sample sum ' +
'of a simple random sample of size <em>n</em> from a list of <em>N</em> ' +
'numbers is </p><p class="math">' +
'(<em>N</em>−<em>n</em>)×<em>n</em>×(variance of list)/(<em>N</em>−1) = ' +
'<em>m</em>×<em>n</em>×(variance of list)/(<em>N</em>−1).</p><p>' +
'The variance of the list of the integers 1 to <em>N</em> is </p><p class="math">' +
'(<em>N</em><sup>2</sup>−1)/12 = (<em>N</em>−1)×(<em>N</em>+1)/12,</p><p>' +
'so the variance of <em>W</em><sub>s</sub> is </p><p class="math">' +
'<em>m</em>×<em>n</em>(<em>N</em>−1)×(<em>N</em>+1)/(12×(<em>N</em>−1)) ' +
'= <em>m</em>×<em>n</em>×(<em>N</em>+1)/12.</p>';
writeFootnote(fCtr++, fCtr.toString(), fStr);
// -->
</script>
</p>
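<p>
The null mean <em>n</em>(<em>N</em>+1)/2 and variance <em>mn</em>(<em>N</em>+1)/12
can be verified by brute-force enumeration when the groups are small (a Python sketch;
the sizes <em>N</em>=6, <em>n</em>=3 are arbitrary illustrative choices):
</p>

```python
# Enumerate all equally likely sets of treatment ranks and compute the exact
# mean and variance of the rank sum under the strong null hypothesis.
from itertools import combinations
from fractions import Fraction

N, n = 6, 3                    # illustrative sizes; m = N - n controls
m = N - n
sums = [sum(c) for c in combinations(range(1, N + 1), n)]
mean = Fraction(sum(sums), len(sums))
var = sum((s - mean) ** 2 for s in sums) / len(sums)
print(mean, var)               # n(N+1)/2 = 21/2 and mn(N+1)/12 = 21/4
```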
<p>
Define <em>W</em><sub>r</sub> to be the sum of the control ranks.
Because the treatment ranks and control ranks together comprise all the ranks,
</p>
<p class="math">
<em>W</em><sub>r</sub> + <em>W</em><sub>s</sub> = 1 + 2 + … + <em>N</em>
= <em>N</em>(<em>N</em>+1)/2.
</p>
<p>
Thus,
</p>
<p class="math">
<em>W</em><sub>r</sub> = <em>N</em>(<em>N</em>+1)/2 − <em>W</em><sub>s</sub>.
</p>
<p>
It follows that the (null) expected value of <em>W</em><sub>r</sub> is
</p>
<p class="math">
<em>N</em>(<em>N</em>+1)/2 − <em>n</em>(<em>N</em>+1)/2 = <em>m</em>(<em>N</em>+1)/2,
</p>
<p>
and that the (null) variance of <em>W</em><sub>r</sub> is also
<em>m</em><em>n</em>(<em>N</em>+1)/12.
When <em>n</em> and <em>m</em> are both large, the normal approximations to the
null distributions of
<em>W</em><sub>r</sub> and <em>W</em><sub>s</sub> tend to be accurate.
We can check this by simulation using the sampling applet; see
<script language="JavaScript1.4" type="text/javascript"><!--
citeFig();
// -->
</script>.
You can change the contents of the box on the right and you can change the sample size to
see how the approximation varies with <em>N</em> and <em>n</em>.
To check the approximation, first draw 10,000 samples to get an approximation to the
null probability distribution of <em>W</em><sub>s</sub>.
Then highlight various intervals using the scrollbars in the figure and compare the
area under the histogram with the area under the normal curve.
</p>
<script language="JavaScript1.4" type="text/javascript"><!--
var qStr = 'Simulated null distribution of the Wilcoxon rank-sum <em>W</em><sub>s</sub> and its ' +
'normal approximation for <em>N</em>=20, <em>n</em>=<em>m</em>=10.';
writeFigureCaption(qStr);
// -->
</script>
<p align="center">
<applet code="SampleDist.class" codebase="../../../Java/" align="baseline" width="640"
archive="PbsGui.zip" height="360">
<param name="variables" value="sum">
<param name="startWith" value="sum">
<param name="boxContents" value="1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20">
<param name="showBoxHist" value="false">
<param name="boxHistControl" value="false">
<param name="sources" value="box">
<param name="replaceControl" value="false">
<param name="replace" value="false">
<param name="curveControls" value="true">
<param name="showCurve" value="true">
<param name="sampleSize" value="10">
You need Java to see this.
</applet>
</p>
<p>
The Wilcoxon rank-sum test is exactly what you get if you replace each observation by its
rank, then do a permutation test using the sum of the (ranks of the) responses in the
treatment group.
Thus, the code in the previous chapter can be used to simulate the null distribution,
by replacing the raw data with their ranks.
However, the normal approximation to the null distribution
of <em>W</em><sub>s</sub> can be better than the normal approximation to the null
distribution of the sample sum of the responses to treatment, because the ranks
are evenly spread out, whereas the raw responses can be highly skewed or multimodal.
</p>
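<p>
Here is a minimal Python simulation of the null distribution of
<em>W</em><sub>s</sub> along those lines (a sketch with illustrative sizes, not the
notes' code):
</p>

```python
# Simulate the null distribution of the Wilcoxon rank sum W_s by randomly
# permuting the ranks 1..N and summing the first n, e.g. N = 20, n = 10.
import random

random.seed(0)                     # for reproducibility
N, n = 20, 10
all_ranks = list(range(1, N + 1))
sims = []
for _ in range(10000):
    random.shuffle(all_ranks)
    sims.append(sum(all_ranks[:n]))
mean_sim = sum(sims) / len(sims)
print(mean_sim)                    # should be near n(N+1)/2 = 105
```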
<h2><a id="mann_whitney"></a>Mann-Whitney Statistics</h2>
<p>
Define <em>W</em><sub>XY</sub> ≡ <em>W</em><sub>s</sub> − <em>n</em>(<em>n</em>+1)/2.
This subtracts from <em>W</em><sub>s</sub> its minimum possible value.
Let <em>W</em><sub>YX</sub> ≡ <em>W</em><sub>r</sub> − <em>m</em>(<em>m</em>+1)/2,
<em>W</em><sub>r</sub> minus its minimum possible value.
Under the strong null hypothesis, the probability distribution of <em>W</em><sub>XY</sub>
is the same as the probability distribution of <em>W</em><sub>YX</sub>,
a consequence of the symmetry of the probability distribution of <em>W</em><sub>s</sub>.
The statistics <em>W</em><sub>XY</sub> and <em>W</em><sub>YX</sub> are called
the <em>Mann-Whitney</em> statistics.
</p>
<p>
Let { <em>X</em><sub>1</sub>, … , <em>X</em><sub><em>m</em></sub> } denote
the control responses and let
{ <em>Y</em><sub>1</sub>, … , <em>Y</em><sub><em>n</em></sub> }
denote the treatment responses as before.
Consider
</p>
<p class="math">
#{ (<em>i</em>, <em>j</em>) : 1 ≤ <em>i</em>≤ <em>m</em>,
1 ≤ <em>j</em>≤ <em>n</em>, and
<em>X</em><sub><em>i</em></sub> < <em>Y</em><sub><em>j</em></sub> }.
</p>
<p>
This is the number of (control, treatment) pairs such that the control
response is less than the treatment response.
</p>
<p>
Let { <em>S</em><sub><em>j</em></sub>: <em>j</em> = 1, … , <em>n</em> }
be the ranks of the treatment responses as
before, and let
{ <em>R</em><sub><em>j</em></sub>: <em>j</em> = 1, … , <em>m</em> }
be the ranks of the control responses.
Let { <em>S</em><sub>(<em>j</em>)</sub>: <em>j</em> = 1, … , <em>n</em> }
and { <em>R</em><sub>(<em>j</em>)</sub>: <em>j</em> = 1, … , <em>m</em> }
be the corresponding ordered ranks.
Partition the set of (control, treatment) pairs in which the control response is
less than the treatment response according to the treatment response involved:
the total number of such pairs is the number of pairs in which the control
response is less than the smallest treatment response, plus the number in which
the control response is less than the second-smallest treatment response, and so on.
This partition will help us count the pairs.
</p>
<p>
How many response values are less than <em>S</em><sub>(1)</sub>, the rank of the smallest
treatment response?
By definition, there are <em>S</em><sub>(1)</sub>−1 of them—all of which are control
responses.
The number of response values that are less than <em>S</em><sub>(2)</sub> is
<em>S</em><sub>(2)</sub>−1, one of which is <em>S</em><sub>(1)</sub>, so the
total number of control responses that are less than <em>S</em><sub>(2)</sub> is
<em>S</em><sub>(2)</sub>−2.
The total number of control responses that are less than <em>S</em><sub>(<em>j</em>)</sub>
is <em>S</em><sub>(<em>j</em>)</sub> − <em>j</em>, so the total number of (control, treatment)
pairs with the control response less than the treatment response is
</p>
<p class="math">
<em>S</em><sub>(1)</sub>−1 + <em>S</em><sub>(2)</sub>−2 + … +
<em>S</em><sub>(<em>n</em>)</sub> − <em>n</em> = <em>W</em><sub>s</sub> −
(1 + 2 + … + <em>n</em>) =
<em>W</em><sub>s</sub> − <em>n</em>(<em>n</em>+1)/2 = <em>W</em><sub>XY</sub>.
</p>
<p>
Thus
</p>
<p class="math">
<em>W</em><sub>XY</sub> =
#{ (<em>i</em>, <em>j</em>) : 1 ≤ <em>i</em>≤ <em>m</em>,
1 ≤ <em>j</em>≤ <em>n</em>, and
<em>X</em><sub><em>i</em></sub> < <em>Y</em><sub><em>j</em></sub> }.
</p>
<p>
The Mann-Whitney statistic <em>W</em><sub>XY</sub> (and the Wilcoxon rank sum
<em>W</em><sub>s</sub>, up to an additive constant)
measures the number of (control, treatment) pairs for which the treatment response
is larger than the control response.
The larger the positive effect of treatment, the larger the Mann-Whitney and Wilcoxon
rank sum statistics tend to be.
</p>
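<p>
The identity between <em>W</em><sub>XY</sub> and the pair count can be checked
numerically; the following Python sketch uses made-up, tie-free data:
</p>

```python
# Verify W_XY = #{(i,j): X_i < Y_j} = W_s - n(n+1)/2 on a small example.
def rank_sum(treat, control):
    # Wilcoxon rank sum of the treatment responses (assumes no ties)
    pooled = sorted(control + treat)
    return sum(pooled.index(y) + 1 for y in treat)

x = [1.0, 4.0, 6.0]            # control responses (illustrative)
y = [2.0, 5.0, 7.0, 9.0]       # treatment responses (illustrative)
n = len(y)
w_s = rank_sum(y, x)
w_xy = sum(1 for xi in x for yj in y if xi < yj)
print(w_s, w_xy)               # 19 and 9, and 19 - 4*5/2 = 9
```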
<h2>Tied observations</h2>
<p>
If the control and treatment responses are random samples from continuous distributions,
the chance of ties among the data is zero.
However, ties can and do occur in practice, if only because of limits on the
precision with which data can be recorded.
When there are tied observations, the ranks are not uniquely defined.
We can patch the rank-sum approach by assigning each set of tied observations
the <em>mid-rank</em> of the set.
For example, if the sorted data are {1, 2, 3, 4, 4}, the mid-rank of the
last two observations (tied at 4) would be 4.5.
If the sorted data were {1, 2, 3, 3, 3, 4, 4, 5}, the corresponding mid-ranks would
be 1, 2, 4, 4, 4, 6.5, 6.5, and 8.
The Wilcoxon rank-sum statistic generalized to use mid-ranks is denoted
<em>W</em><sub>s</sub><sup>*</sup>.
Once there are ties, the null distribution of the rank sum depends on the observations:
which mid-ranks are represented.
When there are ties, the normal approximation to the null probability distribution
of the rank-sum tends to be worse, because there are fewer possible values of
the rank sum (the null probability distribution is "chunkier").
</p>
<p>
<script language="JavaScript1.4" type="text/javascript"><!--
citeFig();
// -->
</script>
shows the sampling applet again, for the mid-ranks of the data set {1, 2, 3, 4, 4},
namely, {1, 2, 3, 4.5, 4.5}, with <em>N</em>=5, <em>n</em>=2.
Draw 10,000 samples to see the simulated null distribution of the rank sum.
Note that the distribution of the rank-sum is skewed in this case.
Compare the simulated null distribution of the rank-sums with and without
midranks by replacing the contents of the box by {1, 2, 3, 4, 5}, which
would be the ranks if there were no ties in the data.
Note that the distribution is then symmetric.
Replace the contents of the box with the mid-ranks
1, 2, 4, 4, 4, 6.5, 6.5, and 8 and change the sample size (<em>n</em>) to 4.
Compare the simulated sampling distribution of the rank sum with that
for ranks 1, 2, 3, 4, 5, 6, 7, 8,
and check the accuracy of the normal approximation in both cases.
</p>
<script language="JavaScript1.4" type="text/javascript"><!--
var qStr = 'Simulated null distribution of <em>W</em><sub>s</sub><sup>*</sup>, the Wilcoxon ' +
' rank sum generalized for ties.';
writeFigureCaption(qStr);
// -->
</script>
<p align="center">
<applet code="SampleDist.class" codebase="../../../Java/" align="baseline" width="640"
archive="PbsGui.zip" height="360">
<param name="variables" value="sum">
<param name="startWith" value="sum">
<param name="boxContents" value="1,2,3,4.5,4.5">
<param name="showBoxHist" value="false">
<param name="boxHistControl" value="false">
<param name="bins" value="10">
<param name="sources" value="box">
<param name="replaceControl" value="false">
<param name="replace" value="false">
<param name="curveControls" value="true">
<param name="showCurve" value="true">
<param name="sampleSize" value="2">
You need Java to see this.
</applet>
</p>
<p>
Calculating midranks seems to involve a number of steps: sort the observations,
identify ties, and average the ranks of each group of ties.
An alternative point of view makes the computation much simpler.
If an observation is not tied, its rank is equal to the number of
observations that are less than or equal to it.
That is, suppose the data are <em>z</em><sub>1</sub>, … , <em>z</em><sub><em>N</em></sub>.
If the multiplicity of the value <em>z</em><sub><em>k</em></sub> is one,
then
</p>
<p class="math">
rank(<em>z</em><sub><em>k</em></sub>) = #{ <em>j</em>: <em>z</em><sub><em>j</em></sub> ≤
<em>z</em><sub><em>k</em></sub> }.
</p>
<p>
On the other hand, if there are <em>M</em>(<em>z</em><sub><em>k</em></sub>)
observations in all whose values are
equal to <em>z</em><sub><em>k</em></sub>, then the midrank of <em>z</em><sub><em>k</em></sub>
is
</p>
<p class="math">
midrank(<em>z</em><sub><em>k</em></sub>) =
#{ <em>j</em>: <em>z</em><sub><em>j</em></sub> ≤ <em>z</em><sub><em>k</em></sub> } −
(<em>M</em>(<em>z</em><sub><em>k</em></sub>)−1)/2.
</p>
<p>
We can calculate <em>M</em>(<em>z</em><sub><em>k</em></sub>) by
</p>
<p class="math">
<em>M</em>(<em>z</em><sub><em>k</em></sub>) = #{ <em>j</em>: <em>z</em><sub><em>j</em></sub> ≤
<em>z</em><sub><em>k</em></sub> } −
#{ <em>j</em>: <em>z</em><sub><em>j</em></sub> < <em>z</em><sub><em>k</em></sub> }.
</p>
<p>
Thus
</p>
<p class="math">
midrank(<em>z</em><sub><em>k</em></sub>) = <big>(</big>
#{ <em>j</em>: <em>z</em><sub><em>j</em></sub> ≤ <em>z</em><sub><em>k</em></sub> } +
#{ <em>j</em>: <em>z</em><sub><em>j</em></sub> < <em>z</em><sub><em>k</em></sub> } +
1 <big>)</big>/2.
</p>
<p>
Here is a <a href="http://www.mathworks.com">Matlab</a> function to compute the
midranks of a list:
</p>
<div class="code">
<p>
<pre>
function r = midRanks(x)
% function r = midRanks(x)
% P.B. Stark, statistics.berkeley.edu/~stark 2/17/2003
% vector of midranks of a vector x.
% midrank is (#<=) - ((#=) - 1)/2 = ((#<=) + (#<) + 1)/2
for j = 1:length(x),
    r(j) = sum(x <= x(j)) - (sum(x == x(j)) - 1)/2;
end;
return;
</pre>
</p>
</div>
<p>
Here is an R function to do the same thing:
</p>
<div class="code">
<p>
<pre>
midRanks <- function(x) {
# P.B. Stark, statistics.berkeley.edu/~stark 2/5/2008
# vector of midranks of a vector x.
mr <- array(0,length(x));
for (j in 1:length(x)) {
    mr[j] <- sum(x <= x[j]) - (sum(x == x[j]) - 1)/2;
}
mr
}
</pre>
</p>
</div>
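<p>
For comparison, the same computation in Python (a sketch mirroring the Matlab and R
functions above):
</p>

```python
# midrank = (#{<=} + #{<} + 1)/2, as derived above.
def mid_ranks(x):
    return [(sum(v <= xi for v in x) + sum(v < xi for v in x) + 1) / 2
            for xi in x]

print(mid_ranks([1, 2, 3, 4, 4]))  # -> [1.0, 2.0, 3.0, 4.5, 4.5]
```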
<p>
Here is a Matlab function to simulate the distribution of the sample
sum of a simple random sample of size <em>n</em> from a population of size <em>N</em>.
</p>
<div class="code">
<p>
<pre>
function dist = simSampleSum(z, n, iter)
% function dist = simSampleSum(z, n, iter)
% P.B. Stark, statistics.berkeley.edu/~stark 2/17/2003
% Simulate the sampling distribution of the sum of n of the elements of z
dist = zeros(iter,1);
N = length(z);
if (n > N)
    disp(['error in simSampleSum: sample size exceeds population size']);
    return;
elseif (n == N)
    dist = dist + sum(z); % constant random variable
elseif (n == 0)
    return; % zero
else
    for i=1:iter
        zp = z(randperm(length(z))); % random permutation of data
        dist(i) = sum(zp(1:n)); % add the first n
    end;
end;
return;
</pre>
</p>
</div>
<p>
Here is a Matlab function to simulate the null distribution of the Wilcoxon
rank sum using midranks.
</p>
<div class="code">
<p>
<pre>
function dist = simWilcox(x, y, iter)
% function dist = simWilcox(x, y, iter)
% P.B. Stark, statistics.berkeley.edu/~stark 2/17/2003
% simulates the null distribution of the Wilcoxon rank sum using midranks,
% for data x (control) and y (treatment) using iter pseudorandom samples
z = midRanks([x, y]);
n = length(y);
dist = simSampleSum(z, n, iter);
return;
</pre>
</p>
</div>
<p>
Here's an example of using the Matlab functions just defined to approximate
the 1-sided (upper tail) <em>P</em>-value for the Wilcoxon rank sum statistic
with midranks:
</p>
<div class="code">
<p>
<pre>
iter = 10000;
zr = midRanks([x y]); % the control responses are x; the treatment
% responses are y. zr now has their midranks.
dist = simWilcox(x, y, iter);
pVal = sum(dist >= sum(zr((length(x)+1):length(zr))) )/iter;
</pre>
</p>
</div>
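<p>
The same simulation can be written in Python (a sketch with made-up data; the
function and variable names are illustrative, not from the notes):
</p>

```python
# Approximate the one-sided (upper-tail) P-value for the midrank Wilcoxon
# rank sum by randomly permuting the pooled midranks.
import random

def mid_ranks(z):
    # midrank = (#{<=} + #{<} + 1)/2
    return [(sum(v <= zi for v in z) + sum(v < zi for v in z) + 1) / 2
            for zi in z]

def sim_wilcoxon_pval(x, y, iters=10000, seed=0):
    rng = random.Random(seed)
    z = mid_ranks(x + y)            # pooled midranks; treatment is y
    observed = sum(z[len(x):])      # observed rank sum of the treatment group
    n = len(y)
    hits = 0
    for _ in range(iters):
        rng.shuffle(z)              # the observed sum is already recorded
        if sum(z[:n]) >= observed:
            hits += 1
    return hits / iters

p = sim_wilcoxon_pval([1, 2, 3], [4, 4, 5], iters=2000)
print(p)                            # the exact P-value here is 1/20 = 0.05
```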
<h2>The Siegel-Tukey Test</h2>
<p>
When the alternative is that the dispersion of the control responses differs
from the dispersion of the treatment responses, the Wilcoxon rank-sum statistic
has poor power.
We could imagine a permutation test based on the inter-quartile range of the
responses in the treatment group or the standard deviation of the responses
of the treatment group; under the strong null hypothesis, we could find
the distribution of that test statistic by calculation or simulation.
The randomization, in which every subset of <em>n</em> of the <em>N</em>
responses is equally likely to be the set of treatment responses, makes such
calculations and tests straightforward, at least conceptually.
</p>
<p>
There is a clever way to relabel the data that allows us to use the
calculations and tables for the Wilcoxon rank-sum statistic to test against
the dispersion alternative—we just redefine what we mean by "rank."
Suppose that the alternative is that the treatment decreases the dispersion of
the responses around some common measure of location (such as the median).
Then responses far from the middle should be less likely to occur in the
treatment group than in the control group.
Label the smallest observation "1", the largest "2",
the second-smallest "3", the second-largest "4", and so on,
ignoring the possibility of ties for the moment.
Consider the sum <em>T</em> of the <em>n</em> labels of the treated subjects.
The null distribution of <em>T</em> is the same as the null distribution of
the Wilcoxon rank-sum statistic <em>W</em><sub>s</sub>, and under the alternative
that treatment decreases dispersion, <em>T</em> tends to be large.
Therefore, rejecting the strong null hypothesis when
</p>
<p class="math">
<em>T</em> > <em>c</em>
</p>
<p>
where <em>c</em> is the appropriate quantile of the null distribution of the
Wilcoxon rank-sum statistic, is a reasonable test
against this alternative.
This is the <em>Siegel-Tukey</em> test.
</p>
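<p>
The alternating labeling just described can be sketched in Python (a hypothetical
helper, assuming distinct observations):
</p>

```python
# Siegel-Tukey labels: smallest -> 1, largest -> 2, second-smallest -> 3,
# second-largest -> 4, and so on (ties are ignored here).
def siegel_tukey_labels(data):
    order = sorted(data)
    labels = {}
    lo, hi, lab = 0, len(order) - 1, 1
    while lo <= hi:
        labels[order[lo]] = lab        # smallest remaining value
        lab += 1
        if hi > lo:
            labels[order[hi]] = lab    # largest remaining value
            lab += 1
        lo += 1
        hi -= 1
    return labels

print(siegel_tukey_labels([10, 3, 7, 1, 9]))
```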
<p>
However, labeling the largest observation "1", the smallest "2",
the second-largest "3", the second-smallest "4", and so on,
leads to a test statistic with the same distribution under the strong null hypothesis,
but that typically gives a different <em>P</em>-value for a given set of data.
A more symmetric test statistic assigns the smallest and largest
observations the label 1.5, the second-smallest and second-largest the label 3.5,
the third-smallest and third-largest the label 5.5, and so on.
Ties can be treated as they are in the generalization of the Wilcoxon rank-sum
statistic <em>W</em><sub>s</sub><sup>*</sup>, by using "mid-labels"
(just like mid-ranks).
To find significance levels or <em>P</em>-values when <em>N</em> is large,
it is sometimes necessary to resort to simulation and settle for an approximation.
</p>
<p>
The Siegel-Tukey test tacitly assumes that the treatment responses and the
control responses are scattered about a common typical value—that is,
that there is no shift difference between the two groups.
If there is a known shift between control and treatment, it could be subtracted
before ranking the data.
If the shift is unknown, it could be estimated from the data, but then the
nominal significance level of the test will not be its real level.
For large samples, the difference might be unimportant.
</p>
<h2>The Smirnov Test</h2>
<p>This material is from Lehmann (1998). </p>
<p>
We have considered shift and dispersion alternatives; now we move to the
omnibus alternative.
The Smirnov test is based on the difference between the
empirical cumulative distribution function (cdf) of the treatment responses
and the empirical cdf of the control responses.
It has some power against all kinds of violations of the strong null hypothesis.
However, it has less power than <em>W</em><sub>s</sub> against
the alternatives for which the Wilcoxon rank-sum test is designed.
Let <em>F</em><sub>Y,<em>n</em></sub>
denote the empirical cdf of the treatment responses:
</p>
<p class="math">
<em>F</em><sub>Y,<em>n</em></sub>(<em>y</em>)
≡ #{ <em>y</em><sub><em>j</em></sub>:
<em>y</em><sub><em>j</em></sub> ≤ <em>y</em> }/<em>n</em>,
</p>
<p>
and define <em>F</em><sub>X,<em>m</em></sub> analogously as the empirical cdf of the
control responses.
If there are no ties within the treatment group, <em>F</em><sub>Y,<em>n</em></sub>
jumps by 1/<em>n</em> at each treatment response value.
If there are no ties within the control group,
<em>F</em><sub>X,<em>m</em></sub> jumps by 1/<em>m</em> at each control response value.
If <em>k</em> of the treatment responses are tied and
equal to <em>y</em><sub>0</sub>, then <em>F</em><sub>Y,<em>n</em></sub> jumps by <em>k</em>/<em>n</em>
at <em>y</em><sub>0</sub>.
The Smirnov test statistic is
</p>
<p class="math">
<em>D</em><sub><em>m</em>,<em>n</em></sub> ≡ sup |<em>F</em><sub>Y,<em>n</em></sub>(<em>y</em>)
− <em>F</em><sub>X,<em>m</em></sub>(<em>y</em>) |,
</p>
<p>
where the supremum is over all real values of <em>y</em>.
It is easy to see that the supremum is attained at one of the data values.
We can also see that
the supremum depends only on the ranks of the data, because the
order of the jumps matters, but the precise values of <em>y</em> at which the jumps occur
do not matter.
Therefore, the test
</p>
<p class="math">
Reject if <em>D<sub>m</sub></em><sub>,<em>n</em></sub>
> <em>c</em>,
</p>
<p>
for an appropriately chosen value of <em>c</em>, is a nonparametric test of
the strong null hypothesis.
</p>
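<p>
A quick Python sketch (with hypothetical data) of the fact that
<em>D</em><sub><em>m</em>,<em>n</em></sub> depends only on the ranks:
a strictly increasing transformation moves the points at which the empirical
cdfs jump, but not the order of the jumps, so it leaves the statistic unchanged.
</p>

```python
import math

def smirnov(x, y):
    # sup over the pooled data values of |F_{Y,n}(v) - F_{X,m}(v)|;
    # the sup is attained at one of the data values, so it suffices
    # to scan the pooled data
    z = x + y
    return max(abs(sum(yy <= v for yy in y) / len(y)
                   - sum(xx <= v for xx in x) / len(x))
               for v in z)

x = [1.2, 3.4, 2.2]           # hypothetical control responses
y = [0.7, 5.1, 2.9, 4.0]      # hypothetical treatment responses
d1 = smirnov(x, y)
d2 = smirnov([math.exp(v) for v in x], [math.exp(v) for v in y])
assert d1 == d2               # exp is strictly increasing: D is unchanged
print(d1)                     # 0.5 for these data
```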
<p>
Let's calculate the null distribution of <em>D<sub>m</sub></em><sub>,<em>n</em></sub>
for <em>n</em>=3, <em>m</em>=2.
There are <sub>5</sub>C<sub>3</sub>=10 possible
assignments of 3 of the subjects to treatment, each of which has probability
1/10 under the strong null hypothesis.
Let us assume that the 5 data are distinct (no ties).
Then
<script language="JavaScript1.4" type="text/javascript">
<!--
citeTable();
// -->
</script>
lists the possibilities and the corresponding values of <em>D<sub>m</sub></em><sub>,<em>n</em></sub>.
</p>
<p><script language="JavaScript1.4" type="text/javascript">
<!--
var qStr = 'Possible values of <em>D</em><sub><em>m</em>,<em>n</em></sub> for <em>n</em>=3, ' +
'<em>m</em>=2';
writeTableCaption(qStr);
var header = ['treatment ranks', 'control ranks', '<em>D</em><sub><em>m</em>,<em>n</em></sub>'];
var list = new Array(3);
list[0] = ['1, 2, 3', '1, 2, 4', '1, 2, 5', '1, 3, 4', '1, 3, 5',
'1, 4, 5', '2, 3, 4', '2, 3, 5', '2, 4, 5', '3, 4, 5'
];
list[1] = ['4, 5', '3, 5', '3, 4', '2, 5', '2, 4',
'2, 3', '1, 5', '1, 4', '1, 3', '1, 2'
];
list[2] = ['1', '2/3', '2/3', '1/2', '1/3',
'2/3', '1/2', '1/2', '2/3', '1'
];
listToTable(header, list, 'transpose', 'center');
// -->
</script>
</p>
<p>
The null probability distribution of <em>D<sub>m</sub></em><sub>,<em>n</em></sub>
is given in
<script language="JavaScript1.4" type="text/javascript"><!--
citeTable();
// -->
</script>
</p>
<p><script language="JavaScript1.4" type="text/javascript">
<!--
var qStr = 'Null probability distribution of <em>D</em><sub>2,3</sub>';
writeTableCaption(qStr);
var header = ['<em>d</em>', 'P(<em>D</em><sub>2,3</sub>=<em>d</em>)'];
var list = new Array(2);
list[0] = ['1/3', '1/2', '2/3', '1'];
list[1] = ['1/10', '3/10', '4/10', '2/10'];
listToTable(header, list, 'transpose', 'center');
// -->
</script>
</p>
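<p>
The table above can be checked by brute-force enumeration over the
<sub>5</sub>C<sub>3</sub>=10 equally likely assignments. Here is a sketch in
Python (the chapter's own code is in Matlab and R; names here are illustrative,
and exact rational arithmetic avoids rounding):
</p>

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def smirnov_frac(x, y):
    # D_{m,n} = sup over the pooled data values of |F_{Y,n} - F_{X,m}|,
    # computed exactly with Fractions
    z = x + y
    return max(abs(Fraction(sum(yy <= v for yy in y), len(y))
                   - Fraction(sum(xx <= v for xx in x), len(x)))
               for v in z)

ranks = tuple(range(1, 6))               # 5 distinct observations, by rank
dist = Counter()
for treat in combinations(ranks, 3):     # the 10 equally likely assignments
    ctrl = tuple(r for r in ranks if r not in treat)
    dist[smirnov_frac(ctrl, treat)] += 1

for d in sorted(dist):                   # null distribution of D_{2,3}
    print(d, Fraction(dist[d], 10))
```

<p>
The resulting counts are 1, 3, 4, and 2 out of 10 for <em>d</em> = 1/3, 1/2,
2/3, and 1, matching the table.
</p>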
<p>
Thus a Smirnov test (against the omnibus alternative) with
significance level 0.2 would reject the strong null hypothesis when
<em>D</em><sub>2,3</sub>=1;
smaller significance levels are not attainable when <em>n</em> and <em>m</em> are so small.
Critical values can be calculated analytically fairly easily when
<em>n</em>=<em>m</em> (when the treatment and control groups are the same size)
and the critical value is an integer multiple <em>a</em> of 1/<em>n</em>.
Let <em>k</em> = ⌊ <em>n</em>/<em>a</em> ⌋ be the largest integer such that <em>n</em>
− <em>k</em>×<em>a</em> ≥ 0.
Then, under the strong null hypothesis, provided there are no ties,
</p>
<p class="math">
P(<em>D<sub>n</sub></em><sub>,<em>n</em></sub>
 ≥ <em>a</em>/<em>n</em>) = 2<big>(</big>
<sub>2<em>n</em></sub>C<em><sub>n</sub></em><sub>−<em>a</em></sub>
 − <sub>2<em>n</em></sub>C<em><sub>n</sub></em><sub>−2<em>a</em></sub> +
<sub>2<em>n</em></sub>C<em><sub>n</sub></em><sub>−3<em>a</em></sub>
 − … + (−1)<em><sup>k</sup></em><sup>−1</sup>×
<sub>2<em>n</em></sub>C<em><sub>n</sub></em><sub>−<em>ka</em></sub>
<big>)</big>/<sub>2<em>n</em></sub>C<em><sub>n</sub></em>.
</p>
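<p>
For small equal sample sizes, the tail probability
P(<em>D</em><sub><em>n</em>,<em>n</em></sub> ≥ <em>a</em>/<em>n</em>) =
2(<sub>2<em>n</em></sub>C<sub><em>n</em>−<em>a</em></sub> −
<sub>2<em>n</em></sub>C<sub><em>n</em>−2<em>a</em></sub> + …)/<sub>2<em>n</em></sub>C<sub><em>n</em></sub>
can be checked against brute-force enumeration. A Python sketch (names
illustrative; math.comb requires Python 3.8+):
</p>

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def smirnov_rank(treat, n):
    # D_{n,n} computed from the treatment ranks among 1..2n (no ties)
    ctrl = [r for r in range(1, 2*n + 1) if r not in treat]
    return max(abs(Fraction(sum(y <= v for y in treat), n)
                   - Fraction(sum(x <= v for x in ctrl), n))
               for v in range(1, 2*n + 1))

def tail_formula(n, a):
    # 2*(C(2n, n-a) - C(2n, n-2a) + ... +/- C(2n, n-ka)) / C(2n, n)
    k = n // a
    s = sum((-1)**(j - 1) * comb(2*n, n - j*a) for j in range(1, k + 1))
    return Fraction(2 * s, comb(2*n, n))

for n, a in [(2, 1), (2, 2), (3, 1), (3, 2), (3, 3), (4, 2)]:
    # exact P(D_{n,n} >= a/n) by enumerating all C(2n, n) assignments
    exact = Fraction(sum(smirnov_rank(t, n) >= Fraction(a, n)
                         for t in combinations(range(1, 2*n + 1), n)),
                     comb(2*n, n))
    assert exact == tail_formula(n, a)
print("formula matches enumeration")
```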
<p>
When there are ties, <em>D<sub>m</sub></em><sub>,<em>n</em></sub> tends to
be smaller, so tail probabilities from this expression still give conservative
tests.
<script language="JavaScript1.4" type="text/javascript"><!--
var fStr = 'If both members of a tie are assigned to the same group (treatment or control), ' +
'the tie does not change the value of <em>D</em><sub><em>m</em>,<em>n</em></sub> ' +
'from the value it would have had if the pair differed slightly. ' +
'If one member of a tie is assigned to treatment and one to control, the tie ' +
'can decrease the value of <em>D</em><sub><em>m</em>,<em>n</em></sub> from the ' +
'value it would have had if the observations differed slightly. Thus ' +
'<em>D</em><sub><em>m</em>,<em>n</em></sub> is stochastically smaller when there ' +
'are ties.';
writeFootnote(fCtr++, fCtr.toString(), fStr);
// -->
</script>
There are also limit theorems that give asymptotic approximations; see
Lehmann (1998, pp. 37ff., 421).
When <em>n</em> is not equal to <em>m</em>,
the calculations are harder and the asymptotics change.
Of course, we can
always approximate the null distribution of <em>D<sub>m</sub></em><sub>,<em>n</em></sub>
by simulation.
</p>
<p>
Here is a <a href="http://www.mathworks.com">Matlab</a> m-file to calculate the Smirnov
statistic.
Let <em>x</em>=(<em>x</em><sub>1</sub>, … , <em>x</em><sub><em>m</em></sub>)
denote the vector of control responses and let
<em>y</em>=(<em>y</em><sub>1</sub>, … , <em>y</em><sub><em>n</em></sub>)
denote the vector of treatment responses.
</p>
<div class="code">
<p>
<pre>
function s = smirnov(x, y)
% function s = smirnov(x, y)
% P.B. Stark, statistics.berkeley.edu/~stark 2/17/2003
% calculates the Smirnov distance between two vectors of data
m = length(x);
n = length(y);
z = [x y];
s = 0;
for j=1:m+n,
s = max(s, abs(sum(x <= z(j))/m - sum(y <= z(j))/n));
end;
return;
</pre>
</p>
</div>
<p>
The following Matlab function simulates the null distribution of the Smirnov statistic.
Again, the data are <em>x</em> and <em>y</em>.
</p>
<div class="code">
<p>
<pre>
function dist = simSmir(x, y, iter)
% function dist = simSmir(x, y, iter)
% P.B. Stark, statistics.berkeley.edu/~stark 2/17/2003
% simulates the null distribution of the Smirnov distance for data x and y
% using iter pseudorandom samples
dist = zeros(iter,1);
m = length(x);
n = length(y);
z = [x y]; % pool the N = m+n observations
for i=1:iter
zp = z(randperm(length(z))); % random permutation of the data
dist(i) = smirnov(zp(1:m),zp(m+1:m+n)); % first m control, last n treatment
end;
return;
</pre>
</p>
</div>
<p>
Using these functions, you can find an approximate <em>P</em>-value for
the Smirnov test using the following script, which assumes that the data are
already in the vectors <em>x</em> and <em>y</em> and that <em>iter</em> has
been set (to a number like 10,000).
</p>
<div class="code">
<p>
<pre>
testVal = smirnov(x, y);
simDist = simSmir(x, y, iter);
pValue = sum(simDist >= testVal)/iter;
</pre>
</p>
</div>
<p>
Here are versions of the functions in R:
</p>
<div class="code">
<p>
<pre>
smirnov <- function(x, y) {
# P.B. Stark, statistics.berkeley.edu/~stark 9/8/2006
z <- c(x,y);
s <- 0;
for (j in 1:length(z)) {
s <- max(s, abs(sum(x <= z[j])/length(x) - sum(y <= z[j])/length(y)));
}
s
}
</pre>
</p>
<p>