\usetheme{Singapore}
\usepackage{graphicx}
%\usepackage{multimedia}
\logo{\includegraphics[width = 0.1\textwidth]{hestemlogo}}
\usepackage{keystroke}
\usepackage{marvosym}
\usepackage{ifsym}
%\usepackage{hyperref}
\usepackage{tikz}
\usepackage[tikz]{bclogo}
\usetikzlibrary{arrows,decorations,backgrounds,fit,positioning,calc}
\author{Paul Hewson}
\title{Insights from Data}
\newcommand{\A}{{\color{green}A}}
\newcommand{\B}{{\color{red}B}}
\newcommand{\C}{{\color{blue}C}}
\newcommand{\R}{{\color{red}R}}
\makeatletter
\newcommand{\Repeat}[1]{%
\expandafter\@Repeat\expandafter{\the\numexpr #1\relax}%
}
\def\@Repeat#1{%
\ifnum#1>0
\expandafter\@@Repeat\expandafter{\the\numexpr #1-1\expandafter\relax\expandafter}%
\else
\expandafter\@gobble
\fi
}
\def\@@Repeat#1#2{%
\@Repeat{#1}{#2}#2%
}
\makeatother
\begin{document}
\mode<article>{\sffamily}
\mode<beamer>{\frame{\titlepage}}
\mode<article>{
\maketitle
\tableofcontents
}
\section{Pre-amble}
\mode<article>{This course has been developed with funding from HE-STEM and in partnership with Devon County Council. The support of both is gratefully acknowledged.
The aim of the course is to provide a ``one day'' introduction to statistical literacy for use in the workplace. To that end, we are trying to make a variety of delivery modes available: a blended course (a combination of online pre-course work, a group contact session and a little online post-course work) as well as a fully online course. At the time of writing we continue to evaluate the relative merits of each delivery mode.
What this does mean is that for any course there are online materials, in the form of a Moodle course, which accompany these notes. Currently, a live version of these materials is being hosted by the Royal Statistical Society Centre for Statistical Education, and this should allow guest access. A zip archive of the course is also available for anyone wishing to adopt the course.
All course materials are being offered under the GNU Public Licence. These learning materials are free to use by anyone, with the sole restriction that you cannot subsequently restrict anyone else's use of these materials.
The course is intended to consist of:
\begin{itemize}
\item Some pre-course activities
\begin{itemize}
\item watch a video, answer some questions about the video
\item submit ``newspaper'' articles into a database
\end{itemize}
\item A course - either a ``contact'' session or an entirely online course
\item Some post-course activities
\end{itemize}
}
\mode<article>{\newpage}
\subsection{Activity 1}
\mode<article>{
Following any housekeeping and ice-breaker activities, the course starts with a discussion of the video (and quiz) that was set for pre-reading.
}
\mode<beamer>{
\begin{frame}[label=smidsyvideo]
\frametitle{Pre-course activity 1}
\begin{itemize}
\item You were asked to watch a short video clip, and answer some questions (online).
\item There were few ``correct'' answers; the main point of the exercise was to get you thinking.
\item Please now imagine you are a public servant, with finite money to spend on
\begin{itemize}
\item an intervention to reduce ``Single vehicle loss of control'' injuries, or
\item an intervention to reduce ``Sorry Mate I didn't see you'' collisions.
\end{itemize}
\item In groups, make a decision. Also, be prepared to explain \emph{why} you chose the intervention you did.
\end{itemize}
\end{frame}
}
\mode<article>{
\includeslide{smidsyvideo}
\begin{bclogo}[couleur=blue!30, arrondi=0.1, sousTitre=Activity, logo=\bcquestion, ombre=True]{}
Discuss how we prioritise an action in this scenario.
\end{bclogo}
The fundamental goal of the video exercise was to get people thinking about the evidence we need in order to make a decision about an intervention. Road travel is a ubiquitous activity. Many people expect the state (central/local government/police/other) to do ``something'' about road injury. But how do we decide the priorities? Part of the discussion could involve the following ideas:
\begin{itemize}
\item Direct resources to what I think is the most important
\item Direct resources to what elected representatives think is most important
\item Direct resources to what the public think is most important
\item Inform a debate on resources using something approaching objective evidence?
\item Balance the needs with the likely effectiveness of any intervention (maybe one problem is more common, but I don't really know how to fix it)
\end{itemize}
Hopefully, we arrive at a position where (a) we see some need for data to support a decision and (b) we acknowledge that these data are unlikely on their own to make the decision for us. In this case, one major complication may be that some problems are more amenable to our interventions than others. Even if one crash type were numerically more common than the other, maybe we don't have an effective remedy for it, and would be better spending our money on something that makes a difference. However, we should be basing our decisions on data about how common things are, and how effective treatments are. We shouldn't be basing them on guesswork or anecdote.
}
\frame{
\frametitle{Learning outcome 1}
\begin{bclogo}[couleur=red!30, arrondi=0.1, sousTitre=GAISE 2010: 1, logo=\bcinfo, ombre=True]{}
Data beat Anecdotes
\end{bclogo}
}
\mode<article>{\newpage}
\subsection{Activity 2}
\mode<beamer>{
\begin{frame}[label=pemdas]
\frametitle{Do we really use data?}
The plural of anecdote IS NOT data
\begin{itemize}
\item<1-> Saying ``data beat anecdotes'' is a cornerstone of statistical literacy
\item<2-> But we contradict it regularly.
\item<3-> How often do you hear about my dear old Aunt Sally who smokes 80 cigarettes a day, drinks 8 bottles of gin and has still lived to be one hundred and fourteen years old?
\item<3-> Who points out that this is ``the exception that proves the rule''?
\end{itemize}
\end{frame}
}
\mode<article>{
We first need to address the question of data beating anecdotes.
\includeslide{pemdas}
With a mature audience, we need to do this in a way that acknowledges the fact that no administrative data (if any data) give a perfect representation of reality. We therefore have to move to a discussion of decision making in the face of imperfect data.
}
\mode<beamer>{
\begin{frame}[label=jbest]
\frametitle{Joel Best: Lies, Damned Lies and Statistics}
\begin{itemize}
\item Who created this statistic?
\item Why was this statistic created?
\item How was this statistic created?
\end{itemize}
So what?
\begin{itemize}
\item Adopt outright cynicism: don't believe anything based on data ever
\item Adopt na\"ive acceptance (especially if the ``facts'' suit us).
\end{itemize}
Both positions have the advantage of being thought free. However, for a professional, a critically cautious position between these two extremes is needed.
%get that mark twain quote on avoiding thinking
\end{frame}
}
\mode<article>{
\includeslide{jbest}
The aim of this slide is to prompt a more informed discussion about the actual data that concern course participants. What actually gets recorded? How does it get recorded? In injury prevention, one ``Gold Standard'' source of data is the police-collected, so-called ``STATs 19'' data.
\begin{itemize}
\item Who: collected by the police, in response to information about a collision (so for example, minor collisions involving uninsured drivers may never get reported to the police)
\item Why: apart from the urgent human needs, police involvement is directed towards determining whether an offence has taken place, and whether a prosecution is possible. This might limit the value of the data for preventative action (if I get hit as a pedestrian / cyclist, I might not care that it was the car driver's fault; I might care about ways I can prevent it happening again).
\item How: a long, complicated form, sometimes collated in response to a public report a while after the collision occurred. All evidence is retrospective and ``first impressions''
\end{itemize}
\begin{bclogo}[couleur=blue!30, arrondi=0.1, sousTitre=Activity, logo=\bcquestion, ombre=True]{}
Using other examples provided by the course participants, discuss the limitations of the data using Best's Who/How/Why.
\end{bclogo}
}
\mode<article>{\newpage}
\section{Critical Numeracy}
\frame{
\frametitle{Example: consider two courses}
\begin{itemize}
\item In two years, the dropout rate for Course A doubles
\item In the same two years, the dropout rate for Course B increases by 50\%
\item You only have resources to intervene with one course. Which course do you decide to ``sort out''?
\end{itemize}
}
\frame{
\frametitle{Possible numbers}
Conveniently (and implausibly) there are 100 students on each course:
\begin{center}
\begin{table}
\begin{tabular}{lrr}
Course & 2008 & 2009 \\
\hline
A & 4 & 8 \\
B & 40 & 60 \\
\end{tabular}
\end{table}
\end{center}
\pause
Changes in dropout rate:
\begin{itemize}
\item A: The dropout rate has increased from 4\% to 8\%
\item B: The dropout rate has increased from 40\% to 60\%
\end{itemize}
\pause
\begin{itemize}
\item<1-> A: The ``relative risk'' is $\frac{0.08}{0.04} = 2$.
\item<1-> B: The ``relative risk'' is $\frac{0.6}{0.4} = 1.5$
\item<2-> A: The ``absolute difference in risk'' is $0.08-0.04=0.04$
\item<2-> B: The ``absolute difference in risk'' is $0.6-0.4 = 0.2$
\end{itemize}
}
\frame{
\frametitle{Relative and absolute risk}
\begin{bclogo}[couleur=blue!30, arrondi=0.1, sousTitre=Activity, logo=\bcquestion, ombre=True]{}
Never mind the jargon!!! The way you compare proportions (or percentages) can alter your impressions.
\end{bclogo}
Note to self - mention percentage points!
}
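\mode<article>{
As a hedged worked example of the point about percentage points, using only the (invented) dropout figures from the ``Possible numbers'' slide: a change between two percentages can be quoted either as a difference in percentage points or as a relative (percent) change, and the two give very different impressions.
\begin{displaymath}
\begin{array}{lrr}
 & \textrm{Percentage point change} & \textrm{Relative change} \\
\hline
\textrm{Course A} & 8 - 4 = 4 & (8 - 4)/4 = 100\% \\
\textrm{Course B} & 60 - 40 = 20 & (60 - 40)/40 = 50\% \\
\end{array}
\end{displaymath}
With 100 students on each course, that is 4 extra dropouts on Course A against 20 extra on Course B, even though the relative increase is larger for Course A.
}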
\mode<article>{\newpage}
\section{The way we collect data}
\mode<article>{
(Maybe it's time to find a better example). This case study was used by Sharon Lohr in her excellent (one of the best around) textbook on survey sampling. There's also an interesting parallel with road injury. Few people admit to being a ``below average'' driver. Likewise few people admit to being a ``below average'' lover, so it does seem to have some audience resonance. We start by stating a few bare ``facts'' from the summary of results.}
\mode<beamer>{
\begin{frame}[label = sherehite]
\frametitle{Shere Hite (1987), Women in Love: A cultural revolution in progress}
\begin{itemize}
\item There were 4,500 respondents to this survey - does that sound like a big number?
\item What do you feel is a good number for a survey? (Note, YouGov predict the election within a few percentage points on smaller numbers than this.)
\end{itemize}
\end{frame}
}
\mode<article>{
\includeslide{sherehite}
The aim of this slide is to prompt a discussion about survey methods. Hopefully, most people would accept that 4,500 is quite large by survey standards. It is though well worth prompting a discussion about what we mean by ``quite large'', as well as asking about the size of surveys people rely on in their own practice.
\begin{bclogo}[couleur=blue!30, arrondi=0.1, sousTitre=Activity, logo=\bcquestion, ombre=True]{}
Discuss the validity of results based on a sample size of 4,500. Discuss the sample sizes used by participants in their own practice.
\end{bclogo}
Once we've exhausted discussion about sample size we can move on to consider the results.}
\mode<beamer>{
\begin{frame}[label=sherehiteresults]
\frametitle{Women in Love?}
\begin{itemize}
\item 84\% of women are not emotionally satisfied with their relationships
\item 70\% of women who have been married for more than 5 years have had affairs
\item 95\% have been emotionally or psychologically harassed by the male
\item 84\% have suffered condescension from the male
\end{itemize}
What is this survey telling us about the wellbeing of women in the US in the 1980s?
\end{frame}
}
\mode<article>{
\includeslide{sherehiteresults}
This should fuel a good open-ended discussion. The findings ought to be a little challenging. It is worth letting this discussion run, with careful prompting. I suppose it's bad pedagogy to let anyone comprehensively challenge a theory of human relationships before you display the next slide, but the more people try to engage with the results the better.
\begin{bclogo}[couleur=blue!30, arrondi=0.1, sousTitre=Activity, logo=\bcquestion, ombre=True]{}
Discuss the findings of Shere Hite's survey.
\end{bclogo}
}
\mode<beamer>{
\begin{frame}[label=sherehiteresponse]
\frametitle{Women who fill in surveys in Love}
\begin{itemize}
\item 100,000 surveys were sent out, only 4,500 came back
\item There were 127 essay type questions
\item Are the 4,500 women who filled in such a survey typical of the 100,000 who were invited to take part? Are the 100,000 who were invited to take part typical of women in the US in the 1980s?
\end{itemize}
\end{frame}
}
\mode<article>{
\includeslide{sherehiteresponse}
It is this last slide on response rate that really provides the key to the results presented. It is worth giving more information on the way the data were collected. In fact, 100,000 surveys were sent out and only 4,500 came back. It may well be that those 100,000 surveys were sent to an essentially representative sample frame (although that is a little unlikely). But the key point is that we have a 4.5\% response rate. In addition, there were 127 essay type questions. So the concluding question we leave with is ``will the 4.5\% of respondents (who are willing to write all those essays about intimate aspects of their life) be typical of US women?''
\begin{bclogo}[couleur=blue!30, arrondi=0.1, sousTitre=Activity, logo=\bcquestion, ombre=True]{}
Discuss the validity of the conclusions in the light of the low response rate
\end{bclogo}
Hopefully at this point the audience will be ready to discuss the next point.
}
\frame{
\frametitle{Learning outcome 2}
\begin{bclogo}[couleur=red!30, arrondi=0.1, sousTitre=GAISE 2010: 2, logo=\bctakecare, ombre=True]{}
Random sampling allows results of surveys %and experiments
to be extended to the
population from which the sample was taken.
%Random sampling from a population is what allows us to understand the population to which they apply.
\end{bclogo}
}
\mode<beamer>{
\begin{frame}[label=surveydiscussion]
\frametitle{Surveys or data?}
\begin{itemize}
\item There are large national surveys (carried out by agencies or national statistical bodies) which use fairly sophisticated variations on ``random sampling'', done in a way that can produce results that are ``representative'' (think about election forecasting).
\item What about the surveys we commission locally / use locally?
\item What problems might there be with non-response?
\item Do we think these are representative?
\item What could we do to make them more representative?
\end{itemize}
\end{frame}
\begin{frame}[label=whatpop]
\frametitle{What's the population}
\begin{itemize}
\item A big part of understanding (and designing) surveys is thinking about the relevant population
\end{itemize}
\end{frame}
}
\mode<article>{
\includeslide{surveydiscussion}
First, we need to make a technical point about non-response and the value of random sampling. It can be a strange idea to think that randomly selected people can ``represent'' a population.
But secondly, we need to think carefully about the ``population to which these results apply''.
\includeslide{whatpop}
\begin{bclogo}[couleur=blue!30, arrondi=0.1, sousTitre=Activity, logo=\bcquestion, ombre=True]{}
Discuss the target ``population'' which is being examined by this survey. Consider examples provided by the course participants.
\end{bclogo}
}
\mode<beamer>{
\begin{frame}[label=brakemobilephone]
\frametitle{Mobile Phone Killer Crash Risk}
Source: www.brake.org.uk/handheld-mobile-phone-use-one-rise-brake-reaction
\begin{itemize}
\item Survey of 21 year old students
\item Is that typical?
\end{itemize}
\end{frame}
}
\mode<article>{
\includeslide{brakemobilephone}
The slide above is typical of one that has been submitted by a course participant in the pre-course activities. The aim now is to discuss carefully what this headline is telling us.
}
\mode<beamer>{
\frame{
\frametitle{Happiness Census}
100 people questioned, all respond.
%\Smiley
%\Repeat{0}{test }
\Large
\Repeat{20}{{\color{red}\Smiley }}
\Repeat{20}{{\color{red}\Smiley }}
\Repeat{20}{{\color{red} \Smiley }}
\Repeat{20}{{\color{red} \Smiley }}
\Repeat{20}{{\color{blue} \Frowny }}
%\Repeat{3}{test }
%\Repeat{4}{test }
%\Repeat{5}{test }
\normalsize
\begin{itemize}
\item 80 / 100 are happy, the proportion of happy people is 80\%
\item If this were a random sample of (say) 1,000 students we could do something ``statistical'' (compute a confidence interval for the proportion of happy students in the population). But we don't need to. Here, 100 is the population
\end{itemize}
}
\frame{
\frametitle{Happiness Census}
More realistically, 100 people questioned, 60 respond.
%\Smiley
%\Repeat{0}{test }
\Large
\Repeat{20}{{\color{red}\Smiley}}
\Repeat{20}{{\color{red}\Smiley}}
\Repeat{8}{{\color{red}\Smiley}}\Repeat{12}{{\color{blue}\Frowny}}
\normalsize
\Repeat{20}{{\color{gray}\Neutral}}
\Repeat{20}{{\color{gray}\Neutral}}
%\Repeat{3}{test }
%\Repeat{4}{test }
%\Repeat{5}{test }
\normalsize
\begin{itemize}
\item 48 / 60 are happy; the proportion of happy people \emph{\color{red}who responded to the ``survey''} is 80\%
\item We \emph{\color{blue}can}\footnote{\color{blue}although we shouldn't} plug these numbers into a computer and do something statistical to get a 95\% confidence interval for the \emph{\color{red}population} proportion as (68\%, 88\%). But just see how silly that is on the next two slides.
\end{itemize}
}
\frame{
\frametitle{If all the non-responders were too busy being happy to sit behind a computer filling in surveys}
100 people questioned, 60 respond.
%\Smiley
%\Repeat{0}{test }
\Large
\Repeat{20}{{\color{red}\Smiley}}
\Repeat{20}{{\color{red}\Smiley}}
\Repeat{8}{{\color{red}\Smiley}}\Repeat{12}{{\color{blue}\Frowny}}
\Repeat{20}{{\color{gray}\Smiley}}
\Repeat{20}{{\color{gray}\Smiley}}
%\Repeat{3}{test }
%\Repeat{4}{test }
%\Repeat{5}{test }
\normalsize
\begin{itemize}
\item In the population, we have 88 / 100 who are happy; the proportion of happy people \emph{\color{red}in the population} is 88\%
\end{itemize}
}
\frame{
\frametitle{Or if all the non-responders were too fed up to fill in surveys (no computers, we didn't teach them how to use computers etc.)}
100 people questioned, 60 respond.
%\Smiley
%\Repeat{0}{test }
\Large
\Repeat{20}{{\color{red}\Smiley}}
\Repeat{20}{{\color{red}\Smiley}}
\Repeat{8}{{\color{red}\Smiley}}\Repeat{12}{{\color{blue}\Frowny}}
\Repeat{20}{{\color{gray}\Frowny}}
\Repeat{20}{{\color{gray}\Frowny}}
\normalsize
\begin{itemize}
\item In the population, we have 48 / 100 who are happy; the proportion of happy people \emph{\color{red}in the population} is 48\%
\end{itemize}
}
\frame{
\frametitle{Non-response to surveys}
\begin{itemize}
\item Non-response is a real problem
\item There are some (really quite advanced, i.e., rather specialised, i.e., think expensive expert) methods which try to help out. Clever as they are (i.e., I like playing with them) they are not a magic wand.
\item Putting effort into minimising non-response (or at least understanding who is and isn't responding) is really quite important
\end{itemize}
}
}
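\mode<article>{
The interval quoted on the slide can be reproduced by hand. The notes do not say which method produced (68\%, 88\%); the sketch below assumes the Wilson score interval, which happens to match, with $\hat{p} = 48/60 = 0.8$, $n = 60$ and $z = 1.96$ (the symbols $\hat{p}$, $n$ and $z$ are introduced here for illustration).
\begin{displaymath}
\frac{\hat{p} + \frac{z^2}{2n}}{1 + \frac{z^2}{n}} \pm \frac{z}{1 + \frac{z^2}{n}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}} \approx 0.782 \pm 0.100
\end{displaymath}
This gives roughly (0.68, 0.88). By contrast, the two extreme scenarios for the 40 non-responders need no statistical theory at all:
\begin{displaymath}
\frac{48 + 0}{100} = 0.48 \le \textrm{population proportion} \le \frac{48 + 40}{100} = 0.88
\end{displaymath}
So the honest range for the population is 48\% to 88\%, much wider than the confidence interval suggests, because the interval only describes sampling variation and says nothing about non-response.
}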
\mode<article>{\newpage}
\section{Randomness is Natural}
\mode<article>{
I've just bought myself a full size Alan Sugar facemask to make this next exercise a little more interesting. However, the basic idea is due to W. Edwards Deming.
\begin{itemize}
\item Get (about) 8 volunteers
\item Give them a task. I have used sampling beads from an urn (``make batches of yellow beads''), I have used dice (``get a five or a six'') and I have used coins (``throw the coin eight times and get at least six heads'').
\item Collect results from the volunteers. Put on the Alan Sugar mask and ``fire'' the worst performing individual. Promote the best performing individual (who is now called Karen or Nick). Maybe put a big ``D'' hat on anyone who almost got fired. Run another round.
\item Fire/promote/reward as necessary
\item Repeat until there is a winner, who gets a bag of chocolates or other token prize
\end{itemize}
At this point we can have an interesting discussion about ``fairness''. One advantage of the dice/coins is that we can fairly easily get a sense of what a typical value should be. In fact, we can even formalise this as an expected value.
\begin{bclogo}[couleur=blue!30, arrondi=0.1, sousTitre=Activity, logo=\bcquestion, ombre=True]{}
Carry out this Deming-based illustration of randomness. Discuss the fairness of the findings. Consider expected value if appropriate.
\end{bclogo}
}
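\mode<article>{
A minimal worked sketch of the expected values for two of the tasks above, assuming a fair coin and a fair six-sided die (the symbol $X$ for the number of heads in eight tosses is introduced here for illustration):
\begin{displaymath}
E[X] = 8 \times \frac{1}{2} = 4, \qquad
P[X \ge 6] = \frac{\binom{8}{6} + \binom{8}{7} + \binom{8}{8}}{2^8} = \frac{28 + 8 + 1}{256} = \frac{37}{256} \approx 0.14
\end{displaymath}
\begin{displaymath}
P[\textrm{five or six on one throw of the die}] = \frac{2}{6} = \frac{1}{3}
\end{displaymath}
So with eight volunteers each attempting the coin task once, we would expect only about $8 \times 37/256 \approx 1.2$ of them to succeed in any round, whoever they are. Being ``fired'' or ``promoted'' on one round's results says very little about the individual.
}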
\mode<beamer>{
\begin{frame}[label=pi1]
\frametitle{Fictional performance indicator}
\includegraphics[width=0.7\textwidth]{pi1}
\end{frame}
\begin{frame}[label=pi2]
\frametitle{How we normally use these data}
\includegraphics[width=0.7\textwidth]{pi2}
\end{frame}
\begin{frame}[label=pi3]
\frametitle{Isn't this just random blips either side of a trend?}
\includegraphics[width=0.7\textwidth]{pi3}
\end{frame}
}
\mode<article>{
We close this session with some fictional (or some real, if it's available) performance indicator data.
\includeslide{pi1}
This kind of slide is familiar (maybe we don't even need the trend to start with).
\includeslide{pi2}
And this kind of treatment is familiar.
\includeslide{pi3}
So the final slide should prompt a bit of a discussion. How do we determine trends (and should they be rewarded for the trend?), and how do we determine blips? I've often ended up explaining things like CUSUM charts at this point - maybe something should be added. We want to avoid cynicism (it's all randomness). But most of the course participants I've had to date are the subject of performance monitoring and perhaps would expect the producers to give them the information as a CUSUM chart.
\begin{bclogo}[couleur=blue!30, arrondi=0.1, sousTitre=Activity, logo=\bcquestion, ombre=True]{}
Discuss the importance of distinguishing signal from noise (substantive change from random variation).
\end{bclogo}
}
\frame{
\frametitle{Some real data}
\begin{center}
\begin{table}
\begin{tabular}{lrr}
Year & Per capita & Hourly \\
\hline
1959 & 12,985 & 6.69 \\
1964 & 14,707 & 7.33 \\
1969 & 17,477 & 7.98 \\
1974 & 18,989 & 8.24 \\
1979 & 21,635 & 8.17 \\
1984 & 23,171 & 7.80 \\
1989 & 26,552 & 7.64 \\
1994 & 28,156 & 7.40 \\
1999 & 32,429 & 7.86
\end{tabular}
\caption{(From US Department of Commerce ``Economic report of the President 2000'' cited by Joel Best, all prices USD)}
\end{table}
\end{center}
\begin{itemize}
\item How do we present these data?
\item What are they telling us?
\end{itemize}
}
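\mode<article>{
One hedged way into the first question is to compare the two columns over the whole period as ratios of the 1999 value to the 1959 value, using only the figures in the table:
\begin{displaymath}
\frac{32{,}429}{12{,}985} \approx 2.50, \qquad \frac{7.86}{6.69} \approx 1.17
\end{displaymath}
So the per capita figure rose by about 150\% while the hourly figure rose by about 17\%. The hourly figure also peaked at 8.24 in 1974, and the 1999 value is $7.86 / 8.24 \approx 0.95$ of that peak. Which of these comparisons is quoted can leave quite different impressions of the same forty years.
}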
\mode<beamer>{
\begin{frame}[label=allmakeheads]
\frametitle{Modelling randomness}
\begin{itemize}
\item Now we need everyone to toss a coin six times
\item Count the number of heads
\item (we may need to do this a second time if the numbers are small)
\item Let's graph the results
\end{itemize}
\end{frame}
\begin{frame}[label=binomial]
\frametitle{One theoretical model}
\begin{displaymath}
P[X=x] = \binom{n}{x} p^x (1-p)^{n-x}
\end{displaymath}
\begin{center}
\includegraphics[width = 0.5\textwidth]{binom}
\end{center}
\end{frame}
}
\mode<article>{
We can do a simple experiment to try to persuade people we have models for randomness. Get everyone to toss six coins, and record the number of heads they get (if it's a very small group they may have to do this twice).
\includeslide{allmakeheads}
We can collate the results as a bar/tally graph, and compare them with a simple theoretical model (the binomial)
\begin{tabular}{l|l}
Heads & People \\
\hline
0 & \\
1 & $\times$ %\StrokeOne
\\
2 & $\times \times \times$%\StrokeThree
\\
3 & $\times \times \times \times \times \times \times $%\StrokeFive \StrokeThree
\\
4 & $\times \times $ %\StrokeTwo
\\
5 & \\
6 & $\times$ %\StrokeOne
\\
\end{tabular}
We can compare this flipchart/whiteboard with the theoretical model below
\begin{bclogo}[couleur=blue!30, arrondi=0.1, sousTitre=Activity, logo=\bcquestion, ombre=True]{}
Get everyone to carry out a coin tossing experiment. Collate the results. Compare them to the ``theoretical'' model.
\end{bclogo}
\includeslide{binomial}
}
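\mode<article>{
For reference, a short worked version of the theoretical model for this experiment, assuming fair coins so that $n = 6$ and $p = 1/2$ in the binomial formula, and taking the 14 people in the illustrative tally above as the group size:
\begin{displaymath}
P[X = x] = \binom{6}{x} \left(\frac{1}{2}\right)^{6} = \frac{1}{64} \binom{6}{x}
\end{displaymath}
\begin{tabular}{l|rrr}
Heads & $\binom{6}{x}$ & $P[X=x]$ & Expected people (of 14) \\
\hline
0 & 1 & 0.016 & 0.2 \\
1 & 6 & 0.094 & 1.3 \\
2 & 15 & 0.234 & 3.3 \\
3 & 20 & 0.313 & 4.4 \\
4 & 15 & 0.234 & 3.3 \\
5 & 6 & 0.094 & 1.3 \\
6 & 1 & 0.016 & 0.2 \\
\end{tabular}
A small group will rarely match these expected counts closely, which is itself a useful talking point.
}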
\frame{
\frametitle{Learning outcome 3}
\begin{bclogo}[couleur=red!30, arrondi=0.1, sousTitre=GAISE 2012: 2, logo=\bctakecare, ombre=True]{}
Variability is natural and is also predictable and quantifiable
\end{bclogo}
}
\begin{frame}<beamer:1>
\frametitle{What's the first thing you see?}
\begin{center}
\includegraphics[width=0.5\textwidth]{Facevase}
\end{center}
\end{frame}
\begin{frame}<beamer:1>
\frametitle{What's the first thing you see?}
\begin{center}
\includegraphics[width=0.5\textwidth]{girlsax}
\end{center}
\end{frame}
\begin{frame}<beamer:1>
\frametitle{What's the character in the middle?}
\begin{center}
\includegraphics[width = 0.7\textwidth]{aoi}
\end{center}
\end{frame}
\begin{frame}<beamer:1>
\frametitle{What's the first word you see?}
\begin{center}
\includegraphics[width = 0.7\textwidth]{goodevil}
\end{center}
\end{frame}
%%\begin{frame}<beamer:1>
%%\frametitle{An aside}
%%\begin{center}
%%\includegraphics[width = 0.6\textwidth]{Optical-Illusion-Art-01}
%%\end{center}
%%\end{frame}
%\begin{frame}<beamer:1>
%\frametitle{And finally}
%\begin{center}
%\movie[externalviewer]{\includegraphics[width=0.5\textwidth]{meyou}}{/home/phewson/Downloads/bb.mpeg}
%\end{center}
%\end{frame}
\mode<article>{\newpage}
\section{Contextual variables}
\subsection{Correlation}
\frame{
\frametitle{Opening question}
\begin{itemize}
\item What does the word ``correlation'' mean to you?
\item What do you think it means?
\end{itemize}
}
%\frame{
% \frametitle{Linear association}
%Important jargon for this week
%\begin{itemize}
%\item Correlation
%\item Linear association
%\item Lurking variable
%\end{itemize}
%}
%\section{Lurking variables}
%\frame{
% \frametitle{Correlation does not imply causation}
%\begin{center}
%\includegraphics[width = 0.5\textwidth]{IceCreamSales}
%\end{center}
%}
%\frame{
% \frametitle{Lurking variables}
%\begin{itemize}
%\item<1-| alert@1> Correlation (over time) between consumption of diet fizzy drinks and traffic accidents
%\item<2-| alert@2> Correlation over time between teacher salaries and prescription drug costs
%\item<3-| alert@3> Correlation between average salary and time taken to run 1500 metres.
%\item<4-| alert@4> Correlation between the number of people who attend church and the number of alcohol related violent incident (sample, 200 cities)
%\item<5-| alert@5>
%\end{itemize}
%}
%\section{Correlation coefficient}
\frame{
\frametitle{Correlation coefficient}
\begin{itemize}
\item An attempt to produce a single number to describe the linear association between two variables
\item Upper value is +1 (perfect positive linear association)
\item Lower value is -1 (perfect negative linear association)
\item Middle value is 0 (no linear association)
\end{itemize}
}
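\mode<article>{
For reference, a hedged sketch of the usual (Pearson) sample correlation coefficient for $n$ pairs $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$; the slide does not name a particular coefficient, so this choice and the symbols $r$, $a$ and $b$ are assumptions made here for illustration:
\begin{displaymath}
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\end{displaymath}
If every point lies exactly on a line $y_i = a + b x_i$ with $b \ne 0$, then $y_i - \bar{y} = b (x_i - \bar{x})$ and the formula reduces to
\begin{displaymath}
r = \frac{b}{|b|} = \left\{ \begin{array}{rl} +1 & \textrm{if } b > 0 \\ -1 & \textrm{if } b < 0 \end{array} \right.
\end{displaymath}
which is why data generated as $y = x$ give $r = 1$ and data generated as $y = -x + 10$ give $r = -1$.
}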
\frame{
\frametitle{Correlation coefficient of 1}
<<rho1, fig = TRUE, echo = FALSE, results = hide>>=
# Ten points lying exactly on the increasing line y = x
x <- runif(10, 0, 10)
y <- x
plot(y ~ x, pch = 16, col = "red", cex = 2, main = "Correlation coefficient = 1", xlab = "x", ylab = "y")
@
}
\frame{
\frametitle{Correlation coefficient of -1}
<<rhom1, fig = TRUE, echo = FALSE, results = hide>>=
# Ten points lying exactly on the decreasing line y = 10 - x
x <- runif(10, 0, 10)
y <- -x + 10
plot(y ~ x, pch = 16, col = "red", cex = 2, main = "Correlation coefficient = -1", xlab = "x", ylab = "y")
@
}
\frame{
\frametitle{Correlation coefficient of 0}