<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: Programming with R</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<a href="index.html"><h1 class="title">Programming with R</h1></a>
<h2 class="subtitle">Making choices</h2>
<section class="objectives panel panel-warning">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-certificate"></span>Learning Objectives</h2>
</div>
<div class="panel-body">
<ul>
<li>Save plot(s) in a pdf file.</li>
<li>Write conditional statements with <code>if</code> and <code>else</code>.</li>
<li>Correctly evaluate expressions containing <code>&</code> (“and”) and <code>|</code> (“or”).</li>
</ul>
</div>
</section>
<p>Our previous lessons have shown us how to manipulate data, define our own functions, and repeat things. However, the programs we have written so far always do the same things, regardless of what data they’re given. We want programs to make choices based on the values they are manipulating.</p>
<h3 id="saving-plots-to-a-file">Saving Plots to a File</h3>
<p>So far, we have built a function <code>analyze</code> to plot summary statistics of the inflammation data:</p>
<pre class="sourceCode r"><code class="sourceCode r">analyze <-<span class="st"> </span>function(filename) {
<span class="co"># Plots the average, min, and max inflammation over time.</span>
<span class="co"># Input is character string of a csv file.</span>
dat <-<span class="st"> </span><span class="kw">read.csv</span>(<span class="dt">file =</span> filename, <span class="dt">header =</span> <span class="ot">FALSE</span>)
avg_day_inflammation <-<span class="st"> </span><span class="kw">apply</span>(dat, <span class="dv">2</span>, mean)
<span class="kw">plot</span>(avg_day_inflammation)
max_day_inflammation <-<span class="st"> </span><span class="kw">apply</span>(dat, <span class="dv">2</span>, max)
<span class="kw">plot</span>(max_day_inflammation)
min_day_inflammation <-<span class="st"> </span><span class="kw">apply</span>(dat, <span class="dv">2</span>, min)
<span class="kw">plot</span>(min_day_inflammation)
}</code></pre>
<p>And also built the function <code>analyze_all</code> to automate the processing of each data file:</p>
<pre class="sourceCode r"><code class="sourceCode r">analyze_all <-<span class="st"> </span>function(pattern) {
<span class="co"># Runs the function analyze for each file in the directory "data"</span>
<span class="co"># whose name matches the given pattern.</span>
filenames <-<span class="st"> </span><span class="kw">list.files</span>(<span class="dt">path =</span> <span class="st">"data"</span>, <span class="dt">pattern =</span> pattern, <span class="dt">full.names =</span> <span class="ot">TRUE</span>)
for (f in filenames) {
<span class="kw">analyze</span>(f)
}
}</code></pre>
<p>While these are useful in an interactive R session, what if we want to send our results to our collaborators? Since we currently have 12 data sets, running <code>analyze_all</code> creates 36 plots. Saving each of these individually would be tedious and error-prone. And in the likely situation that we want to change how the data is processed or the look of the plots, we would have to once again save all 36 before sharing the updated results with our collaborators.</p>
<p>Here’s how we can save all three plots of the first inflammation data set in a pdf file:</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">pdf</span>(<span class="st">"inflammation-01.pdf"</span>)
<span class="kw">analyze</span>(<span class="st">"data/inflammation-01.csv"</span>)
<span class="kw">dev.off</span>()</code></pre>
<p>The function <code>pdf</code> redirects all the plots generated by R into a pdf file, which in this case we have named “inflammation-01.pdf”. After we are done generating the plots to be saved in the pdf file, we stop R from redirecting plots with the function <code>dev.off</code>.</p>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-pushpin"></span>Tip</h2>
</div>
<div class="panel-body">
<p>If you run <code>pdf</code> multiple times without running <code>dev.off</code>, you will save plots to the most recently opened file. However, you won’t be able to open the previous pdf files because the connections were not closed. To get out of this situation, you’ll need to run <code>dev.off</code> until all the pdf connections are closed. You can check your current status using the function <code>dev.cur</code>. If it says “pdf”, all your plots are being saved in the last pdf specified. If it says “null device” or “RStudioGD”, the plots will be visualized normally.</p>
</div>
</aside>
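<p>As a minimal sketch of the recovery steps described in the tip above, the following opens two pdf devices without closing them, inspects the current device, and then closes every open pdf device. The two file names are hypothetical, and the files are written to a temporary directory so that nothing in your project is touched:</p>
<pre class="sourceCode r"><code class="sourceCode r"># Open two pdf devices in a row without calling dev.off() in between.
pdf(file.path(tempdir(), "first.pdf"))
pdf(file.path(tempdir(), "second.pdf"))

# dev.cur() returns a named number; the name is the type of the current device.
names(dev.cur())

# dev.list() returns the numbers of all open devices, named by device type
# (or NULL if none are open). Close each one that is a pdf device.
open_devices <- dev.list()
for (d in open_devices[names(open_devices) == "pdf"]) {
  dev.off(d)
}</code></pre>
<p>If you do not need to keep any other graphics device open, <code>graphics.off()</code> closes all of them in one call.</p>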
<p>We can update the <code>analyze</code> function so that it always saves the plots in a pdf. But that would make it more difficult to interactively test out new changes. It would be ideal if <code>analyze</code> would either save or not save the plots based on its input.</p>
<h3 id="conditionals">Conditionals</h3>
<p>In order to update our function to decide between saving or not, we need to write code that automatically decides between multiple options. The tool R gives us for doing this is called a <a href="reference.html#conditional-statement">conditional statement</a>, and looks like this:</p>
<pre class="sourceCode r"><code class="sourceCode r">num <-<span class="st"> </span><span class="dv">37</span>
if (num ><span class="st"> </span><span class="dv">100</span>) {
<span class="kw">print</span>(<span class="st">"greater"</span>)
} else {
<span class="kw">print</span>(<span class="st">"not greater"</span>)
}
<span class="kw">print</span>(<span class="st">"done"</span>)</code></pre>
<pre class="output"><code>[1] "not greater"
[1] "done"
</code></pre>
<p>The second line of this code uses an <code>if</code> statement to tell R that we want to make a choice. If the following test is true, the body of the <code>if</code> (i.e., the lines in the curly braces underneath it) is executed. If the test is false, the body of the <code>else</code> is executed instead. Only one or the other is ever executed:</p>
<p><img src="fig/python-flowchart-conditional.svg" alt="Executing a Conditional" /></p>
<p>In the example above, the test <code>num > 100</code> returns the value <code>FALSE</code>, which is why the code inside the <code>if</code> block was skipped and the code inside the <code>else</code> block was run instead.</p>
<pre class="sourceCode r"><code class="sourceCode r">num ><span class="st"> </span><span class="dv">100</span></code></pre>
<pre class="output"><code>[1] FALSE
</code></pre>
<p>And as you likely guessed, the opposite of <code>FALSE</code> is <code>TRUE</code>.</p>
<pre class="sourceCode r"><code class="sourceCode r">num <<span class="st"> </span><span class="dv">100</span></code></pre>
<pre class="output"><code>[1] TRUE
</code></pre>
<p>Conditional statements don’t have to include an <code>else</code>. If there isn’t one, R simply does nothing if the test is false:</p>
<pre class="sourceCode r"><code class="sourceCode r">num <-<span class="st"> </span><span class="dv">53</span>
if (num ><span class="st"> </span><span class="dv">100</span>) {
<span class="kw">print</span>(<span class="st">"num is greater than 100"</span>)
}</code></pre>
<p>We can also chain several tests together when there are more than two options. This makes it simple to write a function that returns the sign of a number:</p>
<pre class="sourceCode r"><code class="sourceCode r">sign <-<span class="st"> </span>function(num) {
if (num ><span class="st"> </span><span class="dv">0</span>) {
<span class="kw">return</span>(<span class="dv">1</span>)
} else if (num ==<span class="st"> </span><span class="dv">0</span>) {
<span class="kw">return</span>(<span class="dv">0</span>)
} else {
<span class="kw">return</span>(-<span class="dv">1</span>)
}
}
<span class="kw">sign</span>(-<span class="dv">3</span>)</code></pre>
<pre class="output"><code>[1] -1
</code></pre>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">sign</span>(<span class="dv">0</span>)</code></pre>
<pre class="output"><code>[1] 0
</code></pre>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">sign</span>(<span class="dv">2</span>/<span class="dv">3</span>)</code></pre>
<pre class="output"><code>[1] 1
</code></pre>
<p>Note that the test for equality uses two equal signs, <code>==</code>.</p>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-pushpin"></span>Tip</h2>
</div>
<div class="panel-body">
<p>Other tests include greater than or equal to (<code>>=</code>), less than or equal to (<code><=</code>), and not equal to (<code>!=</code>).</p>
</div>
</aside>
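<p>A small illustrative sketch of these three tests, with <code>num</code> set to the arbitrary value 37:</p>
<pre class="sourceCode r"><code class="sourceCode r">num <- 37
num >= 37  # TRUE: 37 is equal to 37
num <= 36  # FALSE: 37 is greater than 36
num != 37  # FALSE: 37 is equal to 37</code></pre>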
<p>We can also combine tests. An ampersand, <code>&</code>, symbolizes “and”. A vertical bar, <code>|</code>, symbolizes “or”. <code>&</code> is only true if both parts are true:</p>
<pre class="sourceCode r"><code class="sourceCode r">if (<span class="dv">1</span> ><span class="st"> </span><span class="dv">0</span> &<span class="st"> </span>-<span class="dv">1</span> ><span class="st"> </span><span class="dv">0</span>) {
<span class="kw">print</span>(<span class="st">"both parts are true"</span>)
} else {
<span class="kw">print</span>(<span class="st">"at least one part is not true"</span>)
}</code></pre>
<pre class="output"><code>[1] "at least one part is not true"
</code></pre>
<p>while <code>|</code> is true if either part is true:</p>
<pre class="sourceCode r"><code class="sourceCode r">if (<span class="dv">1</span> ><span class="st"> </span><span class="dv">0</span> |<span class="st"> </span>-<span class="dv">1</span> ><span class="st"> </span><span class="dv">0</span>) {
<span class="kw">print</span>(<span class="st">"at least one part is true"</span>)
} else {
<span class="kw">print</span>(<span class="st">"neither part is true"</span>)
}</code></pre>
<pre class="output"><code>[1] "at least one part is true"
</code></pre>
<p>In this case, “either” means “either or both”, not “either one or the other but not both”.</p>
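<p>One way to check these rules is to evaluate all four combinations of <code>TRUE</code> and <code>FALSE</code> at once. The sketch below relies on the fact that <code>&</code> and <code>|</code> work element by element on logical vectors; the names <code>x</code>, <code>y</code>, <code>both</code>, and <code>either</code> are only illustrative:</p>
<pre class="sourceCode r"><code class="sourceCode r"># Every combination of two logical values, one pair per position.
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

both <- x & y    # TRUE only where x and y are both TRUE
either <- x | y  # TRUE where at least one of x and y is TRUE

both    # TRUE FALSE FALSE FALSE
either  # TRUE TRUE TRUE FALSE</code></pre>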
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-pencil"></span>Challenge - Using conditions to change behaviour</h2>
</div>
<div class="panel-body">
<ul>
<li>Write a function, <code>plot_dist</code>, that plots a boxplot if the length of the vector is greater than a specified threshold and a stripchart otherwise. To do this you’ll use the R functions <code>boxplot</code> and <code>stripchart</code>.</li>
</ul>
<pre class="sourceCode r"><code class="sourceCode r">dat <-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">"data/inflammation-01.csv"</span>, <span class="dt">header =</span> <span class="ot">FALSE</span>)
<span class="kw">plot_dist</span>(dat[, <span class="dv">10</span>], <span class="dt">threshold =</span> <span class="dv">10</span>) <span class="co"># day (column) 10</span></code></pre>
<p><img src="fig/04-cond-using-conditions-01-1.png" title="plot of chunk using-conditions-01" alt="plot of chunk using-conditions-01" style="display: block; margin: auto;" /></p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">plot_dist</span>(dat[<span class="dv">1</span>:<span class="dv">5</span>, <span class="dv">10</span>], <span class="dt">threshold =</span> <span class="dv">10</span>) <span class="co"># samples (rows) 1-5 on day (column) 10</span></code></pre>
<p><img src="fig/04-cond-using-conditions-01-2.png" title="plot of chunk using-conditions-01" alt="plot of chunk using-conditions-01" style="display: block; margin: auto;" /></p>
<ul>
<li>One of your collaborators prefers to see the distributions of the larger vectors as a histogram instead of as a boxplot. In order to choose between a histogram and a boxplot we will edit the function <code>plot_dist</code> and add an additional argument <code>use_boxplot</code>. By default we will set <code>use_boxplot</code> to <code>TRUE</code> which will create a boxplot when the vector is longer than <code>threshold</code>. When <code>use_boxplot</code> is set to <code>FALSE</code>, <code>plot_dist</code> will instead plot a histogram for the larger vectors. As before, if the length of the vector is shorter than <code>threshold</code>, <code>plot_dist</code> will create a stripchart. A histogram is made with the <code>hist</code> command in R.</li>
</ul>
<pre class="sourceCode r"><code class="sourceCode r">dat <-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">"data/inflammation-01.csv"</span>, <span class="dt">header =</span> <span class="ot">FALSE</span>)
<span class="kw">plot_dist</span>(dat[, <span class="dv">10</span>], <span class="dt">threshold =</span> <span class="dv">10</span>, <span class="dt">use_boxplot =</span> <span class="ot">TRUE</span>) <span class="co"># day (column) 10 - create boxplot</span></code></pre>
<p><img src="fig/04-cond-conditional-challenge-hist-1.png" title="plot of chunk conditional-challenge-hist" alt="plot of chunk conditional-challenge-hist" style="display: block; margin: auto;" /></p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">plot_dist</span>(dat[, <span class="dv">10</span>], <span class="dt">threshold =</span> <span class="dv">10</span>, <span class="dt">use_boxplot =</span> <span class="ot">FALSE</span>) <span class="co"># day (column) 10 - create histogram</span></code></pre>
<p><img src="fig/04-cond-conditional-challenge-hist-2.png" title="plot of chunk conditional-challenge-hist" alt="plot of chunk conditional-challenge-hist" style="display: block; margin: auto;" /></p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">plot_dist</span>(dat[<span class="dv">1</span>:<span class="dv">5</span>, <span class="dv">10</span>], <span class="dt">threshold =</span> <span class="dv">10</span>) <span class="co"># samples (rows) 1-5 on day (column) 10</span></code></pre>
<p><img src="fig/04-cond-conditional-challenge-hist-3.png" title="plot of chunk conditional-challenge-hist" alt="plot of chunk conditional-challenge-hist" style="display: block; margin: auto;" /></p>
</div>
</section>
<h3 id="saving-automatically-generated-figures">Saving Automatically Generated Figures</h3>
<p>Now that we know how to have R make decisions based on input values, let’s update <code>analyze</code>:</p>
<pre class="sourceCode r"><code class="sourceCode r">analyze <-<span class="st"> </span>function(filename, <span class="dt">output =</span> <span class="ot">NULL</span>) {
<span class="co"># Plots the average, min, and max inflammation over time.</span>
<span class="co"># Input:</span>
<span class="co"># filename: character string of a csv file</span>
<span class="co"># output: character string of pdf file for saving</span>
if (!<span class="kw">is.null</span>(output)) {
<span class="kw">pdf</span>(output)
}
dat <-<span class="st"> </span><span class="kw">read.csv</span>(<span class="dt">file =</span> filename, <span class="dt">header =</span> <span class="ot">FALSE</span>)
avg_day_inflammation <-<span class="st"> </span><span class="kw">apply</span>(dat, <span class="dv">2</span>, mean)
<span class="kw">plot</span>(avg_day_inflammation)
max_day_inflammation <-<span class="st"> </span><span class="kw">apply</span>(dat, <span class="dv">2</span>, max)
<span class="kw">plot</span>(max_day_inflammation)
min_day_inflammation <-<span class="st"> </span><span class="kw">apply</span>(dat, <span class="dv">2</span>, min)
<span class="kw">plot</span>(min_day_inflammation)
if (!<span class="kw">is.null</span>(output)) {
<span class="kw">dev.off</span>()
}
}</code></pre>
<p>We added an argument, <code>output</code>, that by default is set to <code>NULL</code>. An <code>if</code> statement at the beginning checks the argument <code>output</code> to decide whether or not to save the plots to a pdf, and a matching <code>if</code> statement at the end closes the pdf with <code>dev.off</code> if one was opened. Let’s break it down. The function <code>is.null</code> returns <code>TRUE</code> if a variable is <code>NULL</code> and <code>FALSE</code> otherwise. The exclamation mark, <code>!</code>, stands for “not”. Therefore the line in the <code>if</code> block is only executed if <code>output</code> is “not null”.</p>
<pre class="sourceCode r"><code class="sourceCode r">output <-<span class="st"> </span><span class="ot">NULL</span>
<span class="kw">is.null</span>(output)</code></pre>
<pre class="output"><code>[1] TRUE
</code></pre>
<pre class="sourceCode r"><code class="sourceCode r">!<span class="kw">is.null</span>(output)</code></pre>
<pre class="output"><code>[1] FALSE
</code></pre>
<p>Now we can use <code>analyze</code> both interactively:</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">analyze</span>(<span class="st">"data/inflammation-01.csv"</span>)</code></pre>
<p><img src="fig/04-cond-inflammation-01-1.png" title="plot of chunk inflammation-01" alt="plot of chunk inflammation-01" style="display: block; margin: auto;" /><img src="fig/04-cond-inflammation-01-2.png" title="plot of chunk inflammation-01" alt="plot of chunk inflammation-01" style="display: block; margin: auto;" /><img src="fig/04-cond-inflammation-01-3.png" title="plot of chunk inflammation-01" alt="plot of chunk inflammation-01" style="display: block; margin: auto;" /></p>
<p>and to save plots:</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">analyze</span>(<span class="st">"data/inflammation-01.csv"</span>, <span class="dt">output =</span> <span class="st">"inflammation-01.pdf"</span>)</code></pre>
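<aside class="callout panel panel-info">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-pushpin"></span>Tip</h2>
</div>
<div class="panel-body">
<p>If an error occurs between the calls to <code>pdf</code> and <code>dev.off</code> (for example, because the csv file cannot be read), the pdf device stays open. One possible way to guard against this, shown here as a sketch and not as part of the lesson’s <code>analyze</code>, is to register <code>dev.off</code> with the base R function <code>on.exit</code>. The function name <code>analyze_safely</code> is hypothetical:</p>
<pre class="sourceCode r"><code class="sourceCode r">analyze_safely <- function(filename, output = NULL) {
  # Plots the average, max, and min inflammation over time.
  # Input:
  #    filename: character string of a csv file
  #    output: character string of pdf file for saving
  if (!is.null(output)) {
    pdf(output)
    # Close the device whenever the function exits, even after an error.
    on.exit(dev.off())
  }
  dat <- read.csv(file = filename, header = FALSE)
  plot(apply(dat, 2, mean))
  plot(apply(dat, 2, max))
  plot(apply(dat, 2, min))
}</code></pre>
<p>With this variant the second <code>if</code> statement at the end of the function is no longer needed, because the clean-up is registered at the moment the device is opened.</p>
</div>
</aside>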
<p>This now works well when we want to process one data file at a time, but how can we specify the output file in <code>analyze_all</code>? We need to substitute the filename ending “csv” with “pdf”, which we can do using the function <code>sub</code>:</p>
<pre class="sourceCode r"><code class="sourceCode r">f <-<span class="st"> "data/inflammation-01.csv"</span>
<span class="kw">sub</span>(<span class="st">"csv"</span>, <span class="st">"pdf"</span>, f)</code></pre>
<pre class="output"><code>[1] "data/inflammation-01.pdf"
</code></pre>
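<p>Note that <code>sub</code> treats its first argument as a regular expression and replaces only the first match, wherever it occurs in the string. That is fine for the file names used here. As a hedged sketch of a stricter alternative, the pattern <code>"\\.csv$"</code> matches only a literal “.csv” at the very end of the string. The path <code>g</code> below is hypothetical and was chosen because it contains “csv” twice:</p>
<pre class="sourceCode r"><code class="sourceCode r">g <- "data/csv-files/inflammation-01.csv"
sub("csv", "pdf", g)       # "data/pdf-files/inflammation-01.csv"
sub("\\.csv$", ".pdf", g)  # "data/csv-files/inflammation-01.pdf"</code></pre>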
<p>Now let’s update <code>analyze_all</code>:</p>
<pre class="sourceCode r"><code class="sourceCode r">analyze_all <-<span class="st"> </span>function(pattern) {
<span class="co"># Runs the function analyze for each file in the directory "data"</span>
<span class="co"># whose name matches the given pattern.</span>
filenames <-<span class="st"> </span><span class="kw">list.files</span>(<span class="dt">path =</span> <span class="st">"data"</span>, <span class="dt">pattern =</span> pattern, <span class="dt">full.names =</span> <span class="ot">TRUE</span>)
for (f in filenames) {
pdf_name <-<span class="st"> </span><span class="kw">sub</span>(<span class="st">"csv"</span>, <span class="st">"pdf"</span>, f)
<span class="kw">analyze</span>(f, <span class="dt">output =</span> pdf_name)
}
}</code></pre>
<p>Now we can save all of the results with just one line of code:</p>
<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">analyze_all</span>(<span class="st">"inflammation"</span>)</code></pre>
<p>Now if we need to make any changes to our analysis, we can edit the <code>analyze</code> function and quickly regenerate all the figures with <code>analyze_all</code>.</p>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-pencil"></span>Challenge - Changing the behaviour of the plot command</h2>
</div>
<div class="panel-body">
<ul>
<li>One of your collaborators asks if you can recreate the figures with lines instead of points. Find the relevant argument to <code>plot</code> by reading the documentation (<code>?plot</code>), update <code>analyze</code>, and then recreate all the figures with <code>analyze_all</code>.</li>
</ul>
</div>
</section>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-pushpin"></span>Key Points</h2>
</div>
<div class="panel-body">
<ul>
<li>Save a plot in a pdf file using <code>pdf("name.pdf")</code> and stop writing to the pdf file with <code>dev.off()</code>.</li>
<li>Use <code>if (condition)</code> to start a conditional statement, <code>else if (condition)</code> to provide additional tests, and <code>else</code> to provide a default.</li>
<li>The bodies of conditional statements should be surrounded by curly braces <code>{ }</code>; R only requires them when a body contains more than one expression.</li>
<li>Use <code>==</code> to test for equality.</li>
<li><code>X & Y</code> is only true if both X and Y are true.</li>
<li><code>X | Y</code> is true if either X or Y, or both, are true.</li>
</ul>
</div>
</aside>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-pushpin"></span>Next Steps</h2>
</div>
<div class="panel-body">
<p>We have now seen the basics of interactively building R code. The last thing we need to learn is how to build command-line programs that we can use in pipelines and shell scripts, so that we can integrate our tools with other people’s work. This will be the subject of our next and final lesson.</p>
</div>
</aside>
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/r-novice-inflammation">Source</a>
<a class="label swc-blue-bg" href="mailto:[email protected]">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
</body>
</html>