<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Another Book on Data Science - Introduction to R/Python Programming</title>
<meta name="description" content="data science, R, Python, programming, machine learning">
<meta name="author" content="Nailong Zhang">
<!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
<!--[if lt IE 9]>
<script src="https://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<!-- Le styles -->
<link rel="stylesheet" href="bootstrap-1.1.0.min.css">
<link rel="stylesheet" href="style.css">
<link rel="stylesheet" href="small-screens.css">
<link rel="stylesheet" href="vs.css">
<link rel="stylesheet" href="code.css">
<link rel="stylesheet" href="application.css">
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-142297640-1', 'anotherbookondatascience.com');
ga('send', 'pageview');
</script>
</head>
<body>
<div class="topbar">
<div class="fill">
<div class="container-fluid fixed">
<h3><a href="index.html">Introduction to R/Python Programming</a></h3>
<ul class="nav secondary-nav">
<li><a href="chapter2.html">Next»</a></li>
</ul>
</div>
</div>
</div>
<div class="container-fluid" style="padding-top: 60px;">
<p>Sections in this Chapter:</p>
<ul>
<li><a href="#calculator">Calculator</a></li>
<li><a href="#variable">Variable & type</a></li>
<li><a href="#functions">Functions</a></li>
<li><a href="#control">Control flows</a></li>
<li><a href="#builtindata">Some built-in data structures</a></li>
<li><a href="#revisit">Revisit of variables</a></li>
<li><a href="#oop">Object-oriented programming (<span class="caps">OOP</span>) in R/Python</a></li>
<li><a href="#miscellaneous">Miscellaneous</a></li>
</ul>
<h2 id="calculator">Calculator</h2>
<p>R and Python are general-purpose programming languages that can be used to write software in a variety of domains. For now, let us start by using them as basic calculators. The first step is to install them. R<sup class="footnote" id="fnr1"><a href="#fn1">1</a></sup> and Python<sup class="footnote" id="fnr2"><a href="#fn2">2</a></sup> can be downloaded from their official websites. In this book, I use R 3.5 and Python 3.7.</p>
<p>To use R/Python as basic calculators, let’s get familiar with the interactive mode. After installation, we can type R or python in a terminal to invoke the interactive mode (on a case-insensitive file system, such as the macOS default, r and Python also work). Since Python 2 is installed by default on many machines, we type python3.7 instead to avoid invoking Python 2.</p>
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">~</span> <span class="o">$</span>R
<span class="lineno"> 2 </span>
<span class="lineno"> 3 </span>R version <span class="m">3.5.1</span> <span class="p">(</span><span class="m">2018-07-02</span><span class="p">)</span> <span class="o">--</span> <span class="s">"Feather Spray"</span>
<span class="lineno"> 4 </span>Copyright <span class="p">(</span>C<span class="p">)</span> <span class="m">2018</span> The R Foundation <span class="kr">for</span> Statistical Computing
<span class="lineno"> 5 </span>Platform<span class="o">:</span> x86_64<span class="o">-</span>apple<span class="o">-</span>darwin15.6.0 <span class="p">(</span><span class="m">64</span><span class="o">-</span>bit<span class="p">)</span>
<span class="lineno"> 6 </span>
<span class="lineno"> 7 </span>R is free software and comes with ABSOLUTELY NO WARRANTY.
<span class="lineno"> 8 </span>You are welcome to redistribute it under certain conditions.
<span class="lineno"> 9 </span>Type <span class="s">'license()'</span> or <span class="s">'licence()'</span> <span class="kr">for</span> distribution details.
<span class="lineno">10 </span>
<span class="lineno">11 </span> Natural language support but running <span class="kr">in</span> an English locale
<span class="lineno">12 </span>
<span class="lineno">13 </span>R is a collaborative project with many contributors.
<span class="lineno">14 </span>Type <span class="s">'contributors()'</span> <span class="kr">for</span> more information and
<span class="lineno">15 </span><span class="s">'citation()'</span> on how to cite R or R packages <span class="kr">in</span> publications.
<span class="lineno">16 </span>
<span class="lineno">17 </span>Type <span class="s">'demo()'</span> <span class="kr">for</span> some demos<span class="p">,</span> <span class="s">'help()'</span> <span class="kr">for</span> on<span class="o">-</span>line help<span class="p">,</span> or
<span class="lineno">18 </span><span class="s">'help.start()'</span> <span class="kr">for</span> an HTML browser interface to help.
<span class="lineno">19 </span>Type <span class="s">'q()'</span> to quit R.
<span class="lineno">20 </span>
<span class="lineno">21 </span><span class="o">></span> </code></pre></figure><language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">~</span> <span class="err">$</span><span class="n">python3</span><span class="o">.</span><span class="mi">7</span>
<span class="lineno">2 </span><span class="n">Python</span> <span class="mf">3.7</span><span class="o">.</span><span class="mi">1</span> <span class="p">(</span><span class="n">default</span><span class="p">,</span> <span class="n">Nov</span> <span class="mi">6</span> <span class="mi">2018</span><span class="p">,</span> <span class="mi">18</span><span class="p">:</span><span class="mi">45</span><span class="p">:</span><span class="mi">35</span><span class="p">)</span>
<span class="lineno">3 </span><span class="p">[</span><span class="n">Clang</span> <span class="mf">10.0</span><span class="o">.</span><span class="mi">0</span> <span class="p">(</span><span class="n">clang</span><span class="o">-</span><span class="mf">1000.11</span><span class="o">.</span><span class="mf">45.5</span><span class="p">)]</span> <span class="n">on</span> <span class="n">darwin</span>
<span class="lineno">4 </span><span class="n">Type</span> <span class="s2">"help"</span><span class="p">,</span> <span class="s2">"copyright"</span><span class="p">,</span> <span class="s2">"credits"</span> <span class="ow">or</span> <span class="s2">"license"</span> <span class="k">for</span> <span class="n">more</span> <span class="n">information</span><span class="o">.</span>
<span class="lineno">5 </span><span class="o">>>></span> </code></pre></figure><p>The messages displayed by invoking the interactive mode depend on both the version of R/Python installed and the machine. Thus, you may see different messages on your local machine.</p>
<p>As the startup message says, to quit R we can type <code>q()</code>. R then prompts with three options (<code>y</code>, <code>n</code> or <code>c</code>) asking whether the workspace should be saved. Since we just want to use R as a basic calculator, we quit without saving the workspace.</p>
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> <span class="kp">q</span><span class="p">()</span>
<span class="lineno">2 </span>Save workspace image<span class="o">?</span> <span class="p">[</span>y<span class="o">/</span>n<span class="o">/</span><span class="kt">c</span><span class="p">]</span><span class="o">:</span> n
<span class="lineno">3 </span><span class="o">~</span> <span class="o">$</span></code></pre></figure><p>To quit Python, we can simply type <code>exit()</code>.</p>
<p>Once we are inside the interactive mode, we can use R/Python as a calculator.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> <span class="m">1+1</span>
<span class="lineno">2 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">2</span>
<span class="lineno">3 </span><span class="o">></span> <span class="m">2</span><span class="o">*</span><span class="m">3+5</span>
<span class="lineno">4 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">11</span>
<span class="lineno">5 </span><span class="o">></span> <span class="kp">log</span><span class="p">(</span><span class="m">2</span><span class="p">)</span>
<span class="lineno">6 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">0.6931472</span>
<span class="lineno">7 </span><span class="o">></span> <span class="kp">exp</span><span class="p">(</span><span class="m">0</span><span class="p">)</span>
<span class="lineno">8 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="mi">1</span><span class="o">+</span><span class="mi">1</span>
<span class="lineno"> 2 </span><span class="mi">2</span>
<span class="lineno"> 3 </span><span class="o">>>></span> <span class="mi">2</span><span class="o">*</span><span class="mi">3</span><span class="o">+</span><span class="mi">5</span>
<span class="lineno"> 4 </span><span class="mi">11</span>
<span class="lineno"> 5 </span><span class="o">>>></span> <span class="n">log</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="lineno"> 6 </span><span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
<span class="lineno"> 7 </span> <span class="n">File</span> <span class="s2">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="lineno"> 8 </span><span class="ne">NameError</span><span class="p">:</span> <span class="n">name</span> <span class="s1">'log'</span> <span class="ow">is</span> <span class="ow">not</span> <span class="n">defined</span>
<span class="lineno"> 9 </span><span class="o">>>></span> <span class="n">exp</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="lineno">10 </span><span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
<span class="lineno">11 </span> <span class="n">File</span> <span class="s2">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="lineno">12 </span><span class="ne">NameError</span><span class="p">:</span> <span class="n">name</span> <span class="s1">'exp'</span> <span class="ow">is</span> <span class="ow">not</span> <span class="n">defined</span></code></pre></figure></div>
</div>
<p>From the code snippet above, R works perfectly as a calculator. However, errors are raised when we call <code>log(2)</code> and <code>exp(0)</code> in Python. The error messages are self-explanatory: the <code>log</code> and <code>exp</code> functions don’t exist in the current Python environment. In fact, both functions are defined in the <code>math</code> module in Python. A module<sup class="footnote" id="fnr3"><a href="#fn3">3</a></sup> is a file consisting of Python code. When we invoke the interactive mode of Python, a few built-in modules are loaded into the current environment by default, but the <code>math</code> module is not among them. That explains why we got the <code>NameError</code> when we tried to use the functions defined in the <code>math</code> module. To resolve the issue, we first load the functions we need with the <code>import</code> statement as follows.</p>
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="kn">from</span> <span class="nn">math</span> <span class="k">import</span> <span class="n">log</span><span class="p">,</span><span class="n">exp</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="n">log</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="lineno">3 </span><span class="mf">0.6931471805599453</span>
<span class="lineno">4 </span><span class="o">>>></span> <span class="n">exp</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="lineno">5 </span><span class="mf">1.0</span></code></pre></figure><h2 id="variable">Variable & Type</h2>
<p>In the previous section we have seen how to use R/Python as calculators. Now, let’s see how to write real programs. First, let’s define some variables.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">></span> a<span class="o">=</span><span class="m">2</span>
<span class="lineno"> 2 </span><span class="o">></span> b<span class="o">=</span><span class="m">5.0</span>
<span class="lineno"> 3 </span><span class="o">></span> x<span class="o">=</span><span class="s">'hello world'</span>
<span class="lineno"> 4 </span><span class="o">></span> a
<span class="lineno"> 5 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">2</span>
<span class="lineno"> 6 </span><span class="o">></span> b
<span class="lineno"> 7 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">5</span>
<span class="lineno"> 8 </span><span class="o">></span> x
<span class="lineno"> 9 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"hello world"</span>
<span class="lineno">10 </span><span class="o">></span> e<span class="o">=</span>a<span class="o">*</span><span class="m">2</span><span class="o">+</span>b
<span class="lineno">11 </span><span class="o">></span> e
<span class="lineno">12 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">9</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="n">a</span><span class="o">=</span><span class="mi">2</span>
<span class="lineno"> 2 </span><span class="o">>>></span> <span class="n">b</span><span class="o">=</span><span class="mf">5.0</span>
<span class="lineno"> 3 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="s1">'hello world'</span>
<span class="lineno"> 4 </span><span class="o">>>></span> <span class="n">a</span>
<span class="lineno"> 5 </span><span class="mi">2</span>
<span class="lineno"> 6 </span><span class="o">>>></span> <span class="n">b</span>
<span class="lineno"> 7 </span><span class="mf">5.0</span>
<span class="lineno"> 8 </span><span class="o">>>></span> <span class="n">x</span>
<span class="lineno"> 9 </span><span class="s1">'hello world'</span>
<span class="lineno">10 </span><span class="o">>>></span> <span class="n">e</span><span class="o">=</span><span class="n">a</span><span class="o">*</span><span class="mi">2</span><span class="o">+</span><span class="n">b</span>
<span class="lineno">11 </span><span class="o">>>></span> <span class="n">e</span>
<span class="lineno">12 </span><span class="mf">9.0</span></code></pre></figure></div>
</div>
<p>Here, we defined four variables: <code>a, b, x, e</code>. To get the type of each variable, we can use the function <code>typeof()</code> in R and <code>type()</code> in Python.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> <span class="kp">typeof</span><span class="p">(</span>x<span class="p">)</span>
<span class="lineno">2 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"character"</span>
<span class="lineno">3 </span><span class="o">></span> <span class="kp">typeof</span><span class="p">(</span>e<span class="p">)</span>
<span class="lineno">4 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"double"</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="nb">type</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="lineno">2 </span><span class="o"><</span><span class="k">class</span> <span class="err">'</span><span class="nc">str</span><span class="s1">'></span>
<span class="lineno">3 </span><span class="o">>>></span> <span class="nb">type</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
<span class="lineno">4 </span><span class="o"><</span><span class="k">class</span> <span class="err">'</span><span class="nc">float</span><span class="s1">'></span></code></pre></figure></div>
</div>
<p>The type of <code>x</code> is called character in R and str in Python. Similarly, the type of <code>e</code> is called double in R and float in Python.</p>
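<p>As a small sketch of the same idea outside the interactive prompt, the snippet below collects the type name of each variable defined above. The dictionary <code>type_names</code> is only an illustrative helper introduced here, not something used elsewhere in this chapter.</p>

```python
a = 2
b = 5.0
x = 'hello world'
e = a * 2 + b  # mixing an int and a float produces a float

# Map each variable name to the name of its type.
pairs = [('a', a), ('b', b), ('x', x), ('e', e)]
type_names = {name: type(value).__name__ for name, value in pairs}
print(type_names)
```

<p>Note that <code>a</code> is an int in Python, whereas in R a number typed without a decimal point is still stored as a double.</p>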
<h2 id="functions">Functions</h2>
<p>We have seen two functions, <code>log</code> and <code>exp</code>, when we used R/Python as calculators. A function is a block of code which performs a specific task. A major purpose of wrapping a block of code into a function is to reuse the code.</p>
<p>It is simple to define functions in R/Python.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> fun1<span class="o">=</span><span class="kr">function</span><span class="p">(</span>x<span class="p">){</span><span class="kr">return</span><span class="p">(</span>x<span class="o">*</span>x<span class="p">)}</span>
<span class="lineno">2 </span><span class="o">></span> fun1
<span class="lineno">3 </span><span class="kr">function</span><span class="p">(</span>x<span class="p">){</span><span class="kr">return</span><span class="p">(</span>x<span class="o">*</span>x<span class="p">)}</span>
<span class="lineno">4 </span><span class="o">></span> fun1<span class="p">(</span><span class="m">2</span><span class="p">)</span>
<span class="lineno">5 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">4</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="k">def</span> <span class="nf">fun1</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="lineno">2 </span><span class="o">...</span>     <span class="k">return</span> <span class="n">x</span><span class="o">*</span><span class="n">x</span> <span class="c1"># note the indentation</span>
<span class="lineno">3 </span><span class="o">...</span>
<span class="lineno">4 </span><span class="o">>>></span> <span class="n">fun1</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="lineno">5 </span><span class="mi">4</span></code></pre></figure></div>
</div>
<p>Here, we defined a function <code>fun1</code> in R/Python. This function takes <code>x</code> as input and returns the square of <code>x</code>. When we call a function, we simply type the function name followed by the input argument inside a pair of parentheses. It is worth noting that neither input nor output is required to define a function. For example, we can define a function <code>fun2</code> that prints <code>Hello World!</code> without taking any input or returning any output.</p>
<p>One major difference between R and Python code is that Python code is structured with indentation. Each logical line of R/Python code belongs to a certain group. In R, we use <code>{}</code> to determine the grouping of statements. In Python, however, the leading whitespace (spaces and tabs) at the beginning of a logical line is used to compute the indentation level of the line, which in turn determines the grouping of statements. Let’s see what happens if we remove the leading whitespace in the Python function above.</p>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="k">def</span> <span class="nf">fun1</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="lineno">2 </span><span class="o">...</span> <span class="k">return</span> <span class="n">x</span><span class="o">*</span><span class="n">x</span> <span class="c1"># note the indentation</span>
<span class="lineno">3 </span> <span class="n">File</span> <span class="s2">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">2</span>
<span class="lineno">4 </span> <span class="k">return</span> <span class="n">x</span><span class="o">*</span><span class="n">x</span> <span class="c1"># note the indentation</span>
<span class="lineno">5 </span> <span class="o">^</span>
<span class="lineno">6 </span><span class="ne">IndentationError</span><span class="p">:</span> <span class="n">expected</span> <span class="n">an</span> <span class="n">indented</span> <span class="n">block</span></code></pre></figure><p>We got an <code>IndentationError</code> because of missing indentation.</p>
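<p>The same check can be sketched in a script. The snippet below uses Python's built-in <code>compile()</code> function on two source strings that differ only in the indentation of the function body. The helper name <code>compiles</code> and the two strings are assumptions made for this illustration.</p>

```python
# Two versions of the same function; only the indentation of the body differs.
source_ok = "def fun1(x):\n    return x * x\n"
source_bad = "def fun1(x):\nreturn x * x\n"

def compiles(source):
    """Return True if the source compiles, False on an IndentationError."""
    try:
        compile(source, "example", "exec")
        return True
    except IndentationError:
        return False

print(compiles(source_ok))
print(compiles(source_bad))
```

<p>Only the indented version compiles; the other one is rejected before any code runs.</p>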
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> fun2<span class="o">=</span><span class="kr">function</span><span class="p">(){</span><span class="kp">print</span><span class="p">(</span><span class="s">'Hello World!'</span><span class="p">)}</span>
<span class="lineno">2 </span><span class="o">></span> fun2<span class="p">()</span>
<span class="lineno">3 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"Hello World!"</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="k">def</span> <span class="nf">fun2</span><span class="p">():</span> <span class="nb">print</span><span class="p">(</span><span class="s1">'Hello World!'</span><span class="p">)</span>
<span class="lineno">2 </span><span class="o">...</span>
<span class="lineno">3 </span><span class="o">>>></span> <span class="n">fun2</span><span class="p">()</span>
<span class="lineno">4 </span><span class="n">Hello</span> <span class="n">World</span><span class="err">!</span></code></pre></figure></div>
</div>
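<p>A Python function without a <code>return</code> statement still hands something back to the caller, namely the special value <code>None</code>. The short sketch below repeats <code>fun2</code> and stores its result; the variable name <code>result</code> is introduced only for this illustration.</p>

```python
def fun2():
    print('Hello World!')  # prints, but has no return statement

result = fun2()            # the call prints the message
print(result is None)      # the value handed back is None
```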
<p>Let’s go back to <code>fun1</code> and have a closer look at <code>return</code>. In Python, if we want to return something we have to use the keyword <code>return</code> explicitly. <code>return</code> is a function in R but a statement in Python, which is why no parentheses are needed after <code>return</code> in Python. In R, <code>return</code> is not required even when we need to return something from the function. Instead, a function returns the value of the last expression it evaluates, so we can just put the value to return on the last line of the function. That being said, we can define <code>fun1</code> as follows.</p>
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> fun1<span class="o">=</span><span class="kr">function</span><span class="p">(</span>x<span class="p">){</span>x<span class="o">*</span>x<span class="p">}</span></code></pre></figure><p>Sometimes we want to give an argument of a function a default value, and both R and Python allow arguments to have default values.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">></span> log_fun <span class="o">=</span> <span class="kr">function</span><span class="p">(</span>x<span class="p">,</span> base<span class="o">=</span><span class="m">2</span><span class="p">){</span>
<span class="lineno"> 2 </span><span class="o">+</span> <span class="kr">return</span><span class="p">(</span><span class="kp">log</span><span class="p">(</span>x<span class="p">,</span> base<span class="p">))</span>
<span class="lineno"> 3 </span><span class="o">+</span> <span class="p">}</span>
<span class="lineno"> 4 </span><span class="o">></span> log_fun<span class="p">(</span><span class="m">5</span><span class="p">,</span> base<span class="o">=</span><span class="m">2</span><span class="p">)</span>
<span class="lineno"> 5 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">2.321928</span>
<span class="lineno"> 6 </span><span class="o">></span> log_fun<span class="p">(</span><span class="m">5</span><span class="p">,</span> <span class="m">2</span><span class="p">)</span>
<span class="lineno"> 7 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">2.321928</span>
<span class="lineno"> 8 </span><span class="o">></span> log_fun<span class="p">(</span>base<span class="o">=</span><span class="m">2</span><span class="p">,</span> <span class="m">5</span><span class="p">)</span>
<span class="lineno"> 9 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">2.321928</span>
<span class="lineno">10 </span><span class="o">></span> </code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="k">def</span> <span class="nf">log_fun</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">base</span><span class="o">=</span><span class="mi">2</span><span class="p">):</span>
<span class="lineno"> 2 </span><span class="o">...</span>     <span class="k">return</span> <span class="n">log</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">base</span><span class="p">)</span>
<span class="lineno"> 3 </span><span class="o">...</span>
<span class="lineno"> 4 </span><span class="o">>>></span> <span class="n">log_fun</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span>
<span class="lineno"> 5 </span><span class="mf">2.321928094887362</span>
<span class="lineno"> 6 </span><span class="o">>>></span> <span class="n">log_fun</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="n">base</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="lineno"> 7 </span><span class="mf">2.321928094887362</span>
<span class="lineno"> 8 </span><span class="o">>>></span> <span class="n">log_fun</span><span class="p">(</span><span class="n">base</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="lineno"> 9 </span> <span class="n">File</span> <span class="s2">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span>
<span class="lineno">10 </span><span class="ne">SyntaxError</span><span class="p">:</span> <span class="n">positional</span> <span class="n">argument</span> <span class="n">follows</span> <span class="n">keyword</span> <span class="n">argument</span></code></pre></figure></div>
</div>
<p>In a Python function definition, parameters with default values must come after the parameters without defaults; R does not require this. For readability, however, it is better to put them last in both languages. You may have noticed the error message above about a positional argument. A Python call can contain two kinds of arguments, i.e., positional arguments and keyword arguments. Simply speaking, a keyword argument is preceded by an identifier, e.g., <code>base=2</code> in the example above, and any argument without an identifier is positional. In a call, positional arguments must come before keyword arguments, which is why <code>log_fun(base=2, 5)</code> raises a <code>SyntaxError</code>.</p>
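<p>The ordering rules can be checked with a small sketch. It reuses <code>log_fun</code> from the snippet above; the input 8 and the variable names are chosen here purely for illustration.</p>

```python
import math

def log_fun(x, base=2):
    # x has no default, so it must come before base in the definition
    return math.log(x, base)

by_position = log_fun(8, 2)          # both arguments passed positionally
by_keyword = log_fun(8, base=2)      # positional argument first, then keyword
all_keywords = log_fun(base=2, x=8)  # any order works when every argument is named
default_base = log_fun(8)            # base falls back to its default value 2

print(by_position, by_keyword, all_keywords, default_base)
```

<p>All four calls compute the same logarithm. Only a call such as <code>log_fun(base=2, 8)</code>, where a positional argument follows a keyword argument, is rejected.</p>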
<h2 id="control">Control flows</h2>
<p>To implement complex logic in R/Python, we need control flow statements.</p>
<h3>If/else</h3>
<p>Let’s define a function that returns the absolute value of its input.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">></span> fun3<span class="o">=</span><span class="kr">function</span><span class="p">(</span>x<span class="p">){</span>
<span class="lineno"> 2 </span><span class="o">+</span> <span class="kr">if</span> <span class="p">(</span>x<span class="o">>=</span><span class="m">0</span><span class="p">){</span>
<span class="lineno"> 3 </span><span class="o">+</span> <span class="kr">return</span><span class="p">(</span>x<span class="p">)}</span>
<span class="lineno"> 4 </span><span class="o">+</span> <span class="kp">else</span><span class="p">{</span>
<span class="lineno"> 5 </span><span class="o">+</span> <span class="kr">return</span><span class="p">(</span><span class="o">-</span>x<span class="p">)}</span>
<span class="lineno"> 6 </span><span class="o">+</span> <span class="p">}</span>
<span class="lineno"> 7 </span><span class="o">></span> fun3<span class="p">(</span><span class="m">2.5</span><span class="p">)</span>
<span class="lineno"> 8 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">2.5</span>
<span class="lineno"> 9 </span><span class="o">></span> fun3<span class="p">(</span><span class="m">-2.5</span><span class="p">)</span>
<span class="lineno">10 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">2.5</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="k">def</span> <span class="nf">fun3</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="lineno"> 2 </span><span class="o">...</span> <span class="k">if</span> <span class="n">x</span><span class="o">>=</span><span class="mi">0</span><span class="p">:</span>
<span class="lineno"> 3 </span><span class="o">...</span> <span class="k">return</span> <span class="n">x</span>
<span class="lineno"> 4 </span><span class="o">...</span> <span class="k">else</span><span class="p">:</span>
<span class="lineno"> 5 </span><span class="o">...</span> <span class="k">return</span> <span class="o">-</span><span class="n">x</span>
<span class="lineno"> 6 </span><span class="o">...</span>
<span class="lineno"> 7 </span><span class="o">>>></span> <span class="n">fun3</span><span class="p">(</span><span class="mf">2.5</span><span class="p">)</span>
<span class="lineno"> 8 </span><span class="mf">2.5</span>
<span class="lineno"> 9 </span><span class="o">>>></span> <span class="n">fun3</span><span class="p">(</span><span class="o">-</span><span class="mf">2.5</span><span class="p">)</span>
<span class="lineno">10 </span><span class="mf">2.5</span></code></pre></figure></div>
</div>
<p>The code snippet above shows how to use <code>if/else</code> in R/Python. The subtle difference is that the condition after <code>if</code> must be enclosed in parentheses in R, whereas the parentheses are optional in Python.</p>
<p>We can also put an <code>if</code> after <code>else</code>. In Python, we use <code>elif</code> as a shortcut for this.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">></span> fun4<span class="o">=</span><span class="kr">function</span><span class="p">(</span>x<span class="p">){</span>
<span class="lineno"> 2 </span><span class="o">+</span> <span class="kr">if</span> <span class="p">(</span>x<span class="o">==</span><span class="m">0</span><span class="p">){</span>
<span class="lineno"> 3 </span><span class="o">+</span> <span class="kp">print</span><span class="p">(</span><span class="s">'zero'</span><span class="p">)}</span>
<span class="lineno"> 4 </span><span class="o">+</span> <span class="kr">else</span> <span class="kr">if</span> <span class="p">(</span>x<span class="o">></span><span class="m">0</span><span class="p">){</span>
<span class="lineno"> 5 </span><span class="o">+</span> <span class="kp">print</span><span class="p">(</span><span class="s">'positive'</span><span class="p">)}</span>
<span class="lineno"> 6 </span><span class="o">+</span> <span class="kp">else</span><span class="p">{</span>
<span class="lineno"> 7 </span><span class="o">+</span> <span class="kp">print</span><span class="p">(</span><span class="s">'negative'</span><span class="p">)}</span>
<span class="lineno"> 8 </span><span class="o">+</span> <span class="p">}</span>
<span class="lineno"> 9 </span><span class="o">></span> fun4<span class="p">(</span><span class="m">0</span><span class="p">)</span>
<span class="lineno">10 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"zero"</span>
<span class="lineno">11 </span><span class="o">></span> fun4<span class="p">(</span><span class="m">1</span><span class="p">)</span>
<span class="lineno">12 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"positive"</span>
<span class="lineno">13 </span><span class="o">></span> fun4<span class="p">(</span><span class="m">-1</span><span class="p">)</span>
<span class="lineno">14 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"negative"</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="k">def</span> <span class="nf">fun4</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="lineno"> 2 </span><span class="o">...</span> <span class="k">if</span> <span class="n">x</span><span class="o">==</span><span class="mi">0</span><span class="p">:</span>
<span class="lineno"> 3 </span><span class="o">...</span> <span class="nb">print</span><span class="p">(</span><span class="s1">'zero'</span><span class="p">)</span>
<span class="lineno"> 4 </span><span class="o">...</span> <span class="k">elif</span> <span class="n">x</span><span class="o">></span><span class="mi">0</span><span class="p">:</span>
<span class="lineno"> 5 </span><span class="o">...</span> <span class="nb">print</span><span class="p">(</span><span class="s1">'positive'</span><span class="p">)</span>
<span class="lineno"> 6 </span><span class="o">...</span> <span class="k">else</span><span class="p">:</span>
<span class="lineno"> 7 </span><span class="o">...</span> <span class="nb">print</span><span class="p">(</span><span class="s1">'negative'</span><span class="p">)</span>
<span class="lineno"> 8 </span><span class="o">...</span>
<span class="lineno"> 9 </span><span class="o">>>></span> <span class="n">fun4</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="lineno">10 </span><span class="n">zero</span>
<span class="lineno">11 </span><span class="o">>>></span> <span class="n">fun4</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="lineno">12 </span><span class="n">positive</span>
<span class="lineno">13 </span><span class="o">>>></span> <span class="n">fun4</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
<span class="lineno">14 </span><span class="n">negative</span></code></pre></figure></div>
</div>
<h3>For loop</h3>
<p>As with <code>if</code>, R requires parentheses after the keyword <code>for</code>. In Python, the loop header is written without parentheses.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> <span class="kr">for</span> <span class="p">(</span>i <span class="kr">in</span> <span class="m">1</span><span class="o">:</span><span class="m">3</span><span class="p">){</span><span class="kp">print</span><span class="p">(</span>i<span class="p">)}</span>
<span class="lineno">2 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span>
<span class="lineno">3 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">2</span>
<span class="lineno">4 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">3</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">4</span><span class="p">):</span><span class="nb">print</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="lineno">2 </span><span class="o">...</span>
<span class="lineno">3 </span><span class="mi">1</span>
<span class="lineno">4 </span><span class="mi">2</span>
<span class="lineno">5 </span><span class="mi">3</span></code></pre></figure></div>
</div>
<p>There is something more interesting than the <code>for</code> loop itself in the snippets above.<br />
In the R code, the expression <code>1:3</code> creates a vector with the elements 1, 2 and 3. In the Python code, we use the <code>range()</code> function for the first time. Let’s have a look at their types.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> <span class="kp">typeof</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">3</span><span class="p">)</span>
<span class="lineno">2 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"integer"</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="nb">type</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">4</span><span class="p">))</span>
<span class="lineno">2 </span><span class="o"><</span><span class="k">class</span> <span class="err">'</span><span class="nc">range</span><span class="s1">'></span></code></pre></figure></div>
</div>
<p>The <code>range()</code> function returns a <code>range</code> object, which represents an immutable sequence of numbers. It can take up to three arguments, i.e., <br />
<code>range(start, stop, step)</code>, of which <code>start</code> and <code>step</code> are optional. It’s critical to keep in mind that the <code>stop</code> argument, which defines the upper limit of the sequence, is exclusive. That is why we have to pass 4 as the <code>stop</code> argument in order to loop through 1 to 3. The <code>step</code> argument specifies the increment from one number to the next.<br />
The default values of <code>start</code> and <code>step</code> are 0 and 1, respectively.</p>
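<p>As a quick illustration of these arguments, the sketch below converts a few <code>range</code> objects to lists so that their elements become visible. The variable names are only illustrative.</p>

```python
# Only stop is given: start defaults to 0 and step to 1
only_stop = list(range(4))
print(only_stop)          # [0, 1, 2, 3]

# start and stop: stop itself is excluded
start_stop = list(range(1, 4))
print(start_stop)         # [1, 2, 3]

# start, stop and step: jump by 3 each time, still excluding stop
with_step = list(range(1, 10, 3))
print(with_step)          # [1, 4, 7]

# A negative step counts downwards; stop is still excluded
descending = list(range(3, 0, -1))
print(descending)         # [3, 2, 1]
```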
<h3>While loop</h3>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> i<span class="o">=</span><span class="m">1</span>
<span class="lineno">2 </span><span class="o">></span> <span class="kr">while</span> <span class="p">(</span>i<span class="o"><=</span><span class="m">3</span><span class="p">){</span>
<span class="lineno">3 </span><span class="o">+</span> <span class="kp">print</span><span class="p">(</span>i<span class="p">)</span>
<span class="lineno">4 </span><span class="o">+</span> i<span class="o">=</span>i<span class="m">+1</span>
<span class="lineno">5 </span><span class="o">+</span> <span class="p">}</span>
<span class="lineno">6 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span>
<span class="lineno">7 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">2</span>
<span class="lineno">8 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">3</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="n">i</span><span class="o">=</span><span class="mi">1</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="k">while</span> <span class="n">i</span><span class="o"><=</span><span class="mi">3</span><span class="p">:</span>
<span class="lineno">3 </span><span class="o">...</span> <span class="nb">print</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="lineno">4 </span><span class="o">...</span> <span class="n">i</span><span class="o">+=</span><span class="mi">1</span>
<span class="lineno">5 </span><span class="o">...</span>
<span class="lineno">6 </span><span class="mi">1</span>
<span class="lineno">7 </span><span class="mi">2</span>
<span class="lineno">8 </span><span class="mi">3</span></code></pre></figure></div>
</div>
<p>You may have noticed that in Python we can write <code>i+=1</code> to add 1 to <code>i</code>. Base R has no such compound assignment operator, so we write <code>i=i+1</code> instead. Both <code>for</code> loops and <code>while</code> loops can be nested.</p>
<h3>Break/continue</h3>
<p>Break/continue helps if we want to leave a for/while loop early or to skip a specific iteration. In R, the keyword for continue is <code>next</code>, in contrast to <code>continue</code> in Python. The difference between <code>break</code> and <code>continue</code> is that <code>break</code> exits the innermost enclosing loop (when loops are nested, only the innermost one is affected), whereas <code>continue</code> skips the rest of the current iteration and proceeds with the next one, if there is any.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">></span> <span class="kr">for</span> <span class="p">(</span>i <span class="kr">in</span> <span class="m">1</span><span class="o">:</span><span class="m">3</span><span class="p">){</span>
<span class="lineno"> 2 </span><span class="o">+</span> <span class="kp">print</span><span class="p">(</span>i<span class="p">)</span>
<span class="lineno"> 3 </span><span class="o">+</span> <span class="kr">if</span> <span class="p">(</span>i<span class="o">==</span><span class="m">1</span><span class="p">)</span> <span class="kr">break</span>
<span class="lineno"> 4 </span><span class="o">+</span> <span class="p">}</span>
<span class="lineno"> 5 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span>
<span class="lineno"> 6 </span><span class="o">></span> <span class="kr">for</span> <span class="p">(</span>i <span class="kr">in</span> <span class="m">1</span><span class="o">:</span><span class="m">3</span><span class="p">){</span>
<span class="lineno"> 7 </span><span class="o">+</span> <span class="kr">if</span> <span class="p">(</span>i<span class="o">==</span><span class="m">2</span><span class="p">){</span><span class="kr">next</span><span class="p">}</span>
<span class="lineno"> 8 </span><span class="o">+</span> <span class="kp">print</span><span class="p">(</span>i<span class="p">)</span>
<span class="lineno"> 9 </span><span class="o">+</span> <span class="p">}</span>
<span class="lineno">10 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span>
<span class="lineno">11 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">3</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">4</span><span class="p">):</span>
<span class="lineno"> 2 </span><span class="o">...</span> <span class="nb">print</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="lineno"> 3 </span><span class="o">...</span> <span class="k">if</span> <span class="n">i</span><span class="o">==</span><span class="mi">1</span><span class="p">:</span> <span class="k">break</span>
<span class="lineno"> 4 </span><span class="o">...</span>
<span class="lineno"> 5 </span><span class="mi">1</span>
<span class="lineno"> 6 </span><span class="o">>>></span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">4</span><span class="p">):</span>
<span class="lineno"> 7 </span><span class="o">...</span> <span class="k">if</span> <span class="n">i</span><span class="o">==</span><span class="mi">2</span><span class="p">:</span> <span class="k">continue</span>
<span class="lineno"> 8 </span><span class="o">...</span> <span class="nb">print</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="lineno"> 9 </span><span class="o">...</span>
<span class="lineno">10 </span><span class="mi">1</span>
<span class="lineno">11 </span><span class="mi">3</span></code></pre></figure></div>
</div>
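<p>The remark about nested loops can be verified with a short sketch: the <code>break</code> below sits in the inner loop, so the outer loop still runs all three of its iterations. The list <code>visited</code> is introduced here only to record which pairs are reached.</p>

```python
visited = []
for i in range(1, 4):
    for j in range(1, 4):
        if j == 2:
            break  # leaves only the inner loop over j
        visited.append((i, j))

# The outer loop was not interrupted, so every value of i appears once
print(visited)  # [(1, 1), (2, 1), (3, 1)]
```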
<h2 id="builtindata">Some built-in Data Structures</h2>
<p>In the previous sections, we haven’t seen much difference between R and Python. Regarding the built-in data structures, however, there are some significant differences, as we will see in this section.</p>
<h3>vector in R and list in Python</h3>
<p>In R, we can use the function <code>c()</code> to create a vector; a vector is a sequence of elements of the same type. In Python, we can use <code>[]</code> to create a list, which is also a sequence of elements, but the elements in a list don’t need to have the same type. To get the number of elements in a vector in R, we use the function <code>length()</code>; to get the number of elements in a list in Python, we use the function <code>len()</code>.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">></span> x<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">5</span><span class="p">,</span><span class="m">6</span><span class="p">)</span>
<span class="lineno"> 2 </span><span class="o">></span> y<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="s">'hello'</span><span class="p">,</span><span class="s">'world'</span><span class="p">,</span><span class="s">'!'</span><span class="p">)</span>
<span class="lineno"> 3 </span><span class="o">></span> x
<span class="lineno"> 4 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span> <span class="m">2</span> <span class="m">5</span> <span class="m">6</span>
<span class="lineno"> 5 </span><span class="o">></span> y
<span class="lineno"> 6 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"hello"</span> <span class="s">"world"</span> <span class="s">"!"</span>
<span class="lineno"> 7 </span><span class="o">></span> <span class="kp">length</span><span class="p">(</span>x<span class="p">)</span>
<span class="lineno"> 8 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">4</span>
<span class="lineno"> 9 </span><span class="o">></span> z<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="s">'hello'</span><span class="p">)</span>
<span class="lineno">10 </span><span class="o">></span> z
<span class="lineno">11 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"1"</span> <span class="s">"hello"</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">6</span><span class="p">]</span>
<span class="lineno"> 2 </span><span class="o">>>></span> <span class="n">y</span><span class="o">=</span><span class="p">[</span><span class="s1">'hello'</span><span class="p">,</span><span class="s1">'world'</span><span class="p">,</span><span class="s1">'!'</span><span class="p">]</span>
<span class="lineno"> 3 </span><span class="o">>>></span> <span class="n">x</span>
<span class="lineno"> 4 </span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
<span class="lineno"> 5 </span><span class="o">>>></span> <span class="n">y</span>
<span class="lineno"> 6 </span><span class="p">[</span><span class="s1">'hello'</span><span class="p">,</span> <span class="s1">'world'</span><span class="p">,</span> <span class="s1">'!'</span><span class="p">]</span>
<span class="lineno"> 7 </span><span class="o">>>></span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="lineno"> 8 </span><span class="mi">4</span>
<span class="lineno"> 9 </span><span class="o">>>></span> <span class="n">z</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="s1">'hello'</span><span class="p">]</span>
<span class="lineno">10 </span><span class="o">>>></span> <span class="n">z</span>
<span class="lineno">11 </span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'hello'</span><span class="p">]</span></code></pre></figure></div>
</div>
<p>In the code snippet above, the first element in the variable <code>z</code> in R is coerced from 1 (numeric) to “1” (character) since the elements must have the same type.</p>
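<p>By contrast, a Python list performs no such coercion. A minimal check, with <code>element_types</code> as an illustrative name, shows that each element keeps its own type:</p>

```python
z = [1, 'hello']

# Collect the type name of every element in the list
element_types = [type(item).__name__ for item in z]
print(element_types)  # ['int', 'str']
```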
<p>To access a specific element of a vector or list, we can use <code>[]</code>. In R, sequence types are indexed starting from 1; in contrast, sequence types in Python are indexed starting from 0.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> x<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">5</span><span class="p">,</span><span class="m">6</span><span class="p">)</span>
<span class="lineno">2 </span><span class="o">></span> x<span class="p">[</span><span class="m">1</span><span class="p">]</span>
<span class="lineno">3 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">6</span><span class="p">]</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="lineno">3 </span><span class="mi">2</span>
<span class="lineno">4 </span><span class="o">>>></span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="lineno">5 </span><span class="mi">1</span></code></pre></figure></div>
</div>
<p>What if the index is out of bounds?</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">></span> x<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">5</span><span class="p">,</span><span class="m">6</span><span class="p">)</span>
<span class="lineno"> 2 </span><span class="o">></span> x<span class="p">[</span><span class="m">-1</span><span class="p">]</span>
<span class="lineno"> 3 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">2</span> <span class="m">5</span> <span class="m">6</span>
<span class="lineno"> 4 </span><span class="o">></span> x<span class="p">[</span><span class="m">0</span><span class="p">]</span>
<span class="lineno"> 5 </span><span class="kt">numeric</span><span class="p">(</span><span class="m">0</span><span class="p">)</span>
<span class="lineno"> 6 </span><span class="o">></span> x<span class="p">[</span><span class="kp">length</span><span class="p">(</span>x<span class="p">)</span><span class="m">+1</span><span class="p">]</span>
<span class="lineno"> 7 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="kc">NA</span>
<span class="lineno"> 8 </span><span class="o">></span> <span class="kp">length</span><span class="p">(</span><span class="kt">numeric</span><span class="p">(</span><span class="m">0</span><span class="p">))</span>
<span class="lineno"> 9 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">0</span>
<span class="lineno">10 </span><span class="o">></span> <span class="kp">length</span><span class="p">(</span><span class="kc">NA</span><span class="p">)</span>
<span class="lineno">11 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">6</span><span class="p">]</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="n">x</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="lineno">3 </span><span class="mi">6</span>
<span class="lineno">4 </span><span class="o">>>></span> <span class="n">x</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span>
<span class="lineno">5 </span><span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
<span class="lineno">6 </span> <span class="n">File</span> <span class="s2">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="lineno">7 </span><span class="ne">IndexError</span><span class="p">:</span> <span class="nb">list</span> <span class="n">index</span> <span class="n">out</span> <span class="n">of</span> <span class="nb">range</span></code></pre></figure></div>
</div>
<p>In Python, a negative index means indexing from the end of the list. Thus, <code>x[-1]</code> points to the last element and <code>x[-2]</code> points to the second-to-last element of the list. R does not interpret negative indices in the same way. Specifically, in R <code>x[-index]</code> returns a new vector with <code>x[index]</code> excluded.</p>
<p>When we try to access an index that is out of bounds, Python throws an <code>IndexError</code>. The behavior of R in this case is more interesting. First, when we try to access <code>x[0]</code> in R, we get <code>numeric(0)</code>, whose length is 0; it can therefore be interpreted as an empty numeric vector. When we try to access <code>x[length(x)+1]</code>, we get an <code>NA</code>. Besides <code>NA</code>, R also has <code>NaN</code> and <code>NULL</code>.</p>
<p><code>NaN</code> means “Not a Number”, although its type is still “double”, as we can verify with <code>typeof()</code>. For example, <code>0/0</code> results in a <code>NaN</code> in R. <code>NA</code> in R generally represents missing values. And <code>NULL</code> represents a <span class="caps">NULL</span> (empty) object. To check if a value is <code>NA</code>, <code>NaN</code> or <code>NULL</code>, we can use <code>is.na()</code>, <code>is.nan()</code> or <code>is.null()</code>, respectively.</p>
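<p>Python has no direct counterpart of R’s exclusion syntax, but slicing gives a similar result. The sketch below is one possible way to mimic it; <code>drop_at</code> is a hypothetical helper written for this illustration, not a built-in function, and it takes a 1-based position as R does.</p>

```python
x = [1, 2, 5, 6]

last = x[-1]         # counts from the end: 6
second_last = x[-2]  # 5

# Rough equivalent of R's x[-1]: a copy without the first element
without_first = x[1:]

def drop_at(values, r_index):
    """Return a copy of values without the element at 1-based position r_index."""
    return values[:r_index - 1] + values[r_index:]

without_third = drop_at(x, 3)  # mimics R's x[-3]

print(last, second_last)  # 6 5
print(without_first)      # [2, 5, 6]
print(without_third)      # [1, 2, 6]
```

<p>Both slices build new lists, so the original <code>x</code> stays unchanged, just as <code>x[-index]</code> leaves the original vector untouched in R.</p>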
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">></span> <span class="kp">typeof</span><span class="p">(</span><span class="kc">NA</span><span class="p">)</span>
<span class="lineno"> 2 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"logical"</span>
<span class="lineno"> 3 </span><span class="o">></span> <span class="kp">typeof</span><span class="p">(</span><span class="kc">NaN</span><span class="p">)</span>
<span class="lineno"> 4 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"double"</span>
<span class="lineno"> 5 </span><span class="o">></span> <span class="kp">typeof</span><span class="p">(</span><span class="kc">NULL</span><span class="p">)</span>
<span class="lineno"> 6 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"NULL"</span>
<span class="lineno"> 7 </span><span class="o">></span> <span class="kp">is.na</span><span class="p">(</span><span class="kc">NA</span><span class="p">)</span>
<span class="lineno"> 8 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="kc">TRUE</span>
<span class="lineno"> 9 </span><span class="o">></span> <span class="kp">is.null</span><span class="p">(</span><span class="kc">NULL</span><span class="p">)</span>
<span class="lineno">10 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="kc">TRUE</span>
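<span class="lineno">11 </span><span class="o">></span> <span class="kp">is.nan</span><span class="p">(</span><span class="kc">NaN</span><span class="p">)</span>
<span class="lineno">12 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="kc">TRUE</span></code></pre></figure></div>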
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="nb">type</span><span class="p">(</span><span class="kc">None</span><span class="p">)</span>
<span class="lineno">2 </span><span class="o"><</span><span class="k">class</span> <span class="err">'</span><span class="nc">NoneType</span><span class="s1">'></span>
<span class="lineno">3 </span><span class="o">>>></span> <span class="kc">None</span> <span class="ow">is</span> <span class="kc">None</span>
<span class="lineno">4 </span><span class="kc">True</span>
<span class="lineno">5 </span><span class="o">>>></span> <span class="mi">1</span> <span class="o">==</span> <span class="kc">None</span>
<span class="lineno">6 </span><span class="kc">False</span></code></pre></figure></div>
</div>
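<p>In Python, there is no built-in <code>NA</code>, and <code>NaN</code> exists only as a floating-point value (for example <code>float('nan')</code>) rather than as a keyword. The counterpart of <code>NULL</code> in Python is <code>None</code>. To check whether a value is <code>None</code>, we can use the <code>is</code> keyword or <code>==</code>; <code>is</code> is the idiomatic choice.</p>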
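<p>Although Python has no <code>NaN</code> keyword, a NaN value is available as an ordinary float. The short sketch below, which uses only the standard <code>math</code> module, shows how such a value might be created and tested, alongside the usual check for <code>None</code>.</p>

```python
import math

nan = float("nan")             # math.nan is the same kind of value

print(type(nan))               # NaN is an ordinary float
print(math.isnan(nan))         # the reliable way to test for NaN
print(nan == nan)              # NaN never compares equal, even to itself

value = None
print(value is None)           # identity check, the usual test for None
```

<p>Because NaN does not compare equal to itself, <code>math.isnan()</code> plays roughly the role that <code>is.nan()</code> plays in R.</p>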
<p>From the code snippet above, we also notice that boolean values are written as “<span class="caps">TRUE</span>/<span class="caps">FALSE</span>” in R, compared with “True/False” in Python. Although “<span class="caps">TRUE</span>/<span class="caps">FALSE</span>” can be abbreviated as “T/F” in R, I don’t recommend the abbreviation, because <code>T</code> and <code>F</code> are ordinary variables that can be reassigned.</p>
<p>Interestingly, <code>NULL</code> can’t be stored as an element of a vector in R (<code>c()</code> simply drops it), but <code>None</code> can be an element of a list in Python.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> x<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="kc">NA</span><span class="p">,</span><span class="kc">NaN</span><span class="p">,</span><span class="kc">NULL</span><span class="p">)</span>
<span class="lineno">2 </span><span class="o">></span> x
<span class="lineno">3 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span> <span class="kc">NA</span> <span class="kc">NaN</span>
<span class="lineno">4 </span><span class="o">></span> <span class="kp">length</span><span class="p">(</span>x<span class="p">)</span>
<span class="lineno">5 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">3</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="kc">None</span><span class="p">]</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="n">x</span>
<span class="lineno">3 </span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="kc">None</span><span class="p">]</span>
<span class="lineno">4 </span><span class="o">>>></span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="lineno">5 </span><span class="mi">2</span></code></pre></figure></div>
</div>
<p>Sometimes we want to create a vector/list with replicated elements, for example, a vector/list with all elements equal to 0.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> x<span class="o">=</span><span class="kp">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">10</span><span class="p">)</span>
<span class="lineno">2 </span><span class="o">></span> x
<span class="lineno">3 </span> <span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">0</span> <span class="m">0</span> <span class="m">0</span> <span class="m">0</span> <span class="m">0</span> <span class="m">0</span> <span class="m">0</span> <span class="m">0</span> <span class="m">0</span> <span class="m">0</span>
<span class="lineno">4 </span><span class="o">></span> y<span class="o">=</span><span class="kp">rep</span><span class="p">(</span><span class="kt">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">),</span> <span class="m">5</span><span class="p">)</span>
<span class="lineno">5 </span><span class="o">></span> y
<span class="lineno">6 </span> <span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">0</span> <span class="m">1</span> <span class="m">0</span> <span class="m">1</span> <span class="m">0</span> <span class="m">1</span> <span class="m">0</span> <span class="m">1</span> <span class="m">0</span> <span class="m">1</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="mi">10</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="n">x</span>
<span class="lineno">3 </span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
<span class="lineno">4 </span><span class="o">>>></span> <span class="n">y</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span><span class="o">*</span><span class="mi">5</span>
<span class="lineno">5 </span><span class="o">>>></span> <span class="n">y</span>
<span class="lineno">6 </span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span></code></pre></figure></div>
</div>
<p>When we use the <code>*</code> operator to replicate a list, there is one caveat: the replicated elements are references to the same object rather than copies. This is harmless for immutable elements such as integers, but if the element is mutable, mutating it through one position changes what every other position shows.</p>
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="c1"># x is a list which is mutable</span>
<span class="lineno"> 2 </span><span class="o">>>></span> <span class="n">y</span><span class="o">=</span><span class="p">[</span><span class="n">x</span><span class="p">]</span><span class="o">*</span><span class="mi">5</span> <span class="c1"># each element in y points to x</span>
<span class="lineno"> 3 </span><span class="o">>>></span> <span class="n">y</span>
<span class="lineno"> 4 </span><span class="p">[[</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">]]</span>
<span class="lineno"> 5 </span><span class="o">>>></span> <span class="n">y</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="o">=</span><span class="mi">2</span> <span class="c1"># we point y[2] to 2 but x is not mutated</span>
<span class="lineno"> 6 </span><span class="o">>>></span> <span class="n">y</span>
<span class="lineno"> 7 </span><span class="p">[[</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">2</span><span class="p">,</span> <span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">]]</span>
<span class="lineno"> 8 </span><span class="o">>>></span> <span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span><span class="o">=-</span><span class="mi">1</span> <span class="c1"># we mutate x by changing y[1][0] from 0 to -1</span>
<span class="lineno"> 9 </span><span class="o">>>></span> <span class="n">y</span>
<span class="lineno">10 </span><span class="p">[[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="mi">2</span><span class="p">,</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]]</span>
<span class="lineno">11 </span><span class="o">>>></span> <span class="n">x</span>
<span class="lineno">12 </span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span></code></pre></figure><p>How can we get a list of replicated elements that are stored as separate objects?</p>
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="n">y</span><span class="o">=</span><span class="p">[</span><span class="n">x</span><span class="p">[:]</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">)]</span> <span class="c1"># [:] makes a copy of the list x; another solution is [list(x) for _ in range(5)]</span>
<span class="lineno">3 </span><span class="o">>>></span> <span class="n">y</span>
<span class="lineno">4 </span><span class="p">[[</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">]]</span>
<span class="lineno">5 </span><span class="o">>>></span> <span class="n">y</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span><span class="o">=</span><span class="mi">2</span>
<span class="lineno">6 </span><span class="o">>>></span> <span class="n">y</span>
<span class="lineno">7 </span><span class="p">[[</span><span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">]]</span></code></pre></figure><p>Besides accessing a specific element of a vector/list, we may also need to do slicing, i.e., to select a subset of the vector/list. There are two basic approaches to slicing:</p>
<ul>
<li>Integer-based</li>
</ul>
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> x<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span><span class="p">,</span><span class="m">4</span><span class="p">,</span><span class="m">5</span><span class="p">,</span><span class="m">6</span><span class="p">)</span>
<span class="lineno">2 </span><span class="o">></span> x<span class="p">[</span><span class="m">2</span><span class="o">:</span><span class="m">4</span><span class="p">]</span>
<span class="lineno">3 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">2</span> <span class="m">3</span> <span class="m">4</span>
<span class="lineno">4 </span><span class="o">></span> x<span class="p">[</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">5</span><span class="p">)]</span> <span class="c1"># a vector of indices</span>
<span class="lineno">5 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span> <span class="m">2</span> <span class="m">5</span>
<span class="lineno">6 </span><span class="o">></span> x<span class="p">[</span><span class="kp">seq</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">5</span><span class="p">,</span><span class="m">2</span><span class="p">)]</span> <span class="c1"># seq creates a vector to be used as indices</span>
<span class="lineno">7 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span> <span class="m">3</span> <span class="m">5</span></code></pre></figure><language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">6</span><span class="p">]</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="mi">4</span><span class="p">]</span> <span class="c1"># x[start:end] start is inclusive but end is exclusive</span>
<span class="lineno">3 </span><span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">]</span>
<span class="lineno">4 </span><span class="o">>>></span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">5</span><span class="p">:</span><span class="mi">2</span><span class="p">]</span> <span class="c1"># x[start:end:step]</span>
<span class="lineno">5 </span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span></code></pre></figure><p>The code snippets above use the hash character <code>#</code> for comments in both R and Python. Everything after <code>#</code> on the same line is treated as a comment and is not executed. In the R code, we also used the function <code>seq()</code> to create a vector. When I come across a function that I haven’t seen before, I either google it or use the built-in help mechanism: <code>?</code> in R and <code>help()</code> in Python.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> <span class="o">?</span><span class="kp">seq</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="n">help</span><span class="p">(</span><span class="nb">print</span><span class="p">)</span></code></pre></figure></div>
</div>
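<p>Integer-based slicing in Python has a few more variations worth knowing: either bound can be omitted, indices can be negative, and the step can be negative. The sketch below illustrates these, and also shows one possible way to select arbitrary positions, since a Python list does not accept a list of indices the way <code>x[c(1,2,5)]</code> works in R. The variable names are chosen for illustration only.</p>

```python
x = [1, 2, 3, 4, 5, 6]

first_three = x[:3]      # start omitted: begin at index 0
from_fourth = x[3:]      # end omitted: run to the end of the list
last_two = x[-2:]        # negative start counts from the end
reversed_x = x[::-1]     # negative step walks the list backwards

# Selecting arbitrary (0-based) positions with a list comprehension,
# a rough counterpart of indexing with a vector of indices in R.
positions = [0, 1, 4]
picked = [x[i] for i in positions]

print(first_three, from_fourth, last_two, reversed_x, picked)
```

<p>Every slice returns a new list, so <code>x</code> itself is left unchanged.</p>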
<ul>
<li>Condition-based</li>
</ul>
<p>Condition-based slicing selects the subset of elements that satisfy certain conditions. In R, this is straightforward: index with a boolean vector whose length is the same as that of the vector to slice.</p>
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> x<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">5</span><span class="p">,</span><span class="m">5</span><span class="p">,</span><span class="m">6</span><span class="p">,</span><span class="m">6</span><span class="p">)</span>
<span class="lineno">2 </span><span class="o">></span> x<span class="p">[</span>x <span class="o">%%</span> <span class="m">2</span><span class="o">==</span><span class="m">1</span><span class="p">]</span> <span class="c1"># %% is the modulo operator in R; we select the odd elements</span>
<span class="lineno">3 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span> <span class="m">5</span> <span class="m">5</span>
<span class="lineno">4 </span><span class="o">></span> x <span class="o">%%</span> <span class="m">2</span><span class="o">==</span><span class="m">1</span> <span class="c1"># results in a boolean vector with the same length as x</span>
<span class="lineno">5 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="kc">TRUE</span> <span class="kc">FALSE</span> <span class="kc">TRUE</span> <span class="kc">TRUE</span> <span class="kc">FALSE</span> <span class="kc">FALSE</span> </code></pre></figure><p>Condition-based slicing in Python is quite different from that in R. It relies on list comprehensions, which provide a concise way to create new lists in Python. For example, let’s create a list containing the squares of the elements of another list.</p>
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">6</span><span class="p">,</span><span class="mi">6</span><span class="p">]</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="p">[</span><span class="n">e</span><span class="o">**</span><span class="mi">2</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">x</span><span class="p">]</span> <span class="c1"># ** is the exponent operator, i.e., x**y means x to the power of y</span>
<span class="lineno">3 </span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">25</span><span class="p">,</span> <span class="mi">25</span><span class="p">,</span> <span class="mi">36</span><span class="p">,</span> <span class="mi">36</span><span class="p">]</span></code></pre></figure><p>We can also add an <code>if</code> clause to a list comprehension to filter a list, which achieves condition-based slicing.</p>
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="mi">6</span><span class="p">,</span><span class="mi">6</span><span class="p">]</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">x</span> <span class="k">if</span> <span class="n">e</span><span class="o">%</span><span class="mi">2</span><span class="o">==</span><span class="mi">1</span><span class="p">]</span> <span class="c1"># % is the modulo operator in Python</span>
<span class="lineno">3 </span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span></code></pre></figure><p>It is also common to use <code>if/else</code> in a list comprehension for more complex operations. For example, given a list x, let’s create a new list y in which the non-negative elements of x are squared and the negative elements are replaced by 0s.</p>
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="o">-</span><span class="mi">3</span><span class="p">]</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="p">[</span><span class="n">e</span><span class="o">**</span><span class="mi">2</span> <span class="k">if</span> <span class="n">e</span><span class="o">>=</span><span class="mi">0</span> <span class="k">else</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">x</span><span class="p">]</span>
<span class="lineno">3 </span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">25</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span></code></pre></figure><p>The example above shows the power of list comprehensions. The two forms differ in where the condition goes: an <code>if</code> clause that filters elements is placed at the end, after the <code>for</code> clause, whereas an <code>if/else</code> is a conditional expression that computes each output value and is therefore placed before the <code>for</code> clause.</p>
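<p>The two placements can also be combined in a single comprehension. The following sketch uses made-up data and variable names to show a filter alone, a conditional expression alone, and both together.</p>

```python
x = [1, -1, 0, 2, 5, -3]

# Filter only: the if clause follows the for clause and drops elements.
non_negative = [e for e in x if e >= 0]

# Transform only: the if/else expression precedes the for clause,
# so every element of x produces exactly one output value.
clipped = [e if e >= 0 else 0 for e in x]

# Both together: skip zeros, square the even elements, negate the odd ones.
mixed = [e**2 if e % 2 == 0 else -e for e in x if e != 0]

print(non_negative, clipped, mixed)
```

<p>A filter can shorten the result, while a conditional expression always yields a list as long as its input (after any filtering).</p>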
<p>We can also modify the value of an element in a vector/list variable.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> x<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span><span class="p">)</span>
<span class="lineno">2 </span><span class="o">></span> x<span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="o">=</span><span class="m">-1</span>
<span class="lineno">3 </span><span class="o">></span> x
<span class="lineno">4 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">-1</span> <span class="m">2</span> <span class="m">3</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">=-</span><span class="mi">1</span>
<span class="lineno">3 </span><span class="o">>>></span> <span class="n">x</span>
<span class="lineno">4 </span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span></code></pre></figure></div>
</div>
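<p>In Python, several elements can be modified at once by assigning to a slice, and elements can be removed with <code>del</code>. The lines below are a small sketch of these operations on an example list.</p>

```python
x = [1, 2, 3, 4, 5]

x[1:3] = [20, 30]   # replace the elements at positions 1 and 2
x[-1] = -5          # negative indices work for assignment too
del x[0]            # remove the first element in place

print(x)
```

<p>All three statements change <code>x</code> in place rather than creating a new list.</p>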
<p>Two or more vectors/lists can be concatenated easily.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> x<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">)</span>
<span class="lineno">2 </span><span class="o">></span> y<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">3</span><span class="p">,</span><span class="m">4</span><span class="p">)</span>
<span class="lineno">3 </span><span class="o">></span> z<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">5</span><span class="p">,</span><span class="m">6</span><span class="p">,</span><span class="m">7</span><span class="p">,</span><span class="m">8</span><span class="p">)</span>
<span class="lineno">4 </span><span class="o">></span> <span class="kt">c</span><span class="p">(</span>x<span class="p">,</span>y<span class="p">,</span>z<span class="p">)</span>
<span class="lineno">5 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span> <span class="m">2</span> <span class="m">3</span> <span class="m">4</span> <span class="m">5</span> <span class="m">6</span> <span class="m">7</span> <span class="m">8</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">]</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="n">y</span><span class="o">=</span><span class="p">[</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">]</span>
<span class="lineno">3 </span><span class="o">>>></span> <span class="n">z</span><span class="o">=</span><span class="p">[</span><span class="mi">5</span><span class="p">,</span><span class="mi">6</span><span class="p">,</span><span class="mi">7</span><span class="p">,</span><span class="mi">8</span><span class="p">]</span>
<span class="lineno">4 </span><span class="o">>>></span> <span class="n">x</span><span class="o">+</span><span class="n">y</span><span class="o">+</span><span class="n">z</span>
<span class="lineno">5 </span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">]</span></code></pre></figure></div>
</div>
<p>Because lists in Python are mutable, there are many things we can do with a list in place.</p>
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span>
<span class="lineno"> 2 </span><span class="o">>>></span> <span class="n">x</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span> <span class="c1"># append a single value to the list x</span>
<span class="lineno"> 3 </span><span class="o">>>></span> <span class="n">x</span>
<span class="lineno"> 4 </span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">]</span>
<span class="lineno"> 5 </span><span class="o">>>></span> <span class="n">y</span><span class="o">=</span><span class="p">[</span><span class="mi">5</span><span class="p">,</span><span class="mi">6</span><span class="p">]</span>
<span class="lineno"> 6 </span><span class="o">>>></span> <span class="n">x</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="c1"># extend x with the elements of list y</span>
<span class="lineno"> 7 </span><span class="o">>>></span> <span class="n">x</span>
<span class="lineno"> 8 </span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span>
<span class="lineno"> 9 </span><span class="o">>>></span> <span class="n">last</span><span class="o">=</span><span class="n">x</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span> <span class="c1"># pop the last element from x</span>
<span class="lineno">10 </span><span class="o">>>></span> <span class="n">last</span>
<span class="lineno">11 </span><span class="mi">6</span>
<span class="lineno">12 </span><span class="o">>>></span> <span class="n">x</span>
<span class="lineno">13 </span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span></code></pre></figure><p>I like the list structure in Python much more than the vector structure in R. A Python list has many more useful features, which are described in the official Python documentation<sup class="footnote" id="fnr5"><a href="#fn5">5</a></sup>.</p>
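<p>As a minimal sketch of a few more in-place list operations (the values and variable names here are illustrative and not taken from the text above), the built-in methods <code>insert</code>, <code>remove</code>, <code>sort</code>, <code>reverse</code>, <code>index</code> and <code>count</code> could be used as follows.</p>

```python
x = [3, 1, 2]

x.insert(0, 10)   # insert the value 10 at index 0
x.remove(3)       # remove the first occurrence of the value 3
x.sort()          # sort the list in place, in ascending order
x.reverse()       # reverse the list in place

position = x.index(2)   # index of the first occurrence of the value 2
occurrences = x.count(10)  # how many times the value 10 appears

print(x, position, occurrences)
```

<p>All of these methods except <code>index</code> and <code>count</code> change the list itself rather than returning a new list.</p>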
<h3>array</h3>
<p>The array is one of the most important data structures in scientific programming. R also has an object type “matrix”, but in my own experience we can almost ignore its existence and use array instead. We could use a list as an array in Python, but most linear algebra operations are not supported for the list type. Fortunately, the Python package <code>numpy</code> provides them off the shelf.</p>
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">></span> x<span class="o">=</span><span class="m">1</span><span class="o">:</span><span class="m">12</span>
<span class="lineno"> 2 </span><span class="o">></span> array1<span class="o">=</span><span class="kt">array</span><span class="p">(</span>x<span class="p">,</span><span class="kt">c</span><span class="p">(</span><span class="m">4</span><span class="p">,</span><span class="m">3</span><span class="p">))</span> <span class="c1"># convert vector x to a 4 rows * 3 cols array</span>
<span class="lineno"> 3 </span><span class="o">></span> array1
<span class="lineno"> 4 </span> <span class="p">[,</span><span class="m">1</span><span class="p">]</span> <span class="p">[,</span><span class="m">2</span><span class="p">]</span> <span class="p">[,</span><span class="m">3</span><span class="p">]</span>
<span class="lineno"> 5 </span><span class="p">[</span><span class="m">1</span><span class="p">,]</span> <span class="m">1</span> <span class="m">5</span> <span class="m">9</span>
<span class="lineno"> 6 </span><span class="p">[</span><span class="m">2</span><span class="p">,]</span> <span class="m">2</span> <span class="m">6</span> <span class="m">10</span>
<span class="lineno"> 7 </span><span class="p">[</span><span class="m">3</span><span class="p">,]</span> <span class="m">3</span> <span class="m">7</span> <span class="m">11</span>
<span class="lineno"> 8 </span><span class="p">[</span><span class="m">4</span><span class="p">,]</span> <span class="m">4</span> <span class="m">8</span> <span class="m">12</span>
<span class="lineno"> 9 </span><span class="o">></span> y<span class="o">=</span><span class="m">1</span><span class="o">:</span><span class="m">6</span>
<span class="lineno">10 </span><span class="o">></span> array2<span class="o">=</span><span class="kt">array</span><span class="p">(</span>y<span class="p">,</span><span class="kt">c</span><span class="p">(</span><span class="m">3</span><span class="p">,</span><span class="m">2</span><span class="p">))</span> <span class="c1"># convert vector y to a 3 rows * 2 cols array</span>
<span class="lineno">11 </span><span class="o">></span> array2
<span class="lineno">12 </span> <span class="p">[,</span><span class="m">1</span><span class="p">]</span> <span class="p">[,</span><span class="m">2</span><span class="p">]</span>
<span class="lineno">13 </span><span class="p">[</span><span class="m">1</span><span class="p">,]</span> <span class="m">1</span> <span class="m">4</span>
<span class="lineno">14 </span><span class="p">[</span><span class="m">2</span><span class="p">,]</span> <span class="m">2</span> <span class="m">5</span>
<span class="lineno">15 </span><span class="p">[</span><span class="m">3</span><span class="p">,]</span> <span class="m">3</span> <span class="m">6</span>
<span class="lineno">16 </span><span class="o">></span> array3 <span class="o">=</span> array1 <span class="o">%*%</span> array2 <span class="c1"># %*% is the matrix multiplication operator</span>
<span class="lineno">17 </span><span class="o">></span> array3
<span class="lineno">18 </span> <span class="p">[,</span><span class="m">1</span><span class="p">]</span> <span class="p">[,</span><span class="m">2</span><span class="p">]</span>
<span class="lineno">19 </span><span class="p">[</span><span class="m">1</span><span class="p">,]</span> <span class="m">38</span> <span class="m">83</span>
<span class="lineno">20 </span><span class="p">[</span><span class="m">2</span><span class="p">,]</span> <span class="m">44</span> <span class="m">98</span>
<span class="lineno">21 </span><span class="p">[</span><span class="m">3</span><span class="p">,]</span> <span class="m">50</span> <span class="m">113</span>
<span class="lineno">22 </span><span class="p">[</span><span class="m">4</span><span class="p">,]</span> <span class="m">56</span> <span class="m">128</span>
<span class="lineno">23 </span><span class="o">></span> <span class="kp">dim</span><span class="p">(</span>array3<span class="p">)</span> <span class="c1"># get the dimension of array3</span>
<span class="lineno">24 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">4</span> <span class="m">2</span></code></pre></figure><language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span> <span class="c1"># we import the numpy module and alias it as np</span>
<span class="lineno"> 2 </span><span class="o">>>></span> <span class="n">array1</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">13</span><span class="p">)),(</span><span class="mi">4</span><span class="p">,</span><span class="mi">3</span><span class="p">))</span> <span class="c1"># convert a list to a 2d np.array</span>
<span class="lineno"> 3 </span><span class="o">>>></span> <span class="n">array1</span>
<span class="lineno"> 4 </span><span class="n">array</span><span class="p">([[</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span>
<span class="lineno"> 5 </span> <span class="p">[</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">],</span>
<span class="lineno"> 6 </span> <span class="p">[</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">],</span>
<span class="lineno"> 7 </span> <span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">11</span><span class="p">,</span> <span class="mi">12</span><span class="p">]])</span>
<span class="lineno"> 8 </span><span class="o">>>></span> <span class="nb">type</span><span class="p">(</span><span class="n">array1</span><span class="p">)</span>
<span class="lineno"> 9 </span><span class="o"><</span><span class="k">class</span> <span class="err">'</span><span class="nc">numpy</span><span class="o">.</span><span class="n">ndarray</span><span class="s1">'></span>
<span class="lineno">10 </span><span class="o">>>></span> <span class="n">array2</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">7</span><span class="p">)),(</span><span class="mi">3</span><span class="p">,</span><span class="mi">2</span><span class="p">))</span>
<span class="lineno">11 </span><span class="o">>>></span> <span class="n">array2</span>
<span class="lineno">12 </span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span>
<span class="lineno">13 </span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">],</span>
<span class="lineno">14 </span> <span class="p">[</span><span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]])</span>
<span class="lineno">15 </span><span class="o">>>></span> <span class="n">array3</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">array1</span><span class="p">,</span><span class="n">array2</span><span class="p">)</span> <span class="c1"># matrix multiplication using np.dot()</span>
<span class="lineno">16 </span><span class="o">>>></span> <span class="n">array3</span>
<span class="lineno">17 </span><span class="n">array</span><span class="p">([[</span> <span class="mi">22</span><span class="p">,</span> <span class="mi">28</span><span class="p">],</span>
<span class="lineno">18 </span> <span class="p">[</span> <span class="mi">49</span><span class="p">,</span> <span class="mi">64</span><span class="p">],</span>
<span class="lineno">19 </span> <span class="p">[</span> <span class="mi">76</span><span class="p">,</span> <span class="mi">100</span><span class="p">],</span>
<span class="lineno">20 </span> <span class="p">[</span><span class="mi">103</span><span class="p">,</span> <span class="mi">136</span><span class="p">]])</span>
<span class="lineno">21 </span><span class="o">>>></span> <span class="n">array3</span><span class="o">.</span><span class="n">shape</span> <span class="c1"># get the shape(dimension) of array3</span>
<span class="lineno">22 </span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span></code></pre></figure><p>You may have noticed that the results of the R code snippet and the Python code snippet are different. The reason is that R fills an array from a vector column by column, whereas <code>numpy</code> by default reshapes a list into a 2D <code>numpy.array</code> row by row. There are two ways to reshape a list into a 2D <code>numpy.array</code> by column.</p>
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="n">array1</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">13</span><span class="p">)),(</span><span class="mi">4</span><span class="p">,</span><span class="mi">3</span><span class="p">),</span><span class="n">order</span><span class="o">=</span><span class="s1">'F'</span><span class="p">)</span> <span class="c1"># use order='F'</span>
<span class="lineno"> 2 </span><span class="o">>>></span> <span class="n">array1</span>
<span class="lineno"> 3 </span><span class="n">array</span><span class="p">([[</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">9</span><span class="p">],</span>
<span class="lineno"> 4 </span> <span class="p">[</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">10</span><span class="p">],</span>
<span class="lineno"> 5 </span> <span class="p">[</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">11</span><span class="p">],</span>
<span class="lineno"> 6 </span> <span class="p">[</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">12</span><span class="p">]])</span>
<span class="lineno"> 7 </span><span class="o">>>></span> <span class="n">array2</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">7</span><span class="p">)),(</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">))</span><span class="o">.</span><span class="n">T</span> <span class="c1"># use .T to transpose an array</span>
<span class="lineno"> 8 </span><span class="o">>>></span> <span class="n">array2</span>
<span class="lineno"> 9 </span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">],</span>
<span class="lineno">10 </span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">],</span>
<span class="lineno">11 </span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">6</span><span class="p">]])</span>
<span class="lineno">12 </span><span class="o">>>></span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">array1</span><span class="p">,</span><span class="n">array2</span><span class="p">)</span> <span class="c1"># now we get the same result as using R</span>
<span class="lineno">13 </span><span class="n">array</span><span class="p">([[</span> <span class="mi">38</span><span class="p">,</span> <span class="mi">83</span><span class="p">],</span>
<span class="lineno">14 </span> <span class="p">[</span> <span class="mi">44</span><span class="p">,</span> <span class="mi">98</span><span class="p">],</span>
<span class="lineno">15 </span> <span class="p">[</span> <span class="mi">50</span><span class="p">,</span> <span class="mi">113</span><span class="p">],</span>
<span class="lineno">16 </span> <span class="p">[</span> <span class="mi">56</span><span class="p">,</span> <span class="mi">128</span><span class="p">]])</span></code></pre></figure><p>To learn more about numpy, the official website<sup class="footnote" id="fnr6"><a href="#fn6">6</a></sup> has great documentation/tutorials.</p>
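<p>As a side note, recent versions of Python and <code>numpy</code> also offer the <code>@</code> operator for matrix multiplication, which reads closer to <code>%*%</code> in R. The sketch below assumes the same column-wise arrays as above, built here with <code>np.arange</code> instead of a list; the name <code>product</code> is only illustrative.</p>

```python
import numpy as np

# Fill both arrays column by column, as R does.
array1 = np.arange(1, 13).reshape((4, 3), order="F")
array2 = np.arange(1, 7).reshape((3, 2), order="F")

# For 2D arrays, the @ operator gives the same result as np.dot().
product = array1 @ array2
print(product)
```

<p>For 2D arrays the two spellings are interchangeable, so the choice is mostly a matter of readability.</p>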
<h3>broadcasting</h3>
<p>The term broadcasting describes how arrays with different shapes are handled during arithmetic operations. A simple example of broadcasting is given below.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> x <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span> <span class="m">3</span><span class="p">)</span>
<span class="lineno">2 </span><span class="o">></span> x<span class="m">+1</span>
<span class="lineno">3 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">2</span> <span class="m">3</span> <span class="m">4</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="lineno">3 </span><span class="o">>>></span> <span class="n">x</span> <span class="o">+</span> <span class="mi">1</span>
<span class="lineno">4 </span><span class="n">array</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">])</span></code></pre></figure></div>
</div>
<p>However, the broadcasting rules in R and Python are not exactly the same.</p>
<div class="codewrapper">
<div class="codeleft">
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">></span> x <span class="o">=</span> <span class="kt">array</span><span class="p">(</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">6</span><span class="p">),</span> <span class="kt">c</span><span class="p">(</span><span class="m">3</span><span class="p">,</span><span class="m">2</span><span class="p">))</span>
<span class="lineno"> 2 </span><span class="o">></span> y <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span> <span class="m">3</span><span class="p">)</span>
<span class="lineno"> 3 </span><span class="o">></span> z <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">)</span>
<span class="lineno"> 4 </span><span class="c1"># point-wise multiplication</span>
<span class="lineno"> 5 </span><span class="o">></span> x <span class="o">*</span> y
<span class="lineno"> 6 </span> <span class="p">[,</span><span class="m">1</span><span class="p">]</span> <span class="p">[,</span><span class="m">2</span><span class="p">]</span>
<span class="lineno"> 7 </span><span class="p">[</span><span class="m">1</span><span class="p">,]</span> <span class="m">1</span> <span class="m">4</span>
<span class="lineno"> 8 </span><span class="p">[</span><span class="m">2</span><span class="p">,]</span> <span class="m">4</span> <span class="m">10</span>
<span class="lineno"> 9 </span><span class="p">[</span><span class="m">3</span><span class="p">,]</span> <span class="m">9</span> <span class="m">18</span>
<span class="lineno">10 </span><span class="o">></span> x<span class="o">*</span>z
<span class="lineno">11 </span> <span class="p">[,</span><span class="m">1</span><span class="p">]</span> <span class="p">[,</span><span class="m">2</span><span class="p">]</span>
<span class="lineno">12 </span><span class="p">[</span><span class="m">1</span><span class="p">,]</span> <span class="m">1</span> <span class="m">8</span>
<span class="lineno">13 </span><span class="p">[</span><span class="m">2</span><span class="p">,]</span> <span class="m">4</span> <span class="m">5</span>
<span class="lineno">14 </span><span class="p">[</span><span class="m">3</span><span class="p">,]</span> <span class="m">3</span> <span class="m">12</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="lineno"> 2 </span><span class="o">>>></span> <span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">],</span> <span class="p">[</span><span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]])</span>
<span class="lineno"> 3 </span><span class="o">>>></span> <span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="lineno"> 4 </span><span class="o">>>></span> <span class="n">z</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span>
<span class="lineno"> 5 </span><span class="o">>>></span> <span class="c1"># point-wise multiplication</span>
<span class="lineno"> 6 </span><span class="o">>>></span> <span class="n">x</span> <span class="o">*</span> <span class="n">y</span>
<span class="lineno"> 7 </span><span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
<span class="lineno"> 8 </span> <span class="n">File</span> <span class="s2">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="lineno"> 9 </span><span class="ne">ValueError</span><span class="p">:</span> <span class="n">operands</span> <span class="n">could</span> <span class="ow">not</span> <span class="n">be</span> <span class="n">broadcast</span> <span class="n">together</span> <span class="k">with</span> <span class="n">shapes</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="p">(</span><span class="mi">3</span><span class="p">,)</span>
<span class="lineno">10 </span><span class="o">>>></span> <span class="n">x</span> <span class="o">*</span> <span class="n">z</span>
<span class="lineno">11 </span><span class="n">array</span><span class="p">([[</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">],</span>
<span class="lineno">12 </span> <span class="p">[</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">8</span><span class="p">],</span>
<span class="lineno">13 </span> <span class="p">[</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">12</span><span class="p">]])</span></code></pre></figure></div>
</div>
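<p>From the R code, we see that broadcasting in R works by recycling: the shorter vector is repeated as often as needed and matched against the elements of the array in column-major order. In Python, when two arrays have a different number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading side, and the shapes are then compared dimension by dimension; two dimensions are compatible only if they are equal or one of them is 1. According to this rule, when we do <code>x * y</code>, the shape of <code>x</code> is (3, 2) and the shape of <code>y</code> is (3,). The shape of <code>y</code> is padded to (1, 3), and its trailing dimension 3 is neither equal to 2 nor equal to 1, which explains why <code>x * y</code> raises an error. In contrast, the shape of <code>z</code> is padded to (1, 2), so <code>z</code> is multiplied with every row of <code>x</code>.</p>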
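<p>If we want <code>numpy</code> to reproduce what R returns for <code>x * y</code>, one possible approach is to give <code>y</code> an explicit trailing axis so that its shape becomes (3, 1). The sketch below assumes an array <code>x_r</code> filled column by column like the R array; the names <code>x_r</code>, <code>broadcast_failed</code> and <code>result</code> are only illustrative.</p>

```python
import numpy as np

# Build the 3 x 2 array the way R does: fill it column by column.
x_r = np.arange(1, 7).reshape((3, 2), order="F")   # [[1, 4], [2, 5], [3, 6]]
y = np.array([1, 2, 3])                            # shape (3,)

# Shapes (3, 2) and (3,) are incompatible, so plain multiplication fails.
try:
    x_r * y
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Adding a trailing axis turns y into shape (3, 1), which broadcasts
# across the columns of x_r.
result = x_r * y[:, np.newaxis]

print(broadcast_failed)
print(result)
```

<p>With the extra axis, each row of <code>x_r</code> is scaled by the corresponding element of <code>y</code>, which matches the recycling behaviour in the R snippet.</p>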
<h3>list in R and dictionary in Python</h3>
<p>Yes, in R there is also an object type called list. The major difference between a vector and a list in R is that a list can contain elements of different types. A list in R supports integer-based access using <code>[[]]</code> (compared to <code>[]</code> for a vector).</p>
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">></span> x<span class="o">=</span><span class="kt">list</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="s">'hello world!'</span><span class="p">)</span>
<span class="lineno"> 2 </span><span class="o">></span> x
<span class="lineno"> 3 </span><span class="p">[[</span><span class="m">1</span><span class="p">]]</span>
<span class="lineno"> 4 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span>
<span class="lineno"> 5 </span>
<span class="lineno"> 6 </span><span class="p">[[</span><span class="m">2</span><span class="p">]]</span>
<span class="lineno"> 7 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"hello world!"</span>
<span class="lineno"> 8 </span>
<span class="lineno"> 9 </span><span class="o">></span> x<span class="p">[[</span><span class="m">1</span><span class="p">]]</span>
<span class="lineno">10 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span>
<span class="lineno">11 </span><span class="o">></span> x<span class="p">[[</span><span class="m">2</span><span class="p">]]</span>
<span class="lineno">12 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"hello world!"</span>
<span class="lineno">13 </span><span class="o">></span> <span class="kp">length</span><span class="p">(</span>x<span class="p">)</span>
<span class="lineno">14 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">2</span></code></pre></figure><p>A list in R can be named and supports access by name via either the <code>[[]]</code> or the <code>$</code> operator. A vector in R can also be named and accessed by name.</p>
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">></span> x<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="s">'a'</span><span class="o">=</span><span class="m">1</span><span class="p">,</span><span class="s">'b'</span><span class="o">=</span><span class="m">2</span><span class="p">)</span>
<span class="lineno"> 2 </span><span class="o">></span> <span class="kp">names</span><span class="p">(</span>x<span class="p">)</span>
<span class="lineno"> 3 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"a"</span> <span class="s">"b"</span>
<span class="lineno"> 4 </span><span class="o">></span> x<span class="p">[</span><span class="s">'b'</span><span class="p">]</span>
<span class="lineno"> 5 </span>b
<span class="lineno"> 6 </span><span class="m">2</span>
<span class="lineno"> 7 </span><span class="o">></span> l<span class="o">=</span><span class="kt">list</span><span class="p">(</span><span class="s">'a'</span><span class="o">=</span><span class="m">1</span><span class="p">,</span><span class="s">'b'</span><span class="o">=</span><span class="m">2</span><span class="p">)</span>
<span class="lineno"> 8 </span><span class="o">></span> l<span class="p">[[</span><span class="s">'b'</span><span class="p">]]</span>
<span class="lineno"> 9 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">2</span>
<span class="lineno">10 </span><span class="o">></span> l<span class="o">$</span>b
<span class="lineno">11 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">2</span>
<span class="lineno">12 </span><span class="o">></span> <span class="kp">names</span><span class="p">(</span>l<span class="p">)</span>
<span class="lineno">13 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"a"</span> <span class="s">"b"</span></code></pre></figure><p>However, the elements of a list in Python can’t be named as they can in R. If we need to access elements by name in Python, we can use the dictionary structure. If you have used Java before, you may consider a dictionary in Python as the counterpart of HashMap in Java. Essentially, a dictionary in Python is a collection of key:value pairs.</p>
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">{</span><span class="s1">'a'</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span><span class="s1">'b'</span><span class="p">:</span><span class="mi">2</span><span class="p">}</span> <span class="c1"># {key:value} pairs</span>
<span class="lineno"> 2 </span><span class="o">>>></span> <span class="n">x</span>
<span class="lineno"> 3 </span><span class="p">{</span><span class="s1">'a'</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'b'</span><span class="p">:</span> <span class="mi">2</span><span class="p">}</span>
<span class="lineno"> 4 </span><span class="o">>>></span> <span class="n">x</span><span class="p">[</span><span class="s1">'a'</span><span class="p">]</span>
<span class="lineno"> 5 </span><span class="mi">1</span>
<span class="lineno"> 6 </span><span class="o">>>></span> <span class="n">x</span><span class="p">[</span><span class="s1">'b'</span><span class="p">]</span>
<span class="lineno"> 7 </span><span class="mi">2</span>
<span class="lineno"> 8 </span><span class="o">>>></span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="c1"># number of key:value pairs</span>
<span class="lineno"> 9 </span><span class="mi">2</span>
<span class="lineno">10 </span><span class="o">>>></span> <span class="n">x</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="s1">'a'</span><span class="p">)</span> <span class="c1"># remove the key 'a' and we get its value 1</span>
<span class="lineno">11 </span><span class="mi">1</span>
<span class="lineno">12 </span><span class="o">>>></span> <span class="n">x</span>
<span class="lineno">13 </span><span class="p">{</span><span class="s1">'b'</span><span class="p">:</span> <span class="mi">2</span><span class="p">}</span></code></pre></figure><p>Unlike a dictionary in Python, a list in R doesn’t support the <code>pop()</code> operation. Thus, in order to modify a list in R, a new one is created, either explicitly or implicitly.</p>
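<p>A few other common dictionary operations can be sketched as follows; the keys, values and variable names are illustrative only. The ordering shown by <code>items()</code> relies on dictionaries preserving insertion order, which holds in Python 3.7 and later.</p>

```python
x = {"a": 1, "b": 2}

x["c"] = 3                # add a new key:value pair in place
x["a"] = 10               # overwrite the value of an existing key
missing = x.get("z", 0)   # look up a key, falling back to 0 if it is absent
has_b = "b" in x          # membership test on the keys
del x["b"]                # remove a key without returning its value

pairs = list(x.items())   # the remaining (key, value) pairs
print(pairs, missing, has_b)
```

<p>Using <code>get()</code> with a default avoids the <code>KeyError</code> that <code>x["z"]</code> would raise for a missing key.</p>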
<h3>data.frame</h3>
<p>data.frame is a built-in type in R for data manipulation. In Python, there is no such built-in data structure since Python is a more general-purpose programming language. The counterpart of data.frame in Python is provided by the <code>pandas</code><sup class="footnote" id="fnr7"><a href="#fn7">7</a></sup> module.</p>
<p>Before we dive into data.frame, you may wonder why we need it. In other words, why can’t we just use vector, list, array/matrix and dictionary for all data manipulation tasks? In fact, we could: data.frame is not a must-have feature for most <span class="caps">ETL</span> (extract, transform, load) operations. But data.frame provides a very intuitive way to understand a structured data set. A data.frame is usually flat with 2 dimensions, i.e., row and column. The row dimension runs across observations and the column dimension runs across attributes/features. If you are familiar with relational databases, a data.frame can be viewed as a table.</p>
<p>Let’s see an example of using data.frame to represent employees’ information in a company.</p>
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> employee_df <span class="o">=</span> <span class="kt">data.frame</span><span class="p">(</span>name<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="s">"A"</span><span class="p">,</span> <span class="s">"B"</span><span class="p">,</span> <span class="s">"C"</span><span class="p">),</span>department<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="s">"Engineering"</span><span class="p">,</span><span class="s">"Operations"</span><span class="p">,</span><span class="s">"Sales"</span><span class="p">))</span>
<span class="lineno">2 </span><span class="o">></span> employee_df
<span class="lineno">3 </span> name department
<span class="lineno">4 </span><span class="m">1</span> A Engineering
<span class="lineno">5 </span><span class="m">2</span> B Operations
<span class="lineno">6 </span><span class="m">3</span> C Sales</code></pre></figure><language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="n">employee_df</span><span class="o">=</span><span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s1">'name'</span><span class="p">:[</span><span class="s1">'A'</span><span class="p">,</span><span class="s1">'B'</span><span class="p">,</span><span class="s1">'C'</span><span class="p">],</span><span class="s1">'department'</span><span class="p">:[</span><span class="s2">"Engineering"</span><span class="p">,</span><span class="s2">"Operations"</span><span class="p">,</span><span class="s2">"Sales"</span><span class="p">]})</span>
<span class="lineno">3 </span><span class="o">>>></span> <span class="n">employee_df</span>
<span class="lineno">4 </span> <span class="n">name</span> <span class="n">department</span>
<span class="lineno">5 </span><span class="mi">0</span> <span class="n">A</span> <span class="n">Engineering</span>
<span class="lineno">6 </span><span class="mi">1</span> <span class="n">B</span> <span class="n">Operations</span>
<span class="lineno">7 </span><span class="mi">2</span> <span class="n">C</span> <span class="n">Sales</span></code></pre></figure><p>There are quite a few ways to create a data.frame. A commonly used one is to create a data.frame object from an array/matrix. We may also need to convert a numeric data.frame to an array/matrix.</p>
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">></span> x<span class="o">=</span><span class="kt">array</span><span class="p">(</span>rnorm<span class="p">(</span><span class="m">12</span><span class="p">),</span><span class="kt">c</span><span class="p">(</span><span class="m">3</span><span class="p">,</span><span class="m">4</span><span class="p">))</span>
<span class="lineno"> 2 </span><span class="o">></span> x
<span class="lineno"> 3 </span> <span class="p">[,</span><span class="m">1</span><span class="p">]</span> <span class="p">[,</span><span class="m">2</span><span class="p">]</span> <span class="p">[,</span><span class="m">3</span><span class="p">]</span> <span class="p">[,</span><span class="m">4</span><span class="p">]</span>
<span class="lineno"> 4 </span><span class="p">[</span><span class="m">1</span><span class="p">,]</span> <span class="m">-0.8101246</span> <span class="m">-0.8594136</span> <span class="m">-2.260810</span> <span class="m">0.5727590</span>
<span class="lineno"> 5 </span><span class="p">[</span><span class="m">2</span><span class="p">,]</span> <span class="m">-0.9175476</span> <span class="m">0.1345982</span> <span class="m">1.067628</span> <span class="m">-0.7643533</span>
<span class="lineno"> 6 </span><span class="p">[</span><span class="m">3</span><span class="p">,]</span> <span class="m">0.7865971</span> <span class="m">-1.9046711</span> <span class="m">-0.154928</span> <span class="m">-0.6807527</span>
<span class="lineno"> 7 </span><span class="o">></span> random_df<span class="o">=</span><span class="kp">as.data.frame</span><span class="p">(</span>x<span class="p">)</span>
<span class="lineno"> 8 </span><span class="o">></span> random_df
<span class="lineno"> 9 </span> V1 V2 V3 V4
<span class="lineno">10 </span><span class="m">1</span> <span class="m">-0.8101246</span> <span class="m">-0.8594136</span> <span class="m">-2.260810</span> <span class="m">0.5727590</span>
<span class="lineno">11 </span><span class="m">2</span> <span class="m">-0.9175476</span> <span class="m">0.1345982</span> <span class="m">1.067628</span> <span class="m">-0.7643533</span>
<span class="lineno">12 </span><span class="m">3</span> <span class="m">0.7865971</span> <span class="m">-1.9046711</span> <span class="m">-0.154928</span> <span class="m">-0.6807527</span>
<span class="lineno">13 </span><span class="o">></span> <span class="kp">data.matrix</span><span class="p">(</span>random_df<span class="p">)</span>
<span class="lineno">14 </span> V1 V2 V3 V4
<span class="lineno">15 </span><span class="p">[</span><span class="m">1</span><span class="p">,]</span> <span class="m">-0.8101246</span> <span class="m">-0.8594136</span> <span class="m">-2.260810</span> <span class="m">0.5727590</span>
<span class="lineno">16 </span><span class="p">[</span><span class="m">2</span><span class="p">,]</span> <span class="m">-0.9175476</span> <span class="m">0.1345982</span> <span class="m">1.067628</span> <span class="m">-0.7643533</span>
<span class="lineno">17 </span><span class="p">[</span><span class="m">3</span><span class="p">,]</span> <span class="m">0.7865971</span> <span class="m">-1.9046711</span> <span class="m">-0.154928</span> <span class="m">-0.6807527</span></code></pre></figure><language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="lineno"> 2 </span><span class="o">>>></span> <span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="lineno"> 3 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">normal</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">))</span>
<span class="lineno"> 4 </span><span class="o">>>></span> <span class="n">x</span>
<span class="lineno"> 5 </span><span class="n">array</span><span class="p">([[</span><span class="o">-</span><span class="mf">0.54164878</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.14285267</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.39835535</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.81522719</span><span class="p">],</span>
<span class="lineno"> 6 </span> <span class="p">[</span> <span class="mf">0.01540508</span><span class="p">,</span> <span class="mf">0.63556266</span><span class="p">,</span> <span class="mf">0.16800583</span><span class="p">,</span> <span class="mf">0.17594448</span><span class="p">],</span>
<span class="lineno"> 7 </span> <span class="p">[</span><span class="o">-</span><span class="mf">1.21598262</span><span class="p">,</span> <span class="mf">0.52860817</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.61757696</span><span class="p">,</span> <span class="mf">0.18445057</span><span class="p">]])</span>
<span class="lineno"> 8 </span><span class="o">>>></span> <span class="n">random_df</span><span class="o">=</span><span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="lineno"> 9 </span><span class="o">>>></span> <span class="n">random_df</span>
<span class="lineno">10 </span> <span class="mi">0</span> <span class="mi">1</span> <span class="mi">2</span> <span class="mi">3</span>
<span class="lineno">11 </span><span class="mi">0</span> <span class="o">-</span><span class="mf">0.541649</span> <span class="o">-</span><span class="mf">0.142853</span> <span class="o">-</span><span class="mf">0.398355</span> <span class="o">-</span><span class="mf">0.815227</span>
<span class="lineno">12 </span><span class="mi">1</span> <span class="mf">0.015405</span> <span class="mf">0.635563</span> <span class="mf">0.168006</span> <span class="mf">0.175944</span>
<span class="lineno">13 </span><span class="mi">2</span> <span class="o">-</span><span class="mf">1.215983</span> <span class="mf">0.528608</span> <span class="o">-</span><span class="mf">0.617577</span> <span class="mf">0.184451</span>
<span class="lineno">14 </span><span class="o">>>></span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">random_df</span><span class="p">)</span>
<span class="lineno">15 </span><span class="n">array</span><span class="p">([[</span><span class="o">-</span><span class="mf">0.54164878</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.14285267</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.39835535</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.81522719</span><span class="p">],</span>
<span class="lineno">16 </span> <span class="p">[</span> <span class="mf">0.01540508</span><span class="p">,</span> <span class="mf">0.63556266</span><span class="p">,</span> <span class="mf">0.16800583</span><span class="p">,</span> <span class="mf">0.17594448</span><span class="p">],</span>
<span class="lineno">17 </span> <span class="p">[</span><span class="o">-</span><span class="mf">1.21598262</span><span class="p">,</span> <span class="mf">0.52860817</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.61757696</span><span class="p">,</span> <span class="mf">0.18445057</span><span class="p">]])</span></code></pre></figure><p>In general, operations on an array/matrix are much faster than those on a data frame. In R, we may use the built-in function <code>data.matrix</code> to convert a data.frame to an array/matrix. In Python, we could use the function <code>asarray</code> in the <code>numpy</code> module.</p>
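<p>As a minimal sketch of this round trip in Python, the snippet below builds a data frame from a small array and converts it back. The names <code>arr</code>, <code>df</code>, <code>back</code> and <code>mixed</code> are illustrative and do not come from the examples above. <code>DataFrame.to_numpy()</code> is a pandas method (assuming pandas 0.24 or later) that serves the same purpose as <code>np.asarray</code> here. The last lines illustrate why the conversion is mostly useful for numeric data: a data frame with mixed column types becomes an array of generic Python objects.</p>

```python
import numpy as np
import pandas as pd

# A small numeric array; the values are arbitrary illustration data.
arr = np.arange(12, dtype=float).reshape(3, 4)

# array -> data frame -> array
df = pd.DataFrame(arr)
back = np.asarray(df)       # same call as in the example above
also_back = df.to_numpy()   # pandas method with the same purpose

# Both conversions recover the original values.
print(np.array_equal(arr, back))
print(np.array_equal(arr, also_back))

# A data frame with mixed column types converts to an object array,
# which loses the speed advantage of a numeric array.
mixed = pd.DataFrame({'name': ['A', 'B'], 'score': [1.5, 2.5]})
mixed_dtype = np.asarray(mixed).dtype
print(mixed_dtype)
```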
<p>Although data.frame is a built-in type, it is not very efficient for many operations. I would suggest using data.table<sup class="footnote" id="fnr8"><a href="#fn8">8</a></sup> whenever possible. dplyr<sup class="footnote" id="fnr9"><a href="#fn9">9</a></sup> is also a very popular package in R for data manipulation. Many good resources are available online to learn data.table and pandas.</p>
<h2 id="revisit">Revisit of variables</h2>
<p>We have talked about variables and functions so far. When a function has a name, its name is also a valid variable. After all, what is a variable?</p>
<p>In mathematics, a variable is a symbol that represents an element, and we do not care whether we conceptualize a variable in our mind or write it down on paper. In programming, however, a variable is not only a symbol. We have to understand that a variable is a name given to a memory location in a computer system. When we run <code>x=2</code> in R or Python, the value 2 is stored somewhere in memory, and the variable (name) <code>x</code> points to this memory address. If we further run <code>y=x</code>, the variable <code>y</code> points to the same memory location pointed to by <code>x</code>. What if we run <code>x=3</code>? It doesn’t modify the memory which stores the value <code>2</code>. Instead, the value 3 is stored somewhere else in memory, and the name <code>x</code> now refers to that location. The variable <code>y</code> is not affected at all, nor is the memory location it points to.</p>
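<p>The rebinding described above can be checked directly in Python. This is only a small sketch; it relies on the <code>is</code> operator, which tests whether two names refer to the same object and is not used elsewhere in this section.</p>

```python
x = 2
y = x              # y now refers to the same object as x
print(y is x)      # True: two names, one object

x = 3              # rebinds the name x; the object holding 2 is untouched
print(x, y)        # 3 2
print(y is x)      # False: the names now refer to different objects
```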
<h3>Mutability</h3>
<p>Almost everything in R or Python is an object, including the data structures we introduced in previous sections. Mutability is a property of objects, not variables, because a variable is just a name.</p>
<p>A list in Python is mutable, meaning that we can change the elements stored in the list object without copying the list object from one memory location to another. We can use the <code>id</code> function in Python to check the memory location of the object a variable points to (in CPython, the standard implementation, <code>id</code> returns the object’s memory address). In the code below, we modify the first element of the list object with name <code>x</code>. Since a Python list is mutable, the memory address of the list doesn’t change.</p>
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno">1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="nb">list</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1001</span><span class="p">))</span> <span class="c1"># list() converts a range object to a list</span>
<span class="lineno">2 </span><span class="o">>>></span> <span class="nb">hex</span><span class="p">(</span><span class="nb">id</span><span class="p">(</span><span class="n">x</span><span class="p">))</span> <span class="c1"># print the memory address of x</span>
<span class="lineno">3 </span><span class="s1">'0x10592d908'</span>
<span class="lineno">4 </span><span class="o">>>></span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">=</span><span class="mf">1.0</span> <span class="c1"># from integer to float</span>
<span class="lineno">5 </span><span class="o">>>></span> <span class="nb">hex</span><span class="p">(</span><span class="nb">id</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
<span class="lineno">6 </span><span class="s1">'0x10592d908'</span></code></pre></figure><p>Is there any immutable data structure in Python? Yes. For example, a tuple, which contains a sequence of elements, is immutable. Element access and slicing of a tuple follow the same rules as for a list in Python.</p>
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,)</span> <span class="c1"># use () to create a tuple in Python, it is better to always put a comma in the end</span>
<span class="lineno"> 2 </span><span class="o">>>></span> <span class="nb">type</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="lineno"> 3 </span><span class="o"><</span><span class="k">class</span> <span class="err">'</span><span class="nc">tuple</span><span class="s1">'></span>
<span class="lineno"> 4 </span><span class="o">>>></span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="lineno"> 5 </span><span class="mi">3</span>
<span class="lineno"> 6 </span><span class="o">>>></span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="lineno"> 7 </span><span class="mi">1</span>
<span class="lineno"> 8 </span><span class="o">>>></span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">=-</span><span class="mi">1</span>
<span class="lineno"> 9 </span><span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
<span class="lineno">10 </span> <span class="n">File</span> <span class="s2">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="lineno">11 </span><span class="ne">TypeError</span><span class="p">:</span> <span class="s1">'tuple'</span> <span class="nb">object</span> <span class="n">does</span> <span class="ow">not</span> <span class="n">support</span> <span class="n">item</span> <span class="n">assignment</span></code></pre></figure><p>If two Python variables point to the same object in memory, modifying the object via one variable also affects the other, as we expect (see the example below).</p>
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span>
<span class="lineno"> 2 </span><span class="o">>>></span> <span class="nb">id</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="lineno"> 3 </span><span class="mi">4535423616</span>
<span class="lineno"> 4 </span><span class="o">>>></span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">=</span><span class="mi">0</span>
<span class="lineno"> 5 </span><span class="o">>>></span> <span class="n">x</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span>
<span class="lineno"> 6 </span><span class="o">>>></span> <span class="n">y</span><span class="o">=</span><span class="n">x</span>
<span class="lineno"> 7 </span><span class="o">>>></span> <span class="nb">id</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="lineno"> 8 </span><span class="mi">4535459104</span>
<span class="lineno"> 9 </span><span class="o">>>></span> <span class="nb">id</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
<span class="lineno">10 </span><span class="mi">4535459104</span>
<span class="lineno">11 </span><span class="o">>>></span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">=</span><span class="mi">0</span>
<span class="lineno">12 </span><span class="o">>>></span> <span class="nb">id</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="lineno">13 </span><span class="mi">4535459104</span>
<span class="lineno">14 </span><span class="o">>>></span> <span class="nb">id</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
<span class="lineno">15 </span><span class="mi">4535459104</span>
<span class="lineno">16 </span><span class="o">>>></span> <span class="n">x</span>
<span class="lineno">17 </span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
<span class="lineno">18 </span><span class="o">>>></span> <span class="n">y</span>
<span class="lineno">19 </span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span></code></pre></figure><p>In contrast, the mutability of a vector in R is more complex and sometimes confusing. First, let’s see the behavior when a single name is given to the vector object stored in memory.</p>
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> a<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span><span class="p">)</span>
<span class="lineno">2 </span><span class="o">></span> <span class="m">.</span>Internal<span class="p">(</span>inspect<span class="p">(</span>a<span class="p">))</span>
<span class="lineno">3 </span><span class="o">@</span><span class="m">7</span>fe94408f3c8 <span class="m">14</span> REALSXP g0c3 <span class="p">[</span>NAM<span class="p">(</span><span class="m">1</span><span class="p">)]</span> <span class="p">(</span>len<span class="o">=</span><span class="m">3</span><span class="p">,</span> tl<span class="o">=</span><span class="m">0</span><span class="p">)</span> <span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span>
<span class="lineno">4 </span><span class="o">></span> a<span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="o">=</span><span class="m">0</span>
<span class="lineno">5 </span><span class="o">></span> <span class="m">.</span>Internal<span class="p">(</span>inspect<span class="p">(</span>a<span class="p">))</span>
<span class="lineno">6 </span><span class="o">@</span><span class="m">7</span>fe94408f3c8 <span class="m">14</span> REALSXP g0c3 <span class="p">[</span>NAM<span class="p">(</span><span class="m">1</span><span class="p">)]</span> <span class="p">(</span>len<span class="o">=</span><span class="m">3</span><span class="p">,</span> tl<span class="o">=</span><span class="m">0</span><span class="p">)</span> <span class="m">0</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span></code></pre></figure><p>It is clear in this case the vector object is mutable since the memory address doesn’t change after the modification. What if there is an additional name given to the memory?</p>
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">></span> a<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span><span class="p">)</span>
<span class="lineno"> 2 </span><span class="o">></span> b<span class="o">=</span>a
<span class="lineno"> 3 </span><span class="o">></span> <span class="m">.</span>Internal<span class="p">(</span>inspect<span class="p">(</span>a<span class="p">))</span>
<span class="lineno"> 4 </span><span class="o">@</span><span class="m">7</span>fe94408f238 <span class="m">14</span> REALSXP g0c3 <span class="p">[</span>NAM<span class="p">(</span><span class="m">2</span><span class="p">)]</span> <span class="p">(</span>len<span class="o">=</span><span class="m">3</span><span class="p">,</span> tl<span class="o">=</span><span class="m">0</span><span class="p">)</span> <span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span>
<span class="lineno"> 5 </span><span class="o">></span> <span class="m">.</span>Internal<span class="p">(</span>inspect<span class="p">(</span>b<span class="p">))</span>
<span class="lineno"> 6 </span><span class="o">@</span><span class="m">7</span>fe94408f238 <span class="m">14</span> REALSXP g0c3 <span class="p">[</span>NAM<span class="p">(</span><span class="m">2</span><span class="p">)]</span> <span class="p">(</span>len<span class="o">=</span><span class="m">3</span><span class="p">,</span> tl<span class="o">=</span><span class="m">0</span><span class="p">)</span> <span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span>
<span class="lineno"> 7 </span><span class="o">></span> a<span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="o">=</span><span class="m">0</span>
<span class="lineno"> 8 </span><span class="o">></span> <span class="m">.</span>Internal<span class="p">(</span>inspect<span class="p">(</span>a<span class="p">))</span>
<span class="lineno"> 9 </span><span class="o">@</span><span class="m">7</span>fe94408f0a8 <span class="m">14</span> REALSXP g0c3 <span class="p">[</span>NAM<span class="p">(</span><span class="m">1</span><span class="p">)]</span> <span class="p">(</span>len<span class="o">=</span><span class="m">3</span><span class="p">,</span> tl<span class="o">=</span><span class="m">0</span><span class="p">)</span> <span class="m">0</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span>
<span class="lineno">10 </span><span class="o">></span> <span class="m">.</span>Internal<span class="p">(</span>inspect<span class="p">(</span>b<span class="p">))</span>
<span class="lineno">11 </span><span class="o">@</span><span class="m">7</span>fe94408f238 <span class="m">14</span> REALSXP g0c3 <span class="p">[</span>NAM<span class="p">(</span><span class="m">2</span><span class="p">)]</span> <span class="p">(</span>len<span class="o">=</span><span class="m">3</span><span class="p">,</span> tl<span class="o">=</span><span class="m">0</span><span class="p">)</span> <span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span>
<span class="lineno">12 </span><span class="o">></span> a
<span class="lineno">13 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">0</span> <span class="m">2</span> <span class="m">3</span>
<span class="lineno">14 </span><span class="o">></span> b
<span class="lineno">15 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span> <span class="m">2</span> <span class="m">3</span></code></pre></figure><p>Before the modification, both variables <code>a</code> and <code>b</code> point to the same vector object in memory. But surprisingly, after the modification the memory address of variable <code>a</code> has changed; this behavior is called “copy on modify” in R. Because of it, the modification of <code>a</code> doesn’t affect the object stored at the old memory address, and thus the vector object behaves as immutable in this case. The mutability of an R <code>list</code> is similar to that of an R <code>vector</code>. A copy is also made when a modification changes the type of the vector, for example from integer to double, as the <code>tracemem</code> output below shows.</p>
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno">1 </span><span class="o">></span> x<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">3</span><span class="p">)</span>
<span class="lineno">2 </span><span class="o">></span> <span class="kp">tracemem</span><span class="p">(</span>x<span class="p">)</span> <span class="c1"># print the memory address of x whenever the address changes</span>
<span class="lineno">3 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"<0x7ff360c95c08>"</span>
<span class="lineno">4 </span><span class="o">></span> x<span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="o">=-</span>x<span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="c1"># type not changed, i.e., from integer to integer</span>
<span class="lineno">5 </span><span class="o">></span> <span class="kp">tracemem</span><span class="p">(</span>x<span class="p">)</span>
<span class="lineno">6 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="s">"<0x7ff360c95c08>"</span>
<span class="lineno">7 </span><span class="o">></span> x<span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="o">=</span><span class="m">-1.0</span>
<span class="lineno">8 </span><span class="kp">tracemem</span><span class="p">[</span><span class="mh">0x7ff360c95c08</span> <span class="o">-></span> <span class="mh">0x7ff3604692d8</span><span class="p">]</span><span class="o">:</span> </code></pre></figure><h3>Variable as function argument</h3>
<p>Most functions/methods in R and Python take some variables as arguments. What happens when we pass variables into a function?</p>
<p>In Python, the variable, i.e., the name of the object, is passed into a function. If the variable points to an immutable object, any modification made through the variable, i.e., the name, doesn’t persist outside the function. However, when the variable points to a mutable object, the modification of the object stored in memory persists. Let’s see the examples below.</p>
<div class="codewrapper">
<div class="codeleft">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="k">def</span> <span class="nf">g</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="lineno"> 2 </span><span class="o">...</span>     <span class="nb">print</span><span class="p">(</span><span class="nb">id</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
<span class="lineno"> 3 </span><span class="o">...</span>     <span class="n">x</span><span class="o">-=</span><span class="mi">1</span>
<span class="lineno"> 4 </span><span class="o">...</span>     <span class="nb">print</span><span class="p">(</span><span class="nb">id</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
<span class="lineno"> 5 </span><span class="o">...</span>     <span class="nb">print</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="lineno"> 6 </span><span class="o">>>></span> <span class="n">a</span><span class="o">=</span><span class="mi">1</span>
<span class="lineno"> 7 </span><span class="o">>>></span> <span class="nb">id</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
<span class="lineno"> 8 </span><span class="mi">4531658512</span>
<span class="lineno"> 9 </span><span class="o">>>></span> <span class="n">g</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
<span class="lineno">10 </span><span class="mi">4531658512</span>
<span class="lineno">11 </span><span class="mi">4531658480</span>
<span class="lineno">12 </span><span class="mi">0</span>
<span class="lineno">13 </span><span class="o">>>></span> <span class="n">a</span>
<span class="lineno">14 </span><span class="mi">1</span></code></pre></figure></div>
<div class="coderight">
<language>Python</language>
<figure class="highlight"><pre><code class="language-python3" data-lang="python3"><span></span><span class="lineno"> 1 </span><span class="o">>>></span> <span class="k">def</span> <span class="nf">f</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="lineno"> 2 </span><span class="o">...</span>     <span class="nb">print</span><span class="p">(</span><span class="nb">id</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
<span class="lineno"> 3 </span><span class="o">...</span>     <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-=</span><span class="mi">1</span>
<span class="lineno"> 4 </span><span class="o">...</span>     <span class="nb">print</span><span class="p">(</span><span class="nb">id</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
<span class="lineno"> 5 </span><span class="o">>>></span> <span class="n">a</span><span class="o">=</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span>
<span class="lineno"> 6 </span><span class="o">>>></span> <span class="nb">id</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
<span class="lineno"> 7 </span><span class="mi">4535423616</span>
<span class="lineno"> 8 </span><span class="o">>>></span> <span class="n">f</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
<span class="lineno"> 9 </span><span class="mi">4535423616</span>
<span class="lineno">10 </span><span class="mi">4535423616</span>
<span class="lineno">11 </span><span class="o">>>></span> <span class="n">a</span>
<span class="lineno">12 </span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span></code></pre></figure></div>
</div>
<p>We see that the function receives the object itself under a new name, not a copy of it. If the object is immutable, a modification creates a new object in memory and the local name is rebound to it, so the original object is unchanged. If the object is mutable, it is modified in place, no new object is created, and thus the change persists outside the function.</p>
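<p>The distinction above is really between rebinding a name and changing an object in place, and it also shows up with a mutable argument. The following is a minimal sketch; the function names <code>rebind</code> and <code>mutate</code> are hypothetical and chosen only for illustration.</p>

```python
def rebind(x):
    # x + [4] builds a new list and the local name x is rebound to it;
    # the caller's list is left untouched.
    x = x + [4]
    return id(x)


def mutate(x):
    # append changes the list object itself, so the caller sees the change.
    x.append(4)
    return id(x)


a = [1, 2, 3]

new_id = rebind(a)
print(a, new_id == id(a))   # [1, 2, 3] False

same_id = mutate(a)
print(a, same_id == id(a))  # [1, 2, 3, 4] True
```

<p>So even for a list, whether a change persists depends on whether the function modifies the object or merely points its local name at a different object.</p>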
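<p>In R, the passed object is copied as soon as it is modified inside the function (copy-on-modify), and thus the modification is not made on the original object in memory. In the example below, the memory address reported for x changes after the assignment, while a keeps its original values.</p>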
<language>R</language>
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="lineno"> 1 </span><span class="o">></span> f<span class="o">=</span><span class="kr">function</span><span class="p">(</span>x<span class="p">){</span>
<span class="lineno"> 2 </span><span class="o">+</span> <span class="kp">print</span><span class="p">(</span><span class="m">.</span>Internal<span class="p">(</span>inspect<span class="p">(</span>x<span class="p">)))</span>
<span class="lineno"> 3 </span><span class="o">+</span> x<span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="o">=</span>x<span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="m">-1</span>
<span class="lineno"> 4 </span><span class="o">+</span> <span class="kp">print</span><span class="p">(</span><span class="m">.</span>Internal<span class="p">(</span>inspect<span class="p">(</span>x<span class="p">)))</span>
<span class="lineno"> 5 </span><span class="o">+</span> <span class="kp">print</span><span class="p">(</span>x<span class="p">)</span>
<span class="lineno"> 6 </span><span class="o">+</span> <span class="p">}</span>
<span class="lineno"> 7 </span><span class="o">></span>
<span class="lineno"> 8 </span><span class="o">></span> a<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span><span class="p">)</span>
<span class="lineno"> 9 </span><span class="o">></span> <span class="m">.</span>Internal<span class="p">(</span>inspect<span class="p">(</span>a<span class="p">))</span>
<span class="lineno">10 </span><span class="o">@</span><span class="m">7</span>fe945538688 <span class="m">14</span> REALSXP g0c3 <span class="p">[</span>NAM<span class="p">(</span><span class="m">1</span><span class="p">)]</span> <span class="p">(</span>len<span class="o">=</span><span class="m">3</span><span class="p">,</span> tl<span class="o">=</span><span class="m">0</span><span class="p">)</span> <span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span>
<span class="lineno">11 </span><span class="o">></span> f<span class="p">(</span>a<span class="p">)</span>
<span class="lineno">12 </span><span class="o">@</span><span class="m">7</span>fe945538688 <span class="m">14</span> REALSXP g0c3 <span class="p">[</span>NAM<span class="p">(</span><span class="m">3</span><span class="p">)]</span> <span class="p">(</span>len<span class="o">=</span><span class="m">3</span><span class="p">,</span> tl<span class="o">=</span><span class="m">0</span><span class="p">)</span> <span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span>
<span class="lineno">13 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span> <span class="m">2</span> <span class="m">3</span>
<span class="lineno">14 </span><span class="o">@</span><span class="m">7</span>fe945538598 <span class="m">14</span> REALSXP g0c3 <span class="p">[</span>NAM<span class="p">(</span><span class="m">1</span><span class="p">)]</span> <span class="p">(</span>len<span class="o">=</span><span class="m">3</span><span class="p">,</span> tl<span class="o">=</span><span class="m">0</span><span class="p">)</span> <span class="m">0</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">3</span>
<span class="lineno">15 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">0</span> <span class="m">2</span> <span class="m">3</span>
<span class="lineno">16 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">0</span> <span class="m">2</span> <span class="m">3</span>
<span class="lineno">17 </span><span class="o">></span> a
<span class="lineno">18 </span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">1</span> <span class="m">2</span> <span class="m">3</span></code></pre></figure>
<p>People may argue that this makes R functions less flexible than Python functions. However, it makes more sense to do functional programming in R, since a function usually can’t modify the objects passed into it.</p>
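<p>For comparison, the same non-mutating style can be imitated in Python by copying the argument explicitly before changing it. This is only a sketch of the idea, not how R implements it; the function name <code>decrement_first</code> is hypothetical.</p>

```python
def decrement_first(x):
    # Copy first, then modify the copy; the argument itself is not changed.
    y = list(x)
    y[0] -= 1
    return y


a = [1, 2, 3]
b = decrement_first(a)
print(a)  # [1, 2, 3]
print(b)  # [0, 2, 3]
```

<p>Unlike R, Python does not make this copy for us, so a function written in this style has to return the new object and the caller has to keep the result.</p>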
<h3>Scope of variables</h3>