<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<<<<<<< HEAD
<title>Tools for Information Literacy</title>
<description>Tools and concepts for information literacy. Includes software use and maintenance, computer applications, and networked information systems.</description>
<link>http://inls161.johndmart.in/</link>
<atom:link href="http://inls161.johndmart.in/feed.xml" rel="self" type="application/rss+xml"/>
<pubDate>Wed, 13 Jul 2016 11:46:46 -0400</pubDate>
<lastBuildDate>Wed, 13 Jul 2016 11:46:46 -0400</lastBuildDate>
<generator>Jekyll v3.1.3</generator>
<item>
<title>Database software and MySQL</title>
<description><p>Today we will jump head-first into working with databases as an extension of our discussions earlier in the week about data.<br>
<excerpt/></p>
<p>A &quot;database&quot; is a collection of information, arranged or organized in some meaningful way so as to aid the retrieval of that information.
The types of information stored in databases vary based on the purpose and application. </p>
<p>When we refer to databases now, we usually mean electronic databases or DataBase Management Systems (DBMS).
However, databases can exist in a number of non-electronic forms (and have for much of human history).</p>
<p>What are some examples of non-electronic databases?</p>
<h1 id="what-goes-into-a-database?">What goes into a database?</h1>
<p>Pieces of information, or objects, are stored in a database in a structured way.
The objects are sorted into classes according to type.
This approach is referred to as &quot;object-oriented,&quot; which we have briefly touched on over the course of the semester. </p>
<p>When data are entered into a database, each object, or chunk of information, is assigned a class that allows for sorting and recall. </p>
<h2 id="how-does-the-information-go-in?">How does the information go in?</h2>
<p>There are a number of different ways to get data into a database. </p>
<p>We can import tables directly from files.
We can also input data one at a time using an entry form.
We will do both of these things over the next few days. </p>
<p>Today we will start by experimenting with opening the CSV we made the other day as a table. </p>
<h2 id="electronic-dbms">Electronic DBMS</h2>
<p>There are myriad choices for DBMS implementations.
A commonly used system is called MySQL, which is based on a database language called SQL (Structured Query Language). </p>
<p>You will also hear of NoSQL databases, such as MongoDB.
These use an entirely different internal logic to store and recall data than that of SQL-based systems. </p>
<p>There are a great many front-end user interfaces for these systems.
MSAccess has been used for years for all sorts of applications. </p>
<p>We will use the LibreOffice Base package.
It is an open-source analog to Access and will allow us to do the same things.
One major benefit of this is that we will be able to open our database without being locked to MSAccess.
At present, it is difficult to open an Access DB in another program.
We wish to avoid that. </p>
<h1 id="install-mysql">Install MySQL</h1>
<p>We will install MySQL so that we can create and explore a database using the SQL shell in our CodeAnywhere containers.</p>
<h1 id="system-maintenance">System maintenance</h1>
<p>First update and upgrade all packages:</p>
<p><code>sudo apt-get update &amp;&amp; sudo apt-get upgrade</code></p>
<p>We will need to make some space on our containers to install MySQL. </p>
<p>We no longer need <code>unoconv</code>, so let&#39;s get rid of it:</p>
<p><code>sudo apt-get remove unoconv</code></p>
<p>When that is done, we can remove all of its dependencies:</p>
<p><code>sudo apt-get autoremove --purge</code></p>
<p>And then just for good measure, let&#39;s clean out our package cache:</p>
<p><code>sudo apt-get autoclean &amp;&amp; sudo apt-get clean</code></p>
<h1 id="install-the-mysql-client-and-server-packages">Install the MySQL client and server packages</h1>
<p>First we have to install one dependency, without which the install will fail:</p>
<p><code>sudo apt-get install bsdutils</code></p>
<p>Then install both the MySQL server and client packages in separate commands. </p>
<p><code>sudo apt-get install mysql-server</code></p>
<p>This will ask you to create a password for the MySQL root user.
Since we are only trying things out today and not installing this for the purpose of running a real SQL server, just put <code>changethis</code> as the root password. </p>
<p>Then install the client:</p>
<p><code>sudo apt-get install mysql-client</code></p>
<h1 id="get-some-data">Get some data</h1>
<p>Let&#39;s download some CSV files that I prepared with our books list in them. </p>
<p><code>wget http://inls161.johndmart.in/raw-material/tblBook.csv</code> </p>
<p><code>wget http://inls161.johndmart.in/raw-material/tblPub.csv</code></p>
<h1 id="the-mysql-prompt">The MySQL prompt</h1>
<p>Once we are all installed, issue the <code>mysql</code> command to get into the <code>mysql&gt;</code> prompt:</p>
<p><code>mysql -u root -p</code></p>
<p>This specifies that you want to use the root user to login to the MySQL prompt. </p>
<p>Next let&#39;s create a new DB. Make sure that your prompt looks like this:</p>
<p><code>mysql&gt;</code></p>
<p>If it does, then you can type:</p>
<p><code>CREATE DATABASE booksinfo;</code></p>
<p>SQL keywords at the mysql&gt; prompt are <em>not</em> case-sensitive, but on Linux the names of databases and tables are, so pay attention to the case of names such as <code>tblBook</code>. </p>
<p>Let&#39;s list our DBs:</p>
<p><code>show databases;</code></p>
<p>We should see the DB with the name that we created in the list. Let&#39;s move into it:</p>
<p><code>USE booksinfo;</code></p>
<h1 id="add-tables">Add tables</h1>
<p>Now we have to create two tables so that we can import data from our CSV files.</p>
<p><code>CREATE TABLE tblBook (ID INT, Title VARCHAR(255), Date INT, RetailPrice DECIMAL(5,2), Copies INT, ShelfNumber VARCHAR(255), PubID INT);</code></p>
<p><code>CREATE TABLE tblPub (ID INT, Publisher VARCHAR(255), City VARCHAR(255), State VARCHAR(255), Country VARCHAR(255));</code></p>
<p>See what tables we have just created:</p>
<p><code>show tables;</code></p>
<p>Let&#39;s import some tables from the files we downloaded earlier:</p>
<p><code>LOAD DATA INFILE &#39;/home/cabox/workspace/tblBook.csv&#39; INTO TABLE tblBook FIELDS TERMINATED BY &#39;,&#39; OPTIONALLY ENCLOSED BY &#39;&quot;&#39;;</code></p>
<p>This should give us some output.
If we notice a warning, type the following to view the warnings:</p>
<p><code>SHOW WARNINGS;</code></p>
<p>So, it looks like we have a missing date.
No big deal.
We&#39;ll deal with that later.
Let&#39;s import our other table. </p>
<p><code>LOAD DATA INFILE &#39;/home/cabox/workspace/tblPub.csv&#39; INTO TABLE tblPub FIELDS TERMINATED BY &#39;,&#39; OPTIONALLY ENCLOSED BY &#39;&quot;&#39;;</code></p>
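<p>A problem like the missing date can also be spotted from a normal shell (not the mysql&gt; prompt) before importing. What follows is only a minimal sketch: <code>find_empty_field</code> is a hypothetical helper name and the sample rows are made up for illustration. It assumes a simple CSV with no commas inside quoted values, because it splits on every comma.</p>

```shell
#!/bin/bash
# Print the line number of every row whose given column is empty.
# $1 = CSV file, $2 = column number (counting from 1).
# find_empty_field is a hypothetical helper, not part of MySQL.
find_empty_field() {
  awk -F, -v col="$2" '$col == "" { print NR }' "$1"
}

# Build a throwaway CSV with made-up rows; the second row has no date.
tmpdir=$(mktemp -d)
printf '%s\n' \
  '1,First Book,1979,10.00' \
  '2,Second Book,,12.50' \
  '3,Third Book,1984,8.25' > "$tmpdir/sample.csv"

# Check column 3 (the date column in this sample).
find_empty_field "$tmpdir/sample.csv" 3
```

<p>For the sample above, this prints the number of the row with the empty date field. Pointing it at a real file would mean passing that file&#39;s path and the number of the column to check.</p>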
<p>Let&#39;s see what is in our tables:</p>
<p><code>SHOW COLUMNS FROM tblBook;</code></p>
<p><code>SHOW COLUMNS FROM tblPub;</code></p>
<p>We&#39;ll notice that we have no key set for either table.
We need to do this, right?</p>
<p><code>ALTER TABLE tblBook ADD PRIMARY KEY (ID);</code></p>
<p>Now look at the table again and see that it has changed:</p>
<p><code>SHOW COLUMNS FROM tblBook;</code></p>
<p>Now do the same for the other table:</p>
<p><code>ALTER TABLE tblPub ADD PRIMARY KEY (ID);</code></p>
<p><code>SHOW COLUMNS FROM tblPub;</code></p>
<h1 id="define-relationships">Define relationships</h1>
<p>We need to tell MySQL that the PubID column in tblBook refers to the primary key in tblPub. This action is called a constraint and the reference is called a foreign key. </p>
<p><code>ALTER TABLE tblBook ADD CONSTRAINT fk_PubID FOREIGN KEY (PubID) REFERENCES tblPub(ID) ON UPDATE NO ACTION;</code></p>
<p>Let&#39;s look at our columns again:</p>
<p><code>SHOW COLUMNS FROM tblBook;</code></p>
<p>You&#39;ll notice that the &#39;Key&#39; column now has &#39;MUL&#39; for PubID.
This means that MySQL has created an index on that column, in addition to the primary key on ID.
Unlike the primary key, this new index is non-unique: the same PubID can appear in many rows. </p>
<h1 id="summaries">Summaries</h1>
<p>So now that we have our data in place, let&#39;s summarize it a bit. </p>
<p><code>SELECT COUNT(*) FROM tblBook;</code></p>
<p>What if we want to see the first ten rows in tblBook?</p>
<p><code>SELECT * FROM tblBook ORDER BY Date LIMIT 10;</code></p>
<p>Let&#39;s look at only books published after 1980:</p>
<p><code>SELECT * FROM tblBook WHERE Date &gt; 1980 ORDER BY Date;</code></p>
<p>And count them:</p>
<p><code>SELECT COUNT(*) FROM tblBook WHERE Date &gt; 1980;</code></p>
<p>How about only books published in 1980?</p>
<p><code>SELECT * FROM tblBook WHERE Date = 1980 ORDER BY ShelfNumber;</code></p>
<p>Let&#39;s find out how much our books cost:</p>
<p><code>SELECT AVG(RetailPrice) FROM tblBook;</code></p>
<p><code>SELECT MIN(RetailPrice) FROM tblBook;</code></p>
<p><code>SELECT MAX(RetailPrice) FROM tblBook;</code></p>
<p>Let&#39;s summarize all of that together.</p>
<div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">SELECT</span>
<span class="k">AVG</span><span class="p">(</span><span class="n">RetailPrice</span><span class="p">)</span> <span class="k">AS</span> <span class="s1">'Average Price'</span><span class="p">,</span>
<span class="k">MIN</span><span class="p">(</span><span class="n">RetailPrice</span><span class="p">)</span> <span class="k">AS</span> <span class="s1">'Lowest Price'</span><span class="p">,</span>
<span class="k">MAX</span><span class="p">(</span><span class="n">RetailPrice</span><span class="p">)</span> <span class="k">AS</span> <span class="s1">'Highest Price'</span>
<span class="k">FROM</span> <span class="n">tblBook</span><span class="p">;</span>
</code></pre></div>
<h1 id="queries">Queries</h1>
<p>Let&#39;s try something that I posed the other day in class.
I want to list out all the books in the list that were published by Oxford University Press. </p>
<p><code>SELECT * FROM tblBook b RIGHT JOIN tblPub p ON b.PubID = p.ID WHERE p.Publisher = &#39;Oxford University Press&#39;;</code></p>
<p>How many locations does OUP have? </p>
<p><code>SELECT COUNT(*) FROM tblPub WHERE Publisher = &#39;Oxford University Press&#39;;</code></p>
<p>Where are they?</p>
<p><code>SELECT * FROM tblPub WHERE Publisher = &#39;Oxford University Press&#39;;</code></p>
<p>What if I only want the books published by OUP&#39;s US office?
I have to specify an additional criterion. </p>
<p><code>SELECT * FROM tblBook b RIGHT JOIN tblPub p ON b.PubID = p.ID WHERE p.Publisher = &#39;Oxford University Press&#39; AND p.Country = &#39;US&#39;;</code></p>
<p>I wonder if books cost more from the OUP GB location than from the OUP US location.</p>
<p><code>SELECT AVG(RetailPrice) FROM tblBook b RIGHT JOIN tblPub p ON b.PubID = p.ID WHERE p.Publisher = &#39;Oxford University Press&#39; AND p.Country = &#39;GB&#39;;</code></p>
<p><code>SELECT AVG(RetailPrice) FROM tblBook b RIGHT JOIN tblPub p ON b.PubID = p.ID WHERE p.Publisher = &#39;Oxford University Press&#39; AND p.Country = &#39;US&#39;;</code></p>
<h1 id="output-from-mysql-queries-as-tables">Output from MySQL queries as tables</h1>
<p>We can take any of the above queries and output the results to a table. </p>
<p>We need to add <code>CREATE TABLE qryName</code> to the front of any of our query commands. </p>
<p>Here is an example using our price summary from above:</p>
<p><code>CREATE TABLE qryPrices SELECT AVG(RetailPrice) AS &#39;Average Price&#39;, MIN(RetailPrice) AS &#39;Lowest Price&#39;, MAX(RetailPrice) AS &#39;Highest Price&#39; FROM tblBook;</code></p>
<p>See if it worked by listing the tables:</p>
<p><code>show tables;</code></p>
<p>Look at what is in the new table:</p>
<p><code>SELECT * FROM qryPrices;</code></p>
<p>It should match the output from when you ran the query before. </p>
<h1 id="output-from-mysql-queries-in-other-formats">Output from MySQL queries in other formats</h1>
<p>We can output all of this stuff outside of our MySQL prompt shell (in a normal shell).</p>
<p>Let&#39;s ask for the summary of book prices in HTML and then we can try some other queries if we have time. </p>
<p><code>quit</code> to exit the MySQL prompt.</p>
<p>Now you have exited from the MySQL prompt.</p>
<p><code>mysql -u root -p -H -e &quot;SELECT AVG(RetailPrice) AS &#39;Average Price&#39;, MIN(RetailPrice) AS &#39;Lowest Price&#39;, MAX(RetailPrice) AS &#39;Highest Price&#39; FROM tblBook;&quot; booksinfo</code></p>
<h1 id="exporting-and-importing">Exporting and Importing</h1>
<p>To export your whole database so that you can use it elsewhere (e.g., to transfer it to a different server), run the following command:</p>
<p><code>mysqldump -u root -p booksinfo &gt; booksinfo.sql</code></p>
<p>If you want to then import that same database somewhere else, the command is very similar. The direction changes, and instead of the specialized <code>mysqldump</code> command, you use just the standard MySQL client command:</p>
<p><code>mysql -u root -p booksinfo &lt; booksinfo.sql</code></p>
<h1 id="for-next-time">For next time</h1>
<p>Tomorrow, we will return to databases and discuss the conceptual and theoretical underpinnings of what we worked on today.
I would like you to read something about databases.<label for='db' class='margin-toggle sidenote-number'></label><input type='checkbox' id='db' class='margin-toggle'/><span class='sidenote'>“What Is a Database?” BBC Guides. <a href="http://www.bbc.co.uk/guides/z8yk87h">http://www.bbc.co.uk/guides/z8yk87h</a>. </span></p>
<p>We will look at other resources and tutorials in class as well. </p>
</description>
<pubDate>Wed, 13 Jul 2016 00:00:00 -0400</pubDate>
<link>http://inls161.johndmart.in/data/2016/07/13/sql-bootcamp/</link>
<guid isPermaLink="true">http://inls161.johndmart.in/data/2016/07/13/sql-bootcamp/</guid>
<category>database</category>
<category>SQL</category>
<category>tables</category>
<category>queries</category>
<category>Data</category>
</item>
<item>
<title>Data cleaning and spreadsheet software</title>
<description><p>Today we&#39;re going to look at one common way of manipulating CSV and other data files: the spreadsheet.
<excerpt/></p>
<h1 id="csv-to-spreadsheet">CSV to spreadsheet</h1>
<p>Spreadsheets appear to operate in a very simple manner, storing data in rows and columns just like any other table, but looks can be deceiving. </p>
<p>Just like with DOCX and ODT files when compared with plaintext files, spreadsheets hide a lot of metadata underneath what appears on the surface. </p>
<p>Unlike with the relationship between plaintext and interface-wrapped formatted text files, it is more difficult to get all of the information out of the spreadsheet. </p>
<p>As mentioned in the piece by Paul Ford, spreadsheets can hide a lot of things from us, including errors.
If our code to perform the same kinds of operations is visible to us, it is easier to check for errors. </p>
<p>That said, spreadsheets are a powerful tool and should be used in certain tasks in preference over other tools. </p>
<p>Once you have learned to do basic math in a spreadsheet, there is absolutely no reason to ever use a calculator (or calculator app) for instance.</p>
<h1 id="caveats-and-pitfalls-of-using-spreadsheets">Caveats and pitfalls of using spreadsheets</h1>
<p>This forum post is a good guide to things that you should be aware of when using spreadsheets:</p>
<p><a href="https://forum.openoffice.org/en/forum/viewtopic.php?t=39529">https://forum.openoffice.org/en/forum/viewtopic.php?t=39529</a></p>
<p>It pertains to OpenOffice, which is the productivity suite that LibreOffice was forked from some years ago.
OpenOffice came under Oracle&#39;s ownership (and was later donated to the Apache Software Foundation), and some of the community was not happy with licensing changes that had occurred, so they jumped ship and moved their development to LibreOffice under the umbrella of The Document Foundation.
The software still operates similarly enough that the information is relevant to us.
This is because both suites use ODF at their core. </p>
<p>It will also be a good idea to have a look at the LibreOffice documentation to familiarize yourself with it as a reference: <a href="https://www.libreoffice.org/get-help/documentation/">https://www.libreoffice.org/get-help/documentation/</a></p>
<p>Finally, here is a compendium of all of the functions available to you in LibreOffice Calc: <a href="https://help.libreoffice.org/Calc/Spreadsheet_Functions">https://help.libreoffice.org/Calc/Spreadsheet_Functions</a></p>
<h1 id="for-next-time">For next time</h1>
<p>We&#39;re going to break into databases next time and have a MySQL crash course.
I would like you to look over some basic MySQL tutorials on your own so that we are prepared to dig into this.<label for='sql' class='margin-toggle sidenote-number'></label><input type='checkbox' id='sql' class='margin-toggle'/><span class='sidenote'>Sverdlov, Etel. “A Basic MySQL Tutorial.” DigitalOcean. Last modified June 12, 2012. <a href="https://www.digitalocean.com/community/tutorials/a-basic-mysql-tutorial">https://www.digitalocean.com/community/tutorials/a-basic-mysql-tutorial</a>. </span> </p>
</description>
<pubDate>Tue, 12 Jul 2016 00:00:00 -0400</pubDate>
<link>http://inls161.johndmart.in/data/2016/07/12/data-cleaning/</link>
<guid isPermaLink="true">http://inls161.johndmart.in/data/2016/07/12/data-cleaning/</guid>
<category>CSV</category>
<category>data</category>
<category>spreadsheets</category>
<category>Data</category>
</item>
<item>
<title>Handling data</title>
<description><p>Today we are going to discuss the creation of data and learn how to manipulate data structures. </p>
<p>We will learn some things about using pipes to redirect output and learn some commands for working with data.
<excerpt/></p>
<h1 id="data">Data</h1>
<p>So, data? </p>
<p>What is data? </p>
<p>Rather, we should ask: &quot;What are data?&quot; </p>
<p>datum, data, <em>n</em> - something given (past participle of the verb, <em>dare,</em> &quot;to give&quot;).</p>
<p>Where does it come from? What do we use it for? What does it all mean? </p>
<p>The major question that we are going to be asking ourselves here is &quot;How are we going to get data into and out of different formats?&quot; </p>
<p>We will start with lists of similar data and then move to structured and ordered sets of lists (tables). </p>
<p>Eventually we will consider linked sets of data in the form of databases.</p>
<h2 id="raw-data">Raw data</h2>
<p>&quot;Raw&quot; data is sort of an oxymoron.
There is very little data available that is actually really raw in the sense that it has not been touched, manipulated, massaged, curated, or cleaned by some human intervention. </p>
<p>Remember, even data that is available on the web is not raw; it is text that we have marked up and structured in specific ways.
However, web data can stand in as an analog for raw data. </p>
<p>The process through which we might gather data via the web is referred to as &quot;scraping.&quot;
A &quot;scraper&quot; is a program that reaches out into the web and grabs all of the text (including markup) available at a URL and saves it in some meaningfully structured way. </p>
<p>We&#39;re not going to dig into web-scraping too much, but I want you to be aware of how data can be gathered on the web. </p>
<p>One tool that can be used for web scraping is our friend, <code>wget</code>.
We&#39;ve used it to download remote files, but it can also be used to get whole websites and all of the data linked from them. </p>
<p>This can be useful for mirroring a website.
It can also be useful in aggregating unstructured data so that it might be manipulated into structured data. </p>
<h2 id="structured-data">Structured data</h2>
<p>One simple format for structured data is a table. </p>
<p>Rows in the table represent individual cases or instances of something. </p>
<p>Columns represent variables. </p>
<p>What is the difference? </p>
<p>In the data that we are going to create in class, our rows will represent individual people.
The information contained in these rows will be given (&quot;datum, a thing given&quot;) to us by every member of this class.
The columns will represent a specifically defined aspect of data that we gather about every individual person. </p>
<p>We will start with making our own individual lists and then aggregate them. </p>
<h1 id="the-humble-and-mighty-csv">The humble and mighty CSV</h1>
<h3 id="lists">Lists</h3>
<p>We&#39;ll start with a list of data. </p>
<p>Open a new file and name it with your GitHub user account and the extension .list. </p>
<p>Mine will be <code>jdmar3.list</code>. </p>
<p>Inside the file, I want you to give one-word or numerical answers to the following (as specified), in this order, each on their own line:</p>
<p>What is your GitHub username?
How tall are you (in centimeters)?
What time did you wake up this morning (in 24-hour/military time: e.g., 06:30)?
How many semesters do you have left in your degree program?
Approximately how far is your home city/town away from UNC/Chapel Hill (in km)?</p>
<p>If any answer doesn&#39;t apply to you, type <code>NA</code> (&quot;not applicable&quot;).</p>
<p>My file will look like this:</p>
<div class="highlight"><pre><code class="language-" data-lang="">jdmar3
175.26
06:45
2
1129.3
</code></pre></div>
<p>Very simple. </p>
<h3 id="comma-separated-values-(csv)">Comma Separated Values (CSV)</h3>
<p>Now that we have listed some information about ourselves, let&#39;s try to aggregate our data. </p>
<p>If we want to put all of our data together as it is, we will just end up with a super long list that is difficult for us to use in any meaningful way.
If we take our list and flip it, so that we have a single line instead, we can then stack all of our data up together.
We can separate the elements in the list with commas (or tabs, semicolons, pipe separators, or some other marker) and then we will have a row of what will become a Comma Separated Values file: structured data. </p>
<p>We can do this by hand, but that is boring. </p>
<p>Let&#39;s learn a command to do this:</p>
<p><code>paste -d, -s example.list</code> </p>
<p><code>paste</code> sequentially reads the lines from a file and then writes them out in the same sequence, separated by something (tabs, by default).
In this case we are asking it to read every line in our file, and then write it out separated by a comma (<code>-d,</code>).
The <code>-s</code> tells <code>paste</code> to work serially, joining all of the lines of one file into a single line, instead of merging lines from several files in parallel. </p>
<p>So our standard output (STDOUT) from the above command will be:</p>
<div class="highlight"><pre><code class="language-" data-lang="">gh-username,height,wakeup,semesters-left,hometown-distance
</code></pre></div>
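<p>To try this end to end, here is a minimal sketch that builds a throwaway list file and flattens it into one row. The helper name <code>list_to_row</code>, the file name <code>octocat.list</code>, and the answers in it are all made up for illustration.</p>

```shell
#!/bin/bash
# Join every line of a .list file into one comma-separated row.
# list_to_row is a hypothetical helper name used only in this sketch.
list_to_row() {
  paste -d, -s "$1"
}

# Create a temporary list file with made-up answers, one per line.
tmpdir=$(mktemp -d)
printf '%s\n' octocat 170 07:15 3 42 > "$tmpdir/octocat.list"

# Print the five answers as a single CSV row.
list_to_row "$tmpdir/octocat.list"
```

<p>Note that none of the answers contains a comma. A value written with a thousands separator would be split into two fields, so distances are best typed without one.</p>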
<h1 id="output-redirection">Output redirection</h1>
<p>To get this into a file, we will use one of several forms of output redirection. </p>
<p>Output redirection is simple.
It sends the output that a command would normally print to the screen into a file instead.
We can use programs on top of this to manipulate that output. </p>
<p>For example:</p>
<p><code>paste -d, -s example.list &gt; example.csv</code> </p>
<p>This will take the output from the first part of the command and overwrite the CSV file specified in the second part. </p>
<p>This command will append the output to the file instead of overwriting it:</p>
<p><code>paste -d, -s example.list &gt;&gt; example.csv</code></p>
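<p>The difference between overwriting and appending is easy to see with a small experiment. This is only a sketch using a temporary file and made-up text. A single <code>&gt;</code> replaces whatever the file held, while a doubled <code>&gt;&gt;</code> adds to the end of it.</p>

```shell
#!/bin/bash
# Work in a temporary directory so nothing real gets overwritten.
tmpdir=$(mktemp -d)
csv="$tmpdir/example.csv"

# A single > creates the file, or replaces its contents if it exists.
printf 'first row\n' > "$csv"
printf 'second row\n' > "$csv"

# A doubled >> keeps what is there and adds the new output at the end.
printf 'third row\n' >> "$csv"

# The file now holds two lines: "first row" was overwritten.
cat "$csv"
```

<p>When collecting one row per person into a shared CSV file, appending is almost always what we want.</p>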
<h1 id="pipes">Pipes</h1>
<p>A &quot;pipe&quot; is an operator that tells a program to take output from another program.
You&#39;ll find it on most US keyboards as SHIFT+\ (the <code>|</code> character).</p>
<p>Pipes translate the output of one program (STDOUT) into being input for another program (STDIN). </p>
<p>For example, if we wanted to count how many lines were in our csv file, we could run:</p>
<p><code>cat example.csv | wc -l</code></p>
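<p>Pipes can be chained, so each program does one small job. The sketch below uses a made-up three-row CSV (the names and numbers are invented) to count rows and then to find the largest value in the second column with <code>cut</code>, <code>sort</code>, and <code>tail</code>.</p>

```shell
#!/bin/bash
# Build a small CSV with invented rows: username,height,wakeup,semesters,distance
tmpdir=$(mktemp -d)
csv="$tmpdir/example.csv"
printf '%s\n' \
  'alice,160,06:30,4,12' \
  'bob,182,08:00,2,950' \
  'carol,171,07:15,1,300' > "$csv"

# cat writes the file to STDOUT; the pipe hands it to wc as STDIN.
row_count=$(cat "$csv" | wc -l)
echo "rows: $row_count"

# cut keeps only column 2, sort -n orders it numerically,
# and tail -n 1 keeps the last (largest) value.
tallest=$(cut -d, -f2 "$csv" | sort -n | tail -n 1)
echo "tallest: $tallest"
```

<p>Each stage only sees the output of the stage before it, which makes it easy to build a pipeline up one command at a time and check the result as you go.</p>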
<h1 id="for-next-time">For next time</h1>
<p>Tomorrow, we are going to work in groups to learn to create and aggregate data using scripts.
In your groups, you will write a script that asks the above questions of the user and then appends their answer to a CSV file.
This will be the basis of the next assignment, which will be a group assignment. </p>
<p>I would like you to review some commands for working with a CSV file including how pipes work.<label for='csv' class='margin-toggle sidenote-number'></label><input type='checkbox' id='csv' class='margin-toggle'/><span class='sidenote'>Connelly, Brian. “Working with CSVs on the Command Line.” <a href="http://bconnelly.net/">http://bconnelly.net/</a>. Last modified September 23, 2013. <a href="http://bconnelly.net/working-with-csvs-on-the-command-line/">http://bconnelly.net/working-with-csvs-on-the-command-line/</a>. </span> </p>
<p>I would also like you to watch the following video on working with CSV files. I think that it might be very helpful. Try watching it once and then following along a second time. </p>
<div class="video-container">
<iframe width="560" height="315" src="https://www.youtube.com/embed/OecFFZpIkDc" frameborder="0" allowfullscreen></iframe>
</div>
</description>
<pubDate>Mon, 11 Jul 2016 00:00:00 -0400</pubDate>
<link>http://inls161.johndmart.in/data/2016/07/11/data-handling/</link>
<guid isPermaLink="true">http://inls161.johndmart.in/data/2016/07/11/data-handling/</guid>
<category>CSV</category>
<category>data</category>
<category>scripting</category>
<category>Bash</category>
<category>Data</category>
</item>
<item>
<title>Lab #3: Automation and Scripting</title>
<description><p>This morning we are going to go over more scripting tricks and then work on our next assignment for the remainder of the session.
<excerpt/></p>
<p>We will use a few online tutorials for reference.<label for='shell-scripts' class='margin-toggle sidenote-number'></label><input type='checkbox' id='shell-scripts' class='margin-toggle'/><span class='sidenote'>Shotts, William, Jr. “Writing Shell Scripts.” LinuxCommand.org. Accessed July 6, 2016. <a href="http://linuxcommand.org/lc3_writing_shell_scripts.php">http://linuxcommand.org/lc3_writing_shell_scripts.php</a>.<br/><br/>Chadwick, Ryan. “User Input - Bash Scripting Tutorial.” Ryan’s Tutorials. <a href="http://ryanstutorials.net/bash-scripting-tutorial/bash-input.php">http://ryanstutorials.net/bash-scripting-tutorial/bash-input.php</a>. </span> </p>
<h2 id="adding-user-input">Adding user input</h2>
<p>We&#39;re going to add user input to our scripts from yesterday. </p>
<p>To do this, we use the <code>read</code> command to accept the next line of input and save it as a variable.
Instead of putting our GitHub username in the script where we want it to read out, this method will prompt for a username.</p>
<div class="highlight"><pre><code class="language-" data-lang=""><span class="c">#!/bin/bash</span>
<span class="c"># Say "Hello" to the world.</span>
<span class="nb">echo</span> <span class="s2">"Hello, world!"</span>
<span class="c"># Ask who the user is</span>
<span class="nb">echo</span> <span class="s2">"Who are you?"</span>
<span class="c"># Read GitHub username from input</span>
<span class="nb">read </span>GHUSERNAME
<span class="c"># Say "Hello" to me.</span>
<span class="nb">echo</span> <span class="s2">"Hello, </span><span class="nv">$GHUSERNAME</span><span class="s2">!"</span>
</code></pre></div>
<p>Now if we run the script, it will pause and ask us for input:</p>
<div class="highlight"><pre><code class="language-" data-lang="">cabox@box-codeanywhere:~/workspace/helper-scripts$ ./hello-world.sh
Hello, world!
Who are you?
</code></pre></div>
<p>If I type my GitHub username and press enter, then Bash will read it and put it in the right place in the output. </p>
<div class="highlight"><pre><code class="language-" data-lang="">jdmar3
Hello, jdmar3!
cabox@box-codeanywhere:~/workspace/helper-scripts$
</code></pre></div>
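<p>A script that reads input should also cope with the user pressing enter without typing anything. The following is a minimal sketch of one way to handle that. The function name <code>greet</code> and the fallback text are assumptions made for this example, and the input is fed in with a pipe so that the example runs without anyone typing.</p>

```shell
#!/bin/bash
# Read one line from standard input and greet whoever it names.
# greet is a hypothetical function name used only in this sketch.
greet() {
  local name
  # -r keeps backslashes literal; IFS= preserves leading and trailing spaces.
  IFS= read -r name
  if [ -z "$name" ]; then
    # Nothing was typed, so fall back to a generic greeting.
    echo "Hello, stranger!"
  else
    echo "Hello, $name!"
  fi
}

# echo supplies the line that read would otherwise wait for at the keyboard.
echo "jdmar3" | greet
```

<p>Run interactively, calling <code>greet</code> on its own would pause and wait for a line from the keyboard, just like the script above.</p>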
<h2 id="setting-a-variable">Setting a variable</h2>
<p>We can also set the username we want to output before we run the second echo command.
It&#39;s best to set variables at the top of the file so that they are easily found. </p>
<p>Here is an example: </p>
<div class="highlight"><pre><code class="language-" data-lang=""><span class="c">#!/bin/bash</span>
<span class="c"># Set GH username as a variable</span>
<span class="nv">GHUSERNAME</span><span class="o">=</span>jdmar3
<span class="c"># Say "Hello" to the world.</span>
<span class="nb">echo</span> <span class="s2">"Hello, world!"</span>
<span class="c"># Say "Hello" to me.</span>
<span class="nb">echo</span> <span class="s2">"Hello, </span><span class="nv">$GHUSERNAME</span><span class="s2">!"</span>
</code></pre></div>
<p>If we run this, Bash will output the value we set in place of the variable name:</p>
<div class="highlight"><pre><code class="language-" data-lang="">cabox@box-codeanywhere:~/workspace/helper-scripts$ ./hello-world.sh
Hello, world!
Hello, jdmar3!
cabox@box-codeanywhere:~/workspace/helper-scripts$
</code></pre></div>
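<p>Two details of variable assignment tend to trip people up: there are no spaces around the <code>=</code>, and a value that contains a space needs quotes. The sketch below illustrates both. The <code>FULLNAME</code> variable and its value are made up for this example.</p>

```shell
#!/bin/bash
# Sketch only: FULLNAME and its value are made up for illustration.
# No spaces around the equals sign
GHUSERNAME=jdmar3
# Quotes are needed when the value contains a space
FULLNAME="Ada Lovelace"
# Braces mark where the variable name ends when text follows it directly
BACKUPNAME="${GHUSERNAME}_backup"
echo "Hello, $FULLNAME!"
echo "Backup account: $BACKUPNAME"
```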
<h2 id="accepting-a-variable-from-command-line-input">Accepting a variable from command line input</h2>
<p>We can also specify a variable value in the command line itself.
To do this, we tell Bash to take the first argument that comes after the script name, which it makes available as <code>$1</code>, and store it in a variable, like this: </p>
<div class="highlight"><pre><code class="language-" data-lang=""><span class="c">#!/bin/bash</span>
<span class="c"># Read GH username</span>
<span class="nv">GHUSERNAME</span><span class="o">=</span><span class="nv">$1</span>
<span class="c"># Say "Hello" to the world.</span>
<span class="nb">echo</span> <span class="s2">"Hello, world!"</span>
<span class="c"># Say "Hello" to me.</span>
<span class="nb">echo</span> <span class="s2">"Hello, </span><span class="nv">$GHUSERNAME</span><span class="s2">!"</span>
</code></pre></div>
<p>Now when we run the script, we will have to specify the username directly after the name of the script. </p>
<div class="highlight"><pre><code class="language-" data-lang="">cabox@box-codeanywhere:~/workspace/helper-scripts$ ./hello-world.sh jdmar3
Hello, world!
Hello, jdmar3!
cabox@box-codeanywhere:~/workspace/helper-scripts$
</code></pre></div>
<p>If you change the input, you will see it change in the output now:</p>
<div class="highlight"><pre><code class="language-" data-lang="">cabox@box-codeanywhere:~/workspace/helper-scripts$ ./hello-world.sh YOUR-GITHUB-USERNAME
Hello, world!
Hello, YOUR-GITHUB-USERNAME!
cabox@box-codeanywhere:~/workspace/helper-scripts$
</code></pre></div>
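<p>If the script is run without a username, <code>$1</code> is empty and the greeting comes out as <code>Hello, !</code>. One possible guard, sketched here with a hypothetical <code>greet</code> function and a made-up usage message, is to check the argument count in <code>$#</code> first.</p>

```shell
#!/bin/bash
# Sketch only: greet and the usage message are hypothetical additions.
greet() {
  # $# holds the number of arguments that were passed in
  if [ "$#" -lt 1 ]; then
    echo "Usage: ./hello-world.sh YOUR-GITHUB-USERNAME"
    return 1
  fi
  # Read GH username from the first argument
  GHUSERNAME="$1"
  echo "Hello, world!"
  echo "Hello, $GHUSERNAME!"
}
```

<p>Inside a function, <code>$1</code> and <code>$#</code> refer to the function&#39;s own arguments, so the last line of the script would be <code>greet "$@"</code> to pass the command line arguments along.</p>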
<h1 id="lab">Lab</h1>
<p>You can use the techniques above to create a script to automate your document conversion workflow.
Use the assignment text to make sure that you have gotten each of the required parts to work properly. </p>
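<p>As a starting point only, a conversion script could loop over the output formats and run one Pandoc command for each. The helper names <code>output_name</code> and <code>convert_all</code> are assumptions for this sketch, and the list of formats should be adjusted to whatever the assignment text asks for.</p>

```shell
#!/bin/bash
# Sketch only: output_name and convert_all are hypothetical helper names.

# Swap the extension on a filename: example.md plus html gives example.html
output_name() {
  # ${1%.*} strips the last dot and everything after it from the first argument
  echo "${1%.*}.$2"
}

# Run one pandoc conversion per output format for a single Markdown source
convert_all() {
  SOURCE="$1"
  for FORMAT in html docx odt; do
    pandoc -o "$(output_name "$SOURCE" "$FORMAT")" "$SOURCE"
  done
}
```

<p>Calling <code>convert_all "$1"</code> at the bottom of the script would let the source file be named on the command line.</p>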
<h1 id="for-next-week">For next week</h1>
<p>Next week we will use similar methods to help to automate the collection and formatting of data in flat data tables.
Through this we will learn more about pipes and redirecting output.<label for='csv' class='margin-toggle sidenote-number'></label><input type='checkbox' id='csv' class='margin-toggle'/><span class='sidenote'>Connelly, Brian. “Working with CSVs on the Command Line.” <a href="http://bconnelly.net/">http://bconnelly.net/</a>. Last modified September 23, 2013. <a href="http://bconnelly.net/working-with-csvs-on-the-command-line/">http://bconnelly.net/working-with-csvs-on-the-command-line/</a>. </span> </p>
<p>We will then learn how to use data tables to create a more complex database, which we will be able to query to see relationships between different variables.<label for='sql' class='margin-toggle sidenote-number'></label><input type='checkbox' id='sql' class='margin-toggle'/><span class='sidenote'>Sverdlov, Etel. “A Basic MySQL Tutorial.” DigitalOcean. Last modified June 12, 2012. <a href="https://www.digitalocean.com/community/tutorials/a-basic-mysql-tutorial">https://www.digitalocean.com/community/tutorials/a-basic-mysql-tutorial</a>. </span> </p>
</description>
<pubDate>Fri, 08 Jul 2016 00:00:00 -0400</pubDate>
<link>http://inls161.johndmart.in/text/2016/07/08/lab-automation/</link>
<guid isPermaLink="true">http://inls161.johndmart.in/text/2016/07/08/lab-automation/</guid>
<category>scripting</category>
<category>Bash</category>
<category>Pandoc</category>
<category>Text</category>
</item>
<item>
<title>Introduction to scripting</title>
<description><p>Today we will learn how to convert our plaintext Markdown into a PDF.
To do this we will have to install some more software, which may take a while. </p>
<p>In the meantime, we will cover some of the basics of scripting so that we can automate our workflows a little better.
<excerpt/></p>
<h1 id="texlive">TeXLive</h1>
<p>The TeX (or LaTeX) distribution that we are using in conjunction with Pandoc is called &quot;TeXLive.&quot;
This software is in our CodeAnywhere container&#39;s repositories. </p>
<p>It is very large, so we are only going to install the base package to avoid running out of space on our CodeAnywhere containers.
We have a total of 2GB of space.
TeXLive should take up an extra ~650MB of space. </p>
<p>To check to see how much space we have, we can run this command in a terminal:</p>
<p><code>df -h</code></p>
<p>It will look something like this:</p>
<div class="highlight"><pre><code class="language-" data-lang="">cabox@box-codeanywhere:~/workspace$ df -h
Filesystem Size Used Avail Use% Mounted on
/vz/private/669416 2.0G 1.1G 953M 54% /
none 128M 4.0K 128M 1% /dev
none 4.0K 0 4.0K 0% /sys/fs/cgroup
none 26M 52K 26M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 128M 0 128M 0% /run/shm
none 100M 0 100M 0% /run/user
cabox@box-codeanywhere:~/workspace$
</code></pre></div>
<p>Make a mental note of the <code>Avail</code> number in the first row of output, which is the filesystem mounted on <code>/</code>.
I have 953MB available. </p>
<h2 id="installation">Installation</h2>
<p>Now, to get TeXLive installed on our CodeAnywhere containers, we&#39;ll need to use apt-get<label for='apt-get' class='margin-toggle sidenote-number'></label><input type='checkbox' id='apt-get' class='margin-toggle'/><span class='sidenote'>“AptGet/Howto.” Ubuntu Documentation - Community Help Wiki. <a href="https://help.ubuntu.com/community/AptGet/Howto">https://help.ubuntu.com/community/AptGet/Howto</a>. </span> just like we did for Pandoc:</p>
<p><code>sudo apt-get install texlive</code></p>
<p>That will output a bunch of stuff and tell us how much the installation will take up on disk.
See if the number is smaller than your available space and then type <code>Y</code> and hit enter if you have enough space.
If not, we&#39;ll have to clear something out so you can install it. </p>
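<p>The space check described above can also be scripted. The following is a rough sketch: the helper names <code>available_kb</code> and <code>has_room</code> are hypothetical, and it assumes that <code>df</code> and <code>awk</code> are available, as they are on most Linux systems. The figure of 650 is the approximate TeXLive size mentioned earlier.</p>

```shell
#!/bin/bash
# Sketch only: available_kb and has_room are hypothetical helper names.

# Print the free space on the root filesystem in kilobytes.
# -P keeps each filesystem on one line; -k reports sizes in 1K blocks.
available_kb() {
  df -Pk / | awk 'NR == 2 { print $4 }'
}

# Succeed if / has at least the given number of megabytes free.
has_room() {
  [ "$(available_kb)" -ge "$(( $1 * 1024 ))" ]
}

# 650 is the rough TeXLive size quoted in this lesson
if has_room 650; then
  echo "Enough space for TeXLive"
else
  echo "Not enough space; clear something out first"
fi
```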
<p>This may take a while. </p>
<p>If that command does not work, it is likely because you need to update your software source repositories and upgrade your installed software packages.
You can do that with two commands:</p>
<p><code>sudo apt-get update</code></p>
<p>This updates the sources. Follow it with:</p>
<p><code>sudo apt-get dist-upgrade</code></p>
<p>This actually downloads and installs updates to the already-installed software. </p>
<p>In the meantime, we can look at scripting. </p>
<h1 id="scripting">Scripting</h1>
<p>Scripting is fun when you get used to how it works.
It is also really useful for not having to repeat the same work over and over again. </p>
<p>Creating scripts is often a trial and error process, though, and can feel frustrating (see below).</p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">What scripting feels like <a href="https://t.co/jYR9WQftIX">https://t.co/jYR9WQftIX</a></p>&mdash; SecuriTay (@SwiftOnSecurity) <a href="https://twitter.com/SwiftOnSecurity/status/749783791279939585">July 4, 2016</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<h2 id="how-to-do-it?">How to do it?</h2>
<p>In class today, we&#39;ll go over some basic scripting.
We&#39;ll use William Shotts&#39; Writing Shell Scripts tutorial.<label for='shell-scripts' class='margin-toggle sidenote-number'></label><input type='checkbox' id='shell-scripts' class='margin-toggle'/><span class='sidenote'>Shotts, William, Jr. “Writing Shell Scripts.” LinuxCommand.org. Accessed July 6, 2016. <a href="http://linuxcommand.org/lc3_writing_shell_scripts.php">http://linuxcommand.org/lc3_writing_shell_scripts.php</a>. </span> </p>
<p>A basic Bash script is simply a list of commands in a file. </p>
<p>The lines in the file get executed sequentially and then at the end of the file Bash stops, or waits depending on what the file tells it to do. </p>
<h3 id="execution">Execution</h3>
<p>All shell scripts should be executable and specify what shell program will run them.
If this is not the case, then we have to call a shell explicitly on the command line in order to run them. </p>
<p>To create a basic Bash shell script, just touch a new file and give it the extension <code>.sh</code>:</p>
<p><code>touch hello-world.sh</code></p>
<p>Then we&#39;ll make it executable:</p>
<p><code>chmod +x hello-world.sh</code></p>
<p>This command sets the execute flag for the user, the group, and everyone else in the file&#39;s permissions.
They should look like this if you list the files with long output:</p>
<div class="highlight"><pre><code class="language-" data-lang="">cabox@box-codeanywhere:~/workspace/helper-scripts$ ls -lah
total 64K
drwxr-xr-x 3 cabox cabox 4.0K Jul 7 12:44 .
drwxrwxr-x 5 cabox cabox 4.0K Jul 7 10:02 ..
-rwxr-xr-x 1 cabox cabox 104 Jul 7 12:50 hello-world.sh
cabox@box-codeanywhere:~/workspace/helper-scripts$
</code></pre></div>
<p>We also need to add a &quot;shebang&quot; to our script to tell the system to use Bash to execute this file. Inside the <code>hello-world.sh</code> file, put this on the first line:</p>
<div class="highlight"><pre><code class="language-" data-lang=""><span class="c">#!/bin/bash</span>
</code></pre></div>
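<p>The difference between the two ways of running a script can be seen in a small experiment. This sketch writes a throwaway script into a scratch directory (the <code>demo.sh</code> name and the use of <code>mktemp</code> are choices made for this illustration), runs it first by calling <code>bash</code> explicitly, and then runs it directly once the execute flag is set.</p>

```shell
#!/bin/bash
# Sketch only: demo.sh and the scratch directory are made up for illustration.
WORKDIR="$(mktemp -d)"
printf '#!/bin/bash\necho "Hello, world!"\n' > "$WORKDIR/demo.sh"

# Without the execute flag, hand the file to a shell explicitly
FIRST="$(bash "$WORKDIR/demo.sh")"

# With the execute flag set, the shebang line picks the shell for us
chmod +x "$WORKDIR/demo.sh"
SECOND="$("$WORKDIR/demo.sh")"

echo "$FIRST"
echo "$SECOND"

# Clean up the scratch directory
rm -r "$WORKDIR"
```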
<h3 id="echo">Echo</h3>
<p>Now, we&#39;re going to make this script say hello to the world and hello to us.
We do this by using the <code>echo</code> command, which outputs whatever you tell it as text to <code>STDOUT</code>.
Try it:</p>
<p><code>echo &quot;Hello, world!&quot;</code></p>
<p>You should see this in the command line:</p>
<div class="highlight"><pre><code class="language-" data-lang="">cabox@box-codeanywhere:~/workspace/helper-scripts$ echo "Hello, world!"
Hello, world!
cabox@box-codeanywhere:~/workspace/helper-scripts$
</code></pre></div>
<p>Now we will add this command to our script: </p>
<div class="highlight"><pre><code class="language-" data-lang=""><span class="c">#!/bin/bash</span>
<span class="c"># Say "Hello, world!"</span>
<span class="nb">echo</span> <span class="s2">"Hello, world!"</span>
</code></pre></div>
<p>Now when we run this script, we will see the same output as from our earlier <code>echo</code> command on <code>STDOUT</code>:</p>
<div class="highlight"><pre><code class="language-" data-lang="">cabox@box-codeanywhere:~/workspace/helper-scripts$ ./hello-world.sh
Hello, world!
cabox@box-codeanywhere:~/workspace/helper-scripts$
</code></pre></div>
<p>Note the <code>./</code> in front of the script&#39;s filename in the above command. </p>
<p>We have to do this in order to execute files and scripts that are not in our execute path.
The execute path is just a list of directories that the shell searches when we type a command name, so files in those directories can be run without jumping through some extra hoops.
To see your path, use the following command: </p>
<div class="highlight"><pre><code class="language-" data-lang="">cabox@box-codeanywhere:~/workspace/helper-scripts$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
cabox@box-codeanywhere:~/workspace/helper-scripts$
</code></pre></div>
<p>These are all the directories from which you can execute a program by name alone. </p>
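<p>The path is easier to read with one directory per line. The sketch below uses a hypothetical helper called <code>split_path</code> for that, and then uses the standard <code>command -v</code> builtin to show where the shell finds a particular program.</p>

```shell
#!/bin/bash
# Sketch only: split_path is a hypothetical helper name.

# Print each directory of a colon-separated path string on its own line
split_path() {
  echo "$1" | tr ':' '\n'
}

split_path "$PATH"

# command -v reports where the shell would find a command
command -v ls
```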
<p>Finally, we want to add a &quot;Hello!&quot; to ourselves, and use our GitHub username.
We will add the same command as before, substituting our GitHub username for world, so that the script looks like this (I&#39;ve used my own name here):</p>
<div class="highlight"><pre><code class="language-" data-lang=""><span class="c">#!/bin/bash</span>
<span class="c"># Say "Hello, world!"</span>
<span class="nb">echo</span> <span class="s2">"Hello, world!"</span>
<span class="c"># Say "Hello, YOU!"</span>
<span class="nb">echo</span> <span class="s2">"Hello, jdmar3!"</span>
</code></pre></div>
<p>Now if we run this, we will see all of this output:</p>
<div class="highlight"><pre><code class="language-" data-lang="">cabox@box-codeanywhere:~/workspace/helper-scripts$ ./hello-world.sh
Hello, world!
Hello, jdmar3!
cabox@box-codeanywhere:~/workspace/helper-scripts$
</code></pre></div>
<h2 id="tomorrow">Tomorrow</h2>
<p>We will pick up where we left off tomorrow during lab and get to work creating our scripts for the next assignment.
Please review the tutorial on writing shell scripts again.<label for='shell-scripts' class='margin-toggle sidenote-number'></label><input type='checkbox' id='shell-scripts' class='margin-toggle'/><span class='sidenote'>Shotts, William, Jr. “Writing Shell Scripts.” LinuxCommand.org. Accessed July 6, 2016. <a href="http://linuxcommand.org/lc3_writing_shell_scripts.php">http://linuxcommand.org/lc3_writing_shell_scripts.php</a>. </span>
Also, we&#39;ll use another tutorial so that we can learn to create prompts for user input.<label for='user-input' class='margin-toggle sidenote-number'></label><input type='checkbox' id='user-input' class='margin-toggle'/><span class='sidenote'>Chadwick, Ryan. “User Input - Bash Scripting Tutorial.” Ryan’s Tutorials. <a href="http://ryanstutorials.net/bash-scripting-tutorial/bash-input.php">http://ryanstutorials.net/bash-scripting-tutorial/bash-input.php</a>. </span></p>
</description>
<pubDate>Thu, 07 Jul 2016 00:00:00 -0400</pubDate>
<link>http://inls161.johndmart.in/text/2016/07/07/scripting/</link>
<guid isPermaLink="true">http://inls161.johndmart.in/text/2016/07/07/scripting/</guid>
<category>Pandoc</category>
<category>PDF</category>
<category>LaTeX</category>
<category>Bash</category>
<category>Scripts</category>
<category>Text</category>
</item>
<item>
<title>Single input, multiple outputs</title>
<description><p>We&#39;ve briefly discussed Pandoc now.
It bills itself as the &quot;Universal Document Converter.&quot;
This is reasonably true, but it might require some creative combinations of switches within Pandoc commands as well as multiple commands strung together or intermediate commands to get the desired output. </p>
<p>The benefit of troubleshooting and understanding this process is that once we do, we can more easily optimize our conversions and automate them.
We&#39;ll talk more about this as we go forward.
<excerpt/></p>
<h1 id="source-to-output-conversion">Source to output conversion</h1>
<p>It is possible to use GUI tools to create and convert documents.
Support is somewhat limited, but in LibreOffice, we can at least create a PDF from our ODT and DOCX files. </p>
<p>We can also manipulate the styles of the headers and other structural elements that we have assigned using Markdown in our GUI editors. </p>
<p>One convenient effect of starting with plaintext marked up with Markdown is that we have those structural elements when we convert them into another format and then edit them elsewhere. It is certainly possible to start in the GUI editor and define the same things, but after becoming acquainted with Markdown, it should feel somewhat more burdensome to use the GUI. Arguably, it is. There is a great deal more that goes into a DOCX or an ODT file, structurally, than in a plaintext file with Markdown in it. </p>
<p>We also have the disadvantage of only being able to operate on those files in limited ways on headless or remote systems. </p>
<p>If we keep plaintext at the core of our workflows and GUI editors toward the periphery, we will be well served in the end, as we will always have access to our work, on any system, without any huge barriers to editing and changing. </p>
<h2 id="commands">Commands</h2>
<p>We&#39;re going to practice some conversions using Pandoc today. We will also work in groups. </p>
<p>First, we need to get some files. </p>
<p>Fork and clone this repository into your CodeAnywhere container:</p>
<p><a href="https://github.com/inls161/pandoc-practice">https://github.com/inls161/pandoc-practice</a></p>
<p>Once you have the files in your CodeAnywhere container, I will show you some things in class and then in your groups you will answer and mark up the <code>example.md</code> file using the instructions in the file. </p>
<p><strong><em>When you are finished with the questions in the file:</em></strong> </p>
<ol>
<li>I want you to change the name of the file to your GitHub username. </li>
<li>You will then work as a group to convert the file to HTML, DOCX, and ODT formats, per the instructions in class. I also want you to open the files on your lab computers so you can see what you have done. </li>
<li>Then I want you to add, commit, and push your changes. </li>
<li>Finally, you will create a pull request in GitHub to get these files back into my original repository.<br></li>
</ol>
<h2 id="basic-pandoc-commands">Basic Pandoc commands</h2>
<p><label for='pandoc-commands' class='margin-toggle'> &#8853;</label><input type='checkbox' id='pandoc-commands' class='margin-toggle'/><span class='marginnote'>All Pandoc commands are documented here: <a href="http://pandoc.org/README.html">http://pandoc.org/README.html</a><br/><br/>A good set of example commands exists here: <a href="http://pandoc.org/demos.html">http://pandoc.org/demos.html</a> </span></p>
<p>Convert a Markdown file to HTML:</p>
<p><code>pandoc -o example.html example.md</code></p>
<p>Pandoc reads the filetype from the extension in normal usage.
If you want to convert a file directly from a URL, you will have to specify the filetype, like this:</p>
<p><code>pandoc -f html -t markdown http://inls161.johndmart.in/syllabus/</code></p>
<p>You can make sure that certain things, like quotes and em-dashes, get read and formatted properly by specifying the &quot;Smart&quot; switch (a capital <code>-S</code> or <code>--smart</code>):</p>
<p><code>pandoc -S -o example.html example.md</code></p>
<p>There are a host of other commands in the documentation. Be sure to try them out. </p>
<h2 id="specific-file-commands">Specific file commands</h2>
<p>Convert your markdown to HTML:</p>
<p><code>pandoc -o example.html example.md</code></p>
<p>If you wish to convert to a DOCX or ODT file:</p>
<p><code>pandoc -o example.docx example.md</code></p>
<p><code>pandoc -o example.odt example.md</code></p>
<p>If you wish to convert between two different word processor filetypes, we might have to get a little creative.
We learned in class that if we issue the following command, we get errors related to file encoding and the conversion will not work. </p>
<p><code>pandoc -o example.docx example.odt</code></p>
<p>If, however, we add an intermediary step, say through HTML, we can get the output that we want. Try it like this instead: </p>
<p><code>pandoc -o example-tmp.html example.docx &amp;&amp; pandoc -o example.odt example-tmp.html</code></p>
<p>This preserves the formatting and extracts the text from the DOCX as an HTML file and then converts that HTML into ODT.
We do not have the weird encoding errors this way, and we don&#39;t have to mess with pipes. </p>
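<p>The two-step conversion can be wrapped in a small function so that the intermediate file is cleaned up afterwards. This is only a sketch: <code>via_html</code> is a hypothetical name, and it assumes that the installed version of Pandoc can read the input format.</p>

```shell
#!/bin/bash
# Sketch only: via_html is a hypothetical helper name.
# Usage: via_html INPUT OUTPUT
via_html() {
  INPUT="$1"
  OUTPUT="$2"
  # Name the intermediate file after the input: example.docx gives example-tmp.html
  TMP="${INPUT%.*}-tmp.html"
  # Stop early if the first conversion fails
  pandoc -o "$TMP" "$INPUT" || return 1
  pandoc -o "$OUTPUT" "$TMP"
  STATUS=$?
  # Remove the intermediate HTML whether or not the second step worked
  rm -f "$TMP"
  return "$STATUS"
}
```

<p>With this in a script, <code>via_html example.docx example.odt</code> would run the same two Pandoc commands in order.</p>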
<p>Filter a document through a template file:</p>
<p><code>pandoc -S --reference-docx=FILE -o example.docx example.md</code> </p>
<p>In the above command, you need to specify the location of the template file.
If it is a file called <code>template.docx</code> and is located in the same directory as your Markdown source, then the command will be:</p>
<p><code>pandoc -S --reference-docx=./template.docx -o example.docx example.md</code></p>
<p>You can also use an ODT or OTT for reference:</p>
<p><code>pandoc -S --reference-odt=./template.ott -o example.odt example.md</code></p>
<h1 id="for-tomorrow">For tomorrow</h1>
<p>Tomorrow, we are going to learn one more output format and then learn how to script all of our outputs together so that we can save ourselves time. </p>
<p>I would like you to go through the Writing Shell Scripts tutorial by William Shotts for tomorrow.<label for='shell-scripts' class='margin-toggle sidenote-number'></label><input type='checkbox' id='shell-scripts' class='margin-toggle'/><span class='sidenote'>Shotts, William, Jr. “Writing Shell Scripts.” LinuxCommand.org. Accessed July 6, 2016. <a href="http://linuxcommand.org/lc3_writing_shell_scripts.php">http://linuxcommand.org/lc3_writing_shell_scripts.php</a>. </span>
This will show you the basics of scripting.
The scripts that we will write will be very, very simple, but it is good to have looked over this before we start. </p>
</description>
<pubDate>Wed, 06 Jul 2016 00:00:00 -0400</pubDate>
<link>http://inls161.johndmart.in/text/2016/07/06/single-multiple-outputs/</link>
<guid isPermaLink="true">http://inls161.johndmart.in/text/2016/07/06/single-multiple-outputs/</guid>
<category>pandoc</category>
<category>text</category>
<category>conversion</category>
<category>LaTeX</category>
<category>Text</category>
</item>
<item>
<title>Text Conversion</title>
<description><h1 id="plaintext,-markup,-and-formatted-text">Plaintext, markup, and formatted text</h1>
<p>This week we will discuss the uses of plaintext and markup for creating formatted documents. </p>
<p>Today we are going to briefly introduce Pandoc and then tomorrow we are going to see what it can really do with our documents when we learn some interesting switches and tricks. We will also do some hands-on-keyboard exercises that will demonstrate the power of using marked-up plaintext for creating formatted documents.
<excerpt/></p>
<p>All of the tools and tasks that we cover in this course can be used to make our lives easier and our workflows simpler.
They can also be used to make our lives harder.
That is not the intent of this course.
We&#39;re here to learn about new and flexible ways of completing tasks that involve the communication of information. </p>
<p>Inevitably, someone (usually a boss or instructor) will demand that you use a specific tool to complete a task.
On that day, knowing what you know after taking this class, you will be able to not only suggest alternatives, but you will also be able to make a compelling argument as to why those alternatives are better, more flexible and will save work in the long run. </p>
<p>If you are particularly talented or skilled, you will go off on your own and use whatever workflow you want and then produce the output requested by your boss or instructor.
They never have to know the difference and can be blissfully unaware of the technical prowess that went into creating the Word document that they only want so that they can print it onto dead trees and then give it back to you with red pen marks all over it. </p>
<p>The triumph in learning these tools is that you will know that there is a better way, and you will use that way whenever you can.
And in the end, when you are in charge, you can set the workflows and toolchains and collaboration environments. </p>
<p>You will be equipped to do that. </p>
<h2 id="down-with-word,-up-with-creativity">Down with Word, Up with Creativity</h2>
<p>For today, you should have read the elegant and logical rant by science-fiction author Charlie Stross about why Microsoft Word has to die.
Stross believes that tools like Word stifle creativity by shackling you to an interface.<label for='word' class='margin-toggle sidenote-number'></label><input type='checkbox' id='word' class='margin-toggle'/><span class='sidenote'>Stross, Charlie. “Why Microsoft Word Must Die.” Charlie’s Diary. Last modified October 12, 2013. <a href="http://www.antipope.org/charlie/blog-static/2013/10/why-microsoft-word-must-die.html">http://www.antipope.org/charlie/blog-static/2013/10/why-microsoft-word-must-die.html</a>. </span></p>
<p>I believe the same thing.
Don Knuth, famed computer scientist, believed the same thing about writing and typesetting for scientists in the 1970s.
It is because of his disdain for the arduous process involved in getting your work to print that he invented something called TeX.
He believed that TeX freed writers and allowed them to return to their content and not worry so much about presentation. </p>
<p>We do not have enough time in this class to cover TeX, but we can discuss it briefly in this week&#39;s sessions in relation to using Pandoc for creating PDFs from our document sources. </p>