-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathweek9.html
205 lines (198 loc) · 40.8 KB
/
week9.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Week 9</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link href="css/bootstrap.min.css" rel="stylesheet">
<link href="css/custom.css" rel="stylesheet">
</head>
<body class="markdown github">
<header class="navbar-inverse navbar-fixed-top">
<div class="container">
<nav role="navigation">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="index.html" class="navbar-brand">J298 Data Journalism</a>
</div> <!-- /.navbar-header -->
<!-- Collect the nav links, forms, and other content for toggling -->
<div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1">
<ul class="nav navbar-nav">
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Class notes<b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="week1.html">What is data?</a></li>
<li><a href="week2.html">Types of stories</a></li>
<li><a href="week3.html">Working with spreadsheets</a></li>
<li><a href="week4.html">Acquiring, cleaning, and formatting data</a></li>
<li><a href="week5.html">R, RStudio, and the tidyverse</a></li>
<li><a href="week6.html">Data journalism in the tidyverse</a></li>
<li><a href="week7.html">Don't let the data lie to you</a></li>
<li><a href="week8.html">Databases and SQL</a></li>
<li><a href="week9.html">Finding stories using maps</a></li>
<li><a href="week10.html">Maps meet databases</a></li>
<li><a href="week11.html">More fun with R</a></li>
<li><a href="week12.html">R practice</a></li>
<li><a href="week13.html">PostGIS practice</a></li>
<li><a href="week14.html">More fun with R</a></li>
</ul>
</li>
<li><a href="software.html">Software</a></li>
<li><a href="datasets.html">Data</a></li> <li><a href="questions.html">If you get stuck</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Email instructors<b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="mailto:[email protected]">Peter Aldhous</a></li>
<li><a href="mailto:[email protected]">Amanda Hickman</a></li>
</ul>
</li>
</ul>
</div><!-- /.navbar-collapse -->
</nav>
</div> <!-- /.navbar-header -->
</header>
<div class="container all">
<h2 id="week-9-march-15-2018">Week 9 | March 15, 2018</h2>
<p><em>Instructor: Amanda Hickman</em></p>
<h1 id="10-min-data-of-the-week">10 Min: Data of the Week</h1>
<p>Carlos & Nikka</p>
<p>Let's take a few minutes to capture and organize <a href="https://docs.google.com/spreadsheets/d/11JLkkyWZf3fvVz3aebgMjcZ6mxV-j5Gw7hEpeiPAGY4/edit#gid=0">the presentation schedule</a>.</p>
<h1 id="sql-what-was-tricky-">SQL: What was tricky?</h1>
<p>We're going to spend the next three weeks on SQL and mapping. I want to look at where folks got stuck and how to get unstuck.</p>
<p>Charlotte had a fun one this week, and I want to walk through some of my troubleshooting:</p>
<ul>
<li>I didn't tell you what the redirect operator <code>></code> does. Did anyone try it? What does it do?</li>
<li>Some of you tried running that in Postico, instead of in the Terminal. What's the difference?</li>
<li>Some of you tried to run "snippet.csv" -- why wouldn't that work?</li>
<li>Here's an example of a question that I immediately knew the answer to. What does that Error mean?:</li>
</ul>
<pre class="editor-colors lang-"><div class="line"><span class="syntax--text syntax--plain syntax--null-grammar"><span>I tried loading the three datasets I have from AHS into Postico just to view the tables and I got:</span></span></div><div class="line"><span class="syntax--text syntax--plain syntax--null-grammar"><span> </span></span></div><div class="line"><span class="syntax--text syntax--plain syntax--null-grammar"><span>"ERROR: relation "household" does not exist"</span></span></div><div class="line"><span class="syntax--text syntax--plain syntax--null-grammar"><span> </span></span></div><div class="line"><span class="syntax--text syntax--plain syntax--null-grammar"><span>This was the code I used to load the data:</span></span></div><div class="line"><span class="syntax--text syntax--plain syntax--null-grammar"><span> </span></span></div><div class="line"><span class="syntax--text syntax--plain syntax--null-grammar"><span>COPY household FROM '/Users/Charlotte/Desktop/Spring 2018/Data/AHS data/AHS files/household.csv' DELIMITER ',' CSV HEADER;</span></span></div></pre><ul>
<li>Which one is the Postgres table? Which is the csv file?</li>
<li><p>And then we encountered a new error. What clues do you see in this error? I had to look at the <a href="https://www.postgresql.org/docs/9.5/static/sql-copy.html">Posgres COPY</a> documentation to resolve it.</p>
<p> ERROR: invalid input syntax for integer: "'11000001'"
CONTEXT: COPY household_snip, line 2, column CONTROL: "'11000001'"</p>
</li>
<li><p>If you're curious, here's a data sample:</p>
</li>
</ul>
<pre class="editor-colors lang-CSV"><div class="line"><span class="syntax--text syntax--plain syntax--null-grammar"><span>CONTROL,TOTROOMS,TOTHCAMT,PERPOVLVL,JACPRIMARY,JACSECNDRY,JADEQUACY,JARTACCESS,JARTATTRACT,JARTAWARE,JARTECON,</span></span></div><div class="line"><span class="syntax--text syntax--plain syntax--null-grammar"><span>'11000001',7,'14',501,'0','0','2','0','0','0','0',</span></span></div></pre>
<h2 id="creating-new-databases">Creating New databases</h2>
<p>This might be easy. Try just running this in the Postico SQL Query pane:</p>
<pre class="editor-colors lang-SQL"><div class="line"><span class="syntax--source syntax--sql"><span> </span></span></div><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--meta syntax--create syntax--sql"><span class="syntax--keyword syntax--other syntax--create syntax--sql"><span>CREATE</span></span><span> </span><span class="syntax--keyword syntax--other syntax--sql"><span>DATABASE</span></span><span> </span><span class="syntax--entity syntax--name syntax--function syntax--sql"><span>week9</span></span></span><span>;</span></span></div></pre>
<p>If that doesn't work, then we get to do some fancy footwork. One more reason to learning to love the command line. The Postgres documentation has a walk through for enabling <a href="http://postgresapp.com/documentation/cli-tools.html">command line tools</a> but what did I say last week about <code>sudo</code>?</p>
<p>So we're going to use their second option.</p>
<pre class="editor-colors lang-sh"><div class="line"><span class="syntax--source syntax--shell"><span class="syntax--support syntax--function syntax--builtin syntax--shell"><span>echo</span></span><span> </span><span class="syntax--variable syntax--other syntax--normal syntax--shell"><span class="syntax--punctuation syntax--definition syntax--variable syntax--shell"><span>$</span></span><span>path</span></span></span></div><div class="line"><span class="syntax--source syntax--shell"><span>which psql</span></span></div><div class="line"><span class="syntax--source syntax--shell"><span class="syntax--support syntax--function syntax--builtin syntax--shell"><span>cd</span></span></span></div><div class="line"><span class="syntax--source syntax--shell"><span>nano .profile</span></span></div></pre>
<p>You're going to edit your <code>.profile</code> file to add this line:</p>
<p><code>export PATH="$PATH:/Applications/Postgres.app/Contents/Versions/latest/bin"</code></p>
<p>Use <kbd>ctrl</kbd><kbd>o</kbd> then <kbd>ctrl</kbd><kbd>x</kbd> to save.</p>
<pre class="editor-colors lang-sh"><div class="line"><span class="syntax--source syntax--shell"><span>which psql</span></span></div><div class="line"><span class="syntax--source syntax--shell"><span class="syntax--support syntax--function syntax--builtin syntax--shell"><span>source</span></span><span> </span><span class="syntax--keyword syntax--operator syntax--tilde syntax--shell"><span>~</span></span><span>/.profile</span></span></div><div class="line"><span class="syntax--source syntax--shell"><span>which psql</span></span></div><div class="line"><span class="syntax--source syntax--shell"><span>psql</span></span></div></pre>
<p>That should get you into a postgres command prompt. Then we need to run <code>CREATE DATABASE week9;</code> in the psql command prompt. Use <code>\q</code> to exit.</p>
<h2 id="more-homework-troubleshooting">More Homework Troubleshooting</h2>
<ul>
<li>Josh had a question about <a href="https://www.postgresql.org/docs/9.5/static/functions-formatting.html">number formats</a>. Go ahead and <a href="https://drive.google.com/file/d/1qyVKQkpmJ1Tj4A0_He2rU_E1blcLkELg/view">grab his data</a>, create a table for it, and import the data into the table.</li>
</ul>
<pre class="editor-colors lang-sql"><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--meta syntax--create syntax--sql"><span class="syntax--keyword syntax--other syntax--create syntax--sql"><span>CREATE</span></span><span> </span><span class="syntax--keyword syntax--other syntax--sql"><span>TABLE</span></span><span> </span><span class="syntax--entity syntax--name syntax--function syntax--sql"><span>forfeiture</span></span></span><span> (</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> admin_number </span><span class="syntax--storage syntax--type syntax--sql"><span>VARCHAR</span></span><span>(</span><span class="syntax--constant syntax--numeric syntax--sql"><span>17</span></span><span>)</span><span> </span><span class="syntax--keyword syntax--other syntax--DDL syntax--create syntax--II syntax--sql"><span>NOT NULL</span></span><span>,</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> docket_number </span><span class="syntax--storage syntax--type syntax--sql"><span>VARCHAR</span></span><span>(</span><span class="syntax--constant syntax--numeric syntax--sql"><span>24</span></span><span>)</span><span>,</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> amount_forfeit </span><span class="syntax--storage syntax--type syntax--sql"><span>FLOAT</span></span><span> </span><span class="syntax--keyword syntax--other syntax--DDL syntax--create syntax--II syntax--sql"><span>NOT NULL</span></span><span>,</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> date_dispersed </span><span class="syntax--storage syntax--type syntax--sql"><span>DATE</span></span><span> </span><span class="syntax--keyword syntax--other syntax--DDL syntax--create syntax--II syntax--sql"><span>NOT NULL</span></span><span>,</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> recipient </span><span class="syntax--storage syntax--type syntax--sql"><span>VARCHAR</span></span><span>(</span><span class="syntax--constant syntax--numeric syntax--sql"><span>26</span></span><span>)</span><span>,</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> amount_dispersed </span><span class="syntax--storage syntax--type syntax--sql"><span>FLOAT</span></span><span> </span><span class="syntax--keyword syntax--other syntax--DDL syntax--create syntax--II syntax--sql"><span>NOT NULL</span></span><span>,</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> county </span><span class="syntax--storage syntax--type syntax--sql"><span>VARCHAR</span></span><span>(</span><span class="syntax--constant syntax--numeric syntax--sql"><span>14</span></span><span>)</span><span> </span><span class="syntax--keyword syntax--other syntax--DDL syntax--create syntax--II syntax--sql"><span>NOT NULL</span></span><span>,</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> year </span><span class="syntax--storage syntax--type syntax--sql"><span>INTEGER</span></span><span> </span><span class="syntax--keyword syntax--other syntax--DDL syntax--create syntax--II syntax--sql"><span>NOT NULL</span></span></span></div><div class="line"><span class="syntax--source syntax--sql"><span>);</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> </span></span></div><div class="line"><span class="syntax--source syntax--sql"><span>COPY {tablename} </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>FROM</span></span><span> {file </span><span class="syntax--storage syntax--type syntax--sql"><span>path</span></span><span>} CSV HEADER;</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> </span></span></div><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>SELECT</span></span><span> county, year, </span><span class="syntax--support syntax--function syntax--aggregate syntax--sql"><span>SUM</span></span><span>(amount_forfeit)</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>FROM</span></span><span> forfeiture </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>GROUP BY</span></span><span> county, year </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>ORDER BY</span></span><span> county;</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> </span></span></div><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>SELECT</span></span><span> county, recipient, </span><span class="syntax--support syntax--function syntax--aggregate syntax--sql"><span>SUM</span></span><span>(amount_forfeit)</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>FROM</span></span><span> forfeiture </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>WHERE</span></span><span> county </span><span class="syntax--keyword syntax--operator syntax--comparison syntax--sql"><span>=</span></span><span> </span><span class="syntax--string syntax--quoted syntax--single syntax--sql"><span class="syntax--punctuation syntax--definition syntax--string syntax--begin syntax--sql"><span>'</span></span><span>Los Angeles</span><span class="syntax--punctuation syntax--definition syntax--string syntax--end syntax--sql"><span>'</span></span></span><span> </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>AND</span></span><span> recipient </span><span class="syntax--keyword syntax--operator syntax--comparison syntax--sql"><span>=</span></span><span> </span><span class="syntax--string syntax--quoted syntax--single syntax--sql"><span class="syntax--punctuation syntax--definition syntax--string syntax--begin syntax--sql"><span>'</span></span><span>GENERAL FUND</span><span class="syntax--punctuation syntax--definition syntax--string syntax--end syntax--sql"><span>'</span></span></span></span></div><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>GROUP BY</span></span><span> county, recipient </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>ORDER BY</span></span><span> county;</span></span></div></pre>
<p>The second problem is super quirky. The short version is <a href="https://www.postgresql.org/docs/9.5/static/datatype-numeric.html">math is hard</a> even for computers. And honestly I'm a little embarrassed that I've just never bothered to dig into this, so the best I can do is wave my hands around and mumble about how computers store numeric data and optimization. In fairness to myself the Postgres manual isn't a lot more specific:</p>
<blockquote>
<p>Inexact means that some values cannot be converted exactly to the internal format and are stored as approximations, so that storing and retrieving a value might show slight discrepancies. Managing these errors and how they propagate through calculations is the subject of an entire branch of mathematics and computer science and will not be discussed here</p>
</blockquote>
<p>What I can tell you is what your options are for solving the problem.</p>
<p>There are a few different options. You can use <a href="https://www.postgresql.org/docs/9.5/static/functions-formatting.html">formatting</a> to customize the number, you can <a href="https://www.postgresql.org/docs/9.5/static/typeconv-func.html">round it off</a> or, you cast as <a href="https://www.postgresql.org/docs/9.5/static/datatype-money.html">money</a>.</p>
<pre class="editor-colors lang-SQL"><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>SELECT</span></span><span> county, year, </span><span class="syntax--support syntax--function syntax--aggregate syntax--sql"><span>SUM</span></span><span>(amount_forfeit::</span><span class="syntax--storage syntax--type syntax--sql"><span>numeric</span></span><span>)</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>FROM</span></span><span> forfeiture </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>GROUP BY</span></span><span> county, year </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>ORDER BY</span></span><span> county;</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> </span></span></div><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--comment syntax--block syntax--sql"><span class="syntax--punctuation syntax--definition syntax--comment syntax--sql"><span>/*</span></span><span> You might want actual currency: </span><span class="syntax--punctuation syntax--definition syntax--comment syntax--sql"><span>*/</span></span></span></span></div><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>SELECT</span></span><span> county, year, </span><span class="syntax--support syntax--function syntax--aggregate syntax--sql"><span>SUM</span></span><span>(amount_forfeit::</span><span class="syntax--storage syntax--type syntax--sql"><span>numeric</span></span><span>::</span><span class="syntax--storage syntax--type syntax--sql"><span>money</span></span><span>)</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>FROM</span></span><span> forfeiture </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>GROUP BY</span></span><span> county, year </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>ORDER BY</span></span><span> county;</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> </span></span></div><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--comment syntax--block syntax--sql"><span class="syntax--punctuation syntax--definition syntax--comment syntax--sql"><span>/*</span></span><span> We're smart, so we can figure out how to make this work: </span><span class="syntax--punctuation syntax--definition syntax--comment syntax--sql"><span>*/</span></span></span></span></div><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>SELECT</span></span><span> county, year, to_char(</span><span class="syntax--support syntax--function syntax--aggregate syntax--sql"><span>SUM</span></span><span>(amount_forfeit), </span><span class="syntax--string syntax--quoted syntax--single syntax--sql"><span class="syntax--punctuation syntax--definition syntax--string syntax--begin syntax--sql"><span>'</span></span><span>9999.00</span><span class="syntax--punctuation syntax--definition syntax--string syntax--end syntax--sql"><span>'</span></span></span><span>) </span><span class="syntax--keyword syntax--other syntax--alias syntax--sql"><span>as</span></span><span> total</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>FROM</span></span><span> forfeiture </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>GROUP BY</span></span><span> county, year </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>ORDER BY</span></span><span> total </span><span class="syntax--keyword syntax--other syntax--order syntax--sql"><span>DESC</span></span><span>;</span></span></div></pre>
<p>I have a bigger question though: should we be taking the sum of all <code>amount_forfeit</code> to begin with? Try these two queries and let's discuss.</p>
<pre class="editor-colors lang-SQL"><div class="line"><span class="syntax--source syntax--sql"><span> </span></span></div><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>SELECT</span></span><span> </span><span class="syntax--keyword syntax--operator syntax--star syntax--sql"><span>*</span></span><span> </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>FROM</span></span><span> forfeiture </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>WHERE</span></span><span> admin_number </span><span class="syntax--keyword syntax--operator syntax--comparison syntax--sql"><span>=</span></span><span> </span><span class="syntax--string syntax--quoted syntax--single syntax--sql"><span class="syntax--punctuation syntax--definition syntax--string syntax--begin syntax--sql"><span>'</span></span><span>13-121J</span><span class="syntax--punctuation syntax--definition syntax--string syntax--end syntax--sql"><span>'</span></span></span><span>;</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> </span></span></div><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>SELECT</span></span><span> </span><span class="syntax--support syntax--function syntax--aggregate syntax--sql"><span>SUM</span></span><span>(amount_forfeit) </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>FROM</span></span><span> forfeiture </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>WHERE</span></span><span> admin_number </span><span class="syntax--keyword syntax--operator syntax--comparison syntax--sql"><span>=</span></span><span> </span><span class="syntax--string syntax--quoted syntax--single syntax--sql"><span class="syntax--punctuation syntax--definition syntax--string syntax--begin syntax--sql"><span>'</span></span><span>13-121J</span><span class="syntax--punctuation syntax--definition syntax--string syntax--end syntax--sql"><span>'</span></span></span><span>;</span></span></div></pre>
<h1 id="digging-into-mapping">Digging into Mapping</h1>
<p>Cartography and GIS aren't the same thing. We're talking about very basic maps as visualizations here. <a href="http://maptime.io/lessons-resources/">More on that</a></p>
<h2 id="shapes-points-lines">Shapes, Points, Lines</h2>
<h3 id="mapping-points">Mapping Points</h3>
<p>"Geocoding" refers to the process of identifying an individual latitude/longitude pair for an address or other location description. To actually plot a location on a map, you need the location's latitude and longitude. <code>219 West 40th Street</code> means nothing without coordinates.</p>
<p>Geocoding is actually challenging because there aren't good, free resources for doing batch jobs, where many addresses are geocoded at once. My <a href="https://github.com/amandabee/cunyjdata/wiki/Tip-Sheet:-Geocoding">Geocoding Tip Sheet</a> includes some helpful resources, but many city data sources actually include coordinates, so double check that, first.</p>
<p>If you're committed to mapping points, you may need my help geocoding them.</p>
<h3 id="mapping-lines">Mapping Lines</h3>
<p>No student has ever pitched me a compelling map that features lines rather than shapes or points. I did a project that drew out flight maps showing how far from home every prisoner incarcerated in Florence, CO is, but I pitched that, so it doesn't count. To draw that map I had to take a <a href="http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/">crash course</a> in rendering lines. If you're excited about doing something like this, great! But you're going to need to install R and walk through Nathan Yau's tutorial before you do anything else.</p>
<h3 id="mapping-polygons">Mapping Polygons</h3>
<p>Zipcodes, council districts, police precincts -- these are all polygons. Most of your maps will be in polygons. These polygons are defined in (usually) one of two specialized file formats -- a "Shapefile" or a "KML" file. The syntax of the file types varies, but they contain basically the same information -- the polygon called "Bronx CB 04" is defined by this series of lat/lon pairs. My <a href="https://github.com/amandabee/cunyjdata/wiki/Where-to-Find-Shapefiles">Shapefiles Tip Sheet</a> has some excellent resources for finding shapefiles though a lot of the resources there are New York City specific.</p>
<p>Often (usually) your data won't include a shapefile. If you have High School graduation rates by school districts, and you want to map those, you need to find a shapefile that describes the outline of each school district, and then you need to combine that shapefile with your data, by identifying a column that the two tables have in common.</p>
<h2 id="projections">Projections</h2>
<p>We don't deal with <a href="http://xkcd.com/977/">projections</a> much but they matter. And if you have inconsistent projections you might end up with a map where the city of <a href="https://amandabee.carto.com/viz/d42d245a-5aa2-11e5-ba80-0e853d047bba/public_map">San Francisco is floating about 10 miles NE of where it belongs</a>. I had to <a href="https://gis.stackexchange.com/questions/162779/why-is-the-city-of-san-francsico-floating-over-point-richmond">ask for help</a> to resolve that.</p>
<p>Most of the time you're going to be in WGS84.</p>
<p>The Wikipedia article on <a href="https://en.wikipedia.org/wiki/Web_Mercator">web mercator</a> is pretty good if you're dying to understand how this all fits together, but EPSG is an obsolete acronym for European Petroleum Survey Group a scientific research group with ties to the petroleum industry. They compiled a comprehensive database of projections and coordinate systems. </p>
<h2 id="vectors-and-rasters">Vectors and Rasters</h2>
<p>We can head down some serious rabbit holes here. But it is worth understanding the distinction between a <a href="https://en.wikipedia.org/wiki/Vector_graphics">vector</a> and a <a href="https://en.wikipedia.org/wiki/Raster_graphics">raster</a>, because they'll come up again and again in all kinds of contexts.</p>
<h2 id="shapefiles">Shapefiles</h2>
<p>What is a shapefile? There are a few different widely used formats for storing geographic information. ESRI makes ArcGIS which is popular and expensive. Their shapefile format is almost universal. Google Maps uses it's own KML format.</p>
<h2 id="asking-for-help">Asking for help</h2>
<p>I can't say enough about the importance of learning how to ask for help. If you look at my <a href="https://gis.stackexchange.com/users/24497/amanda?tab=questions&sort=newest">gis.stackexchange.com</a> profile you can see where I got stuck and then unstuck, starting back in 2013. There's a community there that is very good about <a href="https://gis.stackexchange.com/questions/84443/what-is-this-postgis-query-doing-to-show-great-circle-connections">thoroughly explaining</a> what you're dealing with.</p>
<h1 id="open-qgis">Open QGIS</h1>
<p>Okay, so let's actually do some mapping.</p>
<p>In 2011, the BLS published a <a href="https://www.bls.gov/opub/btn/volume-2/death-on-the-job-fatal-work-injuries-in-2011.htm">map of fatal workplace injuries</a> by state.</p>
<p>What do we think of this map?</p>
<ul>
<li>Are these colors continuous or categorical? Should they be? Is anyone surprised that CA and TX have a lot of workplace injuries? How can we improve on this?</li>
</ul>
<p>So we'll recreate it:</p>
<ul>
<li><a href="data/week9/">BLS Fatality Data</a></li>
<li><a href="https://www.census.gov/popest/data/state/totals/2011/tables/NST-EST2011-01.csv">2011 Population Estimates</a> (via <a href="http://www.census.gov/popest/data/historical/2010s/vintage_2011/state.html">census.gov</a>)</li>
</ul>
<p>I already combined the Census 2011 population estimates with the BLS workplace fatality data. We're actually going to use Postgres to do the first piece of this -- we could do it just as easily in <a href="https://www.libreoffice.org/discover/calc/">a spreadsheet</a> but we need the SQL practice.</p>
<pre class="editor-colors lang-SQL"><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--meta syntax--create syntax--sql"><span class="syntax--keyword syntax--other syntax--create syntax--sql"><span>CREATE</span></span><span> </span><span class="syntax--keyword syntax--other syntax--sql"><span>TABLE</span></span><span> </span><span class="syntax--entity syntax--name syntax--function syntax--sql"><span>bls_fatalities_2011</span></span></span><span> (</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> state </span><span class="syntax--storage syntax--type syntax--sql"><span>character varying</span></span><span>(</span><span class="syntax--constant syntax--numeric syntax--sql"><span>20</span></span><span>)</span><span> </span><span class="syntax--keyword syntax--other syntax--DDL syntax--create syntax--II syntax--sql"><span>NOT NULL</span></span><span>,</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> fatalities </span><span class="syntax--storage syntax--type syntax--sql"><span>integer</span></span><span> </span><span class="syntax--keyword syntax--other syntax--DDL syntax--create syntax--II syntax--sql"><span>NOT NULL</span></span><span>,</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> population </span><span class="syntax--storage syntax--type syntax--sql"><span>integer</span></span><span> </span><span class="syntax--keyword syntax--other syntax--DDL syntax--create syntax--II syntax--sql"><span>NOT NULL</span></span></span></div><div class="line"><span class="syntax--source syntax--sql"><span>);</span></span></div></pre>
<p>Do you remember how to load data into a table?</p>
<p>And how are we going to normalize this?</p>
<pre class="editor-colors lang-SQL"><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--meta syntax--alter syntax--sql"><span class="syntax--keyword syntax--other syntax--create syntax--sql"><span>ALTER</span></span><span> </span><span class="syntax--keyword syntax--other syntax--table syntax--sql"><span>TABLE</span></span><span> </span><span class="syntax--entity syntax--name syntax--function syntax--sql"><span>bls_fatalities_2011</span></span></span><span> ADD COLUMN fatality_rate </span><span class="syntax--storage syntax--type syntax--sql"><span>float</span></span><span>;</span></span></div></pre>
<p>And then try out a query to fill that rate column:</p>
<pre class="editor-colors lang-sql"><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>SELECT</span></span><span> </span><span class="syntax--keyword syntax--operator syntax--star syntax--sql"><span>*</span></span><span>,</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> ((fatalities::</span><span class="syntax--storage syntax--type syntax--sql"><span>float</span></span><span class="syntax--keyword syntax--operator syntax--math syntax--sql"><span>/</span></span><span>population)</span><span class="syntax--keyword syntax--operator syntax--star syntax--sql"><span>*</span></span><span class="syntax--constant syntax--numeric syntax--sql"><span>100000</span></span><span>) </span><span class="syntax--keyword syntax--other syntax--alias syntax--sql"><span>AS</span></span><span> rate</span></span></div><div class="line"><span class="syntax--source syntax--sql"><span> </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>FROM</span></span><span> bls_fatalities_2011</span></span></div></pre>
<p>And then actually do it:</p>
<pre class="editor-colors lang-sql"><div class="line"><span class="syntax--source syntax--sql"><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>UPDATE</span></span><span> bls_fatalities_2011 </span><span class="syntax--keyword syntax--other syntax--DML syntax--sql"><span>SET</span></span><span> fatality_rate </span><span class="syntax--keyword syntax--operator syntax--comparison syntax--sql"><span>=</span></span><span> ((fatalities::</span><span class="syntax--storage syntax--type syntax--sql"><span>float</span></span><span class="syntax--keyword syntax--operator syntax--math syntax--sql"><span>/</span></span><span>population)</span><span class="syntax--keyword syntax--operator syntax--star syntax--sql"><span>*</span></span><span class="syntax--constant syntax--numeric syntax--sql"><span>100000</span></span><span>);</span></span></div></pre>
<p>Why did we have to re-cast it? Well, <a href="https://dba.stackexchange.com/questions/200320/what-am-i-doing-wrong-with-my-math">because</a>.
And then output it:</p>
<pre class="editor-colors lang-sql"><div class="line"><span class="syntax--source syntax--sql"><span>COPY bls_fatalities_2011 TO </span><span class="syntax--string syntax--quoted syntax--single syntax--sql"><span class="syntax--punctuation syntax--definition syntax--string syntax--begin syntax--sql"><span>'</span></span><span>~/Desktop/bls_normalized.csv</span><span class="syntax--punctuation syntax--definition syntax--string syntax--end syntax--sql"><span>'</span></span></span><span> DELIMITER </span><span class="syntax--string syntax--quoted syntax--single syntax--sql"><span class="syntax--punctuation syntax--definition syntax--string syntax--begin syntax--sql"><span>'</span></span><span>,</span><span class="syntax--punctuation syntax--definition syntax--string syntax--end syntax--sql"><span>'</span></span></span><span> CSV HEADER;</span></span></div></pre>
<h3 id="adding-a-shapefile">Adding a shapefile</h3>
<p>To actually map this, we need some states. Who keeps track of US State boundaries? <a href="https://www.census.gov/geo/maps-data/data/tiger.html">The Census</a>. You want "Cartographic Boundary Shapefiles" > "States". The state boundaries don't actually change, so it doesnt matter which year.</p>
<p>For our purposes 1:20,000,000 is plenty of resolution.</p>
<p>You should be able to load the zip file in as a layer.</p>
<p>Why does it look all squished? Once upon a time <a href="https://gis.stackexchange.com/questions/167181/why-would-an-svg-output-from-cartodb-look-squished-when-the-map-doesnt">I asked about that</a>, too. The answer is kind of cool. If we use the toggle on the bottom right to switch to "EPSG 54004" we get something that looks a little more familiar. </p>
<h3 id="loading-a-basemap">Loading a basemap</h3>
<p>You need a basemap. The "tile map scale plugin" -- does a nice job of automatically zooming you to an available tile layer, which the other base map plugins don't do.</p>
<p>So go ahead and download the plugin. <code>Plugins > Manage and Install Plugins ...</code> and search for "Tile Map Scale"</p>
<p><img alt="adding a layer" src="img/week9_01.png"></p>
<p>That will give you a tiny pulldown on the map that lets you add a base layer so you can see where you are in the world.</p>
<h3 id="qgis-built-in-join">QGIS Built In Join</h3>
<p>Add your CSV to QGIS (it has no geometry.)</p>
<p>Look at the Attribute table for the Shapefile. How are we going to join this?</p>
<p>And then style it.</p>
<ul>
<li><p>Use equal intervals. <a href="https://gis.stackexchange.com/questions/84562/am-i-misunderstanding-equal-interval">I found a bug once</a> which is another reason to ask for help. They had no idea.</p>
</li>
<li><p>Make compound labels with the <code>||</code> operator. Eg. <code>NAME || '\n' || "bls_fatalities_2011_Workplace Fatalities 2011"</code></p>
</li>
</ul>
<h2 id="keep-going-with-qgis">Keep going with QGIS</h2>
<p>This series is highly recommended:
<a href="https://www.youtube.com/watch?v=Pf9cYvaCYWA&index=3&list=PL7HotvlLKHCs9nD1fFUjSOsZrsnctyV2R">https://www.youtube.com/watch?v=Pf9cYvaCYWA&index=3&list=PL7HotvlLKHCs9nD1fFUjSOsZrsnctyV2R</a></p>
<h1 id="advanced-query">Advanced query</h1>
<p>We probably won't get to this. And this week all we're going to do is talk it through. We're not going to tackle it.</p>
<p>In 2006, California voters passed <a href="http://www.lao.ca.gov/ballot/2006/83_11_2006.htm">Prop 83</a>, which requires registered sex offenders to live at least 2000 feet from any school or playground. In 2015, the state supreme court said the <a href="https://www.nbclosangeles.com/news/local/California-Loosens-Sex-Offender-Residency-Restrictions-297740931.html">blanket restriction was too broad</a> and the law could only be applied to offenders whose crimes involved children.</p>
<p>Sex offender restrictions buffered around schools to show how much of a city is off limits.</p>
<ul>
<li>Walk through how we would do it, in theory. What do we need?</li>
</ul>
<pre class="editor-colors lang-"><div class="line"><span class="syntax--text syntax--plain syntax--null-grammar"><span>+ location of every public school in the county</span></span></div><div class="line"><span class="syntax--text syntax--plain syntax--null-grammar"><span>+ location of every private school in the county</span></span></div><div class="line"><span class="syntax--text syntax--plain syntax--null-grammar"><span>+ location of every playground in the county</span></span></div></pre><p>Alameda publishes at least the schools: <a href="https://data.acgov.org/Education/Alameda-County-Schools/yza6-6jwu">https://data.acgov.org/Education/Alameda-County-Schools/yza6-6jwu</a></p>
<p>Then we're going to need a way to calculate circles around those points. I want you to take five minutes to think of a Google search that might let you get at that. Put your search terms in the Etherpad.</p>
<p>More applications of this? If you aren't following SB 827 you should be. So how would you map the impact of that bill? <a href="https://transitrichhousing.org/">https://transitrichhousing.org/</a> tried.</p>
<h1 id="homework">Homework</h1>
<p>See bCourses</p>
<p>I keep a <a href="https://github.com/amandabee/CUNY-SOJ-data-storytelling/wiki/Where-to-Find-Shapefiles">list of shapefile sources</a>, and a <a href="https://github.com/amandabee/CUNY-data-storytelling/wiki/Tip-Sheet:-Geocoding">list of geocoders</a> which you'll need if you have addresses, but no latitude or longitude. Note that for this week's homework I want you to identify a shapefile.</p>
<p>The Data of the Week is a little scattered so let's fill in a Google Spreadsheet:
<a href="https://docs.google.com/spreadsheets/d/11JLkkyWZf3fvVz3aebgMjcZ6mxV-j5Gw7hEpeiPAGY4/edit#gid=0">with the presentation schedule</a>.</p>
</div> <!-- /.container all -->
<script src="https://code.jquery.com/jquery.min.js"></script>
<script src="js/bootstrap.min.js"></script>
</body>
</html>