<?php
//geo2hxl.php: Reads a CSV of a single admin layer or populated places layer (converted to CSV with WKT geometry using ogr2ogr) and writes it out as HXL triples in Turtle format.
//CHANGE LOG
//Version 13 02/04/2013: fix added to handle strangely formatted p-codes (e.g. " 3.000000")
//Version 12 21/11/2012: fixed declaration of Admin0 to be hxl:atLevel Admin0
//Version 11 20/11/2012: added logic so that the pcode for the Country does not need to be in the file for Admin 1 (is declared directly as a variable in the configuration)
//Version 10 16/11/2012: added a couple more characters to the deaccent function
//Version 9 15/11/2012: added logic to handle a populated place class that is not matched in the array (returns class = "unknown")
//Version 8 13/11/2012: added configuration variables for country name and pcode for admin 0 so that they don't have to be included in the csv as explicit field values.
//Version 7 13/11/2012: MAJOR REVISION - Adding logic to handle populated places and to reject rows with missing values.
//Version 6 07/11/2012: Modifications to the data container elements to reflect the ValidOn property.
//Version 5 17/8/2012: Adds declaration for rdf:type hxl:country .
//Version 4: Adds in logic to truncate to desired precision of WKT
//Version 3: This version checks for whether or not the admin level being processed is level 0 (the national boundary). It ignores the atLocation property in this case.
//Version 2: This version has revisions to map to latest version of URI patterns and the Geolocation Standard. It adds the writing of prefixes and the DataContainer.
//THINGS IT DOESN'T DO:
//Doesn't parse the date stamp (used for the data container name) to also get the dc:date literal. dc:date is specified in the configuration.
//Labels all data containers with the optional organization value set to "unocha". A good future addition would be to make this configurable.
//Some metadata elements for the data container are missing (for example, "reported by").
//Although it declares a feature to be at its AdminUnitLevel, no further definition is given to these levels (the title of the level or the country to which it belongs).
//For populated places, this script doesn't allow a declaration of what a given class represents. HXL can declare a "title" for a given class (i.e., to indicate that "1stOrder" for a given country = "National Capital"). It would be good to add logic to make these declarations. The same is true for the admin levels, which can have titles, but this script doesn't declare those.
//-------------------CONFIGURATION-----------------------------------------------------
//Declare below which CSV columns contain which data (first column = 0), plus the other information needed from the user.
//See http://sites.google.com/site/ochaimwiki/cod-fod-guidance/administrative-boundaries for guidance on these fields
//Also, configure the code that writes the prefixes at the beginning of the output file. This is in the "add headers" section a bit further down.
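//The sample configuration below is for the COD (Democratic Republic of the Congo) populated places layer ($n = 999).
//For convenience, paste the CSV header row of your input file here (the row below is from an earlier BFA Admin 0 configuration):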
//WKT,CNTRY_NAME,CNTRY_CODE
$countryName = "yadda" ; //Only used when processing Admin 0, otherwise ignored.
$countryPcode = "COD" ; //Only used when processing Admin 0 and Admin 1, otherwise ignored. This means that the pcode for the country itself does not need to be included in the Admin0 or Admin1 CSV input. For lower admin levels (admin > 1), the pcode of the level immediately above must be included.
$country_iso3 = "cod" ; //ISO 3166 Alpha3 three letter code for the country, lower case. This becomes part of the URI for the features generated by the script.
$geom_element = 0 ; //which column contains the WKT geometry. First column = 0.
$level_n_pcode_element = 5 ; //which column contains the pcode for the level that is being converted. First column = 0. Ignored if $n = 0 (see below). Set to 999 if the data to convert does not have a p-code.
$level_n_minus_one_pcode_element = 999 ; //which column contains the pcode for the admin unit one level above the level that is being converted. This is ignored if $n = 0 or $n = 1 (See below), or if set to 999 (do this in case the dataset does not contain a reference to the containing admin unit).
$n = 999 ; //base admin level being processed. Set to 0 if you are processing the national boundary. Set to 999 if you are processing the populated places layer. Otherwise set to the admin level you are processing: 1, 2, 3, etc.
$featureName_element = 8 ; //which column contains the feature name. Ignored if $n = 0.
$featureRefName_element = 8 ; //which column contains the feature reference name. The script automatically replaces accented characters with unaccented equivalents (see deaccent()), so this can be the same column as $featureName_element. Ignored if $n = 0.
$popPlaceClass_element = 6 ; //for populated places, which column contains the class number. Note that these must be numbers with the lower numbers representing higher status in the hierarchy. See http://hxl.humanitarianresponse.info for details. Ignored if $n != 999.
//Populated Place Classes/Types/Categories Translation -------------------------------------------------------
//The list of elements below describes a translation between any classification of populated places in your dataset to the HXL system. This information is primarily used for cartographic symbolization of the populated place data. These settings are ignored if $n != 999.
//If your data has no value for a given class, leave it set to 'ignored'. Otherwise, put the value from your dataset that indicates the status of a populated place equal to the order in [ ]. It is recommended that your biggest/most important code be given 1st order and your smallest/least important code 10th order, with the other ranks distributed in between. Unused orders are acceptable.
$pplClass['_1stOrder'] = '0' ;
$pplClass['_2ndOrder'] = 'ignored' ;
$pplClass['_3rdOrder'] = 'ignored' ;
$pplClass['_4thOrder'] = 'ignored' ;
$pplClass['_5thOrder'] = 'ignored' ;
$pplClass['_6thOrder'] = 'ignored' ;
$pplClass['_7thOrder'] = 'ignored' ;
$pplClass['_8thOrder'] = 'ignored' ;
$pplClass['_9thOrder'] = 'ignored' ;
$pplClass['_10thOrder'] = 'ignored' ;
//Processing settings
$precision = 7 ; //number of decimal places to which the WKT coordinates will be truncated. For Decimal Degrees, 7 yields approximately cm precision and is the recommended value. The default ogr2ogr output is 15 decimal places, about the radius of a hydrogen atom.
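//Worked arithmetic for the precision setting (illustrative, assuming coordinates in decimal degrees): one degree of longitude at the equator is about 111,320 m, so
//  10^-7 degrees * 111,320 m/degree = about 0.011 m (roughly 1 cm)
//  10^-15 degrees * 111,320 m/degree = about 1.1 x 10^-10 m
//Because truncate() cuts digits instead of rounding, each coordinate moves by less than one unit in the last kept decimal place, i.e. by less than 10^-$precision degrees.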
$file_to_process = "/Users/carsten/Desktop/CODs/Democratic Republic of Congo/cod_location/cod_pplp1_rgc.csv" ;
$output_file_name = $file_to_process.".ttl" ;
//Metadata items
$dcdate = "now" ; //the date the file is created, in ISO 8601 format (granularity below the day is optional), or "now" to use the current time stamp.
$validon = "2011-10-01" ; //Start date from which this dataset is the valid one (ISO 8601 format; granularity below the day is optional). This value is applied to the data container (i.e. named graph) that holds the data. The dataset stays valid until the next (later) validOn date given for the same feature.
mb_internal_encoding('UTF-8');
//--------------FUNCTIONS--------------------------------------------------------------------------------------
function truncate($precision, $current_geom) {
$output = "" ;
$geom_length = strlen($current_geom) ;
$char_counter = 99;
$counting = false;
$delimiters = array(" " , "," , ")") ;
for ($i = 0; $i < $geom_length; $i++) {
$current_char = $current_geom[$i];
if ($current_char == ".") {
$char_counter = $precision + 1;
$counting = true;
} elseif (in_array ($current_char, $delimiters)) {
$char_counter = 99;
$counting = false;
}
if ($counting) {
if($char_counter <= $precision + 1 and $char_counter > 0 ) {
$output .= $current_char ;
}
$char_counter-- ;
} else {
$output .= $current_char ;
}
}
return $output ;
}
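//Illustrative example (values chosen for illustration only): truncate() drops the extra digits rather than rounding them, and it only counts digits after a "." until the next space, comma or ")". With a precision of 2:
//  truncate(2, "POINT (12.345678 -1.987654)")  returns  "POINT (12.34 -1.98)"
//  truncate(2, "POINT (29.1 -1.5)")            returns  "POINT (29.1 -1.5)"  (already short enough, so unchanged)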
// declare these only once for use in the deaccent function.
// This file must be saved as UTF-8 so that these literals match the UTF-8 input (see mb_internal_encoding above).
// Do not pass them through utf8_encode(): it converts ISO-8859-1 to UTF-8, so on a UTF-8 file it double-encodes
// every accented character and nothing is ever replaced (utf8_encode() is also deprecated as of PHP 8.2).
$search = array("ç","æ","œ","á","é","í","ó","ú","à","è","ì","ò","ù","ä","ë","ï","ö","ü","ÿ","â","ê","î","ô","û","å","e","i","ø","u","Ô","Â","Á","Í","Ó","ñ","Ñ","É");
$replace = array("c","ae","oe","a","e","i","o","u","a","e","i","o","u","a","e","i","o","u","y","a","e","i","o","u","a","e","i","o","u","O","A","A","I","O","n","N","E");
//used to clean refnames
function deaccent($accentedString) {
global $search, $replace;
$deaccented = str_replace($search, $replace, $accentedString);
return $deaccented;
}
// removes surrounding blanks and any decimal part from pcodes
function shrink($pcode){
// remove blanks from pcode
$pcode = trim($pcode);
// drop everything from the first "." onward (e.g. "3.000000" -> "3")
$arr = explode(".", $pcode);
return $arr[0];
}
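//Illustrative examples for shrink() (hypothetical inputs):
//  shrink(" 3.000000")  returns "3"
//  shrink("XX01 ")      returns "XX01"
//Any genuine decimal part is dropped too: shrink("12.5") returns "12", so pcodes that legitimately contain a "." are not supported.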
//--------------DECLARE FIXED PARTS OF URIs--------------------------------------------------------------------
$base_data_uri = "<http://hxl.humanitarianresponse.info/data/" ;
$base_locations_uri = "<http://hxl.humanitarianresponse.info/data/locations/admin/" ;
$ns_uri = "hxl:" ;
$geo_ns_uri = "geo:" ;
$dc_ns_uri = "dc:" ;
$AdminUnit_id = "AdminUnit" ;
$PopulatedPlace_id = "PopulatedPlace" ;
$atLocation_id = "atLocation" ;
$atLevel_id = "atLevel" ;
$inClass_id = "inClass" ;
$pcode_id = "pcode" ;
$featureName_id = "featureName" ;
$featureRefName_id = "featureRefName" ;
$hasGeometry_id = "hasGeometry" ;
$geom_uri = "geom" ;
$hasSerialization_id = "hasSerialization" ;
$wktLiteral_id = "wktLiteral";
$Geometry_id = "Geometry" ;
$DataContainer_id = "DataContainer" ;
$Country_id = "Country" ;
//------------------------START PROCESSING--------------------------------------------
// set the maximum execution time for the script to 5 minutes (increase if necessary):
set_time_limit(300);
// if dcdate is set to "now", generate the time stamp:
if($dcdate == "now") $dcdate = date("c");
$csv_handle = fopen($file_to_process,"r") or exit ("Unable to open file $file_to_process") ;
$output = fopen($output_file_name,"w") or exit ("Unable to create output file $output_file_name") ;
$base_uri = $base_locations_uri . $country_iso3 . "/" ;
//add headers
fwrite($output , "@prefix hxl: <http://hxl.humanitarianresponse.info/ns/#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix dc: <http://purl.org/dc/terms/> .\n") ;
//create DataContainer and associate metadata items
$time = gettimeofday(); //get time for timestamp which is used as datacontainer name (according to the HXL standard for URI patterns)
$timestamp = $time['sec'] . "." . $time['usec'] ;
fwrite ($output , $base_data_uri . "datacontainers" . "/" . "unocha/" . $timestamp . "> a " . $ns_uri . $DataContainer_id . " .\n") ;
fwrite ($output , $base_data_uri . "datacontainers" . "/" . "unocha/" . $timestamp . "> " . $dc_ns_uri . "date " . "\"" . $dcdate . "\"" . " .\n") ;
fwrite ($output , $base_data_uri . "datacontainers" . "/" . "unocha/" . $timestamp . "> " . $ns_uri . "validOn " . "\"" . $validon . "\"" . " .\n") ;
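//For reference, the lines written so far have this shape (illustrative; SEC and USEC stand for the seconds and microseconds returned by gettimeofday(), DCDATE for the value of $dcdate):
//  @prefix hxl: <http://hxl.humanitarianresponse.info/ns/#> .
//  @prefix geo: <http://www.opengis.net/ont/geosparql#> .
//  @prefix dc: <http://purl.org/dc/terms/> .
//  <http://hxl.humanitarianresponse.info/data/datacontainers/unocha/SEC.USEC> a hxl:DataContainer .
//  <http://hxl.humanitarianresponse.info/data/datacontainers/unocha/SEC.USEC> dc:date "DCDATE" .
//  <http://hxl.humanitarianresponse.info/data/datacontainers/unocha/SEC.USEC> hxl:validOn "2011-10-01" .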
/*if (strlen($validityEnd) > 3) //must have at least 4 chars to be a year
{ fwrite ($output , $base_data_uri . "datacontainers" . "/" . "unocha/" . $timestamp . " " . $ns_uri . "validityEnd " . "\"" . $validityEnd . "\"" . " .\n") ;}*/
fgetcsv($csv_handle,0,",") ; //reads and discards the first line
$csvline = 2 ; //set line counter for error reporting
while(!feof($csv_handle)){
$msg = ""; // we'll use this for the error message
$current = fgetcsv($csv_handle,0,",") ;
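//stop at end of file (fgetcsv returns false; count(false) is a TypeError in PHP 8) or at a blank line (usually the last line), if so, exit the while loop
if ($current === false || count($current)==1)
{break;}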
//check for missing data and write an error if found
if ($n == 999) { //handles populated places (not admin units)
$reject = "keep" ;
// check if all required elements are there:
$testarray = array();
// don't test for elements that we don't have anyway:
if($level_n_minus_one_pcode_element != 999)
$testarray["parent pcode (column $level_n_minus_one_pcode_element)"] = $current[$level_n_minus_one_pcode_element];
if($level_n_pcode_element != 999)
$testarray["pcode (column $level_n_pcode_element)"] = $current[$level_n_pcode_element];
// Everything else should be there
$testarray["feature name (column $featureName_element)"] = $current[$featureName_element];
$testarray["feature ref name (column $featureRefName_element)"] = $current[$featureRefName_element];
$testarray["populated place class (column $popPlaceClass_element)"] = $current[$popPlaceClass_element];
$testarray["geometry (column $geom_element)"] = $current[$geom_element];
// treat null and "" as missing, but accept "0"
foreach ($testarray as $label => $value) {
if ((empty($value) && $value != 0) || ($value == "")) {
$reject = "reject" ;
$msg = $msg . " $label is empty; ";
}
}
} elseif ($n == 0) { //handles national boundaries (admin 0)
$reject = 'keep' ;
$testarray = array('$countryPcode (configuration)' => $countryPcode,
'$countryName (configuration)' => $countryName,
"geometry (column $geom_element)" => $current[$geom_element]) ;
// treat null and "" as missing, but accept "0"
foreach ($testarray as $label => $value) {
if ((empty($value) && $value != 0) || ($value == "")) {
$reject = "reject" ;
$msg = $msg . " $label is empty; ";
}
}
} elseif ($n == 1) { //handles admin 1
$reject = 'keep' ;
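$testarray = array('$countryPcode (configuration)' => $countryPcode,
"pcode (column $level_n_pcode_element)" => $current[$level_n_pcode_element],
"feature name (column $featureName_element)" => $current[$featureName_element],
"feature ref name (column $featureRefName_element)" => $current[$featureRefName_element],
"geometry (column $geom_element)" => $current[$geom_element]) ;
// treat null and "" as missing, but accept "0"
foreach ($testarray as $label => $value) {
if ((empty($value) && $value != 0) || ($value == "")) {
$reject = "reject" ;
$msg = $msg . " $label is empty; ";
}
}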
} elseif ($n > 1 && $n < 999) { //handles sub-national boundaries (admin > 1)
$reject = 'keep' ;
$testarray = array("feature name (column $featureName_element)" => $current[$featureName_element],
"feature ref name (column $featureRefName_element)" => $current[$featureRefName_element],
"geometry (column $geom_element)" => $current[$geom_element]) ;
// don't test for elements that we don't have anyway:
if($level_n_minus_one_pcode_element != 999)
$testarray["parent pcode (column $level_n_minus_one_pcode_element)"] = $current[$level_n_minus_one_pcode_element] ;
if($level_n_pcode_element != 999)
$testarray["pcode (column $level_n_pcode_element)"] = $current[$level_n_pcode_element] ;
// treat null and "" as missing, but accept "0"
foreach ($testarray as $label => $value) {
if ((empty($value) && $value != 0) || ($value == "")) {
$reject = "reject" ;
$msg = $msg . " $label is empty; ";
}
}
} else {
echo "Variable n is set to an illegal value (allowed: 0 for the national boundary, an admin level 1, 2, 3, ..., or 999 for populated places). Exiting script.";
fclose($csv_handle) ;
fclose($output) ;
exit() ;
}
if ($reject == "reject") {
echo ("\nRejected line " . $csvline . " due to missing data: " . $msg . " <br />") ;
$csvline++ ;
continue ; //skip this row: no triples are written for rejected lines
}
//generate all the triples for the current line of the CSV
//set up the base URI that is reused in most of the triples
// if the place does not have a p-code, construct the URI from the
// p-code of the containing admin unit, and the place name.
if ($n == 0){
// national boundary: the URI is always built from the configured country pcode ($level_n_pcode_element is ignored)
$admunit_uri = $base_uri . $countryPcode . ">" ;
$admunit_geom_uri = $base_uri . $countryPcode . "/" . $geom_uri . ">";
} elseif($level_n_pcode_element == 999){
$clean_name = mb_strtolower(preg_replace('%[^a-z0-9_-]%six','_',deaccent($current[$featureName_element])));
$admunit_uri = $base_uri . shrink($current[$level_n_minus_one_pcode_element]) . "/" . $clean_name . ">";
$admunit_geom_uri = $base_uri . shrink($current[$level_n_minus_one_pcode_element]) . "/" . $clean_name . "/" . $geom_uri .">";
} else {
$admunit_uri = $base_uri . shrink($current[$level_n_pcode_element]) . ">" ;
$admunit_geom_uri = $base_uri . shrink($current[$level_n_pcode_element]) . "/" . $geom_uri . ">";
}
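//Illustrative URIs produced above (hypothetical values; with $country_iso3 = "cod", $base_uri is <http://hxl.humanitarianresponse.info/data/locations/admin/cod/ ):
//  feature with pcode "XX01":
//    <http://hxl.humanitarianresponse.info/data/locations/admin/cod/XX01>
//  place named "Example Town" without a pcode, inside the unit with pcode "XX01" (non [a-z0-9_-] characters become "_", then lower case):
//    <http://hxl.humanitarianresponse.info/data/locations/admin/cod/XX01/example_town>
//The matching geometry URI appends "/geom", e.g. <http://hxl.humanitarianresponse.info/data/locations/admin/cod/XX01/geom>.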
//create the feature and its basic attributes
if ($n == 0) //handles admin 0 (national boundaries)
{fwrite($output , $admunit_uri . " a " . $ns_uri . $Country_id . " .\n") ;
fwrite($output , $admunit_uri . " " . $ns_uri . $atLevel_id . " " . $base_uri . "adminlevel" . $n . "> .\n") ;
fwrite($output , $admunit_uri . " " . $ns_uri . $pcode_id . " \"" . $countryPcode . "\" .\n") ;
fwrite($output , $admunit_uri . " " . $ns_uri . $featureName_id . " \"" . $countryName . "\" .\n") ;
fwrite($output , $admunit_uri . " " . $ns_uri . $featureRefName_id . " \"" . deaccent($countryName) . "\" .\n") ;
fwrite($output , $admunit_geom_uri . " a " . $geo_ns_uri . $Geometry_id . " .\n") ;
fwrite($output , $admunit_uri . " " . $geo_ns_uri . $hasGeometry_id . " " . $admunit_geom_uri . " .\n") ;
fwrite($output , $admunit_geom_uri . " " . $geo_ns_uri . $hasSerialization_id . " " . "\"" . truncate($precision,$current[$geom_element]) . "\"^^" . $geo_ns_uri . $wktLiteral_id . " .\n") ;
}
elseif ($n == 1) //handles admin 1 (the containing unit is the country, identified by $countryPcode)
{fwrite($output , $admunit_uri . " a " . $ns_uri . $AdminUnit_id . " .\n") ;
fwrite($output , $admunit_uri . " " . $ns_uri . $atLevel_id . " " . $base_uri . "adminlevel" . $n . "> .\n") ;
fwrite($output , $admunit_uri . " " . $ns_uri . $atLocation_id . " " . $base_uri . $countryPcode . "> .\n") ;
fwrite($output , $admunit_uri . " " . $ns_uri . $pcode_id . " \"" . shrink($current[$level_n_pcode_element]) . "\" .\n") ;
fwrite($output , $admunit_uri . " " . $ns_uri . $featureName_id . " \"" . ucwords(mb_strtolower($current[$featureName_element])) . "\" .\n") ;
fwrite($output , $admunit_uri . " " . $ns_uri . $featureRefName_id . " \"" . ucwords(mb_strtolower(deaccent($current[$featureRefName_element]))) . "\" .\n") ;
fwrite($output , $admunit_geom_uri . " a " . $geo_ns_uri . $Geometry_id . " .\n") ;
fwrite($output , $admunit_uri . " " . $geo_ns_uri . $hasGeometry_id . " " . $admunit_geom_uri . " .\n") ;
fwrite($output , $admunit_geom_uri . " " . $geo_ns_uri . $hasSerialization_id . " " . "\"" . truncate($precision,$current[$geom_element]) . "\"^^" . $geo_ns_uri . $wktLiteral_id . " .\n") ;
}
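//Illustrative output of the admin 1 branch above for one hypothetical CSV row with pcode "XX01" and name "EXAMPLE" ($country_iso3 = "cod", $countryPcode = "COD").
//{B} stands for http://hxl.humanitarianresponse.info/data/locations/admin/cod/ and the WKT is shortened:
//  <{B}XX01> a hxl:AdminUnit .
//  <{B}XX01> hxl:atLevel <{B}adminlevel1> .
//  <{B}XX01> hxl:atLocation <{B}COD> .
//  <{B}XX01> hxl:pcode "XX01" .
//  <{B}XX01> hxl:featureName "Example" .
//  <{B}XX01> hxl:featureRefName "Example" .
//  <{B}XX01/geom> a geo:Geometry .
//  <{B}XX01> geo:hasGeometry <{B}XX01/geom> .
//  <{B}XX01/geom> geo:hasSerialization "POLYGON ((...))"^^geo:wktLiteral .
//Note that ucwords() only capitalizes after whitespace by default, so a name such as "NORD-KIVU" becomes "Nord-kivu".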
elseif ($n > 1 && $n < 999) //handles sub-national admin boundaries below admin 1 (admin > 1)
{fwrite($output , $admunit_uri . " a " . $ns_uri . $AdminUnit_id . " .\n") ;
fwrite($output , $admunit_uri . " " . $ns_uri . $atLevel_id . " " . $base_uri . "adminlevel" . $n . "> .\n") ;
if($level_n_minus_one_pcode_element != 999){
fwrite($output , $admunit_uri . " " . $ns_uri . $atLocation_id . " " . $base_uri . shrink($current[$level_n_minus_one_pcode_element]) . "> .\n") ;
}
// only write the p-code if we have one:
if($level_n_pcode_element != 999){
fwrite($output , $admunit_uri . " " . $ns_uri . $pcode_id . " \"" . shrink($current[$level_n_pcode_element]) . "\" .\n") ;
}
fwrite($output , $admunit_uri . " " . $ns_uri . $featureName_id . " \"" . ucwords(mb_strtolower($current[$featureName_element])) . "\" .\n") ;
fwrite($output , $admunit_uri . " " . $ns_uri . $featureRefName_id . " \"" . ucwords(mb_strtolower(deaccent($current[$featureRefName_element]))) . "\" .\n") ;
fwrite($output , $admunit_geom_uri . " a " . $geo_ns_uri . $Geometry_id . " .\n") ;
fwrite($output , $admunit_uri . " " . $geo_ns_uri . $hasGeometry_id . " " . $admunit_geom_uri . " .\n") ;
fwrite($output , $admunit_geom_uri . " " . $geo_ns_uri . $hasSerialization_id . " " . "\"" . truncate($precision,$current[$geom_element]) . "\"^^" . $geo_ns_uri . $wktLiteral_id . " .\n") ;
}
elseif ($n == 999) //handles populated places
{fwrite($output , $admunit_uri . " a " . $ns_uri . $PopulatedPlace_id . " .\n") ;
if($level_n_minus_one_pcode_element != 999){
fwrite($output , $admunit_uri . " " . $ns_uri . $atLocation_id . " " . $base_uri . shrink($current[$level_n_minus_one_pcode_element]) . "> .\n") ;
}
//determine ppl class from pplClass array
$class = array_search($current[$popPlaceClass_element],$pplClass) ;
if ($class !== false){ //array_search returns the matching key (e.g. "_1stOrder") or false
fwrite($output , $admunit_uri . " " . $ns_uri . $inClass_id . " " . $base_uri . "pplclass/" . $class . "> .\n") ;
} else {
fwrite($output , $admunit_uri . " " . $ns_uri . $inClass_id . " " . $base_uri . "pplclass/" . "unknown" . "> .\n") ;
}
// only spit out the p-code if we have one:
if($level_n_pcode_element != 999){
fwrite($output , $admunit_uri . " " . $ns_uri . $pcode_id . " \"" . shrink($current[$level_n_pcode_element]) . "\" .\n") ;
}
fwrite($output , $admunit_uri . " " . $ns_uri . $featureName_id . " \"" . ucwords(mb_strtolower($current[$featureName_element])) . "\" .\n") ;
fwrite($output , $admunit_uri . " " . $ns_uri . $featureRefName_id . " \"" . ucwords(mb_strtolower(deaccent($current[$featureRefName_element]))) . "\" .\n") ;
fwrite($output , $admunit_geom_uri . " a " . $geo_ns_uri . $Geometry_id . " .\n") ;
fwrite($output , $admunit_uri . " " . $geo_ns_uri . $hasGeometry_id . " " . $admunit_geom_uri . " .\n") ;
fwrite($output , $admunit_geom_uri . " " . $geo_ns_uri . $hasSerialization_id . " " . "\"" . truncate($precision,$current[$geom_element]) . "\"^^" . $geo_ns_uri . $wktLiteral_id . " .\n") ;
}
$csvline ++;
}
//close the files
fclose($csv_handle) ;
fclose($output) ;
echo 'Done. Output written to '.$output_file_name.'.';
?>