Disclaimer: Please compare the data layers with the official PDF before using the data in your own project.
Delhi Development Authority recently released the draft land use plan 2041 for Delhi and called for public comments. This repo has some of the geospatial layers that I was able to extract from the PDF map of the draft plan. The commands used for extraction are in the bat file. All the layer names in the PDF were obtained with this command:
ogrinfo draftplan.pdf > layers.txt
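To turn layers.txt into a list of names for scripting, a small parser can help. This is a minimal sketch, assuming ogrinfo wrote its usual summary lines of the form `1: LAYER_NAME (Geometry Type)`; the function name `parse_ogrinfo_layers` and the sample text are illustrative and are not part of this repo.

```python
import re

# Matches summary lines such as "1: BASE_MAP_2041_COMMERCIAL (Polygon)".
# The geometry type in parentheses is optional.
LAYER_LINE = re.compile(r"^\s*(\d+):\s+(.*?)(?:\s+\(([^()]*)\))?\s*$")


def parse_ogrinfo_layers(text):
    """Return (name, geometry_type) tuples found in ogrinfo summary output."""
    layers = []
    for line in text.splitlines():
        match = LAYER_LINE.match(line)
        if match:
            layers.append((match.group(2), match.group(3)))
    return layers


# Illustrative sample only; the real layers.txt may differ in detail.
sample = """INFO: Open of `draftplan.pdf'
      using driver `PDF' successful.
1: BASE_MAP_2041_COMMERCIAL (Polygon)
2: Labels
"""

for name, geom_type in parse_ogrinfo_layers(sample):
    print(name, geom_type)
```

With the real file, read it first, for example `parse_ogrinfo_layers(open("layers.txt").read())`.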
When these layers were imported into QGIS, five of the extracted layers showed a peculiar problem: all of their geometries appeared distorted (image below). This is possibly because of the 'shading' style used to represent the layer when the PDF was made. The challenge was to eliminate the shading lines so that the geometries underneath could be separated out. Using the 'Select features using expression' option, the expression num_points($geometry)=2 selected all the shading lines. Inverting the selection then selected the clean geometries, which were exported to a separate file.
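The same filter can be sketched outside QGIS. The snippet below is only an illustration, assuming the layer has been exported to GeoJSON-style feature dictionaries; `count_points` and `split_shading_lines` are hypothetical helper names, and the two sample features are made up for the demonstration.

```python
def count_points(coords):
    """Count vertices in a (possibly nested) GeoJSON coordinates array."""
    # A single position is a list of numbers, e.g. [x, y].
    if coords and isinstance(coords[0], (int, float)):
        return 1
    return sum(count_points(part) for part in coords)


def split_shading_lines(features):
    """Separate two-vertex features (shading lines) from everything else."""
    shading, clean = [], []
    for feature in features:
        n = count_points(feature["geometry"]["coordinates"])
        (shading if n == 2 else clean).append(feature)
    return shading, clean


# Made-up sample: one two-point shading line and one closed polygon ring.
features = [
    {"geometry": {"type": "LineString",
                  "coordinates": [[0.0, 0.0], [1.0, 1.0]]}},
    {"geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]]}},
]

shading, clean = split_shading_lines(features)
print(len(shading), "shading lines removed;", len(clean), "geometries kept")
```

Like the QGIS expression, this would also drop any genuine two-point line, so it is best compared against the official PDF afterwards.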
An interactive tool to visualise the layers extracted so far, developed by Nikhil VJ, can be viewed here.
- BASE_MAP_2041_COMMERCIAL
- BASE_MAP_2041_INDUSTRIAL
- BASE_MAP_2041_RESIDENTIAL
- BASE_MAP_2041_TRANSPORTATION (split as BASE_MAP_2041_TRANSPORTATION_1 and BASE_MAP_2041_TRANSPORTATION_2)
- BASE_MAP_2041_UTILITY (split as BASE_MAP_2041_UTILITY_1 and BASE_MAP_2041_UTILITY_2)
- BASE_MAP_2041_GREEN_BELT_WATERBODY_1
- BASE_MAP_2041_GOVERNMENT
- BASE_MAP_2041_PSP_1
- BASE_MAP_2041_RECREATIONAL_1
- Boundaries_DDA_ZONE_Boundary
- Boundaries_LUTYEN_BUNGALOW_ZONE
- Boundaries_RESERVED_FOREST_BOUNDARY
- EXTRA_LAYERS_Builtup
- EXTRA_LAYERS_FLOODPLAIN
- EXTRA_LAYERS_LALDORA_GREENBELT_VOID_FILL
- EXTRA_LAYERS_Delhi_UC
- EXTRA_LAYERS_LDRA_VILLAGE
- ROAD_RAILWAY_METRO_LINES_Barapulla_Elevated_Road
- ROAD_RAILWAY_METRO_LINES_DMRC_Metro_Line
- ROAD_RAILWAY_METRO_LINES_HSR_Alignment
- ROAD_RAILWAY_METRO_LINES_NH_buffer
- ROAD_RAILWAY_METRO_LINES_Railway_Lines
- ROAD_RAILWAY_METRO_LINES_RRTS
- ROAD_RAILWAY_METRO_LINES_UER_BUFFER
- ROAD_RAILWAY_METRO_LINES_Roads_ROW_MPD_170321__query_applied
- Labels
Extraction of geospatial data from PDFs almost always results in some form of data loss. Even if all geometries are extracted, the attribute information is lost. For this exercise, I could not extract the 'labels' layer, which contained the names/titles of each geographic feature in the PDF map. I also couldn't extract the category of each feature. For instance, the layer 'Public & Semi Public Facilities' has 8 categories, but these are not retained in the extracted layers.
When we extract without specifying any CRS (Coordinate Reference System), the resulting shapefiles have very large coordinates like this: (716882.20,3166456.28)
Turns out they're projected in a different CRS, but that info wasn't available.
Using this tool, http://projfinder.com/, we were able to identify it: EPSG:32643 (WGS 84 / UTM zone 43N).
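A quick way to convince yourself that EPSG:32643 (WGS 84 / UTM zone 43N) is plausible is to convert the sample coordinate by hand and check that it lands in the Delhi area. The sketch below uses the standard series formulas for the inverse Transverse Mercator projection; the function name `utm_to_latlon` is an assumption for illustration, and this is a sanity check only, not a replacement for the ogr2ogr reprojection.

```python
import math


def utm_to_latlon(easting, northing, zone, northern=True):
    """Approximate inverse UTM on the WGS 84 ellipsoid (degrees out)."""
    a = 6378137.0                 # WGS 84 semi-major axis (m)
    f = 1 / 298.257223563         # WGS 84 flattening
    k0 = 0.9996                   # UTM scale factor
    e2 = f * (2 - f)              # first eccentricity squared
    ep2 = e2 / (1 - e2)           # second eccentricity squared

    x = easting - 500000.0        # remove false easting
    y = northing if northern else northing - 10000000.0

    # Footpoint latitude from the meridian arc length.
    mu = (y / k0) / (a * (1 - e2 / 4 - 3 * e2**2 / 64 - 5 * e2**3 / 256))
    e1 = (1 - math.sqrt(1 - e2)) / (1 + math.sqrt(1 - e2))
    phi1 = (mu
            + (3 * e1 / 2 - 27 * e1**3 / 32) * math.sin(2 * mu)
            + (21 * e1**2 / 16 - 55 * e1**4 / 32) * math.sin(4 * mu)
            + (151 * e1**3 / 96) * math.sin(6 * mu)
            + (1097 * e1**4 / 512) * math.sin(8 * mu))

    sin1, cos1, tan1 = math.sin(phi1), math.cos(phi1), math.tan(phi1)
    n1 = a / math.sqrt(1 - e2 * sin1**2)
    r1 = a * (1 - e2) / (1 - e2 * sin1**2) ** 1.5
    t1 = tan1**2
    c1 = ep2 * cos1**2
    d = x / (n1 * k0)

    lat = phi1 - (n1 * tan1 / r1) * (
        d**2 / 2
        - (5 + 3 * t1 + 10 * c1 - 4 * c1**2 - 9 * ep2) * d**4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1**2 - 252 * ep2 - 3 * c1**2)
        * d**6 / 720)
    lon0 = math.radians((zone - 1) * 6 - 180 + 3)   # central meridian
    lon = lon0 + (
        d
        - (1 + 2 * t1 + c1) * d**3 / 6
        + (5 - 2 * c1 + 28 * t1 - 3 * c1**2 + 8 * ep2 + 24 * t1**2)
        * d**5 / 120) / cos1
    return math.degrees(lat), math.degrees(lon)


lat, lon = utm_to_latlon(716882.20, 3166456.28, zone=43)
print(f"lat={lat:.4f}, lon={lon:.4f}")
```

If the printed latitude and longitude fall within Delhi, the guessed CRS is consistent with the data.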
So, in the ogr2ogr lines we specify the source and destination CRSs, converting the data to conventional lat-longs (EPSG:4326).
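The actual commands are in the bat file. As a rough sketch of what one such call might look like when scripted, the helper below assembles an ogr2ogr command for a single layer using the standard `-f`, `-s_srs` and `-t_srs` options. The function name `build_ogr2ogr_command`, the output file name and the choice of layer are assumptions for illustration.

```python
import shlex


def build_ogr2ogr_command(pdf_path, layer, out_path,
                          s_srs="EPSG:32643", t_srs="EPSG:4326"):
    """Build an ogr2ogr call that extracts one PDF layer and reprojects it."""
    return [
        "ogr2ogr",
        "-f", "ESRI Shapefile",   # output format (shapefile assumed here)
        "-s_srs", s_srs,          # CRS the PDF coordinates are really in
        "-t_srs", t_srs,          # CRS to convert to (lat-long)
        out_path,                 # destination comes before the source
        pdf_path,
        layer,                    # restrict the export to this layer
    ]


cmd = build_ogr2ogr_command(
    "draftplan.pdf",
    "BASE_MAP_2041_COMMERCIAL",
    "BASE_MAP_2041_COMMERCIAL.shp",   # hypothetical output name
)
print(shlex.join(cmd))
# To actually run it (requires GDAL on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list avoids quoting problems with the space in "ESRI Shapefile" when it is passed to `subprocess.run`.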
Quite a few local development authorities have started using GIS to compile the spatial information they collect for planning-related surveys and activities. Tools such as ArcMap (ArcGIS Desktop) or QGIS usually allow exporting these layers as PDF maps. These PDFs are most often geospatial PDFs that retain the spatial information and the vector geometries of the constituent layers. One can view the coordinate information in these PDFs by opening them in software such as Adobe Acrobat and using the 'Tools > Measure > Geospatial Location Tool' option. This is also a quick way of verifying whether a PDF is a geospatial PDF. This ability of the file format to retain spatial information allows the constituent layers to be extracted back from the PDFs for further spatial analysis. However, such an extraction process is still not ideal, since attribute information, such as the labels/names of the features, is difficult to extract. One also has to deal with the projection/coordinate system discrepancies that arise from such extraction.
Till departments start sharing spatial data in usable formats, these workarounds should be of some help.