Merge pull request #10 from hotosm/mvum
Add conversion utilities for external highway datasets
Showing 4 changed files with 573 additions and 105 deletions.
# Utility Programs

Conflator includes a few Bourne shell scripts used for bulk processing of
data files. Because of the poor performance when processing huge data
files, they're split up into manageable pieces. There are several
assumptions made, namely that the OSM data is in a postgres database,
imported via [these instructions](conflation.md). The building footprint
file can be either in a database, or the downloaded GeoJson formatted
data file.

Since all of the output files need to be web accessible, these scripts
are usually run in the directory containing the data files. Each
country should be in a separate directory. These scripts take two
command line arguments: the country to be processed, and optionally a
single project to process. By default all projects are processed. If a
consistent naming convention is used, the basename of the current
directory is used to try to guess the proper country name. That same
value is also used to identify the correct database or data file name.

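The naming convention can be sketched in Python. This is a hypothetical helper, not part of conflator; it just shows how the directory basename doubles as the country, database, and data file name:

```python
from pathlib import Path

def guess_country(directory: str) -> str:
    """Guess the country name from the basename of a data directory.

    Illustrative sketch of the convention described above; the shell
    scripts' actual logic may differ.
    """
    return Path(directory).name.lower()

# "~www/Africa/Kenya" yields "kenya", which also matches
# kenya-latest.pbf, kenya.geojsonl, and the database name.
```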
### For example

    ~www/Africa/Kenya
    ~www/Africa/Kenya/kenya-latest.pbf
    ~www/Africa/Kenya/kenya.geojsonl
    ~www/Africa/Nigeria
    ~www/Africa/Nigeria/nigeria-latest.pbf
    ~www/Africa/Nigeria/nigeria.geojsonl
    ~www/Asia/Nepal
    ~www/Asia/Nepal/...

# Tasking Manager Projects

Since the import data is huge, the Tasking Manager is used to
validate the results of conflation. To further reduce the data size,
the project boundaries are downloaded from the Tasking Manager using
its remote API. These are then saved to disk using the naming
convention *12345-project.geojson*, where **12345** is a project
ID. The project boundaries can be downloaded using the
[splitter.py](splitter.md) program, which is part of conflator. A
boundary can be downloaded like this:

> PATH/splitter.py -p 12345

Since a big import requires multiple Tasking Manager projects, download
all of the boundaries for this import before getting started.

## clipsrc.sh

This script extracts all the buildings in the specified country into a
data file. This assumes all the data has already been imported into
postgres. Since there are usually multiple countries imported into
postgres, this extracts just the ones we want for further processing.

This generates two output files from the database, named after the
country and postfixed by the data source, for example *kenya-osm.geojson*
and *kenya-ms.geojson*. These data files are then split into
smaller files based on a Tasking Manager project boundary. Each of the
smaller files follows the same naming convention, *12345-osm.geojson*
or *12345-ms.geojson*.

> PATH/clipsrc.sh kenya

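The clipping idea can be sketched in Python. The real script clips against the true project polygon in postgres; this stdlib-only toy keeps point features inside a bounding box, and the helper name is hypothetical:

```python
def clip_to_bbox(features: list, bbox: tuple) -> list:
    """Keep only GeoJson point features inside bbox = (minx, miny, maxx, maxy).

    Simplified stand-in for the polygon clipping clipsrc.sh does in postgres.
    """
    minx, miny, maxx, maxy = bbox
    kept = []
    for feature in features:
        x, y = feature["geometry"]["coordinates"]
        if minx <= x <= maxx and miny <= y <= maxy:
            kept.append(feature)
    return kept
```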
## update.sh

This script processes the project sized data files for the best
performance. Once again, it looks for any files that follow the naming
convention, and runs the [conflation script](conflator.md) on each of
the project boundaries. This generates a single output file, containing
buildings from the footprint file that are not already in OSM. This
file is *12345-buildings.geojson*.

> PATH/update.sh nigeria

## index.sh

This script generates a simple webpage to navigate all the data files,
so they can be manually downloaded for validation. This script should
be run in the directory with all the data files. The first section is
just the projects from the Tasking Manager; the rest are all the
smaller files for each project. Each project has 3 generated data
files: the two raw data files produced from the database, and the
conflated building output.

> ./index.sh

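A minimal sketch of what such an index page amounts to, in Python rather than shell (the function is hypothetical and the real page layout differs):

```python
from pathlib import Path

def make_index(directory: str) -> str:
    """Build a minimal HTML listing of the GeoJson files in a directory.

    Illustrative stand-in for index.sh.
    """
    items = [f'<li><a href="{p.name}">{p.name}</a></li>'
             for p in sorted(Path(directory).glob("*.geojson"))]
    return "<html><body><ul>" + "".join(items) + "</ul></body></html>"
```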
## splittasks.sh

This utility splits an existing data file of building conflation
results into smaller pieces. If the project ID is specified on the
command line, only that project is processed. Otherwise the current
directory is scanned for files using the naming convention of
${projectid}-tasks.geojson. The X and Y coordinates of the task at the
default zoom level are then used to uniquely name the data file so the
Tasking Manager can load it.

> PATH/splittasks.sh [project ID]

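The "X and Y coordinates at the default zoom level" are standard slippy-map tile numbers. The usual formula can be sketched as follows (the script's exact file-naming scheme is not reproduced here):

```python
import math

def tile_xy(lon: float, lat: float, zoom: int) -> tuple:
    """Web-mercator tile coordinates for a point at a given zoom level."""
    n = 2 ** zoom  # number of tiles along each axis
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y
```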
## getosm.sh

This utility downloads smaller data files than are available
from GeoFabrik. It requires a boundary polygon from a Tasking Manager
project. If the project ID is specified on the command line, only that
project is downloaded. Otherwise the current directory is scanned for
files using the naming convention of ${projectid}-projects.geojson.

> PATH/getosm.sh [project ID]

To conflate external datasets with OSM, the external data needs to be
converted to the OSM tagging schema; otherwise comparing tags gets
very convoluted. Since every dataset uses a different schema, a few
utility programs for converting external datasets are included. Currently
the only datasets supported are for highways. These datasets are available
from the [USDA](https://www.usda.gov/), and have an appropriate license to
use with OpenStreetMap. Indeed, some of this data has already been
imported. The files are available from the
[FSGeodata Clearinghouse](https://data.fs.usda.gov/geodata/edw/datasets.php?dsetCategory=transportation).

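The kind of conversion involved can be sketched like this. The field names and helper are illustrative only, not the actual USDA schema handling (mvum.py below deals with the real fields):

```python
def usda_to_osm(record: dict) -> dict:
    """Map a USDA-style record onto OSM tags.

    Hypothetical sketch: keep only the reference number and name,
    discarding the many fields OSM doesn't need.
    """
    tags = {}
    if record.get("ID"):
        tags["ref:usfs"] = f"FR {record['ID']}"
    if record.get("NAME"):
        tags["name"] = record["NAME"].title()
    return tags
```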
Most of the fields in the dataset aren't needed for OSM, only the
reference number, if it has one, and the name. Most of these highways
are already in OSM, but it's a bit of a mess, and mostly
unvalidated. Most of the problems are related to the TIGER import
in 2007. So the goal of these utilities is to aid the [TIGER
fixup](https://wiki.openstreetmap.org/wiki/TIGER_fixup) work by
updating or adding the name and a reference number. These utilities
prepare the dataset for conflation.

There are other fields in the datasets we might want, like surface
type, whether it's 4wd only, etc., but often the OSM data is more up to
date. And to really get that right, you need to ground truth it.

## mvum.py

This converts the [Motor Vehicle Use Map (MVUM)](https://data.fs.usda.gov/geodata/edw/edw_resources/shp/S_USA.Road_MVUM.zip)
dataset, which contains data on highways more suitable for offroad
vehicles. Some require specialized offroad vehicles like a UTV or
ATV. The data in OSM for these roads is really poor. Often the
reference number is wrong, or lacks the suffix. We assume the USDA data
is correct when it comes to name and reference number, and this will
get handled later by conflation.

## roadcore.py

This converts the [Road Core](https://data.fs.usda.gov/geodata/edw/edw_resources/shp/S_USA.RoadCore_FS.zip)
vehicle map, which contains data on all highways in a national
forest. It's similar to the MVUM dataset.

## Trails.py

This converts the [NPSPublish](https://data.fs.usda.gov/geodata/edw/edw_resources/shp/S_USA.TrailNFS_Publish.zip)
Trail dataset. These are hiking trails not open to motor
vehicles. Currently much of this dataset has empty fields, but the
trail name and reference number are useful. This utility supports the
OpenStreetMap US [Trails Initiative](https://openstreetmap.us/our-work/trails/).
#!/usr/bin/python3

# Copyright (c) 2021, 2022, 2023, 2024 Humanitarian OpenStreetMap Team
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.

import argparse
import asyncio
import logging
import warnings

import geojson
from geojson import Feature, FeatureCollection

# Instantiate logger
log = logging.getLogger(__name__)

# Shut off FutureWarning noise from dependencies
warnings.simplefilter(action='ignore', category=FutureWarning)

class MVUM(object):
    def __init__(self,
                 filespec: str = None,
                 ):
        self.file = None
        if filespec is not None:
            self.file = open(filespec, "r")

    def convert(self,
                filespec: str = None,
                ) -> FeatureCollection:
        # FIXME: read in the whole file for now
        if filespec is not None:
            file = open(filespec, "r")
        else:
            file = self.file

        data = geojson.load(file)

        highways = list()
        for entry in data["features"]:
            if entry is None or entry["properties"] is None:
                continue
            geom = entry["geometry"]
            op = None
            props = dict()
            if "ID" in entry["properties"]:
                props["ref:usfs"] = f"FR {entry['properties']['ID']}"
            if "NAME" in entry["properties"] and entry["properties"]["NAME"] is not None:
                title = entry["properties"]["NAME"].title()
                # Expand some common abbreviations
                name = title.replace(" Cr ", " Creek ")
                name = name.replace(" Cg ", " Campground ")
                name = name.replace(" Rd. ", " Road ")
                name = name.replace(" Mtn", " Mountain")
                if "Road" not in name:
                    props["name"] = f"{name} Road"
                else:
                    props["name"] = name
            if "OPERATIONA" in entry["properties"] and entry["properties"]["OPERATIONA"] is not None:
                op = int(entry["properties"]["OPERATIONA"][:1])
                if op == 1:
                    props["access"] = "no"
                elif op == 2:
                    props["smoothness"] = "very_bad"
                elif op == 3:
                    props["smoothness"] = "good"
                elif op == 4:
                    props["smoothness"] = "bad"
                elif op == 5:
                    props["smoothness"] = "excellent"

            # if "SBS_SYMBOL" in entry["properties"] and op is None:
            #     if "Not Maintained for" in entry["properties"]["SBS_SYMBOL"]:
            #         props["smoothness"] = "very_bad"
            if "SURFACETYP" in entry["properties"]:
                surface = entry["properties"]["SURFACETYP"]
                if surface is None:
                    continue
                if surface[:3] == "NAT":
                    props["surface"] = "dirt"
                elif surface[:3] == "IMP" or surface[:5] == "CSOIL":
                    props["surface"] = "compacted"
                elif surface[:3] == "AGG":
                    props["surface"] = "gravel"
                elif surface[:2] == "AC":
                    props["surface"] = "gravel"
                elif surface[:3] == "BST" or surface[:2] == "P ":
                    props["surface"] = "paved"

            highways.append(Feature(geometry=geom, properties=props))

        return FeatureCollection(highways)


async def main():
    """This main function lets this class be run standalone by a bash script"""
    parser = argparse.ArgumentParser(
        prog="mvum",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description="This program converts MVUM highway data into OSM tagging",
        epilog="""
This program processes the MVUM data. It will convert the MVUM dataset
to the OSM tagging schema so it can be conflated. Abbreviations are
discouraged in OSM, so they are expanded. Most entries in the MVUM
dataset are ignored. For fixing the TIGER mess, all that is relevant
are the name and the USFS reference number. The surface and smoothness
tags are also converted, but should never override what is in OSM, as the
OSM values for these may be more recent. And the values change over time,
so what is in the MVUM dataset may not be accurate. These tags are converted
primarily as an aid to navigation when ground-truthing, since it's usually
good to avoid any highway with a smoothness of "very_bad" or worse.

For example:
    mvum.py -v -c -i WY_RoadsMVUM.geojson
        """,
    )
    parser.add_argument("-v", "--verbose", action="store_true", help="verbose output")
    parser.add_argument("-i", "--infile", required=True, help="The MVUM dataset to convert")
    parser.add_argument("-c", "--convert", action="store_true", help="Convert MVUM features to OSM features")
    parser.add_argument("-o", "--outfile", default="out.geojson", help="Output file")

    args = parser.parse_args()

    if args.verbose:
        logging.basicConfig(level=logging.DEBUG)

    mvum = MVUM()
    if args.convert:
        data = mvum.convert(args.infile)
        with open(args.outfile, "w") as file:
            geojson.dump(data, file)
        log.info(f"Wrote {args.outfile}")


if __name__ == "__main__":
    """This is just a hook so this file can be run standalone during development."""
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(main())