Commit 6a01ffb

ci: run pre-commit hooks on all files

spwoodcock committed Oct 24, 2023
1 parent 617f360 commit 6a01ffb

Showing 6 changed files with 141 additions and 133 deletions.
12 changes: 6 additions & 6 deletions docs/geofabrik.md
@@ -1,10 +1,10 @@
# Geofabrik

This is a simple utility to download country data files from
-[GeoFabrik](https://download.geofabrik.de/).
+[GeoFabrik](https://download.geofabrik.de/).

-options:
-    --help(-h) show this help message and exit
-    --verbose(-v) verbose output
-    --file(-f) FILE The country or US state to download
-    --list(-l) List all files on GeoFabrik
+options:
+    --help(-h) show this help message and exit
+    --verbose(-v) verbose output
+    --file(-f) FILE The country or US state to download
+    --list(-l) List all files on GeoFabrik
67 changes: 33 additions & 34 deletions docs/overture.md
@@ -1,6 +1,6 @@
# Overture Map Data

-The Overture Foundation (https://www.overturemaps.org) has been
+The Overture Foundation (<https://www.overturemaps.org>) has been
recently formed to build a competitor to Google Maps. The plan is to
use OpenStreetMap (OSM) data as a baselayer, and layer other datasets
on top. The currently available data (July 2023) has 13 different
@@ -40,28 +40,27 @@ less columns in it, and each data type had a schema oriented towards
that data type. The new schema (Oct 2023) is larger, but all the data
types are supported in the same schema.

-The schema used in the Overture data files is [documented here](
-https://docs.overturemaps.org/reference). This document is just a
+The schema used in the Overture data files is [documented here](https://docs.overturemaps.org/reference). This document is just a
summary with some implementation details.

### Buildings

The current list of buildings datasets is:

-* Austin Building Footprints Year 2013 2D Buildings
-* Boston BPDA 3D Buildings
-* City of Cambridge, MA Open Data 3D Buildings
-* Denver Regional Council of Governments 2D Buildings
-* Esri Buildings | Austin Building Footprints Year 2013 2D Buildings
-* Esri Buildings | Denver Regional Council of Governments 2D Buildings
-* Esri Community Maps
-* Miami-Dade County Open Data 3D Buildings
-* OpenStreetMap
-* Microsoft ML Buildings
-* NYC Open Data 3D Buildings
-* Portland Building Footprint 2D Buildings
-* USGS Lidar
-* Washington DC Open Data 3D Buildings
+- Austin Building Footprints Year 2013 2D Buildings
+- Boston BPDA 3D Buildings
+- City of Cambridge, MA Open Data 3D Buildings
+- Denver Regional Council of Governments 2D Buildings
+- Esri Buildings | Austin Building Footprints Year 2013 2D Buildings
+- Esri Buildings | Denver Regional Council of Governments 2D Buildings
+- Esri Community Maps
+- Miami-Dade County Open Data 3D Buildings
+- OpenStreetMap
+- Microsoft ML Buildings
+- NYC Open Data 3D Buildings
+- Portland Building Footprint 2D Buildings
+- USGS Lidar
+- Washington DC Open Data 3D Buildings

Since the Microsoft ML Buildings and the OpenStreetMap data is
available elsewhere, and is more up-to-date for global coverage, all
@@ -78,30 +77,30 @@ accurate.

### Places

-The *places* data are POIs of places. This appears to be for
+The _places_ data are POIs of places. This appears to be for
amenities, and contains tags related to that OSM category. This
dataset is from Meta, and the data appears derived from Facebook.

The columns that are of interest to OSM are:

-* freeform - The address of the amenity, although the format is not
+- freeform - The address of the amenity, although the format is not
consistent
-* socials - An array of social media links for this amenity.
-* phone - The phone number if it has one
-* websites - The website URL if it has one
-* value - The name of the amenity if known
+- socials - An array of social media links for this amenity.
+- phone - The phone number if it has one
+- websites - The website URL if it has one
+- value - The name of the amenity if known

### Highways

-In the current highway *segment* data files, the only source is
+In the current highway _segment_ data files, the only source is
OSM. In that cases it's better to use uptodate OSM data. It'll be
interesting to see if Overture imports the publically available
highway datasets from the USGS, or some state governments. That would
be very useful.

-The Overture *segments* data files are equivalent to an OSM way, with
+The Overture _segments_ data files are equivalent to an OSM way, with
tags specific to that highway linestring. There are separate data
-files for *connections*, that are equivalant to an OSM relation.
+files for _connections_, that are equivalant to an OSM relation.

### Admin Boundaries

@@ -115,21 +114,21 @@ reason to care about these files.
The names column can have 4 variations on the name. Each may also have
a language value as well.

-* common
-* official
-* alternate
-* short
+- common
+- official
+- alternate
+- short

Each of these can have multiple values, each of which consists of a
value and the language.

## sources

The sources column is an array of with two entries. The first entry is
-the name of the dataset, and where it exists, a *recordID* to
+the name of the dataset, and where it exists, a _recordID_ to
reference the source dataset. For OSM data, the recordID has 3
-sub-fields. The first character is the type, *w* (way), *n* (node), or
-*l* (line). The second is the OSM ID, and the third with a *v* is the
+sub-fields. The first character is the type, _w_ (way), _n_ (node), or
+_l_ (line). The second is the OSM ID, and the third with a _v_ is the
version of the feature in OSM.

-For example: *w***123456**v2 is a way with ID 123456 and is version 2.
+For example: \*w**\*123456**v2 is a way with ID 123456 and is version 2.
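
The recordID convention documented in the diff above is easy to split apart programmatically. A minimal sketch (a hypothetical helper for illustration; it is not part of this commit or of osm_rawdata):

```python
import re

def parse_record_id(record_id: str) -> dict:
    """Split an OSM recordID such as 'w123456v2' into its three sub-fields:
    the type (w = way, n = node, l = line), the OSM ID, and the version."""
    match = re.match(r"^([wnl])(\d+)v(\d+)$", record_id)
    if match is None:
        raise ValueError(f"not an OSM recordID: {record_id}")
    osm_type, osm_id, version = match.groups()
    return {"type": osm_type, "id": int(osm_id), "version": int(version)}

# A way with ID 123456 at version 2:
print(parse_record_id("w123456v2"))  # {'type': 'w', 'id': 123456, 'version': 2}
```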
18 changes: 9 additions & 9 deletions docs/postgres.md
@@ -6,12 +6,12 @@ from a local postgres data, or the remote Underpass one. A boundary
polygon is used to define the area to be covered in the
extract. Optionally a data file can be used.

-options:
-    --help(-h) show this help message and exit
-    --verbose(-v) verbose output
-    --uri(-u) URI Database URI
-    --boundary(-b) BOUNDARY Boundary polygon to limit the data size
-    --sql(-s) SQL Custom SQL query to execute against the database
-    --all(-a) ALL All the geometry or just centroids
-    --config(-c) CONFIG The config file for the query (json or yaml)
-    --outfile(-o) OUTFILE The output file
+options:
+    --help(-h) show this help message and exit
+    --verbose(-v) verbose output
+    --uri(-u) URI Database URI
+    --boundary(-b) BOUNDARY Boundary polygon to limit the data size
+    --sql(-s) SQL Custom SQL query to execute against the database
+    --all(-a) ALL All the geometry or just centroids
+    --config(-c) CONFIG The config file for the query (json or yaml)
+    --outfile(-o) OUTFILE The output file
71 changes: 36 additions & 35 deletions osm_rawdata/importer.py
@@ -20,35 +20,31 @@
# <[email protected]>

import argparse
+import concurrent.futures
import logging
import subprocess
import sys
-import os
-import concurrent.futures
-import geojson
-from geojson import Feature, FeatureCollection
-from sys import argv
from pathlib import Path
-from cpuinfo import get_cpu_info
-from shapely.geometry import shape
+from sys import argv
+
+import geojson
import pyarrow.parquet as pq
from codetiming import Timer
-from osm_rawdata.postgres import uriParser
+from cpuinfo import get_cpu_info
from progress.spinner import PixelSpinner
+from shapely import wkb
+from shapely.geometry import shape
from sqlalchemy import MetaData, cast, column, create_engine, select, table, text
from sqlalchemy.dialects.postgresql import JSONB, insert
+from sqlalchemy.engine.base import Connection
from sqlalchemy.orm import sessionmaker
from sqlalchemy_utils import create_database, database_exists
-from sqlalchemy.engine.base import Connection
-from shapely.geometry import Point, LineString, Polygon
-from shapely import wkt, wkb

# Find the other files for this project
import osm_rawdata as rw
import osm_rawdata.db_models
from osm_rawdata.db_models import Base
+from osm_rawdata.postgres import uriParser

rootdir = rw.__path__[0]

@@ -57,71 +53,73 @@

# The number of threads is based on the CPU cores
info = get_cpu_info()
-cores = info['count']
+cores = info["count"]

+
def importThread(
-        data: list,
-        db: Connection,
-        ):
+    data: list,
+    db: Connection,
+):
"""Thread to handle importing
Args:
data (list): The list of tiles to download
db (Connection): A database connection
"""
    # log.debug(f"In importThread()")
-    #timer = Timer(text="importThread() took {seconds:.0f}s")
-    #timer.start()
+    # timer = Timer(text="importThread() took {seconds:.0f}s")
+    # timer.start()
    ways = table(
        "ways_poly",
        column("id"),
        column("user"),
        column("geom"),
        column("tags"),
-        )
+    )

    nodes = table(
        "nodes",
        column("id"),
        column("user"),
        column("geom"),
        column("tags"),
-        )
+    )

    index = 0

    for feature in data:
        # log.debug(feature)
        index -= 1
        entry = dict()
-        tags = feature['properties']
-        tags['building'] = 'yes'
-        entry['id'] = index
+        tags = feature["properties"]
+        tags["building"] = "yes"
+        entry["id"] = index
        ewkt = shape(feature["geometry"])
        geom = wkb.dumps(ewkt)
        type = ewkt.geom_type
        scalar = select(cast(tags, JSONB))

-        if type == 'Polygon':
+        if type == "Polygon":
            sql = insert(ways).values(
                # id = entry['id'],
                geom=geom,
                tags=scalar,
-                )
-        elif type == 'Point':
+            )
+        elif type == "Point":
            sql = insert(nodes).values(
                # id = entry['id'],
                geom=geom,
                tags=scalar,
-                )
+            )

        db.execute(sql)
        # db.commit()

+
def parquetThread(
    data: list,
    db: Connection,
-    ):
+):
"""Thread to handle importing
Args:
@@ -136,15 +134,15 @@ def parquetThread(
column("user"),
column("geom"),
column("tags"),
)
)

nodes = table(
"nodes",
column("id"),
column("user"),
column("geom"),
column("tags"),
)
)

    index = -1
    log.debug(f"There are {len(data)} entries in the data")
@@ -202,6 +200,7 @@ def parquetThread(
# print(f"FIXME2: {entry}")
timer.stop()


class MapImporter(object):
    def __init__(
        self,
@@ -229,7 +228,7 @@ def __init__(
"CREATE EXTENSION IF NOT EXISTS postgis; CREATE EXTENSION IF NOT EXISTS hstore;CREATE EXTENSION IF NOT EXISTS dblink;"
)
self.db.execute(sql)
#self.db.commit()
# self.db.commit()

        Base.metadata.create_all(bind=engine)

@@ -354,8 +353,8 @@ def importGeoJson(
        """
        # load the GeoJson file
        file = open(infile, "r")
-        #size = os.path.getsize(infile)
-        #for line in file.readlines():
+        # size = os.path.getsize(infile)
+        # for line in file.readlines():
        # print(line)
        data = geojson.load(file)

@@ -379,26 +378,27 @@ def importGeoJson(
        meta.create_all(engine)

        # A chunk is a group of threads
-        entries = len(data['features'])
+        entries = len(data["features"])
        chunk = round(entries / cores)

        if entries <= chunk:
-            result = importThread(data['features'], connections[0])
+            result = importThread(data["features"], connections[0])
            timer.stop()
            return True

        with concurrent.futures.ThreadPoolExecutor(max_workers=cores) as executor:
            block = 0
            while block <= entries:
                log.debug("Dispatching Block %d:%d" % (block, block + chunk))
-                result = executor.submit(importThread, data['features'][block : block + chunk], connections[index])
+                result = executor.submit(importThread, data["features"][block : block + chunk], connections[index])
                block += chunk
                index += 1
            executor.shutdown()
        timer.stop()

        return True

+
def main():
"""This main function lets this class be run standalone by a bash script."""
parser = argparse.ArgumentParser(
@@ -441,6 +441,7 @@ def main():
        mi.importParquet(args.infile)
    log.info(f"Imported {args.infile} into {args.uri}")

+
if __name__ == "__main__":
"""This is just a hook so this file can be run standalone during development."""
main()
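
For reference, the importGeoJson() method above fans the input out across one importThread() worker per CPU core, with roughly entries / cores features per chunk and a separate database connection per worker. A minimal usage sketch, assuming the constructor takes a database URI (the full signatures are collapsed in this diff, so the arguments and values shown are assumptions):

```python
from osm_rawdata.importer import MapImporter

# Hypothetical URI and input file, for illustration only.
mi = MapImporter("postgresql://localhost/osmdata")
mi.importGeoJson("buildings.geojson")  # dispatches chunks to a thread per core
```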