Commit 6a01ffb

ci: run pre-commit hooks on all files

spwoodcock committed Oct 24, 2023
1 parent 617f360 commit 6a01ffb

Showing 6 changed files with 141 additions and 133 deletions.
12 changes: 6 additions & 6 deletions docs/geofabrik.md
@@ -1,10 +1,10 @@
# Geofabrik

This is a simple utility to download country data files from
-[GeoFabrik](https://download.geofabrik.de/).
+[GeoFabrik](https://download.geofabrik.de/).

-options:
-    --help(-h) show this help message and exit
-    --verbose(-v) verbose output
-    --file(-f) FILE The country or US state to download
-    --list(-l) List all files on GeoFabrik
+options:
+    --help(-h) show this help message and exit
+    --verbose(-v) verbose output
+    --file(-f) FILE The country or US state to download
+    --list(-l) List all files on GeoFabrik
67 changes: 33 additions & 34 deletions docs/overture.md
@@ -1,6 +1,6 @@
# Overture Map Data

-The Overture Foundation (https://www.overturemaps.org) has been
+The Overture Foundation (<https://www.overturemaps.org>) has been
recently formed to build a competitor to Google Maps. The plan is to
use OpenStreetMap (OSM) data as a baselayer, and layer other datasets
on top. The currently available data (July 2023) has 13 different
@@ -40,28 +40,27 @@ less columns in it, and each data type had a schema oriented towards
that data type. The new schema (Oct 2023) is larger, but all the data
types are supported in the same schema.

-The schema used in the Overture data files is [documented here](
-https://docs.overturemaps.org/reference). This document is just a
+The schema used in the Overture data files is [documented here](https://docs.overturemaps.org/reference). This document is just a
summary with some implementation details.

### Buildings

The current list of buildings datasets is:

-* Austin Building Footprints Year 2013 2D Buildings
-* Boston BPDA 3D Buildings
-* City of Cambridge, MA Open Data 3D Buildings
-* Denver Regional Council of Governments 2D Buildings
-* Esri Buildings | Austin Building Footprints Year 2013 2D Buildings
-* Esri Buildings | Denver Regional Council of Governments 2D Buildings
-* Esri Community Maps
-* Miami-Dade County Open Data 3D Buildings
-* OpenStreetMap
-* Microsoft ML Buildings
-* NYC Open Data 3D Buildings
-* Portland Building Footprint 2D Buildings
-* USGS Lidar
-* Washington DC Open Data 3D Buildings
+- Austin Building Footprints Year 2013 2D Buildings
+- Boston BPDA 3D Buildings
+- City of Cambridge, MA Open Data 3D Buildings
+- Denver Regional Council of Governments 2D Buildings
+- Esri Buildings | Austin Building Footprints Year 2013 2D Buildings
+- Esri Buildings | Denver Regional Council of Governments 2D Buildings
+- Esri Community Maps
+- Miami-Dade County Open Data 3D Buildings
+- OpenStreetMap
+- Microsoft ML Buildings
+- NYC Open Data 3D Buildings
+- Portland Building Footprint 2D Buildings
+- USGS Lidar
+- Washington DC Open Data 3D Buildings

Since the Microsoft ML Buildings and the OpenStreetMap data is
available elsewhere, and is more up-to-date for global coverage, all
@@ -78,30 +77,30 @@ accurate.

### Places

-The *places* data are POIs of places. This appears to be for
+The _places_ data are POIs of places. This appears to be for
amenities, and contains tags related to that OSM category. This
dataset is from Meta, and the data appears derived from Facebook.

The columns that are of interest to OSM are:

-* freeform - The address of the amenity, although the format is not
+- freeform - The address of the amenity, although the format is not
consistent
-* socials - An array of social media links for this amenity.
-* phone - The phone number if it has one
-* websites - The website URL if it has one
-* value - The name of the amenity if known
+- socials - An array of social media links for this amenity.
+- phone - The phone number if it has one
+- websites - The website URL if it has one
+- value - The name of the amenity if known

### Highways

-In the current highway *segment* data files, the only source is
+In the current highway _segment_ data files, the only source is
OSM. In that cases it's better to use uptodate OSM data. It'll be
interesting to see if Overture imports the publically available
highway datasets from the USGS, or some state governments. That would
be very useful.

-The Overture *segments* data files are equivalent to an OSM way, with
+The Overture _segments_ data files are equivalent to an OSM way, with
tags specific to that highway linestring. There are separate data
-files for *connections*, that are equivalant to an OSM relation.
+files for _connections_, that are equivalant to an OSM relation.

### Admin Boundaries

@@ -115,21 +114,21 @@ reason to care about these files.
The names column can have 4 variations on the name. Each may also have
a language value as well.

-* common
-* official
-* alternate
-* short
+- common
+- official
+- alternate
+- short

Each of these can have multiple values, each of which consists of a
value and the language.

## sources

The sources column is an array of with two entries. The first entry is
-the name of the dataset, and where it exists, a *recordID* to
+the name of the dataset, and where it exists, a _recordID_ to
reference the source dataset. For OSM data, the recordID has 3
-sub-fields. The first character is the type, *w* (way), *n* (node), or
-*l* (line). The second is the OSM ID, and the third with a *v* is the
+sub-fields. The first character is the type, _w_ (way), _n_ (node), or
+_l_ (line). The second is the OSM ID, and the third with a _v_ is the
version of the feature in OSM.

-For example: *w***123456**v2 is a way with ID 123456 and is version 2.
+For example: \*w**\*123456**v2 is a way with ID 123456 and is version 2.
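
The recordID convention documented in the diff above is easy to split apart programmatically. A minimal sketch (a hypothetical helper for illustration; it is not part of this commit or of osm_rawdata):

```python
import re

def parse_record_id(record_id: str) -> dict:
    """Split an OSM recordID such as 'w123456v2' into its three sub-fields:
    the type (w = way, n = node, l = line), the OSM ID, and the version."""
    match = re.match(r"^([wnl])(\d+)v(\d+)$", record_id)
    if match is None:
        raise ValueError(f"not an OSM recordID: {record_id}")
    osm_type, osm_id, version = match.groups()
    return {"type": osm_type, "id": int(osm_id), "version": int(version)}

# A way with ID 123456 at version 2:
print(parse_record_id("w123456v2"))  # {'type': 'w', 'id': 123456, 'version': 2}
```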
18 changes: 9 additions & 9 deletions docs/postgres.md
@@ -6,12 +6,12 @@ from a local postgres data, or the remote Underpass one. A boundary
polygon is used to define the area to be covered in the
extract. Optionally a data file can be used.

-options:
-    --help(-h) show this help message and exit
-    --verbose(-v) verbose output
-    --uri(-u) URI Database URI
-    --boundary(-b) BOUNDARY Boundary polygon to limit the data size
-    --sql(-s) SQL Custom SQL query to execute against the database
-    --all(-a) ALL All the geometry or just centroids
-    --config(-c) CONFIG The config file for the query (json or yaml)
-    --outfile(-o) OUTFILE The output file
+options:
+    --help(-h) show this help message and exit
+    --verbose(-v) verbose output
+    --uri(-u) URI Database URI
+    --boundary(-b) BOUNDARY Boundary polygon to limit the data size
+    --sql(-s) SQL Custom SQL query to execute against the database
+    --all(-a) ALL All the geometry or just centroids
+    --config(-c) CONFIG The config file for the query (json or yaml)
+    --outfile(-o) OUTFILE The output file
71 changes: 36 additions & 35 deletions osm_rawdata/importer.py
@@ -20,35 +20,31 @@
# <[email protected]>

import argparse
+import concurrent.futures
import logging
import subprocess
import sys
-import os
-import concurrent.futures
-import geojson
-from geojson import Feature, FeatureCollection
-from sys import argv
from pathlib import Path
-from cpuinfo import get_cpu_info
-from shapely.geometry import shape
+from sys import argv
+
+import geojson
import pyarrow.parquet as pq
from codetiming import Timer
-from osm_rawdata.postgres import uriParser
+from cpuinfo import get_cpu_info
from progress.spinner import PixelSpinner
+from shapely import wkb
+from shapely.geometry import shape
from sqlalchemy import MetaData, cast, column, create_engine, select, table, text
from sqlalchemy.dialects.postgresql import JSONB, insert
+from sqlalchemy.engine.base import Connection
from sqlalchemy.orm import sessionmaker
from sqlalchemy_utils import create_database, database_exists
-from sqlalchemy.engine.base import Connection
-from shapely.geometry import Point, LineString, Polygon
-from shapely import wkt, wkb

# Find the other files for this project
import osm_rawdata as rw
import osm_rawdata.db_models
from osm_rawdata.db_models import Base
+from osm_rawdata.postgres import uriParser

rootdir = rw.__path__[0]

@@ -57,71 +53,73 @@

# The number of threads is based on the CPU cores
info = get_cpu_info()
-cores = info['count']
+cores = info["count"]

+
def importThread(
-        data: list,
-        db: Connection,
-        ):
+    data: list,
+    db: Connection,
+):
"""Thread to handle importing
Args:
data (list): The list of tiles to download
db (Connection): A database connection
"""
    # log.debug(f"In importThread()")
-    #timer = Timer(text="importThread() took {seconds:.0f}s")
-    #timer.start()
+    # timer = Timer(text="importThread() took {seconds:.0f}s")
+    # timer.start()
    ways = table(
        "ways_poly",
        column("id"),
        column("user"),
        column("geom"),
        column("tags"),
-        )
+    )

    nodes = table(
        "nodes",
        column("id"),
        column("user"),
        column("geom"),
        column("tags"),
-        )
+    )

    index = 0

    for feature in data:
        # log.debug(feature)
        index -= 1
        entry = dict()
-        tags = feature['properties']
-        tags['building'] = 'yes'
-        entry['id'] = index
+        tags = feature["properties"]
+        tags["building"] = "yes"
+        entry["id"] = index
        ewkt = shape(feature["geometry"])
        geom = wkb.dumps(ewkt)
        type = ewkt.geom_type
        scalar = select(cast(tags, JSONB))

-        if type == 'Polygon':
+        if type == "Polygon":
            sql = insert(ways).values(
                # id = entry['id'],
                geom=geom,
                tags=scalar,
-                )
-        elif type == 'Point':
+            )
+        elif type == "Point":
            sql = insert(nodes).values(
                # id = entry['id'],
                geom=geom,
                tags=scalar,
-                )
+            )

        db.execute(sql)
        # db.commit()

+
def parquetThread(
    data: list,
    db: Connection,
-    ):
+):
"""Thread to handle importing
Args:
@@ -136,15 +134,15 @@ def parquetThread(
column("user"),
column("geom"),
column("tags"),
)
)

nodes = table(
"nodes",
column("id"),
column("user"),
column("geom"),
column("tags"),
)
)

    index = -1
    log.debug(f"There are {len(data)} entries in the data")
@@ -202,6 +200,7 @@ def parquetThread(
# print(f"FIXME2: {entry}")
timer.stop()


class MapImporter(object):
    def __init__(
        self,
@@ -229,7 +228,7 @@ def __init__(
"CREATE EXTENSION IF NOT EXISTS postgis; CREATE EXTENSION IF NOT EXISTS hstore;CREATE EXTENSION IF NOT EXISTS dblink;"
)
self.db.execute(sql)
#self.db.commit()
# self.db.commit()

        Base.metadata.create_all(bind=engine)

@@ -354,8 +353,8 @@ def importGeoJson(
        """
        # load the GeoJson file
        file = open(infile, "r")
-        #size = os.path.getsize(infile)
-        #for line in file.readlines():
+        # size = os.path.getsize(infile)
+        # for line in file.readlines():
        # print(line)
        data = geojson.load(file)

@@ -379,26 +378,27 @@ def importGeoJson(
        meta.create_all(engine)

        # A chunk is a group of threads
-        entries = len(data['features'])
+        entries = len(data["features"])
        chunk = round(entries / cores)

        if entries <= chunk:
-            result = importThread(data['features'], connections[0])
+            result = importThread(data["features"], connections[0])
            timer.stop()
            return True

        with concurrent.futures.ThreadPoolExecutor(max_workers=cores) as executor:
            block = 0
            while block <= entries:
                log.debug("Dispatching Block %d:%d" % (block, block + chunk))
-                result = executor.submit(importThread, data['features'][block : block + chunk], connections[index])
+                result = executor.submit(importThread, data["features"][block : block + chunk], connections[index])
                block += chunk
                index += 1
            executor.shutdown()
        timer.stop()

        return True

+
def main():
"""This main function lets this class be run standalone by a bash script."""
parser = argparse.ArgumentParser(
@@ -441,6 +441,7 @@ def main():
        mi.importParquet(args.infile)
    log.info(f"Imported {args.infile} into {args.uri}")

+
if __name__ == "__main__":
"""This is just a hook so this file can be run standalone during development."""
main()
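
For reference, the importGeoJson() method above fans the input out across one importThread() worker per CPU core, with roughly entries / cores features per chunk and a separate database connection per worker. A minimal usage sketch, assuming the constructor takes a database URI (the full signatures are collapsed in this diff, so the arguments and values shown are assumptions):

```python
from osm_rawdata.importer import MapImporter

# Hypothetical URI and input file, for illustration only.
mi = MapImporter("postgresql://localhost/osmdata")
mi.importGeoJson("buildings.geojson")  # dispatches chunks to a thread per core
```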