Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

29 clip osm #34

Merged
merged 123 commits into from
Aug 22, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
123 commits
Select commit Hold shift + click to select a range
7b138f1
chore: Add ipykernel to deps
Jun 21, 2023
6b29978
chore: Add reqs for scraping GTFS lookup table
Jun 28, 2023
7eaffe9
feat: Func to ingest GTFS route_type table from urls
Jun 28, 2023
5bf382f
refactor: Dynamic column name lookup
Jun 28, 2023
ac0ae02
refactor: Reduce num of variables
Jun 28, 2023
2a4a433
feat: Defensive checking & refactoring for complexity hook
Jun 28, 2023
f7c2844
docs: Add flag guidance to test module
Jun 29, 2023
e8f5607
docs: Typos
Jun 29, 2023
5acc6e4
chore: Test fixture for route_type lookup
Jun 29, 2023
a29ea96
chore: Adjust pickle ignore rules for single named exception
Jun 29, 2023
bcfd84c
chore: Setup runinteg pytest flag
Jun 29, 2023
f91f508
test: Integration test check scraped html table is stable
Jun 29, 2023
ac0c179
test: Defensive checks raise as expected
Jun 29, 2023
ca73900
chore: Add test flag, Sergio suggested fix
Jun 29, 2023
3652ae7
Merge branch 'dev' into 15-gtfs-validation-pipeline
Jun 29, 2023
ddfe587
refactor: Helper function easier to mock requests
Jun 29, 2023
62d2925
chore: Add mocking dependencies
Jun 29, 2023
fbf32d8
test: Check expected lookup is returned
Jun 29, 2023
a63a51c
refactor: Use class to handle mocked return values
Jun 29, 2023
c50c266
fix: Catch the final gtfs description as not succeeded by br tag
Jun 29, 2023
c168573
fix: Update response value for new func logic
Jun 29, 2023
ab205bf
test: Table scraped correctly with extended schema format
Jun 29, 2023
40186dc
fix: Update test fixture lookup with corrected format
Jun 29, 2023
c20a94a
chore: Delete placeholder script
Jun 29, 2023
2ed8ec3
chore: Add toml to deps
Jul 3, 2023
75f8c77
feat: First iteration of validation class
Jul 3, 2023
953e1b2
style: Consistent use of speechmarks
Jul 3, 2023
26dc30c
feat: Pipeline to check, clean, visualise GTFS
Jul 3, 2023
3d443ec
feat: Helper print statements in pipeline
Jul 3, 2023
1650c90
test: Test class init defensive behaviour
Jul 3, 2023
dfc3b21
feat: Class defence implementation
Jul 3, 2023
1c06fd0
test: Assertions about feed instantiated attribute
Jul 3, 2023
5a56e6f
test: Passes on conversion from UK meters to m
Jul 3, 2023
8555382
test: Check dimensions of validity_df attribute
Jul 3, 2023
37cb97b
feat: Defensive checks for print_alerts method
Jul 3, 2023
ad778bb
test: Defensive behaviour of print_alerts method
Jul 3, 2023
a39fc34
fix: Use hasattr to check for attribute existence
Jul 3, 2023
a987d72
test: print_alert prints single error without truncation
Jul 3, 2023
3568572
test: print_alerts prints multiple alerts without truncation
Jul 3, 2023
e895a95
refactor: Tests use function scope fixture
Jul 3, 2023
e2b031f
fix: Filter pkg_resources warnings
Jul 4, 2023
61f9afa
refactor: Rm unused **kwargs
Jul 4, 2023
7fbcd49
test: Defensive behaviour for viz_stops
Jul 4, 2023
04dd63f
refactor: Helper function checks path-likes
Jul 4, 2023
15c8ef1
viz_stops defense implementation
Jul 4, 2023
fabbbbe
refactor: Unified save statement
Jul 4, 2023
c5fd6c1
test: viz_stops behaviour when mapping points
Jul 4, 2023
e593b3d
test: Fileops on viz_plot when plot convex hull
Jul 4, 2023
9d57d9f
test: Internal helper renders text as required
Jul 4, 2023
42f7180
refactor: Minor text formatting
Jul 4, 2023
7320cee
test: Expected table format for get_route_modes
Jul 4, 2023
3778229
feat: Defensive checks for summarise_weekday
Jul 4, 2023
e773ca4
test: Defensive behaviour for summarise_weekday
Jul 4, 2023
a2278a3
feat: Defensive implementation for summarise_weekday
Jul 4, 2023
6484092
refactor: Move integration test into more appropriate test module
Jul 4, 2023
de72c6b
test: Expensive test for test_summarise_weekday_defence with pytest flag
Jul 4, 2023
fc0dfc6
test: Check mockers called as expected
Jul 5, 2023
102f812
refactor: Mock scrape call when test get_route_modes
Jul 5, 2023
a4ddb9e
feat: Pipeline optionally prints performance profiling
Jul 5, 2023
1e83b01
refactor: Move utility defensive checkers to dedicated module
Jul 6, 2023
10bacf6
feat: Added optional performance testing to pipeline
Jul 6, 2023
dbf76cc
refactor: Move more utilities to defence module
Jul 6, 2023
7fc2cd9
test: Func returns None on pass
Jul 7, 2023
9d9881e
test: bbox_filter behaviour on pass
Jul 7, 2023
184ccde
feat: Func filters GTFS based on bbox
Jul 7, 2023
b0aa256
feat: Class now prints calendar date range
Jul 7, 2023
62b60ab
feat: Defence check list & elements optionally
Jul 7, 2023
f00dabb
feat: Pipeline checks GTFS archive exists & extracts available dates …
Jul 7, 2023
50a8987
refactor: Minor string styling in STOP_HULL_PTH
Jul 7, 2023
703140c
test: Defensive behaviour for utility func
Jul 10, 2023
de17234
refactor: Avoid complexity hook
Jul 10, 2023
85c771f
test: Assert refcatored message calls
Jul 10, 2023
67f1b4e
feat: Handle attribute error
Jul 10, 2023
c2554fd
test: Exception handle when no alerts exist
Jul 10, 2023
691d88d
test: Raises error on missing parent dir when create=False
Jul 10, 2023
b2ac376
test: Simulated condition where shape_id missing from shapes.txt
Jul 10, 2023
5341729
test: viz_stops handles cases with missing stops_id
Jul 10, 2023
212be70
refactor: Add assertion fail messages to all tests
Jul 10, 2023
cae3757
refactor: Add print statements to assertion fails
Jul 10, 2023
f9308cf
chore: Add more assertion fail print statements
Jul 10, 2023
616c3e7
refactor: Test expected columns
Jul 10, 2023
c17668e
chore: Add assertion failure message
Jul 10, 2023
23467cd
chore: Add pyrojroot to deps
Jul 10, 2023
b2fff03
refactor: Run pytest with no runsetup check
Jul 10, 2023
e52a6c4
chore: Try fixing np version
Jul 10, 2023
bb886d5
refactor: Class name syntax
Jul 11, 2023
8b809e8
chore: Merge dev breaking changes & update import statements
Jul 19, 2023
6215a1e
feat: Func writes filtered osm pbf
Jul 25, 2023
55698ff
refactor: Exception statements
Jul 25, 2023
4989f78
chore: Rm placeholder module
Jul 25, 2023
434b962
refactor: Defensive bool checks
Jul 25, 2023
5628d50
feat: Create out_pth parent directory if not exist
Jul 25, 2023
9f6db33
refactor: Handle cases where relative path has no explicit parent dir…
Jul 25, 2023
4b93c34
feat: Optionally tries to install osmosis
Jul 25, 2023
8e677b7
refactor: _is_expected_filetype internal func
Jul 25, 2023
ba4465d
feat: Defensive check paths in and out
Jul 25, 2023
646fae3
refactor: Bool check all boolean inputs
Jul 25, 2023
040635a
feat: Defensive check lat & long values
Jul 25, 2023
73ec169
fix: Pytest doesn't like test modules with same name
Jul 25, 2023
f054283
test: Defensive check pbf_pth
Jul 25, 2023
8ade074
test: Boolean defense
Jul 25, 2023
cfa9717
test: Defensive checks on bbox
Jul 25, 2023
38c6018
test: Out_pth defense
Jul 25, 2023
3f4c81d
refactor: Numbering of error messages
Jul 25, 2023
277ab35
test: Mocked missing osmosis raises Exception
Jul 25, 2023
5d1e0cd
test: Integration test with osmosis
Jul 26, 2023
892b70e
chore: Add print statement for test failure
Jul 26, 2023
5a08467
chore: Add osmosis & runinteg tests to CI workflow
Jul 26, 2023
883874d
Merge branch 'dev' into 29-clip-osm
Jul 26, 2023
8f50c65
chore: Seperate build workflow for integration tests relying on homeb…
Jul 26, 2023
7561bc6
chore: Rename workflow file
Jul 26, 2023
e36db0f
test: Default osmosis cmd starts as expected
Jul 26, 2023
6dea0e5
refactor: Defense on missing osmosis moved to own test func
Jul 26, 2023
efbd3be
test: Print statement encountered on missing osmosis
Jul 26, 2023
124d239
fix: Reinstate runinteg mark
Jul 26, 2023
696c8cd
chore: Try install-package hook
Jul 27, 2023
402d27b
chore: Rm surplus workflow file
Jul 27, 2023
9cb601d
chore: Avoid re-executing unmarked tests
Jul 27, 2023
91158fe
resolved merge conflicts with dev
ethan-moss Aug 17, 2023
91bbdb4
fix: using refactored gtfs type defence
ethan-moss Aug 17, 2023
a0dbea9
fix: gtfs file extension change during refactoring
ethan-moss Aug 17, 2023
a6d6fe6
fix: removed duplicate files caused by refactoring
ethan-moss Aug 17, 2023
872b045
chore: changed default PATH in 01-validate-gtfs.toml to test fixture
SergioRec Aug 22, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 9 additions & 1 deletion .github/workflows/python-package.yml
Original file line number Diff line number Diff line change
Expand Up @@ -40,13 +40,21 @@ jobs:
- name: Check Java Install
run: |
java --version
- name: Install Osmosis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instructions to install osmosis need to be added to CONTRIBUTING.md.
#79

uses: ConorMacBride/install-package@v1
with:
brew: osmosis
apt: osmosis
- name: Run Integration Tests
run: |
pytest -m runinteg --runinteg # run only those tests marked runinteg
- name: pre-commit
run: |
pre-commit install
pre-commit run --all-files
- name: Test with pytest
run: |
pytest --runsetup
pytest
- name: Generate Report
run: |
coverage run -m pytest
Expand Down
4 changes: 2 additions & 2 deletions pipeline/gtfs/01-validate-gtfs.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
import subprocess

from transport_performance.gtfs.validation import GtfsInstance
from transport_performance.utils.defence import _is_gtfs_pth
from transport_performance.utils.defence import _is_expected_filetype

CONFIG = toml.load(here("pipeline/gtfs/config/01-validate-gtfs.toml"))
GTFS_PTH = here(CONFIG["GTFS"]["PATH"])
Expand All @@ -26,7 +26,7 @@
HULL_MAP_PATH = CONFIG["MAPS"]["STOP_HULL_PTH"]
PROFILING = CONFIG["UTILS"]["PROFILING"]
# check GTFS Path exists
_is_gtfs_pth(pth=GTFS_PTH, param_nm="GTFS_PTH", check_existing=True)
_is_expected_filetype(pth=GTFS_PTH, param_nm="GTFS_PTH", check_existing=True)
# Get the disk usage of the GTFS file.
gtfs_du = (
subprocess.check_output(["du", "-sh", GTFS_PTH]).split()[0].decode("utf-8")
Expand Down
2 changes: 1 addition & 1 deletion pipeline/gtfs/config/01-validate-gtfs.toml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
title = "Config for GTFS Validation Pipeline"

[GTFS]
PATH = "data/external/croppednewport-bus-07-07-2022_gtfs.zip"
PATH = "tests/data/newport-20230613_gtfs.zip"
UNITS = "m"
GEOMETRIC_CRS = 27700 # used for area calculations only

Expand Down
2 changes: 1 addition & 1 deletion requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,8 @@ beautifulsoup4
requests
pytest-mock
toml
rasterio
pyprojroot
rasterio
matplotlib
plotly
nbformat>=4.2.0
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,10 @@
from shapely.geometry import box
from pyprojroot import here

from transport_performance.utils.defence import _is_gtfs_pth, _check_list
from transport_performance.utils.defence import (
_is_expected_filetype,
_check_list,
)


def bbox_filter_gtfs(
Expand Down Expand Up @@ -38,8 +41,10 @@ def bbox_filter_gtfs(
None

"""
_is_gtfs_pth(pth=in_pth, param_nm="in_pth")
_is_gtfs_pth(pth=out_pth, param_nm="out_pth", check_existing=False)
_is_expected_filetype(pth=in_pth, param_nm="in_pth")
_is_expected_filetype(
pth=out_pth, param_nm="out_pth", check_existing=False
)
_check_list(ls=bbox_list, param_nm="bbox_list", exp_type=float)
for param in [units, crs]:
if not isinstance(param, str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this function is going to be used to crop GTFS based on a provided box, we should rename the local variable newport_feed to something generic.

#78

Expand Down
2 changes: 1 addition & 1 deletion src/transport_performance/gtfs/routes.py
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,7 @@ def scrape_route_type_lookup(
for url in [gtfs_url, ext_spec_url]:
_url_defence(url)

_bool_defence(extended_schema)
_bool_defence(extended_schema, "extended_schema")
# Get the basic scheme lookup
resp_txt = _get_response_text(gtfs_url)
soup = BeautifulSoup(resp_txt, "html.parser")
Expand Down
4 changes: 2 additions & 2 deletions src/transport_performance/gtfs/validation.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@

from transport_performance.gtfs.routes import scrape_route_type_lookup
from transport_performance.utils.defence import (
_is_gtfs_pth,
_is_expected_filetype,
_check_namespace_export,
_check_parent_dir_exists,
)
Expand Down Expand Up @@ -91,7 +91,7 @@ class GtfsInstance:
def __init__(
self, gtfs_pth=here("tests/data/newport-20230613_gtfs.zip"), units="m"
):
_is_gtfs_pth(pth=gtfs_pth, param_nm="gtfs_pth")
_is_expected_filetype(pth=gtfs_pth, param_nm="gtfs_pth")

# validate units param
if not isinstance(units, str):
Expand Down
108 changes: 108 additions & 0 deletions src/transport_performance/osm/osm_utils.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
"""Utility functions for OSM files."""
import subprocess
from pyprojroot import here

from transport_performance.utils.defence import (
_bool_defence,
_check_list,
_check_parent_dir_exists,
_is_expected_filetype,
)


def filter_osm(
pbf_pth=here("tests/data/newport-2023-06-13.osm.pbf"),
out_pth="filtered.osm.pbf",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default of out_pth is only the filename, so the output will be saved in root. Could this be changed to outputs/osm?
#80

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If using a path in the out_pth argument, using backslashes in the string is not recognised as a path and instead creates a file with that character or whatever escape character happens in the filename. Using double slash to prevent escape character just creates a file with the slash in the name.

Will add to #61 as it's related to helper function _check_parent_dir_exists.

bbox=[-3.01, 51.58, -2.99, 51.59],
tag_filter=True,
install_osmosis=False,
):
"""Filter an osm.pbf file to a bbox. Relies on homebrew with osmosis.

Parameters
----------
pbf_pth: ((str, pathlib.PosixPath), optional)
Path to the open street map pbf to be filtered. Defaults to
here("tests/data/newport-2023-06-13.osm.pbf").
out_pth: ((str, pathlib.PosixPath), optional)
Path to write to. Defaults to "filtered.osm.pbf".
bbox: (list, optional)
Bounding box used to perform the filter, in left, bottom, right top
order. Defaults to [-3.01, 51.58, -2.99, 51.59].
tag_filter: (bool, optional)
Should non-highway ways be filtered? Excludes waterway, landuse &
natural. Defaults to True.
install_osmosis: (bool, optional)
Should brew be used to install osmosis if not found. Defaults to False.

Raises
------
Exception: Subprocess error.
Exception: Osmosis not found. Will not raise if install_osmosis=True.

Returns
-------
None

"""
# defence
_is_expected_filetype(pbf_pth, param_nm="pbf_pth", exp_ext=".pbf")
_is_expected_filetype(
out_pth, param_nm="out_pth", exp_ext=".pbf", check_existing=False
)
for nm, val in {
"tag_filter": tag_filter,
"install_osmosis": install_osmosis,
}.items():
_bool_defence(val, param_nm=nm)
# check bbox values makes sense, else osmosis will error
if not bbox[0] < bbox[2]:
raise ValueError(
(
f"Bounding box longitude West {bbox[0]}"
f" is not smaller than East {bbox[2]}"
)
)
elif not bbox[1] < bbox[3]:
raise ValueError(
(
f"Bounding box latitude South {bbox[1]}"
f" is not smaller than North {bbox[3]}"
)
)

_check_list(bbox, param_nm="bbox", check_elements=True, exp_type=float)
_check_parent_dir_exists(out_pth, param_nm="out_pth", create=True)
# Compile the osmosis command
cmd = [
"osmosis",
"--read-pbf",
pbf_pth.as_posix(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PosixPath() will likely not work in Windows machines. According to the pathlib documentation, the method .as_posix returns a string representation of the path with forward slashes. Perhaps worth exploring a more OS agnostic method for this.
#81

"--bounding-box",
f"left={bbox[0]}",
f"bottom={bbox[1]}",
f"right={bbox[2]}",
f"top={bbox[3]}",
]
if tag_filter: # optionaly filter ways
print("Rejecting ways: waterway, landuse & natural.")
cmd.extend(["--tf", "reject-ways", "waterway=* landuse=* natural=*"])

cmd.extend(["--used-node", "--write-pbf", out_pth])

try:
subprocess.run(cmd, check=True)
print(f"Filter completed. Written to {out_pth}")
except subprocess.CalledProcessError as e1:
raise Exception(f"Error executing osmosis command: {e1}")
except FileNotFoundError as e2:
if install_osmosis:
print(f"osmosis command was not recognised: {e2}. Trying install.")
subprocess.run(["brew", "install", "osmosis"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subprocess.run(["brew", "install", "osmosis"]) will only run in macOS systems. We could add a more system agnostic way of doing this? E.g. check OS and use an if statement to run different commands depending on OS.
#82

print("Installation of `osmosis successful.`")
subprocess.run(cmd, check=True)
print(f"Retry filter pbf completed. Written to {out_pth}")
else:
raise Exception("`osmosis` is not found. Please install.")

return None
20 changes: 12 additions & 8 deletions src/transport_performance/utils/defence.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,8 @@ def _is_path_like(pth, param_nm):

def _check_parent_dir_exists(pth, param_nm, create=False):
_is_path_like(pth, param_nm)
# realpath helps to catch cases where relative paths are passed in main
pth = os.path.realpath(pth)
parent = os.path.dirname(pth)
if not os.path.exists(parent):
if create:
Expand All @@ -43,8 +45,8 @@ def _check_parent_dir_exists(pth, param_nm, create=False):
return None


def _is_gtfs_pth(pth, param_nm, check_existing=True):
"""Handle file paths that should be existing GTFS feeds.
def _is_expected_filetype(pth, param_nm, check_existing=True, exp_ext=".zip"):
"""Handle file paths that should be existing filetypes.

Parameters
----------
Expand All @@ -53,13 +55,15 @@ def _is_gtfs_pth(pth, param_nm, check_existing=True):
param_nm : str
The name of the parameter being tested. Helps with debugging.
check_existing : bool
Whether to check if the GTFS file already exists. Defaults to True.
Whether to check if the filetype file already exists. Defaults to True.
exp_ext: str
The expected filetype.

Raises
------
TypeError: `pth` is not either of string or pathlib.PosixPath.
FileExistsError: `pth` does not exist on disk.
ValueError: `pth` does not have a `.zip` file extension.
ValueError: `pth` does not have the expected file extension.

Returns
-------
Expand All @@ -71,9 +75,9 @@ def _is_gtfs_pth(pth, param_nm, check_existing=True):
_, ext = os.path.splitext(pth)
if check_existing and not os.path.exists(pth):
raise FileExistsError(f"{pth} not found on file.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message is unclear.

>>> filter_osm(pbf_pth = 'folder/subfolder/blah.pbf')
FileExistsError: folder/subfolder/blah.pbf not found on file.

Something like <file> not found would read better.

#83

if ext != ".zip":
if ext != exp_ext:
raise ValueError(
f"`gtfs_pth` expected a zip file extension. Found {ext}"
f"`{param_nm}` expected file extension {exp_ext}. Found {ext}"
)

return None
Expand Down Expand Up @@ -109,11 +113,11 @@ def _url_defence(url):
return None


def _bool_defence(some_bool):
def _bool_defence(some_bool, param_nm):
"""Defence checking. Not exported."""
if not isinstance(some_bool, bool):
raise TypeError(
f"`extended_schema` expected boolean. Got {type(some_bool)}"
f"`{param_nm}` expected boolean. Got {type(some_bool)}"
)

return None
Expand Down
2 changes: 1 addition & 1 deletion tests/gtfs/test_utils.py → tests/gtfs/test_gtfs_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
import os
import pytest

from transport_performance.gtfs.utils import bbox_filter_gtfs
from transport_performance.gtfs.gtfs_utils import bbox_filter_gtfs
from transport_performance.gtfs.validation import GtfsInstance


Expand Down
2 changes: 1 addition & 1 deletion tests/gtfs/test_validation.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ def test_init_defensive_behaviours(self):
# a case where file is found but not a zip directory
with pytest.raises(
ValueError,
match=r"`gtfs_pth` expected a zip file extension. Found .pbf",
match=r"`gtfs_pth` expected file extension .zip. Found .pbf",
):
GtfsInstance(
gtfs_pth=here("tests/data/newport-2023-06-13.osm.pbf")
Expand Down
Loading