Commit
Merge pull request #17 from cal-itp/feat-14-version1
Feat 14 version1
Showing 12 changed files with 180 additions and 126 deletions.
@@ -0,0 +1,48 @@

```yaml
name: CI

on:
  push:
  release:
    types: [ published ]

jobs:
  checks:
    name: "Run Tests"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: Set up Pre-commit
        uses: pre-commit/[email protected]
  release:
    name: "Release to PyPI"
    runs-on: ubuntu-latest
    needs: checks
    if: "github.event_name == 'release' && startsWith(github.event.release.tag_name, 'v')"
    steps:

      - uses: actions/checkout@v2
      - name: "Set up Python"
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: "Build package"
        run: |
          python setup.py build sdist
      - name: "TEST Upload to PyPI"
        uses: pypa/gh-action-pypi-publish@release/v1
        if: github.event.release.prerelease
        with:
          user: __token__
          password: ${{ secrets.PYPI_TEST_API_TOKEN }}
          repository_url: https://test.pypi.org/legacy/

      - name: "Upload to PyPI"
        uses: pypa/gh-action-pypi-publish@release/v1
        if: "!github.event.release.prerelease"
        with:
          user: __token__
          password: ${{ secrets.PYPI_API_TOKEN }}
```
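The release job's conditions decide whether, and where, a package gets published. The sketch below restates that gating logic in plain Python purely as an illustration; the function name `upload_target` and its return labels are hypothetical, and it does not model the `needs: checks` dependency (the release job only starts after `checks` succeeds). It assumes GitHub's documented behaviour that the expression function `startsWith()` is case-insensitive.

```python
from typing import Optional


def upload_target(event_name: str, tag_name: str, prerelease: bool) -> Optional[str]:
    """Return which index the release job would publish to, or None if it is skipped."""
    # Job-level `if`: only release events whose tag starts with "v".
    # GitHub's startsWith() ignores case, so compare on the lower-cased tag.
    if event_name != "release" or not tag_name.lower().startswith("v"):
        return None
    # Step-level `if`s: prereleases go to TestPyPI, full releases to PyPI.
    return "testpypi" if prerelease else "pypi"


for event in [("push", "v0.1.0", False), ("release", "v0.1.0", True), ("release", "v0.1.0", False)]:
    print(event, "->", upload_target(*event))
```

Because both upload steps test the same `prerelease` flag with opposite conditions, exactly one of them runs for any release that passes the job-level check.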
@@ -1,51 +1,54 @@

````markdown
# Feed Checker
# GTFS Aggregator Checker

This repo is to verify that a given list of feeds is listed in feed aggregators.
Currently it checks transit.land and transitfeeds.com to verify that feeds are
listed in an aggregator.

## Installation

## Requirements
```
pip install gtfs-aggregator-checker
```

* `.env` - Acquire an [api key from transitland][1] and save it to a `.env` file
like `TRANSITLAND_API_KEY=SECRET`. Alternatively you can prefix commands with
the api key like `TRANSITLAND_API_KEY=SECRET python feed_checker.py [...]`.
## Configure

* `agencies.yml` - This file can have any structure as the feed checker just
looks for any urls (strings starting with `'http://'`), but the intended usage
is a [Cal-ITP agencies.yml file][2]. (to run the program without an
`agencies.yml` file, see the "Options" section below)
The following env variables can be set in a `.env` file, set to the environment,
or inline like `TRANSITLAND_API_KEY=SECRET python -m gtfs_aggregator_checker`.

## Getting Started
* `TRANSITLAND_API_KEY` An [api key from transitland][1].

To install requirements and check urls run the following. The first time you run
this it will take a while since the cache is empty.
* `GTFS_CACHE_DIR` Folder to save cached files to. Defaults to
`~/.cache/gtfs-aggregator-checker`

``` bash
pip install -r requirements.txt
python feed_checker.py
```
## Getting Started

The final line of stdout will tell how many urls were in `agencies.yml` and how
many of those were matched in a feed. Above that it will list the domains for
each url (in alphabetical order) as well as group paths based on if the path was
matched (in both `agencies.yml` and aggregator), missing (in `agencies.yml` but
not aggregator) or unused (in aggregator but not in `agencies.yml`). An ideal
outcome would mean the missing column is empty for all domains.
## CLI Usage

`python -m gtfs_aggregator_checker [YAML_FILE] [OPTIONS]`

## CLI Usage
`python -m gtfs_aggregator_checker` or `python -m gtfs_aggregator_checker
/path/to/yml` will search a [Cal-ITP agencies.yml file][2] for any urls and see
if they are present in any of the feed aggregators. Alternatively you can use a
`--csv-file` or `--url` instead of an `agencies.yml` file.

`python feed_checker.py` or `python feed_checker.py /path/to/yml` will search a
[Cal-ITP agencies.yml file][2] for any urls and see if they are present in any
of the feed aggregators.
The final line of stdout will tell how many urls were in `agencies.yml` and how
many of those were matched in a feed.

### Options
* `python feed_checker.py --help` print the help
* `--csv-file agencies.csv` load a csv instead of a Cal-ITP agencies yaml file (one url per line)
* `--url http://example.com` Check a single url instead of a Cal-ITP agencies yaml file
* `--verbose` Print a table of all results (organized by domain)
* `python -m gtfs_aggregator_checker --help` print the help
* `--csv-file agencies.csv` load a csv instead of a Cal-ITP agencies yaml file
(one url per line)
* `--url http://example.com` Check a single url instead of a Cal-ITP agencies
yaml file
* `--output /path/to/file.json` Save the results as a json file

[1]: https://www.transit.land/documentation/index#signing-up-for-an-api-key
[2]: https://github.com/cal-itp/data-infra/blob/main/airflow/data/agencies.yml

## Development

Clone this repo and `pip install -e /path/to/feed-checker` to develop locally.

By default, downloaded files (raw html files, api requests) will be saved to
`~/.cache/calitp_gtfs_aggregator_checker`. This greatly reduces the time
required to run the script. Delete this folder to reset the cache.
````
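The README says the checker searches an `agencies.yml` file "for any urls", regardless of its structure. A minimal sketch of that kind of structure-agnostic search over an already-parsed YAML document is shown below. The function name `find_urls`, the rule that a URL is any string starting with `http://` or `https://`, and the sample data are all assumptions for illustration, not the package's actual implementation.

```python
from typing import Any, Iterator


def find_urls(node: Any) -> Iterator[str]:
    """Yield every string value in a parsed YAML document that looks like a URL."""
    if isinstance(node, str):
        # Assumed rule: treat http:// and https:// strings as URLs.
        if node.startswith(("http://", "https://")):
            yield node
    elif isinstance(node, dict):
        # Only values are searched; mapping keys are ignored.
        for value in node.values():
            yield from find_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_urls(item)


# Hypothetical data loosely shaped like an agencies.yml entry.
doc = {
    "example-agency": {
        "agency_name": "Example Transit",
        "feeds": [
            {"gtfs_schedule_url": "https://example.com/gtfs.zip"},
            {"gtfs_rt_vehicle_positions_url": "http://example.com/vp"},
        ],
    }
}
print(list(find_urls(doc)))
```

With PyYAML (already listed in `install_requires`), a real file could be fed in as `find_urls(yaml.safe_load(f))`. Recursing over dicts and lists is what makes the search independent of the file's layout.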
@@ -0,0 +1,15 @@

```python
import typer

from . import check_feeds


def main(
    yml_file=typer.Argument("agencies.yml", help="A yml file containing urls"),
    csv_file=typer.Option(None, help="A csv file (one url per line)"),
    url=typer.Option(None, help="URL to check instead of a file",),
    output=typer.Option(None, help="Path to a file to save output to."),
):
    check_feeds(yml_file=yml_file, csv_file=csv_file, url=url, output=output)


typer.run(main)
```
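Typer derives the command-line interface from the signature of `main`: the `typer.Argument` with a default becomes an optional positional argument, and each `typer.Option` becomes a flag whose name has underscores turned into dashes (so `csv_file` becomes `--csv-file`). For readers more familiar with the standard library, here is a rough `argparse` equivalent of that interface. It is an illustrative translation only, not code from this repository, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gtfs_aggregator_checker")
    # Optional positional argument, like typer.Argument("agencies.yml", ...).
    parser.add_argument("yml_file", nargs="?", default="agencies.yml",
                        help="A yml file containing urls")
    # The parameter csv_file is exposed as --csv-file; argparse maps it back to csv_file.
    parser.add_argument("--csv-file", default=None, help="A csv file (one url per line)")
    parser.add_argument("--url", default=None, help="URL to check instead of a file")
    parser.add_argument("--output", default=None, help="Path to a file to save output to.")
    return parser


args = build_parser().parse_args(["--url", "http://example.com"])
print(args.yml_file, args.csv_file, args.url, args.output)
```

Typer's version needs no parser boilerplate because the function signature already carries the names, defaults and help strings.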
@@ -0,0 +1,44 @@

```python
import os
from pathlib import Path
import urllib.error
import urllib.request

from .utils import url_split


def get_cache_dir():
    if "GTFS_CACHE_DIR" in os.environ:
        path = Path(os.environ["GTFS_CACHE_DIR"])
    else:
        path = Path.home() / ".cache/gtfs-aggregator-checker"
    path.mkdir(exist_ok=True, parents=True)
    return path


def get_cached(key, func, directory=None):
    if not directory:
        directory = get_cache_dir()
    path = directory / key
    if not path.exists():
        content = func()
        with open(path, "w") as f:
            f.write(content)
    with open(path, "r") as f:
        return f.read()


def curl_cached(url, key=None):
    domain, path = url_split(url)
    if key is None:
        key = path.replace("/", "__")
    if len(key) > 255:
        key = key[:255]  # max filename length is 255

    def get():
        req = urllib.request.Request(url)
        r = urllib.request.urlopen(req)
        return r.read().decode()

    path = get_cache_dir() / domain
    path.mkdir(exist_ok=True, parents=True)
    return get_cached(key, get, directory=path)
```
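`curl_cached` stores each response under a per-domain folder, using the URL path (with `/` replaced by `__`, capped at 255 characters) as the file name, and `get_cached` only calls the fetch function when that file does not exist yet. The self-contained sketch below exercises the same read-through pattern against a temporary directory. It assumes `url_split` behaves roughly like `urllib.parse.urlsplit` (domain plus path); `cache_key` and `fake_fetch` are hypothetical helpers, and no network request is made.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlsplit


def cache_key(url: str) -> tuple:
    """Hypothetical stand-in for url_split plus the key derivation in curl_cached."""
    parts = urlsplit(url)
    key = parts.path.replace("/", "__")[:255]  # keep within a 255-character file name
    return parts.netloc, key


def get_cached(key, func, directory: Path) -> str:
    """Read-through cache: call func only if the cache file is missing."""
    path = directory / key
    if not path.exists():
        path.write_text(func())
    return path.read_text()


calls = []


def fake_fetch() -> str:
    calls.append(1)  # count how often the "network" is hit
    return "<html>feed list</html>"


with tempfile.TemporaryDirectory() as tmp:
    domain, key = cache_key("https://transitfeeds.com/p/example/123")
    folder = Path(tmp) / domain
    folder.mkdir(parents=True)
    first = get_cached(key, fake_fetch, folder)
    second = get_cached(key, fake_fetch, folder)

print(domain, key, len(calls))
```

Under this assumption, URLs that differ only in their query string would map to the same cache file. Note also that the README's Development section names `~/.cache/calitp_gtfs_aggregator_checker`, whereas `get_cache_dir()` defaults to `~/.cache/gtfs-aggregator-checker` (or `GTFS_CACHE_DIR` when set).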
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,31 @@

```python
#!/usr/bin/env python

import re
from setuptools import setup, find_namespace_packages

_version_re = re.compile(r"__version__\s+=\s+(.*)")

with open("gtfs_aggregator_checker/__init__.py", "r") as f:
    version = _version_re.search(f.read()).group(1).strip("'\"")

with open("README.md", "r") as f:
    long_description = f.read()

setup(
    name="gtfs_aggregator_checker",
    version=version,
    packages=find_namespace_packages(),
    install_requires=[
        "beautifulsoup4",
        "python-dotenv",
        "PyYAML",
        "requests",
        "typer",
    ],
    description="Tool for checking if transit urls are on aggregator websites",
    long_description=long_description,
    long_description_content_type="text/markdown",
    author="",
    author_email="",
    url="https://github.com/cal-itp/gtfs-aggregator-checker",
)
```