diff --git a/README.md b/README.md
index 109eeaa8e..4359e5b76 100644
--- a/README.md
+++ b/README.md
@@ -2,43 +2,16 @@

 GTFS data quality reports for California transit providers

-## Repository structure
+#### Repository structure

 This repository is set up in two pieces:

 - `reports/` subfolder - generates underlying GTFS data for each report.
 - `website/` subfolder - uses `generate.py` and `../templates/` to create the static reports website.

-## Generating the reports
+## To Get Started

-See [this screencast](https://www.loom.com/share/b45317053ff54b9fbb46b8159947c379) for a full walkthrough of building the reports.
-
-#### Generating Reports Data
-
-The following steps are run within the `reports` folder.
-
-- `make generate_parameters` runs the `generate_ids.py` file which generates:
-  1. `outputs/index_report.json` - a file that lists every agency name and `outputs/YYYY/MM` folder
-  2. `outputs/YYYY/MM` for every agency
-- `make MONTH=02 YEAR=2023 all -j 15` runs the following commands:
-  1. `python generate_reports_data.py -v --f outputs/YYYY/MM/AGENCY_NUM/1_file_info.json generates json files in `outputs/YYYY/MM/AGENCY_NUM/` directories
-
-The files in each `outputs/YYYY/MM/AGENCY_NUM/` directory are used to generate the static HTML (see below).
-
-All report data for every month can be generated by running: ``python run_all_months.py``.
-
-#### Building the website
-
-To build the website, run ``npm run build`` in the ``website`` folder. Run ``npm run dev`` for verbose output, which can help with troubleshooting. These commands perform the following:
-
-- Python script `website/generate.py` loads JSON from the `reports/outputs/YYYY/MM/ITPID/data` directory and applies it to template files in `/templates`
-- HTML templates written with [Jinja](https://jinja.palletsprojects.com/en/3.0.x/)
-- CSS written with [SCSS](https://sass-lang.com/documentation/syntax#scss) and [Tailwind](https://tailwindcss.com/docs) via [PostCSS](https://postcss.org/)
-- JS behavior added with [Alpine.js](https://alpinejs.dev)
-  - Bundled with [Rollup](https://rollupjs.org/guide/en/)
-- Build scripts via [NPM](https://www.npmjs.com/)
-
-### Set up google cloud credentials
+### Set up Google Cloud credentials

 Set up [google cloud authentication credentials](https://cloud.google.com/docs/authentication/getting-started).

@@ -49,15 +22,15 @@ Specifically, download the SDK/CLI at the above link, install it, create a new t

 Note that with a user account authentication, the environment variable `CALITP_SERVICE_KEY_PATH` should be unset.

-### Running Locally
+### To Run Locally

-#### Virtual environment
+#### with a Virtual Environment

 1. `source .venv/bin/activate` to activate Python virtual environment
 2. `pip install -r requirements.txt` to download Python dependencies
 3. `npm install` to download npm dependencies

-### Running via Docker-compose
+#### with Docker-compose

 Note that the folder also contains a `docker-compose.yml`, so it is possible to run the build inside docker by running these commands first. In this case, docker first needs to be [installed locally](https://docs.docker.com/get-docker/), setting resources as desired (i.e. enable 6 cores if you have an 8 core machine, etc).

@@ -69,7 +42,11 @@ docker-compose run --rm --service-ports calitp_reports /bin/bash

 If google credentials are already configured on the host, the local credential files should already be mounted in the container, but it may only be necessary to run `gcloud auth application-default login` from within the container.

-### Executing Report Generation
+## Executing Report Generation
+
+See [this screencast](https://www.loom.com/share/b45317053ff54b9fbb46b8159947c379) for a full walkthrough of building the reports.
+
+### Generating the Reports Data

 The following takes place within the reports subfolder, i.e. (`cd reports`).

@@ -79,7 +56,7 @@ When looking for a clean start (i.e. start from scratch) run:
 make clean
 ```

-#### Fetching report data
+#### Fetch existing report data

 Run the gsutil rsync to update all the locally stored reports.
 Note that `gtfs-data-test` can be replaced with `gtfs-data` for testing on production data:
@@ -87,24 +64,27 @@ Note that `gtfs-data-test` can be replaced with `gtfs-data` for testing on produ
 gsutil -m rsync -r gs://gtfs-data-test/report_gtfs_schedule outputs
 ```

-#### Generating reports
-Next, start the report generation:
+#### Generate the index file and create the outputs folder structure
+`make generate_parameters` runs the `generate_ids.py` file, which generates:
+  1. `outputs/index_report.json` - a file that lists every agency name and `outputs/YYYY/MM` folder
+  2. `outputs/YYYY/MM` for every agency
+
+#### Generate the report data
+`make MONTH=02 YEAR=2023 all -j 15` runs the following commands:
+  1. `python generate_reports_data.py -v --f outputs/YYYY/MM/AGENCY_NUM/1_file_info.json`, which generates JSON files in the `outputs/YYYY/MM/AGENCY_NUM/` directories

-```shell
-make generate_parameters
-make MONTH=02 YEAR=2023 all -j 15
-```
 Where:
 * the number after `MONTH=` is the desired numerical month (`02` in this case)
 * the number after `YEAR=` is the desired numerical YEAR (`2023` in this case)
 * the number after `-j` is the number of parallel threads (`15` in this case)

-This will create data for one month within the reports/outputs folder.
+The files in each `outputs/YYYY/MM/AGENCY_NUM/` directory are used to generate the static HTML (see below).
+
+This will create data for one month within the reports/outputs folder. All report data for every month can be generated by running: ``python run_all_months.py``.

 **NOTE** that the MONTH refers to the month of the folders that will be generated. This is different than the ``publish_date``, which is the first day of the next month for a given report. I.e. ``make MONTH=02 YEAR=2023 all -j 15`` will create ``outputs/2023/02/*`` folders, whereas the ``publish_date`` for the data in those folders is ``2023-03-01``.

-Note that running too many threads (i.e. parallel queries, such as `30` or more) may not complete successfully if many other BigQuery queries are happening simultaneously: [BigQuery has a limit of 100 concurrent queries](https://cloud.google.com/bigquery/quotas).
-If this is the case, try rerunning with fewer threads (i.e. `make all -j 8`).
+**NOTE** that running too many threads (i.e. parallel queries, such as `30` or more) may not complete successfully if many other BigQuery queries are happening simultaneously: [BigQuery has a limit of 100 concurrent queries](https://cloud.google.com/bigquery/quotas). If this is the case, try rerunning with fewer threads (e.g. `make all -j 8`).

 #### Validating the report creation

@@ -122,17 +102,29 @@ If there is a missing month, an individual month can be run with the following c
 python generate_reports_data.py -v --f outputs/YYYY/MM/AGENCY_NUM/1_file_info.json
 ```

-### Build website
+#### Testing
+
+Tests can be run locally from the ``tests`` directory by running ``python test_report_data.py``. These tests are run on commits through a GitHub Action.

-Once every single report is generated, navigate to the website subfolder (i.e. `cd ../website`), install the npm dependencies, and build the website.
+### Building the website
+
+Once the report data has been generated, navigate to the website subfolder (i.e. `cd ../website`), install the npm dependencies if you haven't done so already, and build the website.

 ```shell
 npm install
 npm run build
 ```

-This will run the script in generate.py that will render the index.html, monthly report index pages, and the individual reports.
-It will also apply the various jinja templates to the reports, JS frameworks, and CSS styles. It is worth mentioning that `npm run build` will currently only execute if you have data from previous months.
+These commands perform the following:
+
+- Python script `website/generate.py` loads JSON from the `reports/outputs/YYYY/MM/ITPID/data` directory and applies it to template files in `/templates`
+- HTML templates written with [Jinja](https://jinja.palletsprojects.com/en/3.0.x/)
+- CSS written with [SCSS](https://sass-lang.com/documentation/syntax#scss) and [Tailwind](https://tailwindcss.com/docs) via [PostCSS](https://postcss.org/)
+- JS behavior added with [Alpine.js](https://alpinejs.dev)
+  - Bundled with [Rollup](https://rollupjs.org/guide/en/)
+- Build scripts via [NPM](https://www.npmjs.com/)
+
+It is worth mentioning that `npm run build` currently only completes successfully if data from previous months is present. Run ``npm run dev`` for verbose output and to see which month is failing, which can help with troubleshooting.

 Note that the error:
 ```shell
@@ -140,7 +132,7 @@ jinja2.exceptions.UndefinedError: 'feed_info' is undefined
 ```
-Is often due to a lack of generated reports. This can be remedied for prior months by rsyncing the reports from the upstream source (see [Fetching report data](#fetching-report-data)), and ensuring every single ITPID has a corresponding generated report for the current month (see [Generating reports](#generating-reports)).
+Is often due to a lack of generated reports. This can be remedied for prior months by rsyncing the reports from the upstream source (see [Fetch existing report data](#fetch-existing-report-data)), and ensuring every single ITPID has a corresponding generated report for the current month (see [Generating the Reports Data](#generating-the-reports-data)).

-Run ``npm run dev`` for more verbose output, to see which month is failing.
+#### Viewing the website

 To check that everything is rendered appropriately, go into the website/build (i.e. `cd build`) directory:

@@ -150,11 +142,10 @@ python -m http.server

 and open up a web browser, and navigate to: [localhost:8000](localhost:8000)

-### Testing
-Tests can be run locally from the ``tests`` directory by running ``python test_report_data.py``. These tests are run on commits through a github action.
+### Pushing Data to Google Cloud

-### Pushing to google cloud - Development
+#### Pushing to Development

 The next step is to update the development bucket in google cloud with the new data. In the case where data must be overwritten (please use caution!) a `-d` flag can be added to the command
@@ -171,7 +162,7 @@ PR to main. This site can be viewed at `https://development-build--cal-itp-repor
 > you can produce empty commits with `git commit --allow-empty` and merge those
 > into the main branch.

-### Pushing to google cloud - Production
+#### Pushing to Production

 Assuming that all the data is correct in development, you can sync the test data to production.
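The **NOTE** in this change distinguishes the folder month from the ``publish_date``. That distinction can be checked with a short Python sketch. `publish_date_for` is a hypothetical helper written only for illustration. It is not part of this repository, and it assumes nothing beyond the rule stated in the NOTE: the publish date is the first day of the month after the folder month.

```python
from datetime import date


def publish_date_for(year: int, month: int) -> date:
    """Hypothetical helper (not part of the repository).

    Maps a report folder outputs/YYYY/MM to its publish date, assuming the
    rule from the README NOTE: first day of the following month.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    # December is the only month whose successor falls in the next year.
    if month == 12:
        return date(year + 1, 1, 1)
    return date(year, month + 1, 1)


if __name__ == "__main__":
    # make MONTH=02 YEAR=2023 all -> outputs/2023/02/*, published 2023-03-01
    print(publish_date_for(2023, 2).isoformat())  # 2023-03-01
```

December is the only case that crosses a year boundary. Under the same assumption, `outputs/2023/12/*` would carry a ``publish_date`` of ``2024-01-01``.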