

@lauriemerrell
Contributor

@lauriemerrell commented Sep 30, 2025

Description

Describe your changes and why you're making them. Please include the context, motivation, and relevant dependencies.

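  • Upgrades the GTFS Schedule validator to v7.1, which adds new validation rules; adds the seed gtfs_schedule_validator_rule_details_v7_1_0.csv and updates the unioned rule-details model and the gtfs_quality mart YAML accordingly
  • Updates the Dockerfile to add a Poetry plugin that supports exporting dependencies to requirements.txt
  • Updates the build & push GitHub workflow to push to the development image when running on a branch, to make it easier to test the PodOperator
  • TODO: Add tests for the validator Airflow task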

Resolves #3763, resolves #4378

Remaining TODOs:

  • Update the as-of date once it's clearer when this will merge; it should be set to a future date for the cutover
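The as-of date works as a cutover: extracts on one side of it get the new validator, extracts on the other side keep the old one. The snippet below is a minimal, hypothetical illustration of that idea only. It assumes that extracts dated on or after the as-of date are validated with v7.1.0 and earlier extracts keep the previous validator. The function name `validator_version_for`, the `PREVIOUS_VERSION` placeholder, and the example date are illustrative assumptions, not code or values from this PR.

```python
from datetime import date

# Hypothetical sketch: names and the previous-version placeholder are
# illustrative assumptions, not the pipeline's actual implementation.
NEW_VERSION = "v7.1.0"
PREVIOUS_VERSION = "previous"  # placeholder; the prior version is not named in this PR


def validator_version_for(extract_date: date, cutover_date: date) -> str:
    """Pick the validator version for an extract under a date-based cutover.

    Extracts dated on or after the cutover (as-of) date use the new
    validator; earlier extracts keep the previous one.
    """
    if extract_date >= cutover_date:
        return NEW_VERSION
    return PREVIOUS_VERSION


if __name__ == "__main__":
    cutover = date(2025, 11, 1)  # placeholder as-of date, not the real one
    print(validator_version_for(date(2025, 10, 31), cutover))  # previous
    print(validator_version_for(date(2025, 11, 1), cutover))   # v7.1.0
```

Under this assumption, setting a past cutover date (as done temporarily for testing) routes every current extract to v7.1.0, while a future date leaves results unchanged until that day.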

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

How has this been tested?

Include commands/logs/screenshots as relevant.

If making changes to dbt models, please run the command poetry run dbt run -s CHANGED_MODEL and poetry run dbt test -s CHANGED_MODEL, then include the output in this section of the PR.

Post-merge follow-ups

Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution.

  • No action required
  • Actions required (specified below)

@lauriemerrell linked an issue Sep 30, 2025 that may be closed by this pull request
@lauriemerrell
Contributor Author

Note for any reviewer looking at this: I've set a past cutover date for testing purposes and will bump it before merging.

@lauriemerrell
Contributor Author

TODO: Add pytest checks for the validator jobs

@github-actions

github-actions bot commented Oct 1, 2025

Terraform plan in iac/cal-itp-data-infra-staging/airflow/us

Plan: 1 to add, 2 to change, 0 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+   create
!~  update in-place

Terraform will perform the following actions:

  # google_storage_bucket_object.calitp-staging-composer-dags["models/intermediate/gtfs_quality/int_gtfs_quality__schedule_validator_rule_details_unioned.sql"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
!~      crc32c              = "FZrGzA==" -> (known after apply)
!~      detect_md5hash      = "xTWjkGqqzCqWNDhJxp6R/w==" -> "different hash"
!~      generation          = 1749663120395357 -> (known after apply)
        id                  = "calitp-staging-composer-data/warehouse/models/intermediate/gtfs_quality/int_gtfs_quality__schedule_validator_rule_details_unioned.sql"
!~      md5hash             = "xTWjkGqqzCqWNDhJxp6R/w==" -> (known after apply)
        name                = "data/warehouse/models/intermediate/gtfs_quality/int_gtfs_quality__schedule_validator_rule_details_unioned.sql"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-staging-composer-dags["models/mart/gtfs_quality/_mart_gtfs_quality.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
!~      crc32c              = "Rs71zQ==" -> (known after apply)
!~      detect_md5hash      = "rnq255v0R46m5jQzuY/z0g==" -> "different hash"
!~      generation          = 1749663113676778 -> (known after apply)
        id                  = "calitp-staging-composer-data/warehouse/models/mart/gtfs_quality/_mart_gtfs_quality.yml"
!~      md5hash             = "rnq255v0R46m5jQzuY/z0g==" -> (known after apply)
        name                = "data/warehouse/models/mart/gtfs_quality/_mart_gtfs_quality.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-staging-composer-dags["seeds/gtfs_schedule_validator_rule_details_v7_1_0.csv"] will be created
+   resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
+       bucket         = "calitp-staging-composer"
+       content        = (sensitive value)
+       content_type   = (known after apply)
+       crc32c         = (known after apply)
+       detect_md5hash = "different hash"
+       generation     = (known after apply)
+       id             = (known after apply)
+       kms_key_name   = (known after apply)
+       md5hash        = (known after apply)
+       md5hexhash     = (known after apply)
+       media_link     = (known after apply)
+       name           = "data/warehouse/seeds/gtfs_schedule_validator_rule_details_v7_1_0.csv"
+       output_name    = (known after apply)
+       self_link      = (known after apply)
+       source         = "../../../../warehouse/seeds/gtfs_schedule_validator_rule_details_v7_1_0.csv"
+       storage_class  = (known after apply)
    }

Plan: 1 to add, 2 to change, 0 to destroy.

📝 Plan generated in Plan Terraform for Warehouse and DAG changes #858

@github-actions

github-actions bot commented Oct 1, 2025

Terraform plan in iac/cal-itp-data-infra/airflow/us

Plan: 1 to add, 2 to change, 0 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+   create
!~  update in-place

Terraform will perform the following actions:

  # google_storage_bucket_object.calitp-composer-dags["models/intermediate/gtfs_quality/int_gtfs_quality__schedule_validator_rule_details_unioned.sql"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "FZrGzA==" -> (known after apply)
!~      detect_md5hash      = "xTWjkGqqzCqWNDhJxp6R/w==" -> "different hash"
!~      generation          = 1751416661426004 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/intermediate/gtfs_quality/int_gtfs_quality__schedule_validator_rule_details_unioned.sql"
!~      md5hash             = "xTWjkGqqzCqWNDhJxp6R/w==" -> (known after apply)
        name                = "data/warehouse/models/intermediate/gtfs_quality/int_gtfs_quality__schedule_validator_rule_details_unioned.sql"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/mart/gtfs_quality/_mart_gtfs_quality.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "Rs71zQ==" -> (known after apply)
!~      detect_md5hash      = "rnq255v0R46m5jQzuY/z0g==" -> "different hash"
!~      generation          = 1751416667981834 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/mart/gtfs_quality/_mart_gtfs_quality.yml"
!~      md5hash             = "rnq255v0R46m5jQzuY/z0g==" -> (known after apply)
        name                = "data/warehouse/models/mart/gtfs_quality/_mart_gtfs_quality.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["seeds/gtfs_schedule_validator_rule_details_v7_1_0.csv"] will be created
+   resource "google_storage_bucket_object" "calitp-composer-dags" {
+       bucket         = "calitp-composer"
+       content        = (sensitive value)
+       content_type   = (known after apply)
+       crc32c         = (known after apply)
+       detect_md5hash = "different hash"
+       generation     = (known after apply)
+       id             = (known after apply)
+       kms_key_name   = (known after apply)
+       md5hash        = (known after apply)
+       md5hexhash     = (known after apply)
+       media_link     = (known after apply)
+       name           = "data/warehouse/seeds/gtfs_schedule_validator_rule_details_v7_1_0.csv"
+       output_name    = (known after apply)
+       self_link      = (known after apply)
+       source         = "../../../../warehouse/seeds/gtfs_schedule_validator_rule_details_v7_1_0.csv"
+       storage_class  = (known after apply)
    }

Plan: 1 to add, 2 to change, 0 to destroy.

📝 Plan generated in Plan Terraform for Warehouse and DAG changes #858

@lauriemerrell changed the title from "Update to GTFS schedule validator v7.1 (#3763)" to "DRAFT Update to GTFS schedule validator v7.1 (#3763)" Oct 1, 2025
@lauriemerrell changed the title from "DRAFT Update to GTFS schedule validator v7.1 (#3763)" to "DRAFT Update to GTFS schedule validator v7.1 & add tests (#3763)" Oct 6, 2025
@lauriemerrell changed the title from "DRAFT Update to GTFS schedule validator v7.1 & add tests (#3763)" to "DRAFT Update to GTFS schedule validator v7.1 & add tests (#3763, #4378)" Oct 6, 2025
@lauriemerrell force-pushed the 3763-update-to-gtfs-schedule-validator-v71 branch from c25d59d to e241830 October 22, 2025 23:02
@lauriemerrell force-pushed the 3763-update-to-gtfs-schedule-validator-v71 branch from c1e8cbb to 000368b October 22, 2025 23:42


Development

Successfully merging this pull request may close these issues.

  • Developer sees that there is unit test coverage for GTFS Schedule validate
  • Update to GTFS Schedule Validator v7.1

2 participants