Exploratory data analysis for reports site data model: Vendor information & organizations/services/datasets #194
Comments
Self-assigned per conversation today with @tiffanychu90. Going to prioritize this since it's been blocking us for a while.
It appears that there are multiple component values relevant in some way to either the publishing of GTFS or GTFS-RT. Right now the table is set to manually code them in order to support a simplified presentation of schedule and RT vendors on the reports site. For these components, the listed products appear reasonable. For the site, I propose reporting and organizing on the vendor level rather than the product level, with the exception of in-house and pending product values.
185 total organizations in March reports (date_start = '2023-02-01'). 8 are missing at least one schedule vendor (or pending/in-house).
In other words, we're missing schedule vendor info for about 4% of total orgs, and RT vendor info for 26% of orgs that have RT. SQL for organizations with neither RT nor Schedule vendors:
There appears to be no product/component data in Airtable for these. Since in-house is a value entered for some other service components, we shouldn't assume that it's done in-house. The long-term path would be for the Transit Data Quality team to connect with these agencies and determine how to track them correctly.
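For anyone who wants to reproduce the completeness check outside the warehouse, here is a minimal sketch in plain Python. It is not the query used above: the row layout (organization, vendor type, vendor) and all names in it are hypothetical, and it assumes pending/in-house values count as "has a vendor".

```python
from collections import defaultdict

# Hypothetical rows: (organization, vendor_type, vendor).
# These names are illustrative only, not real Airtable or warehouse fields.
rows = [
    ("Org A", "schedule", "Vendor X"),
    ("Org A", "rt", "Vendor Y"),
    ("Org B", "schedule", "In-house"),  # in-house counts as a known value
    ("Org C", None, None),              # no component data recorded
]

def vendor_coverage(rows):
    """Map each organization to the set of vendor types it has a vendor for."""
    coverage = defaultdict(set)
    for org, vendor_type, vendor in rows:
        coverage[org]  # touch the key so orgs with no vendors still appear
        if vendor_type is not None and vendor is not None:
            coverage[org].add(vendor_type)
    return dict(coverage)

def orgs_missing(coverage, vendor_type):
    """Organizations with no vendor recorded for the given vendor type."""
    return sorted(org for org, types in coverage.items() if vendor_type not in types)

coverage = vendor_coverage(rows)
missing_schedule = orgs_missing(coverage, "schedule")
missing_both = sorted(org for org, types in coverage.items() if not types)
share_missing_schedule = len(missing_schedule) / len(coverage)
```

With the counts reported above, the same ratio is 8 / 185, or roughly 4.3% of organizations.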
Proposing we put this on hold for now since service definitions may be changing soon. This situation does appear rare, and by keeping vendor information fairly general and aggregated at the organization level I think we can move forward. Also see warehouse work at cal-itp/data-infra#2374. More thoughts to come on how to operationalize.
#181 contemplates using vendor information to filter the monthly index on the reports site -- I think if an organization has multiple vendors then filtering the index to any combination of those vendors should show a link to their report.
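To make the filtering behavior concrete, here is a rough sketch in Python. It assumes "any match" semantics: an organization is shown when at least one of its vendors is among the selected ones. The function name, the data shape, and the choice to show everything when nothing is selected are all assumptions for illustration, not the reports site's actual code.

```python
def filter_index(org_vendors, selected):
    """Return organizations whose report link should appear for a vendor filter.

    org_vendors: dict mapping organization name -> set of vendor names (hypothetical shape).
    selected: iterable of vendor names chosen in the filter.
    """
    selected = set(selected)
    if not selected:
        # Assumption: an empty filter shows every organization.
        return sorted(org_vendors)
    # Any overlap between an org's vendors and the selection is a match.
    return sorted(org for org, vendors in org_vendors.items() if vendors & selected)

# Illustrative data only.
org_vendors = {
    "Org A": {"Vendor X", "Vendor Y"},
    "Org B": {"Vendor Y"},
    "Org C": {"Vendor Z"},
}

only_x = filter_index(org_vendors, ["Vendor X"])
x_and_y = filter_index(org_vendors, ["Vendor X", "Vendor Y"])
```

Under these semantics Org A shows up whether the filter is Vendor X, Vendor Y, or both. One side effect is that a filter mixing one of an organization's vendors with an unrelated vendor also matches; a stricter subset check would be a different design choice.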
Before tackling #181 and #193, we need to investigate the state of the Airtable data model with respect to the fields that we want to incorporate into the reports site to get a sense for data completeness, up-to-dateness, and any risks of fanout (many-to-many relationships).
Specifically, we want to investigate:
dim_services, dim_service_components, and dim_components tables)

Submitting this ticket in the reports repo because it is directly associated with planned feature & data development on the reports site, but these questions are probably of broader interest as well.
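The fanout risk raised in this issue could be checked with something like the sketch below. It classifies a relationship from a list of (left key, right key) pairs, such as the pairs a bridge table would yield. The pair layout and example keys are hypothetical; the real table schemas are not assumed here.

```python
from collections import defaultdict

def relationship_cardinality(pairs):
    """Classify a relationship given (left_key, right_key) pairs."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for left, right in pairs:
        left_to_right[left].add(right)
        right_to_left[right].add(left)
    # A side "fans out" if any key on it maps to more than one distinct partner.
    left_many = any(len(partners) > 1 for partners in left_to_right.values())
    right_many = any(len(partners) > 1 for partners in right_to_left.values())
    if left_many and right_many:
        return "many-to-many"
    if left_many:
        return "one-to-many"
    if right_many:
        return "many-to-one"
    return "one-to-one"

# Illustrative pairs only, e.g. (service, component).
pairs = [
    ("service_1", "component_a"),
    ("service_1", "component_b"),
    ("service_2", "component_a"),
]
kind = relationship_cardinality(pairs)
```

A "many-to-many" result means a join across that relationship can multiply rows, so any organization-level counts would need to be deduplicated or aggregated first.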