GTFS Guidelines: links to feeds in aggregators #574

Closed
1 task done
evansiroky opened this issue Oct 25, 2021 · 10 comments
Labels
  • product: transit-data-quality (Items that are a part of the Transit Data Quality Product of which @evansiroky is the product owner.)
  • project-msd (Issues related to the mobility services data project)

Comments

@evansiroky
Member

evansiroky commented Oct 25, 2021

Question

Are all of the provider's feed links cataloged in transit.land and openmobilitydata.org?

Metrics

  • Whether the feeds are in transit.land and are the same as our known links
  • Whether the feeds are in openmobilitydata.org and are the same as our known links

Data sources

  • transit.land API
  • openmobilitydata.org API
  • mapping or code to automatically discover provider listing in aggregator
  • Airtable assessment until code works?

Dependencies

(Data Services Team to Copy and Fill Out Below)

The QuVR MD template below will be filled out by a member of the data services team.
This allows us to describe the request, in a way that is easy to hand-off for analysis.
After the research phase, we will sync with the asker to figure out if the metric and dashboard pieces are needed.

Before starting research:

  • Question: Question written as a single sentence.
  • View: E.g. views.gtfs_schedule_fact_daily_feed_files.
  • Research:
    • How should the results be presented?
    • When are they needed by?

After reviewing research with the asker:

  • Metric: what specific calculations are needed?
  • Dashboard: where should we put the result?

@evansiroky added this to the GTFS Guidelines milestone Oct 25, 2021
@evansiroky added the project-msd and status: info needed labels Nov 10, 2021
@evansiroky removed the status: info needed label Jan 18, 2022
@holly-g
Contributor

holly-g commented Apr 14, 2022

Hi @evansiroky, Andrew's PR to close out #924 will unblock this ticket.
Is this ticket a duplicate of "Feature request: have the ability to return URLs not in input data within a certain region", which is on deck for the next sprint?

@evansiroky
Member Author

evansiroky commented Apr 14, 2022

This is not a duplicate of cal-itp/gtfs-aggregator-checker#20. As noted in #924, this ticket could serve as a placeholder for the unit of work to create a Metabase question in the GTFS Guidelines dashboard, while #924 could be the unit of work for processing the data in the pipeline. Once a Metabase question with data is available, both #924 and this issue can be closed.

@lauriemerrell
Contributor

TODO:

  • Create an external table to load the files output by the Airflow task created in "Put aggregator checker into airflow" (#1374)
  • Create a view to get that data into Metabase
  • Add a question to the GTFS Guidelines dashboard checking for this

@lauriemerrell self-assigned this Apr 28, 2022
@holly-g assigned Nkdiaz and unassigned evansiroky May 3, 2022
@holly-g
Contributor

holly-g commented May 3, 2022

@Nkdiaz and @lauriemerrell to pair

@evansiroky added the product: transit-data-quality label May 6, 2022
@lauriemerrell
Contributor

@evansiroky -- We are looking at the data now, and it looks like Transitland URLs always have the form https://transit.land/feeds/<identifier>, and likewise for Transitfeeds (https://transitfeeds.com/p/<identifiers>). So the aggregator URLs will never match the URL we are scraping unless we are scraping from the aggregator itself (which we do for ACE, whose feeds we get from Transitfeeds, but for no other feeds). Do you still want a column in this question that explicitly checks for sameness or shows the aggregator URL, or is a presence/absence check sufficient?
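
To illustrate the distinction between an aggregator page URL and a feed URL we scrape directly, here is a minimal Python sketch. It assumes only the two hostnames quoted above; the function name `aggregator_for` and the identifier `f-example` are hypothetical and not taken from the pipeline.

```python
from urllib.parse import urlparse

# Hostnames taken from the URL patterns quoted in this comment.
AGGREGATOR_HOSTS = {"transit.land", "transitfeeds.com"}


def aggregator_for(url):
    """Return the aggregator hostname if `url` is hosted by one, else None."""
    host = (urlparse(url).hostname or "").lower()
    # Treat a leading "www." as the same site (an assumption for this sketch).
    if host.startswith("www."):
        host = host[4:]
    return host if host in AGGREGATOR_HOSTS else None


# "f-example" is a made-up identifier used only for illustration.
print(aggregator_for("https://transit.land/feeds/f-example"))
print(aggregator_for("https://transit.torranceca.gov/home/showdocument?id=16673"))
```

Under this sketch, a scraped URL would only "match" an aggregator URL when the function returns a hostname for it, which is the ACE case described above.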

@evansiroky
Member Author

I'm a little confused about what is being asked here and what data you're looking at. The gtfs-aggregator-checker makes API calls to transit.land and scrapes transitfeeds.com to gather all URLs, then checks which input URLs were found in those sets. So unless I'm misunderstanding, I think the checks you're wondering about are already being done by the gtfs-aggregator-checker.
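
The check described here amounts to a set-membership test. The following Python sketch shows the idea only; it is not the gtfs-aggregator-checker's actual code. The function names and the light normalization (trimming whitespace and a trailing slash) are assumptions made for illustration.

```python
def normalize(url):
    """Trim whitespace and a trailing slash (an assumed, minimal normalization)."""
    return url.strip().rstrip("/")


def check_presence(input_urls, aggregator_urls):
    """Map each input URL to True if it appears among the aggregator's URLs."""
    known = {normalize(u) for u in aggregator_urls}
    return {u: normalize(u) in known for u in input_urls}


# Placeholder URLs for illustration only.
found = check_presence(
    ["https://example.com/gtfs.zip", "https://example.org/feed/"],
    ["https://example.org/feed", "https://other.example/x"],
)
for url, present in found.items():
    print(url, "Present" if present else "Absent")
```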

As far as desired presentation of the data to the user, in #924, I created an example output table for the GTFS Guidelines dashboard as follows:

| URL type | URL | In transitfeeds.com? | In transit.land? |
| --- | --- | --- | --- |
| gtfs_schedule_url | https://transit.torranceca.gov/home/showdocument?id=16673 | Absent | Present |
| gtfs_rt_vehicle_positions_url | null | N/A | N/A |
| gtfs_rt_service_alerts_url | null | N/A | N/A |
| gtfs_rt_trip_updates_url | null | N/A | N/A |

At the very least, there needs to be an absence check. However, it would be nice to link to the relevant aggregator page when the feed is present. If you analyze the hyperlink in the above table you'll see a link to the transit.land entry for Torrance Transit in the In transit.land? column.
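
As a rough sketch of how rows like those in the example table could be assembled, the Python below reports N/A for a missing URL, Absent when the aggregator has no entry, and the aggregator page link when one is known. The function names, the dictionary shapes, and the `f-example` identifier are hypothetical; only the four URL types and the three status values come from the table above.

```python
URL_TYPES = [
    "gtfs_schedule_url",
    "gtfs_rt_vehicle_positions_url",
    "gtfs_rt_service_alerts_url",
    "gtfs_rt_trip_updates_url",
]


def status(url, found_pages):
    """Return 'N/A' for a missing URL, else the aggregator page link or 'Absent'."""
    if url is None:
        return "N/A"
    return found_pages.get(url, "Absent")


def build_rows(feed_urls, transitfeeds_pages, transitland_pages):
    """Build one row per URL type; the *_pages dicts map feed URL -> aggregator page."""
    rows = []
    for url_type in URL_TYPES:
        url = feed_urls.get(url_type)
        rows.append({
            "url_type": url_type,
            "url": url,
            "in_transitfeeds": status(url, transitfeeds_pages),
            "in_transitland": status(url, transitland_pages),
        })
    return rows


schedule_url = "https://transit.torranceca.gov/home/showdocument?id=16673"
rows = build_rows(
    {"gtfs_schedule_url": schedule_url},
    {},  # nothing found in transitfeeds.com
    {schedule_url: "https://transit.land/feeds/f-example"},  # made-up page link
)
for row in rows:
    print(row)
```

Returning the link itself rather than the word "Present" keeps the presence check and the link in a single column, which may or may not suit the dashboard.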

@lauriemerrell
Contributor

Ahhh ok, thanks for clarifying @evansiroky -- I was just going off of what is returned in the files that we have, and I thought the ask was to check that our URL is the same as the URL that is returned. But I see what you mean: our URL was already used to do that initial lookup.

I think that that table is possible, though I doubt we can get it to display like the word present with a hyperlink -- probably we would just list the raw link if it's available.

@evansiroky
Member Author

> I think that that table is possible, though I doubt we can get it to display like the word present with a hyperlink -- probably we would just list the raw link if it's available.

That works for me. When you try this out, can you let me know whether this linking feature is possible in Metabase? It'll help set expectations for what kind of design I can ask for in future Metabase tables.

@holly-g
Contributor

holly-g commented Jul 14, 2022

Checked in with Evan -- moving to icebox. Status to be re-visited 8/29.

@holly-g
Contributor

holly-g commented Oct 4, 2022

See #1666

@holly-g closed this as not planned Oct 4, 2022

4 participants