Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trim header column names, added Metra test #152

Merged
merged 3 commits into from
Jan 1, 2024

Conversation

kylerchin
Copy link
Contributor

Metra https://metra.com/metra-gtfs-api would not load and I was getting complaints about that. The header from stops.txt looks like
stop_id, stop_name, stop_desc, stop_lat, stop_lon, zone_id, stop_url, wheelchair_boarding and thus stop_name is not eq to stop_name.

This resolves this issue by trimming the header column names for every file.

Metra Rail is added as a test to ensure this is working.

The patch is bumped up a version to 0.39.1.

Metra would not load and I was getting complaints about that.
The header looks like
stop_id, stop_name, stop_desc, stop_lat, stop_lon, zone_id, stop_url, wheelchair_boarding
and thus ` stop_name` is not eq to `stop_name`.

This resolves this issue by trimming the header column names for every file.

Metra Rail is added as a test to ensure this is working.
@Tristramg
Copy link
Collaborator

Nice finding. I’ll have to discuss a bit with people from https://github.com/etalab/transport-validator as such a bad file should yield an error on the validator, but still be easy to parse.
I’m not sure yet what would be the best way to keep that information (and if I don’t have any suggestion rapidly, then let’s take it as it is: it’s still a nice improvement).

Could you just make the metra gtfs a bit small (maybe trim the stops_times.txt filie) ?

@kylerchin
Copy link
Contributor Author

Sure! Let me take a look!

@Tristramg
Copy link
Collaborator

Thank you for the change. We still need the file, just with a few lines

@lolpro11
Copy link

lolpro11 commented Dec 31, 2023

Just a quick question. Would this change be O(n)? It's much simpler to check the separator, then invoke this change on an if statement. This would save about 2 billion steps if I am ingesting the World's GTFS data.

@Tristramg
Copy link
Collaborator

The function read_objs is called once per csv_file, so a dozen of times per GTFS file. The trimming only occurs for the headers, not for each line. I think this will scale without any trouble.

(and even if we decided to trim each line, what might happen eventually, removing trailing white spaces is still quite cheap compared to parsing numbers)

@kylerchin
Copy link
Contributor Author

@Tristramg thoughts so far?

Copy link
Collaborator

@Tristramg Tristramg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!
(I’m squashing with github to avoid having the larger file living in the history)

@Tristramg Tristramg merged commit 9fc3af3 into rust-transit:main Jan 1, 2024
2 checks passed
@kylerchin
Copy link
Contributor Author

Looks like UK is now working with this now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants