linked to etalab/transport-validator#172
Memory consumption is too high because parsing is done in two phases: first into a `RawGtfs`, then into a `Gtfs`. During the conversion both structures are in memory, which roughly doubles the peak memory needed. On the FR IDF dataset (~13 000 000 stop times), the `RawGtfs` takes 2.3 G of memory and the `Gtfs` 2.1 G, and the peak memory needed is ~3.9 G.

This PR adds a reverse loop over the stop times that shrinks the vector to fit as it goes (iterating in reverse so as not to allocate a new vector). Even though it is a naive implementation (we shouldn't have to call `shrink_to_fit` on every element), the performance impact seems negligible, and on the IDF dataset the `/usr/bin/time` measurement goes from 3.9 G to 3.4 G.
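For readers who want the shape of the idea, here is a minimal sketch of the reverse drain, not the code in this PR. The types `RawStopTime` and `StopTime` and the function `convert_stop_times` are hypothetical stand-ins for the crate's real structures. How much memory is actually returned on each `shrink_to_fit` call depends on the allocator.

```rust
// Hypothetical stand-in for a raw, freshly parsed stop time.
#[derive(Debug)]
struct RawStopTime {
    trip_id: String,
    stop_sequence: u32,
}

// Hypothetical stand-in for the converted stop time.
#[derive(Debug, PartialEq)]
struct StopTime {
    trip_id: String,
    stop_sequence: u32,
}

// Consume the raw vector from the back, converting one element at a time.
// Popping from the end never shifts the remaining elements, and calling
// `shrink_to_fit` after each pop asks the allocator to release the tail
// that is no longer needed, so the raw and converted data are not both
// fully resident at the end of the conversion.
fn convert_stop_times(mut raw: Vec<RawStopTime>) -> Vec<StopTime> {
    let mut converted = Vec::new();
    while let Some(r) = raw.pop() {
        converted.push(StopTime {
            trip_id: r.trip_id,
            stop_sequence: r.stop_sequence,
        });
        // Naive: shrinking on every element, as the PR description notes.
        raw.shrink_to_fit();
    }
    // Elements were taken back to front, so restore the original order.
    converted.reverse();
    converted
}
```

Shrinking every N elements instead of every element would be the obvious refinement if the per-element call ever showed up in profiles.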