Release RawStopTime earlier #141

antoine-de · 2023-08-31T12:02:44Z

linked to etalab/transport-validator#172

Memory consumption is too high because parsing happens in two phases: first into a RawGtfs, then into a Gtfs. During the conversion both structures are in memory, which roughly doubles the peak memory needed. On the FR IDF dataset (~13 000 000 stop times), the RawGtfs takes 2.3G of memory and the Gtfs 2.1G, and the peak memory needed is ~3.9G.

This PR adds a reverse loop over the stop times and shrinks the vector to fit as it goes (iterating in reverse so as not to allocate a new vector). Even if it's a naive implementation (we shouldn't have to call shrink_to_fit on every element), the performance impact seems negligible, and on the IDF dataset the /usr/bin/time measurement goes from 3.9G to 3.4G.
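A minimal sketch of the idea, not the actual diff: the `RawStopTime` and `StopTime` structs below are simplified, hypothetical stand-ins for the crate's types, and `convert` is an illustrative name. It pops raw stop times from the end of the vector, so nothing is shifted and no second vector is allocated, and naively calls `shrink_to_fit` after each element so the freed capacity can be returned while the converted structure grows.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for the raw (parsed) stop time.
struct RawStopTime {
    trip_id: String,
    stop_sequence: u32,
}

// Hypothetical, simplified stand-in for the converted stop time.
#[derive(Debug, PartialEq)]
struct StopTime {
    stop_sequence: u32,
}

// Consume the raw stop times from the back and group them by trip.
fn convert(mut raw: Vec<RawStopTime>) -> HashMap<String, Vec<StopTime>> {
    let mut trips: HashMap<String, Vec<StopTime>> = HashMap::new();

    // Reverse iteration: `pop` removes the last element in O(1).
    while let Some(r) = raw.pop() {
        trips
            .entry(r.trip_id)
            .or_default()
            .push(StopTime { stop_sequence: r.stop_sequence });
        // Naive: ask the allocator to release unused capacity after every element.
        raw.shrink_to_fit();
    }

    // Elements were consumed in reverse, so restore the order by stop_sequence.
    for stop_times in trips.values_mut() {
        stop_times.sort_by_key(|st| st.stop_sequence);
    }
    trips
}
```

Because the raw vector is drained in reverse, the final per-trip order relies on the sort by `stop_sequence` rather than on the input order.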


antoine-de added 3 commits August 31, 2023 13:54

schrink to fit vector

702746c

Change stop_time stop_sequence in dataset, else the sort cannot be guaranteed

998bdc0

remove one clone

Does change a lot of stuff though

ba5f1cf

Tristramg approved these changes Sep 27, 2023


Tristramg merged commit 2a58597 into rust-transit:main Sep 27, 2023
2 checks passed
