Limit the amount of records to use in the analyses #25

Open
flauschzelle opened this issue Oct 17, 2020 · 0 comments

According to our automated healthchecks, the analyser takes a few minutes longer every day. This will cause serious problems if a run ever takes more than 24 hours, and even at the current, shorter durations it already seems inefficient.

This may be caused by the ever-growing number of records in our database, which means the analyser has to process more data each day. Some of these records are probably useless already, because they become outdated with every change in the structure of the public transport network (new routes and/or stops being introduced, or old ones going out of service).

And even for the route_variants that are still in use, it would be useful to limit the amount of data: at some point, more data just makes the analyses slower without any measurable impact on the quality/precision of the result.
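
To illustrate the diminishing returns, here is a rough sketch. It assumes (this is not stated anywhere above) that the analysis boils down to something like a mean over numeric samples, e.g. delays in seconds, for which the standard error shrinks roughly with 1/sqrt(n). The function names and the `tolerance` parameter are made up for this example and are not part of the analyser.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def smallest_sufficient_count(samples, tolerance):
    """Return the smallest n >= 2 such that the first n samples have a
    standard error <= tolerance, or None if no prefix is precise enough.

    Assumption: `samples` is ordered newest first, so a prefix of length n
    is the n most recent records.
    """
    for n in range(2, len(samples) + 1):
        if standard_error(samples[:n]) <= tolerance:
            return n
    return None


# Hypothetical delays in seconds for one stop & route_variant combination.
delays = [0, 2] * 50
print(smallest_sufficient_count(delays, tolerance=0.5))
```

Going from 100 to 400 samples only halves the standard error in this model, so past some count the extra records mostly cost run time.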

To solve this, we first have to find out at how many records for any specific combination of stop & route_variant the data quality stops improving significantly. Then we can automate throwing away old data until the number of records is below that limit. That should also cap the time the analyser needs for its daily run.
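
The pruning step could look roughly like the sketch below. It is only an illustration: the record fields (`id`, `stop`, `route_variant`, `recorded_at`) and the function name are assumptions, not the project's actual schema, and in practice this would probably be a query against the database rather than in-memory Python.

```python
from collections import defaultdict


def select_records_to_delete(records, max_per_group):
    """Return the ids of all records beyond the newest `max_per_group`
    for each (stop, route_variant) combination.

    `records` is an iterable of dicts with the hypothetical keys
    'id', 'stop', 'route_variant' and 'recorded_at'.
    """
    if max_per_group < 0:
        raise ValueError("max_per_group must be >= 0")

    # Group records by their stop & route_variant combination.
    groups = defaultdict(list)
    for record in records:
        groups[(record["stop"], record["route_variant"])].append(record)

    to_delete = []
    for group in groups.values():
        # Newest first, so everything after the limit is the oldest data.
        group.sort(key=lambda r: r["recorded_at"], reverse=True)
        to_delete.extend(r["id"] for r in group[max_per_group:])
    return to_delete


# Usage with made-up data: five records for one combination, one for another.
records = [
    {"id": i, "stop": "A", "route_variant": 1, "recorded_at": i}
    for i in range(1, 6)
]
records.append({"id": 6, "stop": "B", "route_variant": 1, "recorded_at": 1})
print(sorted(select_records_to_delete(records, max_per_group=2)))
```

With a fixed limit per combination, the total number of records (and so the analyser's run time) is bounded by the number of active combinations times that limit.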
