According to our automated health checks, the analyser takes a few minutes longer every day. This will become a serious problem if a single run ever exceeds 24 hours, but even at shorter durations it already looks inefficient.
The likely cause is the ever-growing number of records in our database, which means the analyser has to process more data each day. Some of those records are probably useless by now anyway, because they become outdated with every change to the structure of the public transport network (new routes and/or stops being introduced, or old ones going out of service).
Even for route_variants that are still in use, it would be useful to limit the amount of data: beyond a certain point, additional records only make the analysis slower without measurably improving the quality/precision of the result.
To solve this, we first need to determine at how many records per combination of stop & route_variant the data quality stops improving significantly. Then we can automatically discard the oldest data until each combination is below that limit, for example with something like the sketch below. That should also cap the maximum time the analyser needs for its daily run.
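A minimal sketch of what the pruning step could look like, assuming an SQLite database with a table named `arrivals` and columns `id`, `stop_id`, `route_variant_id` and `recorded_at`. All of these names and the threshold value are placeholders; the real limit would come from the measurement described above.

```python
# Sketch of a daily pruning job: keep only the newest RECORD_LIMIT records
# per (stop, route_variant) combination. Table/column names are assumptions.
import sqlite3

RECORD_LIMIT = 500  # hypothetical cap, to be replaced by the measured value


def prune_old_records(db_path: str) -> int:
    """Delete all but the newest RECORD_LIMIT records for each
    stop/route_variant combination; return the number of rows removed."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            """
            DELETE FROM arrivals
            WHERE id IN (
                SELECT id FROM (
                    SELECT id,
                           ROW_NUMBER() OVER (
                               PARTITION BY stop_id, route_variant_id
                               ORDER BY recorded_at DESC
                           ) AS rn
                    FROM arrivals
                ) ranked
                WHERE ranked.rn > ?
            )
            """,
            (RECORD_LIMIT,),
        )
        conn.commit()
        return cur.rowcount
    finally:
        conn.close()
```

Running this right before (or after) the daily analyser run would keep the per-combination record count bounded, so the analyser's runtime should stop growing with the age of the database.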