-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Task tree and iterators: backend performance improvements #207
Conversation
…sk updates and deletion
…oesn't improve anything actually
Co-authored-by: Michał Praszmo <[email protected]>
b0ae13e
to
dad2644
Compare
Co-authored-by: Michał Praszmo <[email protected]>
Some of the issues seen while testing this in a bunch of different setups:
TL;DR In order to deploy the newly-proposed fully qualified task ids, we'd have to either add some kind of heuristics to determine whether the consumer "knows" fquid or we'd have to just upgrade all services and karton-system at once (seem non-trivial) |
For me it was kinda obvious that such major update requires both client-side and server-side update. We can solve it by adding two features:
|
Thanks for feedback, I'll try to separate non-breaking changes from this PR into another one and implement some safety checks here. |
To be continued in #255 |
Initial plans:
karton.task:<root_id>:<task_id>
naming (called fully-qualified task identifiers) suggested by @rakovskij-stanislav (Get task tree status #178 (comment)) to speed up task tree inspectionWhat is done:
Added
karton.task:<root_id>:<task_id>
to speed up querying for single analysis status. This change comes with new task field calledrevision
. It's required because there are cases where task method needs to be aware if task was found under<root_id>:<task_id>
or just<task_id>
:KartonBackend.register_task
) might be done by Karton service with different version (karton-system, karton-dashboard) which may introduce inconsistency.In the same time, this identifier wasn't stored anywhere in
Task
itself. I was afraid about inconsistencies due to storingfquid
as a separate field so finally decided to introducerevision
versioning which can be also useful for future modificationsOptimized task deserialization:
parse_resources=False
) without parsing resources and with use oforjson
.custom_hook
used byjson.loads
for Resource deserialization has significant impact on performance (and is not supported byorjson
)__slots__
to speed up attribute access.Benchmark with 30k tasks loaded into Karton:
Added
iter_tasks
family of methods that use iterators to make less memory footprint (inspired by SystemService and backend improvements #193). In addition, this version usesSCAN
instead ofKEYS
to not block Redis for too long when there are lots of tasks. Possible inconsistencies due to operating on cursor instead of fixed list should not be a problem.karton.core.inspect.KartonState
is lazy-loading things and has additional methodget_analysis
if you want to load data only about specific root_uid. Main properties still load all tasks though, because we still need to deserialize everything to filter out FINISHED/CRASHED tasks.Added new set to Redis calledkarton.assigned:<identity>
that keeps references to tasks that were routed to consumer with given identity. It optimizes access to tasks processed by single consumer, but... it doesn't come with any huge benefits. It can be used to speed up access tohttps://karton-dashboard/queue/<identity>
and fixes some GC issues (Removing hanging "started" tasks for non-persistent kartoniks when all instances are gone #10), but GC and Karton main view still need to process all the tasks in Redis so it's not a big deal.Bumped Python version used by karton-system Docker image to 3.11 (see https://docs.python.org/3/whatsnew/3.11.html#optimizations. I haven't made any benchmark tho).