Task tree and iterators: backend performance improvements #207

psrok1 · 2023-02-21T17:47:44Z

Initial plans:

Implement karton.task:<root_id>:<task_id> key naming (called fully-qualified task identifiers, or fquids), suggested by @rakovskij-stanislav in a comment on Get task tree status #178, to speed up task tree inspection.
On top of that, I'm going to continue the great work of @a1ext and reimplement the improvements and suggestions from SystemService and backend improvements #193.

What is done:

Added karton.task:<root_id>:<task_id> keys to speed up querying the status of a single analysis. This change comes with a new task field called revision. It's required because in some cases a task method needs to know whether the task was stored under <root_id>:<task_id> or just <task_id>:
- A task update (KartonBackend.register_task) might be done by a Karton service running a different version (karton-system, karton-dashboard), which may introduce inconsistencies.
- Task deletion is a bit more complicated when identifiers are mixed.
At the same time, this identifier wasn't stored anywhere in the Task itself. I was worried about inconsistencies caused by storing the fquid as a separate field, so I finally decided to introduce revision versioning, which may also be useful for future modifications.
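A minimal sketch of how the storage key could follow from the revision field. It assumes, hypothetically, that revision 0 means the legacy karton.task:<task_id> key and that any higher revision means the fully-qualified karton.task:<root_id>:<task_id> key. The names TASK_PREFIX, FQUID_REVISION, task_key and analysis_pattern are illustrative only and are not Karton's API.

```python
from typing import Optional

TASK_PREFIX = "karton.task"
# Hypothetical: first task revision stored under the fully-qualified key
FQUID_REVISION = 1


def task_key(uid: str, root_uid: Optional[str], revision: int) -> str:
    """Return the Redis key a task of the given revision is stored under (hypothetical rule)."""
    if revision >= FQUID_REVISION and root_uid is not None:
        # Fully-qualified task identifier: karton.task:<root_id>:<task_id>
        return f"{TASK_PREFIX}:{root_uid}:{uid}"
    # Legacy naming still understood by older Karton services: karton.task:<task_id>
    return f"{TASK_PREFIX}:{uid}"


def analysis_pattern(root_uid: str) -> str:
    """Glob pattern matching every fquid-named task that belongs to one analysis."""
    return f"{TASK_PREFIX}:{root_uid}:*"
```

With fquid keys, every task of one analysis shares a key prefix, so a single glob pattern selects them without touching other analyses.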

Optimized task deserialization:

Introduced quick task deserialization (parse_resources=False), which skips parsing resources and uses orjson. The custom object_hook passed to json.loads for Resource deserialization has a significant performance impact (and object_hook is not supported by orjson).
During deserialization, we pass Task attribute values directly to the constructor, so they're not set twice. In addition, I introduced __slots__ to speed up attribute access.
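A simplified sketch of the two deserialization paths, loosely modelled on the description above. Task, Resource, _resource_hook and the "__resource__" marker key are hypothetical stand-ins, not Karton's real classes. The sketch uses orjson only if it is installed and otherwise falls back to the standard json module.

```python
import json
from typing import Any, Dict

try:
    import orjson  # optional and fast, but has no object_hook parameter

    def _fast_loads(data: str) -> Any:
        return orjson.loads(data)

except ImportError:  # fall back to the standard library parser

    def _fast_loads(data: str) -> Any:
        return json.loads(data)


class Resource:
    """Hypothetical stand-in for a deserialized Karton resource."""

    __slots__ = ("name",)

    def __init__(self, name: str) -> None:
        self.name = name


def _resource_hook(obj: Dict[str, Any]) -> Any:
    # json.loads calls this for every decoded JSON object, which is what makes it slow
    if obj.get("__resource__"):  # hypothetical marker key
        return Resource(obj["name"])
    return obj


class Task:
    # __slots__ drops the per-instance __dict__, making attribute access cheaper
    __slots__ = ("uid", "root_uid", "headers", "payload", "revision")

    def __init__(
        self,
        uid: str,
        root_uid: str,
        headers: Dict[str, Any],
        payload: Dict[str, Any],
        revision: int = 0,
    ) -> None:
        self.uid = uid
        self.root_uid = root_uid
        self.headers = headers
        self.payload = payload
        self.revision = revision

    @classmethod
    def unserialize(cls, data: str, parse_resources: bool = True) -> "Task":
        if parse_resources:
            fields = json.loads(data, object_hook=_resource_hook)
        else:
            # Resources stay plain dicts, so the hook-free (optionally orjson) path is used
            fields = _fast_loads(data)
        # Values go straight to the constructor, so each attribute is assigned exactly once
        return cls(
            uid=fields["uid"],
            root_uid=fields["root_uid"],
            headers=fields["headers"],
            payload=fields["payload"],
            revision=fields.get("revision", 0),
        )
```

Callers that only need headers or status can take the quick path. Callers that need real Resource objects keep parse_resources=True.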

Benchmark with 30k tasks loaded into Karton (total time in seconds for 10 runs, as returned by timeit):

# Karton 5.0.1
In [30]: timeit.timeit(lambda: list(backend.get_all_tasks()), number=10)
Out[30]: 5.391155833960511

# Karton 5.1
In [30]: timeit.timeit(lambda: list(backend.get_all_tasks()), number=10)
Out[30]: 4.257205436006188

# Karton 5.1 with parse_resources=False
In [6]: timeit.timeit(lambda: backend.get_all_tasks(parse_resources=False), number=10)
Out[6]: 3.729756842018105

# Karton 5.1 with orjson + parse_resources=False
In [7]: timeit.timeit(lambda: backend.get_all_tasks(parse_resources=False), number=10)
Out[7]: 2.8112106900662184

Added the iter_tasks family of methods, which use iterators to reduce the memory footprint (inspired by SystemService and backend improvements #193). In addition, this version uses SCAN instead of KEYS to avoid blocking Redis for too long when there are lots of tasks. Iterating over a cursor instead of a fixed list of keys may introduce inconsistencies: a key added or removed during the scan may or may not be returned. This should not be a problem.
karton.core.inspect.KartonState now loads data lazily and has an additional get_analysis method for loading data about a specific root_uid only. The main properties still load all tasks, though, because we still need to deserialize everything to filter out FINISHED/CRASHED tasks.
Added a new Redis set called karton.assigned:<identity> that keeps references to tasks routed to the consumer with the given identity. It optimizes access to tasks processed by a single consumer, but... it doesn't bring any huge benefits. It can be used to speed up access to https://karton-dashboard/queue/<identity> and fixes some GC issues (Removing hanging "started" tasks for non-persistent kartoniks when all instances are gone #10). However, GC and the Karton main view still need to process all the tasks in Redis, so it's not a big deal.
Bumped the Python version used by the karton-system Docker image to 3.11 (see https://docs.python.org/3/whatsnew/3.11.html#optimizations; I haven't benchmarked it, though).
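A sketch of the iterator approach and of per-analysis loading, written against the redis-py scan_iter and mget calls. iter_task_data, LazyState and the default chunk size are assumptions for illustration only. In this sketch, get_analysis only finds tasks stored under fquid keys. Legacy karton.task:<task_id> keys would still require a full scan.

```python
from functools import cached_property
from typing import Iterator, List, Optional, Protocol


class RedisLike(Protocol):
    """The subset of the redis-py client used below."""

    def scan_iter(self, match: Optional[str] = None, count: Optional[int] = None) -> Iterator[str]: ...

    def mget(self, keys: List[str]) -> List[Optional[str]]: ...


def iter_task_data(rs: RedisLike, pattern: str = "karton.task:*", chunk_size: int = 1000) -> Iterator[str]:
    """Yield serialized tasks chunk by chunk instead of materializing all of them."""
    chunk: List[str] = []
    # SCAN walks the keyspace incrementally, unlike KEYS, which blocks Redis until it finishes
    for key in rs.scan_iter(match=pattern, count=chunk_size):
        chunk.append(key)
        if len(chunk) >= chunk_size:
            yield from _fetch(rs, chunk)
            chunk = []
    if chunk:
        yield from _fetch(rs, chunk)


def _fetch(rs: RedisLike, keys: List[str]) -> Iterator[str]:
    # A key may be deleted between SCAN and MGET; MGET returns None for it, so skip it
    for value in rs.mget(keys):
        if value is not None:
            yield value


class LazyState:
    """Hypothetical, simplified stand-in for a lazily loading inspection state."""

    def __init__(self, rs: RedisLike) -> None:
        self._rs = rs

    @cached_property
    def all_task_data(self) -> List[str]:
        # Loaded on first access only, then cached; this still reads every task
        return list(iter_task_data(self._rs))

    def get_analysis(self, root_uid: str) -> List[str]:
        # With fquid naming, one analysis maps to one key prefix, so only its tasks are read
        return list(iter_task_data(self._rs, pattern=f"karton.task:{root_uid}:*"))
```

Chunking keeps each MGET round trip bounded. iter_task_data itself holds at most one chunk of keys in memory; only all_task_data materializes the full list.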

nazywam · 2023-06-09T16:22:27Z

Some of the issues seen while testing this in a bunch of different setups:

old karton-system + new karton services: when a service spawns a new task, it uses the fully qualified task id (fquid). Because the old karton-system cannot handle fquids, these tasks get stuck in the dispatched state and are not cleaned up correctly.
new karton-system + old karton services: the old services don't know how to read fquid tasks from their queues, so karton tasks get stuck in the "spawned" state.

TL;DR: To deploy the newly proposed fully qualified task ids, we'd have to either add some kind of heuristic to determine whether a consumer "knows" fquids, or upgrade all services and karton-system at once (which seems non-trivial).
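One possible shape of such a heuristic, sketched under several assumptions. The consumer is assumed to report its karton-core version when it binds. Queue entries are assumed to be either <root_id>:<task_id> or plain <task_id>. FQUID_MIN_VERSION is a placeholder, not a real release number, and parse_version and queue_entry are hypothetical helpers. The sketch only addresses the new karton-system + old services case.

```python
import re
from typing import Optional, Tuple

# Hypothetical placeholder: first karton-core version whose consumers understand fquids
FQUID_MIN_VERSION: Tuple[int, int, int] = (5, 1, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """'5.1' -> (5, 1, 0); each component keeps only its leading digits ('0rc1' -> 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def queue_entry(consumer_version: Optional[str], root_uid: str, uid: str) -> str:
    """Pick the task identifier karton-system pushes to a consumer queue (hypothetical heuristic)."""
    if consumer_version is not None and parse_version(consumer_version) >= FQUID_MIN_VERSION:
        # Consumer is new enough to resolve karton.task:<root_id>:<task_id>
        return f"{root_uid}:{uid}"
    # Unknown or old consumer: keep the plain <task_id> it already knows how to read
    return uid
```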

rakovskij-stanislav · 2023-06-09T16:30:34Z

To me it was fairly obvious that such a major update requires updating both the client side and the server side.

We can solve it by adding two features:

a client-side check that the server version is at least x.y.z; otherwise, raise ServerIncompatibilityException, suggesting a lower karton version on the client side
a server-side check on each new worker that force-unbinds it if its karton-core version is lower than a.b.c.
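The two checks could look roughly like this. ServerIncompatibilityException is the name proposed above. The version constants stand in for x.y.z and a.b.c, and versions are assumed to be plain dotted numbers. The sketch does not address how each side learns the other's version; check_server_version and should_unbind are hypothetical helpers.

```python
from typing import Optional, Tuple


class ServerIncompatibilityException(Exception):
    """Raised when karton-system is too old for this karton-core."""


# Hypothetical placeholders for x.y.z (server) and a.b.c (client)
MIN_SERVER_VERSION: Tuple[int, int, int] = (5, 1, 0)
MIN_CLIENT_VERSION: Tuple[int, int, int] = (5, 1, 0)


def _parse(version: str) -> Tuple[int, int, int]:
    # Assumes plain dotted numeric versions; missing components count as 0
    parts = [int(p) for p in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def check_server_version(server_version: Optional[str]) -> None:
    """Client side: refuse to start against a karton-system that does not know fquids."""
    if server_version is None or _parse(server_version) < MIN_SERVER_VERSION:
        raise ServerIncompatibilityException(
            f"karton-system {server_version or 'unknown'} is older than "
            f"{'.'.join(map(str, MIN_SERVER_VERSION))}; use an older karton-core on the client side"
        )


def should_unbind(client_version: Optional[str]) -> bool:
    """Server side: decide whether a newly registered worker must be force-unbound."""
    return client_version is None or _parse(client_version) < MIN_CLIENT_VERSION
```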

psrok1 · 2023-06-11T23:10:15Z

Thanks for the feedback. I'll try to split the non-breaking changes out of this PR into another one and implement some safety checks here.

psrok1 · 2024-05-13T17:36:40Z

To be continued in #255

psrok1 added 11 commits February 21, 2023 18:44

Task tree and iterators: backend performance improvements

594c3c1

Task: need to introduce revision field to use correct fquid during task updates and deletion

497cdcb

Assigning routed tasks to consumer set

ef157ca

New changes

7fe7f33

use orjson

04f32bc

Merge branch 'master' into feature/multiple-task-performance

4a73b95

Use optimizations in karton.inspect

36f19cf

Added some docs

4c576a5

Fixed docs

8709526

Ignore orjson in mypy

c6244bc

Use newer Python in karton-system

37382b8

psrok1 marked this pull request as ready for review February 23, 2023 17:38

psrok1 requested a review from nazywam February 23, 2023 17:44

psrok1 mentioned this pull request Feb 23, 2023

SystemService and backend improvements #193

Closed

This was linked to issues Feb 23, 2023

Removing hanging "started" tasks for non-persistent kartoniks when all instances are gone #10

Closed

Get task tree status #178

Closed

psrok1 added 2 commits February 23, 2023 18:52

Set back the original GC_INTERVAL

6cc3a4d

Merge branch 'master' into feature/multiple-task-performance

17f0941

psrok1 marked this pull request as draft March 2, 2023 14:59

Removed karton.assigned set: it's not really safe for migration and doesn't improve anything actually

d35a05b

psrok1 marked this pull request as ready for review March 2, 2023 15:14

psrok1 commented Mar 2, 2023

nazywam reviewed Mar 3, 2023

psrok1 and others added 3 commits March 3, 2023 14:24

Apply suggestions from code review

0acfc9f

Co-authored-by: Michał Praszmo

Add orjson as regular dependency + fix typing

2c06466

Apply other suggestions from review.

dad2644

psrok1 force-pushed the feature/multiple-task-performance branch from b0ae13e to dad2644 March 3, 2023 16:42

psrok1 requested a review from nazywam March 3, 2023 16:44

psrok1 mentioned this pull request Mar 16, 2023

Karton reanalysis API is slow CERT-Polska/mwdb-core#650

Open

Merge branch 'master' into feature/multiple-task-performance

b440afc

psrok1 added 2 commits May 25, 2023 13:00

Version bump

36b89a0

Merge branch 'master' into feature/multiple-task-performance

c0d1da1

nazywam reviewed May 31, 2023

Update karton/core/task.py

3f868d9

Co-authored-by: Michał Praszmo

psrok1 commented Jun 20, 2023

psrok1 mentioned this pull request Jun 20, 2023

Task iterators: backend performance improvements #218

Merged

psrok1 marked this pull request as draft June 20, 2023 10:36

psrok1 removed a link to an issue Jun 20, 2023

Removing hanging "started" tasks for non-persistent kartoniks when all instances are gone #10

Closed

Merge branch 'master' into feature/multiple-task-performance

49c1968

psrok1 closed this May 13, 2024

psrok1 mentioned this pull request May 14, 2024

Task tree: backend performance improvements #255

Merged
