refactor(tasks/routes): process uploaded files in parallel as celery groups #504

Open: wants to merge 18 commits into base: main

matthiasschaub
Collaborator

@matthiasschaub matthiasschaub commented Sep 5, 2024

Instead of georeferencing uploaded files sequentially in a single celery task
(which produced one big zip file and led to memory issues when that file
grew too large), files are now processed in parallel in one celery group.

The same is done for digitizing sketches: files are digitized in parallel as
a celery group instead of in a single celery task.

Closes #498
Closes #399

Other changes:

  • refactor(tasks): move ml-model init to worker process init
  • misc(scripts): start celery worker w/ concurrency of 1
  • fix(merge): write color to properties
  • refactor: move merge and zip_ to helpers module
  • refactor(celery): use chord and immutable signature. Use chord for executing task after group is finished.
  • feat(api): return errors as list for /status requests
  • tests(api): add unit-tests for /download endpoint
  • refactor: provide more detailed error message
  • test: fix flaky tests due to missing mocks of cleanup_map_frames and cleanup_blobs

Changed Dependencies

  • build(deps): add pytest plugin to rerun failures

Open Tasks:

@matthiasschaub matthiasschaub force-pushed the split-georeferencing-into-multiple-tasks branch 2 times, most recently from cd6ab84 to 2d2498b Compare September 5, 2024 14:32
@matthiasschaub matthiasschaub changed the title Split georeferencing into multiple tasks refactor(tasks/routes): process uploaded files in parallel as celery group Sep 5, 2024
@matthiasschaub matthiasschaub changed the title refactor(tasks/routes): process uploaded files in parallel as celery group refactor(tasks/routes): process uploaded files in parallel as celery groups Sep 6, 2024
@matthiasschaub matthiasschaub force-pushed the split-georeferencing-into-multiple-tasks branch 3 times, most recently from b59b217 to 93ad756 Compare September 9, 2024 09:40
@matthiasschaub matthiasschaub force-pushed the split-georeferencing-into-multiple-tasks branch 5 times, most recently from 4ce2b37 to e742f2d Compare October 14, 2024 21:11
Instead of georeferencing uploaded files sequentially in a single celery task
(which produced one big zip file and led to memory issues when that file
grew too large), files are now processed in parallel in one celery group.

The same is done for digitizing sketches: files are digitized in parallel as
a celery group instead of in a single celery task.

Other changes:
- refactor: move merge and zip_ to helpers module
- refactor(celery): use chord and immutable signature. Use chord for executing task after group is finished.
@matthiasschaub matthiasschaub force-pushed the split-georeferencing-into-multiple-tasks branch from 718f2f3 to 9120d8a Compare October 17, 2024 02:56
into test_routes_api_health.py and test_routes_api_status.py
Build user message from changed info and new errors response fields.
Raster and vector results generated by the previous version are stored as a
single AsyncResult in the result store. These need to be retrieved
differently from how it is now done with GroupResults.
@matthiasschaub matthiasschaub force-pushed the split-georeferencing-into-multiple-tasks branch from 459d343 to fbb243f Compare October 19, 2024 03:33
Some tests cluttered the Celery queue, which led to timeout errors in
subsequent test cases.
@matthiasschaub matthiasschaub force-pushed the split-georeferencing-into-multiple-tasks branch from fbb243f to 3c3a91c Compare October 22, 2024 21:15
@matthiasschaub matthiasschaub marked this pull request as ready for review October 22, 2024 23:35
@matthiasschaub matthiasschaub force-pushed the split-georeferencing-into-multiple-tasks branch from 3c3a91c to 6ad8135 Compare October 27, 2024 00:15
@rtroilo rtroilo self-requested a review December 2, 2024 16:41