Task/WG-383: add queue for heavy tasks #220
Conversation
Commits:
- Also, fixes nsf log snippet application.
- This was high as we used to support direct file upload instead of using TAPIS.
- …-Cloud/geoapi into task/WG-285-update-potree-converter
LGTM
```sh
sh -c '
celery -A geoapi.celery_app worker -l info -Q default -n default_worker@geoapi &
celery -A geoapi.celery_app worker -l info -Q heavy --concurrency=6 -n heavy_worker@geoapi &
wait
'
```
Just curious - is this preferable over letting the process hang on the `celery` command?
@rstijerina As we have two celery workers, at least the first one needed to be run in the background (`&`), but the second one could have been run in the foreground and the `wait` skipped. I didn't see a way to run a single `celery` command and say: I want X concurrency on this queue and Y concurrency on this other queue.
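(For reference, Celery's `celery multi` does accept per-node options, e.g. `celery multi start default heavy -A geoapi.celery_app -Q:default default -Q:heavy heavy -c:heavy 6`, with the node names here purely illustrative. But `celery multi` detaches the workers, so a container entrypoint would still need something running in the foreground; the `&` plus `wait` approach keeps both workers as children of the shell.)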
```diff
@@ -167,7 +167,7 @@ def _handle_point_cloud_conversion_error(pointCloudId, userId, files, error_desc
         f"Processing failed for point cloud ({pointCloudId})!")


-@app.task(rate_limit="1/s")
+@app.task(queue='heavy')
```
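As context for the change above: the `queue` option on the decorator only sets the task's default queue, and it can still be overridden per call. A minimal sketch, assuming the task name and import path (taken from the `-A geoapi.celery_app` worker command) as illustrations:

```python
from geoapi.celery_app import app  # assumed module path, based on the worker command


@app.task(queue="heavy")  # default queue for this task, matching the diff above
def convert_point_cloud(point_cloud_id):
    ...


convert_point_cloud.delay(42)  # published to the "heavy" queue by default
convert_point_cloud.apply_async(args=(42,), queue="default")  # per-call override
```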
Just a suggestion - should we add a logger to let us know which queue a task is using? Probably not needed; just curious if that would help us down the line.
The task itself gets logged, but not which queue it was in. We could have a base Celery task that our tasks use and log that info. Good idea for when we make improvements to this.
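A minimal sketch of that idea, assuming Celery 5.2+ (where the `before_start` hook is available); the class name, task name, and import path are hypothetical:

```python
import logging

from celery import Task

from geoapi.celery_app import app  # assumed module path, based on the worker command

logger = logging.getLogger(__name__)


class QueueLoggingTask(Task):
    """Hypothetical base class that logs the routing key each task arrived on."""

    def before_start(self, task_id, args, kwargs):
        # delivery_info is populated by the broker with exchange/routing-key details
        delivery_info = getattr(self.request, "delivery_info", None) or {}
        logger.info(
            "Task %s [%s] received via routing key %r",
            self.name,
            task_id,
            delivery_info.get("routing_key"),
        )


# Tasks opt in via base=; Celery(task_cls=...) could apply it app-wide instead.
@app.task(base=QueueLoggingTask, queue="heavy")
def convert_point_cloud(point_cloud_id):
    ...
```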
Overview:

Add a `heavy` queue for computationally intensive tasks. In this PR, the queue is used by the point cloud processing tasks (see the `@app.task(queue='heavy')` change above).

The number of workers for the two queues:
- `default` queue uses Celery's default behavior: nproc worker processes.
- `heavy` queue is set to nproc/2 workers (so intensive/long tasks don't monopolize things).

Note: While we've discussed categorizing tasks based on their true intensity/need (e.g., differentiating between small and large point cloud processing), this PR doesn't implement that. Instead, it makes incremental improvements by just having a smaller queue for large tasks to i) reduce the risk of memory exhaustion during large point cloud processing, and ii) prevent system monopolization by intensive tasks.
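For reference, a minimal sketch of how the two queues could be declared explicitly in the Celery config. This is an assumption about wiring, not necessarily what the PR does: Celery creates missing queues automatically by default, so workers started with `-Q default` / `-Q heavy` work without it.

```python
from kombu import Queue

from geoapi.celery_app import app  # assumed module path, based on the worker command

app.conf.task_default_queue = "default"  # tasks without an explicit queue land here
app.conf.task_queues = (
    Queue("default"),
    Queue("heavy"),  # consumed by the worker started with -Q heavy --concurrency=6
)
```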
Related Jira tickets:
Summary of Changes:
Testing Steps: