Add support for running multiple Celery queues #1130

Ian2012 · 2024-10-03T21:17:31Z

Is your feature request related to a problem? Please describe.

Open edX already has implemented solutions for running multiple celery queues such as the old known high and high_mem default queues however, tutor operates by using a single worker deployment with a default pool (prefork) and this is not always as performant as desired as that pool is designed for CPU intensive tasks such as the ones related to grading, while I/O bound tasks such as the ones used in Aspects would benefit from having a gevent pool which used green threads aka virtual threads to accomplish high levels of concurrency. This has already been tested and implemented in tutor-contrib-celery and the benefits have been notorious as the resources are better handled and I/O bound tasks are resolved pretty fast.

Describe the solution you'd like

Allow tutor users to configure multiple Celery deployments with specialized settings for each queue's tasks. With defaults matching what's currently expected to run on each queue.

Describe alternatives you've considered

Developing a separate plugin for Celery: tutor-contrib-celery however we think the community would benefit from having a standard way to set Celery and also don't need to build a custom openedx images with those requirements: openedx/edx-platform#35591

Additional context

https://celery.school/celery-worker-pools

The text was updated successfully, but these errors were encountered:

regisb · 2024-10-04T08:44:39Z

Hey @Ian2012! I'm glad you are opening this issue today as there are multiple conversations going on right now about Celery:

#1126
#1010
#1128

As usual, I'd like to resolve these issues by:

Improving the default settings for end users.
Making it possible to easily customize these default settings.

Let's address these two items in reverse order:

Customization

Let's assume that we provide a mechanism in Tutor core that makes it possible to customize the celery command exactly as you need. You are then also able to run any celery worker by adding containers to the docker-compose/k8s override files. I don't have a perfect solution to this problem just yet, but I'm working on it. Assuming this solution exists, it should enable you to customize the celery workers as you see fit, right?

There would be one remaining problem, which is the unavailability of gevent in the openedx Docker image. You propose to address this issue in your edx-platform PR. An alternative would be to add gevent to the additional packages installed by Tutor. Which solution do you like best?

Good defaults

My second concern is having good defaults for users who don't know how to tune celery workers. In your opinion, what changes should we make to the default celery command or settings to improve those defaults? For instance, in your tutor-contrib-celery plugin, I see that you set CELERY_ACKS_LATE = True (here). Do we want to add this change to tutor core? Are there other changes that we should be adding?

Ian2012 · 2024-10-04T12:15:22Z

Customization

I've seen some comments around having a celeryconfig.py file with a patch on it for general settings, and for the custom workers we could use environment variables.

For 'gevent', both options are available, but if it's okay to have it on the tutor, then let's proceed with that.

Good defaults

Besides the CELERY_ACKS_LATE Python setting, we have explored no other Python settings, but that's something we can test and measure when the PR is open. Here are some settings we have explored:

autoscale: One thing we have considered was tweaking the autoscale parameter however, the HPA doesn't play nicely when autoscale is enabled and the performance is affected.
pool: It depends on I/O vs CPU-bound tasks, but there are two options here, use prefork for CPU-bound tasks that use processes instead of threads, and there is no penalty for context switching with concurrency matching CPU limits which are usually 1 CPU, which means 1 task at a time, and use gevent for I/O bound tasks with a high concurrency value such as 100 (already tested). I would recommend having at least two deployments where the default queue is for I/O bound tasks and the high queue is for CPU-intensive tasks.
concurrency: It depends on the type of queue and resource constraints and limits (bare metal, cloud provider, instance type, autoscaling, etc) but we can provide the defaults described earlier.

I can open a PR with the following changes to be considered:

Changes to default Python settings in lms settings.
Create a separate Python configuration file for Celery.
Create a mechanism to easily define and tune new workers, see CELERY_WORKERS_CONFIG
Having good defaults for the multiple queues that match what OpenedX expects. (needed discussion with operators of medium and large instances)

regisb · 2024-10-04T13:15:13Z

I've seen some comments around having a celeryconfig.py file with a patch on it for general settings, and for the custom workers we could use environment variables.

Yes, I suggested that. But after investigating I found no way to configure a celery worker using such a settings file. As far as I know the celery worker command does not support a --config=... option: https://docs.celeryq.dev/en/stable/userguide/workers.html

Do you know a way to achieve that?

Ian2012 · 2024-10-04T13:43:21Z

We need to use the celery app config_from_object method on the production.py settings files:

default_config = 'myproj.celeryconfig'
app.config_from_object(default_config)

Ian2012 · 2024-10-04T13:48:59Z

It may require changes to edx-platform celery's app but I don't think we need it. We also have the option to use environment variables but python files provides more prefxility

regisb · 2024-10-04T13:56:25Z

You are referring to this? https://docs.celeryq.dev/en/stable/reference/celery.html#celery.Celery.config_from_object
This is the method that is being called in lms.celery: https://github.com/openedx/edx-platform/blob/master/lms/celery.py

As far as I know, this method is used to configure the Celery application, which is not the same thing as the Celery worker. For instance, were you able to configure the worker concurrency using this method? If yes, how?

Ian2012 · 2024-10-04T16:54:47Z

Yes, here is the code snippet with some other settings:

CELERYD_CONCURRENCY = 2 # worker_concurrency
CELERYD_MAX_TASKS_PER_CHILD = 100 # worker_max_tasks_per_child
CELERYD_POOL = 'threads' # worker_pool

See https://celery-safwan.readthedocs.io/en/latest/userguide/configuration.html#new-lowercase-settings for more information on the available settings, users can also use the APP variable to directly modify the variables on the celery object.

We are still missing having separate settings for each lms-worker queue but that can be solved by injecting an environment variable and dynamically resolve the settings on a dictionary:

CELERY_WORKER_TYPE = os.environ.get("CELERY_WORKER_TYPE", "default")

worker_settings = {
    "lms": {
        "default": {
            "parameters": {
                "worker_concurrency": 4,
                "worker_pool": "threads"
            }
        },
        "high": {},
        "high_mem": {},
    },
    "cms": {
        "default": {},
        "low": {},
    },
}
from openedx.core.lib.celery import APP

worker_variants = worker_settings.get(SERVICE_VARIANT)
for variant, config in worker_variants.items():
    if CELERY_WORKER_TYPE == variant:
        for parameter, value in config.get("parameters", {}).items():
            conf = APP.conf
            setattr(conf, parameter, value)

This is working locally

Ian2012 · 2024-10-04T16:58:11Z

Some parameters cannot be added this way, like the queues but can be added to the command line args. But if we are going to add command line arguments and everything is configurable using the pattern above then we don't need this script, just inject every setting via the filter and allow operators to override it

Ian2012 · 2024-10-04T21:49:53Z

This is the POC: #1131

regisb · 2024-10-07T14:54:19Z

Customization

I saw your PoC ##1131. I'm not in favor of a filter that would be used to create extra containers. There is already another mechanisms to do that, which is patching the docker-compose/deployments.yml files. Let's avoid creating another one.

Let's focus on the best mechanism to override/customise the existing celery command. Controlling celery via environment variables or a celeryconfig.py file will give us limited control, because not all CLI options can be configured using this mechanism. In particular, --without-gossip and --autoscale=... can only be defined from the CLI. So we need to be able to customize the celery command.

I suggest to use two filters: hooks.Filters.LMS_WORKER_COMMAND and hooks.Filters.CMS_WORKER_COMMAND which will include the default arguments.

Good defaults

pool

I would recommend having at least two deployments where the default queue is for I/O bound tasks and the high queue is for CPU-intensive tasks.

We should stick to just one worker for the LMS and the CMS. Any extra worker will use extra server resources. In that context, I think we should stick to processes and the prefork pool, right?

autoscale

Is it really a good idea to default to --autoscale=1,4 for most users? I came up with this idea, but now that I think of it I'm not sure about the unpredictability that it creates.

acks_late

ACKS_LATE will mean that tasks will be retried when they fail. I think this would be a major change, right? Do we really want that? I'd be afraid that a single task would run infinitely when it fails.

Ian2012 · 2024-10-07T15:24:22Z

Customization

I don't think we would need two filters, I think the lms and cms queues are already very similar, so we can just use one filter for it: hooks.Filters.CELERY_WORKER_COMMAND.

Good defaults

pool

Are we just going to use add one queue? Operators should be able to add workers for the different queues to improve the performance of those tasks. Also, the current implementation can cause problems because it only excludes the default queue from the other service, but other queues such as lms high and lms high_mem are not excluded which causes lms tasks to be run on the cms workers, and vice-versa. This has already been a problem in production for us.

autoscale

No, it causes problems with CPU/memory HPA and doesn't provide any advantages over scaling the deployments. This would only benefit local installations, not k8s installations. I wouldn't default this one but leave a warning or recommendation in the docs for it.

acks_late

No, it will not retry failed tasks. Failed tasks will be acknowledged as failed, but will not be retried. Late acknowledge prevents forceful killed workers from removing tasks from the queue.

Multique queues

Then, will tutor-contrib-celery be the default option to run multiple celery queues?

If it is a resource problem, can we default to only one worker but allow to have multiple workers with different settings? This is what we already do on the Celery plugin.

Ian2012 · 2024-10-08T14:28:07Z

I also imagine having multiple queues only on Kubernetes deployment, not in local installations

jfavellar90 · 2024-10-16T23:01:35Z

Hi! Joining late to the conversation, but I would like to leave my 50 cents:

I'm more inclined to use separated filters: hooks.Filters.LMS_WORKER_COMMAND and hooks.Filters.CMS_WORKER_COMMAND. The tasks executed by every worker can have different behaviors so it can be helpful to configure them separately.
An important point to take in mind is that enabling multiple queues should modify the --exclude-queues parameter. In a multi-queue scenario, the idea is to listen to a single queue (--queue parameter) as we used to do in the old Ansible model.
Pool: Agree with the approach you guys proposed. Prefork is OK for most of the workloads, however, we should make this parameter configurable for specific use cases (e.g. Aspects and I/O bound tasks)
Autoscale: as @Ian2012 mentioned, it is not recommended. We should stick to the default concurrency settings and allow the modification of the configuration.
acks_late: I've had some issues with this setting in Kubernetes. At times some tasks take the worker to its hardware limits and this triggers a pod restart. When the pod is restarted and acks_late is enabled, the problematic task is re-queued and eventually taken by another worker that, again, gets all its resources consumed and continues the cycle, affecting workers' performance in general. What we do is revoke the task via Celery Flower so the workers can keep operating normally. Probably it will require more debugging to spot the problematic tasks. This happens in medium to large-size installations, however, I consider it worth mentioning.
prefetch_multiplier: this is an interesting setting to configure depending on the nature of the tasks in the queues. For single-queue approaches, the default is OK. The multi-queue approach could benefit from the tunning of this setting.

Ian2012 · 2024-10-18T18:10:22Z

@regisb I've updated the POC to use the two new filters to define the celery command: #1134

btw, one issue with the --exclude-queue setting is that it only excludes 1 queue but the cms/lms have multiple queues, and this caused problems like wrongly sending lms.high_mem and lms.high tasks to the cms worker (such as grading tasks) which cannot be processed as the grading app is not installed on the cms.

regisb added this to Tutor project management Oct 3, 2024

github-project-automation bot moved this to Pending Triage in Tutor project management Oct 3, 2024

Ian2012 mentioned this issue Oct 3, 2024

chore: add gevent as bundle dependency openedx/edx-platform#35591

Closed

regisb mentioned this issue Oct 7, 2024

POC feat: add support for celery multiqueue #1131

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for running multiple Celery queues #1130

Add support for running multiple Celery queues #1130

Ian2012 commented Oct 3, 2024

regisb commented Oct 4, 2024

Ian2012 commented Oct 4, 2024 •

edited

Loading

regisb commented Oct 4, 2024

Ian2012 commented Oct 4, 2024

Ian2012 commented Oct 4, 2024

regisb commented Oct 4, 2024

Ian2012 commented Oct 4, 2024

Ian2012 commented Oct 4, 2024 •

edited

Loading

Ian2012 commented Oct 4, 2024

regisb commented Oct 7, 2024

Ian2012 commented Oct 7, 2024 •

edited

Loading

Ian2012 commented Oct 8, 2024 •

edited

Loading

jfavellar90 commented Oct 16, 2024

Ian2012 commented Oct 18, 2024

Add support for running multiple Celery queues #1130

Add support for running multiple Celery queues #1130

Comments

Ian2012 commented Oct 3, 2024

regisb commented Oct 4, 2024

Customization

Good defaults

Ian2012 commented Oct 4, 2024 • edited Loading

Customization

Good defaults

regisb commented Oct 4, 2024

Ian2012 commented Oct 4, 2024

Ian2012 commented Oct 4, 2024

regisb commented Oct 4, 2024

Ian2012 commented Oct 4, 2024

Ian2012 commented Oct 4, 2024 • edited Loading

Ian2012 commented Oct 4, 2024

regisb commented Oct 7, 2024

Customization

Good defaults

pool

autoscale

acks_late

Ian2012 commented Oct 7, 2024 • edited Loading

Customization

Good defaults

pool

autoscale

acks_late

Multique queues

Ian2012 commented Oct 8, 2024 • edited Loading

jfavellar90 commented Oct 16, 2024

Ian2012 commented Oct 18, 2024

Ian2012 commented Oct 4, 2024 •

edited

Loading

Ian2012 commented Oct 4, 2024 •

edited

Loading

Ian2012 commented Oct 7, 2024 •

edited

Loading

Ian2012 commented Oct 8, 2024 •

edited

Loading