updateDelaySet may cause very high redis cpu usage #1893
Comments
Before I dig more into this issue, could you tell me more about your queue setup? For instance, why do you have so many delayed jobs, and what is the average delay for these jobs? Are you using rate limiting?
Also, are you using priorities or just normal jobs?
My queue config:
OK, so you have both delayed jobs and a rate limit of 1000 jobs per second, right?
Yes, exactly.
And where are you running your Redis instances?
We run our Redis instance on AWS ElastiCache for Redis.
Did you consider upgrading the instance type to a faster one? I know that on AWS this is wasteful, since larger instances also have more cores but Redis will only use one.
I assume you are using the latest Bull version, 3.18.1, right?
Yes, we tried upgrading our instance but it made no difference. The queue can run for a while, but this issue occurs occasionally. We use 3.18.
Yes. Moving the delayed jobs back to wait is not very efficient, especially in this case, where your queues are always full of jobs. I think I can implement a solution for this, but it may take a couple of days before it is ready (if everything goes well).
I implemented some improvements that should reduce the CPU usage in a case like this. Is there any chance you could verify them before we make a new release? #1897
I tried #1897 with option
OK, I will try to mimic those numbers in a synthetic test.
I added a small fix to the above-mentioned PR. Would you mind testing whether it works faster?
Thanks for your work. I tested again and the consumption speed was very fast, close to the rate limiter's limit. Jobs move to active and get consumed quickly, and the Redis CPU usage is much lower than before.
I found the
However, the slow log length was just 4 and didn't increase any more.
Yes, I think that is normal. My fix basically disables the updateDelayTimer as long as there are jobs in the queue to be processed, but sometimes, when the queue gets rate limited, it will kick in again. It is very difficult to make it perfect using just Lua scripts. I am working on a Redis module that solves all these problems perfectly, but it will not work on AWS ElastiCache since it does not support modules.
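To illustrate the idea behind that fix, here is a minimal in-memory sketch. It is not Bull's actual implementation: the class `DelayedQueueModel` and its methods are hypothetical, there is no Redis involved, and the rate-limiting case mentioned above is not modelled. It only shows why skipping the delayed-set pass while the wait list still has jobs reduces how often the expensive pass runs.

```javascript
// In-memory model only; all names here are hypothetical, not Bull internals.
class DelayedQueueModel {
  constructor() {
    this.wait = [];      // ids of jobs that are ready to be processed
    this.delayed = [];   // { id, runAt } entries, kept sorted by runAt
    this.promotions = 0; // how many times the expensive pass has run
  }

  addDelayed(id, runAt) {
    this.delayed.push({ id, runAt });
    this.delayed.sort((a, b) => a.runAt - b.runAt);
  }

  // Stand-in for the expensive pass: move every due delayed job to wait.
  promoteDue(now) {
    this.promotions += 1;
    while (this.delayed.length > 0 && this.delayed[0].runAt <= now) {
      this.wait.push(this.delayed.shift().id);
    }
  }

  // A worker asks for the next job. The delayed set is only inspected
  // when there is nothing left in wait.
  nextJob(now) {
    if (this.wait.length === 0) {
      this.promoteDue(now);
    }
    return this.wait.shift(); // undefined when no job is available
  }
}
```

In this model a busy queue pays for the delayed-set pass only when the wait list runs dry, instead of on every timer tick of every worker.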
Closing this since there are no more actions that I can take.
We have a very large queue with several million delayed jobs. The queue is served by tens of workers (pods) in a k8s cluster, and by two or three times as many pods at peak business hours. We have run into an issue where Bull causes high Redis CPU usage: Redis gets slow and job processing then stops. This pain has lasted for several months. We finally found that updateDelaySet is triggered very frequently on each worker, which causes the high Redis CPU usage. Consumers then become so slow that they even stop processing waiting jobs, the waiting queue piles up, and eventually the entire Bull queue stops working.
I found that updateDelaySet is expensive when the queue is huge, so we should stop calling updateDelaySet when the waiting queue has too many jobs to process. We could add an option, say maxWaitingJobs, which defaults to Number.MAX. Users can set this value according to their needs.
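The proposal could be sketched roughly as follows. This is only an illustration under assumptions: `maxWaitingJobs` is the option suggested above and does not exist in Bull, `shouldUpdateDelaySet` and `createGuardedUpdater` are hypothetical helpers, and `getWaitingCount` / `updateDelaySet` are callbacks the caller would have to supply. `Number.MAX_SAFE_INTEGER` stands in for the "Number.MAX" default.

```javascript
// Hypothetical sketch of the proposed option; not part of Bull's API.

// Decide whether the expensive delayed-set update should run at all.
function shouldUpdateDelaySet(waitingCount, maxWaitingJobs = Number.MAX_SAFE_INTEGER) {
  return waitingCount <= maxWaitingJobs;
}

// Wrap a caller-supplied update function with the guard.
// getWaitingCount: async () => number of jobs currently waiting (assumed)
// updateDelaySet:  async (now) => moves due delayed jobs to wait (assumed)
function createGuardedUpdater({ getWaitingCount, updateDelaySet, maxWaitingJobs }) {
  return async function guardedUpdate(now = Date.now()) {
    const waiting = await getWaitingCount();
    if (!shouldUpdateDelaySet(waiting, maxWaitingJobs)) {
      return false; // backlog too large: skip the update this time
    }
    await updateDelaySet(now);
    return true; // update was performed
  };
}
```

With the default, the guard never skips, so existing behaviour would be unchanged unless a user sets a lower threshold.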