-
Notifications
You must be signed in to change notification settings - Fork 11.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Job not logged in failed_jobs if timeout occurs within database transaction #49389
Comments
Verify #49389 Signed-off-by: Mior Muhammad Zaki <[email protected]>
HI there, I added the following tests to verify the issue and was unable to replicate the problem, |
Hey there, thanks for reporting this issue. We'll need more info and/or code to debug this further. Can you please create a repository with the command below, commit the code that reproduces the issue as one separate commit on the main/master branch and share the repository here? Please make sure that you have the latest version of the Laravel installer in order to run this command. Please also make sure you have both Git & the GitHub CLI tool properly set up.
Do not amend and create a separate commit with your custom changes. After you've posted the repository, we'll try to reproduce the issue. Thanks! |
@crynobone Thank you for the time invested. We'll as a next step try and reproduce this in a fresh laravel project to rule out the fact that this has something to do with our environment and get back to you asap. |
* [10.x] Test Improvements Verify #49389 Signed-off-by: Mior Muhammad Zaki <[email protected]> * Apply fixes from StyleCI --------- Signed-off-by: Mior Muhammad Zaki <[email protected]> Co-authored-by: StyleCI Bot <[email protected]>
We have successfully reproduced the mentioned issue in a fresh standalone laravel project. Step to reproduceexecute the command (in root directory:
Thank you for your consideration & time |
Can you push the repository to github as suggested above? |
@graemlourens It seems that you didn't specify https://laravel.com/docs/10.x/queues#max-job-attempts-and-timeout and I don't see how it would goes to |
@crynobone We did not specify $tries as by default its 1 and therefore the setting was unnecessary, and we did set the $timeout on the job properly However just to rule out any error on our side, we added tries=1 on the job, as well as also on the worker, and it had no influence, as suspected. One Job is logged (the one without transaction) and one job is NOT logged (the one with transaction) Both jobs are configured exactly the same, the only difference in them is that one has the sleep inside of a transaction and the other does not. I'm not sure how better we can show you this bug. I highly suspect that you did not execute the steps we provided to reproduce the error or else you could confirm this behaviour by proof of one entry in the failed_jobs table. If you reconsider to reopen this ticket, we'll be glad to push it as a repository as you requested. |
@crynobone we've had further success in this matter: Strangely the bug is fixed if you use 'Batchable' on the Job. The reason the job is then logged correctly is because of Illuminate\Queue\Jobs\Job. In the 'fail' method there is a check on line 195:
In this case you're running 'rollBack' on the $batchRepository. Even if the job is not attached to a batch, this fixes the problem. However i don't think its correct for us to have to set all jobs as batchable to resolve this issue. I still believe this should also be handled for jobs without batches. How do you see this? |
Tries doesn't have a default of framework/src/Illuminate/Queue/Jobs/Job.php Line 278 in 047edbc
|
@crynobone ok, still however as mentioned, we set tries to 1 and it still happens. Please also refer to my temporary solution that if you use 'Batchable' on the Job, it works, without it does not. How do you feel about this? |
I cannot replicate the issue and your reproduction code doesn't show what you are claiming is true. This is why we advised sending the repository to GitHub. |
@crynobone you are confirming you downloaded our ZIP, ran the commands we provided, and you did NOT see the outcome we described? I do not see how this is possible. I can provide a screenshare movie if you prefer. Or are you telling me that you will only reproduce it when we have pushed it to repository on github? |
Having everything in a single commit allow us to completely eliminate any other possibility outside just the affected code instead having to verify evey files inside the zip file. |
@crynobone a coworker of mine has uploaded the repo as requested: https://github.com/GrzegorzMorgas/laravel-bug-report We were a little confused if to commit everything as ONE commit or the base commit of base setup and then a second commit with the changes required to reproduce the issue. The instructions are a little ambigous to us: "commit the code ... as one separate commit" Please let us know if what we created is ok and you can reproduce or you would like the repository in a different way |
@graemlourens use the command from here: #49389 (comment) Then commit all custom changes separately. |
@driesvints @crynobone Understood. We've done as requested: https://github.com/GrzegorzMorgas/bug-report Step to reproduceexecute the command (in root directory of the folder):
Important findIf you 'use Batchable' on the Job, then also the one timeouting with transaction suddenly works. We have traced this back to
In Because there is a rollback there, it seems to work. However we believe it should not be necessary to set all jobs as Batchable to get around this bug. Thank you for your consideration & time |
Thanks @graemlourens. Might be a while before we can dig into this because we're all heading into holidays. |
I have reviewed this issue and it may not be an actual bug after all. This happens because of the async signal handler. Here are two solutions:
|
@gkmk Thank you for your insight and suggestions. The reason i would tend to shy away from option 1) is because then behaviour would be different between jobs that are batchable, and ones that are not. This would be an unnecessary inconsistency in my opinion. (See my previous comment) On a more subjective note, we have > 100 jobs, most of them have transactions and do not require failed methods. Option 2) would add failed methods just to handle this specific case to not loose jobs, which seems to be quite an overhead. Will be interesting to see what the core of laravel developers have to say. |
Yes, this is intended because Batchable Jobs offer
The documentation already cover this in detail: |
@crynobone @driesvints @themsaid we disagree with your assessment and rest our case. We hope that in future there will be a fix for this inconsistency which will reduce the chance of loosing jobs, which we find is crucial to be able to rely on laravel queues as a major component for larger applications. for anybody else worried about this: Our solution will most likely be to declare all Jobs as Batchable, which seems to not have any ill-effect and even if the job is not dispatched within a batch it therefore handles the transaction correctly so that the job is logged properly in case of timeout within a transaction. |
Read the documentation, the worker can be restarted using a process manager (which should be the default usage in any production applications anyway), or use |
@crynobone we use laravel horizon which automatically handles that. But please be aware: If the job had $tries=1 (which most of our jobs do) the job is NOT retried, and NOT logged, and therefore lost. Please correct my if i'm wrong. This is the whole point of this ticket - loosing jobs |
Laravel Version
10.37.3
PHP Version
8.2.7
Database Driver & Version
MariaDB 10.3.27
Description
We had a 'phantom' problem that jobs that failed, were not logged to the failed_jobs table. After some investigation we found out that by mistake we were doing a heavy process within a db transaction which lead the specific job to timeout.
After testing we found out that the issue only happens if the jobs timeout during an open db transaction
I reported this in laravel/horizon but it turns out it has nothing to do with horizon but seems to be a framework issue.
Steps To Reproduce
We would expect a job in the failed_jobs table, however this is not the case, and the JobFailed event is also not dispatched
The text was updated successfully, but these errors were encountered: