Worker Death #664

Closed
josephjclark opened this issue Apr 15, 2024 · 1 comment · Fixed by #668
Labels
bug Something isn't working

Comments

@josephjclark
Collaborator

josephjclark commented Apr 15, 2024

In production this afternoon we had a nasty case of worker death

image

Link to logs

A couple of things to note:

  • The worker is queuing requests internally: it accepts a job, hands it to the child process pool, and the pool enqueues it. That shouldn't be happening.
  • It looks like we had some errors, and then maybe log timeouts after that?
  • Mutchi mentioned a Postgres bug around this time

Suspicion 1: Child processes are dying and not being recreated (but maybe they're managing to send out an error to the worker so that the queue is getting freed up)

Suspicion 2: Postgres is erroring and calling process.exit(), and the exit is not properly handled

@josephjclark josephjclark added the bug Something isn't working label Apr 15, 2024
@github-project-automation github-project-automation bot moved this to New Issues in v2 Apr 15, 2024
@josephjclark josephjclark moved this from New Issues to In progress in v2 Apr 15, 2024
@josephjclark
Collaborator Author

I have managed to repro locally.

I don't really think it matters what the error is. In my case, I'm passing "ssl": true and my postgres fails because there's no cert.

But the symptom is the same: the workflow errors, logs start timing out, and the workerpool does not relinquish the child process.
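
To make the symptom concrete, here is a minimal sketch of the behaviour we'd expect from a pool when a child dies mid-task: fail the in-flight task, free the slot, and spawn a replacement. This is not the worker's actual code. `SketchPool`, `FakeChild` and `Task` are hypothetical names, and `FakeChild` is an `EventEmitter` stand-in for a real forked process, used only so the example runs without spawning anything.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for a forked child process: it only emits "exit".
class FakeChild extends EventEmitter {
  exit(code: number): void {
    this.emit("exit", code);
  }
}

// Hypothetical task shape: reject is called if the child dies mid-task.
type Task = { id: string; reject: (err: Error) => void };

// Illustrative pool, not the project's real API.
class SketchPool {
  readonly idle: FakeChild[] = [];
  readonly busy = new Map<FakeChild, Task>();

  constructor(size: number) {
    for (let i = 0; i < size; i++) this.idle.push(this.spawn());
  }

  private spawn(): FakeChild {
    const child = new FakeChild();
    child.once("exit", (code: number) => this.onExit(child, code));
    return child;
  }

  // Returns the child running the task, or undefined if no child is free.
  run(task: Task): FakeChild | undefined {
    const child = this.idle.pop();
    if (child) this.busy.set(child, task);
    return child;
  }

  private onExit(child: FakeChild, code: number): void {
    const task = this.busy.get(child);
    this.busy.delete(child);
    const idleIndex = this.idle.indexOf(child);
    if (idleIndex >= 0) this.idle.splice(idleIndex, 1);
    // Fail the in-flight task so the caller is not left waiting on a dead child.
    task?.reject(new Error(`child exited with code ${code}`));
    // Replace the dead child so pool capacity is restored.
    this.idle.push(this.spawn());
  }
}
```

With a real `child_process` the same idea applies: listen for the child's `exit` event, and treat an exit while a task is in flight as a task failure rather than waiting for a message that will never arrive.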

@josephjclark josephjclark mentioned this issue Apr 16, 2024
3 tasks
@github-project-automation github-project-automation bot moved this from In progress to Done in v2 Apr 17, 2024