Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: limit connections a worker can accept at any time #165

Merged
merged 1 commit into from
Feb 9, 2025

Conversation

ankush
Copy link
Member

@ankush ankush commented Feb 9, 2025

  • gthread by default accepts up to 1000 connections
  • it can only serve nthreads requests at any time i.e. 2-6 in our deployments.
  • So when worker dies it might have accepted 10s of requests which all get cancelled resulting in 502 error.

Fix:

  • Don't keep connections alive
  • Limit accepted connections to n+1 where n is number of threads. n+1 avoids serial chain of accept -> threadpool and ensures better threadpool utilization.

Note: diff says +2 because there's an off by one error in gunicorn logic, that doesn't matter much, we just need to avoid accepting huge number of requests +1 or +2 are both fine)

Things I verified:

  • Throughput is not affected / only marginally affected for silly things like ping i.e. threadpool utilization is just fine.
  • max futures never exceed n+1

@ankush ankush merged commit ebccbf0 into master Feb 9, 2025
2 checks passed
@ankush ankush deleted the gthread_max_con branch February 9, 2025 07:06
ankush added a commit that referenced this pull request Feb 9, 2025
@ankush
Copy link
Member Author

ankush commented Feb 9, 2025

Reverted, gunicorn has another bug where nr_conns are sometimes not being decremented back to 0 when all requests have ended. So it gets stuck in loop of hoping to reduce nr_conns enough to accept new connections but never does because there's nothing left to close.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant