Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Worker crashes when using S3 / sites with very short DNS TTL #30

Open
eriede opened this issue Jul 17, 2018 · 7 comments
Open

Worker crashes when using S3 / sites with very short DNS TTL #30

eriede opened this issue Jul 17, 2018 · 7 comments

Comments

@eriede
Copy link

eriede commented Jul 17, 2018

When a worker process connection times out after 2 * the DNS TTL period has passed the worker process may crash. (nginx 1.12). AWS S3 has a very short DNS timeout and exhibits the issue.

The issue is that the peer data structure is accessed after the dynamic server releases the peer.

I have created a patch which uses reference counting on active requests instead of the interleaving scheme currently used. This will ensure that the peer's memory remains valid during outstanding requests.

eriede pushed a commit to eriede/nginx-upstream-dynamic-servers that referenced this issue Jul 17, 2018
Use a reference counting scheme on open requests to keep the dynamic sever memory alive while a request could still have a reference to an old peer.
@zhaofeng0019
Copy link

i think your solution may cause memoy leak when there are requests can't end during a TTL -- the previous pool can't be free. And may not solve the memory crash when using upstream keepalive.
i think i may solve the problem
https://github.com/zhaofeng0019/nginx-upstream-dynamic-resolve-servers
and this solution is doing some references to your solution.
would you like to check it out? thanks a lot.

@eriede
Copy link
Author

eriede commented Jun 22, 2020 via email

@zhaofeng0019
Copy link

or would you please see my pull request for this project? it solves all the memory problem and works well.

@eriede
Copy link
Author

eriede commented Jun 22, 2020 via email

@zhaofeng0019
Copy link

you can see that in native nginx code , all the lb option will use round robin finally, including keepalive,
so you can see in my code, i only do ngx_http_upstream_init_round_robin function when the dns result changed, you can just save the function pointer of other lb option at the init_process function.

@zhaofeng0019
Copy link

if do in this way, support all the native nginx lb option, no crash, i tested.

@eriede
Copy link
Author

eriede commented Jun 22, 2020

I uploaded our keepalive solution that only works for round robin lb with 1.12, to get it out in the public domain. https://github.com/eriede/nginx-upstream-dynamic-servers/tree/round-robin-keepalive. I'm happy to do a code review on your code if you would like, but I don't have the time to do testing on the various nginx versions, so I can't accept pull requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants