The asynchttpserver dies when a reverse proxy is used. #18161
I did another test with several values for the assumedDescriptorsPerRequest argument. Even when it is equal to zero, in which case the server shouldn't accept requests at all, assumedDescriptorsPerRequest has no effect. My OS is Arch Linux, in case that helps.

```nim
import asynchttpserver, asyncdispatch

proc main {.async.} =
  var server = newAsyncHttpServer()
  proc cb(req: Request) {.async.} =
    let headers = {"Date": "Tue, 29 Apr 2014 23:40:08 GMT",
                   "Content-type": "text/plain; charset=utf-8"}
    await req.respond(Http200, "Hello World", headers.newHttpHeaders())

  server.listen Port(9000)
  while true:
    echo "Max Descriptors: ", maxDescriptors()
    echo "Active Descriptors: ", activeDescriptors()
    if server.shouldAcceptRequest(assumedDescriptorsPerRequest = 1):
      echo "Accept Request: ", activeDescriptors()
      await server.acceptRequest(cb)
    else:
      poll()

asyncCheck main()
runForever()
```

Result:
I did another test, but with an app written in Go, and in this case the issue also happens, but the server doesn't die. Is it possible to implement something identical in Nim?
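(A minimal sketch of that Go-like behaviour on top of the public acceptRequest API; the backoff delay and the logging destination are illustrative choices, not something the stdlib prescribes.)

```nim
import asynchttpserver, asyncdispatch

proc main {.async.} =
  var server = newAsyncHttpServer()
  proc cb(req: Request) {.async.} =
    await req.respond(Http200, "Hello World", newHttpHeaders())

  server.listen Port(9000)
  while true:
    try:
      await server.acceptRequest(cb)
    except OSError as e:
      # e.g. "Too many open files": log it and keep serving instead of
      # letting the exception kill the server, similar to what Go does
      stderr.writeLine("accept failed: " & e.msg)
      await sleepAsync(100)

asyncCheck main()
runForever()
```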
|
I still dislike this. For reference, there is a lot of discussion in the PR for this feature: #15957. |
Yes @dom96, I also think that Go's implementation is the most correct one. But with the previous solution it was not possible to catch the exception (#15925). I just want to catch the exception and have the server keep running instead of dying! I took a look at the asyncdispatch file and I think the activeDescriptors function doesn't read the actual number of open files (see the small demo after this comment):

```nim
...
# https://github.com/nim-lang/Nim/blob/version-1-4/lib/pure/asyncdispatch.nim#L299
proc getGlobalDispatcher*(): PDispatcher =
  if gDisp.isNil:
    setGlobalDispatcher(newDispatcher())
  result = gDisp
...

# https://github.com/nim-lang/Nim/blob/version-1-4/lib/pure/asyncdispatch.nim#L1939
proc activeDescriptors*(): int {.inline.} =
  ## Returns the current number of active file descriptors for the current
  ## event loop. This is a cheap operation that does not involve a system call.
  when defined(windows):
    result = getGlobalDispatcher().handles.len
  elif not defined(nimdoc):
    result = getGlobalDispatcher().selector.count
...
```
|
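(To illustrate the point above: a small, hedged demo, assuming a Linux-like system where /etc/hostname is readable. Descriptors opened outside the event loop never register with the dispatcher's selector, so activeDescriptors() does not see them.)

```nim
import asyncdispatch

echo "before: ", activeDescriptors()
var files: seq[File]
for i in 0 ..< 50:
  # plain file opens never register with the dispatcher's selector
  files.add open("/etc/hostname")
echo "after opening 50 files: ", activeDescriptors()  # still the same value
for f in files:
  close(f)
```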
Potential duplicate of #16603, or at least related to it. Note: I had a WIP branch to fix this on Linux by querying the actual number of fds; see timotheecour#750, and see the RFC here: nim-lang/RFCs#382 |
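(For reference, a rough sketch of what querying the actual number of open fds could look like on Linux. countOpenFds is a hypothetical helper name, not part of the stdlib or of that WIP branch; it simply counts the entries under /proc/self/fd.)

```nim
import os

proc countOpenFds(): int =
  ## Hypothetical helper: counts the descriptors this process currently has
  ## open by listing /proc/self/fd (Linux only).
  for _ in walkDir("/proc/self/fd"):
    inc result
  # the directory handle opened by walkDir itself shows up in the listing,
  # so subtract it to get the count as it was before the call
  dec result

echo "open file descriptors: ", countOpenFds()
```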
The function to get the number of open file descriptors for the user is useful and can be used in many contexts. I don't know whether such a function would slow everything down. Something simple like this, but keeping the HTTP server running:

```nim
proc serve*(server: AsyncHttpServer, port: Port,
            callback: proc (request: Request): Future[void] {.closure, gcsafe.},
            address = "") {.async.} =
  listen server, port, address
  while true:
    var
      address = ""
      client: AsyncSocket
    try:
      (address, client) = await server.socket.acceptAddr()
    except OSError as e:
      echo e.msg  # Too many open files
      if not client.isNil: client.close()
      await sleepAsync(500)
      continue
    # poll()
    asyncCheck processClient(server, client, address, callback)
```
|
To avoid serving a request if we know ahead of time that it will fail (and cause other in-flight requests to fail as a result). E.g.: without the shouldAcceptRequest logic you can end up in a situation where all in-flight requests fail, in cases where you have requests that take a while to complete and that can open multiple fds over the course of the request:
Without shouldAcceptRequest, you could get:
Another important benefit is that it avoids in-flight requests failing (which could be a bad thing for transactions etc.); a hedged sketch of the gate follows below. |
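(A sketch of the gate described above, assuming a handler that opens several descriptors, say a database connection plus a temporary file, while it runs. Budgeting assumedDescriptorsPerRequest above 1 leaves headroom so that already-accepted requests can still finish.)

```nim
import asynchttpserver, asyncdispatch

proc main {.async.} =
  var server = newAsyncHttpServer()
  proc cb(req: Request) {.async.} =
    # imagine this opens a handful of extra fds over its lifetime
    await req.respond(Http200, "ok", newHttpHeaders())

  server.listen Port(9000)
  while true:
    # only accept when roughly 5 more descriptors can still be afforded;
    # otherwise keep polling so in-flight requests can complete and free fds
    if server.shouldAcceptRequest(assumedDescriptorsPerRequest = 5):
      await server.acceptRequest(cb)
    else:
      poll()

asyncCheck main()
runForever()
```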
The entire basis of |
I've been checking the code in the asynchttpserver file and found that, in the processClient function, the connection is never closed after the while loop. So even if the keep-alive header is off, the number of open files keeps increasing. With keep-alive off and the connection closed after the while loop, the server never dies.

```nim
# https://github.com/nim-lang/Nim/blob/version-1-4/lib/pure/asynchttpserver.nim#L292
proc processClient(server: AsyncHttpServer, client: AsyncSocket, address: string,
                   callback: proc (request: Request):
                       Future[void] {.closure, gcsafe.}) {.async.} =
  var request = newFutureVar[Request]("asynchttpserver.processClient")
  request.mget().url = initUri()
  request.mget().headers = newHttpHeaders()
  var lineFut = newFutureVar[string]("asynchttpserver.processClient")
  lineFut.mget() = newStringOfCap(80)

  while not client.isClosed:
    let retry = await processRequest(
      server, request, client, address, lineFut, callback
    )
    if not retry: break

  client.close() # <<<==== Close the connection
```
|
Following up on the above: if the keep-alive header is on and the pre-defined number of file descriptors is exceeded, the connection is now closed automatically! I also added a timeout for keep-alive connections.
@timotheecour, can you see this PR #18198? |
yes, i commented there |
…against the devel branch. The client's "connection" header, if equal to "keep-alive", is disabled whenever the number of file descriptors exceeds the specified maxFDs or the keep-alive timeout is exceeded. If the value of the "connection" header equals "close", the connection is closed; if it is "keep-alive", it remains open, the number of file descriptors increases, and the server dies. To avoid this issue, the "keep-alive" handling is disabled temporarily until the number of file descriptors goes down; when the value goes down, the connection can remain open again. It also solves the issue when "connection" is equal to "upgrade".
I agree with @Varriount. My take on how to solve this: |
No, it's not a TOCTOU bug, it's simply defensive programming. Real-world analogy: I don't send you 2 dollars when I have only 1 dollar left in my account. Yes, by the time you get the $2 somebody might give $6 to me so that I don't run into debt. But the chances of this happening are too slim, so I won't risk it. |
How is this not a TOCTOU bug?
Also, why is
invalid?
How do you know that you only have 1 dollar left in your account? Keep in mind that, in order for the analogy to be accurate:
If you want to send me $2, it makes more sense to simply attempt to send me the money. Checking beforehand would incur a fee, and wouldn't actually guarantee that the transfer would be successful anyway. You would be able to get a rough estimate (for a fee), but that would be it.

To be more explicit, there are per-process and system-wide file descriptor limits, as well as per-process and system-wide file descriptor counts. All four of these can change at any given moment. In particular, the number of file descriptors being used can change very rapidly. Furthermore, calling

To be clear, I can see a rough check possibly being useful, but it runs into the problems outlined in this comment: nim-lang/RFCs#382 (comment) |
Well, it continues and then eventually the connection would be accepted, much like if accept had returned EAGAIN. You seem to misrepresent what is actually happening just so that you can name it "TOCTOU". Maybe this is helpful:
from https://doc.rust-lang.org/nomicon/races.html
But it is not the same check at all, |
So where is your PR? And how is "just log it" good API design? It's terrible, libraries shouldn't log by themselves, we have "frameworks" for that. |
Yes, "much like", but not exactly like. This logic is less accurate, and is vulnerable to a preventable race condition between the time it runs, and the time
I don't understand how this refutes any point in my argument, unless you're asserting that efforts to prevent race conditions are pointless. Yes, some race conditions can't be prevented, however others can, like in this situation.
Your point here seems to be that the check being performed before the call to accept is superior to handling the exception from
The first point is possible, but practically tiresome. As someone who works on large backend services, it is very difficult to know how many file descriptors are being used by even a single kind of request handler - I would have to measure not only how many file descriptors each kind of request handler in the service uses, but also the file descriptors being used by support libraries and running threads. Given that a single service can contain possibly hundreds of kinds of request handlers, and that I would have to continuously re-measure as development occurs and libraries are updated, such measurements would quickly become unfeasible.

The second point is harder to do, but I suppose it is technically possible, if expensive from a performance standpoint (multiple syscalls would likely need to be made).

The last point is impossible to do with a check, unless you're running in kernel mode. You would need to ensure that no file descriptor is allocated between the time the check is made and the time the

I have yet to be told how this argument:
is false, and have yet to be told how something like this:
won't work. Instead, tangential details of my arguments are attacked - the classification of the bug - and unrelated points are brought up - the possible merits of the existing code. Call the bug TOCTOU, or a race condition, or anything you want. Point out any number of possible benefits the current logic may have. At the end of the day, there is still a bug here, even in the presence of the current logic, and it won't be fixed by making a check before a call to |
Quoting @Varriount ^. This is the reason why the
This is already an edge case in my eyes; if you're running out of fds on your system then you need to fix that. For edge cases like this there is no shame in simply copying what other mature libraries do. Go logs, so why don't we do the same? The server will continue running and things will be fine. You can also expose a function call that will throw an exception so that it can be handled in more custom circumstances. Now you ask me where my PR is, which implies that you'd agree with me if I just created a PR. That implies that you will accept whatever comes first, which isn't great. It also forgets that @mrhdias is trying to create the PR, and I'm trying to help them do so. |
@Araq my PR would start by implementing nim-lang/RFCs#382 but I'm not sure why you downvoted it (without leaving a comment); it's useful for other use cases but in particular for properly implementing
estimating an upper bound is sufficient and should be feasible in lots of situations; in those cases server code can be made more robust against over-committing server and aborting in-flight requests by denying |
I feel like there's some confusion here, as this appears to be about two questions:
My answer to both questions is "No". I am speaking as someone who has developed, and currently does develop, backend servers/services for a living.
Let's expand this. How about just providing client code with the ability to determine whether a connection should be accepted? Then, if a situation calls for it, they can implement this class of logic in a way that is tailored to their program. That way, instead of hard-coding feature after feature that we assume users will want and use, we leave it up to the users. Also, let's be real: given how most applications are developed, do you really think a developer is going to remember to set this "estimated file descriptor usage" in 99.99% of cases? Honestly? Because if they don't set it, then in 99.99% of cases, this check is just wasted CPU cycles.
Do these situations overlap with the kinds of applications where this "feature" might actually be useful? And assuming that they do, you are stating that it is feasible for the developers of such programs to:
|
Well here is what really happened: asynchttpserver had a severe bug (yet another btw). I fixed the bug. Yes, really. There was a test case that failed and after my PR it started to work. I know perfectly well that:
If you think that you cannot estimate the number of file descriptor handles well, then I cannot help you, but I notice that every major OS has a maximum limit on these things, indicating that it's a resource you should be aware of. |
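(For reference, a minimal POSIX-only sketch of reading that per-process limit directly via getrlimit; as far as I can tell, this is roughly what maxDescriptors() consults under the hood.)

```nim
import posix

var lim: RLimit
if getrlimit(RLIMIT_NOFILE, lim) != 0:
  quit "getrlimit failed"
echo "soft fd limit: ", lim.rlim_cur  # the limit accept() runs into ("Too many open files")
echo "hard fd limit: ", lim.rlim_max  # ceiling the soft limit may be raised to
```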
I really just want us to fix this bug correctly now. You keep saying you are waiting for a PR; I am just trying to help mrhdias implement this PR. Can we just agree to get rid of this API and expose proper exceptions for this case, with the default behaviour of logging these errors like Go (and probably most implementations) does? |
Good, I have no objection here.
Well, I happen to like the mechanism. And if you tell me that we need logging otherwise to compensate for this case, I like the mechanism even more so, because the Nim standard library doesn't log. I know we have no official guideline that says that, because it never occurred to me that anybody would accept a standard library that logs. |
The psutil package for Python has a num_fds function that gives the number of file descriptors currently open for each OS.

```python
#!/usr/bin/python
# https://psutil.readthedocs.io/en/latest/#psutil.Process.num_fds
# https://github.com/giampaolo/psutil/blob/master/psutil/_pslinux.py#L2135
import psutil

p = psutil.Process()
print(p.num_fds())
```
|
Note that psutil isn't part of Python's standard library. |
Using the resource module, you can try to replicate what psutil does here:
BTW, is this ticket still applicable today? |
I made an app that uses the asynchttpserver library and it works well. But many times, when there are a lot of requests, the server dies with the following error: Exception message: Too many open files.
I did a test with wrk, and when the requests are made directly to the server without any reverse proxy in between, the server doesn't crash. But when I use a reverse proxy, the server goes down!
Nginx configuration
Example to test
After run:
Current Output
Additional Information