Skip to content

Commit

Permalink
[Frontend] Fix tcp port reservation for api server
Browse files Browse the repository at this point in the history
PR #8537 changed this code to bind the TCP port prior to the engine
starting to ensure that port isn't used unexpectedly by something
launched during the engine startup. (ray is mentioned in the
discussino history and the comment in the code)

This change introduced some new unexpected behavior as discussed in
issue #9737. Restarting vllm within a short time after handling an API
request where the client hasn't closed its end of the connection would
cause vllm to fail to start with a "port in use" error.

The primary issue was the use of the `fd` option to the uvicorn
config. This option does not actually do what we want. The relevant
code can be found here:

https://github.com/encode/uvicorn/blob/fe3910083e3990695bc19c2ef671dd447262ae18/uvicorn/config.py#L496-L501

The important line to note is this one:

    sock = socket.fromfd(self.fd, socket.AF_UNIX, socket.SOCK_STREAM)

Note that `uvicorn` is expecting the fd to be for an `AF_UNIX` socket.
We are passing in a TCP (`AF_INET`) socket. We seem to be lucky that
this mostly works anyway, though it's surprising it works at all!

Instead of using this `fd` option, set the `SO_REUSEADDR` option on
the socket, which will allow `uvicorn` to bind to the same address and
port.

When reserving the port, we previously always specified `""` for the
host. I fixed that too so that we bind to the host that we pass to
`uvicorn` for it to bind to, as well.

Finally, explicitly `close()` the socket as a final step of cleanup.

Closes #9737

Signed-off-by: Russell Bryant <[email protected]>
  • Loading branch information
russellb committed Nov 5, 2024
1 parent 04cef2c commit a402d48
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions vllm/entrypoints/openai/api_server.py
Original file line number Diff line number Diff line change
Expand Up @@ -569,7 +569,8 @@ async def run_server(args, **uvicorn_kwargs) -> None:
# This avoids race conditions with ray.
# see https://github.com/vllm-project/vllm/issues/8204
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("", args.port))
sock.bind((args.host or "", args.port))
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

def signal_handler(*_) -> None:
# Interrupt server on sigterm while initializing
Expand All @@ -593,13 +594,14 @@ def signal_handler(*_) -> None:
ssl_certfile=args.ssl_certfile,
ssl_ca_certs=args.ssl_ca_certs,
ssl_cert_reqs=args.ssl_cert_reqs,
fd=sock.fileno(),
**uvicorn_kwargs,
)

# NB: Await server shutdown only after the backend context is exited
await shutdown_task

sock.close()


if __name__ == "__main__":
# NOTE(simon):
Expand Down

0 comments on commit a402d48

Please sign in to comment.