Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possibly unnecessary polling in gthread #3317

Open
ankush opened this issue Oct 29, 2024 · 1 comment · May be fixed by #3319
Open

Possibly unnecessary polling in gthread #3317

ankush opened this issue Oct 29, 2024 · 1 comment · May be fixed by #3319

Comments

@ankush
Copy link

ankush commented Oct 29, 2024

I was tracing system calls for some unrelated reason and observed that an idle gthread worker makes epoll_wait system call every second.

λ sudo strace -p `pgrep -f "gunicorn: worker" | head -n1`
strace: Process 30815 attached
epoll_wait(7, [], 1, 666)               = 0
getppid()                               = 30800
utimensat(6, NULL, [{tv_sec=3157, tv_nsec=198136276} /* 1970-01-01T06:22:37.198136276+0530 */, {tv_sec=3157, tv_nsec=198136276} /* 1970-01-01T06:22:37.198136276+0530 */], 0) = 0
epoll_wait(7, [], 1, 1000)              = 0
getppid()                               = 30800
utimensat(6, NULL, [{tv_sec=3158, tv_nsec=204192934} /* 1970-01-01T06:22:38.204192934+0530 */, {tv_sec=3158, tv_nsec=204192934} /* 1970-01-01T06:22:38.204192934+0530 */], 0) = 0
epoll_wait(7, [], 1, 1000)              = 0
getppid()                               = 30800
utimensat(6, NULL, [{tv_sec=3159, tv_nsec=210145196} /* 1970-01-01T06:22:39.210145196+0530 */, {tv_sec=3159, tv_nsec=210145196} /* 1970-01-01T06:22:39.210145196+0530 */], 0) = 0
epoll_wait(7, [], 1, 1000)              = 0
getppid()                               = 30800
utimensat(6, NULL, [{tv_sec=3160, tv_nsec=215517372} /* 1970-01-01T06:22:40.215517372+0530 */, {tv_sec=3160, tv_nsec=215517372} /* 1970-01-01T06:22:40.215517372+0530 */], 0) = 0
epoll_wait(7, ^Cstrace: Process 30815 detached
 <detached ...>

Here is the relevant code:

events = self.poller.select(1.0)

This poller is used for waking up process to:

  1. Accept incoming connections -
    self.poller.register(sock, selectors.EVENT_READ, acceptor)
  2. Read incoming data -
    self.poller.register(conn.sock, selectors.EVENT_READ,
    partial(self.on_client_socket_readable, conn))

Since same poller is used for both, there's no question of delays induced by blocking epoll call... and if blocking was indeed happening then 1 second blocking is anyways unacceptable.

This doesn't happen for sync worker which has effective timeout same as request timeout. That too is only done to let master process know worker is still alive.

ret = select.select(self.wait_fds, [], [], timeout)

My question is why the timeout needs to be so short? This short timeout causes the process to wake up every second even though nothing has happened.


Is this done to clean up futures frequently?

result = futures.wait(self.futures, timeout=0,
return_when=futures.FIRST_COMPLETED)

@ankush
Copy link
Author

ankush commented Oct 29, 2024

Oof, I meant to post it in https://github.com/benoitc/gunicorn/discussions 😐

Anyway, this is kind of somewhere in the middle.

@ankush ankush changed the title Potential CPU cycles waste in gthread polling mechanism? Possibly unnecessary polling in gthread Oct 29, 2024
ankush added a commit to ankush/gunicorn that referenced this issue Oct 29, 2024
gthread calls epoll_wait (and 2 other syscalls) every second because it
specifies timeout to be 1 second.

```
λ sudo strace -p `pgrep -f "gunicorn: worker" | head -n1`
strace: Process 30815 attached
epoll_wait(7, [], 1, 666)               = 0
getppid()                               = 30800
utimensat(6, NULL, [{tv_sec=3157, tv_nsec=198136276} /* 1970-01-01T06:22:37.198136276+0530 */, {tv_sec=3157, tv_nsec=198136276} /* 1970-01-01T06:22:37.198136276+0530 */], 0) = 0
epoll_wait(7, [], 1, 1000)              = 0
getppid()                               = 30800
utimensat(6, NULL, [{tv_sec=3158, tv_nsec=204192934} /* 1970-01-01T06:22:38.204192934+0530 */, {tv_sec=3158, tv_nsec=204192934} /* 1970-01-01T06:22:38.204192934+0530 */], 0) = 0
epoll_wait(7, [], 1, 1000)              = 0
getppid()                               = 30800
utimensat(6, NULL, [{tv_sec=3159, tv_nsec=210145196} /* 1970-01-01T06:22:39.210145196+0530 */, {tv_sec=3159, tv_nsec=210145196} /* 1970-01-01T06:22:39.210145196+0530 */], 0) = 0
epoll_wait(7, [], 1, 1000)              = 0
getppid()                               = 30800
utimensat(6, NULL, [{tv_sec=3160, tv_nsec=215517372} /* 1970-01-01T06:22:40.215517372+0530 */, {tv_sec=3160, tv_nsec=215517372} /* 1970-01-01T06:22:40.215517372+0530 */], 0) = 0
epoll_wait(7, ^Cstrace: Process 30815 detached
 <detached ...>
 ```

Timing out every second wakes up the process and loads it on CPU even
if there is nothing to service.

This can be detrimental when you have total workers >> total cores and
multi-tenant setup where certain tenants might be sitting idle. (but not
"idle enough" because of 1s polling timeout)

This can possibly keep a completed future in queue for a small while,
but I don't see any obious problem with it except few bytes of extra
memory usage. I could be wrong here.

fixes benoitc#3317
ankush added a commit to ankush/gunicorn that referenced this issue Oct 29, 2024
gthread calls epoll_wait (and 2 other syscalls) every second because it
specifies timeout to be 1 second.

```
λ sudo strace -p `pgrep -f "gunicorn: worker" | head -n1`
strace: Process 30815 attached
epoll_wait(7, [], 1, 666)               = 0
getppid()                               = 30800
utimensat(6, NULL, [{tv_sec=3157, tv_nsec=198136276} /* 1970-01-01T06:22:37.198136276+0530 */, {tv_sec=3157, tv_nsec=198136276} /* 1970-01-01T06:22:37.198136276+0530 */], 0) = 0
epoll_wait(7, [], 1, 1000)              = 0
getppid()                               = 30800
utimensat(6, NULL, [{tv_sec=3158, tv_nsec=204192934} /* 1970-01-01T06:22:38.204192934+0530 */, {tv_sec=3158, tv_nsec=204192934} /* 1970-01-01T06:22:38.204192934+0530 */], 0) = 0
epoll_wait(7, [], 1, 1000)              = 0
getppid()                               = 30800
utimensat(6, NULL, [{tv_sec=3159, tv_nsec=210145196} /* 1970-01-01T06:22:39.210145196+0530 */, {tv_sec=3159, tv_nsec=210145196} /* 1970-01-01T06:22:39.210145196+0530 */], 0) = 0
epoll_wait(7, [], 1, 1000)              = 0
getppid()                               = 30800
utimensat(6, NULL, [{tv_sec=3160, tv_nsec=215517372} /* 1970-01-01T06:22:40.215517372+0530 */, {tv_sec=3160, tv_nsec=215517372} /* 1970-01-01T06:22:40.215517372+0530 */], 0) = 0
epoll_wait(7, ^Cstrace: Process 30815 detached
 <detached ...>
 ```

Timing out every second wakes up the process and loads it on CPU even
if there is nothing to service.

This can be detrimental when you have total workers >> total cores and
multi-tenant setup where certain tenants might be sitting idle. (but not
"idle enough" because of 1s polling timeout)

This can possibly keep a completed future in queue for a small while,
but I don't see any obious problem with it except few bytes of extra
memory usage. I could be wrong here.

fixes benoitc#3317
@ankush ankush linked a pull request Oct 29, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant