performance improvement: setup signal notify in a new go routine #4654

Merged
merged 1 commit into opencontainers:main from performance-in-signal-notify
Mar 1, 2025

Conversation

lifubang
Member

@lifubang lifubang commented Feb 28, 2025

There is a big loop (at least 65 iterations) in signal.Notify; it costs as much time as runc init, so we can run it in parallel to reduce the container start time. In a general test, the start time was reduced by about 38.70%.

In my test with this bash script:

#!/bin/bash
i=1
n=$2
runc=$1
while [ "$i" -le "$n" ]; do
  "$runc" run test
  ((i++))
done

runc v1.2.5

lifubang@acmcoder:/opt/debian$ ./runc-1.2.5 -v
runc version 1.2.5
commit: v1.2.5-0-g59923ef1
spec: 1.2.0
go: go1.23.6
libseccomp: 2.5.5
lifubang@acmcoder:/opt/debian$ time sudo ./test.sh ./runc-1.2.5 100

real	0m5.152s
user	0m0.006s
sys	0m0.016s

runc with this patch

time sudo ./test.sh /home/lifubang/go/src/github.com/opencontainers/runc/runc 100

real	0m3.158s
user	0m0.010s
sys	0m0.014s

(5152 - 3158) / 5152 ≈ 0.387, i.e. about a 38.7% reduction
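For reference, the cost being hidden here can be observed with a small standalone Go program (a rough sketch, not part of this PR; numbers vary by machine). With no signal arguments, signal.Notify subscribes the channel to every signal, enabling each signal number in turn, which is the loop mentioned above:

package main

import (
	"fmt"
	"os"
	"os/signal"
	"time"
)

func main() {
	// Buffered channel, as recommended for signal.Notify.
	ch := make(chan os.Signal, 1)
	start := time.Now()
	// With no signal arguments, Notify relays all incoming signals to ch,
	// enabling every signal number one by one; this is the "big loop".
	signal.Notify(ch)
	fmt.Printf("signal.Notify (all signals) took %v\n", time.Since(start))
}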

@lifubang lifubang force-pushed the performance-in-signal-notify branch from 987a0d5 to 60026a7 on March 1, 2025 00:16
@lifubang
Member Author

lifubang commented Mar 1, 2025

Can we include this one in v1.3.0-rc.1? @kolyshkin @rata

@kolyshkin
Contributor

Can we include this one in v1.3.0-rc.1? @kolyshkin @rata

I'm not against it, and let @cyphar decide.

The code LGTM, and in my measurements it shows 5% to 20% wall clock time improvement, which is still very good.


@kolyshkin kolyshkin left a comment

lgtm

@kolyshkin kolyshkin force-pushed the performance-in-signal-notify branch from 60026a7 to a7b0a82 on March 1, 2025 01:19
@@ -246,7 +246,7 @@ func (r *runner) run(config *specs.Process) (int, error) {
 	// Setting up IO is a two stage process. We need to modify process to deal
 	// with detaching containers, and then we get a tty after the container has
 	// started.
-	handler := newSignalHandler(r.enableSubreaper, r.notifySocket)
+	handlerCh := newSignalHandler(r.enableSubreaper, r.notifySocket)
 	tty, err := setupIO(process, r.container, config.Terminal, detach, r.consoleSocket)
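To make the diff above easier to follow: the rename from handler to handlerCh reflects that newSignalHandler now returns a channel rather than the handler itself. A minimal sketch of the pattern, assuming simplified stand-in types (the real runc function carries more state and also acts on enableSubreaper):

package main

import (
	"os"
	"os/signal"
)

// Minimal stand-ins for illustration only; runc's real types do more.
type notifySocket struct{}

type signalHandler struct {
	signals      chan os.Signal
	notifySocket *notifySocket
}

// newSignalHandler starts the expensive signal.Notify setup in its own
// goroutine and returns a channel that delivers the handler once it is
// ready, so the setup overlaps with container startup.
func newSignalHandler(enableSubreaper bool, notifySocket *notifySocket) <-chan *signalHandler {
	handlerCh := make(chan *signalHandler, 1)
	go func() {
		// The signal channel is defined inside the goroutine so that all
		// of the setup cost stays off the critical path.
		sigCh := make(chan os.Signal, 2048)
		// Subscribing to all signals is the slow part being parallelized.
		signal.Notify(sigCh)
		handlerCh <- &signalHandler{signals: sigCh, notifySocket: notifySocket}
	}()
	return handlerCh
}

func main() {
	handlerCh := newSignalHandler(false, &notifySocket{})
	// ... start the container here; signal setup proceeds in parallel ...
	handler := <-handlerCh // block only when the handler is actually needed
	_ = handler
}

Buffering handlerCh with capacity 1 lets the goroutine finish even if the caller never reads from it, e.g. on an early error path.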
Member

Did you see if doing setupIO or setupPidfdSocket in a goroutine also provides a speed-up? I guess they're both not very complicated...

Member Author

I tested these two paths; both of them spend only a few milliseconds on their setup tasks.
I checked the code: they only connect to a socket or run a small loop for stdio, so I think it's not worth running them in a new goroutine.
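A quick way to sanity-check that kind of call is to wrap the setup step in a timer; a hypothetical helper (not runc code) might look like:

package main

import (
	"fmt"
	"time"
)

// timed runs fn and reports how long it took; useful for deciding whether a
// setup step (e.g. setupIO or setupPidfdSocket) is slow enough to be worth
// moving into its own goroutine.
func timed(name string, fn func() error) error {
	start := time.Now()
	err := fn()
	fmt.Printf("%s took %v\n", name, time.Since(start))
	return err
}

func main() {
	_ = timed("setupIO (stand-in)", func() error {
		time.Sleep(2 * time.Millisecond) // stand-in for the real work
		return nil
	})
}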

Member

Fair enough. Thanks!

@cyphar
Member

cyphar commented Mar 1, 2025

I'm happy to take it in -rc.1 if it's ready (I had a nit and a follow-up question) but we could also add it to -rc.2 since it's a trivial bug / performance fix and not a new feature.

@lifubang lifubang force-pushed the performance-in-signal-notify branch 2 times, most recently from 89c945f to 33b9f2b on March 1, 2025 12:14
@lifubang lifubang changed the title from "performance improvement: setup signal notify in an new go routine" to "performance improvement: setup signal notify in a new go routine" on Mar 1, 2025
There is a big loop (at least 65 times) in `signal.Notify`; it costs as much
time as `runc init`, so we can call it in parallel to reduce the container
start time. In a general test, it can be reduced by about 38.70%.

Signed-off-by: lifubang <[email protected]>
(cyphar: move signal channel definition inside goroutine)
Signed-off-by: Aleksa Sarai <[email protected]>
@cyphar cyphar force-pushed the performance-in-signal-notify branch from 33b9f2b to d92dd22 on March 1, 2025 12:47
@cyphar
Member

cyphar commented Mar 1, 2025

FWIW, it seems like it's more like 5-10% faster on my box, but still a decent improvement:

% sudo hyperfine -w 50 -r 500 "./runc.with-patch run -b bundle test" "./runc.without-patch run -b bundle test"
Benchmark 1: ./runc.with-patch run -b bundle test
  Time (mean ± σ):      13.6 ms ±   1.4 ms    [User: 5.5 ms, System: 11.6 ms]
  Range (min … max):    10.5 ms …  19.4 ms    500 runs

Benchmark 2: ./runc.without-patch run -b bundle test
  Time (mean ± σ):      14.2 ms ±   1.4 ms    [User: 5.7 ms, System: 11.6 ms]
  Range (min … max):    11.3 ms …  17.8 ms    500 runs

Summary
  ./runc.with-patch run -b bundle test ran
    1.04 ± 0.15 times faster than ./runc.without-patch run -b bundle test
% sudo hyperfine -w 50 -r 1000 "./runc.with-patch run -b bundle test" "./runc.without-patch run -b bundle test"
Benchmark 1: ./runc.with-patch run -b bundle test
  Time (mean ± σ):      13.5 ms ±   1.5 ms    [User: 5.7 ms, System: 11.4 ms]
  Range (min … max):    10.1 ms …  18.8 ms    1000 runs

Benchmark 2: ./runc.without-patch run -b bundle test
  Time (mean ± σ):      14.7 ms ±   1.5 ms    [User: 6.1 ms, System: 11.7 ms]
  Range (min … max):    11.4 ms …  20.0 ms    1000 runs

Summary
  ./runc.with-patch run -b bundle test ran
    1.09 ± 0.16 times faster than ./runc.without-patch run -b bundle test

@cyphar cyphar merged commit 701516b into opencontainers:main Mar 1, 2025
34 checks passed
@lifubang
Member Author

lifubang commented Mar 1, 2025

FWIW, it seems like it's more like 5-10% faster on my box, but still a decent improvement:

Very strange; maybe the improvement percentage is related to the machine configuration?
On my machine, it's about 68-69% faster for runc run:

lifubang@acmcoder:/opt/debian$ sudo hyperfine -w 50 -r 500 "./runc-patch run test" "./runc-HEAD run test"
Benchmark 1: ./runc-patch run test
  Time (mean ± σ):      29.7 ms ±   3.8 ms    [User: 7.6 ms, System: 26.6 ms]
  Range (min … max):    21.5 ms …  42.2 ms    500 runs
 
Benchmark 2: ./runc-HEAD run test
  Time (mean ± σ):      50.2 ms ±   4.7 ms    [User: 8.5 ms, System: 30.8 ms]
  Range (min … max):    39.9 ms …  75.6 ms    500 runs
 
Summary
  ./runc-patch run test ran
    1.69 ± 0.27 times faster than ./runc-HEAD run test
lifubang@acmcoder:/opt/debian$ sudo hyperfine -w 50 -r 1000 "./runc-patch run test" "./runc-HEAD run test"
Benchmark 1: ./runc-patch run test
  Time (mean ± σ):      30.1 ms ±   4.0 ms    [User: 7.7 ms, System: 26.9 ms]
  Range (min … max):    19.9 ms …  60.1 ms    1000 runs
 
Benchmark 2: ./runc-HEAD run test
  Time (mean ± σ):      50.4 ms ±   5.1 ms    [User: 8.6 ms, System: 30.6 ms]
  Range (min … max):    38.1 ms …  96.9 ms    1000 runs
 
Summary
  ./runc-patch run test ran
    1.68 ± 0.28 times faster than ./runc-HEAD run test

And it's about 93-106% faster for runc exec [-t|-d]:

lifubang@acmcoder:/opt/debian$ sudo hyperfine -w 50 -r 500 "./runc-patch exec test true" "./runc-HEAD exec test true"
Benchmark 1: ./runc-patch exec test true
  Time (mean ± σ):      22.4 ms ±   3.2 ms    [User: 6.9 ms, System: 21.7 ms]
  Range (min … max):    14.1 ms …  36.7 ms    500 runs

Benchmark 2: ./runc-HEAD exec test true
  Time (mean ± σ):      43.7 ms ±   4.7 ms    [User: 7.5 ms, System: 23.7 ms]
  Range (min … max):    32.0 ms …  73.4 ms    500 runs

Summary
  ./runc-patch exec test true ran
    1.95 ± 0.35 times faster than ./runc-HEAD exec test true
lifubang@acmcoder:/opt/debian$ sudo hyperfine -w 50 -r 1000 "./runc-patch exec test true" "./runc-HEAD exec test true"
Benchmark 1: ./runc-patch exec test true
  Time (mean ± σ):      22.7 ms ±   3.7 ms    [User: 6.6 ms, System: 22.1 ms]
  Range (min … max):    12.2 ms …  59.6 ms    1000 runs

Benchmark 2: ./runc-HEAD exec test true
  Time (mean ± σ):      44.1 ms ±   5.2 ms    [User: 7.4 ms, System: 23.9 ms]
  Range (min … max):    32.3 ms … 106.7 ms    1000 runs

Summary
  ./runc-patch exec test true ran
    1.94 ± 0.39 times faster than ./runc-HEAD exec test true
lifubang@acmcoder:/opt/debian$ sudo hyperfine -w 50 -r 500 "./runc-patch exec -d test true" "./runc-HEAD exec -d test true"
Benchmark 1: ./runc-patch exec -d test true
  Time (mean ± σ):      20.4 ms ±   2.9 ms    [User: 4.7 ms, System: 15.2 ms]
  Range (min … max):    13.3 ms …  30.9 ms    500 runs

Benchmark 2: ./runc-HEAD exec -d test true
  Time (mean ± σ):      41.9 ms ±   3.8 ms    [User: 5.3 ms, System: 18.4 ms]
  Range (min … max):    31.5 ms …  53.4 ms    500 runs

Summary
  ./runc-patch exec -d test true ran
    2.06 ± 0.34 times faster than ./runc-HEAD exec -d test true
lifubang@acmcoder:/opt/debian$ sudo hyperfine -w 50 -r 500 "./runc-patch exec -t test true" "./runc-HEAD exec -t test true"
Benchmark 1: ./runc-patch exec -t test true
  Time (mean ± σ):      22.7 ms ±   3.1 ms    [User: 6.7 ms, System: 22.1 ms]
  Range (min … max):    16.2 ms …  35.6 ms    500 runs

Benchmark 2: ./runc-HEAD exec -t test true
  Time (mean ± σ):      43.9 ms ±   4.0 ms    [User: 7.4 ms, System: 24.9 ms]
  Range (min … max):    34.0 ms …  56.5 ms    500 runs

Summary
  ./runc-patch exec -t test true ran
    1.93 ± 0.32 times faster than ./runc-HEAD exec -t test true

@lifubang lifubang deleted the performance-in-signal-notify branch March 1, 2025 16:21
@cyphar
Member

cyphar commented Mar 1, 2025

I find it interesting that your runs spend much more time in system time than me (as well as each run being ~5x slower). Out of interest, what CPU are you running on / what kernel? I'm on an AMD Ryzen 7 7840U and Linux 6.12.6-1-default.

@kolyshkin
Contributor

As I said earlier, I'm also seeing 5% to 20% improvement (Linux 6.12 / Intel Core i7-12800H).

@lifubang
Member Author

lifubang commented Mar 2, 2025

I find it interesting that your runs spend much more time in system time than me (as well as each run being ~5x slower).

Perhaps I should run these performance tests on the host machine instead of a virtual machine.
I was running these tests in a virtual machine on a Mac M1 notebook:

lifubang@acmcoder:/opt/debian$ lscpu
Architecture:             aarch64
  CPU op-mode(s):         64-bit
  Byte Order:             Little Endian
CPU(s):                   8
  On-line CPU(s) list:    0-7

lifubang@acmcoder:/opt/debian$ uname -a
Linux acmcoder 6.8.0-54-generic #56-Ubuntu SMP PREEMPT_DYNAMIC Sat Feb  8 00:17:08 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

I think we also need to test it in a virtual machine, because most servers run in virtual machines in the cloud. But maybe my virtual machine setup needs to be improved; I will test it on a cloud host when I have the chance.
