
reproduce write latency #18121

Draft · wants to merge 1 commit into main

Conversation

@chaochn47 (Member) commented on Jun 4, 2024

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

Related to #18109

Please run:

GOWORK=off make build
GOWORK=off make tools

rm -rf default.etcd /tmp/etcd.log

# start etcd with 10GB quota
bin/etcd --quota-backend-bytes=10737418240 --log-outputs=/tmp/etcd.log

In another terminal, run:

bin/tools/benchmark watch-latency --streams 1 --watchers-per-stream 1 --prevkv=true --put-total=10000 --put-rate=400 --key-size=256 --val-size=262144
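
For rough context on the workload size (my arithmetic, not stated in the PR): each value is 262144 bytes (256 KiB), so 10,000 puts write on the order of 10,000 × 256 KiB ≈ 2.5 GiB into the backend, comfortably over etcd's default backend quota of roughly 2 GiB. That is presumably why etcd is started above with --quota-backend-bytes=10737418240 (exactly 10 GiB).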

Results I got across multiple runs:

Put summary:

Summary:
  Total:	20.6745 secs.
  Slowest:	1.4204 secs.
  Fastest:	0.0564 secs.
  Average:	0.7613 secs.
  Stddev:	0.2624 secs.
  Requests/sec:	502.9860

Response time histogram:
  0.0564 [1]	|
  0.1928 [333]	|∎∎
  0.3292 [402]	|∎∎∎
  0.4656 [1016]	|∎∎∎∎∎∎∎
  0.6020 [486]	|∎∎∎
  0.7384 [791]	|∎∎∎∎∎∎
  0.8748 [5092]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  1.0112 [1142]	|∎∎∎∎∎∎∎∎
  1.1476 [291]	|∎∎
  1.2840 [568]	|∎∎∎∎
  1.4204 [277]	|∎∎

Latency distribution:
  10% in 0.3665 secs.
  25% in 0.6338 secs.
  50% in 0.8286 secs.
  75% in 0.8611 secs.
  90% in 1.0341 secs.
  95% in 1.2456 secs.
  99% in 1.4002 secs.
  99.9% in 1.4185 secs.

Signed-off-by: Chao Chen <[email protected]>
@k8s-ci-robot

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@@ -61,31 +61,23 @@ func init() {
 }

 func watchLatencyFunc(_ *cobra.Command, _ []string) {
-	key := string(mustRandBytes(watchLKeySize))
+	key := "/registry/pods"
Member:
How does this impact the results? If it doesn't, please remove it.

chaochn47 (Member, Author):

It does not. Will remove.

It was just to demonstrate that it could happen on pods, and that it's a generic problem (not limited to events).

putReport.Results() <- report.Result{Start: start, End: end}
putTimes[i] = end

var putCount atomic.Uint64
@serathius (Member), Jun 4, 2024:
Can you clarify what the goal of these changes is? From what I see, you removed measuring watch latency, parallelized the puts, and now measure put latency. Maybe I'm just surprised you modified the watch latency benchmark and not the put one.

chaochn47 (Member, Author):
Sorry about that.

I meant to demonstrate that write latency can be impacted by slow syncWatchers.

Hopefully we can find a balance between write and watch latency.
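
For readers trying to picture the change being discussed: below is a minimal sketch of what a put loop parallelized with an atomic counter might look like. The function name parallelPuts and the surrounding scaffolding are my own illustration, reconstructed from the fragments visible in this diff (putTimes[i] = end, var putCount atomic.Uint64); it is not the exact code from the commit, and the --put-rate limiting is omitted for brevity.

// Hypothetical sketch, not the actual benchmark code from this PR.
package benchmark

import (
	"context"
	"sync"
	"sync/atomic"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// parallelPuts issues `total` puts using one goroutine per client and records
// each put's completion time, so that it can later be compared against the
// time the corresponding watch event is delivered.
func parallelPuts(clients []*clientv3.Client, key, value string, total int) []time.Time {
	putTimes := make([]time.Time, total)
	var putCount atomic.Uint64

	var wg sync.WaitGroup
	for _, c := range clients {
		wg.Add(1)
		go func(c *clientv3.Client) {
			defer wg.Done()
			for {
				// Claim the next put index; stop once the total is reached.
				i := putCount.Add(1) - 1
				if i >= uint64(total) {
					return
				}
				if _, err := c.Put(context.TODO(), key, value); err != nil {
					panic(err)
				}
				putTimes[i] = time.Now() // completion time of this put
			}
		}(c)
	}
	wg.Wait()
	return putTimes
}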

@@ -83,7 +83,7 @@ func NewReport(precision string) Report { return newReport(precision) }

 func newReport(precision string) *report {
 	r := &report{
-		results: make(chan Result, 16),
+		results: make(chan Result, 65536),
Member:
How does this impact the results? If it doesn't, please remove it.

chaochn47 (Member, Author):
It does not. Will remove.
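
As background on why the channel buffer size could matter at all (my reading, not a claim made in this thread): if the goroutine draining the report's Results() channel falls behind the parallel put workers, a full buffer makes every send block and stalls the workers, which shows up as reduced put throughput. A generic, self-contained illustration of that effect (plain Go, not etcd code):

// Toy demonstration: a slow consumer plus a small channel buffer stalls the producer.
package main

import (
	"fmt"
	"time"
)

// sendLoopDuration measures how long the producer spends sending 100 results
// to a consumer that drains them slowly, for a given channel buffer size.
func sendLoopDuration(buf int) time.Duration {
	results := make(chan int, buf)
	done := make(chan struct{})

	go func() {
		for range results {
			time.Sleep(2 * time.Millisecond) // slow consumer
		}
		close(done)
	}()

	start := time.Now()
	for i := 0; i < 100; i++ {
		results <- i // blocks once the buffer is full and the consumer lags
	}
	elapsed := time.Since(start) // time the producer spent in its send loop

	close(results)
	<-done
	return elapsed
}

func main() {
	fmt.Println("buffer 16:   ", sendLoopDuration(16))    // producer stalls behind the consumer
	fmt.Println("buffer 65536:", sendLoopDuration(65536)) // producer never blocks
}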

@chaochn47 (Member, Author) commented on Jun 17, 2024

This is not a proper reproduction of the issue we saw in the clusterloader2 test.

The put latency here is bounded by disk IO, which is reflected in the WAL fsync latency metric.
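
For anyone verifying this locally, the WAL fsync latency referred to above should be the etcd_disk_wal_fsync_duration_seconds histogram exposed on the server's /metrics endpoint; if its high percentiles move together with the put latencies in the benchmark output, the write path is disk-bound rather than slowed down by the watch path.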
