
Streaming Encoding for LIST Responses #5116

Open

serathius opened this issue Jan 31, 2025 · 3 comments
Labels
sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@serathius
Contributor

Enhancement Description

  • One-line enhancement description (can be used as a release note): Streaming Response Encoding
  • Kubernetes Enhancement Proposal:
  • Previous discussion: Streaming json list encoder kubernetes#129334
  • Primary contact (assignee): serathius@
  • Responsible SIGs: api-machinery
  • Enhancement target (which target equals to which milestone):
    • Beta release target (x.y): 1.33
    • Stable release target (x.y): 1.34
  • Beta
    • KEP (k/enhancements) update PR(s):
    • Code (k/k) update PR(s):
    • Docs (k/website) update(s):
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 31, 2025
@serathius
Contributor Author

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 31, 2025
@serathius serathius changed the title Streaming Response Encoding Streaming Encoding for LIST Responses Jan 31, 2025
@chenk008
Contributor

chenk008 commented Feb 7, 2025

I'm glad to see this proposal. We have implemented a similar capability in our internal repo and are preparing to push it upstream. We have also submitted a CFP for the upcoming KubeCon China conference.

In our implementation, we use sync.Pool to manage memory allocation efficiently and to cache the serialized result of each item. When the buffer reaches a certain size, we flush it, which lets us parallelize serialization and write the output over HTTP/2.

Additionally, we have added support for gzip compression, which is only enabled when the first batch of cached data reaches 128 * 1024 bytes.

For JSON serialization, we customized a StreamMarshal method for unstructuredList.

As for protobuf, we generate the marshalling code through a generator to keep it backward compatible with the existing protobuf encoding.

type StreamMarshaller interface {
	// StreamSize returns the total encoded size of the object and a
	// slice of per-item encoded sizes.
	StreamSize() (uint64, []int)

	// StreamMarshal writes the encoded object to w item by item, using
	// the sizes previously computed by StreamSize.
	StreamMarshal(w stream.Writer, itemSize []int) error
}

We have conducted extensive testing with large datasets and obtained comparative results. @yulongfang, can you share some benchmark results?

@yulongfang

Thanks @chenk008 for the introduction. We run many large-scale clusters in Alibaba Cloud. When the controllers of these clusters restart, they issue full LIST requests to the apiserver, which affects cluster stability; we have had to run the apiserver on larger machines, wasting resources.

In this context, we applied this approach and achieved the following results.

LIST (JSON) stress test scenario:

  • apiserver version: 1.30
  • apiserver spec: 32 cores, 128 GB
  • apiserver replicas: 1
  • existing resources: 10,000 CRs of 100 KB each
  • load: QPS increased along a gradient of 0.05 / 0.1

LIST (JSON) stress test results:

qps 0.05

  • before optimization: CPU 35.7 cores, memory 89 GB
  • after streaming JSON optimization: CPU 6.22 cores, memory 60 GB

qps 0.1

  • before optimization: CPU 11 cores, memory 146 GB
  • after streaming JSON optimization: CPU 7.45 cores, memory 97 GB

LIST (protobuf) stress test scenario:

  • apiserver version: 1.30
  • apiserver spec: 32 cores, 128 GB
  • apiserver replicas: 1
  • existing resources: 10,000 configmaps of 100 KB each
  • load: QPS increased along a gradient of 0.05 / 0.1

LIST (protobuf) stress test results:

qps 0.05

  • before optimization: CPU 16.8 cores, memory 54.3 GB
  • after streaming optimization: CPU 16.8 cores, memory 16.1 GB

qps 0.1

  • before optimization: CPU 42 cores, memory 122 GB
  • after streaming optimization: CPU 42 cores, memory 18 GB
