
Need to study CPU-to-memory utilization based on the data from providers #144

Open

ustiugov opened this issue Feb 9, 2023 · 3 comments

@ustiugov (Member) commented Feb 9, 2023

In our load scenarios, the fleet is CPU bound with quite low memory usage. Alibaba, however, reports in their recent RunD paper that although a "typical" server runs about 2k function instances, its CPU utilization remains below 50%.

One needs to evaluate CPU and memory utilization using Alibaba's CPU-to-memory ratio (100 vCPU, probably with SMT enabled, to 384 GB, i.e., roughly 1 to 4), which is similar to what we expect for AWS (m5.metal instance: 48 vCPU with SMT disabled to 384 GB).
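As a quick sanity check of the "similar ratio" claim, a minimal sketch in Go; the numbers are taken from the paragraph above, and the conversion of 100 vCPU with SMT to roughly 50 physical cores is an assumption:

```go
package main

import "fmt"

func main() {
	// Alibaba (RunD): 100 vCPU with SMT, i.e. roughly 50 physical cores, and 384 GB.
	fmt.Printf("Alibaba:      %.2f GB per vCPU, %.1f GB per physical core\n", 384.0/100.0, 384.0/50.0)
	// AWS m5.metal with SMT disabled: 48 physical cores and 384 GB.
	fmt.Printf("AWS m5.metal: %.1f GB per physical core\n", 384.0/48.0)
}
```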

Please configure a similar ratio by disabling cores or occupying memory, and run the slowdown sweep experiment again.
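For the "disabling cores" option, one illustrative way (an assumption on my part, not part of the loader) is to offline logical CPUs through the Linux sysfs hotplug interface on the worker node; this requires root, and cpu0 typically cannot be offlined:

```go
package main

import (
	"fmt"
	"os"
)

// offlineCPU takes a logical CPU offline via the Linux sysfs hotplug interface.
func offlineCPU(id int) error {
	path := fmt.Sprintf("/sys/devices/system/cpu/cpu%d/online", id)
	return os.WriteFile(path, []byte("0"), 0644)
}

func main() {
	// Example: offline logical CPUs 48-95 to emulate a 48-core node.
	for id := 48; id < 96; id++ {
		if err := offlineCPU(id); err != nil {
			fmt.Fprintf(os.Stderr, "cpu%d: %v\n", id, err)
		}
	}
}
```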

The expected outcome is a set of refined CPU quotas, derived from memory usage, that would deliver CPU and memory utilization levels similar to those reported by the providers.
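One possible reading of "CPU quotas derived from the memory usage" is a quota proportional to the function's memory footprint on a node with the ratio above; this is a sketch of that assumption only, not a decision made in this issue:

```go
package main

import "fmt"

// cpuQuotaFromMemory is a hypothetical helper: it assigns a function a CPU quota
// proportional to its memory footprint, given the node's vCPU-to-GB ratio.
func cpuQuotaFromMemory(funcMemGB, nodeVCPU, nodeMemGB float64) float64 {
	return funcMemGB * nodeVCPU / nodeMemGB
}

func main() {
	// Example: a 400 MB function on a 100 vCPU / 384 GB node gets ~0.1 vCPU.
	fmt.Printf("%.3f vCPU\n", cpuQuotaFromMemory(0.4, 100, 384))
}
```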

@leokondrashov (Contributor)

Looking into the function image code, it does not actually use the amount of memory the caller requests; it only reports the requested amount without allocating it.
@cvetkovic, am I correct? I saw you were the last one to change this part of the image code.

If so, there is no reason to try a different CPU/memory ratio, because memory is currently used only by the pod itself, not for imitating the "useful" work. Memory allocation should be fixed first; after that we would be able to throttle the server by memory consumption.
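For reference, making a function actually occupy the requested memory would mean allocating a buffer and touching every page, since untouched pages are not backed by physical memory. A minimal sketch, not the actual function-image code:

```go
package main

import "time"

// occupy allocates sizeMB of memory and touches one byte per 4 KiB page so the
// pages become resident, then holds the buffer for the given duration.
func occupy(sizeMB int, hold time.Duration) {
	const pageSize = 4096
	buf := make([]byte, sizeMB*1024*1024)
	for i := 0; i < len(buf); i += pageSize {
		buf[i] = 1 // force physical allocation of the page
	}
	time.Sleep(hold)
	_ = buf // keep the buffer referenced until the hold period ends
}

func main() {
	occupy(256, 100*time.Millisecond)
}
```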

@ustiugov (Member Author)

@cvetkovic is that the case in the main branch? I missed this change somehow

@cvetkovic (Contributor) commented Feb 14, 2023

@leokondrashov The way we currently do it is that we give a hint to the kube-scheduler through CPU and memory requests. This guarantees that the pod will get at least the amount of resources specified in the requests. These values are also used by Linux cgroups for resource throttling/multiplexing under resource overcommitment. Limits mean that a pod will be evicted if it uses more resources than specified in the limits.

Here is the way we calculate these values. OVERCOMMITMENT_RATIO is 10.
[screenshot of the request/limit calculation in the loader code]
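The exact formula is only in the screenshot, so the block below is a guess at its shape rather than a copy of the loader code: it assumes the limits come from the trace values and the requests are the limits scaled down by OVERCOMMITMENT_RATIO, expressed with the client-go resource types:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

const overcommitmentRatio = 10

// resourceRequirements builds requests/limits for a function pod. Assumption
// (not verbatim loader code): requests = limits / OVERCOMMITMENT_RATIO.
func resourceRequirements(cpuMilli, memMiB int64) corev1.ResourceRequirements {
	return corev1.ResourceRequirements{
		Limits: corev1.ResourceList{
			corev1.ResourceCPU:    *resource.NewMilliQuantity(cpuMilli, resource.DecimalSI),
			corev1.ResourceMemory: *resource.NewQuantity(memMiB*1024*1024, resource.BinarySI),
		},
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    *resource.NewMilliQuantity(cpuMilli/overcommitmentRatio, resource.DecimalSI),
			corev1.ResourceMemory: *resource.NewQuantity(memMiB*1024*1024/overcommitmentRatio, resource.BinarySI),
		},
	}
}
```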

We currently do not allocate any memory in the running functions because we experienced a lot of timeouts when memory allocation was enabled. Allocating a large memory chunk can take a long time and does not fit into a single-digit-millisecond function execution time.
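To illustrate the timing concern, a small harness in the spirit of the earlier allocation sketch; the absolute numbers depend on the machine, but touching even a few hundred MB typically takes tens of milliseconds:

```go
package main

import (
	"fmt"
	"time"
)

// touch allocates sizeMB and writes one byte per 4 KiB page,
// returning how long the allocation plus page faults take.
func touch(sizeMB int) time.Duration {
	start := time.Now()
	buf := make([]byte, sizeMB*1024*1024)
	for i := 0; i < len(buf); i += 4096 {
		buf[i] = 1
	}
	return time.Since(start)
}

func main() {
	for _, mb := range []int{64, 256, 1024} {
		fmt.Printf("%4d MB: %v\n", mb, touch(mb))
	}
}
```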

I think this feature of the load generator is the most sensitive one in terms of designing it properly. One needs to fully understand what K8s offers, and we did not have much time to explore this. What we have currently is just a temporary solution that Dmitrii and I agreed upon while redesigning the loader.

Let me know if you want to have a chat on this issue sometime.
