Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit gProfiler memory & CPU usage and --log-usage support in exe mode. #564

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Creatone
Copy link

@Creatone Creatone commented Oct 27, 2022

Description

TBA / WIP

Related Issue

#529 #227

Motivation and Context

TBA / WIP

How Has This Been Tested?

TBA / WIP

Screenshots

Checklist:

  • I have read the CONTRIBUTING document.
  • I have updated the relevant documentation.
  • I have added tests for new logic.

Paweł Szulik added 2 commits October 24, 2022 17:09
@Jongy Jongy self-requested a review October 27, 2022 19:11
@Jongy Jongy added the enhancement New feature or request label Oct 27, 2022
gprofiler/main.py Outdated Show resolved Hide resolved
help="Limit on the memory used by gProfiler."
)

parser.add_argument(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I specific no CPU limit? i.e --limit-cpu none should be a possible parameter, not necessarily this way, but you should be able to limit ONLY the memory or CPU.

default=0.5, # 500m, same as in the k8s DaemonSet
dest="cpu_limit",
type=float,
help="Limit on the cpu used by gProfiler."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
help="Limit on the cpu used by gProfiler."
help="Limit on the cpu used by gProfiler. Units are cores and the default is %(default)s"

same for memory (with appropriate units)

gprofiler/main.py Outdated Show resolved Hide resolved
gprofiler/main.py Outdated Show resolved Hide resolved
gprofiler/main.py Outdated Show resolved Hide resolved
gprofiler/main.py Outdated Show resolved Hide resolved
gprofiler/main.py Outdated Show resolved Hide resolved
# Set limits and return path of the cgroup.
def set_limits(cpu: float, memory: int) -> str:
cgroups = {}
logger.debug("Check if cgroup version is supported.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log is not needed IMO. As I suggested in https://github.com/Granulate/gprofiler/pull/564/files#r1007993619, log only the negative case.

gprofiler/main.py Outdated Show resolved Hide resolved
assert controller in cgroup_v1_hierarchies
return f"{cgroup_v1_hierarchies[controller]}{cgroup}"
else:
return f"{find_v2_hierarchy()}/{controller}{cgroup}"
Copy link
Contributor

@Jongy Jongy Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work now - because cgroups v2 files are different. For example - there is no cpuacct.usage file, but cpu.stat file which contains a usage_usec field.

I created a ticket for cgroups v2 support. Until that's done, I suggest you raise an exception here if v2 is in use.

Also I see that granulate-utils does this check: len(get_cgroups(os.getpid())) == 1 (in _verify_preconditions). It's subtly different from what you did here (checks the controllers available for this processes instead of the mounted controllers). I think it'll produce the same results, but let's be consistent and use the same check here (you can export it to a function in granulate-utils - generally, if you need to make changes in granulate-utils, you can opne a PR there as well, and in the gProfiler repo you just update the revision of the submodule to point to your PR revision)

gprofiler/main.py Outdated Show resolved Hide resolved
gprofiler/main.py Outdated Show resolved Hide resolved
gprofiler/main.py Outdated Show resolved Hide resolved
@Creatone
Copy link
Author

@Jongy To be honest I added "[WIP]" to the title to show you that I'm working on that. There will be plenty of force pushes there. Many things will look differently. I wasn't guessing that you'll review that already. ✌️

Copy link
Contributor

@Jongy Jongy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished reviewing.

Let's test:

  • cgroup limitting when a container - gprofiler should not move / make changes.
  • cgroup limitting when a privileged executable - gprofiler should move & make changes
  • cgroup limitting when a non-privileged executable - gprofiler should try to move, fail, and continue running okay (this can be tested by running as root but with dropped capabilities.. happens in AWS Fargate, Databricks and a few other platforms)
    I have a few more ideas, will add

@Jongy
Copy link
Contributor

Jongy commented Oct 28, 2022

@Jongy To be honest I added "[WIP]" to the title to show you that I'm working on that. There will be plenty of force pushes there. Many things will look differently. I wasn't guessing that you'll review that already.

Oh sorry :sweat_smiley: I saw that it's not a draft so I started. Nevermind, no problem

@Jongy
Copy link
Contributor

Jongy commented Oct 28, 2022

By the way - we eventually use squash-and-merge so there is no need to force push, you can just add commits on top of it. But up to you

@Creatone
Copy link
Author

@Jongy Ahhhh it is my mistake, nevertheless, the code will be better reviewed :)

@Creatone Creatone marked this pull request as draft October 28, 2022 12:30
@Creatone Creatone changed the title [WIP] Limit gProfiler memory & CPU usage and --log-usage support in exe mode. Limit gProfiler memory & CPU usage and --log-usage support in exe mode. Oct 28, 2022
Paweł Szulik added 2 commits November 16, 2022 18:04
Signed-off-by: Paweł Szulik <[email protected]>
@Creatone
Copy link
Author

Creatone commented Dec 2, 2022

@Jongy Could you check the tests from c4be935?
It is a good way to check your conditions?

@Jongy
Copy link
Contributor

Jongy commented Dec 5, 2022

@Jongy Could you check the tests from c4be935? It is a good way to check your conditions?

The tests check if gProfiler behaved as predicted, but I prefer if we add true "end-to-end" tests, i.e - the test will actually check the cgroup limit imposed on the gProfiler process and see that it matches what we expected. You can get the gProfiler PID, get its cgroup, verify that it actually moved to a new cgroup named gprofiler, and that the limits of that cgroup are what we expect. You can use the same APIs from granulate-utils to read the actual limits set on the cgroup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants