-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable resource monitoring in cgroup v2 machines #3316
Conversation
@@ -389,73 +502,8 @@ def log_root_paths(self): | |||
else: | |||
log_cgroup_info("The {0} controller is mounted at {1}".format(controller, mount_point)) | |||
|
|||
def start_extension_command(self, extension_name, command, cmd_name, timeout, shell, cwd, env, stdout, stderr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to abstract class
@@ -407,41 +423,6 @@ def _create_all_files(files_to_create): | |||
CGroupConfigurator._Impl._cleanup_unit_file(unit_file) | |||
return | |||
|
|||
@staticmethod | |||
def _get_current_cpu_quota(unit_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to util class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
posted some minor comments/questions
azurelinuxagent/ga/cgroupapi.py
Outdated
@@ -250,12 +294,81 @@ def log_root_paths(self): | |||
""" | |||
raise NotImplementedError() | |||
|
|||
def start_extension_command(self, extension_name, command, cmd_name, timeout, shell, cwd, env, stdout, stderr, | |||
error_code=ExtensionErrorCodes.PluginUnknownFailure): | |||
def get_enabled_controllers_at_root(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise NotImplementedError()?
azurelinuxagent/ga/cgroupapi.py
Outdated
@@ -250,12 +294,81 @@ def log_root_paths(self): | |||
""" | |||
raise NotImplementedError() | |||
|
|||
def start_extension_command(self, extension_name, command, cmd_name, timeout, shell, cwd, env, stdout, stderr, | |||
error_code=ExtensionErrorCodes.PluginUnknownFailure): | |||
def get_enabled_controllers_at_root(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to get_controllers_enabled_at_root?
@@ -351,6 +351,22 @@ def _reset_agent_cgroup_setup(self): | |||
except Exception as err: | |||
logger.warn("Error while resetting the quotas: {0}".format(err)) | |||
|
|||
def _is_enforcement_supported(self, controller): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does not check whether enforcement is supported, but rather whether we are enforcing limits using that controller, right? if yes, can we rename the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does two things
- If Agent supports enforcement (like feature support)
- If above is true, it will check controller is mounted/enabled to enforce the limits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is 1 checked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ex: CPU enforcement not supported in v2 yet but supported in v1
if controller == "cpu":
if self.using_cgroup_v2():
return False
elif "cpu,cpuacct" in self._cgroups_api.get_controllers_enabled_at_root():
return True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see, that is my point. this is not checking for "supported" (e.g. CPU Quotas are supported on V2); it's checking whether we are using the feature or not. name is misleading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed to _can_use_controller_for_enforcement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
azurelinuxagent/ga/cgroupapi.py
Outdated
finally: | ||
with self._systemd_run_commands_lock: | ||
self._systemd_run_commands.remove(process.pid) | ||
def get_enabled_controllers_at_root(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add this to abstract class too? And Maybe we should add a comment in the abstract class that this only returns the agent supported controllers enabled at root (cpu and/or memory)
@@ -351,6 +351,22 @@ def _reset_agent_cgroup_setup(self): | |||
except Exception as err: | |||
logger.warn("Error while resetting the quotas: {0}".format(err)) | |||
|
|||
def _is_enforcement_supported(self, controller): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about moving this to the controller implementations?
CpuControllerV1 -> return True
CpuControllerV2 -> return False
_MemoryController -> return False
Then during cgroup initialization, we can just check controller.is_enforcement_supported()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, but for extensions we setup the limits before we start the extension, and we will not have controllers objects to get this information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about making them static in the controller implementation and checking via cgroupapi?
If you think that overcomplicates then I'm ok with leaving the check here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure i followed, sorry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each controller implementation has a static can_enforce() check:
CpuControllerV1 -> return True
CpuControllerV2 -> return False
_MemoryController -> return False
Cgroupapi implementations have a can_enforce_cpu() check:
CgroupApiV1 -> return CgroupV1.CPU_CONTROLLER in self._cgroup_mountpoints and CpuControllerV1.can_enforce()
CgroupApiV2 -> return CgropuV2.CPU_CONTROLLER in self._controllers_enabled_at_root and CpuControllerV2.can_enforce()
Then cgroupconfigurator just calls self._cgroupapi.can_enforce_cpu()
^ This allows us to avoid these if/else conditions, but I'm also ok with leaving as is if this overcomplicates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can even simply without static can_enforce(), not sure if we need this
Cgroupapi implementations have a can_enforce_cpu() check:
CgroupApiV1 -> return CgroupV1.CPU_CONTROLLER in self._cgroup_mountpoints
CgroupApiV2 -> return False
can_enforce_memory() -> return False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me
@@ -164,7 +164,7 @@ def _found_cgroup_procs(file): | |||
return True | |||
|
|||
# In v2, the cgroup.procs file is present in the scope cgroup for extensions | |||
for cgroup_file in glob.iglob(os.path.join(self.path, "*/cgroup.procs")): | |||
for cgroup_file in glob.iglob(os.path.join(self.path, "*.scope/cgroup.procs")): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to match more specifically the scopes we create?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we create scopes for extension cmds we run. It will be like
install_xxx.scope
enable_xxx.scope
update_xxx.scope
more specific would be matching all cmds that we support. Even though it's limited list, I don't know if we want to do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"*" would match any of those 3 examples, and any other scope, if there is ever one.
my ask is if we can make it match only the scope that we know we created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, never mind, sorry, your original change was ok. we want to continue tracking as long as there are any processes, no matter who created them, so * is OK
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #3316 +/- ##
===========================================
+ Coverage 71.97% 72.69% +0.71%
===========================================
Files 103 114 +11
Lines 15692 17200 +1508
Branches 2486 2304 -182
===========================================
+ Hits 11295 12504 +1209
- Misses 3881 4139 +258
- Partials 516 557 +41 ☔ View full report in Codecov by Sentry. |
tests/ga/test_cgroupconfigurator.py
Outdated
@@ -176,6 +172,43 @@ def test_initialize_should_not_enable_cgroups_when_the_agent_is_not_in_the_syste | |||
self.assertEqual(len(tracked), 0, "No cgroups should be tracked. Tracked: {0}".format(tracked)) | |||
self.assertFalse(os.path.exists(agent_drop_in_file_cpu_quota), "{0} should not have been created".format(agent_drop_in_file_cpu_quota)) | |||
|
|||
def test_initialize_should_not_enable_cgroups_v2(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_initialize_should_not_enable_cgroups_v2(self): | |
def test_initialize_should_enable_cgroups_v2(self): |
should enable cgroups v2?
Description
This PR just enable the resource monitoring in v2(> ubuntu 22) machines.
Couple of distinctions between V1 and V2
Issue #
PR information
develop
branch.Quality of Code and Contribution Guidelines