Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overwrite extension heartbeat with policy error #3321

Merged
merged 13 commits into from
Mar 4, 2025

Conversation

mgunnala
Copy link

Description

Issue #

This PR addresses the following TODOs for the policy/allowlist feature:

  • in the case of an extension disallowed error, overwrite extension heartbeat with the error
  • instead of logging policy locally with each goal state, copy the policy file to the history folder

PR information

  • Ensure development PR is based on the develop branch.
  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

policy_error = None
try:
policy_engine = ExtensionPolicyEngine()
gs_history = self.protocol.get_goal_state().history
policy_engine = ExtensionPolicyEngine(gs_history)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may want to have ExtensionPolicyEngine expose the path to the policy file and save it here, in order to avoid adding a dependency on GoalStateHistory

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@@ -309,3 +310,7 @@ def save_shared_conf(self, text):

def save_manifest(self, name, text):
self.save(text, _MANIFEST_FILE_NAME.format(name))

def save_policy(self, text):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make this more generic, in case we need to add other files later... maybe add_file(self, path)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this to save_file(self, path)

@@ -773,10 +775,27 @@ def __handle_ext_disallowed_error(self, ext_handler_i, error_code, report_op, me

ext_handler_i.set_handler_status(status=ExtHandlerStatusValue.not_ready, message=message, code=error_code)

# If extension is reporting heartbeat, overwrite the heartbeat file with error message.
heartbeat_file = os.path.join(conf.get_lib_dir(), ext_handler_i.get_heartbeat_file())
if os.path.isfile(heartbeat_file):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to do this every time we report heartbeat, not only on when the error happens

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@@ -3715,6 +3715,61 @@ def test_uninstall_should_succeed_if_extension_allowed(self):
vm_status = args[0]
self.assertEqual(0, len(vm_status.vmAgent.extensionHandlers))

def test_should_report_policy_failure_instead_of_heartbeat(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we do any end-to-end testing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only added unit tests, since heartbeat isn't currently being e2e tested and none of the extensions we e2e test report heartbeat

If we want to add an e2e test for this, I could add AzureSecurityLinuxAgent to test heartbeat

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an e2e test with AzureSecurityLinuxAgent

"""
Save contents of the specified file to the history folder. Output file will have the same name as the original file.
"""
text = fileutil.read_file(path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file copy would be better to maintain the permissions of the original file

policy_error = None
try:
gs_history = self.protocol.get_goal_state().history
policy_file = conf.get_policy_file_path()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move the responsibility of deciding how to get the path of the file (from conf currently) to the ExtensionPolicyEngine and then expose a method to get the file the engine is using (or None if no file)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the implementation - ExtensionPolicyEngine exposes a property "policy_file_contents" that can be copied to history here

try:
heartbeat = ext_handler_i.collect_heartbeat()
if heartbeat is not None:
handler_status.status = heartbeat.get('status')
if not ext_disallowed:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

                        if ext_disallowed:
                            pass  # The status already describes why the extension is disallowed
                        else:
                            handler_status.status = heartbeat.get('status')

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

self._policy = self._parse_policy(policy_contents)

@property
def policy(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this being used?

Copy link
Author

@mgunnala mgunnala Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's no longer being used currently, but I kept it here because we'll need to expose the policy for the extension-specific policy feature. I can remove it too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to remove this for now, I will add it back as needed in the future

gs_history = self.protocol.get_goal_state().history
policy_file = conf.get_policy_file_path()
if gs_history is not None and os.path.isfile(policy_file):
gs_history.save_file(policy_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may not be saving the same policy we parse in ExtensionPolicyEngine init if we do this here.

Could we dump the contents of the policy file to the history folder in ExtensionPolicyEngine.__read_policy?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally copied policy to history in PolicyEngine.__read_policy, but Norberto suggested moving it here to remove the PolicyEngine dependency on GoalStateHistory. The read time difference is minimal (< 0.01 ms), but if you think it's still an issue, I can move it back to PolicyEngine.

Copy link
Contributor

@maddieford maddieford Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@narrieta @mgunnala what do you think of keeping the contents of the policy file when it's read as a property in ExtensionPolicyEngine. Then we can dump the contents to the history folder here in exthandlers.

If we keep it as is, it will be unlikely that we save a different policy to the history folder than what is being applied, but if it does happen it will be difficult to debug

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing the raw policy file contents as a property in ExtensionPolicyEngine makes sense to me

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the implementation - ExtensionPolicyEngine stores raw policy file contents in "policy_file_contents" property immediately after reading, it is copied to history here.

handler_status.message = parse_formatted_message(heartbeat.get('formattedMessage'))
heartbeat_message = parse_formatted_message(heartbeat.get('formattedMessage'))
if ext_disallowed:
handler_status.message += " Extension was previously enabled and reported the following heartbeat:\n{0}".format(heartbeat_message)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handler_status.message is set to None in ExtHandlerStatus init. I don't think handler_status.message will be None since we write status on behalf of the Handler when it's disallowed, but we should probably add None check here before concatenating in case

Copy link
Author

@mgunnala mgunnala Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If handler_status.message is None here, what should it be set to?

report_ext_handler_status() doesn't have the exact cause of the ext disallowed error (policy failure, ext disabled, etc.) - we could set the message to something like "Extension will not be processed because it is disallowed. Extension was previously enabled ... " but that might be confusing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can update the check to 'if ext_disallowed and handler_status.message is not None'

If ext is disallowed and the handler_status.message is None then we've hit an unexpected case and should just write the heartbeat to the status imo.

We can add a comment that handler_status.message should not be None if the ext is disallowed (because the agent writes handler status on behalf of the ext in that case)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if ext_disallowed and handler_status.message is not None:
We should not just pass the heartbeat, but rather add a generic message explaining the extension was not execute. Also, since this is an unexpected condition, we should probably send telemetry to alert us of the issue. If sending telemetry, be careful not send on every single heartbeat

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(feel free to skip sending telemetry, as long was we have a unit test asserting not None when disallowed; it may happen that a faulty extension sets the status to None)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have unit tests asserting that status is appropriately reported (not None) when extension is disallowed.

Is it sufficient to add a message like "Extension was not processed: failed to report status." ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when not None, we use

Status + Extension was previously enabled and reported the following heartbeat:\n{0}

how about

Extension was not executed, but it was previously enabled and reported the following heartbeat:\n{0}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

narrieta
narrieta previously approved these changes Mar 4, 2025
@narrieta narrieta merged commit e2caad2 into Azure:develop Mar 4, 2025
11 checks passed
@mgunnala mgunnala deleted the overwrite_heartbeat branch March 4, 2025 20:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants