-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overwrite extension heartbeat with policy error #3321
Conversation
azurelinuxagent/ga/exthandlers.py
Outdated
policy_error = None | ||
try: | ||
policy_engine = ExtensionPolicyEngine() | ||
gs_history = self.protocol.get_goal_state().history | ||
policy_engine = ExtensionPolicyEngine(gs_history) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you may want to have ExtensionPolicyEngine expose the path to the policy file and save it here, in order to avoid adding a dependency on GoalStateHistory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
@@ -309,3 +310,7 @@ def save_shared_conf(self, text): | |||
|
|||
def save_manifest(self, name, text): | |||
self.save(text, _MANIFEST_FILE_NAME.format(name)) | |||
|
|||
def save_policy(self, text): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's make this more generic, in case we need to add other files later... maybe add_file(self, path)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this to save_file(self, path)
azurelinuxagent/ga/exthandlers.py
Outdated
@@ -773,10 +775,27 @@ def __handle_ext_disallowed_error(self, ext_handler_i, error_code, report_op, me | |||
|
|||
ext_handler_i.set_handler_status(status=ExtHandlerStatusValue.not_ready, message=message, code=error_code) | |||
|
|||
# If extension is reporting heartbeat, overwrite the heartbeat file with error message. | |||
heartbeat_file = os.path.join(conf.get_lib_dir(), ext_handler_i.get_heartbeat_file()) | |||
if os.path.isfile(heartbeat_file): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to do this every time we report heartbeat, not only on when the error happens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
tests/ga/test_extension.py
Outdated
@@ -3715,6 +3715,61 @@ def test_uninstall_should_succeed_if_extension_allowed(self): | |||
vm_status = args[0] | |||
self.assertEqual(0, len(vm_status.vmAgent.extensionHandlers)) | |||
|
|||
def test_should_report_policy_failure_instead_of_heartbeat(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did we do any end-to-end testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only added unit tests, since heartbeat isn't currently being e2e tested and none of the extensions we e2e test report heartbeat
If we want to add an e2e test for this, I could add AzureSecurityLinuxAgent to test heartbeat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an e2e test with AzureSecurityLinuxAgent
""" | ||
Save contents of the specified file to the history folder. Output file will have the same name as the original file. | ||
""" | ||
text = fileutil.read_file(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file copy would be better to maintain the permissions of the original file
azurelinuxagent/ga/exthandlers.py
Outdated
policy_error = None | ||
try: | ||
gs_history = self.protocol.get_goal_state().history | ||
policy_file = conf.get_policy_file_path() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move the responsibility of deciding how to get the path of the file (from conf currently) to the ExtensionPolicyEngine and then expose a method to get the file the engine is using (or None if no file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the implementation - ExtensionPolicyEngine exposes a property "policy_file_contents" that can be copied to history here
azurelinuxagent/ga/exthandlers.py
Outdated
try: | ||
heartbeat = ext_handler_i.collect_heartbeat() | ||
if heartbeat is not None: | ||
handler_status.status = heartbeat.get('status') | ||
if not ext_disallowed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ext_disallowed:
pass # The status already describes why the extension is disallowed
else:
handler_status.status = heartbeat.get('status')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
self._policy = self._parse_policy(policy_contents) | ||
|
||
@property | ||
def policy(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's no longer being used currently, but I kept it here because we'll need to expose the policy for the extension-specific policy feature. I can remove it too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to remove this for now, I will add it back as needed in the future
azurelinuxagent/ga/exthandlers.py
Outdated
gs_history = self.protocol.get_goal_state().history | ||
policy_file = conf.get_policy_file_path() | ||
if gs_history is not None and os.path.isfile(policy_file): | ||
gs_history.save_file(policy_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may not be saving the same policy we parse in ExtensionPolicyEngine init if we do this here.
Could we dump the contents of the policy file to the history folder in ExtensionPolicyEngine.__read_policy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally copied policy to history in PolicyEngine.__read_policy, but Norberto suggested moving it here to remove the PolicyEngine dependency on GoalStateHistory. The read time difference is minimal (< 0.01 ms), but if you think it's still an issue, I can move it back to PolicyEngine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@narrieta @mgunnala what do you think of keeping the contents of the policy file when it's read as a property in ExtensionPolicyEngine. Then we can dump the contents to the history folder here in exthandlers.
If we keep it as is, it will be unlikely that we save a different policy to the history folder than what is being applied, but if it does happen it will be difficult to debug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exposing the raw policy file contents as a property in ExtensionPolicyEngine makes sense to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the implementation - ExtensionPolicyEngine stores raw policy file contents in "policy_file_contents" property immediately after reading, it is copied to history here.
azurelinuxagent/ga/exthandlers.py
Outdated
handler_status.message = parse_formatted_message(heartbeat.get('formattedMessage')) | ||
heartbeat_message = parse_formatted_message(heartbeat.get('formattedMessage')) | ||
if ext_disallowed: | ||
handler_status.message += " Extension was previously enabled and reported the following heartbeat:\n{0}".format(heartbeat_message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handler_status.message is set to None in ExtHandlerStatus init. I don't think handler_status.message will be None since we write status on behalf of the Handler when it's disallowed, but we should probably add None check here before concatenating in case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If handler_status.message is None here, what should it be set to?
report_ext_handler_status() doesn't have the exact cause of the ext disallowed error (policy failure, ext disabled, etc.) - we could set the message to something like "Extension will not be processed because it is disallowed. Extension was previously enabled ... " but that might be confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can update the check to 'if ext_disallowed and handler_status.message is not None'
If ext is disallowed and the handler_status.message is None then we've hit an unexpected case and should just write the heartbeat to the status imo.
We can add a comment that handler_status.message should not be None if the ext is disallowed (because the agent writes handler status on behalf of the ext in that case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ext_disallowed and handler_status.message is not None:
We should not just pass the heartbeat, but rather add a generic message explaining the extension was not execute. Also, since this is an unexpected condition, we should probably send telemetry to alert us of the issue. If sending telemetry, be careful not send on every single heartbeat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(feel free to skip sending telemetry, as long was we have a unit test asserting not None when disallowed; it may happen that a faulty extension sets the status to None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have unit tests asserting that status is appropriately reported (not None) when extension is disallowed.
Is it sufficient to add a message like "Extension was not processed: failed to report status." ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when not None, we use
Status + Extension was previously enabled and reported the following heartbeat:\n{0}
how about
Extension was not executed, but it was previously enabled and reported the following heartbeat:\n{0}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
Description
Issue #
This PR addresses the following TODOs for the policy/allowlist feature:
PR information
develop
branch.Quality of Code and Contribution Guidelines