Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature request] If unable to rotate, do not restart #61

Open
IK-Adrian opened this issue Jul 23, 2024 · 2 comments
Open

[Feature request] If unable to rotate, do not restart #61

IK-Adrian opened this issue Jul 23, 2024 · 2 comments
Assignees

Comments

@IK-Adrian
Copy link

If the rotation process fails, the rotator shouldn't restart Greengrass and see if miraculously everything worked. Instead, it should probably fail the job.

class StateCreatingCertificate(State):
""" Certificate rotator state: Creating Certificate """
def on_rx_message(self, topic: str, message: dict) -> None:
if topic == f'{TOPIC_BASE_CERT}/create/accepted':
print('Got new certificate. Backing up old certificate and private key, and switching to new.')
rotated = self._context.pki.rotate(message['certificatePem'])
if not rotated:
print('Error rotating the certificate and private key.')
print('Restarting to activate new certificate and private key.')
self._context.stop()

@gregbreen gregbreen self-assigned this Jul 24, 2024
@gregbreen
Copy link
Contributor

gregbreen commented Jul 24, 2024

It's not letting it restart to "see if miraculously everything worked". The rotate() method should never fail. It is essentially an insane failure, because the rotate() method should not suffer exceptions in practice. If it does fail, this on_rx_message() doesn't know how far it progressed before it failed. It's similar to rotate() being interrupted by an unexpected loss of power. You may have seen some comments in the code about losing power at "an inopportune moment". The component is designed to recover from a spontaneous reset or power loss during the rotate() method. I'm letting it restart here, to go through the same recovery process, rather than build special recovery handling for an exception that should never occur in practice.

Anyway, you're right that it would be better to fail the job right here, and take the recovery actions right here. Low priority, but I might do it.

@IK-Adrian
Copy link
Author

I understand that this is a almost-never-happens case, but taking into account that by this point we know something went wrong, restarting the device when not needed may just be a loss of service without any real reason for it. Maybe using a less broad aproach for the try ... except ... block can allow the rotate method itself to go back to a valid state before returning, and therefore we can just call fail the job here and go to StateIdle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants