-
Notifications
You must be signed in to change notification settings - Fork 343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spawner: require runtime operation suitability #5649
Spawner: require runtime operation suitability #5649
Conversation
9262f98
to
328a9ef
Compare
328a9ef
to
503639e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @clebergnu, thank you for this. I don't see any blockers for merging this, but I just have some comments which I would like to discuss firs.
avocado/core/plugin_interfaces.py
Outdated
@@ -424,6 +424,17 @@ async def update_requirement_cache(runtime_task, result): | |||
:type result: `avocado.core.teststatus.STATUSES` | |||
""" | |||
|
|||
@staticmethod | |||
@abc.abstractmethod | |||
def is_operational(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to have this as staticmethod. IIUIC you don't use it anywhere it static context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually used like this (static) in the process spawner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I meant that you never call it as static method. This is used in avocado/plugins/runner_nrunner.py where you're initializing the spawner object. Therefore, IMO this method doesn't have to be a static method and it will fix the pylint W0221 warning on PodmanSpawner
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this method is obligatory for all spawners is there any chance we can at least add something like
@staticmethod
def is_operational():
return True
to the LXC spawner within the same pull request, i.e. on the same ground as both the process and podman spawners?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the static method choices, I also think simple instance-bound methods should be good enough unless we really need something more complicated but I am aware that the spawners in general have been started with mostly static method and this and converting to simpler methods might not be within the scope of this pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change is_oeprational()
to be an instance method. The fact that the podman spawner needs that, is a good enough "hint" that it's the better choice indeed.
avocado/plugins/runner_nrunner.py
Outdated
spawner_name = test_suite.config.get("run.spawner") | ||
spawner = SpawnerDispatcher(test_suite.config, job)[spawner_name].obj | ||
if not spawner.is_operational(): | ||
msg = f'Spawner "{spawner_name}" is not operational, aborting execution of suite {test_suite.name}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this message is sufficient, because it doesn't say anything about why the spawner is not operational. IMO, this should be more verbose and each spawner should try to describe why it is not operational.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My take is that each spawner is free to log as much as it wants about the failure. The main goal of this is to stop having false positives and lead users into investigating the issue.
We can add some method of passing the failure reason from the spawner to the runner, and having the runner log it, but I feel it's unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I don't have a problem to leave this responsibility to spawners, but then I would add logs about failure to the PodmanSpawner. Because IMO Spawner "PodmanSpawner" is not operational
doesn't give you many hints about have happened and what you need to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, if the spawner can detect it has a problem it should hopefully also be able to report what that problem is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the podman spawner, I've added reporting for three possible errors:
- The podman binary could not be found
- The execution of the podman binary signaled failure (return code xxx)"
- The podman binary did not report a suitable version (>= 3.0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @clebergnu, I also left some comments as I am now slightly more in sync with changes involving spawners.
avocado/core/plugin_interfaces.py
Outdated
@@ -424,6 +424,17 @@ async def update_requirement_cache(runtime_task, result): | |||
:type result: `avocado.core.teststatus.STATUSES` | |||
""" | |||
|
|||
@staticmethod | |||
@abc.abstractmethod | |||
def is_operational(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this method is obligatory for all spawners is there any chance we can at least add something like
@staticmethod
def is_operational():
return True
to the LXC spawner within the same pull request, i.e. on the same ground as both the process and podman spawners?
avocado/core/plugin_interfaces.py
Outdated
@@ -424,6 +424,17 @@ async def update_requirement_cache(runtime_task, result): | |||
:type result: `avocado.core.teststatus.STATUSES` | |||
""" | |||
|
|||
@staticmethod | |||
@abc.abstractmethod | |||
def is_operational(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the static method choices, I also think simple instance-bound methods should be good enough unless we really need something more complicated but I am aware that the spawners in general have been started with mostly static method and this and converting to simpler methods might not be within the scope of this pull request.
avocado/plugins/runner_nrunner.py
Outdated
spawner_name = test_suite.config.get("run.spawner") | ||
spawner = SpawnerDispatcher(test_suite.config, job)[spawner_name].obj | ||
if not spawner.is_operational(): | ||
msg = f'Spawner "{spawner_name}" is not operational, aborting execution of suite {test_suite.name}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, if the spawner can detect it has a problem it should hopefully also be able to report what that problem is.
503639e
to
3d1ddc3
Compare
5f3eab9
to
d6bc2c2
Compare
/packit copr-build |
Great step forward @clebergnu, I will make sure to take a look and update all PRs/issues waiting for further input on my side by the end of the week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @clebergnu just a few tiny things since there is much to say and this PR is more or less good to go. My main point is about a tiny extra step regarding the LXC spawner.
avocado/plugins/runner_nrunner.py
Outdated
spawner = SpawnerDispatcher(test_suite.config, job)[spawner_name].obj | ||
if not spawner.is_operational(): | ||
suite_name = f" {test_suite.name}" if test_suite.name else "" | ||
msg = f'Spawner "{spawner_name}" is not operational, aborting execution of suite{suite_name}. Please check the logs for more information.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess one would naturally look into the logs next which would allow us to not have to provide a long message like this comment can be fully ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @clebergnu, thank you for your changes. Expect the @pevogam comments it LGTM.
Hi @clebergnu do you think you could prioritize this so that I could move forward with the remote spawner request and have enough time to complete everything there before the LTS release? |
When a job gets interrupted, users want to notice it. So far, it's been presented in the Human UI as a simple "INFO" message, that may "fade into the background" with the other messages. By bumping it a level to a warning, it should be easier for users to notice it. This is not made into a error, because the caused for the interruption may be an expected situation, such as a job timeout. Signed-off-by: Cleber Rosa <[email protected]>
fad914c
to
edce2a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, LGTM and thanks for adding the is_operational
methods to all spawners, the current choices are definitely reasonable for the current use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @clebergnu, thank you for this updated. I have just one comment related to Fedora image in the selftest.
This adds a new mandatory method to spawners: is_operational(). The goal of this method is to signal to the execution of the test suite whether the spawner is fully set up and capable to operate. This will of course, depend on the spawner implementation and requirements. In the case of the podman spawner, often times jobs configured to use that spawner will succeed, when they actually need to signal a failure. This is what happens on a system without a working podman installation: # avocado run --spawner=podman -- /bin/true JOB ID : daf6869a348f14c52460adc6f18f89f35f8d6ecd JOB LOG : /root/avocado/job-results/job-2023-04-14T20.57-daf6869/job.log RESULTS : PASS 0 | ERROR 0 | FAIL 0 | SKIP 1 | WARN 0 | INTERRUPT 0 | CANCEL 0 JOB TIME : 0.32 s [root@22d3a3b15197 ~]# echo $? 0 Clearly, test workflows depending on the tests errouneously succeed. With this change, an error is reported both on the logs/UI and on the exit code. Signed-off-by: Cleber Rosa <[email protected]>
edce2a0
to
5ebb619
Compare
FYI, just pushed a version with the requested change. |
This adds a new mandatory method to spawners: is_operational(). The goal of this method is to signal to the execution of the test suite whether the spawner is fully set up and capable to operate. This will of course, depend on the spawner implementation and requirements.
In the case of the podman spawner, often times jobs configured to use that spawner will succeed, when they actually need to signal a failure. This is what happens on a system without a working podman installation:
Clearly, test workflows depending on the tests errouneously succeed. With this change, an error is reported both on the logs/UI and on the exit code.