Spawner: require runtime operation suitability #5649

Merged
merged 2 commits into from
Jan 10, 2024

Conversation

clebergnu
Contributor

This adds a new mandatory method to spawners: is_operational(). The goal of this method is to signal to the execution of the test suite whether the spawner is fully set up and capable of operating. This will, of course, depend on the spawner implementation and requirements.

In the case of the podman spawner, jobs configured to use that spawner will often succeed when they actually need to signal a failure. This is what happens on a system without a working podman installation:

   # avocado run --spawner=podman -- /bin/true
   JOB ID     : daf6869a348f14c52460adc6f18f89f35f8d6ecd
   JOB LOG    : /root/avocado/job-results/job-2023-04-14T20.57-daf6869/job.log
   RESULTS    : PASS 0 | ERROR 0 | FAIL 0 | SKIP 1 | WARN 0 | INTERRUPT 0 | CANCEL 0
   JOB TIME   : 0.32 s
   [root@22d3a3b15197 ~]# echo $?
   0

Clearly, test workflows depending on the tests erroneously succeed. With this change, an error is reported both in the logs/UI and in the exit code.
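
A minimal sketch of the contract described above. The class names and the `run_suite()` helper are hypothetical stand-ins, not Avocado's actual code: the runner asks the spawner whether it can operate and turns a negative answer into a non-zero exit code instead of a silent skip.

```python
import abc


class Spawner(abc.ABC):
    """Simplified, hypothetical stand-in for a spawner interface."""

    @abc.abstractmethod
    def is_operational(self):
        """Return True only if the spawner is fully set up and can run tasks."""


class AlwaysReadySpawner(Spawner):
    """A spawner with no external requirements."""

    def is_operational(self):
        return True


class BrokenSpawner(Spawner):
    """A spawner whose requirements are not met."""

    def is_operational(self):
        return False


def run_suite(spawner, tests):
    """Return a process-style exit code: 0 on success, 1 if the spawner cannot operate."""
    if not spawner.is_operational():
        print("spawner is not operational, aborting")
        return 1
    for test in tests:
        test()
    return 0
```

Because the method is abstract, a spawner class that forgets to implement it cannot be instantiated at all, which is what makes the method mandatory in this sketch.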

@clebergnu clebergnu marked this pull request as ready for review April 15, 2023 01:57
@clebergnu clebergnu self-assigned this Apr 15, 2023
@clebergnu clebergnu added this to the #102 (102 Dalmatians) milestone Apr 15, 2023
@richtja richtja self-requested a review April 19, 2023 13:39
Contributor

@richtja richtja left a comment

Hi @clebergnu, thank you for this. I don't see any blockers for merging this, but I have some comments that I would like to discuss first.

@@ -424,6 +424,17 @@ async def update_requirement_cache(runtime_task, result):
        :type result: `avocado.core.teststatus.STATUSES`
        """

    @staticmethod
    @abc.abstractmethod
    def is_operational():
Contributor

Is it necessary to have this as a staticmethod? IIUC, you don't use it anywhere in a static context.

Contributor Author

@clebergnu clebergnu Apr 20, 2023

It's actually used like this (static) in the process spawner.

Contributor

Yes, but I meant that you never call it as a static method. It is used in avocado/plugins/runner_nrunner.py, where you're initializing the spawner object. Therefore, IMO this method doesn't have to be a static method, and making it an instance method will fix the pylint W0221 warning on PodmanSpawner.

Contributor

If this method is obligatory for all spawners, is there any chance we can at least add something like

    @staticmethod
    def is_operational():
        return True

to the LXC spawner within the same pull request, i.e. on the same ground as both the process and podman spawners?

Contributor

Regarding the static method choices, I also think simple instance-bound methods should be good enough unless we really need something more complicated. I am aware, though, that the spawners in general started out with mostly static methods, and converting them to simpler methods might not be within the scope of this pull request.

Contributor Author

I'll change is_operational() to be an instance method. The fact that the podman spawner needs that is a good enough "hint" that it's indeed the better choice.
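
A rough illustration of why an instance method fits here: a check that depends on per-instance configuration cannot be written as an argument-less static method. The class names and the `container_bin` configuration key are hypothetical and only serve the example.

```python
import shutil


class BaseSpawner:
    """Hypothetical base class holding per-instance configuration."""

    def __init__(self, config=None):
        self.config = config or {}

    def is_operational(self):
        raise NotImplementedError


class ProcessLikeSpawner(BaseSpawner):
    def is_operational(self):
        # Needs no external tooling, so it is always able to operate.
        return True


class ContainerLikeSpawner(BaseSpawner):
    def is_operational(self):
        # The binary to look for comes from the instance configuration,
        # which a static method would have no access to.
        binary = self.config.get("container_bin", "podman")
        return shutil.which(binary) is not None
```

With the same signature in the base class and in every subclass, the override no longer differs in its arguments, which is the condition the W0221 warning mentioned above complains about.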

spawner_name = test_suite.config.get("run.spawner")
spawner = SpawnerDispatcher(test_suite.config, job)[spawner_name].obj
if not spawner.is_operational():
    msg = f'Spawner "{spawner_name}" is not operational, aborting execution of suite {test_suite.name}'
Contributor

I am not sure if this message is sufficient, because it doesn't say anything about why the spawner is not operational. IMO, this should be more verbose and each spawner should try to describe why it is not operational.

Contributor Author

My take is that each spawner is free to log as much as it wants about the failure. The main goal of this is to stop having false positives and lead users into investigating the issue.

We can add some method of passing the failure reason from the spawner to the runner, and having the runner log it, but I feel it's unnecessary.

Contributor

Ok, I don't have a problem with leaving this responsibility to the spawners, but then I would add logs about the failure to the PodmanSpawner. IMO, the message 'Spawner "PodmanSpawner" is not operational' doesn't give you many hints about what happened and what you need to fix.

Contributor

Indeed, if the spawner can detect it has a problem it should hopefully also be able to report what that problem is.

Contributor Author

For the podman spawner, I've added reporting for three possible errors:

  1. The podman binary could not be found
  2. The execution of the podman binary signaled failure (return code xxx)
  3. The podman binary did not report a suitable version (>= 3.0)
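
The three checks listed above could be sketched roughly as follows. This is not the code from the pull request: the function names, the log wording and the injectable `which`/`run` parameters are assumptions made so the example can be exercised without podman installed. It also assumes that `podman --version` prints a line such as "podman version 4.3.1".

```python
import logging
import re
import shutil
import subprocess

LOG = logging.getLogger(__name__)
MINIMUM_VERSION = (3, 0)


def parse_version(output):
    """Extract (major, minor) from text such as 'podman version 4.3.1'."""
    match = re.search(r"(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def podman_is_operational(binary="podman", which=shutil.which, run=subprocess.run):
    """Return True if podman is found, runs, and reports version >= 3.0."""
    # 1. The podman binary could not be found.
    path = which(binary)
    if path is None:
        LOG.error("The podman binary %r could not be found", binary)
        return False
    # 2. The execution of the podman binary signaled failure.
    try:
        result = run([path, "--version"], capture_output=True, text=True, check=False)
    except OSError as details:
        LOG.error("The podman binary could not be executed: %s", details)
        return False
    if result.returncode != 0:
        LOG.error(
            "The execution of the podman binary signaled failure (return code %s)",
            result.returncode,
        )
        return False
    # 3. The podman binary did not report a suitable version.
    version = parse_version(result.stdout)
    if version is None or version < MINIMUM_VERSION:
        LOG.error("The podman binary did not report a suitable version (>= 3.0)")
        return False
    return True
```

Each failure path logs its own reason before returning False, so the generic "not operational" message from the runner is always accompanied by a specific hint in the logs.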

Contributor

@pevogam pevogam left a comment

Hi @clebergnu, I also left some comments as I am now slightly more in sync with changes involving spawners.

@clebergnu
Contributor Author

@richtja @pevogam I believe I've addressed all issues you guys raised. Let me know if I'm missing something.

And thanks for the review!

@clebergnu clebergnu force-pushed the spawner_is_operational branch 3 times, most recently from 5f3eab9 to d6bc2c2 Compare July 18, 2023 17:01
@clebergnu
Contributor Author

/packit copr-build

@pevogam
Contributor

pevogam commented Jul 20, 2023

Great step forward @clebergnu, I will make sure to take a look and update all PRs/issues waiting for further input on my side by the end of the week.

Contributor

@pevogam pevogam left a comment

Hi @clebergnu, just a few tiny things, since there isn't much to say and this PR is more or less good to go. My main point is about a tiny extra step regarding the LXC spawner.

avocado/core/plugin_interfaces.py (resolved)
avocado/core/plugin_interfaces.py (resolved)
spawner = SpawnerDispatcher(test_suite.config, job)[spawner_name].obj
if not spawner.is_operational():
    suite_name = f" {test_suite.name}" if test_suite.name else ""
    msg = f'Spawner "{spawner_name}" is not operational, aborting execution of suite{suite_name}. Please check the logs for more information.'
Contributor

I guess one would naturally look into the logs next, which would allow us to not have to provide a long message like this, but this comment can be fully ignored.
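
The message construction in the snippet above can be exercised in isolation. A small sketch, with a hypothetical helper name, showing how the optional suite name avoids a dangling space when the suite is unnamed:

```python
def not_operational_message(spawner_name, suite_name=None):
    """Build the abort message, adding the suite name only when there is one."""
    suite = f" {suite_name}" if suite_name else ""
    return (
        f'Spawner "{spawner_name}" is not operational, '
        f"aborting execution of suite{suite}. "
        "Please check the logs for more information."
    )
```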

avocado/plugins/spawners/podman.py (outdated, resolved)
avocado/plugins/spawners/lxc.py (resolved)
Contributor

@richtja richtja left a comment

Hi @clebergnu, thank you for your changes. Except for @pevogam's comments, it LGTM.

@pevogam
Contributor

pevogam commented Aug 28, 2023

Hi @clebergnu do you think you could prioritize this so that I could move forward with the remote spawner request and have enough time to complete everything there before the LTS release?

When a job gets interrupted, users want to notice it.  So far, it's
been presented in the Human UI as a simple "INFO" message, that may
"fade into the background" with the other messages.

By bumping it a level to a warning, it should be easier for users to
notice it.

This is not made into an error, because the cause of the interruption
may be an expected situation, such as a job timeout.

Signed-off-by: Cleber Rosa <[email protected]>
@clebergnu clebergnu force-pushed the spawner_is_operational branch 2 times, most recently from fad914c to edce2a0 Compare January 8, 2024 21:02
@clebergnu
Contributor Author

@richtja @pevogam I've rebased this PR, so a second round of review is probably a good thing.

Contributor

@pevogam pevogam left a comment

Ok, LGTM and thanks for adding the is_operational methods to all spawners, the current choices are definitely reasonable for the current use.

Contributor

@richtja richtja left a comment

Hi @clebergnu, thank you for this update. I have just one comment related to the Fedora image in the selftest.

selftests/functional/plugin/spawners/podman.py (outdated, resolved)
This adds a new mandatory method to spawners: is_operational().  The
goal of this method is to signal to the execution of the test suite
whether the spawner is fully set up and capable of operating.  This
will, of course, depend on the spawner implementation and requirements.

In the case of the podman spawner, jobs configured to use that
spawner will often succeed when they actually need to signal a
failure.  This is what happens on a system without a working podman
installation:

   # avocado run --spawner=podman -- /bin/true
   JOB ID     : daf6869a348f14c52460adc6f18f89f35f8d6ecd
   JOB LOG    : /root/avocado/job-results/job-2023-04-14T20.57-daf6869/job.log
   RESULTS    : PASS 0 | ERROR 0 | FAIL 0 | SKIP 1 | WARN 0 | INTERRUPT 0 | CANCEL 0
   JOB TIME   : 0.32 s
   [root@22d3a3b15197 ~]# echo $?
   0

Clearly, test workflows depending on the tests erroneously succeed.
With this change, an error is reported both in the logs/UI and in the
exit code.

Signed-off-by: Cleber Rosa <[email protected]>
@clebergnu
Contributor Author

Hi @clebergnu, thank you for this update. I have just one comment related to the Fedora image in the selftest.

FYI, just pushed a version with the requested change.

@richtja richtja merged commit aa82dac into avocado-framework:master Jan 10, 2024
63 checks passed