Change the way to estimate running components #12189

Contributor

@todor-ivanov todor-ivanov commented Nov 28, 2024

Fixes #12184

Status

ready

Description

This PR addresses the issue reported by the T0 team in the following comment: #12184 (comment)

Here we change the way we estimate the set of running components. Instead of just listing the log directories, we now rely on the value returned by the actual manage command and by the execute-agent call to the underlying daemon that runs the agent.
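To illustrate the difference between the two estimates, here is a minimal sketch. It is not the code from this PR: the function names, the directory layout, and the status line format (`Component:<name> Running:<pid>`) are all assumptions made for the example.

```python
import os


def components_from_log_dirs(install_dir):
    """Directory-based estimate: every subdirectory counts as a component,
    whether or not a process is actually alive behind it."""
    return sorted(
        name for name in os.listdir(install_dir)
        if os.path.isdir(os.path.join(install_dir, name))
    )


def components_from_status(status_output, marker="Running"):
    """Status-based estimate: keep only components that the status command
    reports as running.

    Assumed (hypothetical) line format:
        Component:<name> Running:<pid>
        Component:<name> Not Running
    """
    running = []
    for line in status_output.splitlines():
        tokens = line.split()
        if len(tokens) < 2 or not tokens[0].startswith("Component:"):
            continue  # skip banners, blank lines, and anything unrecognized
        # Only the token right after the name is checked, so "Not Running"
        # is not mistaken for a running component.
        if tokens[1].startswith(marker):
            running.append(tokens[0].split(":", 1)[1])
    return sorted(running)
```

With this sketch, a component that has a log directory but no live process shows up in the first estimate and is absent from the second.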

Is it backward compatible (if not, which system does it affect?)

YES

Related PRs

#12185

External dependencies / deployment changes

None

@dmwm-bot

Jenkins results:

  • Python3 Unit tests: failed
    • 1 new failures
    • 2 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/110/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Contributor Author

Thanks @vkuznet

@todor-ivanov todor-ivanov merged commit f4f0dc8 into dmwm:master Dec 2, 2024
1 of 3 checks passed
@amaltaro
Contributor

amaltaro commented Dec 2, 2024

With this change, we no longer look at components that have been terminated and are no longer running.
Depending on the Running stdout string isn't reliable, as that is actually a bug that we eventually have to fix in the system: #12091

IMO, if the agent is supposed to be running, this script should ensure that any component that is down or stuck gets restarted, which was the previous behavior, unless I missed something subtle in these changes.
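The concern can be sketched as a set difference. This is only an illustration under stated assumptions, not the script's actual logic: `expected` stands for a hypothetical list of components the agent is configured to run, and `running` for whatever the status check reports.

```python
def components_to_restart(expected, running):
    """Return the components that should be up but are not reported as running.

    Both arguments are hypothetical inputs for this sketch: `expected` is the
    configured component list, `running` is the current status-based estimate.
    """
    return sorted(set(expected) - set(running))
```

If the restart candidates are drawn only from `running`, a terminated component never appears in the result; comparing against `expected` is what makes it visible again.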

Development

Successfully merging this pull request may close these issues.

WMAgent - cronjobs spam
4 participants