job-status-consumer: improve logging of "not alive" workflows #443
Conversation
Force-pushed the branch from b1a73dc to c33bcc8.
```diff
@@ -240,46 +232,6 @@ def _update_job_progress(workflow_uuid, msg):
             job.status = job_status


-def _update_job_cache(msg):
```
Hmm, whilst it is true that caching has been disabled for a long time, we may want to resurrect and retest it one day perhaps... Hence let me muse a bit on this.
Does this code actually run on the server side? We have cache=off by default on the client side, so I think this code should not even be triggered on the server side. If it is not, then we don't seem to have a problem here. If it is, then we should fix it so that it would not be (or inactivate it via a server-side configuration setting without removing the code).
That said, philosophically speaking, I like the suckless approach arguing that "the more code lines you have removed, the more progress you have made" 😉 Removing old inactive code is a good thing, especially if we investigate something like Armada or Kueue soon... Then it might be a perfect time for cleaning up the job cache everywhere. Until then, perhaps "inactivation" may still have some advantages.
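To make the "inactivation" alternative concrete, here is a minimal sketch of gating the cache code behind a server-side setting; the flag name `JOB_CACHE_ENABLED` and the wrapper below are hypothetical illustrations, not existing REANA configuration options:

```python
# Hypothetical sketch of "inactivating" the job cache behind a server-side
# setting instead of deleting the code. JOB_CACHE_ENABLED is an invented
# environment variable, not an existing REANA configuration option.
import logging
import os

JOB_CACHE_ENABLED = os.getenv("JOB_CACHE_ENABLED", "false").lower() in ("1", "true", "on")


def _update_job_cache(msg):
    """Stand-in for the existing cache update logic."""
    logging.info("Updating job cache for message %r.", msg)


def maybe_update_job_cache(msg):
    """Update the job cache only when it is enabled server-side."""
    if not JOB_CACHE_ENABLED:
        logging.debug("Job cache is disabled server-side; skipping cache update.")
        return
    _update_job_cache(msg)
```

With such a switch the cache code would stay in the tree but be skipped by default, mirroring the client-side cache=off default.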
Honestly, there is a lack of documentation on the cache internals, so I have no idea how it works or where exactly it is configured. I need to dig into the code.
It doesn't look like we have a cache config in the helm chart for this. Also, the user docs only mention a few things like the "CACHE=off" parameter for r-client. There are some Python tests for the cache.
It looks like only the serial engine supports caching; the other engines do not check with the job-controller whether a job is cached.
I heard that the cache was not working, so in my opinion we should remove it; otherwise it is just dead code. But I agree that maybe I should create a separate issue to remove the cache so we can address it later (especially if we plan to work on improving the consumers and internals).
Force-pushed the branch from 9282a46 to 3ca6537.
Force-pushed the branch from 3ca6537 to ae32eb8.
Logging works nicely; the core issue can be addressed later.
Addresses #437.

How to test:
1. Run a workflow: `reana-client run -w test`.
2. Wait until the workflow batch pod is created (in `pending` state) and delete the workflow: `reana-client delete -w test`.
3. Run `reana-client list`; it will show you that the `test` workflow is deleted.
4. Run `kubectl get pods`; you will find the batch pod in `NotReady` state.
5. Run `kubectl logs deployment/reana-workflow-controller job-status-consumer`; you will see that the workflow was not in an alive state in the DB and that the logs indicate this.

This doesn't solve the core issue but, at least, we get better logs.
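To illustrate the idea behind the change, here is a minimal sketch of how a job-status consumer could log such messages explicitly; the names (`RunStatus`, `ALIVE_STATUSES`, `handle_job_status_message`) are assumptions made for this sketch, not the actual reana-workflow-controller code:

```python
# Illustrative sketch only: how a job-status consumer could log, explicitly,
# that a job status message refers to a workflow which is no longer alive
# (e.g. a deleted workflow whose batch pod lingers in NotReady state).
import logging
from enum import Enum


class RunStatus(Enum):
    created = "created"
    queued = "queued"
    pending = "pending"
    running = "running"
    finished = "finished"
    failed = "failed"
    stopped = "stopped"
    deleted = "deleted"


# Workflow statuses for which job status updates still make sense.
ALIVE_STATUSES = {RunStatus.created, RunStatus.queued, RunStatus.pending, RunStatus.running}


def handle_job_status_message(workflow, msg):
    """Process one job status message for the given workflow."""
    if workflow.status not in ALIVE_STATUSES:
        # The point of the change: make the skipped message visible in the logs
        # instead of ignoring it silently.
        logging.warning(
            "Received job status message for workflow %s which is not alive "
            "(status: %s); ignoring message %r.",
            workflow.id_, workflow.status.value, msg,
        )
        return
    # ...normal processing of the job status update would continue here...


# Example usage with a stand-in workflow object:
class _FakeWorkflow:
    id_ = "test"
    status = RunStatus.deleted


handle_job_status_message(_FakeWorkflow(), {"job_id": "1234", "status": "failed"})
```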