Send email on CalledProcessError #171

Closed
wants to merge 2 commits into from


jwokaty
Collaborator

@jwokaty jwokaty commented Jun 27, 2022

No description provided.

@jwokaty jwokaty marked this pull request as draft June 27, 2022 15:10
@jwokaty
Collaborator Author

jwokaty commented Jun 27, 2022

This doesn't seem to be targeting the right area.

@hpages
Contributor

hpages commented Jun 30, 2022

Hi Jen, can you explain what you are trying to achieve with this? Thanks

@jwokaty
Collaborator Author

jwokaty commented Jul 1, 2022

I'd like the build machines to send an email when certain things go wrong, because otherwise you can't tell exactly what is wrong without looking at the log. However, my example code isn't targeting the right area. In this attempt, I wanted to target the problem that during the postrun an expected build machine's products aren't available but the BBS is expecting them to exist.
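
For illustration, the general idea in the PR title (email on `CalledProcessError`) could look roughly like the sketch below. This is not the BBS code: `run_and_notify`, `build_failure_email`, the addresses, and the `send` callback are all hypothetical names, and the sketch uses the standard library's `subprocess.run` rather than `bbs.jobs.call()`, whose signature is not shown in this thread.

```python
import subprocess
import sys
from email.message import EmailMessage


def build_failure_email(cmd, returncode, sender, recipient):
    """Compose (but do not send) a notification for a failed command."""
    msg = EmailMessage()
    msg["Subject"] = f"BBS: command failed with exit code {returncode}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Command: {' '.join(cmd)}\nExit code: {returncode}\n")
    return msg


def run_and_notify(cmd, send,
                   sender="bbs@example.org",        # placeholder address
                   recipient="admin@example.org"):  # placeholder address
    """Run cmd; on CalledProcessError, pass an email to `send` and re-raise."""
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as e:
        send(build_failure_email(cmd, e.returncode, sender, recipient))
        raise


if __name__ == "__main__":
    outbox = []  # stand-in for a real mail transport
    try:
        run_and_notify([sys.executable, "-c", "import sys; sys.exit(3)"],
                       outbox.append)
    except subprocess.CalledProcessError:
        print(outbox[0]["Subject"])
```

In a real setup, `send` could be something like the `send_message` method of an `smtplib.SMTP` connection. Passing it in as a callback keeps the sketch testable without a mail server.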

@hpages
Contributor

hpages commented Aug 23, 2022

> I wanted to target the problem that during the postrun an expected build machine's products aren't available but the BBS is expecting them to exist.

I don't think bbs.jobs.call() is used during postrun.sh but you might want to double-check that.

An easier way to go would be to look at the log the next time this situation happens. It will tell you exactly where the code failed and display the callstack at the point of failure. I suspect this will be somewhere during execution of the BBS-make-OUTGOING.py script. With this information, it's going to be much easier to decide how to implement the email notification.

P.S.: While looking into this I noticed that BBS-make-OUTGOING.py was importing bbs.jobs even though it doesn't use it. I just corrected that (commit ec33085).

@hpages
Contributor

hpages commented Aug 24, 2022

FWIW, it seems that today's builds actually ran into such a situation. The log file on nebbiolo2 (nebbiolo2-20220823-postrun.log) contains a lot of interesting information about what went wrong. Without going into too many details, it looks like the error didn't happen in BBS-make-OUTGOING.py as I thought it would, but later, during BBS-make-PROPAGATION_STATUS_DB.py. The cause of the failure is that the Windows builder palomino4 had not sent any binary packages back to nebbiolo2 (i.e. ~biocbuild/public_html/BBS/3.16/bioc/products-in/palomino4/buildbin was empty on nebbiolo2) at the time postrun.sh started on nebbiolo2.

At least 90% of the time, postrun.sh fails because of a situation like this one, i.e. a node that is very late, got stuck somewhere, or died for some reason.

One possible approach would be to have postrun.sh check that all the expected end-of-build tickets are present and send the notification email if they are not. This could be done by BBS-make-BUILD_STATUS_DB.py, which is the first thing that postrun.sh runs. The script doesn't necessarily have to stop; it can just send the notification, because most of the time postrun.sh will be able to complete even if a node has not finished. In that case we get an incomplete report (i.e. lots of NAs in the BUILD BIN column), but that's still better than no report at all.

This would cover you for about 90% of the postrun.sh failures.
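
A minimal sketch of the ticket check described above, under stated assumptions: the thread does not describe how end-of-build tickets are stored, so the `ticket_name` file, the per-node directory layout, and the function names `find_missing_nodes` and `format_notification` are all hypothetical. The sketch only reports what is missing and never aborts, in line with the suggestion that the script does not have to stop.

```python
from pathlib import Path


def find_missing_nodes(products_in, expected_nodes, ticket_name="ticket.txt"):
    """Return the expected nodes that have no end-of-build ticket yet.

    `ticket_name` is a placeholder: the real ticket layout used by the BBS
    is not described in this thread.
    """
    products_in = Path(products_in)
    return [
        node for node in expected_nodes
        if not (products_in / node / ticket_name).is_file()
    ]


def format_notification(missing_nodes):
    """Build the body of the warning email, or None if nothing is missing."""
    if not missing_nodes:
        return None
    lines = ["postrun started without end-of-build tickets from:"]
    lines += [f"  - {node}" for node in missing_nodes]
    lines.append("The report will be incomplete for these nodes.")
    return "\n".join(lines)
```

A caller would send the returned text by email when it is not `None` and then carry on with the rest of the postrun, so that an incomplete report is still produced.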

@jwokaty
Collaborator Author

jwokaty commented Jun 23, 2023

I'm closing this since #308 does what I couldn't articulate very well when I started this PR.

@jwokaty jwokaty closed this Jun 23, 2023
@jwokaty jwokaty deleted the email-on-error branch June 23, 2023 16:54
2 participants