
Log failure to close and remove build directory #157

Open: moroten wants to merge 1 commit into master


moroten (Contributor) commented Dec 18, 2024

There is no real way to forward directory deletion errors once an execution has finished and a response has been sent. Therefore, log them instead.

There is no real way to forward directory deletion errors when an
execution has failed, as they would mask the main error. Therefore,
log them instead.
moroten force-pushed the log-build-directory-close-error branch from 702e310 to 652e52e on December 18, 2024 19:59

EdSchouten (Member) left a comment

Why can’t we call attachErrorToExecuteResponse() from within that defer func?

EdSchouten (Member) left a comment

Or wait. It’s CheckReadiness(). But even there we can set an error response, right?

moroten (Contributor, Author) commented Dec 18, 2024

> Why can’t we call attachErrorToExecuteResponse() from within that defer func?

attachErrorToExecuteResponse() only stores an error if the response does not already contain one, so any later errors are hidden. Yes, let's use it in CheckReadiness(), but maybe we should refactor it to either log the extra errors or actually attach all of them in some way.

moroten (Contributor, Author) commented Dec 19, 2024

This thread is turning into a more general discussion about error logging.

I think many of the errors in bb-worker are of interest to the cluster maintainer. For example, "Failed to create input root directory" is more likely to indicate a problem with the cluster than with the action. It is still useful for the user to know why the action failed, but being able to see a pattern of errors on a worker is also very valuable.

Would it do any harm to log all action errors in the worker output? Maybe I can add a configuration option to logging_build_executor.go for printing all response errors, and at the same time add options for enabling the Action: and ExecuteResponse: prints.

stagnation commented Dec 19, 2024

To chime in and make this quirk of Bazel's error reporting explicit: we have also seen issues with Windows workers where Bazel retries immediately without showing the original error to the user. Each retry then fails with a follow-up error, and once Bazel exhausts its retry limit it prints that follow-up error, which is less actionable. Windows file deletion does not work well when anti-virus programs keep files locked.

(This happened with a high-concurrency runner; it has not been observed with concurrency=1.)

0: Failed to remove Build Directory
1: Failed to acquire Build Directory
2: Failed to acquire Build Directory
...


3 participants