Log failure to close and remove build directory #157

moroten · 2024-12-18T19:51:43Z

There is no real way to forward directory deletion errors once an execution has finished and a response has been sent. Therefore, log them instead.

There is also no real way to forward directory deletion errors when an execution has failed, because they would mask the main error. Therefore, log them instead.
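
The approach described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `runWithCleanup`, `logFunc`, `recorder` and the two sentinel errors are hypothetical names, and the build directory is reduced to a plain `cleanup` callback.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// logFunc has the same shape as log.Printf so that a recorder can be
// swapped in when testing. (Hypothetical type, for illustration only.)
type logFunc func(format string, args ...any)

// runWithCleanup runs action and then cleanup. Only the error from action
// is returned. A cleanup failure is logged, so it can never mask the main
// error or arrive after the response has already been sent.
func runWithCleanup(action, cleanup func() error, logf logFunc) error {
	defer func() {
		if cleanupErr := cleanup(); cleanupErr != nil {
			logf("Failed to close and remove build directory: %v", cleanupErr)
		}
	}()
	return action()
}

// recorder collects log lines in memory instead of printing them.
type recorder struct{ lines []string }

func (r *recorder) logf(format string, args ...any) {
	r.lines = append(r.lines, fmt.Sprintf(format, args...))
}

// Sentinel errors standing in for a real action failure and a real
// directory removal failure.
var (
	errAction  = errors.New("action failed")
	errCleanup = errors.New("directory is busy")
)

func main() {
	err := runWithCleanup(
		func() error { return errAction },
		func() error { return errCleanup },
		log.Printf,
	)
	fmt.Println("returned:", err)
}
```

The caller still sees only the action's own error, while the cleanup failure ends up in the worker's log where a cluster maintainer can find it.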

EdSchouten

Why can’t we call attachErrorToExecuteResponse() from within that defer func?

EdSchouten

Or wait. It’s CheckReadiness(). But even there we can set an error response, right?

moroten · 2024-12-18T20:38:17Z

> Why can’t we call attachErrorToExecuteResponse() from within that defer func?

attachErrorToExecuteResponse() hides an error if the response already carries another one. Yes, let's use it in CheckReadiness(), but maybe we should refactor it to either print the extra errors or actually attach all of them.
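
To make the two options concrete, here is a small sketch of the difference between "first error wins" and "attach all errors". It does not reproduce the real attachErrorToExecuteResponse(): the `response` type, `attachFirstError`, `attachAllErrors` and `has` are hypothetical, and the response is assumed to hold a plain Go `error` rather than a status message. It relies on `errors.Join`, which needs Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// response is a hypothetical stand-in for an execute response that can
// carry one error.
type response struct {
	Err error
}

// attachFirstError keeps the first error and silently drops later ones.
// This is the hiding behaviour discussed in this thread.
func attachFirstError(r *response, err error) {
	if r.Err == nil {
		r.Err = err
	}
}

// attachAllErrors combines the new error with whatever is already there,
// so that no error is lost.
func attachAllErrors(r *response, err error) {
	r.Err = errors.Join(r.Err, err)
}

// has reports whether target is found anywhere inside err.
func has(err, target error) bool {
	return errors.Is(err, target)
}

// Sentinel errors used for the demonstration.
var (
	errMain    = errors.New("action failed")
	errCleanup = errors.New("failed to remove build directory")
)

func main() {
	first := &response{}
	attachFirstError(first, errMain)
	attachFirstError(first, errCleanup)
	fmt.Println("first error wins, cleanup error kept:", has(first.Err, errCleanup))

	all := &response{}
	attachAllErrors(all, errMain)
	attachAllErrors(all, errCleanup)
	fmt.Println("all errors joined, cleanup error kept:", has(all.Err, errCleanup))
}
```

Whether joining is appropriate for a real response depends on how clients interpret the resulting status, so printing the extra errors may remain the safer choice.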

moroten · 2024-12-19T09:20:11Z

This thread is now turning into a more general discussion about error logging.

I think a lot of the errors in bb-worker are of interest to the cluster maintainer. For example, "Failed to create input root directory" is more likely to be a problem with the cluster than with the action. It is still interesting for the user to know why the action failed, but seeing a pattern of errors inside a worker is also very useful.

Would it hurt to log all the action errors as worker output? Maybe I can add a configuration option for logging_build_executor.go to print all the response errors and, at the same time, add options for enabling the Action: and ExecuteResponse: prints.
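
As a rough sketch of what such an option could look like, here is a logging decorator with two switches. Everything in it is an assumption made for illustration: the `executor` interface is heavily simplified, and `loggingOptions`, `LogAction` and `LogResponseErrors` are invented names, not existing bb-worker configuration fields.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// executor is a hypothetical, heavily simplified executor interface.
type executor interface {
	Execute(action string) error
}

// executorFunc adapts a plain function to the executor interface.
type executorFunc func(action string) error

func (f executorFunc) Execute(action string) error { return f(action) }

// loggingOptions holds invented switches for what the decorator prints.
type loggingOptions struct {
	LogAction         bool
	LogResponseErrors bool
}

// loggingExecutor wraps another executor and logs according to options.
type loggingExecutor struct {
	base    executor
	options loggingOptions
	logf    func(format string, args ...any)
}

func (le *loggingExecutor) Execute(action string) error {
	if le.options.LogAction {
		le.logf("Action: %s", action)
	}
	err := le.base.Execute(action)
	if err != nil && le.options.LogResponseErrors {
		le.logf("Action %s failed: %v", action, err)
	}
	// The error is returned unchanged, so the user still sees it.
	return err
}

// recorder collects log lines in memory instead of printing them.
type recorder struct{ lines []string }

func (r *recorder) logf(format string, args ...any) {
	r.lines = append(r.lines, fmt.Sprintf(format, args...))
}

var errInputRoot = errors.New("Failed to create input root directory")

func main() {
	le := &loggingExecutor{
		base:    executorFunc(func(string) error { return errInputRoot }),
		options: loggingOptions{LogResponseErrors: true},
		logf:    log.Printf,
	}
	fmt.Println("returned:", le.Execute("abc"))
}
```

With the switch off the decorator stays silent, so existing deployments would see no change in log volume unless they opt in.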

stagnation · 2024-12-19T10:38:10Z

To chime in and make this quirk of Bazel's error reporting explicit: we also saw issues with Windows workers where Bazel retries immediately without showing the original error to the user. The retries then hit a new follow-up error, and once the retry limit is exhausted, Bazel prints that follow-up error, which is less actionable. Windows file deletion does not work well when antivirus programs keep files locked.

(This was with a high-concurrency runner. It has not been observed with concurrency=1.)

0: Failed to remove Build Directory
1: Failed to acquire Build Directory
2: Failed to acquire Build Directory
...

moroten force-pushed the log-build-directory-close-error branch from 702e310 to 652e52e on December 18, 2024 19:59

EdSchouten reviewed Dec 18, 2024
