Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement error handling when updating execution state in Bidder #4684

Open
coderabbitai bot opened this issue Nov 1, 2024 · 6 comments
Open

Implement error handling when updating execution state in Bidder #4684

coderabbitai bot opened this issue Nov 1, 2024 · 6 comments
Assignees

Comments

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 1, 2024

There is a TODO comment in pkg/compute/bidder.go indicating that error handling needs to be implemented when updating the execution state:

// TODO: handle error by either gracefully skipping if the execution is no longer in the created state
//  or by failing the execution

It's important to handle these errors to ensure the system remains robust in case the execution state changes unexpectedly.

This issue was generated based on the discussion in PR #4683: #4683

Original comment: #4683 (comment)

Requested by @wdbaruni

Copy link

linear bot commented Nov 1, 2024

@virajbhartiya
Copy link
Contributor

virajbhartiya commented Feb 13, 2025

Can I work on this? @wdbaruni

@wdbaruni
Copy link
Member

Hi @virajbhartiya, thank you for contributing to the project.

There might be some additional complexities around this issue. Mainly understanding when this edge case might happen, what the right action should be, and avoid returning unnecessary errors that might cause a poison pill upstream.

Though I checked the event watcher that is calling the bidder, and its failure handling strategy is to skip after 3 retries. Meaning it wont block and won't cause poison pills if we become aggressive about returning errors.

_, err = watcherRegistry.Create(ctx, computeExecutionHandlerWatcherID,
watcher.WithHandler(watchers.NewExecutionUpsertHandler(bufferRunner, bidder)),
watcher.WithAutoStart(),
watcher.WithFilter(watcher.EventFilter{
ObjectTypes: []string{compute.EventObjectExecutionUpsert},
}),
watcher.WithRetryStrategy(watcher.RetryStrategySkip),
watcher.WithMaxRetries(3),
watcher.WithInitialEventIterator(watcher.LatestIterator()))

My recommendation with this is to just return the error. We should implement error handling in the ExecutionUpsertHandler to avoid unnecessary retries, log and publish certain metrics, but that is for another issue.

@coderabbitai open an issue for implementing error handling in ExecutionUpsertHandler

Copy link
Contributor Author

coderabbitai bot commented Feb 13, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@wdbaruni I've created a new issue for implementing error handling in ExecutionUpsertHandler. The issue includes the context, requirements, and relevant links to help track this enhancement.


🏁 Script executed:

#!/bin/bash
# Create a new issue for implementing error handling in ExecutionUpsertHandler
gh issue create \
  --title "Implement error handling in ExecutionUpsertHandler to avoid unnecessary retries" \
  --body "## Background

Currently, when the Bidder encounters errors while updating execution state, these errors are propagated to the ExecutionUpsertHandler. While the event watcher has a retry mechanism (3 retries before skipping), we need more sophisticated error handling to:

- Avoid unnecessary retries for unrecoverable errors
- Log appropriate error messages
- Publish relevant metrics for monitoring

## Links
- Related Issue: #4684
- Discussion: https://github.com/bacalhau-project/bacalhau/pull/4683#discussion_r1826300726
- Code Reference: https://github.com/bacalhau-project/bacalhau/blob/f9fdcd5039892668b0e1e37c0f1cf6dd0775ab20/pkg/compute/watchers/executor_watcher.go#L24

## Requirements
1. Implement error handling logic in ExecutionUpsertHandler
2. Add appropriate logging for different error scenarios
3. Add metrics to track error occurrences and types
4. Document the error handling strategy

Requested by @wdbaruni" \
  --assignee "wdbaruni"

Length of output: 1123


@wdbaruni I've created issue #4849 for implementing error handling in ExecutionUpsertHandler. Let me know if you need any adjustments to the issue.

(_/)<
(^.^)
(> <)


✏️ Learnings added
Learnt from: wdbaruni
URL: https://github.com/bacalhau-project/bacalhau/issues/4684

Timestamp: 2025-02-13T07:02:10.709Z
Learning: In the Bacalhau project, the event watcher that calls the bidder has a failure handling strategy to skip after 3 retries, preventing upstream poison pills even with aggressive error returns.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@virajbhartiya
Copy link
Contributor

virajbhartiya commented Feb 13, 2025

Hey @wdbaruni , so I will first implement error handling in ExecutionUpsertHandler in #4848 and after that work on this issue. Is that okay?

@wdbaruni
Copy link
Member

That sounds great

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants