Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WithTimeout opt to retrier #40

Closed

Conversation

slizco
Copy link
Contributor

@slizco slizco commented Dec 18, 2023

Users of the retrier package may wish to configure a retrier with a global timeout that applies across all retries. This differs from the backoffs configuration because each individual backoff only applies between successive calls to work(). Without configuring a timeout to encapsulate the entire RunFn call, a single call to work() is time-wise unbounded.

A caller could set a timeout on the context passed into RunFn(), but this means that the last error received, ret is not bubbled up. This PR ensures that the last error encountered is bubbled up to the caller, even when the timeout is hit.

@eapache
Copy link
Owner

eapache commented Dec 18, 2023

Hi, thanks for the suggestion and code. I have a few thoughts / questions:

  • I missed this, but thankfully CI pointed out that errors.Join is new as of golang 1.20 this year. As mentioned in the project README, I don’t mind bumping the minimum version occasionally, but I’d prefer to keep supporting versions a bit older than that for a little while at least, unless the case is very compelling.
  • As you mentioned in your write-up, calling WithTimeout can effectively be accomplished by calling context.WithTimeout, except for the change to the error handling. Is there a strong reason for the WithTimeout half of the change, or would the error-handling change on its own address your use case? If it’s just a convenience, I’m tempted to keep the functionality of this package more narrowly focused.
  • What is the backwards-compatibility effect of the error-handling part of the change? It’s been a while since I’ve done a deep dive into go error-handling, but my first instinct is that this might break programs which were already checking the return value for context.DeadlineExceeded? Happy to be wrong here though.

@eapache
Copy link
Owner

eapache commented Dec 18, 2023

the last error received, ret is not bubbled up

I suppose also that someone could argue this is behaviour as-designed? That the package should only bubble up the “final” error, and that since this error was still being retried when the context expired, it does not count as final?

That said, I can also see why it’s inconvenient to lose the underlying error when the context expires. I suppose the same thing would happen if the work function ever checked the context deadline itself - it would just return DeadlineExpired and you’d lose the previous iteration’s “real” error, whatever that was.

@slizco
Copy link
Contributor Author

slizco commented Dec 19, 2023

@eapache Yes, the actual timeout option is mostly a convenience. I'm primarily interested in getting the last returned error to bubble up. You could argue that swallowing that error and simply reporting context.DeadlineExceeded is expected behavior. However, I think this is inconsistent with the retrier behavior when backoffs are exhausted. In that case, the last error encountered is the one returned to the caller. There's no material difference between a timeout being hit or a retry limit being hit--in these cases, I'd expect the error handling to match.

If we could add an option to allow supporting of bubbling up the actual last error when ctx.Err() != nil, that would certainly solve the issue for my use case. Changing the default behavior may break existing uses of the retrier if callers are expecting the context.DeadlineExceeded be returned--that is true.

@slizco
Copy link
Contributor Author

slizco commented Dec 19, 2023

Alternatively, if you are okay with the WithTimeout option, we could do something like this:

if err := r.sleep(ctx, timer); err != nil {
  // Timeout in effect, timeout hit & ret is non-nil.
  if r.timeout > 0 && err == context.DeadlineExceeded && ret != nil {
	  return ret
   }
   return err
}

Then the existing default error-handling behavior remains as is, but the error is bubbled up if you've configured a timeout within the retrier, not just on the context.

@eapache
Copy link
Owner

eapache commented Dec 19, 2023

If we could add an option to allow supporting of bubbling up the actual last error when ctx.Err() != nil, that would certainly solve the issue for my use case.

This is my preference, as it's fully backwards compatible and avoids proxying methods that can be called on other packages.

I'm not sure exactly what to call the option though... maybe SuppressContextErrors()? Happy to take suggestions here, but regardless, a good godoc on this method will be important.

@tukeJonny
Copy link
Contributor

tukeJonny commented Jan 16, 2024

I believe the overall timeout should be specified in context.WithTimeout.
I fear that setting a global timeout in Retrier would lead to complexity and increase confusion for users.

On the issue of errors being bubbled up, I agree with @eapache that it should be provided as an option.
While maintaining the interface is essential to ensure backward compatibility, I believe it is also important to prevent misuse by users by not increasing implicit behavior.
In this respect, option is an excellent approach in terms of backward compatibility and documentation (Functional Option Pattern).

However, while SuppressContextErrors() is certainly straightforward, it is more difficult to deal with when non-context factors come up in the future.
I thought it would be better in the long run if it were named to give priority to errors originating from the user's function rather than errors internal to Retrier or errors originating from the context.
In this case, for example, one might think of naming it something like PreferUserErrors.
In any case, I thought that naming and documentation should take the following points into account.

  • This is error handling option.
  • Error originating from the user's function are returned first.
  • However, other errors may be returned if there is no error originating from the user's function.

@eapache
Copy link
Owner

eapache commented Feb 21, 2024

Closing as stale, but I'm still happy to take a PR for the error-handling portion of this change.

@eapache eapache closed this Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants