Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excessive retries in test workflow environment #626

Open
tsurdilo opened this issue Aug 11, 2021 · 6 comments
Open

Excessive retries in test workflow environment #626

tsurdilo opened this issue Aug 11, 2021 · 6 comments
Labels
enhancement User experience

Comments

@tsurdilo
Copy link
Contributor

When testing an activity that constantly fails (cannot recover from the failure) and has a large scheduleToClose timeout set (like days/months), or does not have one set at all
the testing framework will keep retrying in some cases thousands of times.

Asking to add a feature that in those edge cases to not retry, but skip time so the excessive retries are avoided in tests.

To reproduce you can have a workflow which calls a single activity with activity options setScheduleToCloseTimeout set to 5 days for example. Set the activity to fail with for example:
throw Activity.wrap(new Exception("oops"));

in test you will see that it performs over 4K retries before timing out.

@tsurdilo tsurdilo added the enhancement User experience label Aug 11, 2021
@Spikhalskiy
Copy link
Contributor

Spikhalskiy commented Aug 11, 2021

Linking #499 as a different, but very relevant request. We should discuss the behavior of retries with our testing framework / time skipping / debugging mode.

@Spikhalskiy
Copy link
Contributor

Spikhalskiy commented Aug 11, 2021

@tsurdilo what is the definition of "excessive" retries? And how the framework should define which is excessive in each specific scenario?

From my point of view, the described situation is not a problem and is actually expected behavior. A workflow activity should be retried the same amount of times in tests with time-skipping as it actually would be executed in 5 days before failing the workflow. And a lot of users will not be happy if it's not like that, it will be quite misleading and error-prone.
We can't just randomly decide what is extensive. For example, if activity retries every second in a normal environment and we have 5 days timeout and we have no limits for max retries - the total amount of retries in tests SHOULD be 5 * 24 * 60 * 60 = 432000

There is a way to achieve what you want though:
We basically want different ActivityOptions#RetryOptions in tests. So, we can modify RetryOptions and add maxRetries=10 or maxRetries=100 to RetryOptions if the execution happens inside unit-test (using some env var for example to define it, like proposed in the linked issue #499).
Are you proposing to provide this functionality in an easier-to-use form as a part of our testing framework?

@tsurdilo
Copy link
Contributor Author

tsurdilo commented Aug 11, 2021

i think in a test its different as you dont in most cases fix the error on activity after a number of retries. maybe if use knows they are testing activity failure specifically then the number of retries can even be 0 . or some number in config as activity options can be specified in workflow tested so they cannot be changed in test

@tsurdilo
Copy link
Contributor Author

@mfateev wdyt?

@Spikhalskiy
Copy link
Contributor

Spikhalskiy commented Aug 11, 2021

We shouldn't completely cut off activity retries in tests because a lot of users would want to test and verify the retries' count, behavior, or even potential side effects.
If we don't want to have retries in tests, we can specify ActivityOptions#RetryOptions#maxRetries=1 if we are executing in a unit test as I described before.
But I agree, that having such code OR some tricky options injection/override just to get this behavior in tests can be a little annoying for the application developers. So, your proposal is to provide this functionality (limiting the number of retries to 1 for all activities) as a part of our test framework to make it handier, correct?

@tsurdilo
Copy link
Contributor Author

Yes, I think that would be nice if possible:
Ability to overwrite whole/parts of the activity options in the test (when ActivityOptions are defined inside the tested workflow code).

I think this could help also with some scenarios where for example you have ActivityOptions that specify a different task queue and you have to create extra workers for it in your tests, so ability to default all task queues for tested workflow and all its activities to the one created automatically by TestWorkflowRule, but that might be asking too much, idk :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement User experience
Projects
None yet
Development

No branches or pull requests

3 participants