
Test environment hangs if activity is not defined, and scheduleToStartTimeout is ignored #2305

Closed
rocketraman opened this issue Oct 31, 2024 · 7 comments
Labels
enhancement User experience


@rocketraman

Is your feature request related to a problem? Please describe.
When using the Temporal test environment, if a workflow runs an activity that is not registered, the test hangs, apparently forever. I've set `scheduleToStartTimeout` to a few seconds, but the test Temporal server appears to ignore it.

Describe the solution you'd like
Tests should fail immediately when an unregistered activity is called. Alternatively, or in addition, `scheduleToStartTimeout` should be respected.

Describe alternatives you've considered
None

Additional context
Temporal SDK 1.26.1

@rocketraman rocketraman added the enhancement User experience label Oct 31, 2024
@rocketraman rocketraman changed the title Support scheduleToStartTimeout in test environment Test environment hangs if activity is not defined, and scheduleToStartTimeout is ignored Oct 31, 2024
@Quinn-With-Two-Ns
Contributor

I can confirm that the test server does support the `scheduleToStartTimeout` timeout; see:

```java
public void scheduleToStartTimeout(boolean local) throws InterruptedException {
```

`scheduleToStartTimeout` is not the right timeout to set if you want the activity to fail in this scenario: even if an activity type is not registered on the worker, the server can still start it as long as a worker is listening on the underlying task queue. You want to set a `*ToCloseTimeout` instead. The behaviour you are describing is the same as on the real Temporal Server. We also have a separate feature request for using different retry policies under test: #626.
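To make this concrete, here is a toy model in plain Java (not the Temporal SDK or the test server; all names are hypothetical) of an activity whose type is not registered on the only worker polling its task queue. It assumes every attempt is picked up at once and fails immediately, that this failure is retried like any other, and it replaces the real retry policy's backoff with a fixed delay. Under those assumptions only a schedule-to-close budget or a cap on attempts ever ends the loop. In the Java SDK those knobs are `ActivityOptions.Builder#setScheduleToCloseTimeout` and `RetryOptions.Builder#setMaximumAttempts`.

```java
import java.time.Duration;

/**
 * Toy model (NOT Temporal code) of an activity whose type is unregistered on the
 * only worker polling its task queue. Each attempt is picked up immediately, so it
 * counts as "started" and ScheduleToStart never gets a chance to fire; the worker
 * then fails the attempt because it does not know the type.
 */
final class UnregisteredActivityModel {

    /** Outcomes this simplified model can report. */
    enum Outcome { SCHEDULE_TO_CLOSE_TIMEOUT, MAX_ATTEMPTS_EXHAUSTED, STILL_RETRYING }

    /**
     * @param scheduleToClose total budget across all attempts, or null for no budget
     * @param maxAttempts     0 means unlimited attempts
     * @param backoff         fixed, positive delay between attempts (a simplification)
     * @param horizon         how long the simulated test keeps observing before giving up
     */
    static Outcome run(Duration scheduleToClose, int maxAttempts, Duration backoff, Duration horizon) {
        if (backoff.isZero() || backoff.isNegative()) {
            throw new IllegalArgumentException("backoff must be positive");
        }
        Duration elapsed = Duration.ZERO;
        int attempt = 0;
        while (elapsed.compareTo(horizon) < 0) {
            attempt++;
            // The attempt starts instantly and fails instantly (unknown activity type).
            if (maxAttempts > 0 && attempt >= maxAttempts) {
                return Outcome.MAX_ATTEMPTS_EXHAUSTED;
            }
            // Wait before the next attempt; only the overall budget can stop this.
            elapsed = elapsed.plus(backoff);
            if (scheduleToClose != null && elapsed.compareTo(scheduleToClose) >= 0) {
                return Outcome.SCHEDULE_TO_CLOSE_TIMEOUT;
            }
        }
        // Nothing bounded the retries: from the test's point of view, it "hangs".
        return Outcome.STILL_RETRYING;
    }
}
```

With `run(null, 0, Duration.ofSeconds(1), Duration.ofHours(1))` the model is still retrying after the whole observation window, which mirrors the hang reported in this issue; passing a 10-second schedule-to-close budget, or `maxAttempts = 1`, ends it.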

@rocketraman
Author

rocketraman commented Oct 31, 2024

> scheduleToStartTimeout is not the right timeout to set if you want the activity to fail in this scenario since even if an activity is not registered on the worker, the server can still attempt to start it as long as a worker is listening on the underlying task queue.

Ah, really? Looking at this blog post, ScheduleToStart should be a smaller timeout than any of the ToClose timeouts.

The documentation says this:

> Time that the Activity Task can stay in the Task Queue before it is picked up by a Worker.

So if the activity task remains in the task queue (despite workers listening), why wouldn't the ScheduleToStart timeout apply?

If what you are saying is true, how would I configure the case in which my workflow can take hours or days of real time (in my test environment I would use time skipping), but I want to bail out if it's not started very quickly?

@Quinn-With-Two-Ns
Contributor

> how would I configure the case in which my workflow can take hours or days of real time, but if it's not started very quickly I want to bail out?

You would use ScheduleToStart, but as I said, an unregistered activity type does NOT cause any problem when scheduling an activity; it only causes a problem when executing it.
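As a toy illustration (plain Java, not Temporal code; names are hypothetical), ScheduleToStart only measures how long a task waits before *some* worker picks it up, so whether that worker has the activity type registered is not even an input to the check. An empty `Optional` stands for "no worker ever polls the task queue".

```java
import java.time.Duration;
import java.util.Optional;

/** Toy model (NOT Temporal code) of when a ScheduleToStart timeout fires. */
final class ScheduleToStartModel {

    /**
     * @param pickupDelay     empty if no worker ever polls the task queue, otherwise how
     *                        long until some worker polls it and takes the task
     * @param scheduleToStart the configured ScheduleToStart timeout
     * @return true if the task waits longer than the timeout before being picked up
     */
    static boolean scheduleToStartFires(Optional<Duration> pickupDelay, Duration scheduleToStart) {
        // Registration of the activity type is deliberately absent from this check:
        // in the model it has no effect on when the task is picked up.
        return pickupDelay.map(d -> d.compareTo(scheduleToStart) > 0).orElse(true);
    }
}
```

In this model the timeout trips when nobody polls the queue, or when pickup is slow (for example, because no worker has spare capacity to poll), but not when a worker picks the task up quickly and then fails it.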

@rocketraman
Author

rocketraman commented Oct 31, 2024

> how would I configure the case in which my workflow can take hours or days of real time, but if it's not started very quickly I want to bail out?

> You would use ScheduleToStart, but as I said, an unregistered activity type does NOT cause any problem when scheduling an activity; it only causes a problem when executing it.

OK. Isn't that weird and confusing? If an activity type is not registered, and the activity is therefore scheduled but not started, why wouldn't ScheduleToStart, which spans exactly that interval, apply? Is there a tracking issue I can follow for a timeout that covers an activity not being started because its type is not registered?

@rocketraman
Author

Closing this because the behavior is working as designed (though the design seems questionable). I've opened this community forum post to explore the topic.

@Quinn-With-Two-Ns
Contributor

Quinn-With-Two-Ns commented Oct 31, 2024

I think there is a misunderstanding here. To be clear: if an activity type is not registered but the activity was scheduled, it will still be started if any worker is listening on the task queue.

This is how the protocol between the Temporal worker and server works, and it is important for safely rolling out new Workflow/Activity types. Workers poll for work by task queue, not by activity type: if a worker is listening on a task queue, it can get any task assigned to that task queue. Once a worker picks up a task, the server considers that task started. If the worker does not know how to handle that activity type, it fails that activity attempt, but from the server's perspective the task was started. This behaviour matters when rolling out new types of activities/workflows that old workers may not understand.
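The polling protocol described above can be sketched as a toy model in plain Java. The class and method names are hypothetical and this is not how the SDK or server are implemented; it only encodes the two rules from this comment: workers poll by task queue rather than by activity type, and a task counts as started as soon as a worker picks it up, before the worker checks whether it knows the type.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

/** Toy model (NOT Temporal code) of workers polling a task queue. */
final class TaskQueueModel {

    enum State { SCHEDULED, STARTED, COMPLETED, FAILED }

    static final class ActivityTask {
        final String activityType;
        State state = State.SCHEDULED;

        ActivityTask(String activityType) { this.activityType = activityType; }
    }

    /** A single task queue; pollers do not say which activity types they know. */
    static final class TaskQueue {
        private final Deque<ActivityTask> pending = new ArrayDeque<>();

        void schedule(ActivityTask task) { pending.addLast(task); }

        /** Hands out the next task; from the server's view it is now "started". */
        ActivityTask poll() {
            ActivityTask task = pending.pollFirst();
            if (task != null) {
                task.state = State.STARTED;
            }
            return task;
        }
    }

    static final class Worker {
        private final Set<String> registeredTypes;

        Worker(Set<String> registeredTypes) { this.registeredTypes = registeredTypes; }

        /** Polls once; returns the task it handled, or null if the queue was empty. */
        ActivityTask pollAndExecute(TaskQueue queue) {
            ActivityTask task = queue.poll();
            if (task == null) {
                return null;
            }
            // Only after pickup does the worker look at the type; an unknown type
            // fails this attempt, but the task was already started.
            task.state = registeredTypes.contains(task.activityType) ? State.COMPLETED : State.FAILED;
            return task;
        }
    }
}
```

In this model an old worker registered only for `OldActivity` still receives a `NewActivity` task from the shared queue and fails that attempt, while a worker that knows `NewActivity` completes it.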

@rocketraman
Author

Thank you for explaining what "started" actually means. It would be helpful if the docs included this, because this technical definition of "started" does not match most people's mental model of an activity being started.
