-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nexus WF Machines & Lang Support #857
base: nexus
Are you sure you want to change the base?
Conversation
sdk-core-protos/protos/local/temporal/sdk/core/workflow_activation/workflow_activation.proto
Outdated
Show resolved
Hide resolved
// A request to cancel a nexus operation resolved. | ||
ResolveCancelNexusOperation resolve_cancel_nexus_operation = 17; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrmm, I guess this doesn't mean a Nexus operation was cancelled, but rather that the cancel was sent? I do wonder what situation we need this as an awaitable thing that can come back. I would similarly not expect a ResolveCancelChildWorkflowExecution
since it effectively is always expected to succeed. But I do see the NexusOperationCancelRequestedEventAttributes
event. Hrmm.
The issue from a lang POV is that cancellation of things you start often does not have a path for "resolving" of the cancellation alone. Because, unlike external workflow cancellation, we often use cancellation token like things when starting things, we don't expose a fallible/awaitable cancellation. If we supported cancelling Nexus operations you didn't start, definitely would be worth the extra wait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They need to be awaitable. Support will be added later for ensuring cancel was received, so we need to have them be awaitable. This is a lot like how child workflows get cancelled via the external workflow cancel command.
And yeah, cancelling ones you didn't start likely will be supported at some point too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the cancel token thing, yeah, in this case you're just going to end up internally seeing the resolution and maybe not showing it to the user. But, likely the type that represents a started nexus operation will also have an explicit awaitable cancel method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They need to be awaitable
I don't see how this can be done with cancellation via something you pass in to start often using language-native cancellation mechanisms that are not awaitable.
This is a lot like how child workflows get cancelled via the external workflow cancel command.
But there is no separate activation for awaiting a child cancel received right? Not sure we have ever had to wait for a cancel to be received by something we've started, be it activities, child workflows, etc.
cancelling ones you didn't start likely will be supported at some point too.
I think, just as lang would not use cancellation token approach but instead an explicit awaitable cancel call, it would deserve a separate command/activation pair.
in this case you're just going to end up internally seeing the resolution and maybe not showing it to the user.
I wonder why we even care about the resolution from lang side then. Not sure it has any lang value. I'd have to see a lang sample that shows where it is even able to be used I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how this can be done with cancellation via something you pass in to start often using language-native cancellation mechanisms that are not awaitable.
They are conceptually awaitable. That doesn't mean the user literally has to wait on them. Like I said you can just track the sequence number and resolution internally for cancel-token-like cases.
it would deserve a separate command/activation pair.
That is exactly what's happening. You send the RequestCancelStartedNexusOperation
command, and then you wait for that to be resolved.
I wonder why we even care about the resolution from lang side then. Not sure it has any lang value. I'd have to see a lang sample that shows where it is even able to be used I think.
This is present in the integ tests - you can see how the Rust SDK is doing this.
@bergundy Suggested I make the cancel explicitly awaitable because of what I mentioned re:
Support will be added later for ensuring cancel was received
Personally, I don't hugely care. So I'll let him make the case for that if he feels it's important to deliver cancel resolution separately from operation resolution. I would be OK internalizing the after-started cancel, but probably the command will always need to exist (or will have to be reintroduced) when we support cancelling operations you didn't start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To implement WAIT_CANCELLATION_REQUESTED, you'll need to notify lang when the cancelation was requested.
That's a cancellation type not a separate cancel call. I agree if we need to expose a Nexus cancellation type that's fine, but still not sure how that translates to a separate activation job. Langs do not have a way to wait on the child workflow cancellation to be received separately from where they are waiting on result. If we get to external Nexus cancel operation, I agree that has to be a request/response, but for start-in-this-workflow items (children, activities, and Nexus operations), cancellation is a one-way operation. Languages don't even have a way to wait for cancellation separate from result.
Can either of y'all show what waiting on a Nexus operation cancel looks like in any of our SDK languages (not to be confused with what waiting on a result looks like)? And if 0 SDK languages can support this activation job anyways, why would a language ever even handle it? And if 0 SDK languages handle this resulting activation job, why have it effectively as a dead message in the proto?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see what you mean, you don't wait for the cancelation itself in lang, we're expecting core to resolve the nexus operation as canceled with WAIT_CANCELLATION_REQUESTED
, it's not the result of the cancel command.
I definitely see external nexus operations being a thing and allowing waiting on its cancelation, that may be a separate thing though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So maybe we do want a "Nexus cancellation type"? How does one choose to wait (or not) for a Nexus cancellation in Go today? Go child and activity options both have WaitForCancellation
booleans, should we do the same for Nexus operation options? And for Core-based SDKs, should we have a cancellation type like child and activity have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we need to add a cancellation type for Nexus too. To fully support it, we need to add a history event to notify the SDK that a cancelation request has been delivered (or permanently failed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll have operations you started have a non-awaitable cancel operation then I suppose. That's not quite what you mentioned @bergundy when I brought it up in the Nexus channel earlier:
Make it an awaitable, we are going to add support for waiting for cancellation to be delivered in the future.
It’s basically like child workflow cancellation
But, I can change it back. I might just keep the external cancel command around if we're going to bring that back later.
Note
Note that this isn't going into main, but a nexus branch
What was changed
Add ability for langs to perform nexus operations from workflows
Why?
Nexus is cool yo
Checklist
Closes
How was this tested:
Added integration tests
Any docs updates needed?