Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nexus WF Machines & Lang Support #857

Open
wants to merge 9 commits into
base: nexus
Choose a base branch
from
Open

Conversation

Sushisource
Copy link
Member

@Sushisource Sushisource commented Dec 19, 2024

Note

Note that this isn't going into main, but a nexus branch

What was changed

Add ability for langs to perform nexus operations from workflows

Why?

Nexus is cool yo

Checklist

  1. Closes

  2. How was this tested:
    Added integration tests

  3. Any docs updates needed?

@Sushisource Sushisource requested a review from a team as a code owner December 19, 2024 19:46
Comment on lines 128 to 129
// A request to cancel a nexus operation resolved.
ResolveCancelNexusOperation resolve_cancel_nexus_operation = 17;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmm, I guess this doesn't mean a Nexus operation was cancelled, but rather that the cancel was sent? I do wonder what situation we need this as an awaitable thing that can come back. I would similarly not expect a ResolveCancelChildWorkflowExecution since it effectively is always expected to succeed. But I do see the NexusOperationCancelRequestedEventAttributes event. Hrmm.

The issue from a lang POV is that cancellation of things you start often does not have a path for "resolving" of the cancellation alone. Because, unlike external workflow cancellation, we often use cancellation token like things when starting things, we don't expose a fallible/awaitable cancellation. If we supported cancelling Nexus operations you didn't start, definitely would be worth the extra wait.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They need to be awaitable. Support will be added later for ensuring cancel was received, so we need to have them be awaitable. This is a lot like how child workflows get cancelled via the external workflow cancel command.

And yeah, cancelling ones you didn't start likely will be supported at some point too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the cancel token thing, yeah, in this case you're just going to end up internally seeing the resolution and maybe not showing it to the user. But, likely the type that represents a started nexus operation will also have an explicit awaitable cancel method.

Copy link
Member

@cretz cretz Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They need to be awaitable

I don't see how this can be done with cancellation via something you pass in to start often using language-native cancellation mechanisms that are not awaitable.

This is a lot like how child workflows get cancelled via the external workflow cancel command.

But there is no separate activation for awaiting a child cancel received right? Not sure we have ever had to wait for a cancel to be received by something we've started, be it activities, child workflows, etc.

cancelling ones you didn't start likely will be supported at some point too.

I think, just as lang would not use cancellation token approach but instead an explicit awaitable cancel call, it would deserve a separate command/activation pair.

in this case you're just going to end up internally seeing the resolution and maybe not showing it to the user.

I wonder why we even care about the resolution from lang side then. Not sure it has any lang value. I'd have to see a lang sample that shows where it is even able to be used I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this can be done with cancellation via something you pass in to start often using language-native cancellation mechanisms that are not awaitable.

They are conceptually awaitable. That doesn't mean the user literally has to wait on them. Like I said you can just track the sequence number and resolution internally for cancel-token-like cases.

it would deserve a separate command/activation pair.

That is exactly what's happening. You send the RequestCancelStartedNexusOperation command, and then you wait for that to be resolved.

I wonder why we even care about the resolution from lang side then. Not sure it has any lang value. I'd have to see a lang sample that shows where it is even able to be used I think.

This is present in the integ tests - you can see how the Rust SDK is doing this.

@bergundy Suggested I make the cancel explicitly awaitable because of what I mentioned re:

Support will be added later for ensuring cancel was received

Personally, I don't hugely care. So I'll let him make the case for that if he feels it's important to deliver cancel resolution separately from operation resolution. I would be OK internalizing the after-started cancel, but probably the command will always need to exist (or will have to be reintroduced) when we support cancelling operations you didn't start.

Copy link
Member

@cretz cretz Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To implement WAIT_CANCELLATION_REQUESTED, you'll need to notify lang when the cancelation was requested.

That's a cancellation type not a separate cancel call. I agree if we need to expose a Nexus cancellation type that's fine, but still not sure how that translates to a separate activation job. Langs do not have a way to wait on the child workflow cancellation to be received separately from where they are waiting on result. If we get to external Nexus cancel operation, I agree that has to be a request/response, but for start-in-this-workflow items (children, activities, and Nexus operations), cancellation is a one-way operation. Languages don't even have a way to wait for cancellation separate from result.

Can either of y'all show what waiting on a Nexus operation cancel looks like in any of our SDK languages (not to be confused with what waiting on a result looks like)? And if 0 SDK languages can support this activation job anyways, why would a language ever even handle it? And if 0 SDK languages handle this resulting activation job, why have it effectively as a dead message in the proto?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see what you mean, you don't wait for the cancelation itself in lang, we're expecting core to resolve the nexus operation as canceled with WAIT_CANCELLATION_REQUESTED, it's not the result of the cancel command.
I definitely see external nexus operations being a thing and allowing waiting on its cancelation, that may be a separate thing though.

Copy link
Member

@cretz cretz Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe we do want a "Nexus cancellation type"? How does one choose to wait (or not) for a Nexus cancellation in Go today? Go child and activity options both have WaitForCancellation booleans, should we do the same for Nexus operation options? And for Core-based SDKs, should we have a cancellation type like child and activity have?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we need to add a cancellation type for Nexus too. To fully support it, we need to add a history event to notify the SDK that a cancelation request has been delivered (or permanently failed).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll have operations you started have a non-awaitable cancel operation then I suppose. That's not quite what you mentioned @bergundy when I brought it up in the Nexus channel earlier:

Make it an awaitable, we are going to add support for waiting for cancellation to be delivered in the future.
It’s basically like child workflow cancellation

But, I can change it back. I might just keep the external cancel command around if we're going to bring that back later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants