Typed search attributes #366

Merged: 12 commits merged on Nov 2, 2023

@cretz (Member) commented Aug 10, 2023

What was changed

Implementation of typed search attributes. Specifically:

  • Marked all existing user-facing search attribute usage as deprecated (passing them issues runtime deprecation warnings)
  • Added TypedSearchAttributes, SearchAttributeKey, SearchAttributePair, and SearchAttributeUpdate abstractions to temporalio.common module
  • Updated the client side to accept either the typed form or the old form, and updated description/schedule objects to have a separate typed_search_attributes property
  • Added typed_search_attributes property to workflow info
  • Updated upsert_search_attributes to accept either the typed form or the old form, and to update both info entries when set from the other (with some caveats when setting typed from untyped)
  • Added tests for mutating in both ways
  • 💥 BREAKING CHANGE: Replaced the rarely used ScheduleActionStartWorkflow.search_attributes with ScheduleActionStartWorkflow.typed_search_attributes, because that object is used in both directions (input and output) and we can't determine user intent. Users using the old field name in either direction will see an exception immediately on the client side (i.e. no silent behavior changes or accidental misuse). We also added an untyped_search_attributes field to this class so that untyped search attributes are preserved on update. Release notes will be very clear here.

@cretz marked this pull request as ready for review August 10, 2023 21:40
@cretz requested a review from a team as a code owner August 10, 2023 21:40
@bergundy (Member) left a comment

I would also test the schedules client to get the missing coverage.

temporalio/common.py (resolved)
"""Get a single search attribute value by key or fail with
``KeyError``.
"""
ret = next((v for k, v in self if k == key), None)
Member

Would it not have made more sense to make the search attribute key indexable?

Member Author

It is indexable by key from the user's point of view. But IMO the underlying collection is best kept as a collection of type-safe pairs rather than a dict, which can't really be made generic on a per-entry basis.
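To illustrate the trade-off, here is a minimal sketch of a collection stored as type-safe pairs that is still indexable by key. It is not the temporalio implementation; `Key`, `Pair` and `TypedAttrs` are hypothetical names. Because the key carries a type parameter, `attrs[key]` can be typed as returning that key's value type, which a plain `dict[str, Any]` cannot express per entry.

```python
from dataclasses import dataclass
from typing import Any, Generic, Iterator, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Key(Generic[T]):
    """Hypothetical key: an attribute name plus the Python type of its value."""

    name: str
    value_type: type


@dataclass(frozen=True)
class Pair(Generic[T]):
    """Hypothetical pair whose value type is tied to the key's type parameter."""

    key: Key[T]
    value: T


@dataclass(frozen=True)
class TypedAttrs:
    """Hypothetical immutable collection of pairs, indexable by key."""

    pairs: Sequence[Pair[Any]]

    def __iter__(self) -> Iterator[Tuple[Key[Any], Any]]:
        for pair in self.pairs:
            yield pair.key, pair.value

    def __getitem__(self, key: Key[T]) -> T:
        # Same lookup shape as the reviewed excerpt: a linear scan over the pairs
        ret: Optional[T] = next((v for k, v in self if k == key), None)
        if ret is None:
            raise KeyError(key)
        return ret


# Usage: a type checker infers str for attrs[color] and int for attrs[count]
color: Key[str] = Key("Color", str)
count: Key[int] = Key("Count", int)
attrs = TypedAttrs([Pair(color, "red"), Pair(count, 3)])
```

The linear scan is fine for the handful of attributes a workflow typically carries; a dict index could be added later without changing the public shape.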

Comment on lines 1310 to 1311
if len(val) != 1:
continue
Member

Why are we skipping this?
Is this behavior consistent with Java (too lazy to check myself).

Member Author

I believe so. I was told this should never happen with newer ones, and we want to be strict here. I will set a reminder to confirm the Java behavior.

Member Author

Same in Java. For the strict (typed) interface, if we get a list for a non-keyword-list attribute and it doesn't contain exactly one value, the server contract has been violated and we ignore the attribute.
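A minimal sketch of that strict rule, assuming raw values arrive as lists keyed by attribute name and that we already know which names are keyword lists (both are assumptions for illustration; `decode_single_values` is a hypothetical name, not an SDK function):

```python
from typing import AbstractSet, Any, Dict, List, Mapping


def decode_single_values(
    raw: Mapping[str, List[Any]], keyword_list_names: AbstractSet[str]
) -> Dict[str, Any]:
    """Hypothetical strict decoder for attributes that arrive as lists of values."""
    out: Dict[str, Any] = {}
    for name, val in raw.items():
        if name in keyword_list_names:
            # Keyword lists may legitimately hold any number of values
            out[name] = list(val)
            continue
        if len(val) != 1:
            # Non-list attribute with zero or several values: the server
            # contract was violated, so skip it rather than guess
            continue
        out[name] = val[0]
    return out
```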

parser = _get_iso_datetime_parser()
# We will let this throw
val = parser(val)
# If the value isn't the right type, we need to ignore
Member

Same here, do we ignore in Java too?

Member Author

Yes. Ignoring values of unexpected types lets the server introduce new search attribute types without breaking older SDKs.
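A sketch of why ignoring (rather than failing) keeps older SDKs working. The type names are a subset of Temporal's indexed value types, but the lookup table, the `(type_name, value)` input shape and `decode_known` are hypothetical. Unknown types and mismatched values are dropped, while a malformed datetime string is still allowed to raise, as in the excerpt above.

```python
from datetime import datetime
from typing import Any, Dict, Mapping, Tuple

# Hypothetical table of indexed value types this (older) SDK knows about
_KNOWN_TYPES: Dict[str, type] = {
    "Text": str,
    "Keyword": str,
    "Int": int,
    "Double": float,
    "Bool": bool,
    "Datetime": datetime,
}


def decode_known(raw: Mapping[str, Tuple[str, Any]]) -> Dict[str, Any]:
    """Keep only values whose declared type is known and whose value matches it."""
    out: Dict[str, Any] = {}
    for name, (type_name, val) in raw.items():
        expected = _KNOWN_TYPES.get(type_name)
        if expected is None:
            # A type introduced after this SDK was written: ignore, don't fail
            continue
        if expected is datetime and isinstance(val, str):
            # Malformed timestamps are allowed to raise ValueError
            val = datetime.fromisoformat(val)
        # bool is a subclass of int in Python, so exclude it explicitly for Int
        if not isinstance(val, expected) or (expected is int and isinstance(val, bool)):
            continue
        out[name] = val
    return out
```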

start_signal: Optional[str] = None,
start_signal_args: Sequence[Any] = [],
rpc_metadata: Mapping[str, str] = {},
rpc_timeout: Optional[timedelta] = None,
stack_level: int = 2,
Member

Is this left over?

Member Author

@cretz, Aug 17, 2023

No, it's intentional: it sets the stack depth at which the deprecation warning reports the offending line. Arguably I shouldn't expose it to users, but a wrapper library or utility may want its warnings reported against an outer stack frame too.
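A sketch of how a `stack_level` parameter can be forwarded to `warnings.warn(..., stacklevel=...)`. `start_workflow` and `wrapper_start` are hypothetical stand-ins, not the real client methods. The default of 2 blames the direct caller; a wrapper adds one level per extra frame so the warning points at *its* caller instead.

```python
import warnings
from typing import Any, List, Mapping, Optional


def start_workflow(
    search_attributes: Optional[Mapping[str, List[Any]]] = None,
    stack_level: int = 2,
) -> None:
    """Hypothetical client call that deprecates untyped search attributes."""
    if search_attributes is not None:
        # stacklevel=1 would blame this line; 2 blames our direct caller
        warnings.warn(
            "Untyped search attributes are deprecated",
            DeprecationWarning,
            stacklevel=stack_level,
        )


def wrapper_start(search_attributes: Optional[Mapping[str, List[Any]]] = None) -> None:
    """Hypothetical wrapper library: one extra frame, so one extra level."""
    start_workflow(search_attributes=search_attributes, stack_level=3)
```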

@cretz (Member Author) commented Aug 21, 2023

Schedule Workflow Action Typed Search Attribute Conundrum

This has mostly gone smoothly with the exception of one place: schedule workflow action search attributes. There are two outstanding issues that have to be solved here:

Mutually Exclusive Fields

Today, we have the following property on ScheduleActionStartWorkflow:

@dataclass
class ScheduleActionStartWorkflow(ScheduleAction):

    # ...

    search_attributes: temporalio.common.SearchAttributes

Here, temporalio.common.SearchAttributes is basically Mapping[str, List]. This class is used both for reading (describing) schedules and for creating/updating them. The class is immutable (all updates are copy-on-write). Normally we would add a separate, mutually exclusive field:

    typed_search_attributes: temporalio.common.TypedSearchAttributes

And then tell users to use one or the other. A schedule update works like this: we first describe the schedule, then pass the result to the user's update function, where they can call my_new_action = dataclasses.replace(my_action, search_attributes={"foo": ["bar"]}) to create a new action. But when converting back to proto on update, how do we know which field the user set?

  • We can't always assume one field is set or the other because we have backwards compatibility to retain
  • We can't convert one to the other (see next section), but even if we could, how do we know which is the true source the user is intending to set on update?

Options:

  1. Make search_attributes be a Union of typed and untyped
    • Then which form do we populate on describe, untyped or typed?
  2. Make typed_search_attributes optional
    • When would we ever set it on describe then?
  3. Store the original proto/search-attribute values on the object somewhere
    • There is no rule saying they have to use the original object to update their schedule; it may be a new object
  4. Use the absence of one of the fields to determine whether to update
    • Then how do you unset all search attributes?
  5. Use getters and setters to determine what was provided by the user
    • Users only set fields on this object via __init__, so there are no setters to intercept
  6. Use unset/None on __init__ to know which one the user wants to set
    • What if they use dataclasses.replace(...)? That uses __init__ and will set both fields
  7. Just don't support typed search attributes on schedules at this time
  8. Make a backwards incompatible break and remove search_attributes from schedules
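The `dataclasses.replace` problem with option 6 can be shown with a small hypothetical dataclass (a stand-in, not the real ScheduleActionStartWorkflow) that uses None to mean "unset". Direct construction makes the user's intent clear, but `replace()` re-passes every field of a described object to `__init__`, so both fields look "set".

```python
import dataclasses
from dataclasses import dataclass
from typing import Any, List, Mapping, Optional


@dataclass(frozen=True)
class ActionSketch:
    """Hypothetical action that keeps both the old and the new field."""

    search_attributes: Optional[Mapping[str, List[Any]]] = None
    typed_search_attributes: Optional[Mapping[str, Any]] = None


def fields_user_set(action: ActionSketch) -> List[str]:
    """Option 6: treat every non-None field as one the user meant to set."""
    return [
        f.name for f in dataclasses.fields(action) if getattr(action, f.name) is not None
    ]


# Usage
fresh = ActionSketch(typed_search_attributes={"Color": "red"})
described = ActionSketch(
    search_attributes={"Color": ["red"]}, typed_search_attributes={"Color": "red"}
)
# The user only meant to change the old field, but replace() also re-passes the typed one
edited = dataclasses.replace(described, search_attributes={"Color": ["blue"]})
print(fields_user_set(fresh))   # ['typed_search_attributes']
print(fields_user_set(edited))  # ['search_attributes', 'typed_search_attributes']
```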

Any other options?

Type Metadata Missing

Usually, all uses of search attributes on the server are validated on the way in and are guaranteed to have type metadata set on the way out. The only exception is the search attributes on a schedule's start-workflow action: since the action is basically a template, the server just stores them as given. A bug has been opened to address this at temporalio/temporal#4787, but SDKs must work with all servers, old and new.

So untyped search attributes on schedule workflow actions are missing this type metadata, which means typed search attributes derived from them can never be accurate. So what do we do? Just leave typed search attributes unset? Remember, a describe can also return a schedule created by another language's SDK, which may set that metadata.

So what do we do when we are given an object to update that has old-style search attributes? What do we do when we are given a proto with only new-style typed attributes but they update it with old-style untyped attributes?
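A sketch of why a typed view cannot be built without metadata. The payload shape (`{"metadata": ..., "value": ...}`) and the assumption that the server records the indexed value type under a `type` metadata key are simplifications for illustration; `typed_view` and `untyped_view` are hypothetical helpers.

```python
from typing import Any, Mapping, Optional, Tuple

# Hypothetical, simplified payload: {"metadata": {name: bytes}, "value": decoded value}
Payload = Mapping[str, Any]


def typed_view(payload: Payload) -> Optional[Tuple[str, Any]]:
    """Return (type_name, value) only when type metadata is present."""
    type_name = payload.get("metadata", {}).get("type")
    if type_name is None:
        # Without metadata, "2023-11-02T12:23:00Z" could be a Keyword, Text or
        # Datetime attribute, so the typed view stays empty instead of guessing
        return None
    return type_name.decode(), payload["value"]


def untyped_view(payload: Payload) -> Any:
    """The untyped view never needs metadata: it just exposes the raw value."""
    return payload["value"]
```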

EDIT: Conclusion

Option 8 was chosen - make backwards incompatible change to replace field.

@cretz (Member Author) commented Aug 24, 2023

(moving back to draft until this is figured out)

@cretz marked this pull request as draft August 24, 2023 15:18
cretz added 3 commits October 26, 2023 15:40
…attributes

# Conflicts:
#	temporalio/common.py
#	temporalio/worker/_workflow_instance.py
#	tests/helpers/__init__.py
#	tests/worker/test_workflow.py
…attributes

# Conflicts:
#	temporalio/client.py
#	temporalio/workflow.py
Comment on lines 2966 to 3039
search_attributes: temporalio.common.SearchAttributes
typed_search_attributes: Optional[temporalio.common.TypedSearchAttributes]
Member Author

💥 This is the backwards incompatible change

We cannot have both of these here and use the same object for input and output, because there is no good way to determine user intent. Users using the old field in either direction will get an error. This was deemed an acceptable risk, and we will make it clear in the release notes.

How are you handling the case where a user describes, in Python, a schedule that was created by the CLI or Go SDK and has no type metadata?

Member Author

After some discussion, I am going to add untyped_search_attributes to this class. It is meant to be read-only on create but is technically mutable on update (though typed search attributes will override it).
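A sketch of the resulting shape, assuming the override happens per attribute name (an assumption; the comment only says typed will override untyped). `StartWorkflowActionSketch` and `search_attributes_for_update` are hypothetical, and typed attributes are simplified to a name-to-value mapping instead of typed keys. Because the old `search_attributes` field no longer exists, passing it fails immediately with a `TypeError` from the generated `__init__`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping


@dataclass(frozen=True)
class StartWorkflowActionSketch:
    """Hypothetical stand-in; the old search_attributes field is gone."""

    # Simplified: typed attributes keyed by name rather than by a typed key object
    typed_search_attributes: Mapping[str, Any] = field(default_factory=dict)
    # Populated on describe, e.g. for schedules created without type metadata
    untyped_search_attributes: Mapping[str, List[Any]] = field(default_factory=dict)


def search_attributes_for_update(action: StartWorkflowActionSketch) -> Dict[str, List[Any]]:
    """Keep untyped entries, but let a typed entry with the same name win."""
    merged: Dict[str, List[Any]] = {
        name: list(values) for name, values in action.untyped_search_attributes.items()
    }
    for name, value in action.typed_search_attributes.items():
        merged[name] = value if isinstance(value, list) else [value]
    return merged
```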

Member Author

Done

@cretz marked this pull request as ready for review October 27, 2023 13:19
@cretz (Member Author) commented Oct 27, 2023

Marking ready for review now that the schedule start-workflow action issue mentioned above has been resolved via a 💥 breaking change that replaces the rarely used ScheduleActionStartWorkflow.search_attributes with ScheduleActionStartWorkflow.typed_search_attributes (release notes will be very clear here).

@Sushisource (Member) left a comment

I just realized I never submitted this review. Anyway, the comments are not blocking.

Comment on lines +280 to +283
Union[
temporalio.common.TypedSearchAttributes,
temporalio.common.SearchAttributes,
]
Member

Maybe just make an alias for this so it's not repeated so much
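A sketch of the suggested alias. `TypedSearchAttributes` here is a local placeholder rather than the temporalio class, and the alias name `SearchAttributesArg` and the `start_workflow` function are hypothetical. The Union is spelled once and reused in every signature; at runtime, dispatch uses `collections.abc.Mapping` because a subscripted `Mapping[str, List[Any]]` cannot be used with `isinstance`.

```python
from collections.abc import Mapping as AbcMapping
from typing import Any, List, Mapping, Optional, Sequence, Union


class TypedSearchAttributes:
    """Placeholder for temporalio.common.TypedSearchAttributes in this sketch."""

    def __init__(self, pairs: Sequence[Any] = ()) -> None:
        self.pairs = list(pairs)


# Roughly the shape of the old untyped form (Mapping[str, List])
SearchAttributes = Mapping[str, List[Any]]

# Hypothetical alias name: spell the Union once, reuse it in every signature
SearchAttributesArg = Union[TypedSearchAttributes, SearchAttributes]


def start_workflow(search_attributes: Optional[SearchAttributesArg] = None) -> str:
    """Hypothetical client call; reports which form it received."""
    if search_attributes is None:
        return "none"
    if isinstance(search_attributes, TypedSearchAttributes):
        return "typed"
    if isinstance(search_attributes, AbcMapping):
        return "untyped"
    raise TypeError(f"unsupported search attributes: {type(search_attributes)!r}")
```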

temporalio/common.py (resolved)
@cretz merged commit 7c0a464 into temporalio:main Nov 2, 2023
11 checks passed
@cretz deleted the typed-search-attributes branch November 2, 2023 12:23