-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pydantic contrib module #757
base: main
Are you sure you want to change the base?
Conversation
34d1ae2
to
9f5b771
Compare
# constructor. Return the field value. | ||
if len(args) == 1: | ||
return args[0] | ||
return super().__new__(cls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: pydantic is calling the _RestrictedProxy
constructor with a single (non-wrapped) datetime.datetime
argument. Returning args[0]
doesn't seem correct, but does work. I need to better understand exactly why pydantic is making this call. The stack looks like
-> o1, _ = make_pydantic_objects()
/Users/dan/src/temporalio/sdk-python/tests/contrib/test_pydantic.py(41)make_pydantic_objects()
-> MyPydanticModel(
/Users/dan/src/temporalio/sdk-python/.venv/lib/python3.9/site-packages/pydantic/main.py(214)__init__()
-> validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
> /Users/dan/src/temporalio/sdk-python/temporalio/worker/workflow_sandbox/_restrictions.py(958)__new__()
Pydantic v2 "validation" (i.e. model instance construction) is implemented in Rust, and since intervening stack frames are missing, I think that pydantic/main.py(214)
is calling into pydantic's Rust layer. But I would like to understand why pydantic is calling the constructor with the datetime value as the single argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to better understand exactly why pydantic is making this call
Agreed, I am also curious why Pydantic is instantiating the class with a single argument. ChatGPT mentions that with "type adaption", the absence of __get_pydantic_core_schema__
and/or the __init__
looking a certain way may make Pydantic think it it's a wrapper type.
When pydantic instantiates a model containing a field whose type annotation is proxied
With other create_model
code in this PR, pydantic should never be instantiating a model with proxied annotations right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was happening because the PR was implementing __get_pydantic_core_schema__
incorrectly. Fixed at efcd456
fa2c8b5
to
a1cdf1b
Compare
@@ -943,26 +952,37 @@ def r_op(obj: Any, other: Any) -> Any: | |||
|
|||
def _is_restrictable(v: Any) -> bool: | |||
return v is not None and not isinstance( | |||
v, (bool, int, float, complex, str, bytes, bytearray) | |||
v, (bool, int, float, complex, str, bytes, bytearray, datetime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: As things stand this means that users will be able to do my_datetime.now()
. But datetime.datetime.now()
will still be protected against, which is the usual way to call it. One option is to make this change only when conditional on HAVE_PYDANTIC
.
If we get rid of this change and allow datetime
instances to be restricted, then Pydantic validation fails. This failure is in Pydantic's Rust layer: https://github.com/pydantic/pydantic-core/blob/main/src/errors/types.rs#L706-L707
datetime_field
Input should be a valid datetime [type=datetime_type, input_value=datetime.datetime(2000, 1, 2, 3, 4, 5), input_type=_RestrictedProxy]
For further information visit https://errors.pydantic.dev/2.10/v/datetime_type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we never meant for the instances to be restricted just the static methods, so this is fine IMO. What about datetime.date
, datetime.time
, and datetime.timedelta
objects, do we need explicit support for them too (at least the former)?
# building their validators causing it to use date instead | ||
# TODO(cretz): https://github.com/temporalio/sdk-python/issues/207 | ||
if pydantic.compiled: | ||
assert isinstance(PydanticMessage(content=workflow.now()).content, date) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pydantic.compiled
is not present in v2
if pydantic.compiled: | ||
assert isinstance(PydanticMessage(content=workflow.now()).content, date) | ||
else: | ||
assert isinstance(PydanticMessage(content=workflow.now()).content, datetime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tested in test_pydantic.py
json_payload_converter = JSONPlainPayloadConverter( | ||
encoder=PydanticJSONEncoder, | ||
custom_type_converters=[PydanticModelTypeConverter()], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our pydantic sample was implementing to_payload
. This is slightly incorrect because it means that our AdvancedJSONEncoder
was not being used. In practice, pydantic provides a function (called to_jsonable_python
in v2) with similar functionality to our AdvancedJSONEncoder
, but nevertheless, we should ensure that we fall back to AdvancedJSONEncoder
in case we add to it in the future. That's what we do here, by using JSONPlainPayloadConverter
directly.
README.md
Outdated
@@ -325,6 +319,14 @@ This notably doesn't include any `date`, `time`, or `datetime` objects as they m | |||
Users are strongly encouraged to use a single `dataclass` for parameter and return types so fields with defaults can be | |||
easily added without breaking compatibility. | |||
|
|||
To use pydantic model instances (or python objects containing pydantic model instances), use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should make this a whole subsection, e.g. ##### Pydantic
pyproject.toml
Outdated
@@ -35,6 +35,7 @@ grpcio = {version = "^1.48.2", optional = true} | |||
opentelemetry-api = { version = "^1.11.1", optional = true } | |||
opentelemetry-sdk = { version = "^1.11.1", optional = true } | |||
protobuf = ">=3.20" | |||
pydantic = { version = "^2.10.6", optional = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm that pydantic v1 users aren't bound by our version constraint here if they never add the extra? Maybe the easiest way to confirm is somehow have a sample or test that confirms existing pydantic support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the bound to
pydantic = { version = ">=1.10.0,<3.0.0", optional = true }
and tested using the v1
sample: https://github.com/temporalio/samples-python/pull/163/files#diff-80f59f3959404e07adb4ae966fa2c502346ee73a3e7e22b2a958a70db13f8526
temporalio/contrib/pydantic.py
Outdated
# pydantic v1 | ||
from pydantic.json import pydantic_encoder as to_jsonable_python # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we saying this contrib library works even with pydantic v1? If so, we should change our version constraint to be >=
whatever the absolute lowest we support is. If not, we should probably error if they try to import this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've dropped any attempt to support v1 in the new contrib module, and also reverted any attempt to support v2 via the pre-contrib-module v1 hacks we have. So the PR is now more or less a strict addition of a v2-only contrib module, retaining the existing v1 hacks.
temporalio/contrib/pydantic.py
Outdated
if temporalio.workflow.unsafe.in_sandbox(): | ||
# Unwrap proxied model field types so that Pydantic can call their constructors | ||
model = pydantic.create_model( | ||
model.__name__, | ||
**{ # type: ignore | ||
name: (RestrictionContext.unwrap_if_proxied(f.annotation), f) | ||
for name, f in model.model_fields.items() | ||
}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of questions here just due to my lack of knowledge...
- Arguably models should not be defined/loaded in the sandbox right? That means they are reloaded for every workflow run. They should be imported via passthrough. I wonder if we should have a warning if we detect any model fields are proxied? Or maybe a warning if we see Pydantic models being loaded in the sandbox instead of passed through? EDIT: Seeing test I can see situations where people define their models in the sandbox, even if we discourage it.
- Is
create_model
cheap? Meaning is this what Pydantic often uses anyways to instantiate a model instance, or do they somehow cache this definition and manually doingcreate_model
isn't meant for each time you need the instance? If it is not cheap, we should both consider caching and consider not calling it if none of the types are proxied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happily we don't need to worry about this. The problem I was solving here was fixed by the fixed implementation of __get_pydantic_core_schema__
.
temporalio/contrib/pydantic.py
Outdated
}, | ||
) | ||
if hasattr(model, "model_validate"): | ||
return model.model_validate(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this raises a Pydantic-specific exception that a user could add to workflow failure exception types if they wanted? (if not, we should consider a usability improvement to have an option to wrap this in application error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it raises pydantic.ValidationError
@@ -943,26 +952,37 @@ def r_op(obj: Any, other: Any) -> Any: | |||
|
|||
def _is_restrictable(v: Any) -> bool: | |||
return v is not None and not isinstance( | |||
v, (bool, int, float, complex, str, bytes, bytearray) | |||
v, (bool, int, float, complex, str, bytes, bytearray, datetime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we never meant for the instances to be restricted just the static methods, so this is fine IMO. What about datetime.date
, datetime.time
, and datetime.timedelta
objects, do we need explicit support for them too (at least the former)?
if HAVE_PYDANTIC and __name == "__get_pydantic_core_schema__": | ||
return object.__getattribute__(self, "__get_pydantic_core_schema__") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrmm, trying to think if we even want this generic to void pydantic code inside our sandbox. I guess all ways to do it would require that the contrib library is at least imported by the user somewhere in the process, but maybe that is ok? If that is ok, I wonder if a side-effect of importing the contrib is ideal over altering the sandbox even if pydantic
is even available? Though I admit I kinda like the idea of this patch if pydantic
is even available regardless of contrib lib usage, so maybe this should remain as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did wonder about this a bit. I don't see much downside with the way it is currently.
tests/contrib/test_pydantic.py
Outdated
|
||
class MyPydanticModel(BaseModel): | ||
ip_field: IPv4Address | ||
datetime_field: datetime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add datetime.date
, datetime.time
, and datetime.timedelta
in here? And maybe uuid.UUID
while we're at it since we have restrictions in that module too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, the test suite has been significantly expanded.
tests/contrib/test_pydantic.py
Outdated
ShortSequence = Annotated[SequenceType, Len(max_length=2)] | ||
|
||
|
||
class MyPydanticModel(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will say for many users they will be importing this model from a different module on passthrough. This test confirms the model works when defined in the sandbox, but I wonder if we also need at least a little coverage for models defined out of the sandbox (and passed through).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've structured the test suite so that models.py
(containing several model definitions) is imported in-sandbox, but a few of the models are defined in a different file and imported under the unsafe.imports_passed_through
context manager.
# constructor. Return the field value. | ||
if len(args) == 1: | ||
return args[0] | ||
return super().__new__(cls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to better understand exactly why pydantic is making this call
Agreed, I am also curious why Pydantic is instantiating the class with a single argument. ChatGPT mentions that with "type adaption", the absence of __get_pydantic_core_schema__
and/or the __init__
looking a certain way may make Pydantic think it it's a wrapper type.
When pydantic instantiates a model containing a field whose type annotation is proxied
With other create_model
code in this PR, pydantic should never be instantiating a model with proxied annotations right?
tests/contrib/test_pydantic.py
Outdated
assert self.str_short_sequence == ["my-string-1", "my-string-2"] | ||
|
||
|
||
class MyPydanticDatetimeModel(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make us feel better, can we toss timedelta in here or somewhere and make sure it works (I'm sure it does since we don't proxy it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, was just doing that in fact! ac3583c
d83249b
to
8b1ae77
Compare
# assertion rewriter | ||
"typing", | ||
"typing_extensions", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this TypedDict
fields are broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TypedDict fields are broken for Pydantic only or in general in the SDK? For the former, I can see that if they are using typing_extensions
, but for the latter that'd be a bit strange unless you're using typing_extensions
as a user. Also, can you make a comment above this line just saying why it is here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for Pydantic. It caused a problem with pytest's assertion rewriter, as you noted for typing
on the line above. There's no drawback to passing it through, right?
Added a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, no real drawback here (it's effectively part of stdlib IMO). We technically separated the restriction sets as "minimum", "with temporal", and "maximum" just in case people wanted to be more restrictive, but the default is "maximum" which is basically what everyone uses. You could move this from minimum to one of the others, but it doesn't really matter.
@@ -1032,7 +1065,7 @@ def __getitem__(self, key: Any) -> Any: | |||
) | |||
__str__ = _RestrictedProxyLookup(str) # type: ignore | |||
__bytes__ = _RestrictedProxyLookup(bytes) | |||
__format__ = _RestrictedProxyLookup() # type: ignore | |||
__format__ = _RestrictedProxyLookup(format) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes a small bug in the existing sandboxing. Without this, test_restricted_proxy_dunder_methods
fails for pathlib.Path
fields.
5c2c452
to
0ce4836
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, only minor things
Pydantic v1 is not supported by this data converter. If you are not yet able to upgrade from Pydantic v1, see | ||
https://github.com/temporalio/samples-python/tree/main/pydantic_converter/v1 for limited v1 support. | ||
|
||
Do not use pydantic's [strict mode](https://docs.pydantic.dev/latest/concepts/strict_mode/). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a quick phrase/sentence to the end on why? Maybe "Strict mode does not work with the Python SDK sandbox because of X"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good call. I think this comment should really have been marked TODO. At the very least it needs to be changed to say that it's only proxied types that can't be used as fields with strict=True
, but I need to look into this again to see what the correct course of action is.
pyproject.toml
Outdated
@@ -35,6 +35,7 @@ grpcio = {version = "^1.48.2", optional = true} | |||
opentelemetry-api = { version = "^1.11.1", optional = true } | |||
opentelemetry-sdk = { version = "^1.11.1", optional = true } | |||
protobuf = ">=3.20" | |||
pydantic = { version = ">=1.10.0,<3.0.0", optional = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrmm, so I suggested the >=1.x
because I thought we were going to support v1 with our contrib converter. But now that it sounds like we're not, might as well do a 2.x
version constraint correct? Or does this version constraint somehow apply even when not using the extra?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I thought a ^2.x
bound would cause pip's dependency resolver to reject attempts to install 1.x
along with temporalio
. Changed to
pydantic = { version = "^2.0.0", optional = true }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we were going to support v1 with our contrib converter. But now that it sounds like we're not
Yes, I backed out of that because (a) I think v2 is widely used already, (b) we want people to use v2 because v1 doesn't work with our sandbox, (c) it's conceptually clearer to have our new thing v2 only and the old hacks v1 only, and (d) not a huge challenge but we'd have to write a backport of to_jsonable_python
for v1, or similar shim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me
# assertion rewriter | ||
"typing", | ||
"typing_extensions", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TypedDict fields are broken for Pydantic only or in general in the SDK? For the former, I can see that if they are using typing_extensions
, but for the latter that'd be a bit strange unless you're using typing_extensions
as a user. Also, can you make a comment above this line just saying why it is here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing blocking, though curious why datetime.date
is no longer a restricted proxy instance
assert ( | ||
type(date(2010, 1, 20)) | ||
== temporalio.worker.workflow_sandbox._restrictions._RestrictedProxy | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me understand why this assertion no longer holds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because we no longer proxy instances of date
, just classes and methods. You agreed with that course of action:
I think we never meant for the instances to be restricted just the static methods
Shall we double-check this decision? Why were we proxing them originally -- just to prevent things like my_datetime.now()
? (which would be unusual Python that I can't think of a use case for, but is technically a sandbox hole that we're opening up.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember now, thanks! So type(type(date))
I guess is a _RestrictedProxy
heh
Fixes #621, #496
temporalio.contrib.pydantic
.v2
, but we do retain the existing code that should allowv1
to still be used.datetime.datetime
class, anddatetime.datetime
instances, with proxy objects, we have to unwrap proxied objects to allow pydantic to work correctly.See also #627, #207
TODO:
v1
usage via sampleAccompanying samples PR: temporalio/samples-python#163