Bugfix/1446: Ensure Pydantic Models Can Be Created with `typing.pyspark.DataFrame` or `typing.pyspark_sql.DataFrame` Generic #1447
base: main
Signed-off-by: Brayan Jaramillo <[email protected]>
Signed-off-by: Brayan Jaramillo <[email protected]>
* Disable irrelevant pylint warnings Signed-off-by: Brayan Jaramillo <[email protected]>
Signed-off-by: Brayan Jaramillo <[email protected]>
Thanks for the PR @brayan07! Looks like there are some lint and unit test errors. Be sure to run the tests and set up pre-commit in your dev env to make sure those are passing.
Signed-off-by: Brayan Jaramillo <[email protected]>
Still running into issues with tests unrelated to the new code locally. Will try to resolve before pushing again. Thanks!
I'm getting the same failed tests locally for the |
Hi @brayan07 sorry for the delayed review on this! I believe the test errors are coming from |
In this PR we resolve the issue reported in #1446, where any Pydantic model with a `pandera.typing.pyspark.DataFrame` or `pandera.typing.pyspark_sql.DataFrame` always throws a confusing `ValidationError`.

For clarity, we want to make sure the following leads to the expected behavior:
We do this by creating a `_PydanticIntegrationMixIn` that can be used by both `pandera.typing.pyspark_sql.DataFrame` and `pandera.typing.pyspark.DataFrame`. The content of the mixin is a variation of the methods used in `pandera.typing.pandas.DataFrame`.

Note:
We assume that any pyspark dataframe used in a pydantic model will be validated eagerly for both pyspark.pandas and pyspark_sql. The default behavior for pyspark_sql dataframes is normally lazy validation, but this makes less sense to me when using a Pydantic model.
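
To illustrate the shape of the approach (one mixin shared by two dataframe types, always validating eagerly), here is a minimal, library-free sketch. It is not the code from this PR: `ToySchema`, `ToySchemaError`, `EagerValidationMixin`, `ToySqlFrame` and `ToyPandasLikeFrame` are hypothetical stand-ins, rows are plain dicts rather than Spark dataframes, and the actual wiring into Pydantic's custom-type hooks is left out.

```python
from typing import Any, Dict, List, Type


class ToySchemaError(ValueError):
    """Hypothetical error raised when eager validation finds a problem."""


class ToySchema:
    """Hypothetical stand-in for a schema model: column name -> expected type."""

    columns: Dict[str, type] = {"id": int, "name": str}

    @classmethod
    def check(cls, rows: List[Dict[str, Any]]) -> List[str]:
        """Return a list of human-readable problems (empty if the rows are valid)."""
        problems = []
        for i, row in enumerate(rows):
            for col, typ in cls.columns.items():
                if col not in row:
                    problems.append(f"row {i}: missing column {col!r}")
                elif not isinstance(row[col], typ):
                    problems.append(f"row {i}: {col!r} is not {typ.__name__}")
        return problems


class EagerValidationMixin:
    """Shared hook: validate immediately and raise instead of deferring errors."""

    @classmethod
    def validate_eagerly(cls, rows: List[Dict[str, Any]], schema: Type[ToySchema]):
        problems = schema.check(rows)
        if problems:
            # Eager behavior: fail at model-construction time.
            raise ToySchemaError("; ".join(problems))
        return cls(rows)


class ToySqlFrame(EagerValidationMixin):
    """Stand-in for the pyspark_sql-flavored dataframe type."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows


class ToyPandasLikeFrame(EagerValidationMixin):
    """Stand-in for the pyspark.pandas-flavored dataframe type."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows
```

Both toy frame types inherit the same `validate_eagerly` hook, so a bad input fails immediately for either one, which mirrors the eager-validation assumption described in the note above.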