-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Check Python version when deserializing UDFs #19175
Conversation
let pickle = PyModule::import_bound(py, "cloudpickle") | ||
.or_else(|_| PyModule::import_bound(py, "pickle")) | ||
.expect("Unable to import 'pickle'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cloudpickle
simply re-exports pickle.loads
, so trying to import cloudpickle during deserialization is a waste.
1f3bd17
to
1c03027
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #19175 +/- ##
=======================================
Coverage 79.80% 79.81%
=======================================
Files 1532 1532
Lines 208500 208539 +39
Branches 2418 2418
=======================================
+ Hits 166399 166436 +37
- Misses 41554 41556 +2
Partials 547 547 ☔ View full report in Codecov by Sentry. |
Looking... |
Ah, so, Shall I do that as a separate PR? |
@itamarst sounds great, thanks! |
@@ -298,3 +334,17 @@ impl Expr { | |||
} | |||
} | |||
} | |||
|
|||
/// Get the minor Python version from the `sys` module. | |||
fn get_python_minor_version() -> u8 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we cache this behind a lazylock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done!
Unfortunately, now I can no longer patch the sys.version_info
from the Python side, so we can no longer really test this. If you have an idea how to test it, let me know. Otherwise, we'll just have to trust the code 😬
cloudpickle does not support serde across Python versions:
Currently, failing to adhere to this restriction can cause code execution to hang.
So we should only use
cloudpickle
when necessary, and when we do, we should encode the Python version during serialization and check it during deserialization. I chose to only encode the minor version, as we only support Python 3 anyway and the patch version doesn't seem to matter.