
feat: Check Python version when deserializing UDFs #19175

Merged: @stinodego merged 8 commits into main from cloudpickle-protocol on Oct 10, 2024.


@stinodego (Contributor) commented Oct 10, 2024

cloudpickle does not support serialization/deserialization (serde) across Python versions. From its documentation:

> Cloudpickle can only be used to send objects between the exact same version of Python.

Currently, failing to adhere to this restriction can cause code execution to hang.

So we should only use cloudpickle when necessary, and when we do, we should encode the Python version during serialization and check it during deserialization. I chose to encode only the minor version, since we only support Python 3 anyway and the patch version doesn't seem to matter.
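A minimal sketch of the idea, assuming the serialized UDF is a byte buffer prefixed with a one-byte Python minor-version header. The names `serialize_with_version`, `deserialize_with_version`, and `VersionError` are hypothetical, not the actual Polars code, and the pickled payload is treated as opaque bytes.

```rust
use std::fmt;

/// Hypothetical error for a payload that cannot be safely unpickled here.
#[derive(Debug, PartialEq)]
pub enum VersionError {
    /// The buffer is too short to contain the version header.
    MissingHeader,
    /// The payload was produced by a different Python minor version.
    Mismatch { serialized: u8, current: u8 },
}

impl fmt::Display for VersionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VersionError::MissingHeader => {
                write!(f, "serialized UDF is missing the version header")
            }
            VersionError::Mismatch { serialized, current } => write!(
                f,
                "UDF was serialized with Python 3.{serialized}, \
                 but the current version is Python 3.{current}"
            ),
        }
    }
}

/// Prefix the (opaque) pickled bytes with the Python minor version.
pub fn serialize_with_version(minor: u8, pickled: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(pickled.len() + 1);
    out.push(minor);
    out.extend_from_slice(pickled);
    out
}

/// Validate the header before the payload would be handed to `pickle.loads`,
/// so a version mismatch fails fast with an error instead of hanging.
pub fn deserialize_with_version(current_minor: u8, buf: &[u8]) -> Result<&[u8], VersionError> {
    let (&serialized, payload) = buf.split_first().ok_or(VersionError::MissingHeader)?;
    if serialized != current_minor {
        return Err(VersionError::Mismatch { serialized, current: current_minor });
    }
    Ok(payload)
}

fn main() {
    let buf = serialize_with_version(12, b"pickled-bytes");
    // Same minor version: the original payload comes back unchanged.
    assert_eq!(deserialize_with_version(12, &buf), Ok(&b"pickled-bytes"[..]));
    // Different minor version: report an error instead of unpickling.
    if let Err(e) = deserialize_with_version(11, &buf) {
        println!("{e}");
    }
}
```

A single header byte is enough here because, as described above, only the minor version is encoded and only Python 3 is supported.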

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Oct 10, 2024
Review comment by @stinodego (Contributor Author) on removed lines -89 to -91:

```rust
let pickle = PyModule::import_bound(py, "cloudpickle")
    .or_else(|_| PyModule::import_bound(py, "pickle"))
    .expect("Unable to import 'pickle'")
```

cloudpickle simply re-exports `pickle.loads`, so importing cloudpickle just to deserialize is wasted work; the standard `pickle` module does the same job.

@stinodego stinodego marked this pull request as ready for review October 10, 2024 09:11
@stinodego stinodego force-pushed the cloudpickle-protocol branch from 1f3bd17 to 1c03027
@stinodego (Contributor Author) commented Oct 10, 2024

The `test_fork_safety` test seems to be failing on this branch, though the failure looks unrelated to my changes.

@itamarst Could you take a look at what might be going on? I think it was introduced in your recent PR (#19149).


codecov bot commented Oct 10, 2024

Codecov Report

Attention: Patch coverage is 98.36066% with 1 line in your changes missing coverage. Please review.

Project coverage is 79.81%. Comparing base (48bc09b) to head (6dd7865).
Report is 32 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/polars-plan/src/dsl/python_udf.rs | 98.36% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@           Coverage Diff           @@
##             main   #19175   +/-   ##
=======================================
  Coverage   79.80%   79.81%
=======================================
  Files        1532     1532
  Lines      208500   208539   +39
  Branches     2418     2418
=======================================
+ Hits       166399   166436   +37
- Misses      41554    41556    +2
  Partials      547      547
```
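The report's numbers are internally consistent. Assuming Codecov's definition of coverage as hits / (hits + misses + partials), they work out as follows; note that the 61-line patch size is inferred from the 98.36066% figure with one missing line, not stated in the report.

$$
\begin{aligned}
\text{project coverage (head)} &= \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}} = \frac{166436}{166436 + 41556 + 547} = \frac{166436}{208539} \approx 79.81\% \\
\text{patch coverage} &= \frac{61 - 1}{61} = \frac{60}{61} \approx 98.36066\% \\
\Delta\text{lines} &= 208539 - 208500 = 39 = \underbrace{37}_{\Delta\text{hits}} + \underbrace{2}_{\Delta\text{misses}}
\end{aligned}
$$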


@stinodego stinodego marked this pull request as draft October 10, 2024 10:56
@itamarst (Contributor) commented:

Looking...

@itamarst (Contributor) commented Oct 10, 2024

Ah, so: jax has the same limitation as Polars in that it doesn't like `fork()`, so it installs a `fork()` handler that emits a warning. A test that uses `fork()` then triggers that warning if jax happened to be used (or maybe just imported) first. So the solution might be to suppress warnings inside that test; e.g. adding `@pytest.mark.filterwarnings("ignore")` to that particular test would probably work.

Shall I do that as a separate PR?

@stinodego (Contributor Author) replied:

> Shall I do that as a separate PR?

@itamarst Sounds great, thanks!

@stinodego stinodego marked this pull request as ready for review October 10, 2024 13:03
Review thread on hunk `@@ -298,3 +334,17 @@ impl Expr {`:

```rust
        }
    }
}

/// Get the minor Python version from the `sys` module.
fn get_python_minor_version() -> u8 {
```
Review comment (Member):

Can we cache this behind a `LazyLock`?

Reply from @stinodego (Contributor Author):

Yes, done!

Unfortunately, since the value is now cached, I can no longer patch `sys.version_info` from the Python side, so we can't really test this anymore. If you have an idea for how to test it, let me know; otherwise, we'll just have to trust the code 😬
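A minimal sketch of the caching pattern requested in review. It uses `std::sync::OnceLock::get_or_init` so it also builds on toolchains older than Rust 1.80; `std::sync::LazyLock` (stable since 1.80) expresses the same thing. `query_python_minor_version` and `SIMULATED_SYS_MINOR` are hypothetical stand-ins for the `sys` lookup, backed by an atomic so the example runs without Python. The sketch also shows why patching `sys.version_info` after the first call has no effect: the cached value is computed once and never re-read.

```rust
use std::sync::atomic::{AtomicU8, AtomicUsize, Ordering};
use std::sync::OnceLock;

// Hypothetical stand-in for `sys.version_info.minor`; real code would
// read this from the Python interpreter.
static SIMULATED_SYS_MINOR: AtomicU8 = AtomicU8::new(12);
// Counts how often the underlying lookup actually runs.
static LOOKUPS: AtomicUsize = AtomicUsize::new(0);

fn query_python_minor_version() -> u8 {
    LOOKUPS.fetch_add(1, Ordering::SeqCst);
    SIMULATED_SYS_MINOR.load(Ordering::SeqCst)
}

/// Cached accessor: the lookup runs at most once per process.
fn get_python_minor_version() -> u8 {
    static MINOR: OnceLock<u8> = OnceLock::new();
    *MINOR.get_or_init(query_python_minor_version)
}

fn main() {
    assert_eq!(get_python_minor_version(), 12);
    // "Patching" the source after the first call is not observed,
    // because the cached value is returned from now on.
    SIMULATED_SYS_MINOR.store(99, Ordering::SeqCst);
    assert_eq!(get_python_minor_version(), 12);
    assert_eq!(LOOKUPS.load(Ordering::SeqCst), 1);
}
```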

@stinodego stinodego merged commit 9162f67 into main Oct 10, 2024
24 of 25 checks passed
@stinodego stinodego deleted the cloudpickle-protocol branch October 10, 2024 14:26
@c-peters c-peters added the accepted Ready for implementation label Oct 14, 2024