Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify our unpickling logic and add tests for Python 2 pickles #2160

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

poodlewars
Copy link
Collaborator

@poodlewars poodlewars commented Jan 31, 2025

This is fixing the issue that Alex pointed out here #2156 (comment) .

We do not need the str case - the stuff we are decoding is always bytes - otherwise we would not be in the pickling fallback. Nor do we need to specify a bytes encoding when decoding Python 2 objects. The "latin-1" fallback case is handled by pd.read_pickle so it's redundant for us to have it too.

The tests I added also pass without my changes to _normalization.py and I think suffice to show there is no behaviour change here.

TODO add tests with a non-standard encoding?

Old Logic

Old logic for pickles inside the msgpack was:

  • If pickled in python 2, unpack msgpack with raw=True, pickle.loads with encoding="ascii".
  • If pickled in python 3, unpack msgpack with raw=False, pickle.loads with encoding="bytes".

Where raw means:

        """    :param bool raw:
                If true, unpack msgpack raw to Python bytes.
                Otherwise, unpack to Python str by decoding with UTF-8 encoding (default).

        """

Base automatically changed from metadata-pickling-docs to master January 31, 2025 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant