-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Enabled path normalization when enable_best_effort_determinism set to… #36617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
… True for cloudpickle
Summary of ChangesHello @praneetnadella, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a crucial enhancement to the Apache Beam Python SDK by enabling path normalization for code objects during Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
R: @claudevdm |
|
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
| """A test module for source file path normalization.""" | ||
|
|
||
|
|
||
| def func_for_path_test(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you define this in cloudpickle_pickler_test instead of creating a new file?
| enable_stable_code_identifier_pickling=False, | ||
| config: cloudpickle.CloudPickleConfig = DEFAULT_CONFIG) -> bytes: | ||
|
|
||
| if enable_stable_code_identifier_pickling: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused about the difference between these options.
Is enable_best_effort_determinism meant to include the stable identifier change too? Or is that not part of best effort determinisn?
| _LOGGER.warning( | ||
| 'Ignoring unsupported option: enable_best_effort_determinism. ' | ||
| 'This has only been implemented for dill.') | ||
| config_kwargs['filepath_interceptor'] = cloudpickle.get_relative_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rather explicitly define a BEST_EFFORT_DETERMINISM config at the top of this file and use that? Its easier to see exact settings BEST_EFFORT_DETERMINISM requires and no copying configs required
… True for cloudpickle
This change fixes a source of non-deterministic serialization in the Beam Python SDK by ensuring that file paths associated with code objects (
co_filename) are normalized before being pickled bycloudpickle.This PR implements path normalization by tying it to the existing
enable_best_effort_determinismflag incloudpickle_pickler.py. When this flag is enabled, afilepath_interceptoris used to replace the absolute path with a deterministic, relative path.Tested:
test_code_object_path_is_normalized, a new unit test that specifically verifies that a function'sco_filenameis correctly normalized when theenable_best_effort_determinismflag is used.test_best_effort_determinism_not_implemented(renamed totest_best_effort_determinism_is_partially_supported) to check for the new, more informative log message that is emitted when the flag is enabled.Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.