Stop the _adapt_batch() from changing the batch in-place #306

Status: Open (4 commits, base branch: main)

Conversation

adityasuthar

Pull Request

Description

The _adapt_batch() method currently modifies the batch in place, so any caller that reuses the batch afterwards sees the modified data. This PR updates the method so that it returns a new batch and leaves the original unchanged.

Fixes #210

How Has This Been Tested?

Tested by running python -m pytest tests.

  • Yes

If your changes affect data processing, have you plotted any changes? i.e. have you done a quick sanity check?

  • Yes

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@dfulu
Member

dfulu commented Feb 12, 2025

Hi @adityasuthar, thanks for doing this. I don't think this fully covers the issue. The batch object is a nested dictionary (max depth of 2), so the line you added to copy it only creates a shallow copy, and the deeper parts of the dict can still be changed.

The problem is probably best shown with an example:

# A sample dict
batch_dict = dict(
    a = [1,2,3],
    b = dict(
        b1 = [3,4,5],
        b2 = [6,7,8],
    )
)

# The line to make a new copy of the dict
new_batch_dict = {key: value.copy() for key, value in batch_dict.items()}

# We modify the new dict
new_batch_dict["a"][0] = 0
new_batch_dict["b"]["b1"][0] = 0

# But the nested part of the old dict is still changed
print(batch_dict)
# prints: {'a': [1, 2, 3], 'b': {'b1': [0, 4, 5], 'b2': [6, 7, 8]}}
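
The sharing described above can also be checked directly with identity comparisons. This is a minimal sketch using the same sample dict; the variable names (shallow, outer_is_shared, inner_is_shared) are illustrative and not taken from the PR.

```python
# Same sample structure as above: a dict nested to depth 2
batch_dict = {
    "a": [1, 2, 3],
    "b": {"b1": [3, 4, 5], "b2": [6, 7, 8]},
}

# Per-value shallow copy, as in the line under review
shallow = {key: value.copy() for key, value in batch_dict.items()}

# The first-level containers are new objects, so this is False
outer_is_shared = shallow["b"] is batch_dict["b"]

# The lists inside the nested dict are the very same objects, so this is True
inner_is_shared = shallow["b"]["b1"] is batch_dict["b"]["b1"]

print(outer_is_shared, inner_is_shared)
```

Because the inner lists are shared, any in-place edit made through the copy is visible through the original as well.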

@dfulu dfulu self-requested a review February 12, 2025 17:11
@adityasuthar
Author

Hi @dfulu,

Thanks for reviewing the code and pointing out the issue with the shallow copy. To copy both the first-level and the nested structures in the batch, I have switched to copy.deepcopy:

new_batch_dict = {key: copy.deepcopy(value) for key, value in batch_dict.items()}

This ensures all nested elements are fully copied, preventing unwanted modifications.

Example with Fix Applied:

import copy

batch_dict = {
    "a": [1, 2, 3],
    "b": {
        "b1": [3, 4, 5], 
        "b2": [6, 7, 8],
    }
}

new_batch_dict = {key: copy.deepcopy(value) for key, value in batch_dict.items()}

# Modify new dict
new_batch_dict["a"][0] = 0
new_batch_dict["b"]["b1"][0] = 0

print(batch_dict)
# prints: {'a': [1, 2, 3], 'b': {'b1': [3, 4, 5], 'b2': [6, 7, 8]}}  (original batch remains unchanged)

print(new_batch_dict)
# prints: {'a': [0, 2, 3], 'b': {'b1': [0, 4, 5], 'b2': [6, 7, 8]}}  (only the new batch is modified)
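
One way to guard this behaviour is a check that snapshots the input, calls the method, and compares afterwards. The sketch below is only an illustration: adapt_batch and its n_keep parameter are hypothetical stand-ins, not the real _adapt_batch() from this PR, and plain lists are used in place of whatever array types the real batch holds.

```python
import copy


def adapt_batch(batch: dict, n_keep: int) -> dict:
    """Hypothetical stand-in: return an adapted copy and leave the input alone."""
    # Deep-copy first so the in-place edits below never reach the caller's batch
    new_batch = copy.deepcopy(batch)
    # Truncate in place on the copy (top level and nested level)
    del new_batch["a"][n_keep:]
    del new_batch["b"]["b1"][n_keep:]
    return new_batch


batch = {"a": [1, 2, 3], "b": {"b1": [3, 4, 5], "b2": [6, 7, 8]}}

# Snapshot the input so it can be compared after the call
snapshot = copy.deepcopy(batch)
adapted = adapt_batch(batch, n_keep=2)

# The input must be identical to its snapshot at every depth
assert batch == snapshot
print(adapted)
```

If the batch values are arrays or tensors rather than lists, the equality check would need an element-wise comparison suited to that type.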
