Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds --verify #195

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Adds --verify #195

wants to merge 4 commits into from

Conversation

jspaezp
Copy link

@jspaezp jspaezp commented Feb 18, 2024

Discussed here #153

The idea is to have a cli flag that works as --dry-run BUT returns a status 1 if any change would have been made on the files.

This PR is still a work in progress, right now its compatible with the behavior of the rest of the flags but I think we need to add a test case that verifies its behavior directly (2 files, one that gets changed, one that doesnt and make sure they exit with the correct error) an additional test I can think of is that no file should return status 1 if it has already been stripped (so .... step 1, verify it returns status 1, make sure no changes are made; step 2 run bstripout, step 3 verify that it returns status 0 on the modified file).

if args.mode == 'zeppelin':
nb = json.load(input_stream, object_pairs_hook=collections.OrderedDict)
nb_str_orig = json.dumps(nb, indent=2)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for writing to a JSON formatted string here and below? Could you also just compare the dicts directly? Not necessarily objecting, just trying to understand why you've implemented it this way :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was afraid of instances where the dict is different but the serialization is the same; for instance if for some reason the strip zepellin converts a list to a tuple. Both are equivalent in json but not as python dicts. (It might also prevent some edge cases where copying vs deep cloning might be an issue)

Example

import json

d1 = {"a": 1, "b": [1,]}
d2 = {"a": 1, "b": (1,)}

print(json.dumps(d1) == json.dumps(d2))
# True
print(d2 == d1)
# False
print(json.dumps(d2))
# {"a": 1, "b": [1]}

(I am not 100% sure if my concerns are totally grounded, but felt better).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are indeed correct that both strip_output and strip_zeppelin_output mutate the existing dict, so we'd probably need to create a deep copy for comparison purposes. I'd prefer that over the JSON serialization.

nbstripout/_nbstripout.py Outdated Show resolved Hide resolved
nbstripout/_nbstripout.py Outdated Show resolved Hide resolved
nbstripout/_nbstripout.py Outdated Show resolved Hide resolved
Comment on lines +529 to +530
any_local_change = process_notebook(input_stream, output_stream, args, extra_keys)
any_change = any_change or any_local_change
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
any_local_change = process_notebook(input_stream, output_stream, args, extra_keys)
any_change = any_change or any_local_change
if process_notebook(input_stream, output_stream, args, extra_keys):
any_change = True

Wouldn't this be easier?

with open(NOTEBOOKS_FOLDER / expected_file, mode="r") as f:
expected = f.read()
expected_str = json.dumps(json.loads(expected), indent=2)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question: why do you need to convert to JSON?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside the test my main concern was that line endings might be different between the input and the output files. (If my memory is not failing me...)

Although to be fair this might have been an overstep. since it actually changes the test ... it tests whether the content is equivalent instead of testing that the file is actually the same. (lmk if I should revert it here)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should indeed not be required. When I investigated this I was going crazy cause there was a persistently failing test, until I realized this was due to some extra leading whitespace in one of the test notebook input files, which I fixed in 5077ab4. Can you please rebase your PR on main and then make those changes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the fix, will do.

jspaezp and others added 3 commits March 18, 2024 02:28
Co-authored-by: Florian Rathgeber <[email protected]>
Co-authored-by: Florian Rathgeber <[email protected]>
Co-authored-by: Florian Rathgeber <[email protected]>
if args.mode == 'zeppelin':
nb = json.load(input_stream, object_pairs_hook=collections.OrderedDict)
nb_str_orig = json.dumps(nb, indent=2)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are indeed correct that both strip_output and strip_zeppelin_output mutate the existing dict, so we'd probably need to create a deep copy for comparison purposes. I'd prefer that over the JSON serialization.

with open(NOTEBOOKS_FOLDER / expected_file, mode="r") as f:
expected = f.read()
expected_str = json.dumps(json.loads(expected), indent=2)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should indeed not be required. When I investigated this I was going crazy cause there was a persistently failing test, until I realized this was due to some extra leading whitespace in one of the test notebook input files, which I fixed in 5077ab4. Can you please rebase your PR on main and then make those changes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants