Support git-filter-repo #193

Closed
LunarLanding opened this issue Feb 16, 2024 · 2 comments · Fixed by #194

@LunarLanding
Contributor

LunarLanding commented Feb 16, 2024

https://github.com/newren/git-filter-repo is pointed to by git-filter-branch via a scary warning:

WARNING: git-filter-branch has a glut of gotchas generating mangled history
	 rewrites.  Hit Ctrl-C before proceeding to abort, then use an
	 alternative filtering tool such as 'git filter-repo'
	 (https://github.com/newren/git-filter-repo/) instead.  See the
	 filter-branch manual page for more details; to squelch this warning,
	 set FILTER_BRANCH_SQUELCH_WARNING=1.

Apparently git-filter-repo is a Python tool that can run Python code directly on the contents of each blob, like so (found here, edited so it works for my case):

git filter-repo --path-glob '**/*.ipynb' --blob-callback '
import json
try:
    notebook = json.loads(blob.data)
    cleaned=False
    if (type(notebook) is dict) and ("cells" in notebook) and type(notebook["cells"]) is list:
        for cell in notebook["cells"]:
            if type(cell) is dict and "outputs" in cell and cell["outputs"]:
                cell["outputs"] = []
                cleaned=True
        if cleaned:
            print("cleaned")
            blob.data = (json.dumps(notebook, ensure_ascii=False, indent=1,
                                sort_keys=True) + "\n").encode("utf-8")
except json.JSONDecodeError as ex:
    pass
except UnicodeDecodeError as ex:
    pass
'

It would be nice to have something like this but doing the rewriting with nbstripout.
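
To try the callback's logic outside of git filter-repo, here is a minimal sketch of the same output-clearing step as a standalone function. The name `strip_outputs` is hypothetical, and the sketch uses only the standard library; it does not call nbstripout, so it only clears `outputs` as the callback above does.

```python
import json


def strip_outputs(data: bytes) -> bytes:
    """Return notebook bytes with cell outputs cleared.

    Hypothetical helper mirroring the blob callback above. Anything that is
    not a JSON notebook, or has no outputs to clear, is returned unchanged.
    """
    try:
        notebook = json.loads(data)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return data  # not JSON or not decodable: leave the blob untouched

    if not isinstance(notebook, dict) or not isinstance(notebook.get("cells"), list):
        return data  # valid JSON, but not shaped like a notebook

    cleaned = False
    for cell in notebook["cells"]:
        if isinstance(cell, dict) and cell.get("outputs"):
            cell["outputs"] = []
            cleaned = True

    if not cleaned:
        return data  # avoid rewriting blobs that need no change

    # Same serialization settings as the callback above.
    text = json.dumps(notebook, ensure_ascii=False, indent=1, sort_keys=True)
    return (text + "\n").encode("utf-8")
```

Assuming the helper is defined inside the callback string, the body would then reduce to `blob.data = strip_outputs(blob.data)`.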

kynan added the type:enhancement, help wanted, type:documentation, and state:needs follow up labels Feb 17, 2024
@kynan
Owner

kynan commented Feb 17, 2024

That is a great suggestion and one that has also crossed my mind before, in particular since we currently still mention git filter-branch in the README.

@LunarLanding Interested in sending a PR to document your approach in the README, potentially replacing the current recipe for git filter-branch?

@LunarLanding
Contributor Author

@kynan Just did. Because I wanted to keep everything in the same interpreter and minimize disk reads and writes for performance, it is slightly involved, but I think it is still useful information.

kynan linked a pull request that will close this issue Mar 17, 2024
kynan modified the milestones: Backlog, 0.8.0 Mar 17, 2024
kynan added the state:pr pending label and removed the help wanted and state:needs follow up labels Mar 17, 2024
kynan added a commit that referenced this issue Mar 23, 2024
Update README.md with git-filter-repo example

Fixes #193.

---------

Co-authored-by: Florian Rathgeber <[email protected]>
kynan added the resolution:fixed label and removed the state:pr pending label Mar 23, 2024