Performance issue #33
Comments
I'm not sure how your shell prompt can be affected by nbstripout, but if you want speed, try jq: http://janschulz.github.io/windows-dev-environment.html -> "proper diffs and commits for notebooks" (sorry, my homepage doesn't seem to put anchors in headlines :-( )
Oh, that's because my prompt (zsh->garrett) includes git status information, and to compute that status git runs the nbstripout filter on each notebook.
Huih, I've currently 38 ipynb in a folder with a jq based stripoutput filter ( |
Are you aware that git is apparently able to cache the status somehow? It's only slow for me the first time I enter the folder after I-don't-know-what interval or action. After that it's seamless as well. 66 notebooks here though. ;)
That sounds more like a filesystem cache (the git status manpage doesn't say anything about a cache for status information). Do you have an SSD or an HDD?
SSD.
@michaelaye Do you have
@JanSchulz Thanks for the pointer to
FYI, I uploaded two versions of much faster nbstripout-like tools. They focus on being fast git strip out filters, without much else.
As of this writing They aren't exactly the same as These were written for fastai - we ended up using the pure python version as it was the fastest. Enjoy.
Thanks, @stas00! Feel free to send a PR for the
@kynan, the Python rewrite is very different internally, so a PR won't be possible. It doesn't use any of the nbconvert/nbformat stuff. It's faster than rapidjson or jq. On the other hand, it only does the stripping out (no installation tools, validation, etc.). I first tried to get Given that the strip-out most likely has to be as fast as possible for using git with many notebooks, I'd say the original Also, And yes, feel free to incorporate P.S. If you decide to integrate/adopt the code: the -d option keeps outputs and also preserves a few other metadata fields that we use for documentation notebooks, which differ from code notebooks. Obviously this won't be useful for a general user.
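For readers following along, here is a minimal sketch of what a strip-only filter that avoids nbconvert/nbformat can look like, using just the standard library `json` module. This is not the code referred to in the comment above; the function names `strip_outputs` and `strip_stream` are made up for illustration, and the sketch assumes the nbformat v4 layout, where cells sit in a top-level `cells` list.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs and execution counts of code cells, in place.

    Assumes an nbformat v4-style dict with a top-level "cells" list.
    """
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_stream(src, dst) -> None:
    """Read notebook JSON from src and write the stripped notebook to dst."""
    notebook = json.load(src)
    json.dump(strip_outputs(notebook), dst, indent=1, ensure_ascii=False)
    dst.write("\n")
```

Used as a git clean filter, such a script would call `strip_stream(sys.stdin, sys.stdout)`, since git passes the file content on stdin and reads the cleaned version from stdout. Whether this alone explains the speedups reported in this thread is not something the sketch demonstrates.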
Hi, this is a really excellent project, but like the commenters above I found it had too much overhead on larger repos. I wrote a pure Rust version (with Python bindings so it can be pip installed), located at https://github.com/deshaw/nbstripout-fast (I happened to choose the same name as @stas00). My testing shows it is ~200x faster. I'm really interested to hear your thoughts, @kynan. This is not a true 1:1 match, but all key features should be included and a few more added. Is there a way in which this project could let users choose to use the Rust version? Your install/setup scripts are great and clearly this project is very popular. I think some sort of linkage there would be a net positive for the community.
Do you mean this is not a true 1:1 match?
In theory yes, however doesn't it feel a little odd to have one tool install another? Can you give more detail on what exactly you're thinking of? How do you see
Yes - sorry!
Opened #179
Every time I go into a repo with lots of notebooks, it takes several seconds before I get my prompt back, which can be annoying...
I'm wondering if the official PreProcessor of the notebook tools is faster than your manual filtering?