This repository has been archived by the owner on Oct 8, 2024. It is now read-only.

Performance problem when many ignored files are present in the repo #61

Open
prusse-martin opened this issue Jan 29, 2021 · 4 comments

Comments

@prusse-martin
Member

Related to #58

Listing all ignored files only to filter them out later can have a big performance impact when an ignored folder contains many files.
Maybe we should call git ls-tree to obtain the list of tracked files instead:

W:\alfasim\Projects\alfasim\alfasim_gui (fb-ASIM-3738-transient-input -> origin/current)
(_alfasim_gui-win64-py36) λ git ls-tree -r HEAD --name-only | head -n 15
.coveragerc
.project
.pydevproject
.vscode/launch.json
.vscode/settings.json
.vscode/tasks.json
alfasim_gui.spec.yml
dist/all/.gitignore
docs/.gitignore
docs/ALFAsim_Technical_Manual___EN_US.pdf
docs/conf.py
docs/gui.rst
docs/images/advanced.png
docs/images/advanced_options.png
docs/images/advanced_options_model_explorer.png
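A minimal Python sketch of that call, assuming git is on PATH. The helper names tracked_files and parse_nul_separated are hypothetical and not part of ff. It adds -z so that git returns NUL-terminated, unquoted paths, which avoids having to unquote names with special characters.

```python
import subprocess
from typing import List


def parse_nul_separated(output: str) -> List[str]:
    """Split NUL-terminated output (as produced with -z) into paths."""
    # With -z, git ends each path with NUL and does not quote special characters;
    # the empty string after the final NUL is dropped.
    return [path for path in output.split("\0") if path]


def tracked_files(repo_dir: str) -> List[str]:
    """Return the paths of all files committed in HEAD (hypothetical helper)."""
    result = subprocess.run(
        ["git", "ls-tree", "-r", "HEAD", "--name-only", "-z"],
        cwd=repo_dir,
        check=True,  # raise CalledProcessError if git fails (e.g. not a repo)
        stdout=subprocess.PIPE,
    )
    # Decode as UTF-8; surrogateescape keeps undecodable bytes round-trippable.
    return parse_nul_separated(result.stdout.decode("utf-8", errors="surrogateescape"))
```

subprocess.run with stdout=subprocess.PIPE is used instead of capture_output/text so the sketch also runs on Python 3.6, as in the environment shown above.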

On my local machine:

W:\alfasim\Projects\alfasim\alfasim_gui (fb-ASIM-3738-transient-input -> origin/current)
(_alfasim_gui-win64-py36) λ git status --ignored --untracked-files=all  --porcelain=2 | wc -l
41613

W:\alfasim\Projects\alfasim\alfasim_gui (fb-ASIM-3738-transient-input -> origin/current)
(_alfasim_gui-win64-py36) λ git ls-tree -r HEAD --name-only | wc -l
858
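To check how many of those 41613 lines come from ignored entries, the porcelain v2 output can be tallied by the first character of each line: ? for untracked, ! for ignored, and 1, 2 or u for changed, renamed/copied or unmerged tracked files. A sketch of such a hypothetical helper, assuming the output was captured without -z (one entry per line, as in the wc -l run above):

```python
from collections import Counter
from typing import Dict

# Entry kinds in `git status --porcelain=2` output, keyed by a line's first character.
KINDS = {
    "1": "changed",
    "2": "renamed_or_copied",
    "u": "unmerged",
    "?": "untracked",
    "!": "ignored",
    "#": "header",  # only emitted with options such as --branch
}


def count_status_entries(output: str) -> Dict[str, int]:
    """Count porcelain v2 entries by kind."""
    counts = Counter()
    for line in output.splitlines():
        if line:
            counts[KINDS.get(line[0], "unknown")] += 1
    return dict(counts)
```

Feeding it the output of git status --ignored --untracked-files=all --porcelain=2 would show whether the ! entries dominate.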

Having git list all untracked files (41k) to later filter the files from the repo was a bad idea (my bad).

Would it be a good approach to call git ls-tree -r HEAD --name-only to get the names of the tracked files, plus parse the output of git status --untracked-files=all to get the untracked files, letting git itself filter out the ignored files?
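A sketch of the second half of that proposal, assuming the status output is captured with --porcelain=2 -z --untracked-files=all (and without --ignored, so git never emits ignored entries). The function names are hypothetical; tracked is whatever list git ls-tree returned.

```python
from typing import Iterable, List


def untracked_from_status_z(output: str) -> List[str]:
    """Extract untracked paths from `git status --porcelain=2 -z --untracked-files=all`."""
    fields = output.split("\0")
    untracked = []
    i = 0
    while i < len(fields):
        entry = fields[i]
        if entry.startswith("? "):
            untracked.append(entry[2:])
        elif entry.startswith("2 "):
            # With -z, a rename/copy entry is followed by its original path
            # as a separate field; skip it so it is not misread as an entry.
            i += 1
        i += 1
    return untracked


def files_to_check(tracked: Iterable[str], status_z_output: str) -> List[str]:
    """(tracked files) + (untracked but not ignored), deduplicated and sorted."""
    return sorted(set(tracked) | set(untracked_from_status_z(status_z_output)))
```

One caveat of using ls-tree HEAD: it only lists committed files, so a file that is staged but not yet committed shows up in status as a 1 A. entry rather than ? and would be missed; listing the index with git ls-files instead would cover that case.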

@prusse-martin prusse-martin changed the title Performace problem when many ignored files are present in thje repo Performace problem when many ignored files are present in the repo Jan 29, 2021
@nicoddemus
Member

> Having git list all untracked files (41k) to later filter the files from the repo was a bad idea (my bad).

Can you post some timings as well? I wouldn't think listing 41k files would take too long (we're talking minutes if I recall our discussion in RC).

> Would it be a good approach to call git ls-tree -r HEAD --name-only to get the names of the tracked files, plus parse the output of git status --untracked-files=all to get the untracked files, letting git itself filter out the ignored files?

Not sure; why would that be faster? Currently we make a single git call (#59). Do you think it is the parsing of that output that is causing the slowdown?
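One way to collect those timings is to run each candidate command through the same helper. This is only a sketch; time_command and COMMANDS are hypothetical names, and it is worth running each command more than once, since filesystem caching can make the first run slower.

```python
import subprocess
import time
from typing import List, Optional

# Commands discussed in this issue, meant to be run from the repository root.
COMMANDS = [
    ["git", "status", "--ignored", "--untracked-files=all", "--porcelain=2"],
    ["git", "status", "--untracked-files=all", "--porcelain=2"],
    ["git", "ls-tree", "-r", "HEAD", "--name-only"],
]


def time_command(args: List[str], cwd: Optional[str] = None) -> float:
    """Run a command, discard its output, and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(args, cwd=cwd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start
```

For example, for cmd in COMMANDS: print(round(time_command(cmd), 2), " ".join(cmd)). Because the output is discarded, this measures git alone, which helps separate git's own cost from the cost of parsing its output.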

@prusse-martin
Member Author

@ggrbill was seeing a 20-minute delay when executing that single call (git status --ignored --untracked-files=all --porcelain=2); his ignored "tmp" folder was over 2GB and contained well over 4 000 000 000 files.
Asking git to list only the files that are not ignored should let it handle the "ignored" files much better, since it no longer has to report every file inside the ignored folders.

@nicoddemus
Member

ahh ok, got it, thanks.

So the proposal is to execute ff only on tracked and untracked files, instead of on all files except the ignored ones?

@prusse-martin
Member Author

prusse-martin commented Feb 1, 2021

(tracked files) + (untracked but not ignored)
