Parallelize git-fat #34
Comments
To be frank, I don't think parallelizing git-fat would really help. The major commands are all I/O-bound: push and pull are network-bound, the filters are called by git, and anything that moves files around is likely disk-bound. I'd be delighted to be proven wrong, though, so if you'd like to investigate and submit a pull request with some numbers showing improvements, I'd welcome it.
With data that goes over ssh (e.g. rsync over ssh), it might be worth thinking about. Ssh itself is not throughput-oriented (and not multi-threaded, AFAIK), so fast networks using it synchronously generally don't get anywhere near wire speed. Running parallel transfers over it does get to about wire speed (as lftp and others do). Parallelizing at least that part of git-fat might be a significant win for large repos with rsync-over-ssh remote stores.
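For anyone who wants to try the parallel-transfer idea and collect numbers, here is a minimal sketch, not git-fat's actual transfer code. It deals the object names into a few batches and runs one rsync-over-ssh process per batch. The function names (`split_batches`, `rsync_command`, `fetch_batch`, `parallel_fetch`), the `jobs` default, and the remote and local paths are all hypothetical; the only rsync options used are `-a`, `-e ssh`, and `--files-from`. It assumes a POSIX system, since the temporary list file is read by rsync while still open.

```python
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor


def split_batches(items, n):
    """Deal items round-robin into at most n non-empty batches."""
    batches = [items[i::n] for i in range(n)]
    return [b for b in batches if b]


def rsync_command(remote, local_dir, list_path):
    """Build one rsync-over-ssh call that pulls the files named in list_path."""
    return ["rsync", "-a", "-e", "ssh",
            "--files-from=" + list_path, remote, local_dir]


def fetch_batch(remote, local_dir, names, run=subprocess.call):
    """Write one batch to a temporary list file and run rsync on it."""
    # Assumption: POSIX, where rsync can read the file while we hold it open.
    with tempfile.NamedTemporaryFile("w", suffix=".lst") as listing:
        listing.write("\n".join(names) + "\n")
        listing.flush()
        return run(rsync_command(remote, local_dir, listing.name))


def parallel_fetch(remote, local_dir, names, jobs=4, run=subprocess.call):
    """Run up to `jobs` rsync processes at once; return their exit codes."""
    batches = split_batches(list(names), jobs)
    if not batches:
        return []
    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        futures = [pool.submit(fetch_batch, remote, local_dir, batch, run)
                   for batch in batches]
        return [future.result() for future in futures]
```

Threads are enough here because each worker only waits on an external rsync process. Passing a different `run` callable makes it easy to dry-run the batching before timing it against a single rsync call on a real remote.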
So I've actually been looking at that very function (checkout()). I wanted to assess this by simply backgrounding all the git checkout-index calls and seeing how it performed, but the problem I ran into was the following line: Also see my issue here for more of my performance notes about checkout():
Hmm, Windows performance will be one tough nut to crack, it seems. Just a heads-up, though: if we do end up doing something about it, I'd like to keep all dependencies optional and configurable. The thing that attracted me to git-fat in the first place was that it was only one file and used rsync.
It will be tough. The more I look at it, the more I think it cannot be truly addressed without changes to git itself. Until that happens, I'm not convinced the smudge/clean filter approach is very good for performance in general. git-annex has a long post on this:
Hi all,
Has anyone looked into parallelizing some of the git-fat commands such as the one here? I was thinking about trying this myself but would first like to see if anyone here had any thoughts on the matter.
Thanks