very slow and memory taxing clone of a big repo #147
Comments
How much RAM are you working with? dgit indexing is slow for me, but I've never had it run out of memory. I started refactoring things yesterday to add support for the git v2 protocol, which should help with bigger repos once it's done, but if you're getting to the indexing stage then you're already past the part where it would have helped. I think there are two performance issues with the indexing that need to be tackled: one is that git's implementation is multithreaded while dgit's isn't, and two is that dgit keeps an in-memory cache of objects that it's found to help resolve deltas. (The latter is probably the main culprit.)
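To illustrate why that in-memory cache is the likely culprit, here is a minimal sketch (the type and function names are illustrative, not dgit's actual API) of delta resolution backed by a cache that retains every fully resolved object for the life of the indexing pass, so memory grows with the total resolved size of the pack:

```go
package main

import "fmt"

// objectCache is a hypothetical stand-in for an indexer's cache that maps
// object ids to fully resolved object bytes.
type objectCache struct {
	resolved map[string][]byte
}

func newObjectCache() *objectCache {
	return &objectCache{resolved: make(map[string][]byte)}
}

// resolveDelta applies a delta against a cached base and caches the result
// too, so the cache only ever grows during indexing.
func (c *objectCache) resolveDelta(baseID, id string, delta []byte) []byte {
	base := c.resolved[baseID]
	obj := applyDelta(base, delta)
	c.resolved[id] = obj // retained until indexing finishes
	return obj
}

// applyDelta is a stand-in; a real implementation would interpret git's
// copy/insert delta opcodes rather than concatenate.
func applyDelta(base, delta []byte) []byte {
	return append(append([]byte{}, base...), delta...)
}

func main() {
	c := newObjectCache()
	c.resolved["a"] = []byte("base object")
	c.resolveDelta("a", "b", []byte("+edit"))
	// The cache now holds both the base and the derived object in full.
	fmt.Println(len(c.resolved))
}
```

Every resolved object stays pinned in the map whether or not it is ever needed as a base again, which is why peak memory tracks the fully resolved size of everything in the pack.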
This retrieves objects from a pack file by using an io.ReaderAt of the packfile rather than an in-memory cache of objects that have been found so far. The result is that the amount of memory required to do a fetch or clone is proportional to the number of objects, rather than proportional to the total fully resolved object size of everything in the pack. With this change, I was able to dgit clone https://github.com/golang/go using 311MB of RAM (according to top) in 12 minutes. (The clone then panicked while trying to reset the index, but I was able to manually do a "dgit reset --hard" and get a fully checked out copy of the Go repo.) Partially resolves #147.
The commit message said "partially resolves", not "resolves".
The speed should be further improved by #267 (it's still more memory-intensive than it should be and not at the speed of git/git, but it's getting better).
This is on 9front, Go 1.11. Cloning small/medium-size repos works fine;
cloning big(ger) repos is impossible to see through to completion due
to the time and space limitations of my digital computer.
It took half an hour to get to the above state, after which I killed
the process to avoid running out of memory.
For comparison, on the same network and a similar machine running OpenBSD,
with standard git:
and with dgit: