very slow and memory taxing clone of a big repo #147

Open
ghost opened this issue Sep 22, 2018 · 4 comments · Fixed by #160

ghost commented Sep 22, 2018

This is on 9front with Go 1.11. Cloning small and medium-sized repos works fine,
but cloning bigger repos never runs to completion because of the time and
memory limits of my machine:

; dgit clone https://github.com/golang/go
...
Indexing objects: 20% (71561/356126)
; ps | grep dgit
glenda         8739    0:12   0:39  1437232K Pread    dgit

It took half an hour to get to the above state, after which I killed
the process to avoid running out of memory.

For comparison, same network, similar machine, running OpenBSD,
with standard git:

$ time git clone https://github.com/golang/go
...
    1m38.04s real     0m32.31s user     0m09.08s system

and with dgit:

$ time dgit clone https://github.com/golang/go
...
Indexing objects: 23% (81037/356126)
fatal error: runtime: out of memory
... [ stack dump ] ...
    2m48.82s real    0m51.92s user    0m41.69s system

driusan commented Sep 23, 2018

How much RAM are you working with? dgit indexing is slow for me, but I've never had it run out of memory.

I started refactoring things yesterday to add support for the git v2 protocol, which should help with bigger repos once it's done, but if you're getting to the indexing stage then you're already past the part where it would have helped.

I think there are two performance issues with the indexing that need to be tackled for this: one is that git's implementation is multithreaded while dgit's isn't, and the other is that dgit keeps an in-memory cache of the objects it has found so far to help resolve deltas. (The latter is probably the main culprit.)

driusan added a commit that referenced this issue Oct 10, 2018
This retrieves objects from a pack file by using an io.ReaderAt
of the packfile rather than an in-memory cache of objects that have
been found so far. The result is that the amount of memory required
to do a fetch or clone is proportional to the number of objects,
rather than proportional to the total fully resolved object size of
everything in the pack.

With this change, I was able to dgit clone https://github.com/golang/go
using 311 MB of RAM (according to top) in 12 minutes. (The clone then
panicked while trying to reset the index, but I was able to manually do
a "dgit reset --hard" and get a fully checked out copy of the Go repo.)

Partially resolves #147.
driusan added a commit that referenced this issue Oct 11, 2018

driusan commented Oct 11, 2018

The commit message said "partially resolves", not "resolves".

driusan reopened this Oct 11, 2018

driusan commented Oct 15, 2018

I was able to clone it after #160, #161 and #163. (It took about 45 minutes, but it ran to completion.)


driusan commented May 17, 2020

The speed should be further improved by #267. (It's still more memory-intensive than it should be and not at the speed of git/git, but it's getting better.)
