very slow and memory taxing clone of a big repo #147

ghost · 2018-09-22T14:30:59Z

This is on 9front, go 1.11. Cloning small/medium size repos works fine;
cloning big(ger) repos is impossible to see through to completion due
to the time and space limitations of my digital computer:

; dgit clone https://github.com/golang/go
...
Indexing objects: 20% (71561/356126)

; ps | grep dgit
glenda         8739    0:12   0:39  1437232K Pread    dgit

It took half an hour to get to the above state, after which I killed
the process to avoid running out of memory.

For comparison, same network, similar machine, running OpenBSD,
with standard git:

$ time git clone https://github.com/golang/go
...
    1m38.04s real     0m32.31s user     0m09.08s system

and with dgit:

$ time dgit clone https://github.com/golang/go
...
Indexing objects: 23% (81037/356126)
fatal error: runtime: out of memory
... [ stack dump ] ...
    2m48.82s real    0m51.92s user    0m41.69s system

driusan · 2018-09-23T12:25:16Z

How much RAM are you working with? dgit indexing is slow for me, but I've never had it run out of memory.

I started refactoring things yesterday to add support for the git v2 protocol, which should help with bigger repos once it's done, but if you're getting to the indexing stage then you're already past the part where it would have helped.

I think there are two performance issues with the indexing that need to be tackled for this: one is that git's implementation is multithreaded while dgit's isn't, and two is that dgit keeps an in-memory cache of objects that it's found to help resolve deltas. (The latter is probably the main culprit.)

This retrieves objects from a pack file by using an io.ReaderAt of the packfile rather than an in-memory cache of objects that have been found so far. The result is that the amount of memory required to do a fetch or clone is proportional to the number of objects, rather than proportional to the total fully resolved object size of everything in the pack. With this change, I was able to dgit clone https://github.com/golang/go using 311 MB of RAM (according to top) in 12 minutes. (The clone then panicked while trying to reset the index, but I was able to manually do a "dgit reset --hard" and get a fully checked out copy of the Go repo.) Partially resolves #147.

driusan · 2018-10-11T00:18:17Z

The commit message said "partially resolves", not "resolves".

driusan · 2018-10-15T22:49:48Z

I was able to clone it after #160, #161 and #163. (It took about 45 minutes, but it ran to completion.)

driusan · 2020-05-17T12:44:04Z

The speed should be further improved by #267. (It's still more memory intensive than it should be and not at the speed of git/git, but it's getting better.)

driusan mentioned this issue Oct 10, 2018

Use a ReaderAt instead of an in-memory cache to resolve deltas #160

Merged

driusan closed this as completed in #160 Oct 11, 2018

driusan reopened this Oct 11, 2018
