Reduce memory consumption of graph (pipe sets) #4498

stefanhaller · 2025-04-20T16:06:09Z

PR Description

As a followup to #2533, reduce the memory consumption some more by optimizing the storage of the pipe sets used for the commit graph.

Some coarse measurements (taken by opening the respective repo, typing 4 >, and waiting for all commits to be loaded, and then reading the Memory column in Activity Monitor):

	master	this PR
git	2.5 GB	1.0 GB
our work repo	2.3 GB	1.3 GB
linux kernel	94.8 GB	38.0 GB

It's still not really usable for the linux kernel, but for all other repos that I come across in my daily use of lazygit, it works quite well now.

codacy-production · 2025-04-20T16:08:12Z

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation	Diff coverage
Report missing for `aa68885`¹	✅ 93.42%

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`aa68885`)	Report Missing	Report Missing	Report Missing
Head commit (`38ca388`)	55874	48469	86.75%

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#4498)	228	213	93.42%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings Change summary preferences

Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct. ↩

jesseduffield

This is some substantial memory saving which is great. I'll need to test this out a bit because I'm worried about nil pointer exceptions.

It's a shame this change requires so many sweeping changes but I can't think of a way to not require sweeping changes of some form. I did some investigating into alternative approaches e.g.:

using [20]byte instead of string (much better than string, but doesn't address duplicates)
Having an immutable commit struct which itself lives in a pool. This would represent the actual commit and nothing independent of it like tags. Then each commit in the state can have a pointer to the corresponding immutable commit. You could have the immutable commit have its parents as just pointers to other immutable commits.
Storing a reference to the pool inside the commit and using .Hash() to encapsulate how to resolve the actual string. Unfortunately means bloating each commit with an 8 byte pointer.

Did you have any thoughts on the above?

stefanhaller · 2025-04-23T09:59:19Z

It's a shame this change requires so many sweeping changes but I can't think of a way to not require sweeping changes of some form.

I did consider making the Hash property of Commit private (lowercase) and exposing a getter. When doing this in an extra commit, the actual change of turning it into a pointer would require much fewer changes. I didn't do that just because currently all fields are public, but I'm totally open to changing this either just for this field, or for all.

Having an immutable commit struct which itself lives in a pool.

I don't see how this addresses the problem of the duplicate hash strings; unless you change the Pipes to have pointers to commits instead of pointers to hashes, is that what you meant? But then we need a map of commits by hash, so that we can look up the commit if we only have the hash. I'm not sure I like this direction.

Storing a reference to the pool inside the commit and using .Hash() to encapsulate how to resolve the actual string. Unfortunately means bloating each commit with an 8 byte pointer.

I'm not sure what you mean by "resolve the actual string", and why you need a reference to the pool for that. The .Hash() getter would just return the pointer; that's basically what I said above.

jesseduffield · 2025-04-23T10:18:18Z

I did consider making the Hash property of Commit private (lowercase) and exposing a getter. When doing this in an extra commit, the actual change of turning it into a pointer would require much fewer changes. I didn't do that just because currently all fields are public, but I'm totally open to changing this either just for this field, or for all.

I'm not sure what you mean by "resolve the actual string", and why you need a reference to the pool for that. The .Hash() getter would just return the pointer; that's basically what I said above.

I meant that the .Hash() method could just return the string directly so that you didn't need to de-reference it afterwards. But that would require holding a reference to the pool. On reflection I don't actually think it's a good idea.

I don't see how this addresses the problem of the duplicate hash strings; unless you change the Pipes to have pointers to commits instead of pointers to hashes, is that what you meant? But then we need a map of commits by hash, so that we can look up the commit if we only have the hash. I'm not sure I like this direction.

Yep that's what I meant: a map of hash to commit. I like the idea because it would allow us to do some more stuff in memory without having to shell out to git but it's a big piece of work and I'm not clear on the overall value.

stefanhaller · 2025-04-23T10:21:12Z

I'm not sure what you mean by "resolve the actual string", and why you need a reference to the pool for that. The .Hash() getter would just return the pointer; that's basically what I said above.

I meant that the .Hash() method could just return the string directly so that you didn't need to de-reference it afterwards. But that would require holding a reference to the pool. On reflection I don't actually think it's a good idea.

Still don't get it. We'd have a hash *string field and a Hash() string method that dereferences the field and returns it. Why do we need the pool for that?

stefanhaller · 2025-04-23T10:22:51Z

When I said above "The .Hash() getter would just return the pointer", I meant the dereferenced pointer of course.

jesseduffield · 2025-04-23T10:23:06Z

Still don't get it. We'd have a hash *string field and a Hash() string method that dereferences the field and returns it. Why do we need the pool for that?

Ah yes, I mistakenly thought you'd need to access the pool to deference. Well in that case I actually very much like the idea of having a Hash() method.

codacy-production · 2025-04-27T06:34:28Z

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation	Diff coverage
Report missing for `d87bdaf`¹	✅ 94.53%

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`d87bdaf`)	Report Missing	Report Missing	Report Missing
Head commit (`6ca627d`)	55965	48566	86.78%

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#4498)	274	259	94.53%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings Change summary preferences

Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct. ↩

stefanhaller · 2025-04-27T06:35:10Z

@jesseduffield I reworked the branch completely, please have another look. I'm quite happy with this now.

jesseduffield

Massive improvement using the Hash() method. Nice one. Just one question

jesseduffield · 2025-04-29T09:33:33Z

pkg/gui/presentation/graph/graph.go

@@ -329,7 +322,7 @@ func renderPipeSet(
 	// we don't want to highlight two commits if they're contiguous. We only want
 	// to highlight multiple things if there's an actual visible pipe involved.
 	highlight := true
-	if prevCommit != nil && equalHashes(prevCommit.Hash, selectedCommitHash) {
+	if prevCommit != nil && equalHashes(prevCommit.HashPtr(), selectedCommitHash) {


It's a bit confusing to read that we're comparing a hash pointer with a hash, when in fact selectedCommitHash is not a hash: it's a hash pointer. Although that's clear enough when you're looking at the argument's type in the function signature, it's not clear when you read it in isolation e.g. in this line here. I think we should have a rule that if a variable contains a hash pointer, it should be end in HashPtr instead of just Hash.

Playing devils advocate against myself: often we pass around pointers to structs without needing to specify that it's a pointer in the variable name, so why the double standard when it comes to hashes? I think the answer is that scalar values (e.g. strings, integers, etc) are just expected not to be pointers and if they actually are pointers the default assumption is that they're only pointers for the sake of being possibly null.

Having said that, does it actually make a measurable difference to compare the pointers vs comparing the values themselves? Cos it would make the code simpler if we could fully encapsulate the pointer stuff inside the Hash() method and not have to sometimes pass hash pointers through functions.

Very interesting discussion. At my workplace we have the convention to name all pointer variables with a "p" prefix, i.e. pSelectedCommitHash, this would solve it. Some of my coworkers object to that though, and find it redundant. I kind of like it myself, but I'm not proposing we adopt the convention in lazygit. 😄

But I'm happy to rename selectedCommitHash to selectedCommitHashPtr (see b438e07).

Having said that, does it actually make a measurable difference to compare the pointers vs comparing the values themselves? Cos it would make the code simpler if we could fully encapsulate the pointer stuff inside the Hash() method and not have to sometimes pass hash pointers through functions.

I would agree if equalHashes were a public function, but it's a private implementation detail, so I don't think it matters as much. And it is sometimes called with pipe.toHash as an argument, which is a pointer, so we would have to dereference it then; I don't feel this would be an improvement.

But I'm happy to rename selectedCommitHash to selectedCommitHashPtr (see b438e07).

Looks good to me. I much prefer having it as a 'Ptr' suffix to a 'p' prefix.

I would agree if equalHashes were a public function, but it's a private implementation detail, so I don't think it matters as much. And it is sometimes called with pipe.toHash as an argument, which is a pointer, so we would have to dereference it then; I don't feel this would be an improvement.

Although that's true, RenderAux and RenderCommitGraph are public and they require a pointer for the sake of passing to equalHashes meaning we do get some blast radius from this performance (as we see in your commit). It's not so bad, but I hope there is a good performance gain we're getting in return.

jesseduffield

LGTM

See https://go.dev/blog/testing-b-loop

These are not the expected commits.

The "// merge commit" comment was plain wrong, this is any commit that has a parent, merge or not. The "else if" condition was unnecessary, a plain "else" would have been enough. But the code in the two blocks was almost identical, so extract the one thing that was different and unify it. And while we're at it, use IsFirstCommit() instead of counting parents.

We want to unexport Parents in a later commit.

This is in preparation for turning the hash into pointer to a string.

This in itself is not an improvement, because hashes are unique (they are shared between real commits and rebase todos, but there are so few of those that it doesn't matter). However, it becomes an improvement once we also store parent hashes in the same pool; but the real motivation for this change is to also reuse the hash pointers in Pipe objects later in the branch. This will be a big win because in a merge-heavy git repo there are many more Pipe instances than commits.

This is exactly the same as what we did for Hash earlier. And for the same reason: we want to turn the parents field into a slice of pointers.

We need to pass %P instead of %p in the format string of the git log command, so that the parent hashes have the full length and can be shared with the real hashes.

Change the base type of some of our enums from int to uint8, and reorder fields for better packing. This reduces the size of models.Commit from 152 to 132 bytes on my machine. This doesn't improve overall memory usage significantly, but why not save a little bit of memory if it's easy.

Now that commit hashes are stored in a pool and referenced by pointer by the commits, we can use those same pointers in the pipes.

Now that all hashes that we deal with are stored in the same pool, we can simply compare their addresses.

This saves some memory at the cost of a slight performance increase (I suppose reallocting the slice when adding new Pipes is slightly more expensive now). Performance of the BenchmarkRenderCommitGraph benchmark is 130μs before, 175μs after. I'm guessing this is still acceptable.

The instances are held by the AuthorStyle cache.

…king Hopefully, graphs will never get wider than 32768 characters. (They would get kind of hard to navigate if they did...) This reduces the size of the Pipe struct from 48 to 32 bytes, which makes a significant difference when there are many millions of instances.

stefanhaller added the enhancement New feature or request label Apr 20, 2025

stefanhaller requested a review from jesseduffield April 20, 2025 16:06

stefanhaller force-pushed the reduce-pipe-memory branch from 42280e9 to 38ca388 Compare April 22, 2025 13:17

jesseduffield reviewed Apr 23, 2025

View reviewed changes

stefanhaller force-pushed the reduce-pipe-memory branch from 38ca388 to ff3d700 Compare April 27, 2025 06:32

jesseduffield reviewed Apr 29, 2025

View reviewed changes

stefanhaller force-pushed the reduce-pipe-memory branch from ff3d700 to dd74851 Compare April 29, 2025 10:04

jesseduffield approved these changes Apr 29, 2025

View reviewed changes

stefanhaller added 14 commits April 29, 2025 14:57

Bump git-todo-parser

14eb4c2

Modernize benchmarks

4cfa6e0

See https://go.dev/blog/testing-b-loop

Remove unused function

9d202cf

Fix confusing variable name

5b109b2

These are not the expected commits.

Cleanup: use IsMerge instead of counting Parents

5844ec6

Rewrite generateCommits to avoid write access to commit.Parents

97aa7a0

We want to unexport Parents in a later commit.

Make Commit.Hash a getter for an unexported hash field

1037371

This is in preparation for turning the hash into pointer to a string.

Make Commit.Parents a getter for an unexported parents field

0f1f455

This is exactly the same as what we did for Hash earlier. And for the same reason: we want to turn the parents field into a slice of pointers.

Store Commit.Parents in the pool too

722cc85

We need to pass %P instead of %p in the format string of the git log command, so that the parent hashes have the full length and can be shared with the real hashes.

Store fromHash/toHash in Pipe struct as pointers

13c2136

Now that commit hashes are stored in a pool and referenced by pointer by the commits, we can use those same pointers in the pipes.

Simplify equalHashes

18e5b0a

Now that all hashes that we deal with are stored in the same pool, we can simply compare their addresses.

stefanhaller added 3 commits April 29, 2025 14:57

Store TextStyle in Pipe struct as pointer

e63abf8

The instances are held by the AuthorStyle cache.

stefanhaller force-pushed the reduce-pipe-memory branch from dd74851 to 6ca627d Compare April 29, 2025 12:57

stefanhaller enabled auto-merge April 29, 2025 12:58

stefanhaller merged commit 66caa25 into master Apr 29, 2025
14 checks passed

stefanhaller deleted the reduce-pipe-memory branch April 29, 2025 12:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce memory consumption of graph (pipe sets) #4498

Reduce memory consumption of graph (pipe sets) #4498

stefanhaller commented Apr 20, 2025

codacy-production bot commented Apr 20, 2025 •

edited

Loading

jesseduffield left a comment

stefanhaller commented Apr 23, 2025

jesseduffield commented Apr 23, 2025

stefanhaller commented Apr 23, 2025

stefanhaller commented Apr 23, 2025

jesseduffield commented Apr 23, 2025

codacy-production bot commented Apr 27, 2025 •

edited

Loading

stefanhaller commented Apr 27, 2025

jesseduffield left a comment

jesseduffield Apr 29, 2025

stefanhaller Apr 29, 2025

jesseduffield Apr 29, 2025

jesseduffield left a comment

Reduce memory consumption of graph (pipe sets) #4498

Reduce memory consumption of graph (pipe sets) #4498

Conversation

stefanhaller commented Apr 20, 2025

codacy-production bot commented Apr 20, 2025 • edited Loading

Coverage summary from Codacy

See diff coverage on Codacy

See your quality gate settings Change summary preferences

Footnotes

jesseduffield left a comment

Choose a reason for hiding this comment

stefanhaller commented Apr 23, 2025

jesseduffield commented Apr 23, 2025

stefanhaller commented Apr 23, 2025

stefanhaller commented Apr 23, 2025

jesseduffield commented Apr 23, 2025

codacy-production bot commented Apr 27, 2025 • edited Loading

Coverage summary from Codacy

See diff coverage on Codacy

See your quality gate settings Change summary preferences

Footnotes

stefanhaller commented Apr 27, 2025

jesseduffield left a comment

Choose a reason for hiding this comment

jesseduffield Apr 29, 2025

Choose a reason for hiding this comment

stefanhaller Apr 29, 2025

Choose a reason for hiding this comment

jesseduffield Apr 29, 2025

Choose a reason for hiding this comment

jesseduffield left a comment

Choose a reason for hiding this comment

codacy-production bot commented Apr 20, 2025 •

edited

Loading

codacy-production bot commented Apr 27, 2025 •

edited

Loading