Reading for 10/24: Unified GC Theory #391

obhalerao · 2023-10-16T18:11:30Z

obhalerao
Oct 16, 2023

Hey, everyone,

This is the discussion thread for the Unified Theory of Garbage Collection discussion on Tuesday, October 24. The paper is here!

Please post your thoughts, comments, and questions below before the discussion.

20ashah · 2023-10-23T17:35:22Z

20ashah
Oct 23, 2023

I have done most of my coding in Java, and so the extent of what I knew about garbage collection prior to the lesson and this paper was that Java automatically takes care of it for me and that I didn't have to worry about it. Learning about the different garbage collection methods was really interesting at a high level even though the details of some of the algorithms in the paper were a bit confusing for me. After learning this, I did have a general question about garbage collection. In languages like C/C++ there is no automatic garbage collection like in Java, but it is rather up to the user to prevent memory leaks. Is there a particular reason / situation where this would be preferable over Java's automatic garbage collection? Is there a situation in which we would actually want to manually alloc / free memory rather than have it all be done behind the scenes for us? This is a little off topic to the paper, but it was something that I was curious about while reading. Something that came to mind initially is if we were dealing with a real-time operating system, in which case having deterministic behavior from manual memory management is potentially better than an automatic garbage collector where we don't know the specifics of how it is being performed. Besides this, are there any other big reasons?

10 replies

Enochen Oct 24, 2023

I can imagine security to also be a reason you might want tighter control over how/when you free things. For example, you may not want to work with a collector that delays or batches up frees when dealing with sensitive information. Instead, you might want to immediately get rid of that data after its use and even perhaps scramble it to make sure it can no longer be recovered from RAM. Of course, I don't see any reason why these kinds of properties can't be implemented as part of a garbage collector as well, but I would think it's possibly easier to build trust in a (reasonably small) system that has this implemented manually.

emwangs Oct 24, 2023

I love C++! I do think the biggest advantage of C++ is that you have full control over memory as a resource, and you can choose when and where to allocate things as needed. I feel like for not super hyper-optimized code, there really is no reason to be manually managing memory at all. C++ has the aforementioned smart pointers to assist, and you can essentially code without thinking of memory management and garbage collection. While a shared_ptr is the most flexible and models a "normal pointer" the closest, it has overhead not only because it stores a reference count, but because it also guarantees some thread-safe behavior. Thus adding new shared pointer references requires incrementing and decrementing the reference count atomically, and in multi-threaded programs I think this overhead actually becomes pretty significant. Thus the point of still using new and delete even when unique_ptr and other smart pointers are encouraged is to squeeze every last drop of performance -- but the above discussion did raise a question to me. Very few programs need to be super-hyper-optimized, and most of the time, safety guarantees trump any performance gain from using custom allocators rather than given ones. When is prioritizing safety guarantees over performance gains the more prudent choice?

jiahanxie353 Oct 24, 2023

Saw a bunch of discussion around C++ above and I love C++ as well!
One thing I've noticed while learning LLVM is that LLVM doesn't seem to use smart pointers a lot. If we look at the source files for one of the most important classes, Value, there is only one occurrence of unique_ptr; and for the StringMap class, there are no smart pointers at all. I'm curious why this is the case.

zachary-kent Oct 24, 2023

Many LLVM APIs are quite old; newer APIs, like ORC, do indeed make extensive use of smart pointers.

sampsyo Oct 24, 2023
Maintainer

FWIW, there is research out there on quantifying the cost (in space and time) of GC. Here's my favorite:
https://people.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf

In that study, the overhead is surprisingly high:

These results quantify the time-space tradeoff of garbage collection: with five times as much memory, an Appel-style generational collector with a non-copying mature space matches the performance of reachability-based explicit memory management. With only three times as much memory, the collector runs on average 17% slower than explicit memory management. However, with only twice as much memory, garbage collection degrades performance by nearly 70%.

AliceSzzze · 2023-10-23T21:37:20Z

AliceSzzze
Oct 23, 2023

This paper has given me new perspectives on the two classic flavors of garbage collection. It hadn't occurred to me that tracing is the least fix-point and ref-counting is the greatest fix-point of the assignment of reference counts ρ(v) to vertices, but it makes a lot of sense when you consider how tracing assumes everything is dead until proven live, and vice versa for ref-counting. (The exact definition of "greatest"/"least" here seems a little vague since the flow functions are not the same.) The difference between the two points is the cyclic garbage, and there can be many fix-points between them. The paper demonstrates this by illustrating how the two methods can be combined and tweaked in a variety of other collectors, even though some of these hybrids are not practical (e.g. partial tracing). Depending on how ref-counting & tracing are combined, the resultant algorithm can suffer/benefit from the original algorithm's pros and cons in different places. For example, intra-car cycles in the train can be collected because they are traced, but not the inter-car or inter-train ones because they are reference-counted.
The trial deletion algorithm sounds like more work than normal tracing, unless the heuristics actually reduce the potential candidates for cyclic garbage significantly. This paper says that it's a widely used solution but is it?
Unlike the other papers we have read, this paper doesn't compare the different collectors with benchmarks. I guess it might not be possible or easy to swap out a language's native garbage collector and replace it with a custom collector.

5 replies

willwng Oct 24, 2023

I did a quick skim through Ref. 6, where they describe one of their algorithms for finding candidates; they claim that it runs in O(V + E) time and is "competitive with tracing garbage collectors." I'm not too sure I follow exactly what the algorithm does, and it seems pretty sophisticated since it involves node coloring with 7 different possible colors. However, it looks like a big improvement comes from taking types into consideration (reminds me of the type-based alias analysis) - if we know that some objects (e.g., scalars, arrays) can never be part of cycles then there's no need to check these. Seems like keeping type information around is very handy.
I'm also curious how difficult it would be to hot-swap garbage collectors. I assume swapping between a GC that needs to run for every root modification versus one that runs periodically would cause the biggest change (I think we would then need to know when to do these periodic scans) - in that case how would we know what would be a fair comparison?
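
To make trial deletion a little more concrete, here is a minimal Python sketch of the synchronous, single-candidate version of the idea: tentatively subtract the references internal to the subgraph reachable from a candidate, then restore anything that still has an outside reference. It uses only three colors rather than the seven mentioned above, and the function name, the dictionary-based heap representation, and the example graph are all assumptions made for illustration, not the algorithm as written in Ref. 6.

```python
from collections import defaultdict


def trial_deletion(edges, counts, candidate):
    """Return the nodes found to be cyclic garbage, starting from `candidate`.

    edges:  dict mapping each node to a list of the nodes it points to
    counts: dict mapping each node to its reference count (roots included)
    Both structures are hypothetical stand-ins for a real heap.
    """
    rc = dict(counts)                      # work on a copy of the counts
    color = defaultdict(lambda: "black")   # black = presumed live

    def mark_gray(n):
        # Trial-delete: remove the contribution of every internal edge.
        if color[n] == "gray":
            return
        color[n] = "gray"
        for child in edges.get(n, []):
            rc[child] -= 1
            mark_gray(child)

    def scan_black(n):
        # n is externally referenced: restore counts for what it reaches.
        color[n] = "black"
        for child in edges.get(n, []):
            rc[child] += 1
            if color[child] != "black":
                scan_black(child)

    def scan(n):
        # Decide, per gray node, whether an external reference remains.
        if color[n] != "gray":
            return
        if rc[n] > 0:
            scan_black(n)
        else:
            color[n] = "white"             # white = garbage so far
            for child in edges.get(n, []):
                scan(child)

    mark_gray(candidate)
    scan(candidate)
    return {n for n, c in color.items() if c == "white"}


# Two objects that point at each other.
cycle = {"a": ["b"], "b": ["a"]}
isolated = trial_deletion(cycle, {"a": 1, "b": 1}, "a")   # no outside refs
rooted = trial_deletion(cycle, {"a": 2, "b": 1}, "a")     # a also held by a root
```

In this sketch `isolated` contains both nodes and `rooted` is empty, which is the behavior the heuristics are trying to get cheaply: each call walks the candidate's whole reachable subgraph, so pruning candidates (for example by type) directly reduces the work.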

collinzrj Oct 24, 2023

On the third point, I find myself curious about whether it's easy to switch the garbage collectors of a language. It seems that most languages have a built-in garbage collector and the users can't change the garbage collector themselves. One mainstream language that allows users to choose garbage collectors themselves is Java, and it seems that most of them are tracing based. As claimed in this paper, all practical garbage collectors are a combination of tracing and reference counting methods. It might be interesting to list the garbage collectors of Java on a spectrum from tracing based to reference counting based and benchmark their performance.
Why isn't it common practice for a language to make its garbage collector modular? Is the garbage collector design closely coupled with the design of a language?

SanjitBasker Oct 24, 2023

I know that Python documents its GC along with ways of interacting with it like weak references. I think that adding reflection features to a language would make it tougher to provide a modularized GC interface, because the reflection mechanisms are probably tailored to the specific implementation.

Maybe one could also try to abstract the state of a garbage collector and use it to define a common interface, like the over/underestimates of the reference counts described in the paper. My intuition is that actually supporting the interface might be too costly for a secondary, optimized GC (or a GC that the user writes which is tuned for their programs), but it could be really cool as a debugging tool or a way of assessing the cost models described in this paper.
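
As a small illustration of the Python interaction points mentioned above, the sketch below uses the standard `gc` and `weakref` modules to observe when objects are reclaimed. It assumes CPython, where reference counting frees acyclic objects immediately and a separate cycle collector handles cycles; other Python implementations may behave differently. The `Node` class and the two function names are hypothetical.

```python
import gc
import weakref


class Node:
    """A tiny object that can point at one other object."""

    def __init__(self):
        self.other = None


def acyclic_freed_immediately():
    # Assumes CPython: dropping the last reference frees the object at once.
    node = Node()
    probe = weakref.ref(node)   # a weak reference does not keep node alive
    del node
    return probe() is None


def cycle_survives_refcounting():
    # Pause the automatic cycle collector so only reference counting acts.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a          # a two-object cycle
        probe = weakref.ref(a)
        del a, b                         # counts stay at 1 because of the cycle
        alive_after_del = probe() is not None
        gc.collect()                     # explicit cycle-collection pass
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()
```

Under the CPython assumption, the first function reports that the object is gone right after `del`, while the second reports that the cycle is still alive after `del` and is only reclaimed by `gc.collect()`. That is the hybrid the paper describes: reference counting for the common case, plus a backup pass for cyclic garbage.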

rcplane Oct 24, 2023

Borrowing from the example of Go garbage collection, a language specification may be explicitly vague about garbage collection requirements to allow for modular swapping of garbage collection in a language runtime implementation.
However, due to the complexity of the task of implementing and verifying a new garbage collector across a representative benchmark suite, application developers might rather spend effort to focus on the more proximal task of managing memory manually (down in the C / cgo layer) for their own application whose value lifetime needs can be well understood and annotated to achieve associated performance benefits up to 5-10x throughput.

sampsyo Oct 24, 2023
Maintainer

@collinzrj, it's a good point that JVMs are the only mainstream runtimes I can think of that have multiple GC options. The Oracle JDK has many GCs, and they all have a million configuration options. This may be because the JVM has historically been the most "serious" platform out there about high-performance tracing GC... most other runtimes have historically just stuck with one early decision they made and not attempted a wholesale replacement. With the exception of JavaScript runtimes, which are typically not configurable for obvious reasons (users do not want to reconfigure their browser for different webpages!).

bennyrubin · 2023-10-24T00:18:55Z

bennyrubin
Oct 24, 2023

This paper reminded me of a systems paper I read (On the Duality of Operating System Structure) that compared shared variable based concurrent processes with message passing systems. The paper introduced the idea that the two styles are duals of each other, despite being treated as vastly different approaches by the community. So often research feels like incremental improvements to the same approach that it is satisfying to see a paper that takes a new fundamental look at the problem, offering some insight that the community was lacking before - or perhaps just a new perspective.

I was especially intrigued by the conclusion that as each garbage collection approach gets optimized, they take on more characteristics of each other. Despite the (occasionally) heinous number of definitions in the paper, I liked the formalisms they introduced allowing the “fair” comparison of seemingly different gc approaches, essentially laying the groundwork for the community to formally study and compare them in the future.

1 reply

alifarahbakhsh Oct 24, 2023

A true student of 6410!

Joking aside, this paper reminded me of the paper you mentioned. The difference is that this paper actually shows a duality grounded in an intuitive mathematical formulation, whereas the other one is more handwavy and informal. I really like the fixed-point formulation presented here, despite the fact that I am usually doubtful of the depth of the insights presented in such "equivalence" papers. It is just important not to overestimate things. I read the equivalence here as saying "you are solving a graph problem that needs some metadata, so treat it formally as a graph problem". The value lies in the scholarly activity of categorizing things and pruning knowledge, not in naming intuitive concepts as "matter" and "antimatter".

keikun555 · 2023-10-24T00:43:36Z

keikun555
Oct 24, 2023

The paper writes that their methodology "may help enable the dynamic construction of garbage collection algorithms." I wonder if there's been work that uses this methodology to automatically generate garbage collectors given a program at runtime, or less ambitiously, given program characteristic parameters.

6 replies

bcarlet Oct 24, 2023

I had a similar question. In particular, I wonder how useful their cost models are in practice. I've always assumed that evaluating a garbage collection strategy was an inherently empirical exercise due to the dependence on the dynamic behavior of the program. The authors claim that their framework could be used to tune collection strategies to applications, but it's not clear to me how. That said, their framework does allow one to more systematically explore the space of tradeoffs, so some sort of naive auto-tuning seems possible.

sampsyo Oct 24, 2023
Maintainer

Yeah, I'm also not sure how feasible this kind of analytical study really is. Clearly, the design of a GC has to be at least mostly empirical, since it depends so deeply on ineffable aspects of the workloads you want to run. But I do wonder if this kind of thinking could at least help navigate the design space... that is, once you have pinned down some important aspect of the workload you want to optimize for, it could maybe instruct you about what GC tweak to try next.

keikun555 Oct 24, 2023

On a related note, I wonder if in a closed world assumption it's possible to statically analyze where we can/should add a manual free. This feels like a hard problem without static garbage collection like in Rust, so maybe something like "may-free" in lieu of alias analysis.

Edit: it's probably better to do "may-reference" instead of may-free so that we can opportunistically free the object as @sampsyo writes.

sampsyo Oct 24, 2023
Maintainer

@keikun555, what's common to do here is to do this opportunistically, when the compiler is "lucky" with aliasing results and such. This gives it the freedom to always fall back on dynamic memory management. (We talked briefly about Swift in class on Thursday. Swift cares a lot about doing this as much as possible, in what it calls "automatic reference counting" (ARC).)

keikun555 Oct 24, 2023

It would be funny to see a programming language where you can specify what kind of garbage collection you'd want to use for each assignment call so that we can "Divide memory (heap, stack, and global variables) into a set of partitions, within which different strategies may be applied." Though perhaps impractical, this would give programmers more control over what gets collected in what way.

Or maybe a DSL for garbage collection, and we can maybe synthesize a (near) optimal gc given a program?

evanmwilliams · 2023-10-24T00:59:30Z

evanmwilliams
Oct 24, 2023

Interesting paper! I'm not super familiar with optimization theory, but I think I would've liked to see a bit more rigor in the claim that the two problems are duals. While it was interesting to see a qualitative comparison of the two approaches, and the analogy of "matter" versus "anti-matter" was also very useful, I think most theorists would have liked to see a bit more mathematical certainty. That being said, I liked the space/time analyses of the different collectors.

I particularly enjoyed the discussion on the unified approach to garbage collection (e.g. taking tips from both approaches is bound to work better than purely implementing one or the other). I especially found it interesting in the context of the Train Algorithm. The algorithm has incremental collection (similar to reference counting) which allows it to achieve short pause times. Further, the algorithm is able to handle long-lived references because it is structured around moving, inter-car references. Even though it's not the primary method used for garbage collection today, I think it does a pretty good job of highlighting the fact that modern garbage collectors need both tracing and reference counting aspects to outperform what our current capabilities are.

This paper also made me think about what other problems in compilers are duals of each other. This made me think of problems like register allocation and register spilling which also can be viewed as duals. A lot of the problems in compilers are NP-Hard or worse, so thinking about it from an optimization theory perspective could introduce novel insights into how to better solve these problems.

2 replies

ryanwmao Oct 24, 2023

I very much agree with your comment; while the paper does an excellent job of presenting a qualitative comparison and analogy, a more formal mathematical underpinning to the dual description would have added extra weight to the argument. Nonetheless, it serves as a valuable starting point, and I'm curious whether this idea has been explored further.

The discussion on the unified approach to garbage collection is indeed a highlight of the paper. The Train Algorithm's ability to combine incremental collection with short pause times and effective handling of long-lived references is a testament to the power of integrating tracing and reference counting aspects. It's a great reminder that there's no one-size-fits-all solution in garbage collection, and modern collectors often benefit from a hybrid approach.

jiahanxie353 Oct 24, 2023

I also really enjoy the elegant approach of viewing these two fundamental concepts as unified duals. I believe this unified viewpoint will save a lot of effort for researchers and engineers who have been focusing on one side of the dual, since they can start to look at the counterpart and learn from the progress/blocks. I thought it might be interesting if we make an analogy to computational tractability, where if the "matter" part is intractable, the "anti-matter" part will also be intractable; whereas if the "matter" part is tractable, we can "reduce" the "anti-matter" to "matter" so that both problems can be solved. It's also great to see that they can extend to multi-heap, and that they provide insight into space-time tradeoffs, which can be a great guideline to design collectors methodologically.
The only part I'm not convinced by is the cost analysis part, in which they made some assumptions that I'm confused about, such as "the allocation rate and fraction of the heap that is garbage are constants". And it'd be great if they could present case studies with quantitative results in this section as well. But it makes sense since this paper focuses more on theory and algorithms.

vivianyyd · 2023-10-24T01:08:09Z

vivianyyd
Oct 24, 2023

Reading this paper made me wish reference counting and tracing were always introduced as duals! It is a pretty idea.
I liked that the paper described how many optimizations on the two basic approaches are somewhere in the middle. Use of the word "dual" makes me wonder if the authors had ideas about a lattice containing all of these algorithms. It seems hinted towards, but never formalized. It would make sense for such a lattice to use orderings imposed by the 5 quantities relating to cost analysis.

3 replies

jdroob Oct 24, 2023

I agree that reference counting and tracing being introduced in this way is an elegant way to think of these two garbage collection algorithms. I always find it both surprising and cool when two seemingly disparate mathematical objects turn out to be duals of each other. Not the most important takeaway from the paper, but can reference counting still be thought of as the dual of tracing in the case of a mutator program that has cycles in its heap graph? Asking this question based on the below excerpt from the paper:

Formally, in the absence of cycles, reference counting computes the graph complement of tracing.

sampsyo Oct 24, 2023
Maintainer

Super interesting idea to consider actually ordering a bunch of GC algorithms in some sort of lattice. I really wonder what that relation would look like!

sampsyo Oct 24, 2023
Maintainer

@JohnDRubio, good point about this:

Not the most important takeaway from the paper, but can reference counting still be thought of as the dual of tracing in the case of a mutator program that has cycles in its heap graph?

I think the duality they are most emphatic about here is about fixed points (least vs. greatest fixed point of a "system of equations"). These two fixed points are almost graph complements (i.e., complements except for cycles).
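
A small Python sketch may help make the two fixed points tangible. It assumes the formulation, as I read the paper, in which a node's count is the number of root references to it plus the number of incoming edges from nodes whose own count is positive. The function names, the dictionary-based heap, and the example graph are all made up for illustration.

```python
def least_fixed_point(edges, roots):
    """Tracing view: start every count at 0 and count only what roots reach."""
    rc = {n: 0 for n in edges}
    work = list(roots)
    while work:
        n = work.pop()
        rc[n] += 1
        if rc[n] == 1:              # first visit: follow outgoing edges
            work.extend(edges[n])
    return rc


def greatest_fixed_point(edges, roots):
    """Reference-counting view: start from all references, then decrement."""
    rc = {n: 0 for n in edges}
    for r in roots:
        rc[r] += 1
    for n in edges:
        for child in edges[n]:
            rc[child] += 1
    # Nodes with count 0 are dead; removing them decrements their children.
    work = [n for n in edges if rc[n] == 0]
    while work:
        n = work.pop()
        for child in edges[n]:
            rc[child] -= 1
            if rc[child] == 0:
                work.append(child)
    return rc


# Hypothetical heap: a -> b is live; c <-> d is an unreachable cycle;
# e is unreachable and points into the cycle.
EDGES = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"], "e": ["c"]}
ROOTS = ["a"]

live_by_tracing = {n for n, c in least_fixed_point(EDGES, ROOTS).items() if c > 0}
live_by_counting = {n for n, c in greatest_fixed_point(EDGES, ROOTS).items() if c > 0}
cyclic_garbage = live_by_counting - live_by_tracing
```

On this example, tracing keeps only `a` and `b`, counting also keeps `c` and `d`, and the difference between the two solutions is exactly the unreachable cycle. Without cycles the two functions agree, which matches the "complements except for cycles" reading above.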

stephenverderame · 2023-10-24T01:16:14Z

stephenverderame
Oct 24, 2023

The theory presented in the paper, and the cookie-cutter-ish style of the way the algorithms were presented seemed reminiscent of dataflow analysis. It seems possible that a language could have a really customizable garbage collector that allows users to specify things like write barriers, recurse conditions, tracing start points, etc. to tailor a GC towards optimizing the things the user cares most about.

The dataflow analysis analogy also made me wonder if there was a similar paper that took a bunch of existing analyses and discovered a similar relationship, or if analyses kind of started out as being part of a unified framework.

1 reply

obhalerao Oct 24, 2023
Author

After doing some digging, I found this paper from 1976 that seems to do just that: cite existing dataflow analyses as motivation for providing a unified general dataflow analysis framework. I didn't fully vet that this was the first paper that provided such a unified framework, but given the time and the general state of the field of compilers at that point, it seems likely. (Sidenote: the authors provide pictures of their implementation of the general framework, which was written in PL/I, which I found to be quite intriguing to read.)

he-andy · 2023-10-24T01:48:05Z

he-andy
Oct 24, 2023

I think it's really interesting that the two views of garbage collection presented in class actually converge to similar results as presented in the paper. However, this actually isn't too surprising when considering optimizations like deferred reference counting, which makes RC extremely similar to tracing GCs in the context of "stopping the world" to perform GC. I wonder if it could be viable to write compilers that are able to statically analyze the code (via some heuristic) or runtimes that can dynamically balance tradeoffs of tracing/RC collectors via different parameterizations of the hybrid model presented in the paper. Could this make GC'd languages more viable in embedded systems applications?

2 replies

xalbt Oct 24, 2023

The paper lists "Scheduling Garbage Collection in Embedded Systems" by Roger Henriksson as a reference, which is Henriksson's doctoral thesis on the subject. Having briefly read parts of it, it suggests that garbage collectors are very possible in embedded systems without too much overhead. It suggests that a scheduling strategy can be constructed using the characteristics of control systems that ensures critical processes in embedded systems are never interrupted by garbage collection activities. In the results, the author implements the scheduling algorithm in a real-time environment that achieves predictable behavior with sub-millisecond worst-case delays, even in his non-optimized prototype garbage collector. This doesn't use techniques from the Unified GC Theory paper, so it suggests that applying the proposed optimizations to this strategy can make the strategy and garbage collector even better and more reliable, and even reach viability in embedded systems.

rcplane Oct 24, 2023

Considering embedded systems and accelerators raises the point that wherever there is a cache in a (distributed) system, there is a memory management garbage collection problem. In particular, I was interested in looking at how Nvidia GPU CUDA PyTorch handles this problem, since once a deep learning training program is written it should be amenable to static analysis and optimization. Unfortunately, it seems like general practice, courtesy of a data scientist writing this June on Medium, includes at least 8 different ways to handle GPU out of memory errors but none include static analysis. They do recommend a semi-automated approach of explicitly deleting some variables and manually invoking a garbage collection pass. Unsatisfied, I thought the PyTorch Tuning Guide for GPU might have recommendations regarding tools for automatic memory management, but I was disappointed. I was able to find that CUDA memcheck can be applied to both CUDA C/C++ and Python programs such as this example. I am left wondering if this represents an opportunity to improve a common machine learning framework prized for rapid iteration and productivity with tighter efficiency and better programming experience from smarter memory handling much as numba can compile Python programs into faster executables.

zachary-kent · 2023-10-24T02:26:13Z

zachary-kent
Oct 24, 2023

To be honest, I found this paper and the way it presented its ideas a bit dry. I thought it was compelling up to and including the point where it discusses how tracing computes the least fixed point of the "garbage" function, and reference counting computes the "greatest," but after that it seems the paper became more of a survey on different garbage collection techniques. Indeed, there were references to how the "macro-nodes" in generational & multi-heap garbage collectors are reference-counted, but these later arguments felt more philosophical than grounded. More generally, I don't believe statements like the following make a very well-established argument:

In general, the presence of a write barrier is an indication of some sort of reference count-like behavior in an algorithm.

Maybe it's that I just don't find the duality of tracing and reference counting a particularly enthralling problem in and of itself, but I also think that this duality could've led to deeper insights if it was further formalized as @vivianyyd discussed. In the end, I'm not sure how much this intuitive "duality" bought in terms of results for the paper.

1 reply

matth2k Oct 24, 2023

I read section 3 of the paper pretty closely, and I agree that the punchline was a little underwhelming. But I guess this paper still does an okay job of formalizing the intuition that tracing and reference counting will converge to the same thing, even if it is hand-wavy. Getting that part right is pretty important for motivating the feasibility of hybrid garbage collectors.

In any case, I'm surprised there were no figures to help visualize the space/time trade-offs of tracing versus reference counting. If every cost component is just abstracted away as linear in time or linear in space, it's hard to get a sense of where the tipping point actually lies in the design space. I did not get a good intuition on this from the paper.
