
A way to stop GC wasting CPU time on long-lived objects (GC generations or something else) #17131

Open
dktapps opened this issue Dec 12, 2024 · 2 comments

Comments

@dktapps
Contributor

dktapps commented Dec 12, 2024

Description

PHP's GC is currently unsuited to long-running applications because of the lack of GC generations.

Any application which has long-lived objects with many descendants (such as PocketMine-MP's Server) will currently experience significant performance hits from GC runs, due to the expense of repeatedly scanning the complex dependency graphs of these objects.

I haven't dived deep into this problem, but I did verify that just the Server alone winding up in the GC root buffer in PocketMine-MP (which it should never do, being an object that stays alive for almost the entire process) costs 1.7 ms of GC time on a 12700k. An average GC run in PM during normal operation costs somewhere in the ballpark of 15ms, despite creating almost zero cycles.

This has a massive performance impact on PM because its update cycle runs every 50 ms, so a 15 ms GC pause is serious business.

Runtimes like .NET and the JVM use generational GC to avoid this problem: long-lived objects are scanned far less frequently than young ones.

I understand that the benefit to PHP is relatively small, given that the primary use case is to serve short-lived web requests, but implementing generations in GC would positively affect performance of long-lived applications.

@arnaud-lb
Member

arnaud-lb commented Dec 13, 2024

Do you have runnable benchmarks to share? This would help test GC improvements in the future.

The problem you describe is usually mitigated by the adaptive threshold: GC collections are triggered less often when they collect too few nodes. Basically, the frequency of GC collections is a function of the frequency at which garbage cycles are created (and their size). However, this is a simple heuristic, and it doesn't take into account the cost of collections or the size of the scanned graph. Taking these into account would probably improve throughput in your case (but not pause time).
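To make the heuristic concrete, here is a minimal sketch of an adaptive threshold of the kind described above. It is illustrative only: the function name, constants, and ratio are invented for this example and are not PHP's actual implementation.

```python
def adjust_threshold(threshold, roots_scanned, cycles_collected,
                     trigger_ratio=0.01, max_threshold=1_000_000_000,
                     base_threshold=10_001):
    """Raise the root-buffer threshold when a run collects too little,
    and reset it to the base value when collection is productive.

    All names and constants here are illustrative assumptions, not
    php-src code.
    """
    if cycles_collected < roots_scanned * trigger_ratio:
        # Unproductive run: back off so the next collection waits for
        # more roots to accumulate before scanning again.
        return min(threshold * 2, max_threshold)
    # Productive run: return to the base threshold.
    return base_threshold
```

Note that this only adapts to how many cycles a run collects, not to how expensive the scan itself was, which is exactly the limitation mentioned above.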

But I agree that switching to a generational GC may be worth a try.

Are you aware of generational cycle collection algorithms? I did some quick research in the past and wasn't able to find any.

There are systems with ref counting and a generational GC in the wild, but they use a tracing collector, not just a cycle collector (e.g. Python I believe).

Ref counting + generational tracing may still be beneficial compared to ref counting + non-generational cycle collector.

It would be nice if using a tracing collector allowed us to remove reference counting, but we cannot, as we rely on it for array CoW semantics (and potentially structs), and people may rely on the predictable timing of destructor calls (in the absence of cycles) for RAII-like behaviors.

It should be noted that the implementation of a generational GC is not trivial, as it requires keeping track of when inter-generational references are created, which potentially adds overhead to every write operation.
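The bookkeeping in question is usually a write barrier feeding a "remembered set". The following Python sketch shows the idea only; the class, function, and field names are invented for illustration and do not correspond to any real GC implementation.

```python
class GCObject:
    """Toy heap object tagged with a generation (0 = young, 1 = old)."""
    def __init__(self, generation=0):
        self.generation = generation
        self.fields = {}

# Old objects that hold pointers into the young generation. A minor
# collection treats these as extra roots, so it never has to scan the
# whole old generation to find old->young edges.
remembered_set = set()

def write_field(obj, name, value):
    """Write barrier: every pointer store runs this check.

    If an older object starts referencing a younger one, record it in
    the remembered set. This per-write cost is the overhead mentioned
    above.
    """
    obj.fields[name] = value
    if isinstance(value, GCObject) and obj.generation > value.generation:
        remembered_set.add(id(obj))

old = GCObject(generation=1)
young = GCObject(generation=0)
write_field(old, "child", young)   # barrier fires: old -> young edge
write_field(young, "parent", old)  # young -> old edge needs no entry
```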

@dktapps
Contributor Author

dktapps commented Dec 13, 2024

Do you have runnable benchmarks to share? This would help test GC improvements in the future.

I haven't, no. Unfortunately I can only tell you about my experience with long-running server applications.

The problem you describe is usually mitigated by the adaptive threshold: GC collections are triggered less often when they collect too few nodes.

Yes, but there are still significant pauses whenever the GC does kick in.

In addition, since GC_THRESHOLD_TRIGGER is fixed, the threshold will not increase as long as the collector frees more than 100 cycles per run. This is a very easy number to hit with short-lived objects.

This creates a big incentive to manually destroy cycles to avoid costly GC runs. IMO this shouldn't be necessary, considering that most short-lived cycles are likely cheap to clean up.

Acyclic objects also trigger unnecessary collections, although since they don't count towards cycles collected, the adaptive threshold does mitigate them.

A lower-effort alternative to GC generations, which would still give some of the benefits, would be to allow user code to explicitly declare objects for the GC to skip, either via something like #[IgnoredByGC] or perhaps a gc_ignore(object $object) function. I attempted to implement the latter in an extension, but it didn't produce the desired effect, because the GC does not respect GC_NOT_COLLECTABLE when scanning object referents.
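There is prior art for an explicit "skip these objects" API in CPython: `gc.freeze()` (available since 3.7) moves every currently tracked object into a permanent generation that later collections never scan, which is roughly the effect a hypothetical gc_ignore() would have. The `Server` class below is a stand-in invented for this illustration.

```python
import gc

class Server:
    """Stand-in for a long-lived root object with many descendants."""
    def __init__(self):
        self.children = [[] for _ in range(1000)]

server = Server()            # long-lived; should not be rescanned forever

before = gc.get_freeze_count()
gc.freeze()                  # everything alive now is exempt from future GC
after = gc.get_freeze_count()

gc.unfreeze()                # undo, so later collections behave normally
```

CPython's documentation suggests calling `gc.freeze()` after startup in forking servers; the same "freeze the permanent object graph once it is built" pattern is what a PHP-level API along these lines could enable.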

@dktapps dktapps changed the title GC generations A way to stop GC wasting CPU time on long-lived objects (GC generations or something else) Dec 13, 2024
dktapps added a commit to pmmp/PocketMine-MP that referenced this issue Dec 15, 2024
This PR replicates the mechanism by which PHP's own GC is triggered: using a dynamically adjusted threshold based on the number of roots and the number of destroyed cycles. This approach was chosen to minimize behavioural changes.

This currently only applies to the main thread. Doing this for other threads is a bit more complicated (and in the case of RakLib, possibly not necessary anyway).

By doing this, we can get more accurate performance profiling. Instead of GC happening in random pathways and throwing off GC numbers, we trigger it in a predictable place, where timings can record it.

This change may also produce minor performance improvements in code touching lots of objects (such as `CraftingDataPacket` encoding), which previously might've triggered multiple GC runs within a single tick. Now that GC runs wait for `MemoryManager`, code can touch as many objects as it wants during a tick without paying a performance penalty.
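The mechanism described above can be sketched as follows, using Python's `gc` module as a stand-in for PHP's; the function name, constants, and threshold logic are illustrative assumptions, not PocketMine-MP's actual code.

```python
import gc

gc.disable()               # stop collections firing at random points mid-tick
threshold = 10_001         # illustrative base value

def maybe_collect():
    """Called at one predictable point per tick (e.g. by a memory manager).

    Triggers a collection only once enough garbage has accumulated, and
    backs the threshold off after an unproductive run, mirroring the
    dynamically adjusted threshold the commit message describes.
    """
    global threshold
    pending = gc.get_count()[0]      # tracked allocations since last run
    if pending < threshold:
        return 0
    collected = gc.collect()
    # Unproductive run: wait for more garbage next time; otherwise reset.
    threshold = threshold * 2 if collected < 100 else 10_001
    return collected
```

Because collection now happens at one known call site, a profiler can attribute the whole GC pause to that site instead of to whichever unlucky code path happened to allocate the triggering object.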

While working on this change I came across a number of issues that should probably be addressed in the future:

1) Objects like Server, World and Player that can't possibly be GC'd repeatedly end up in the GC root buffer because the refcounts fluctuate frequently. Because of the dependency chains in these objects, they all drag each other into GC, causing an almost guaranteed parasitic performance cost to GC. This is discussed in php/php-src#17131, as the proper solution to this is probably generational GC, or perhaps some way to explicitly mark objects to be ignored by GC.
2) World's `blockCache` blows up the GC root threshold due to poor size management. This leads to infrequent, but extremely expensive GC runs due to the sheer number of objects being scanned. We could avoid a lot of this cost by managing caches like this more effectively.
3) StringToItemParser and many of the pocketmine\data classes which make heavy use of closures are afflicted by thousands of reference cycles. This doesn't present a major performance issue in most cases because the cycles are simple, but this could easily be fixed with some simple refactors.