Avoiding passing size and alignment to dealloc when allocator doesn't need them #120

kornelski · 2024-01-31T15:09:14Z

Currently Rust requires passing Layout to the deallocator. However, Rust is frequently used with the System allocator that wraps libc::free, which doesn't need this information. This means that often all the extra size and layout tracking on the Rust side is wasted effort. This is a tiny code bloat, but deallocation code paths are nearly everywhere, so it adds up.

https://rust.godbolt.org/z/K3xTd9MEr

I wonder if this could be somehow made optional in the Allocator trait?

The text was updated successfully, but these errors were encountered:

Amanieu · 2024-02-01T05:10:15Z

The counter-argument is that since this information is always available at compile-time, why shouldn't we pass it to the allocator? This is an important optimization for allocators since the size of an allocation may not be stored near the allocation itself.

C++ actually added sized deallocation functions in C++14 (https://open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3536.html) for this reason, but they have the problem of dealing with backwards-compatibility.

zakarumych · 2024-02-01T09:02:14Z

It's only available in compile-time for things like Box and Vec where type and count can be used to reconstruct same Layout.
This is not universally true though, sometimes data structure has to carry Layout. Typically size is still required for some other operations, but alignment is only for deallocation.

But IMO it's counting pennies. I can't imagine a situation where data structure is required to keep track of allocation alignment causing problems.

kornelski · 2024-02-01T11:54:30Z

I know it's very useful for some allocators, but not for all of them. Specifically, for the very common case of executables using the System allocator based on libc malloc/free it's not an optimization, and only redundant unused information that makes Rust executables larger.

Memory overhead for collections isn't a problem (they have to track sizes anyway), but it's a problem of code bloat. Deallocation calls in Rust are ubiquitous due to unwinding and use of ?, and it adds up. Even when the layout is a compile-time constant, it doesn't get optimized out. It still needs to be materialized and passed as an argument at run-time. Freeing a Box<T> takes twice as many instructions as free().

I'd expect dyn Allocator to always require Layout. However, how about borrowing idea from std::mem::needs_drop, and allowing allocator users to generate smaller drop code where possible. It could be something like an associated const A::NEEDS_LAYOUT (or const fn where Self: Sized? Or maybe just specialization on System?) that would allow code such as:

fn drop(&mut self) {
    if A::NEEDS_LAYOUT { 
        A::dealloc(self.ptr, Layout::new(self.size, etc)) 
    } else { 
        A::layoutless_dealloc(self.ptr) 
    }
}

zakarumych · 2024-02-01T16:04:06Z

But your example passes layout into __rust_dealloc call.
It would require Global to be aware of NEEDS_LAYOUT of whatever allocator is set as global one.

Amanieu · 2024-02-11T08:45:17Z

I'd expect dyn Allocator to always require Layout. However, how about borrowing idea from std::mem::needs_drop, and allowing allocator users to generate smaller drop code where possible. It could be something like an associated const A::NEEDS_LAYOUT (or const fn where Self: Sized? Or maybe just specialization on System?) that would allow code such as:

Doesn't this just boil down to inlining A::dealloc? That would allow the compiler to see that the layout is unused.

Regarding the system allocator we actually do make use of the alignment information for deallocation in some cases, for example on Windows: https://github.com/rust-lang/rust/blob/9aa232ecc7bb006a1fad404f437b049482021a3a/library/std/src/sys/pal/windows/alloc.rs#L210.

I still think that we should keep the layout for deallocation, but I could be convinced otherwise if you can show a significant size saving in a real program (e.g. ripgrep).

ChrisDenton · 2024-02-12T07:34:15Z

Regarding the system allocator we actually do make use of the alignment information for deallocation in some cases, for example on Windows: https://github.com/rust-lang/rust/blob/9aa232ecc7bb006a1fad404f437b049482021a3a/library/std/src/sys/pal/windows/alloc.rs#L210.

To be precise, it doesn't need alignment info per se. The issue is that it's like two allocators in one. The first simply wraps HeapAlloc/HeapFree the second additionally adds inline metadata. So it's only necessary to store one bit to tell it which dealloc to use (i.e. with or without the inline header). But admittedly there's no good place to store that bit, other than testing the alignment again.

kornelski · 2024-02-12T11:59:12Z

layout.alloc is almost always a compile-time constant, so even the Windows' two-allocators-in-one could potentially be optimized, if only the allocator could have some inlineable method at the deallocation call site.

Unfortunately, the special __rust_dealloc indirection makes it hard ;(

The implementation of #[global_allocator] relies on linking global symbols at the linker level, so the crates using Global allocator don't know what they're calling.
__rust_dealloc doesn't seem to be inlined, even with LTO. I've found an LLVM discussion on the same issue. They would also like to optimize allocator calls for the specific allocator implementation, but it seems the problem is that function attributes that make alloc/dealloc functions special are lost when inlining.

So I think for this to happen one of these needs to change. For example,

What if the custom global allocator wasn't an attribute you set anywhere, but an explicit crate dependency, where liballoc and everything else depends on user-provider allocator-implementing crate? I think that would enable normal monomorphisation and cross-crate inlining. Probably tricky from ABI perspective to avoid recompiling liballoc?
I guess there could be an extra LLVM pass that happens after allocation-aware passes, which removes the alloc-family attributes from __rust_alloc functions and redoes inlining and constant propagation. That'd be expensive, but maybe ok if reserved for opt-level=3?

the8472 · 2024-03-15T11:01:43Z

Tangentially, I think it would be useful if allocators had an associated constant of flags describing their properties, such as whether they support alignment-changing realloc (posix realloc doesn't, jemalloc does), whether their realloc avoids memcpy (doing virtual memory schenanigans), whether deallocation does anything (vs. an arena that drops all allocations at once) and similar things.

Containers can then choose to optimize their behavior based on those properties.

the8472 mentioned this issue Apr 25, 2024

Const flags describing allocator properties #124

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoiding passing size and alignment to dealloc when allocator doesn't need them #120

Avoiding passing size and alignment to dealloc when allocator doesn't need them #120

kornelski commented Jan 31, 2024

Amanieu commented Feb 1, 2024

zakarumych commented Feb 1, 2024

kornelski commented Feb 1, 2024 •

edited

Loading

zakarumych commented Feb 1, 2024 •

edited

Loading

Amanieu commented Feb 11, 2024

ChrisDenton commented Feb 12, 2024

kornelski commented Feb 12, 2024

the8472 commented Mar 15, 2024

Avoiding passing size and alignment to dealloc when allocator doesn't need them #120

Avoiding passing size and alignment to dealloc when allocator doesn't need them #120

Comments

kornelski commented Jan 31, 2024

Amanieu commented Feb 1, 2024

zakarumych commented Feb 1, 2024

kornelski commented Feb 1, 2024 • edited Loading

zakarumych commented Feb 1, 2024 • edited Loading

Amanieu commented Feb 11, 2024

ChrisDenton commented Feb 12, 2024

kornelski commented Feb 12, 2024

the8472 commented Mar 15, 2024

kornelski commented Feb 1, 2024 •

edited

Loading

zakarumych commented Feb 1, 2024 •

edited

Loading