Description
Summary
When a closure is defined in a generic context like a generic function or a method of a generic class, its type includes, in some sense or another, the parent's generic parameters. If the closure doesn't depend on some of those generic parameters, it can lead to the instantiation of mono items which are functionally identical, but distinct by virtue of false dependencies on unused generic parameters of the parent. I'll call these mono items "duplicates".
These duplicates create needless work and code bloat. At the very least, they go through the codegen and optimization pipeline, and if not optimized away, are included in the binary. I imagine there is other overhead, such as creation of more distinct types.
Example 1
fn foo<T: std::fmt::Display>(t: T) {
let _ = vec!["hello".to_string()].iter().map(|x| x);
println!("{}", t);
}
fn main() {
foo(0i32);
foo(0i64);
}
The map
call is passed a closure |x| x
which does not depend the parent function's generic parameter T
. The code for that closure is identical in both foo::<i32>
and foo::<i64>
, so we should be able to only generate one map
instantiation and Map
type (the return type of map
, which is generic over the closure that map
was passed), and use it in both foo
instances. Yet this creates two map
instantiations, and two Map::new
instantiations, because the closure type is polluted by the parent's T
parameter, which, again, the closure doesn't use in any way.
Excerpt from -Z print-mono-items=lazy
:
MONO_ITEM fn <std::slice::Iter<std::string::String> as std::iter::Iterator>::map::<&std::string::String, [closure@src/main.rs:2:50: 2:55]> @@ lzylnwmetevawv6[External]
MONO_ITEM fn <std::slice::Iter<std::string::String> as std::iter::Iterator>::map::<&std::string::String, [closure@src/main.rs:2:50: 2:55]> @@ lzylnwmetevawv6[External]
MONO_ITEM fn std::iter::Map::<std::slice::Iter<std::string::String>, [closure@src/main.rs:2:50: 2:55]>::new @@ 290s40pyalavf195[External]
MONO_ITEM fn std::iter::Map::<std::slice::Iter<std::string::String>, [closure@src/main.rs:2:50: 2:55]>::new @@ 290s40pyalavf195[External]
(I don't know why, but there weren't any mono items in the output for the closure itself in this example.)
If we actually did anything interesting with the Map
s, it would create yet more duplicate instantiations. And if the closure captured anything by value that needed dropping, it would create duplicate drop_in_place
instantiations for the closure and Map
types.
Example 2
This isn't a rare event. Here's an example from the standard library which was quickly and easily found. This case is different, because the parent generics come from the struct for which the function is implemented:
impl<K, V, S> PartialEq for HashMap<K, V, S>
where
K: Eq + Hash,
V: PartialEq,
S: BuildHasher,
{
fn eq(&self, other: &HashMap<K, V, S>) -> bool {
if self.len() != other.len() {
return false;
}
self.iter().all(|(key, value)| other.get(key).map_or(false, |v| *value == *v))
}
}
The closure passed to map_or
really only depends on V
, yet it has false dependencies on K
and S
. Creating two HashMap
s with the same value type will cause duplicate mono items to be created for this closure and map_or
:
use std::collections::HashMap;
fn main() {
let m1 = HashMap::<String, u8>::new();
assert_eq!(m1, m1);
let m2 = HashMap::<i32, u8>::new();
assert_eq!(m2, m2);
}
Excerpt from -Z print-mono-items=lazy
:
MONO_ITEM fn <std::collections::HashMap<i32, u8> as std::cmp::PartialEq>::eq::{closure#0}::{closure#0} @@ i6hou2d2a3r4xaf[External]
MONO_ITEM fn <std::collections::HashMap<std::string::String, u8> as std::cmp::PartialEq>::eq::{closure#0}::{closure#0} @@ i6hou2d2a3r4xaf[External]
MONO_ITEM fn std::option::Option::<&u8>::map_or::<bool, [closure@<std::collections::HashMap<i32, u8> as std::cmp::PartialEq>::eq::{closure#0}::{closure#0}]> @@ 583fmf0zweyhx5vp[External]
MONO_ITEM fn std::option::Option::<&u8>::map_or::<bool, [closure@<std::collections::HashMap<std::string::String, u8> as std::cmp::PartialEq>::eq::{closure#0}::{closure#0}]> @@ 583fmf0zweyhx5vp[External]
Impact
Okay, so there are some duplicate mono items. Maybe even a lot in a real codebase. But is it impactful? I haven't measured, but I suspect it is. Take a look at the Rust and assembly for Option::map_or
(for the above closure) in debug mode, and consider the above example:
pub fn map_or<U, F: FnOnce(T) -> U>(self, default: U, f: F) -> U {
match self {
Some(t) => f(t),
None => default,
}
}
Assembly
<<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>>:
sub $0x58,%rsp
mov %rdi,0x18(%rsp)
mov %sil,%al
and $0x1,%al
mov %al,0x37(%rsp)
mov %rdx,0x38(%rsp)
movb $0x0,0x36(%rsp)
movb $0x0,0x35(%rsp)
movb $0x1,0x36(%rsp)
movb $0x1,0x35(%rsp)
mov 0x18(%rsp),%rcx
test %rcx,%rcx
setne %al
movzbl %al,%r8d
mov %r8d,%ecx
mov %rdx,0x10(%rsp)
mov %sil,0xf(%rsp)
/----- je <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x4d>
| /-- jmp <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x4b>
/--|--\-X jmp <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x60>
| \----> movb $0x0,0x36(%rsp)
| mov 0xf(%rsp),%al
| and $0x1,%al
| mov %al,0x27(%rsp)
/-----|-------- jmp <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xac>
| | ud2
| \-------> mov 0x18(%rsp),%rax
| mov %rax,0x40(%rsp)
| movb $0x0,0x35(%rsp)
| mov %rax,0x28(%rsp)
| mov 0x28(%rsp),%rsi
| mov 0x10(%rsp),%rdi
| callq <<std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>
| mov %al,0xe(%rsp)
| /-- jmp <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x89>
| \-> mov 0xe(%rsp),%al
| and $0x1,%al
| mov %al,0x27(%rsp)
+-------------- jmp <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xac>
| /----------> testb $0x1,0x36(%rsp)
| | /-------- jne <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xb5>
| | | /----> mov 0x27(%rsp),%al
| | | | and $0x1,%al
| | | | movzbl %al,%eax
| | | | add $0x58,%rsp
| | | | retq
| +--|--|--/-X jmp <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x95>
\--|--|--|--|-> testb $0x1,0x35(%rsp)
| | | \-- jne <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xaa>
\--|--|----- jmp <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x95>
\--\----X jmp <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0x9c>
/--/-X jmp <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xc0>
/--|--|-> testb $0x1,0x36(%rsp)
| | \-- jne <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xb7>
| \----> mov 0x48(%rsp),%rdi
| callq <_Unwind_Resume@plt>
| ud2
| mov %rax,0x48(%rsp)
| mov %edx,0x50(%rsp)
\-------- jmp <<core::option::Option<&u8>>::map_or::<bool, <std::collections::hash::map::HashMap<alloc::string::String, u8> as core::cmp::PartialEq>::eq::{closure#0}::{closure#0}>+0xb9>
nopw 0x0(%rax,%rax,1)
(Wow, that's surprisingly complicated.)
In debug mode, this code is included in the binary twice, once for each closure. In release mode, the code may be optimized out, but it's still processed for some part of compilation. This was a trivial example. Imagine if map_or
was a complicated function!
If there are hundreds or thousands of duplicate mono items in moderate to large codebases, which seems quite possible, then this may have a noticeable impact on compile times and binary sizes.
Solution
If possible, we should make closure types depend only on the parent generics which they actually use.
The ClosureSubsts
comment explains why parent generics are included:
/// All right, you say, but why include the type parameters from the
/// original function then? The answer is that codegen may need them
/// when monomorphizing, and they may not appear in the upvars. A
/// closure could capture no variables but still make use of some
/// in-scope type parameter with a bound (e.g., if our example above
/// had an extra `U: Default`, and the closure called `U::default()`).
///
/// There is another reason. This design (implicitly) prohibits
/// closures from capturing themselves (except via a trait
/// object). This simplifies closure inference considerably, since it
/// means that when we infer the kind of a closure or its upvars, we
/// don't have to handle cycles where the decisions we make for
/// closure C wind up influencing the decisions we ought to make for
/// closure C (which would then require fixed point iteration to
/// handle). Plus it fixes an ICE. :P
See #27087, where this was introduced. #27086 may also be relevant. Here and here are FIXMEs for that issue, later removed.
I have a feeling the case where the parent generics come from a generic struct on which the function including the closure is defined (example 2) may be harder to solve.
@rustbot label T-compiler A-typesystem I-compiletime I-heavy