diff --git a/text/3519-arbitrary-self-types-v2.md b/text/3519-arbitrary-self-types-v2.md
new file mode 100644
index 00000000000..6ffc600ba12
--- /dev/null
+++ b/text/3519-arbitrary-self-types-v2.md
@@ -0,0 +1,632 @@
- Feature Name: `arbitrary_self_types`
- Start Date: 2023-05-04
- RFC PR: [rust-lang/rfcs#3519](https://github.com/rust-lang/rfcs/pull/3519)
- Tracking Issue: [rust-lang/rust#44874](https://github.com/rust-lang/rust/issues/44874)

# Summary
[summary]: #summary

Allow types that implement the new `trait Receiver<Target=Self>` to be the receiver of a method.

# Motivation
[motivation]: #motivation

Today, methods can only be received by value, by reference, or by one of a few blessed smart pointer types from `core`, `alloc` and `std` (`Arc<Self>`, `Box<Self>`, `Pin<P>` and `Rc<Self>`).

It's been assumed that this will eventually be generalized to support any smart pointer, such as a `CustomPtr<Self>`. Since late 2017, it has been available on nightly under the `arbitrary_self_types` feature for types that implement `Deref<Target=Self>` and for raw pointers.

This RFC proposes some changes to the existing nightly feature based on the experience gained, with a view towards stabilizing the feature in the relatively near future.

## Motivation for the arbitrary self types feature overall

The Rust async work identified a need to allow `self` types of `Pin<&mut Self>` (and similar). At that time, certain types (`Pin`, `Rc`, `Box` etc.) became hard-coded in stable Rust as valid `self` types. That's been sufficient for many use-cases including async Rust, but this special power is currently restricted to these hard-coded types.

Since then, other use-cases have become clear where crates need to make their own smart pointer types with similar powers.

One use-case is cross-language interop (JavaScript, Python, C++). In many cases, automatic code generation tools need to represent foreign language pointers or references somehow in Rust, and often, we want to call methods on such types. But other languages' references can’t guarantee the aliasing and exclusivity semantics required of a Rust reference. For example, the C++ `this` pointer can't be practically or safely represented as a Rust reference because C++ may retain other pointers to the data and it might mutate at any time.

What is a code generator to do? Its options in current stable Rust are poor:

* It can represent foreign pointers/references as `&T`, with a virtual certainty of undefined behavior due to different guarantees in different languages.
* It can represent foreign pointers/references as `*const T` or `*mut T` but can't attach methods.
* It can represent foreign pointers/references as a smart pointer type (`CppRef<T>` or `CppPtr<T>`) but can't attach methods.

With "arbitrary self types", smart pointer types can be created which obey foreign-language semantics and yet allow method calls:

```rust
use core::ops::Receiver;

#[repr(transparent)]
#[derive(Clone)]
/// A C++ reference. Obeys C++ reference semantics, not Rust reference semantics.
/// There is no exclusivity; the underlying data may mutate, etc.
/// (This is an abridged example: a real CppRef type would fully document invariants
/// here.)
pub struct CppRef<T> {
    ptr: *const T,
}

impl<T> Receiver for CppRef<T> {
    type Target = T;
}

// generated by bindings generator
struct ConcreteCppType {
    // ...
}

// all generated by bindings generator; mostly calls into C++
// In this example these are not marked "unsafe" because we do not directly use
// CppRef::ptr in Rust. This example assumes that the corresponding C++ functions
// do not themselves have unsafe behavior and thus can be presented to Rust as safe.
// Safety of FFI is orthogonal to this RFC.
impl ConcreteCppType {
    fn some_cpp_method(self: CppRef<Self>) {}
    fn get_int_field(self: &CppRef<Self>) -> u32 { todo!("call into C++") }
    // `FieldType` stands for some other generated type (not shown).
    fn get_more_complex_field(self: &CppRef<Self>) -> CppRef<FieldType> { todo!("call into C++") }
    fn equals(self: &CppRef<Self>, other: &CppRef<Self>) -> bool { todo!("call into C++") }
}

// generated by bindings generator
fn get_cpp_reference() -> CppRef<ConcreteCppType> {
    // also calls into C++
    todo!()
}

fn main() {
    // Rust code manipulating C++ objects via C++-semantics references
    let cpp_obj_reference: CppRef<ConcreteCppType> = get_cpp_reference();
    // cpp_obj_reference does not obey Rust reference semantics. Other
    // "references" to the same data may exist in the Rust or C++ domain.
    // But it can effectively be used as an opaque token to pass safely
    // through Rust back into C++
    let some_value: u32 = cpp_obj_reference.get_int_field();
    let some_field = cpp_obj_reference.get_more_complex_field();
    cpp_obj_reference.equals(&get_cpp_reference());
}
```

(A fuller example is [here](https://github.com/google/autocxx/blob/main/src/reference_wrapper.rs#L117), with various [trait-based attempts](#not-do-it) to work around the lack of arbitrary self types.)

Another case is when the existence of a reference is, itself, semantically important: for example, reference counting, or if relayout of a UI should occur each time a mutable reference ceases to exist. In these cases it's not OK to allow a regular Rust reference to exist, and yet sometimes we still want to be able to call methods on a reference-like thing.

A third motivation is that taking smart pointer types as `self` parameters can enable functions to act on the smart pointer type, not just the underlying data. For example, taking `&Arc<Self>` allows the functions both to clone the smart pointer (noting that the underlying `T` might not implement `Clone`) and to access the data inside the type, which is useful for some methods; this also makes it ergonomic in more cases to make `Arc` explicit rather than having `SomeType` contain an `Arc` internally and have `Arc`-like `clone` semantics. Also, a method can be changed from accepting `&self` to `self: &Arc<Self>` in a mostly frictionless way, whereas changing from `&self` to a static method accepting `&Arc<Self>` will always require some amount of refactoring. These options are currently open only to Rust's built-in smart pointer types, not to custom smart pointer types.

Finally, there's a matter of symmetry with Rust's own smart pointer types. [The Rust for Linux project, for instance, requires a custom `Arc` type](https://rust-for-linux.com/arc-in-the-linux-kernel#arbitrary-self-types). In theory, users can define their own smart pointers. In practice, they're second-class citizens compared to the smart pointers in Rust's standard library. A type `T` can accept method calls using smart pointers as the `self` type only if they're one of Rust's built-in smart pointers.

This RFC proposes to loosen this restriction to allow custom smart pointer types to be accepted as a `self` type just like the standard library types.

See also [this blog post](https://medium.com/@adetaylor/the-case-for-stabilizing-arbitrary-self-types-b07bab22bb45), especially for a list of more specific use-cases.

## Motivation for the v2 changes

Unstable Rust contains an implementation of arbitrary self types based around the `Deref` trait. Naturally, that trait also provides a means to create a `&T`. Example:

```rust
#![feature(arbitrary_self_types)]

use std::ops::Deref;

struct SmartPtr<T>(*const T);

impl<T> Deref for SmartPtr<T> {
    type Target = T;
    fn deref(&self) -> &Self::Target {
        // never called, but smart pointers need to implement this method
        // sometimes it's just not safe to create a reference to self.0
        unimplemented!()
    }
}

struct ConcreteType;

impl ConcreteType {
    fn some_method(self: SmartPtr<Self>) {

    }
}

fn main() {
    let concrete: SmartPtr<ConcreteType> = ...;
    concrete.some_method();
}
```

This works well for some smart pointer types where it's OK to create `&T` (but not necessarily `&mut T`). This includes `Pin` and the reference-counted pointers. For that reason, the original arbitrary self types feature could be based around `Deref`. But in other smart pointer use-cases (especially those relating to foreign language semantics) it's not OK to create even `&T`.

The arbitrary self types feature should be enhanced so it works even when we can't allow `&T`. As noted above, that's most commonly because of semantic differences to pointers in other languages, but it might be because references have special meaning or behavior in some pure Rust domain. Either way, it may not be OK to create a Rust reference `&T`, yet we may want to allow methods to be called on some reference-like thing.

For this reason, implementing `Deref::deref` is problematic for many of the likely users of this "arbitrary self types" feature.

If you're implementing a smart pointer `P<T>`, and you need to allow `impl T { fn method(self: P<T>) { ... }}`, yet you can't allow a reference `&T` to exist, every option for implementing `Deref::deref` has drawbacks:

* Specify `Deref::Target=T` and panic in `Deref::deref`. Not good.
* Specify `Deref::Target=*const T`. This is only possible if your smart pointer type contains a `*const T` which you can reference - this isn't the case for (for instance) weak pointers or types containing `NonNull`.

Therefore, Arbitrary Self Types v2 provides a separate `Receiver` trait, so that there's no need to provide an awkward `Deref::deref` implementation.

This v2 version has two other differences relative to the existing unstable `arbitrary_self_types` feature:

* We won't allow raw pointer receivers, yet. It's highly desirable that we do so in future - this is discussed under the [enable for pointers](#enable-for-pointers) section.
* We will block generic receivers. See the [diagnostics section for reasoning](#diagnostics).

Aside from these differences, Arbitrary Self Types v2 is similar to the existing unstable `arbitrary_self_types` feature.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

When declaring a method, users can also declare the type of the `self` receiver to be any type `T` where `T: Receiver<Target = Self>`, in addition to using `Self` by value or reference.

The `Receiver` trait is simple and only requires specifying the `Target` type:

```rust
trait Receiver {
    type Target: ?Sized;
}
```

The `Receiver` trait is already implemented for many standard library types:
- smart pointers in the standard library: `Rc<Self>`, `Arc<Self>`, `Box<Self>`, and `Pin<Box<Self>>` (and in fact, any type which implements `Deref`)
- references: `&Self` and `&mut Self`

Shorthand syntax exists for the by-value and by-reference forms: `self` with no ascription is of type `Self`, `&self` is of type `&Self` and `&mut self` is of type `&mut Self`.

All of the following self types are valid:

```rust
use core::ops::Receiver;
use std::rc::Rc;

struct Foo;

impl Foo {
    fn by_value(self /* self: Self */) {}
    fn by_ref(&self /* self: &Self */) {}
    fn by_ref_mut(&mut self /* self: &mut Self */) {}
    fn by_box(self: Box<Self>) {}
    fn by_rc(self: Rc<Self>) {}
    fn by_custom_ptr(self: CustomPtr<Self>) {}
}

struct CustomPtr<T>(*const T);

impl<T> Receiver for CustomPtr<T> {
    type Target = T;
}
```

## Recursive arbitrary receivers

Receivers are recursive and therefore allowed to be nested. If type `T` implements `Receiver<Target=U>`, and type `U` implements `Receiver<Target=Self>`, `T` is a valid receiver (and so on outward). This is the behavior for the current special-cased self types (`Pin`, `Box` etc.), so as we remove the special-casing, we need to retain this property.

For example, this self type is valid:

```rust
impl MyType {
    fn by_rc_to_box(self: Rc<Box<Self>>) { ... }
}
```

The Rust language doesn't provide a way for user code to use this recursive property in generics or iteration, so this trait is unlikely to be useful except to the compiler. Nevertheless, we don't intend to _prevent_ use of the `Receiver` trait by user code: the same recursive property applies to `Deref`, yet it has occasionally been useful to [introduce `Deref` bounds](https://doc.rust-lang.org/std/pin/struct.Pin.html#method.new_unchecked).
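
Today's stable compiler already accepts nested forms of the built-in receiver types, so the recursive property can be observed without the new trait. A minimal sketch, assuming only stable Rust; `MyType` and its `value` field are illustrative, mirroring the example above:

```rust
use std::rc::Rc;

// Illustrative type; `value` exists only so the method has something to return.
struct MyType {
    value: u32,
}

impl MyType {
    // `Rc<Box<Self>>` is a nested receiver: `Rc` targets `Box<Self>`,
    // which in turn targets `Self`. Stable Rust accepts this chain of
    // built-in receiver types.
    fn by_rc_to_box(self: Rc<Box<Self>>) -> u32 {
        // Field access auto-derefs through both layers.
        self.value
    }
}

fn main() {
    let nested: Rc<Box<MyType>> = Rc::new(Box::new(MyType { value: 7 }));
    // Method lookup finds `by_rc_to_box` in `impl MyType`, two steps
    // down the chain from the receiver expression's type.
    println!("{}", nested.by_rc_to_box());
}
```

Under this RFC the same nesting would extend to any user type implementing `Receiver`, not just `Rc`, `Arc`, `Box` and `Pin`.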

## Implementing methods on smart pointers

If your smart pointer type implements `Receiver`, you should not add methods to that smart pointer type after its initial creation. As soon as anyone is using your smart pointer type outside of your crate, they may add methods on a contained type; for example:

```rust
impl SomeType {
    fn do_something(self: your_crate::SmartPointer<Self>) {}
}
```

If you then add `SmartPointer::do_something`, this is a conflict, and the compiler will produce an error. It's therefore considered to be a compatibility break to add additional methods to `your_crate::SmartPointer`. It's OK to add methods at the outset when you create `SmartPointer`, until the point at which other people start using it.

This principle has been followed for the types in Rust's standard library which implement `Receiver`, for instance `Box` and `Rc`: mostly they offer associated functions rather than methods.

In the future there might be a deshadowing algorithm that can relax this rule - see the [method shadowing section below](#method-shadowing) for discussion.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

## `core` libs changes

The `Receiver` trait is made public (removing its `#[doc(hidden)]` attribute), exposing it under `core::ops`. It gains a `Target` associated type.

This trait marks types that can be used as receivers other than the `Self` type of an impl or trait definition.

```rust
pub trait Receiver {
    type Target: ?Sized;
}
```

A blanket implementation is provided for any type that implements `Deref`:

```rust
impl<P: ?Sized> Receiver for P
where
    P: Deref,
{
    type Target = <P as Deref>::Target;
}
```

(See [alternatives](#no-blanket-implementation-for-deref) for discussion of the tradeoffs here.)

It is also implemented for `&T` and `&mut T`.

## Compiler changes: method probing

The existing Rust [reference section for method calls describes the algorithm for assembling method call candidates](https://doc.rust-lang.org/reference/expressions/method-call-expr.html), and there's more detail in the [rustc dev guide](https://rustc-dev-guide.rust-lang.org/method-lookup.html).

The key part of the first page is this:

> The first step is to build a list of **candidate receiver types**. Obtain these by repeatedly dereferencing the receiver expression's type, adding each type encountered to the list, then finally attempting an unsized coercion at the end, and adding the result type if that is successful. Then, for each candidate `T`, add `&T` and `&mut T` to the list immediately after `T`.

> Then, for each candidate type `T`, search for a visible method with a receiver of that type in the following places:
> - `T`'s inherent methods (methods implemented directly on `T`).
> - Any of the methods provided by a visible trait implemented by `T`.

We'll call this second list the **candidate methods**.

With this RFC, the candidate receiver types are assembled the same way - nothing changes. But the **candidate methods** are assembled in a different way. Specifically, instead of iterating the candidate receiver types, we assemble a new list of types by following the chain of `Receiver` implementations. As `Receiver` is implemented for all types that implement `Deref`, this may be the same list or a longer list. Aside from following a different trait, the list is assembled the same way, including the insertion of equivalent reference types.

We then search each type for inherent methods or trait methods in the existing fashion - the only change is that we search a potentially longer list of types.
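
The two lists can be modelled with a toy program in which types are plain strings and trait implementations are lookup tables. This is a hypothetical sketch, not rustc's implementation: `Ty`, `chain_with_refs` and `MAX_STEPS` are invented names, and the final unsized-coercion step and real trait solving are left out. The same walk builds both lists; only the table it follows differs.

```rust
use std::collections::HashMap;

/// Toy stand-in for a type: just its name.
type Ty = &'static str;

/// Hypothetical step limit, playing the role of rustc's recursion limit so
/// that a cyclic table cannot loop forever.
const MAX_STEPS: usize = 16;

/// Walk a "target" table (modelling `Deref::Target` or `Receiver::Target`)
/// starting at `start`, recording each type followed by its `&` and `&mut`
/// forms. The unsized-coercion step at the end is not modelled.
fn chain_with_refs(start: Ty, target_of: &HashMap<Ty, Ty>) -> Vec<String> {
    let mut out = Vec::new();
    let mut current = Some(start);
    for _ in 0..MAX_STEPS {
        let Some(ty) = current else { break };
        out.push(ty.to_string());
        out.push(format!("&{ty}"));
        out.push(format!("&mut {ty}"));
        current = target_of.get(ty).copied();
    }
    out
}

fn main() {
    // `SmartPtr<SomeStruct>` implements `Receiver` but not `Deref`.
    // (With the blanket impl, every `Deref` entry would also be a `Receiver` entry.)
    let deref_target: HashMap<Ty, Ty> = HashMap::new();
    let receiver_target: HashMap<Ty, Ty> =
        HashMap::from([("SmartPtr<SomeStruct>", "SomeStruct")]);

    // Candidate receiver types: still built from the `Deref` chain (unchanged).
    let receivers = chain_with_refs("SmartPtr<SomeStruct>", &deref_target);
    // Types whose impls are searched for methods: built from the `Receiver` chain.
    let search_in = chain_with_refs("SmartPtr<SomeStruct>", &receiver_target);

    // `impl SomeStruct { fn some_method(self: SmartPtr<Self>) }` can now be found:
    // `SomeStruct` is searched, and `SmartPtr<SomeStruct>` is a candidate receiver.
    assert!(search_in.iter().any(|t| t == "SomeStruct"));
    assert!(receivers.iter().any(|t| t == "SmartPtr<SomeStruct>"));
    println!("candidate receiver types: {receivers:?}");
    println!("types searched for methods: {search_in:?}");
}
```

Note that the candidate receiver list stays at three entries while the searched list grows to six, which is the whole of the change described above.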

It's particularly important to emphasize that the list of candidate receiver types _does not change_. Only the set of locations searched for methods with those receiver types becomes wider.

For instance, suppose `SmartPtr` implements `Receiver` but not `Deref`. Imagine you have `let t: SmartPtr<SomeStruct> = /* obtain */; t.some_method();`. We will now search `impl SomeStruct {}` blocks for an implementation of `fn some_method(self: SmartPtr<SomeStruct>)`, `fn some_method(self: &SmartPtr<SomeStruct>)`, etc. The possible self types in the method call expression are unchanged - they're still obtained by searching the `Deref` chain for `t` - but we'll look in more places for methods with those valid `self` types.

## Compiler changes: deshadowing
[compiler-changes-deshadowing]: #compiler-changes-deshadowing

The major functional change to the compiler is described above, but a couple of extra adjustments are necessary to avoid future compatibility breaks by method shadowing.

Specifically, the reference page on method calls also states:

> If this results in multiple possible candidates, then it is an error, and the receiver must be converted to an appropriate receiver type to make the method call.

With arbitrary self types v2, the compiler will actively search for additional conflicts in order to produce this error in more cases. Specifically, it will consider whether autoreffed candidates conflict with by-value candidates, in order to produce an error in situations like this:

```rust
struct Foo;
struct SmartPtr<T>(T); // implements Receiver

impl<T> SmartPtr<T> {
    fn a(&self) {} // by reference
}

impl Foo {
    fn a(self: SmartPtr<Self>) {} // by value
}

fn main() {
    let a = SmartPtr(Foo);
    a.a(); // produces an error
}
```

To be precise, the compiler will:
* Search for the best by-value pick
* Search for the best autoreffed pick
* Search for the best autorefmut pick
* For each pair from the above list, consider the first to be the 'shadowing' pick and the second to be the 'shadowed' pick. Show an error if:
  * The same number of autoderefs has been applied (confirming the `self` type is identical, aside from any autoreffing)
  * One is further along the chain of `Receiver` than the other (confirming that it's arbitrary self types causing the conflict)
  * The shadowing pick is an inherent impl (we are concerned about the case that a smart pointer is adding inherent methods shadowing inner types, not cases where traits bring further methods into play)
  * The picks don't refer to the same resulting item (which could happen with things like blanket impls for any type)
* Otherwise, choose the pick in order of by-value, autoreffed, autorefmut, or const ptr, as the compiler does now.

Aside from production of errors in more cases, there is no change to method picking here. That said, producing these errors requires us to interrogate more candidates to look for potential conflicts, so this could have a compile-time performance penalty which we should measure.

(The current reference doesn't describe it, but the current algorithm also searches for method receivers of type `*const Self` and handles them explicitly in case the receiver type was `*mut Self`. We do not check for cases where a new `self: *mut Self` method on an outer type might shadow an existing `self: *const SomePtr<Self>` method on an inner type. Although this is a theoretical risk, such compatibility breaks should be easy to avoid because `self: *mut Self` methods are rare. It's not readily possible to produce errors in these cases, because we already intentionally shadow `*const::cast` with `*mut::cast`.)

## Object safety

Receivers are object safe if they implement the (unstable) `core::ops::DispatchFromDyn` trait.

Because not all receivers want to permit object safety, or are able to support it, object safety should remain encoded in a trait separate from the `Receiver` trait proposed here, most likely `DispatchFromDyn`.

This RFC does not propose any changes to `DispatchFromDyn`. Since `DispatchFromDyn` is unstable at the moment, object-safe receivers might be delayed until `DispatchFromDyn` is stabilized. `Receiver` is not blocked on further `DispatchFromDyn` work, since non-object-safe receivers already cover a big chunk of the use-cases.

It's been proposed that a `#[derive(SmartPointer)]` mechanism may be stabilized instead of `DispatchFromDyn`. Again, this doesn't block our work on `Receiver`. There are some use cases for `Receiver` that suit neither `DispatchFromDyn` nor `#[derive(SmartPointer)]`, most notably the [Rust for Linux `Wrapper` type described here](https://rust-for-linux.com/arc-in-the-linux-kernel#nextprev-pointers-and-dynamic-dispatch).

## Lifetime elision

Arbitrary `self` parameters may involve lifetimes.

Even in existing stable Rust, there are [bugs in lifetime elision for complex `self` types such as `&Box<Self>`](https://github.com/rust-lang/rust/issues/117715). We're aiming to fix them whether or not this RFC is accepted. The net rules will be:

* If a parameter is the first parameter, and
* Called `self`, and
* Its type involves `Self` anywhere, and
* Its type contains _exactly one_ lifetime anywhere

then that lifetime may be used to elide lifetimes on return types, and will take precedence over any lifetimes in other parameters.

If this seems wrong, please discuss this over on [the linked bug](https://github.com/rust-lang/rust/issues/117715) rather than here in this RFC, because none of that should change with this RFC (though it does make it more likely users will run into the current inconsistencies). We'll try to keep this RFC up to date with the outcome of those discussions.

## Diagnostics
[diagnostics]: #diagnostics

The existing branches in the compiler for "arbitrary self types" already emit excellent diagnostics. We will largely re-use them, with the following improvements:

- In the case where a self type is invalid because it doesn't implement `Receiver`, the existing error message will be updated.
- An easy mistake is to implement `Receiver` for `P<T>`, forgetting to specify `T: ?Sized`. `P<Self>` then only works as a `self` parameter in traits `where Self: Sized`, an unusual stipulation. It's not obvious that `Sized`ness is the problem here, so we will identify this case specifically and produce an error giving that hint.
- There are certain types which feel like they "should" implement `Receiver` but do not: `Weak<T>` and `NonNull<T>`. If these are encountered as a self type, we should produce a specific diagnostic explaining that they do not implement `Receiver` and suggesting that they could be wrapped in a newtype wrapper if method calls are important. We hope this can be achieved with [diagnostic items](https://rustc-dev-guide.rust-lang.org/diagnostics/diagnostic-items.html).
- The current unstable arbitrary self types feature allows generic receivers. For instance,
  ```rust
  impl Foo {
      fn a<R: Receiver<Target=Self>>(self: R) { }
  }
  ```
  We don't know a use-case for this, and there are several cases where it can result in misleading diagnostics: for instance, when such a method is called with an incorrect type argument such as `smart_ptr.a::<&Foo>()` instead of `smart_ptr.a::<Foo>()`. We could attempt to find and fix all those cases. However, we feel that generic receiver types might risk subtle interactions with method resolution and other parts of the language. We think it is a safer choice to generate an error on any declaration of a generic `self` type.
- As noted in [the section about compiler changes for deshadowing](#compiler-changes-deshadowing), we will produce a "multiple method candidates" error if a method in an inner type is chosen in preference to a method in an outer type ("inner" = further along the `Receiver` chain) and the inner type is either `self: &T` or `self: &mut T` and we're choosing it in preference to `self: T` or `self: &T` in the outer type.

# Drawbacks
[drawbacks]: #drawbacks

Why should we *not* do this?

- Deref coercions can already be confusing and unexpected. Adding a new `Receiver` trait could cause similar confusion.
- Custom smart pointers are a niche use case (but they're very important for cross-language interoperability).

## Method shadowing
[method-shadowing]: #method-shadowing

For a smart pointer `P<T>` that implements `Deref<Target=T>`, a method call `p.m()` might call a method `P::m` on the smart pointer type itself, or it might call `T::m`. If both methods are declared, then depending on their receiver types either `P::m` silently shadows `T::m` or the call is rejected as ambiguous.

Rust standard library smart pointers are designed with this shadowing behavior in mind:

* `Box`, `Pin`, `Rc` and `Arc` heavily use associated functions rather than methods.
* Where they use methods, it's often with the _intention_ of shadowing a method in the inner type (e.g. `Arc::clone`).

Furthermore, the `Deref` trait itself [documents this possible compatibility hazard](https://doc.rust-lang.org/nightly/std/ops/trait.Deref.html#when-to-implement-deref-or-derefmut), and the Rust API Guidelines has [a guideline about avoiding inherent methods on smart pointers](https://rust-lang.github.io/api-guidelines/predictability.html#smart-pointers-do-not-add-inherent-methods-c-smart-ptr).

This RFC does not make things worse for types that implement `Deref`.

_However_, this RFC allows types to implement `Receiver`, which runs the risk of breakage:

```rust
struct Concrete;

impl Concrete {
    fn wardrobe(self: SmartPointerWhichImplementsReceiver<Self>) { }
}

fn main() {
    let concrete: SmartPointerWhichImplementsReceiver<Concrete> = /* obtain */;
    concrete.wardrobe()
}
```

If `SmartPointerWhichImplementsReceiver` now adds `SmartPointerWhichImplementsReceiver::wardrobe(self)`, the above valid code would start to error.

The same would apply in this slightly different circumstance:

```rust
struct Concrete;

impl Concrete {
    fn wardrobe(self: &SmartPointerWhichImplementsReceiver<Self>) { } // this is now a reference
}

fn main() {
    let concrete: SmartPointerWhichImplementsReceiver<Concrete> = /* obtain */;
    concrete.wardrobe()
}
```

If `SmartPointerWhichImplementsReceiver` added `SmartPointerWhichImplementsReceiver::wardrobe(&self)`, we would start to produce an error here. If `SmartPointerWhichImplementsReceiver` added `SmartPointerWhichImplementsReceiver::wardrobe(self)` then it would be even worse - code would start to call `SmartPointerWhichImplementsReceiver::wardrobe` where it had previously called `Concrete::wardrobe`.

The [deshadowing section of the compiler changes](#compiler-changes-deshadowing) describes how we avoid this. The compiler will take pains to identify any such ambiguities and it will show an error.

We have (extensively) considered algorithms to pick the intended method instead - see [picking shadowed methods instead of erroring](#pick-shadowed-methods-instead-of-erroring), below.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

As this feature has been cooking since 2017, many alternative implementations have been discussed.

## Deref-based
[deref-based]: #deref-based

As noted in the [motivation for the v2 changes](#motivation-for-the-v2-changes), the current nightly implementation implements arbitrary self types using the `Deref` trait.

## No blanket implementation for `Deref`
[no-blanket-implementation]: #no-blanket-implementation

Another major approach previously discussed is to have a `Receiver` trait, as proposed in this RFC, but without a blanket implementation for `T: Deref`. Blanket implementations are unusual for core Rust traits, but the authors of this RFC believe one is necessary in this case.

This is because this RFC proposes that the existing method search algorithm is modified to search the `Receiver` chain _instead of_ the `Deref` chain.

It would therefore be a major compatibility break if existing `Deref` implementors ceased to be usable as `self` parameters. Just in the standard library, we'd have to add `Receiver` implementations for `Cow`, `Ref`, `ManuallyDrop` and possibly many other existing implementors of `Deref`; third-party libraries would have to do the same. Without that, method calls on these types would not be possible:

```rust
use std::cell::RefCell;

fn main() {
    let ref_cell = RefCell::new(/* something cloneable */);
    ref_cell.borrow().clone(); // no longer possible if:
                               // 1) we cease to explore Deref in identifying method candidates
                               // 2) Ref doesn't implement Receiver.
}
```

This doesn't just break people previously using the unstable Rust `arbitrary_self_types` feature; it breaks stable Rust usages as well. Obviously this is not acceptable, so we believe the blanket implementation is necessary.

In any case, we think a blanket implementation is desirable:

* It prevents `Deref` and `Receiver` having different `Target`s. That could possibly lead to confusion if it prompted the compiler to explore different chains for these two different purposes.
* If a smart pointer type `P<T>` lives in another crate, users who create a `P<MyConcreteType>` will be able to use it as a `self` type for `MyConcreteType` without waiting for a new release of the `P` crate.
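
Both points can be checked with a small model. This sketch is hypothetical: it uses a local trait `MyReceiver` in place of the proposed `core::ops::Receiver` so that it builds on stable Rust, and `receiver_target` is an invented helper. It shows that existing `Deref` types pick up a receiver `Target` equal to their `Deref::Target` with no new impls, and that a pointer type without `Deref` can still opt in, since coherence accepts an impl for a local type that has no `Deref` impl.

```rust
use std::any::TypeId;
use std::ops::Deref;
use std::rc::Rc;

/// Hypothetical local stand-in for the proposed `core::ops::Receiver`.
trait MyReceiver {
    type Target: ?Sized;
}

/// Blanket impl: every `Deref` type is a receiver, and its receiver
/// `Target` is by construction the same as its `Deref::Target`.
impl<P: ?Sized + Deref> MyReceiver for P {
    type Target = <P as Deref>::Target;
}

/// A pointer that deliberately does not implement `Deref`, because handing
/// out `&T` might not be sound for it.
#[allow(dead_code)]
struct CustomPtr<T: ?Sized>(*const T);

/// Accepted by coherence: `CustomPtr` is local and has no `Deref` impl,
/// so this cannot overlap with the blanket impl.
impl<T: ?Sized> MyReceiver for CustomPtr<T> {
    type Target = T;
}

/// Identify the `Target` that `MyReceiver` assigns to `P`.
fn receiver_target<P: ?Sized + MyReceiver>() -> TypeId
where
    P::Target: 'static,
{
    TypeId::of::<P::Target>()
}

fn main() {
    // Existing `Deref` types get a receiver `Target` for free...
    assert_eq!(receiver_target::<Rc<String>>(), TypeId::of::<String>());
    assert_eq!(receiver_target::<Box<str>>(), TypeId::of::<str>());
    // ...and a non-`Deref` smart pointer can still opt in.
    assert_eq!(receiver_target::<CustomPtr<u32>>(), TypeId::of::<u32>());
}
```

Because the blanket impl derives `Target` from `Deref::Target`, there is no way in this model to give a `Deref` type a different receiver target, which is the first property listed above.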
+ +We found that [some crates use `Deref` to express an is-a not a has-a relationship](https://gist.github.com/davidhewitt/d0ed031fb05f6db98ee249ae089b268e) and so, ideally, might have preferred the option of setting up `Deref` and `self` candidacy separately. But, on discussion, we concluded that traits would be a better way to model those relationships. + +## Explore both `Receiver` and `Deref` chains while identifying method candidates + +We could modify the method search algorithm to explore both `Deref` and `Receiver` targets when identifying method candidates. This would avoid breaking compatibility, yet would give the desired flexibility for folks who wish to implement `Receiver` but not `Deref`. + +We don't think this is such a good option because: + +* It's more confusing for users; +* It could lead to a worst-case O(n^2) number of method candidates to explore (though possibly this could be limited to O(2n) if we added restrictions); +* It's a more invasive change to the compiler; +* We don't know of any use-cases which the `Receiver` and blanket implementation for `Deref` do not allow. + +If some use-case presents itself where a type _must_ implement `Deref` but not `Receiver`; or a use-case presents itself where `Deref` and `Receiver` _must_ have different `Target`s then we will have to consider this more complex option. + +## Generic parameter + +Change the trait definition to have a generic parameter instead of an associated type. There might be permutations here which could allow a single smart pointer type to dispatch method calls to multiple possible receivers - but this would add complexity, no known use case exists, and it might cause worst-case O(n^2) performance on method lookup. + +## Enable for raw pointers (or `Weak` or `NonNull`) +[enable-for-pointers]: #enable-for-pointers + +This RFC, unlike the original Arbitrary Self Types nightly feature, does not allow raw pointer `self` types. 
We are led to believe that raw pointer receivers are quite important for the future of safe Rust, because stacked borrows makes it illegal to materialize references in many positions, and there are a lot of operations (like going from a raw pointer to a raw pointer to a field) where users don't need to or want to do that. + +On the other hand, we don't want to encourage the use of raw pointers, and would prefer rather that raw pointers are wrapped in a custom smart pointer that encodes and documents the invariants. + +The main problem, though, is that raw pointers _have methods_ and Rust wants to add more methods to them in future - especially around pointer provenance. As noted in the [deshadowing section](#compiler-changes-deshadowing), we would start to generate errors in arbitrary crates if ever we added such additional methods to raw pointers. That's clearly not OK. So, to add support for raw pointers as self types, we'd need to use a cleverer deshadowing algorithm. This is discussed in the next section, but overall has been judged to be too complicated _for now_. + +Instead, this version of Arbitrary Self Types is as conservative as possible, such that we ought to be able to adopt such an algorithm in a future enhancement. + +## Pick shadowed methods instead of erroring +[pick-shadowed-methods-instead-of-erroring]: #pick-shadowed-methods-instead-of-erroring + +As explained in the [deshadowing section](#compiler-changes-deshadowing), the Rust compiler will generate errors in case of a conflict between a method on a smart pointer and an inner type. For example: + +```rust +struct Foo; +struct SmartPtr(T): // implements Receiver + +impl SmartPtr { + fn a(self) {} +} + +impl Foo { + fn a(self: SmartPtr) {} +} + +fn main() { + let a = SmartPtr(Foo); + a.a(); // produces an error +} +``` + +There has been extensive discussion (and prototyping) about cleverer "deshadowing" algorithms here. 
+The current leading contender is:
+
+* If there are conflicts,
+  * always pick the "inner" method;
+  * show a warning, and ask the user to disambiguate using fully qualified (UFCS) syntax (or [future alternatives](https://internals.rust-lang.org/t/idea-paths-in-method-names/6834?u=scottmcm)).
+
+The rationale is that, when writing the "inner" method, its author is aware of any methods that already exist on the "outer" (smart pointer) type. So if a conflict arises, the new method must have been added to the outer type, and Rust can preserve existing behavior by picking the method on the inner type. (This logic breaks down in the case of race conditions as crates are published, but it's broadly true.) This logic is believed to be sound, but it's counterintuitive: in all other circumstances, Rust method probing works outside-in. The algorithm is also quite complex, and there's a risk of unknown unknowns.
+
+There has also been some discussion of broader future changes to method resolution, for example a crate-by-crate approach or even a `name-resolution.lock` file.
+
+We have therefore decided to restrict the current RFC to the most conservative possible version: one which errors on _any_ conflict, and firmly advises the authors of smart pointers to avoid adding new methods. This gives us maximum flexibility to allow more in future by relaxing some of those errors into warnings. Preserving that flexibility is a high priority, primarily because we want to allow raw pointers as `self` types (see the previous section).
+
+## Not do it
+[not-do-it]: #not-do-it
+
+As always, there is the option to not do this. But this feature already half-exists (for `Box`, `Pin`, etc.), and it makes a lot of sense to take the last step and allow types outside the standard library to be used as `self` types.
+
+There is also the option of using traits to fill a similar role, for example:
+
+```rust
+trait ForeignLanguageRef {
+    type Pointee;
+    fn read(&self) -> *const Self::Pointee;
+    fn write(&mut self, value: *const Self::Pointee);
+}
+
+// --------------------------------------------------------
+
+struct ConcreteForeignLanguageRef<T>(T);
+
+impl<T> ForeignLanguageRef for ConcreteForeignLanguageRef<T> {
+    type Pointee = T;
+
+    fn read(&self) -> *const Self::Pointee {
+        todo!()
+    }
+
+    fn write(&mut self, _value: *const Self::Pointee) {
+        todo!()
+    }
+}
+
+// --------------------------------------------------------
+
+struct SomeForeignLanguageType;
+
+// Inherent methods on the wrapper can only be defined in the crate that defines the wrapper.
+impl ConcreteForeignLanguageRef<SomeForeignLanguageType> {
+    fn m(&self) {
+        // A real binding would forward this call to the foreign language.
+    }
+}
+
+trait Tr {
+    type RustType;
+
+    fn tm(self)
+    where
+        Self: ForeignLanguageRef;
+}
+
+impl Tr for ConcreteForeignLanguageRef<SomeForeignLanguageType> {
+    type RustType = SomeForeignLanguageType;
+    fn tm(self) {}
+}
+
+fn main() {
+    let a = ConcreteForeignLanguageRef(SomeForeignLanguageType);
+    a.m();
+    a.tm();
+}
+```
+
+This successfully allows calls to `m()` and even `tm()` without a reference to a `SomeForeignLanguageType` ever existing. However, due to the orphan rule, every crate is forced to have its own equivalent of `ConcreteForeignLanguageRef`. Some interop tools have used this workaround, but using it across multiple crates requires many generic parameters (`impl ForeignLanguageRef`).
+
+## Always use `unsafe` when interacting with other languages
+
+One main motivation here is cross-language interoperability. As noted in the rationale, C++ references can't be _safely_ represented by Rust references. Many would say that all C++ interop is intrinsically unsafe and that `unsafe` blocks are required. That may be true, but it just moves the problem: an `unsafe` block requires a human to assert that its preconditions are met, e.g. that there are no other C++ pointers to the same data. But those preconditions are almost never true, because other languages don't have those rules.
+This means that a C++ reference can never be a Rust reference, because neither human nor computer can promise things that aren't true.
+
+Only in the very simplest interop scenarios can we claim that a human could audit all the C++ code to rule out the existence of other pointers. In complex projects, that's not possible.
+
+However, a C++ reference _can_ be passed through Rust safely as an opaque token on which method calls can be performed. Those method calls actually happen back in the C++ domain, where aliasing and concurrent modification are permitted.
+
+For instance:
+
+```rust
+// An opaque handle to a foreign-language object; it is never dereferenced in Rust.
+struct ForeignLanguageRef<T>(std::marker::PhantomData<T>);
+
+fn main() {
+    // `call_some_foreign_language_function_to_get_a_reference` stands for an FFI call.
+    let some_foreign_language_reference: ForeignLanguageRef<_> =
+        call_some_foreign_language_function_to_get_a_reference();
+    // There may be other foreign language references to the referent, with concurrent
+    // modification, so some_foreign_language_reference can't be a &T.
+    // But we still want to be able to do this:
+    some_foreign_language_reference.some_foreign_language_method();
+    // ^ executes in the foreign language; the data is never dereferenced in Rust.
+}
+```
+
+Even if the reader takes the view that all calls into foreign languages are intrinsically unsafe and must be marked as such, we hope they would support building abstractions in the Rust type system that minimize the practical risk of undefined behavior. That's what this RFC aims to enable.
+
+# Prior art
+[prior-art]: #prior-art
+
+A previous RFC based on the `Deref` alternative was proposed in [rust-lang/rfcs#2362](https://github.com/rust-lang/rfcs/pull/2362) and was postponed with the expectation that the lang team would [get back to `arbitrary_self_types` eventually](https://github.com/rust-lang/rfcs/pull/2362#issuecomment-527306157).
+
+# Future work
+
+As [discussed above](#pick-shadowed-methods-instead-of-erroring), we anticipate a future version which will relax some errors into warnings, and thus allow us to support raw pointers, `Weak` and `NonNull` as `self` types.
+
+Thereafter, we could consider implementing `Receiver` for other types, e.g. [`std::cell`](https://doc.rust-lang.org/std/cell/index.html) types, [`std::sync`](https://doc.rust-lang.org/std/sync/index.html) types, [`std::cmp::Reverse`](https://doc.rust-lang.org/std/cmp/struct.Reverse.html), [`std::num::Wrapping`](https://doc.rust-lang.org/nightly/std/num/struct.Wrapping.html), [`std::mem::MaybeUninit`](https://doc.rust-lang.org/std/mem/union.MaybeUninit.html), [`std::task::Poll`](https://doc.rust-lang.org/nightly/std/task/enum.Poll.html), and so on, possibly even arrays.
+
+There seems to be no disadvantage to doing this. Taking `Cell` as an example, it would only affect the behavior of code that defines a method with a `Cell<Self>` receiver. On the other hand, it's hard to imagine use-cases for some of these. For now, we should restrict `Receiver` to those types for which there's a demonstrated need.
+
+# Feature gates
+
+This RFC is in an unusual position regarding feature gates. There are two existing gates:
+
+- `arbitrary_self_types` enables, roughly, the _semantics_ we're proposing, albeit [in a different way](#deref-based). It has been used by various projects.
+- `receiver_trait` enables the specific trait we propose to use, albeit without the `Target` associated type. As far as we know, it has only been used within the Rust standard library.
+
+Although we presumably have no obligation to maintain compatibility for users of the unstable `arbitrary_self_types` feature, we should introduce this feature in the least disruptive way.
+
+The plan is:
+
+- The `receiver_trait` gate continues to control the existing `Receiver` trait, which is used solely within the standard library. That trait is renamed to `LegacyReceiver`, `FixedReceiver` or similar, and will be removed once this feature is stabilized.
+- `arbitrary_self_types` comes to control the new behavior, with a new `Receiver` trait containing a `Target` associated type. As noted, this does not include raw pointers, though we hope to find a way to stabilize support for them in a future RFC.
+- A new `arbitrary_self_types_pointers` feature gate retains support for raw pointers.
+
+# Summary
+
+This RFC is an example of replacing special casing (a.k.a. compiler magic) with clear and transparent definitions. We believe this is a good thing and should be done whenever possible.