-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What are the uniqueness guarantees of Box and Vec? #326
Comments
I think even that if |
Unique doesn't "have" to have special semantics. It exists in case it's useful for it to, and to mark all the types that are "like box" if we ever figure out ways to make box magic more available to other types. I don't have a good sense for how useful it is to make Box "smarter" but certainly we have a lot of anecdotal evidence that it leads to surprises/sadness. In particular I've been told that just moving a Box apparently invalidates even raw internal pointers, which, sounds miserable. Like I can get it from the strictest interpretation of "it's as if it was a local var" and certainly I wouldn't expect borrowck to ever let you get that cute, but it seems like unsafe code should be allowed to rely on Box/Vec having stable addresses since that's like, one of the major features of indirection! |
Okay -- I don't really care if I remember optimizing some tight |
I can imagine relaxing the |
Just to make sure my position is clear here, I am in favor of removing I think that And I want to call out that under Stacked Borrows, if a pointer type behaves like fn main() {
let b = Box::new(0u8);
let ptr = &*b as *const u8;
let _a = b;
unsafe {
println!("{}", *ptr);
}
} Exactly why it is UB depends on if you are considering Stacked Borrows with or without raw pointer tagging. But in any case, this is deeply surprising. What else is "A pointer type for heap allocation." (that's how the current docs describe it) good for, if not address stability? Also the |
What if the Next Aliasing Model made that not UB (by not doing "retagging" on With regards to keeping fn fun(x: Box<i32>, y: *mut i32) -> i32 {
unsafe { *y = 5; }
*x
}
let mut b = Box::new(16);
let ptr = *mut *b;
fun(b, ptr); |
I wish I knew if such code is written, because I suspect all crates that arrive in this situation are UB according to Stacked Borrows before they get to this call 😩 But from a more theoretical perspective, it's not clear to me that any author of Rust code would expect that function to be UB. I think we've done a pretty good job educating people via the docs that |
IMO it might be tolerable for a deref through the box to potentially invalidate pointers (ideally under looser rules based on the compiler "understanding" box and being able to do disjoint reborrows... so maybe it's semantically identical to &mut?), but moving the Box having side-effects is "too weird" without a really good explanation of why that might be problematic to allow. I would tbh be fine with totally dropping noalias here, but I am not a Performance person. |
Just leaving my 2¢: Much existing code does treat However, usage that interleaves However, I have a hard time saying that Ralf's example can be UB. I don't recall exactly how SB recurses into reference fields; if it wouldn't be UB with a definition of Making moving a box into a function special, but allowing it when moving around in a single stack frame seems like a horrible state, because then refactoring out a function semantically changes the AM semantics. My personal conclusion is that It gets even more interesting with |
It's an open question (Cc #125). We can totally satisfy your axiom by making the |
I kind of am. Or at least I fancy myself as. Or I try to be. I just grabbed a few crates and tried running their benchmarks with I looked at the regressions in The changes in The huge change in But overall the fact that we see such colossal regressions on microbenchmarks from providing more information to the compiler makes me question the wisdom of using current LLVM's ability to take advantage of It would be super cool if they were fixed soon, but I do not have the patience at the current time to minimize |
Actually, noting something for lccc's model specifically, it has
Of course, since rustc translates (For those who care: a derive operation is xlang's generalization of an SB reborrow - strictly speaking it produces a new pointer from some computation on an existing one, asserting it's non-trivial validity, such as it's valid range or aliasing behaviour, in the process). |
Interesting. To my mind, That said, I also see that it would be nice to be able to allocate a box and move it around as a way to "remember" that the pointer needs to be freed later. Tying the "uniqueness" to actually using the box may indeed be a reasonable compromise. |
I've definitely used box as a way to say "handle the stupid allocation crap for me", but otherwise I've wanted only the pointing part and not the uniqueness and invalidating other pointers and all that stuff. However, we could just add some simpler to use methods to our allocation system and "solve" the problem that way. |
The standard library doesn't provide an unsafe fn rebuild_boxed_slice(buf: *mut u8, offset: *const u8, len: usize) -> Box<[u8]> {
let cap = (offset as usize - buf as usize) + len;
Box::from_raw(slice::from_raw_parts_mut(buf, cap))
} fn decrement_refcount(&self, current_state: StateSnapshot) -> StateSnapshot {
let current_state = self.atomic_update_state(current_state, |mut state: StateSnapshot| {
state.refcount -= 1;
state
});
// Drop the State if it is not referenced anymore
if !current_state.has_refs() {
// Safety: `CancellationTokenState` is always stored in refcounted
// Boxes
let _ = unsafe { Box::from_raw(self as *const Self as *mut Self) };
}
current_state
} |
This is something that could be addressed in an edition though, since the compiler can choose to emit |
Can't we provide better tools in all editions? There's no obvious reason for AliasableBox to not be available in all editions. |
Of course we can, but we'd need to keep existing code that uses Box working on old editions. |
So wouldn't Box always be noalias, in all editions? And then AliasBox would be a new type people could that's also in all editions? And code could transfer to AliasBox if that's what they actually intend? It just seems exceedingly confusing to make such a subtle change over an edition when we know it's specifically affecting unsafe code. However it's set up, it should be the same across all editions. |
Well the problem with that is the examples of existing code that is only correct when Box is not |
I worry about that kind of characterization, because that could apply to any increase in information that rustc feeds to LLVM. We shouldn't block all additions of LLVM attributes behind an edition, just because they may produce surprising compilation results for some users. That is why I am focusing on whether or not we should expect users to be familiar with semantics that are communicated to LLVM. |
I did this same experiment but with the code that puts |
Would it be useful/make sense to add an unstable flag to disable noalias on everything but &mut? |
While I'm thinking about it: adopting something like the storages API would allow that |
Who would use such a flag? A skim over the search for |
Unsurprisingly it turns out that Vec is full of aliasing violations, if you interpret its |
Miri surely does not like that PR, but it's not entirely due to the current implementation of pub fn as_mut_ptr(&mut self) -> *mut T {
// We shadow the slice method of the same name to avoid going through
// `deref_mut`, which creates an intermediate reference.
- let ptr = self.buf.ptr();
+ let ptr = self.buf.as_mut_ptr().cast::<T>();
unsafe {
assume(!ptr.is_null());
} There's also an issue where the (but yes, among other things, the (oh whoops I read your comment here instead of commenting on that PR) sigh |
That example should be fine even under Stacked Borrows since everything is well-nested. You are basically "reborrowing" the |
@thomcc During an opsem meeting, someone mentioned that you are rather firmly against having any sort of aliasing restriction for
|
I know that crates like It relies on I know this crate is unsound because it isn't hardened against malicious code, but the basic premise is currently sound. |
Unlike with Box, we've never documented that Vec has this restriction, and it's unintuitive. It also breaks a lot of reasonable use cases. I think having it on Vec requires quite a bit more justification than with anything else.
My understanding was it was more an indication that these are types which semantically could be noalias. Certainly for the time I've been working on the stdlib, Not for nothing, but understanding of aliasing semantics back then was not very firm, so I don't know that legacy holds much weight in my mind, especially when
No. |
I don't think typed-arena would break. We have a prototype for Unique semantics, and while it doesn't yet entirely behave the way we'd like, at least our simple arena tests pass. That's why I was asking.
|
To elaborate a bit: retagging only happens when a pointer is moved by-value. For arenas, the moment the first element got created, the arena will not be moved any more -- everyone just has references to it. This means the Furthermore, let mut v = vec![0, 1];
let ptr1 = v.as_mut_ptr();
let mut v2 = v;
let ptr2 = v.as_mut_ptr().add(1);
ptr1.write(0);
ptr2.write(0); The aliasing requirements only start rejecting code when the same element gets accessed again with a freshly retagged pointer (e.g., if
That is fair.
That is the part I am disputing. At least I have not seen evidence of that. |
Change to use `NonNull` instead of `Box` due to the requirement that boxed values are not aliased. See rust-lang/unsafe-code-guidelines#326.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
Yeah I don't think discussing editions is productive here. I don't think there is any future in which |
|
What makes that custom allocator example interesting is that even if we made Making |
This just sounds like anothrt argument against Although another fix is to make the Allocator trait special like the global Allocator. |
To me it sounds like another argument showing that custom allocators were added to
We haven't even figured out how to do that for custom global allocators yet, that's #442. But indeed maybe something similar could work here. |
For the record, I am at this point largely ambivalent on this question. The |
Box
andVec
have internally used theUnique
pointer type since Forever (TM). The intention was that that gives some some special aliasing rules.Box
is even emittingnoalias
to LLVM, so we might have optimizations already exploit those aliasing rules.However, it looks like none of this is actually surfaced in the documentation (Cc @saethlin), leading to a significant risk that code out there might violate whatever the aliasing rules end up being.
For whatever it's worth, Miri with Stacked Borrows does treat
Box
rather similar to&mut
, but does not do anything special withVec
. I have ideas for how the Next Aliasing Model can treatUnique
special (soBox
itself no longer needs to be special, at least if we decide to "look into" private fields when applying aliasing rules which is its own whole separate discussion), bot nothing concrete yet.For each of these types we basically have two options:
Box
, remove thenoalias
and the special treatment in Miri). This will make some people very sad. Cc @Gankra @nikomatsakisThe text was updated successfully, but these errors were encountered: