analyze: proposal for simplified analysis and rewriting of allocations #1097
How about

We could also use

I'm generally in favor of using standard types, but this is a bit clunky. We could just use our own 3-state type e.g.
In some cases this would change behavior (due to the change in aliasing), and I think it would be hard to tell whether a given instance of this rewrite changes behavior or not.
The fact that it's unsafe would be a problem for our safety story. Currently, if all goes well with the analysis and rewriting, then the result has no unsafe code at all. This saves us from needing more complicated analyses, like figuring out whether all fields are actually initialized following
A downside of the 3-state type is that it's not compositional. But I agree it's clunky to write out the type in full. I could see having an alias like
While implementing this, I ran into an issue involving

```rust
let p = malloc(...) as *mut i32;
*p.offset(1) = 1;
let q = p;
free(q as *mut c_void);
```

* Option 1: add support for the cast. For example,
* Option 2: unify
* Option 3: modify dataflow constraints. Assignments are normally allowed to discard any subset of permissions (the LHS permissions can be any subset of the RHS permissions). We could instead forbid assignments from discarding

I've implemented option 3 for now, but it might be cleaner to switch to option 2.
This branch adds rewrites to convert pointers with the `FREE` permission into `Box<T>`. This follows the dynamic ownership tracking proposal in #1097.

Specific changes:

* Adds a new `dyn_owned: bool` field to `TypeDesc`, which indicates that the pointer type should be wrapped in a dynamic ownership wrapper.
* Extends `ZeroizeType` handling to support zero-initializing some pointer types (necessary to allow `malloc`/`calloc` of structs that contain pointers).
* Adds support for `Box` and `dyn_owned` related casts in `mir_op` and `rewrite::expr::convert`.
* Adds custom rewrite rules for `malloc`, `calloc`, `free`, and `realloc`.
This was implemented in #1106
Correct rewriting of `malloc`/`free` to `Box<T>` requires tracking ownership throughout the program. This proposal describes an approach that produces correct rewrites with minimal new analysis work, at the cost of introducing run-time panics when the C code does not follow Rust ownership discipline.

The main idea is to rewrite owning pointers not to `Box<T>` but to `Option<Box<T>>`. The `Option` is `Some` if ownership still resides at this location, and it is `None` if ownership has been transferred to some other location. Accessing a location after ownership has been transferred away will cause a run-time panic due to unwrapping `None`.

## Definitions
A "pointer location" is a local variable or memory location of pointer type. MIR temporaries representing the results of intermediate expressions are included in this definition.

A "heap object" is a region of memory allocated by `malloc` or a similar function. A new object is "live" until it is freed by passing a pointer to the object to `free` or a similar function.

"Ownership" is a relation between objects and pointer locations. At all points during execution, each live object is owned by one or more pointer locations. (Single ownership of an object corresponds to `Box<T>`; multiple ownership corresponds to `Rc<T>`.) If a pointer location owns an object, then the value in that location is a pointer to the owned object. In some situations, copying a value from a source pointer location to a destination may also transfer or copy ownership from the source to the destination.

## Rewrites
This section describes the new rewrites we would perform under this proposal. The next section describes analysis changes that enable these rewrites.
Type rewrites: Many pointer locations function as "borrowed references" and never hold ownership of an object in any execution of the program. These are handled by existing rewrites. For locations that may hold ownership, we add a new rewrite that replaces `*const T`/`*mut T` with `Option<Box<T>>`.

This approach to type rewrites means that for any pointer location that might hold ownership at some points in some executions, we assume that all assignments to that pointer location also grant ownership. We call such a pointer location an "owning pointer location". Note that an owning pointer location is not guaranteed to hold ownership throughout its lifetime; ownership may be transferred away during an assignment from the pointer location in question. This is why we must rewrite to `Option<Box<T>>` rather than `Box<T>`.

Expression rewrites: In every (static) assignment operation `q = p` on pointers, the LHS and RHS can each be either an owning pointer location or a non-owning pointer location.

* Owning to owning: Ownership is transferred from `p` to `q`. The assignment is rewritten to `q = Some(p.take().unwrap())`, which panics if `p` does not currently hold ownership (that is, if `p` is `None`) and otherwise removes ownership from `p` and grants it to `q`.

  An alternative rewrite for this case would be `q = p.take()`. This similarly transfers ownership if `p` currently holds it, but doesn't panic if `p` does not hold ownership. Instead, the lack of ownership is propagated to `q`. This seems likely to delay panics in a way that will make issues harder to debug.

* Owning to non-owning: A borrowed reference is derived from `p` and assigned to `q`. The assignment is rewritten to `q = p.as_deref().unwrap()` or similar. This panics if `p` does not hold ownership, and otherwise produces a value of type `&T` pointing to the object within the `Option<Box<T>>`.
* Non-owning to owning: This is an error for the rewriter. There is no reasonable safe way to convert `&T`/`&mut T` to `Box<T>`. The analysis described below is designed such that this case should never occur.
* Non-owning to non-owning: This is handled by the existing rewrite rules.
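A minimal sketch of the rewritten forms for the owning cases, assuming the owning location is a plain local of type `Option<Box<T>>`. The helper names `move_owned`, `move_owned_lenient`, and `borrow_owned` are hypothetical; in the actual rewrite these expressions would appear inline at each assignment.

```rust
/// An owning pointer location after the type rewrite (`*mut T` becomes this).
type Owning<T> = Option<Box<T>>;

/// Owning to owning, `q = Some(p.take().unwrap())`: panics if `p` is `None`,
/// otherwise moves the box into the result and leaves `None` behind in `p`.
fn move_owned<T>(p: &mut Owning<T>) -> Owning<T> {
    Some(p.take().unwrap())
}

/// Alternative owning-to-owning rewrite, `q = p.take()`: never panics here,
/// so a missing ownership silently propagates to `q`.
fn move_owned_lenient<T>(p: &mut Owning<T>) -> Owning<T> {
    p.take()
}

/// Owning to non-owning, `q = p.as_deref().unwrap()`: panics if `p` is `None`,
/// otherwise borrows the object inside the box without moving ownership.
fn borrow_owned<T>(p: &Owning<T>) -> &T {
    p.as_deref().unwrap()
}
```

Calling `move_owned` twice on the same location panics on the second call, since the first call left `None` behind; this is the run-time check the proposal relies on in place of a static ownership analysis.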
Similar rewrites apply to pseudo-assignments. For example, `f(p)` may be rewritten to `f(Some(p.take().unwrap()))` if the argument of `f` is an owning pointer location.

We must also rewrite several libc functions:
`malloc(size)`: `malloc` returns `void*`, but we can recover the type `T` being allocated using the existing pointee type analysis. If `size` is a multiple `n * size_of::<T>()`, then we rewrite the `malloc` call either to `Box::new(T::default())` (if `n == 1` and the return value lacks the `OFFSET` permission) or to `Box::new([T::default(); n]) as Box<[_]>`. In cases where `T` doesn't implement `Default` and/or `Copy`, we can generate alternative rewrites such as `Box::new(T { field1: 0, field2: None, ... })` or `iter::repeat_with(T::default).take(n).collect::<Box<[_]>>()`.
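A sketch of these `malloc` rewrites, using a hypothetical `Point` type that derives `Default` and `Copy` and a hypothetical `Node` type that implements neither. Note that the array form `[T::default(); n]` only compiles when `T: Copy` and `n` is a compile-time constant, so a count known only at run time needs the `repeat_with` form.

```rust
use std::iter;

#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// `malloc(size_of::<Point>())` whose result lacks OFFSET: a single boxed value.
fn malloc_one() -> Box<Point> {
    Box::new(Point::default())
}

// `malloc(4 * size_of::<Point>())` with a constant count: a boxed slice.
fn malloc_array_const() -> Box<[Point]> {
    Box::new([Point::default(); 4]) as Box<[_]>
}

// Run-time count (or non-`Copy` element type): build the slice from an iterator.
fn malloc_array_dyn(n: usize) -> Box<[Point]> {
    iter::repeat_with(Point::default).take(n).collect::<Box<[_]>>()
}

// Hypothetical type without `Default`: spell out a zero-like value per field,
// the analogue of `Box::new(T { field1: 0, field2: None, ... })`.
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

fn malloc_node() -> Box<Node> {
    Box::new(Node { value: 0, next: None })
}
```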
`free(p)`: We rewrite this to `drop(p.unwrap())`, which will deallocate the object pointed to by `p` and will panic if `p` does not currently hold ownership of any object.

As with owning-to-owning assignments, this could be made not to panic by instead rewriting to `drop(p)`. However, removing this panic might hinder debugging. When `free(p)` is called and `p` does not hold ownership, either the pointer that does hold ownership of the object was (or will be) freed elsewhere, or it will go out of scope without an explicit `free` call. The first represents a double free in the C code. The second represents a bug in our analysis: we rewrote the program in a way that transfers ownership away from `p`, but the ownership should have been left in place.

`realloc(p, size)`: For a call `q = realloc(p, size)`, we first infer the type `T` and count `n` as in the `malloc` case, then rewrite the call into code that moves all values in `p` to a new object (allocated as with `malloc(size)`), either truncating or padding with `T::default()` to length `n`, and stores the new `Box<[T]>` into `q`. The old object is deallocated as with `free(p)`.
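A sketch of the `free` and `realloc` rewrites, assuming the reallocated location has the `OFFSET` permission and is therefore an owning `Option<Box<[T]>>`, and that `T: Default` for padding. `free_owned` and `realloc_owned` are hypothetical names. This version lets `Vec` manage the buffer, so it may grow or shrink the old allocation in place instead of always allocating a fresh object; that is an implementation choice, not part of the proposal.

```rust
/// `free(p)` rewritten to `drop(p.unwrap())`: panics if `p` holds no
/// ownership, otherwise drops the box and deallocates the object.
fn free_owned<T: ?Sized>(p: Option<Box<T>>) {
    drop(p.unwrap())
}

/// Hypothetical helper for `q = realloc(p, n * size_of::<T>())`.
/// Takes ownership out of `p` (leaving `None`), then truncates or pads the
/// elements with `T::default()` so the result has exactly `n` elements.
fn realloc_owned<T: Default>(p: &mut Option<Box<[T]>>, n: usize) -> Option<Box<[T]>> {
    // Panics, like `free_owned`, if `p` no longer holds ownership.
    let old: Box<[T]> = p.take().unwrap();
    // Move the elements out; `into_vec` reuses the old allocation.
    let mut elems: Vec<T> = old.into_vec();
    // Truncate, or pad with default values, to length `n`.
    elems.resize_with(n, T::default);
    // `Vec` may reallocate while growing or shrinking; either way the old
    // buffer is reused or freed, much like C `realloc`.
    Some(elems.into_boxed_slice())
}
```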
Similar to `free`, attempting to use a pointer location that doesn't currently hold ownership will result in a panic.

## Analysis
For distinguishing owned and non-owned pointer locations, we can use the existing `FREE` permission. `FREE` is set on the arguments of the libc `free()` function and is propagated backwards. This means all pointer locations from which a pointer might flow to `free` will be considered owning, and all others will be non-owning.

Because `Box<T>` cannot be used for pointers to the stack (unlike `&`/`&mut`), it may be useful for debugging to add a simple analysis to identify pointer locations that may contain pointers to the stack. One option is a forward-propagated `HEAP` permission that marks pointers that must point to the heap (or be null). It can be implemented like `NON_NULL`: set `HEAP` initially for all pointer locations, and remove it for pointers that result from `&local` or a similar expression.

In the future, we may want to implement a linearity analysis to detect cases where a single object is potentially owned by multiple pointer locations (and thus should be rewritten using `Rc<T>`). For this proposal, we don't use such an analysis and instead introduce panics in cases of multiple ownership.

## Interaction with `NON_NULL`
In this proposal, we rewrite all potentially-owning pointer locations to `Option<Box<T>>`. This means a pointer location that is both owning and nullable would have the type `Option<Option<Box<T>>>`, which is confusing. For clarity, we may wish to use a different type, such as `Result<Box<T>, OwnershipLost>` (where `OwnershipLost` is a zero-sized error type), for owning locations. Then a nullable owning pointer location would have type `Option<Result<Box<T>, OwnershipLost>>`, where `None` means the pointer location was assigned null, `Some(Ok(p))` means it was assigned a valid pointer and still has ownership of the object, and `Some(Err(OwnershipLost))` means it was previously assigned a valid pointer but ownership was since transferred away.
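A minimal sketch of this nested representation. The `NullableOwned` alias and the `transfer` and `state` helpers are hypothetical illustrations of how an owning-to-owning assignment would move between the three states; treating a null source as a plain null copy is an assumption, since the proposal does not say how null sources are handled.

```rust
/// Zero-sized marker: this location once held ownership, but it was moved away.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct OwnershipLost;

/// A nullable owning pointer location (hypothetical alias):
/// `None` is null, `Some(Ok(_))` owns the object, `Some(Err(_))` lost it.
type NullableOwned<T> = Option<Result<Box<T>, OwnershipLost>>;

/// Owning-to-owning `q = p` for a nullable location (hypothetical helper).
/// Null is copied as null; a source that already lost ownership panics.
fn transfer<T>(src: &mut NullableOwned<T>) -> NullableOwned<T> {
    match src {
        None => None,
        Some(slot) => {
            // Leave the "ownership lost" marker behind and take the box.
            let b = std::mem::replace(slot, Err(OwnershipLost)).unwrap();
            Some(Ok(b))
        }
    }
}

/// Names the three states, for illustration.
fn state<T>(p: &NullableOwned<T>) -> &'static str {
    match p {
        None => "null",
        Some(Ok(_)) => "owned",
        Some(Err(OwnershipLost)) => "ownership lost",
    }
}
```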