-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fast path for vec.clear/truncate #64375
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
The wording here suggests we will make it a commitment that this will stay O(1) forever. However, @bors try |
Guarantee vec.clear/truncate is O(1) for trivial types For trivial types like `u8`, `vec.truncate()`/`vec.clear()` relies on the optimizer to remove the loop. This means more work in debug builds, and more work for the optimizer. Avoiding this busywork is exactly what `mem::needs_drop::<T>()` is for.
This change came from an internal discussion, where a developer has avoided use of Why |
I refer you to the documentation of
|
I know the documentation, but it seems overly conservative. I wouldn't be surprised if it couldn't guarantee precision for trait objects or even structs, but for basic built-in types? |
I think it becomes a question of "what is the criteria?" then... Maybe we could guarantee it for |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
☀️ Try build successful - checks-azure |
It's funny that LLVM optimized the whole loop better than my It's great that there are tests for it. |
Heh... guess we need another builder for it again then: @bors try |
⌛ Trying commit 223600a with merge 3a66acf73fc62246576fedf73a80249f523d0e3d... |
☀️ Try build successful - checks-azure |
@rust-timer build 3a66acf73fc62246576fedf73a80249f523d0e3d |
Success: Queued 3a66acf73fc62246576fedf73a80249f523d0e3d with parent fe6d05a, comparison URL. |
} | ||
} else if len <= self.len { | ||
self.len = len; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this function just implemented as:
if len >= self.len { return; }
let s = &mut self[len..] as *mut [_];
self.len = len;
ptr::drop_in_place(&mut *s);
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @SimonSapin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t know, maybe it’s older than drop_in_place
?
One difference however, as hinted by the code comment, is that the explicit loop + SetLenOnDrop
doesn’t leak other items when T::drop
panics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t know, maybe it’s older than drop_in_place?
Might be.
One difference however, as hinted by the code comment, is that the explicit loop + SetLenOnDrop doesn’t leak other items when T::drop panics.
Yes, I think this leaks as little as it possibly can - only some fields of the T
for which T::drop
panics might be leaked with the current implementation.
I don't see this guaranteed or documented anywhere, but it is a reasonable thing to do. We don't do this for, e.g., Vec<T>::drop
, but observing a partially dropped Vec
is hard, while if truncate
panics, then maybe observing a partially truncated Vec
is easier ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then maybe observing a partially truncated Vec is easier ?
Well yes, it is really easy:
let mut vec: Vec<T>;
catch_unwind(AssertUnwindSafe(|| vec.truncate(N)));
// partially-truncated vec easily observable
Does ptr::drop_in_place([T])
stop dropping elements the first time a panic happens ? Or does it continue dropping elements and re-raises the panic at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does ptr::drop_in_place([T]) stop dropping elements the first time a panic happens ?
No.
Or does it continue dropping elements and re-raises the panic at the end?
It continues dropping elements, and re-raises the panic at the end, just like for all other aggregates. If a second element panics, then we have a double panic. See here.
So the main difference between this implementation and one using ptr::drop_in_place
, is that this one stops dropping elements the moment there is a panic, and with ptr::drop_in_place
, all elements are always dropped, except at most one if it panics, and if more than one panic the program aborts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bjorn3 so that example works with the current implementation, but it wouldn't work if we were to use ptr::drop_in_place
(it would abort due to a double panic).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I didn’t realize drop_in_place
did that. So maybe it’s preferable to use here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This specific implementation has more instructions for clear
, because it changes unconditional store 0
into a check. Changing it to if len > self.len { return; }
generates the same code for u8
. Drop cases are different of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! Yes, with if len > self.len { return; }
the assembly for Vec::clear
looks much nicer: https://rust.godbolt.org/z/6pSj7K
Finished benchmarking try commit 3a66acf73fc62246576fedf73a80249f523d0e3d, comparison URL. |
Perf looks like mostly noise -- might be worth other forms of benchmarks to show this is useful... r? @rkruppe |
Yeah, I believe this mostly improves debug builds. Because the rustc used during benching by @rust-timer is build in release mode, it won't have much effect. I personally use https://github.com/ebobby/simple-raytracer to bench perf of cg_clif. In debug mode it takes ~15s to run, so you can easily run it many times for less noise using for example |
Insufficient permissions to issue commands to rust-timer. |
I wouldn't expect this to help at all when code is optimized, and perf runs a rustc compiled with optimization. I also don't need any benchmarks to be rather sure that this helps some programs compiled without optimizations. The only question is whether that is worth doing at all, which IIRC was a bit contentious in the past 🤷♀ Will wait a bit with r+ in case anyone objects on that basis (and to take some time to review the code properly) but let's hope not. |
To me the main motivation is not O0 performance, but a guarantee that Could this be solved by documenting that libstd commits to having |
@alexcrichton There is a different between "runtime improvements" and "algorithmic complexity improvements". Leaving "algorithmic complexity improvements" to LLVM feels risky, and having them depend on the optimization level makes the algorithms unusable for some. In this particular case, there is a proposal that's simpler, shorter, more maintainable, and has better semantics and algorithmic complexity than the current code. |
I won't pretend to speak for the libs team, but I would personally be ok with documenting that drop/clear/truncate/etc all are O(1) for |
For types without a The standard library cannot commit to this without a corresponding language level guarantee. Or in other words, if the standard library does commit to it, you have also forced it on the language because it's not just the rustc implementation the guarantee is being made for, but for all implementations. If you want such a guarantee for |
@Centril IIUC the intent is to provide this optimizations for types that, when dropped, do not invoke any cleanup code. Providing this guarantee for I do think that it is problematic that we don't even have a name for these types, much less facilities to detect them. I often see users confused about this - they think that's what So I would actually be very happy if the lang team could take a look at this issue. However, I would prefer to move the discussion about providing such guarantee somewhere else (maybe an issue the RFC repo?), and focus here on landing an incremental improvement instead. |
If by "modify the language" you mean "change code in the compiler" then no, you don't necessarily have to modify anything. However, the guarantee the standard library forwards needs to be backed by something at the language level (in this case
I agree (it's what I suggested in #64375 (comment)). |
After considering all the points raised here yet again, I've decided I will go ahead and merge this as-is. There's been a bunch of different discussion streams and directions so here's my rationale for merging as-is with respect to each of them:
@bors r+ |
📌 Commit 223600a has been approved by |
Fast path for vec.clear/truncate For trivial types like `u8`, `vec.truncate()`/`vec.clear()` relies on the optimizer to remove the loop. This means more work in debug builds, and more work for the optimizer. Avoiding this busywork is exactly what `mem::needs_drop::<T>()` is for.
Fast path for vec.clear/truncate For trivial types like `u8`, `vec.truncate()`/`vec.clear()` relies on the optimizer to remove the loop. This means more work in debug builds, and more work for the optimizer. Avoiding this busywork is exactly what `mem::needs_drop::<T>()` is for.
Fast path for vec.clear/truncate For trivial types like `u8`, `vec.truncate()`/`vec.clear()` relies on the optimizer to remove the loop. This means more work in debug builds, and more work for the optimizer. Avoiding this busywork is exactly what `mem::needs_drop::<T>()` is for.
Fast path for vec.clear/truncate For trivial types like `u8`, `vec.truncate()`/`vec.clear()` relies on the optimizer to remove the loop. This means more work in debug builds, and more work for the optimizer. Avoiding this busywork is exactly what `mem::needs_drop::<T>()` is for.
Fast path for vec.clear/truncate For trivial types like `u8`, `vec.truncate()`/`vec.clear()` relies on the optimizer to remove the loop. This means more work in debug builds, and more work for the optimizer. Avoiding this busywork is exactly what `mem::needs_drop::<T>()` is for.
Rollup of 17 pull requests Successful merges: - #63846 (Added table containing the system calls used by Instant and SystemTime.) - #64116 (Fix minor typo in docs.) - #64203 (A few cosmetic improvements to code & comments in liballoc and libcore) - #64302 (Shrink `ObligationCauseCode`) - #64372 (use randSecure and randABytes) - #64374 (Box `DiagnosticBuilder`.) - #64375 (Fast path for vec.clear/truncate ) - #64378 (Fix inconsistent link formatting.) - #64384 (Trim rustc-workspace-hack) - #64393 ( declare EnvKey before use to fix build error) - #64420 (Inline `mark_neighbours_as_waiting_from`.) - #64422 (Remove raw string literal quotes from error index descriptions) - #64423 (Add self to .mailmap) - #64425 (typo fix) - #64431 (fn ptr is structural match) - #64435 (codegen: use "_N" (like for other locals) instead of "argN", for argument names.) - #64439 (fix #64430, confusing `owned_box` error message in no_std build) Failed merges: r? @ghost
Halfway decent "optimisation" (rather a lack of pessimisation) at O0/O1 is very important for a quite a few cases with the most prominent (maybe) being embedded: If you want to debug on a microcontroller you're mostly limited to None of the Rust competitors do have this particular problem, so it's really a problem hurting the reputation of Rust as a viable systems programming language and thus should be addressed, even with some priority. |
For trivial types like
u8
,vec.truncate()
/vec.clear()
relies on the optimizer to remove the loop. This means more work in debug builds, and more work for the optimizer.Avoiding this busywork is exactly what
mem::needs_drop::<T>()
is for.