-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add no_std once primitive for stdlib deserialization #1463
Add no_std once primitive for stdlib deserialization #1463
Conversation
@bobbinth I could only find these spin-lock based solutions for no_std lock: https://docs.rs/lock_api/0.4.12/lock_api/ Given that the deserialization took ~2ms on my M1, I think whether a spin lock is appropriate here depends on the type of access expected from I have muddied benchmarks somewhat for now. Just out of interest I wanted to see how it performed through the lock + clone:
Given the performance there, we might be fine with the clone. LMK if you would rather I turned it into a shared reference. Also LMK if I should keep the benchmarks as before, without the lock. LMK any other thoughts you have on this. 🙏 |
d80efd0
to
2ff529d
Compare
Thank you! Not a review yet but a few answers/comments:
I think cloning is fine, and also once #1465 is merged, we'll be using shared references for the underlying
I think the expected access pattern is that we'd make relatively few calls to A few other thoughts about the locks: In the
Overall, if the race-based approach can be implemented relatively easily, I think this would be a better alternative to using spinlocks (caveat, my knowledge of spinlocks is very limited and is mostly based on this article). But also curious what @bitwalker and @plafer think. |
Thats great context, thanks. I hadn't noticed the existing spin lock impl! Here is a commit swapping out the spin lock with |
+1 to this, no reason not to use
This is what we should use in no-std environments for most things, and is one of the reasons why it was added in the
Legitimate arguments against spinlocks, IMO, only apply when you have better locking primitives available you should be using instead (the bulk of the argument in that blog post, and not applicable here, as we will use better primitives when building for Using a race-style approach for something like Anyway, I don't have a strong opinion about which thing we use here, the RwLock primitive implemented in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current approach looks fine to me for no_std
environments, specifically since IIUC in the worst case, multiple threads that need the StdLibrary
at the beginning will all "redo" the work of deserializing it - but all subsequent calls to StdLibrary::default()
will see no synchronization overhead. I can't think of scenarios where this would be undesirable.
So if I followed the thread correctly, we could keep this implementation in no_std
environments, and use std::sync::LazyLock
in std
environments?
Depending on how complex it is, I think my preferred approach would be to do something similar for
Then, we can use The biggest uncertainty here is how much effort would it take to implement our version of |
@bobbinth why copy code instead of importing the library? I'm against copying code primarily because if they find a bug and fix it, we don't get the fix (and don't get notified that there's a bug). I'm also not comfortable implementing a Note: I also didn't know about our All this to say that I actually prefer to see #[cfg(not(feature = "std"))]
// use of `once_cell
#[cfg(feature = "std")]
// use of `LazyLock` than a "nicer" |
While I'm in 100% in agreement that we should just use
First - there is virtually no I mean, yes, I may be way more familiar with the intricacies of implementing things like this than the rest of the team, but I think you are also overstating the difficulty of implementing a primitive like that. Especially because in our specific implementation we are able to rely on simplifying assumptions about what no-std environments we support (i.e. just Wasm, where there is no OS, no threading, and no hardware interrupts). The reason why these primitives have a reputation for being hard/tricky, has more to do with how they are used than their actual implementation, and even on the implementation side, it is isn't because the essential complexity is high, but because of the desire to aggressively reduce the overhead of synchronization in the general case, that implementations can get very hairy, very quick. We don't need to care about any overhead, since on Wasm, atomics are lowered to regular loads and stores anyway, and as a result our implementation can remain very simple. The "C++" memory model used by Rust atomics is confusing to most people, because if your target environment has a relaxed memory model (primarily ARM and RISC-V, as x86 and Wasm are both strict), it requires you to understand the degree to which the compiler and the hardware can modify the original order of the program, namely with regard to loads and stores, without the use of any barriers or synchronization. However, this stuff is extremely well documented, for obvious reasons, and there are great testing tools available for it to boot. Our implementation is actually run through a form of model checker that tries different valid interleavings of the various atomic operations, to ensure that our implementation does not violate the desired semantics (such as letting two callers acquire a lock, or a deadlock). In any case, here I think there is little reason to implement our own
I don't disagree with this in principle, but again, we're not talking about implementing our own primitives for all platforms we support, just the one no-std environment out of those we do support (WebAssembly), so I don't find this to be a compelling reason to avoid implementing things ourselves in general. Bringing in dependencies also imposes an obligation on us, and one that can be misleading in the effort required. Many libraries are maintained by one person, and barely that, and vetting the implementations can be difficult (if not already well vetted, like To be clear, I agree in this instance that
This is what testing is for IMO, this same reasoning could be used just as easily to reject the use of a dependency (we have no defense against a maintainer merging a change that introduces a bug after we start depending on it). You have to plan defensively either way. TL;DR: I agree with @plafer that we should just use |
I was not aware of this, and knowing this changes my reaction by a margin. AFAIK, this is not documented anywhere. I think this should be documented loud and clear in the README that we only officially support and test on WASM for
You're right; I only glanced at it pretty quickly. I didn't mean it to be the main focus of my argument, and certainly was not implying that we ended up using unsafe code just for the fun of it.
As mentioned in the first point, indeed if we only care about an arguably simple WASM environment, then we can even use these abstractions incorrectly and they'll end up being compiled down to correct code (since as you mentioned the WASM environment doesn't try to be fancy with the "multicore guarantees" it provides). And we get to test that code extensively, so once again, most of what I described doesn't apply anymore. TLDR:
|
Great discussion! A few points from me:
Overall, for this specific issue, I think the most important point for me is point 1 - i.e., having a uniform interface for initializing static content both in |
Thanks guys. Got a bit on ATM so I'll aim to have something for review next week. |
8a194aa
to
32eaa25
Compare
Based on my reading of the comments, I understand that @plafer and @bitwalker would prefer to import The latest changes show an implementation of LazyLock's interface using the primitives behind This has the same interface for std and no_std: #[cfg(feature = "std")]
pub use std::sync::LazyLock;
#[cfg(not(feature = "std"))]
pub use racy_lock::RacyLock as LazyLock; The alternative would look something like this (different interface for std vs. no_std): #[cfg(feature = "std")]
pub use std::sync::LazyLock;
#[cfg(not(feature = "std"))]
pub use once_cell::race::OnceBox; PR has some fundamental unit tests and is still WIP until we reach consensus on importing vs. implementing. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thank you! I left a couple of small comments inline. But would also be great to get reviews from @plafer and @bitwalker.
Also, I would probably put LazyLock
into the sync
module. The overall structure could look something like this:
utils
└── sync
└── mod.rs
├── racy_lock.rs
└── rw_lock.rs
In the above, rw_lock.rs
would basically be the current utils/sync.rs
.
core/src/utils/racy_lock.rs
Outdated
/// Thread-safe, non-blocking, "first one wins" flavor of `once_cell::sync::OnceCell` | ||
/// with the same interface as `std::sync::LazyLock`. | ||
/// | ||
/// The underlying implementation is based on `once_cell::sync::race::OnceBox` which relies on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably include a link here (I think https://github.com/matklad/once_cell/blob/c48d3c2c01de926228aea2ac1d03672b4ce160c1/src/race.rs#L294 is the right one?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also add an # Acknowledgements
section to core/README.md
in which we explicitly state that core/src/utils/racy_lock.rs
has some code taken from once_cell
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also nit: cargo doc
suggests we add brackets to the hyperlink (<https://github.com/matklad/once_cell/blob/v1.19.0/src/race.rs#L294>
) (for which an actual link is generated in the docs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have found a data race in the Drop
implementation - which is a bit concerning, since it was copied from OnceBox
. Could you all please double-check this? @bitwalker @bobbinth @sergerad
core/src/utils/sync/racy_lock.rs
Outdated
let ptr = *self.inner.get_mut(); | ||
if !ptr.is_null() { | ||
drop(unsafe { Box::from_raw(ptr) }); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this was copy/pasted from OnceBox::drop()
, but isn't there a possible double-free here? Basically, 2 threads could enter the !ptr.is_null()
if, and call drop on the Box
.
I'm not sure what would be the safe way to drop yet, but at least if we look at how it's done with Arc
, it's a lot more complex (and clearly takes thread synchronization into consideration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but this lock is not Clone
! So thinking of it like an Arc
is wrong. IIUC, the only case then that 2 threads case race is if RacyLock
is used in a static variable, but in the case of a static variable, drop()
is never called. Hence, drop()
is only ever called when a single thread has access to the lock, and therefore synchronization is not needed.
If this is correct, can we add this explanation (cleaned up) to a Safety
comment above the unsafe
block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop
requires a mutable reference, so no other references can exist, and thus there can be no race.
+100 on adding a Safety
note on unsafe
blocks, but in this case I don't think it is necessary, since the unsafety here is due to the requirements of Box::from_raw
, not Drop
. I would like to see Safety
docs only when our implementation of something assumes some safety invariant, rather than when we're interacting with documented safety invariants (unless it is not clear from the context how those invariants are being upheld). I'll always err on the side of more, rather than less, documentation though, so I'll defer to @plafer here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true of the RacyLock
value, but my concern was about a data race on the pointer wrapped in self.inner
(i.e. what's manipulated in the unsafe
block, and then explicitly dropped).
I still think it's worth adding a Safety
comment, as the reason why we don't need any synchronization in drop()
is nontrivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plafer I have updated the relevant comments in this commit. I struggled to formulate the # Safety
section because it didn't quite add up to me. Strictly speaking I don't think this definition for when a # Safety
block is required applies here:
Unsafe functions should be documented with a "Safety" section that explains all invariants that the caller is responsible for upholding to use the function correctly.
I say this because AFAICT the current implementation fulfils all the necessary invariants without any responsibility on the caller. I think we are more interested in explaining why the function is safe, rather than guiding users as to how to use it safely. So I have added explanatory comments to that effect.
Regarding the addition of Clone
:
The reasoning being that e.g. if we were to implement Clone on RacyLock, it would break the entire drop() implementation
This is assuming we would be implementing Clone
in a manner which performs a shallow copy right? So that the drop()
would cause a double free? I'm not sure it would ever make sense for us to do this. Note that core::sync::AtomicPtr
is not Clone
.
In any case, that is a hypothetical that is outside of the current implementation's invariants. So, again, it feels awkward to explain that under # Safety
.
LMK any further changes you would make around this. I was just worried the # Safety
block would add confusion rather than provide help. LMK if I am mistaken anywhere! 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that the # Safety
block in the method docstring could mislead users of the method.
My only goal here is to point future reviewers of the implementation in the right direction as to why the drop()
implementation will never result in a double-free. The assumption is that most reviewers (myself included) will probably not use AtomicPtr
often. For example, if you follow my stream of consciousness at the beginning of this thread, I mistakenly assumed AtomicPtr
worked like an Arc
(i.e. new owners of the underlying pointer can be created by cloning the AtomicPtr
), and so in that model, the drop()
implementation could lead to a double-free. I only later realized that my model of what an AtomicPtr
is was incorrect - the only scenario where we can have multiple owners of the underlying pointer is when the AtomicPtr
is used in a static variable. Otherwise, e.g. when used as a stack variable, it can be moved to other threads, but never cloned, so you'll only ever have one owner (at least that's my current understanding). But static variables never have drop()
called, and so in the only scenario where there could be multiple owners, drop()
is never called - hence, you can assume a single owner in drop()
and forego any synchronization.
To reformulate, I want to make it easier for future reviewers (myself included) to get to that reasoning, assuming they don't regularly think about how AtomicPtr
s work, whether or not they are cloneable, and how that affects the correctness of the drop()
implementation.
So taking everything into consideration, I would go back to my previous suggestion of adding a safety block above this line and give a more concise version of the above explanation as to why that unsafe block is used safely. For example,
// SAFETY: for any given value of `ptr`, we are guaranteed to have at most a single instance of
// `RacyLock` holding that value. Hence, synchronizing threads in `drop()` is not necessary, and we
// are guaranteed never to double-free. In short, since `RacyLock` doesn't implement `Clone`, the
// only scenario where there can be multiple instances of `RacyLock` across multiple threads
// referring to the same `ptr` value is when `RacyLock` is used in a static variable - but `drop()`
// is never called for static variables.
drop(unsafe { Box::from_raw(ptr) });
I personally prefer to be a bit too wordy rather than not, since a bit more context is likely to help the reviewer that's not very familiar with AtomicPtr
and lock implementations, while the seasoned reviewer can just skip the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, TY. I omitted the last sentence about the static variables because (purely hypothetically) the invocation of a static variable's drop()
would not be an issue because there is only ever a single such variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, agreed, your comment is indeed more accurate.
However I noticed that you put it in the method's docstring instead of over the unsafe{}
block, and in your previous comment you mentioned that you were concerned that a # Safety
block in the docstring might be confusing to users (which I agree with you). Feel free to move it inside the method if you prefer that - I'll approve the PR regardless. Up to you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks!
0f07bd2
to
3d1017b
Compare
3d1017b
to
51b5ca3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you for this great work, and thank you for bearing with me 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good! Thank you for the great work!
Based on comment here:
miden_core::utils::sync::racy_lock
module forno_std
environments.miden_core::utils
to re-exportstd::sync::LazyLock
orracy_lock::RacyLock as LazyLock
forstd
andno_std
, respectively.rw_lock
module.StdLibrary::default()
to use above mentionedLazyLock
.