Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make ItemSliceSync safe #1231

Open
martinvonz opened this issue Jan 3, 2024 · 15 comments
Open

make ItemSliceSync safe #1231

martinvonz opened this issue Jan 3, 2024 · 15 comments

Comments

@martinvonz
Copy link
Contributor

martinvonz commented Jan 3, 2024

Current behavior 😯

ItemSliceSync::get_mut() takes an non-mutable reference and an index. It doesn't check if there are multiple callers accessing the same index, so that's left to the caller to do (the function is therefore marked unsafe). I can't tell if the callers are actually guaranteed not to access the same index concurrently. My guess - without looking very carefully - is that the requirement can be violated by a bad packfile.

Expected behavior 🤔

Maybe ItemSliceSync should be replaced by a list of RefCell to make it safe?

@Byron
Copy link
Member

Byron commented Jan 3, 2024

Thanks for bringing this up!

I also would love to get rid of all unsafe. In this case, it's used to allow access to a size-optimized vec of items that form a tree, and we need to access it concurrently with each thread taking its own root node, and working its way through all the reachable leaves. The tree was built by decoding a pack whose entries refer to its bases only by OFS_DELTA - they are pointing backwards only which assures a correct tree. REF_DELTA entries aren't supported here, and cause immediate failure - they are expected to have been resolved before as part of the thin-pack handling. This is also the reason for this unsurmountable issue.

If we somehow would allow REF_DELTA entries to point to an in-pack object, then in theory malicious packs could cause all kinds of graphs as they can point anywhere in the pack, but they still can't link an entry to more than one base. And that's what one would really have to do for two threads to encounter the same child.

Thus I believe it's impossible for this data structure to end up in a place where it violates its assumption. Maybe that's wrong somehow, but I don't see it yet.

Maybe ItemSliceSync should be replaced by a list of RefCell to make it safe?

Even if the performance-drop when doing that would be negligible I'd think that the increase in memory consumption will not be that. Last time I eyeballed the memory consumption gitoxide came in with ~20% less than Git when dealing with the linux kernel pack, and that was hard-earned already.

Would you consider closing the issue until there is more evidence something needs to be done?

@martinvonz
Copy link
Contributor Author

The tree was built by decoding a pack whose entries refer to its bases only by OFS_DELTA - they are pointing backwards only which assures a correct tree.

Ah, I see. I agree that it should be safe then.

Would you consider closing the issue until there is more evidence something needs to be done?

I'll defer that to @Manishearth who did the unsafe review of gix-pack this time. Manish, do you think a comment explaining why it's safe would be sufficient?

@Manishearth
Copy link
Contributor

Would you consider closing the issue until there is more evidence something needs to be done?

As an unsafe reviewer the standard I tend to apply is that it should be very easy to verify that the invariant us upheld (not just a comment claiming it is). So I think what I would need is:

  • A comment and sufficent access control on the fields that have these invariants
  • (Ideally) methods that impact these fields are marked unsafe
  • Code which modifies these fields should justify that the invariants are upheld. This is easier if the methods that modify/construct the fields are all marked unsafe.

Unless the comment is sufficiently self-evident and does not require jumping around a lot of code to verify. Even then I'd like the code it references to also have comments to that effect.

Byron added a commit that referenced this issue Jan 4, 2024
Inspired by #1231 (comment) .

- make sure the type in question isn't use outside of its designated module
- improve documentation around safety to make the underlying data structure
  more obvious.
Byron added a commit that referenced this issue Jan 4, 2024
Inspired by #1231 (comment) .

- make sure the type in question isn't use outside of its designated module
- improve documentation around safety to make the underlying data structure
  more obvious.
@Byron
Copy link
Member

Byron commented Jan 4, 2024

Thanks for chiming in - I like your standard as it's easy to follow and thus hopefully just as easy to apply.

Trying just that, I created #1233.

  • A comment and sufficent access control on the fields that have these invariants

Access control was improved by restricting it more.

  • (Ideally) methods that impact these fields are marked unsafe
  • Code which modifies these fields should justify that the invariants are upheld. This is easier if the methods that modify/construct the fields are all marked unsafe.

I didn't propagate-up the unsafe just yet, but reviewed and improved the safety comments. The reason it's safe lies in the way the data structure is built and traversed, and that I tried to make clearer.

It's interesting that the ItemSliceMut (and related types in gix-features) are only needed after a change in rustc which prevented the usage of raw pointers - previously this type would just have been a pointer access. I think this was done to enforce the declaration of Send and Sync accordingly, but it's just a guess. I don't know of this information could help to improve the documentation further, or the way it is used.

Byron added a commit that referenced this issue Jan 4, 2024
Inspired by #1231 (comment) .

- make sure the type in question isn't use outside of its designated module
- improve documentation around safety to make the underlying data structure
  more obvious.
Byron added a commit that referenced this issue Jan 4, 2024
Inspired by #1231 (comment) .

- make sure the type in question isn't use outside of its designated module
- improve documentation around safety to make the underlying data structure
  more obvious.
Byron added a commit that referenced this issue Jan 4, 2024
Inspired by #1231 (comment) .

- make sure the type in question isn't use outside of its designated module
- improve documentation around safety to make the underlying data structure
  more obvious.
@Manishearth
Copy link
Contributor

@Byron I think that's a start but still not easily reviewed.

I'm also not convinced on that Send and Sync bound, the safety comments on the Send and Sync bounds are definitely not correct at the very least.

@Byron
Copy link
Member

Byron commented Jan 4, 2024

I think the main culprit is that ItemSliceMut is really a way to access T: Send items mutably from multiple threads, and there are no guardrails at all. There was a time when this was *const T and that was transmuted into *mut T as needed. That, however, stopped working probably because of the Send + Sync bounds, and ItemSliceSync was born.

By itself there is no way this can be considered safe, as it is only 'not unsafe' if its used correctly, and that usage is described in detail, but the code for that is in a couple of places and certainly not too easy to follow.

With that said, I don't know what else to do about this.

@Manishearth
Copy link
Contributor

Yeah the main thing to be careful about is that &T is Send only if T is both Sync and Send, which is not reflected in those bounds.

a RefCell is Send, but not Sync, so an &RefCell is not Send

@Byron
Copy link
Member

Byron commented Jan 4, 2024

I wish this could be reflected somehow, after all the T that's actually stored is Send + Sync, and it was never the intention to allow T that are &Type or Type<'_>. Maybe T can be constrained to not be a reference somehow?

@Manishearth
Copy link
Contributor

No, your type is logically an &T, it's not about whether T is an &Type, it's about whether it is RefCell. If your T is allowed to be RefCell your code is unsound.

@Byron
Copy link
Member

Byron commented Jan 5, 2024

This definitely compiles even though it shouldn't:

diff --git a/gix-pack/src/cache/delta/traverse/util.rs b/gix-pack/src/cache/delta/traverse/util.rs
index 95789b50b..79602ada1 100644
--- a/gix-pack/src/cache/delta/traverse/util.rs
+++ b/gix-pack/src/cache/delta/traverse/util.rs
@@ -59,3 +59,18 @@ unsafe impl<T> Send for ItemSliceSync<'_, T> where T: Send {}
 //         `get_mut()`, we only ever access one T at a time.
 #[allow(unsafe_code)]
 unsafe impl<T> Sync for ItemSliceSync<'_, T> where T: Send {}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::cell::RefCell;
+
+    #[test]
+    fn type_cannot_be_refcell() {
+        type Test<'a> = ItemSliceSync<'a, RefCell<String>>;
+        let mut item = RefCell::new("hi".into());
+        let value = Test::new(std::slice::from_mut(&mut item));
+        fn use_it(_slice: &Test<'_>) {}
+        use_it(&value);
+    }
+}

To me it's a matter of unsafe abstraction - the way ItemSliceSync works is something that was added relatively recently and probably already was an improvement at the time as it assures as &mut [T] is presented, so Rusts lifetime logic will be intact until then, with all the unsafe usage being internal.

Now the question is how this "array with members that are (trust me) owned by a single thread at a time (but accessed by reference)" can be expressed so that ideally Rusts other safeguards aren't removed like what's done here.

@Manishearth
Copy link
Contributor

Manishearth commented Jan 5, 2024

Now the question is how this "array with members that are (trust me) owned by a single thread at a time (but accessed by reference)" can be expressed so that ideally Rusts other safeguards aren't removed like what's done here.

There are a couple patterns that help with this. If you're doing parallel tree traversal the best thing is tohave a little bit of unsafe in the splitting code (effectively a vector based split_at_mut() type of thing), and be extremely strict about ownership elsewhere.

The core problem here is that the tree is not structured in an ownership-based way, so you're having to stitch this together across different parts of the code.

That said the main invariant is on Item.children. Maybe I can submit a PR that makes that super clear.

@Manishearth
Copy link
Contributor

Okay, the invariant is trickier because future_child_offsets is also weird.

@Byron
Copy link
Member

Byron commented Jan 5, 2024

I dug in and added this chunk of text. In summary, this is for handling a special case where ref-delta entries point to their base by id which is in the future of the current entry, not in its past. Thus these yet-to-be-seen parent objects are hooked up with their children when finishing the tree. Such packs exist only at rest and are untypical.
I hope that helps (and I am glad I added this documentation now).

diff --git a/gix-pack/src/cache/delta/mod.rs b/gix-pack/src/cache/delta/mod.rs
index 435008edf..c9d498ef1 100644
--- a/gix-pack/src/cache/delta/mod.rs
+++ b/gix-pack/src/cache/delta/mod.rs
@@ -53,6 +53,18 @@ pub struct Tree<T> {
     last_seen: Option<NodeKind>,
     /// Future child offsets, associating their offset into the pack with their index in the items array.
     /// (parent_offset, child_index)
+    /// In thin-packs, ref-deltas entries point to an object by its id. If this is a thin pack then this
+    /// would have been handled by [`LookupRefDeltaObjectsIter`](crate::data::input::LookupRefDeltaObjectsIter)
+    /// already.
+    /// Thus `child_index` points to a resolved ref-delta entry which pointed to an object within this pack
+    /// that wasn't seen yet, hence it points to a yet-to-be-seen pack entry.
+    /// And these we store and resolve later once we have seen all entries in the pack.
+    /// Note that such pack are atypical, as packs at rest should only ever use ofs-deltas to refer to parent entries.
+    /// In practice though, these packs with ref-delta entries exist at reset as well.
+    /// They are even produced by [Azure Git servers](https://github.com/Byron/gitoxide/issues/1025),
+    /// which is when we cannot handle them at all as we cannot know where an entry is in its pack by knowing its ID
+    /// without having an index yet, which we don't have as we are streaming the pack right now and thus try to produce
+    /// said index.
     future_child_offsets: Vec<(crate::data::Offset, usize)>,
 }
 

@Manishearth
Copy link
Contributor

It's fine I figured it out, I've got a partial PR.

It wasn't that it's confusing, it's that the invariant has multiple caveats which is tricky for invariants of this form.

@Manishearth
Copy link
Contributor

#1237

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants