
In place initialization of the Matter object, the buffers and subscriptions #198

Merged: 3 commits merged into project-chip:main on Sep 14, 2024

Conversation

@ivmarkov (Contributor) commented Jul 7, 2024

(UPDATED: the section "Why is this PR still a draft?" below reflects recent progress.)

What follows below is a "mini-RFC" of sorts, justifying why we need this change, how it is implemented, next steps and so on.

What is the problem?

How to initialize the Matter object, as well as other large objects (like the IM handler's PooledBuffers), without risking stack blow-ups.

Status Quo

This problem is already solved actually!

The reason why we can initialize the Matter structure without risking stack memory blow-ups (the same applies to the other large non-Future structures we have, like PooledBuffers and Subscriptions, but from now on I'll talk only about Matter) is that it has a const constructor. In other words, Matter::new(...) can be called from a const context.

How does that help?

  • You can use static_cell's ConstStaticCell and simply do:
static MATTER: ConstStaticCell<Matter> = ConstStaticCell::new(Matter::new(...));

... what the above does is reserve a place for the Matter object in the data section of the executable; the startup code then initializes the Matter structure (and the other structures in the data section) with its value upon program startup. This is possible because Matter::new is const and so is ConstStaticCell::new. Therefore, the object layout of ConstStaticCell<Matter> is generated at compile time, saved (in the form of a byte sequence) in the data section of the ELF, and then - upon program startup - the whole data section is copied from the ELF (or from flash) into RAM by the startup code using a simple memcpy (or a memcpy-like) routine. (A sketch of how such a ConstStaticCell is consumed at runtime follows after this list.)

  • You can also do all sorts of acrobatics at runtime, which the compiler usually optimizes well. For example:
use core::mem::MaybeUninit;

const MATTER: Matter = Matter::new(...);

let boxed_matter = unsafe {
    let mut boxed_matter: Box<MaybeUninit<Matter>> = Box::new_uninit();

    // This is optimized by the compiler to a `memcpy` similar to the one used when the `data`
    // section is initialized at startup, because we are copying a `const` value into the
    // `*mut Matter` raw pointer.
    // Be careful to inline this code and not hide the `MATTER` const behind a function though,
    // as this optimization might not trigger then.
    boxed_matter.as_mut_ptr().write(MATTER);

    boxed_matter.assume_init()
};

// `boxed_matter` is now initialized, without any stack blow-ups
...
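For completeness, here is a small sketch of how such a ConstStaticCell is then consumed at runtime (assuming static_cell 2.x; Matter is assumed to be in scope and its constructor arguments are elided, as above):

use static_cell::ConstStaticCell;

static MATTER: ConstStaticCell<Matter> = ConstStaticCell::new(Matter::new(...));

fn run() {
    // `take()` hands out the `&'static mut Matter` exactly once and panics on a second call.
    let matter: &'static mut Matter = MATTER.take();

    // ... use `matter` from here on; the object itself was never constructed on the stack
}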

Why are we solving (again) an already solved problem then?

Using const-initializers comes with one disadvantage: flash size. Since the default value of the Matter object is not all zeros (or uninitialized data) - and trust me, it is very, very difficult to come up with a default Matter object state which is all zeros and/or MaybeUninit (*) - the Matter object ultimately is placed in the data section instead of in bss, which means the initial layout of the object has to be burned into flash.

Depending on the size of all buffers, the number of sessions, the number of supported fabrics and so on, the total size of the Matter stack is anywhere between 30KB and 70KB.

Now, this is nothing for embedded Linux scenarios, and it is probably not that much for Espressif MCUs which have 4MB of flash. But it might be something for other MCUs, which generally have smaller flash sizes.

(*) Coming up with "all-zeroes" default object state in Rust is very, very difficult.

Stuff which is a useful default and which is "all-zeroes":

  • false in bool
  • 0 in u8, u32, u16 etc
  • None in Option<NonZeroU8>, Option<NonZeroU16>, etc

Stuff which is a useful default and which is NOT "all-zeroes":

  • "" in &str (&str is a fat pointer and is NOT modeled - like in C++ - with (nullptr, 0), but with (alignof(&str), 0) i.e. it ends up having a binary value of 0x0000000100000000 (on 32bit archs)
  • b"" (empty slice) in &[u8] - ditto as for &str
  • None for Option<Foo> where Option<Foo> is not special-cased by the compiler as in e.g. Option<NonZeroU8>. Especially with Option<Foo> where Foo is an enum, the compiler often plays trick where it models None as an extra value in the Foo's enum tag, and this extra value in the tag is almost never 0
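Here is the small check of the &str case mentioned above (nothing rs-matter specific; it just shows that the pointer half of the empty fat pointer is never 0):

fn main() {
    let s: &str = "";

    // The fat pointer is (data pointer, length). The length is 0, but the data
    // pointer of a reference can never be null, so the overall bit pattern is
    // not "all-zeroes" - and hence not a bss-friendly default value.
    assert_eq!(s.len(), 0);
    assert!(!s.as_ptr().is_null());

    println!("ptr = {:p}, len = {}", s.as_ptr(), s.len());
}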

The More Important Reason why we are solving it again

Putting aside flash size concerns, we have an existing, unsolved case of stack blow-ups (and excessive stack usage at runtime):
We very often allocate large objects on-stack and then move them into their final destination (which is usually a heapless::Vec or an array inside the Matter object).

Cases where we do this:

  • Fabric (1.7KB) - first allocated on-stack, then moved into the vector by vec.push(fabric)
  • Session (~ 1KB currently, can be reduced) - ditto
  • ACL - ditto
  • A lot of other places, like temporary arrays of ~ 1KB inside the Secure Channel implementation

But let's concentrate on Fabric and Session for now.
In order to avoid their temporary stack allocation (and the subsequent move into the vec via push), we need to implement a "placement-new" for heapless::Vec and initialize the Fabric or Session in-place, directly in the memory of the heapless::Vec instance. Something like:

vec.push_in_place(|slot: &mut MaybeUninit<Fabric>| {
    let slot_ptr = slot.as_mut_ptr();

    unsafe {
        addr_of_mut!((*slot_ptr).node_id).write(node_id); 
        addr_of_mut!((*slot_ptr).fabric_id).write(fabric_id); 
        addr_of_mut!((*slot_ptr).fabric_idx).write(fabric_idx); 
        // ... and so on
    }
});

Two problems with the above:

  • We need a custom heapless::Vec, as the upstream one does not have (yet) a push_in_place method
  • Using push_in_place requires a lot of care by the caller, as it is full of unsafe code. And very, very verbose!

Can we do better?

Yes. With pinned-init.

(
BTW: We'll put aside the pinned- aspect of that crate. All our objects are Unpin and fortunately, pinned-init supports unpinned objects just fine, contrary to its name - without exposing the user to the notion of pinning.
)

So what does pinned-init do? Grossly oversimplifying, but it allows us to turn:

vec.push_in_place(|slot: &mut MaybeUninit<Fabric>| {
    let slot_ptr = slot.as_mut_ptr();

    unsafe {
        addr_of_mut!((*slot_ptr).node_id).write(node_id);
        addr_of_mut!((*slot_ptr).fabric_id).write(fabric_id);
        addr_of_mut!((*slot_ptr).fabric_idx).write(fabric_idx);
        // ... and so on
    }
});

into:

vec.push_in_place(Fabric::init(node_id, fabric_id, fab_idx));

where Fabric::init() is almost like Fabric::new() except with a slightly different signature and function body:

impl Fabric {
    fn init(node_id: u64, fabric_id: u64, fab_idx: NonZeroU8) -> impl Init<Self> {
        init!(Self {
            node_id,
            fabric_id,
            fab_idx,
            icac <- rs_matter::utils::vec::Vec::init(), // Notice the `<-` syntax here and how we compose our `init` with `Vec`'s `init`!
        })
    }
}

The above syntax - thanks to the init! macro provided by pinned-init - allows us to use safe, readable, composable syntax to in-place initialize structures recursively! Or rather, and more correctly - to express an in-place initializer/constructor that does that.
Almost like in C++!

The similarity with regular new() -> Self is on purpose of course.
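To make the mechanics concrete, here is a small, self-contained sketch of the same pattern with a made-up Big struct (not an rs-matter type). The direct call to the unsafe Init::__init method is only there to show what a container like the forked Vec's push_in_place does on our behalf - application code would normally never call it directly. (Box::new_uninit needs a recent or nightly toolchain, as in the earlier snippet.)

use core::mem::MaybeUninit;

use pinned_init::{init, zeroed, Init};

struct Big {
    id: u64,
    payload: [u8; 4096],
}

impl Big {
    fn init(id: u64) -> impl Init<Self> {
        init!(Self {
            id,
            payload <- zeroed(), // the 4KB array is zeroed directly in its final memory
        })
    }
}

fn main() {
    // Reserve uninitialized memory on the heap (the forked `heapless::Vec` would instead
    // hand out the uninitialized slot at its end).
    let mut slot: Box<MaybeUninit<Big>> = Box::new_uninit();

    // SAFETY: the slot is valid for writes; we treat it as initialized only on success.
    unsafe {
        Big::init(7).__init(slot.as_mut_ptr()).unwrap();
    }

    let big = unsafe { slot.assume_init() };
    assert_eq!(big.id, 7);
}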

Can't we apply the const fn trick to Fabric and Session?

Yes, we can.
Assuming we implement const fn Fabric::new() -> Self as a parameter-less initial state of the Fabric object, we can try the following:

const INITIAL_FABRIC: Fabric = Fabric::new();

fabric_mgr.fabrics_vec.push(INITIAL_FABRIC)?;
let fabric = fabric_mgr.fabrics_vec.last_mut().unwrap();

// Now modify `fabric` members as we see fit
fabric.node_id = ...

BUT: The problem is fabric_mgr.fabrics_vec.push(INITIAL_FABRIC)?;. Even though INITIAL_FABRIC is a const, we have no guarantee that the compiler will (a) inline the call to fabrics_vec.push(...) and (b) use a memcpy to initialize the new fabric at the end of the vector with the INITIAL_FABRIC const. It might or might not do that, depending on the optimization settings.

In contrast, pinned-init always initializes in place, even at opt-level 0.

pinned-init background

This crate (or rather, a copy of it) is used in the RfL (Rust for Linux) project, and is therefore merged into the Linux kernel mainline.
So despite its low star count on GitHub, I think the approach of the crate is very solid and credible.

Changes delta

The changes suggested here are incremental, and most importantly - additive.

  • All const fn constructors are preserved, so folks can continue using Matter::new(...) in a const context
  • Matter (and all other structures recursively downwards) just gets an extra init method next to its new constructor function - an almost verbatim copy of the existing new, except that the large members of the concrete struct use the <- syntax of the pinned_init::init! macro, which delegates to the placement-new init methods of the inner structures

Not so ideal stuff:

  • I had to copy RefCell from Rust core into rs_matter::utils::cell::RefCell, as the core RefCell does not have an init placement-new method. However, this is temporary. We'll have to get rid of RefCell anyway in the future, in favor of a real mutex which is either a no-op for single-threaded scenarios (the default), or a real blocking one. This future mutex - or the beginnings of it - is available under the new blmutex module which is introduced by this PR as well
  • I had to copy heapless::Vec as rs_matter::utils::vec::Vec and extend it to (a) have an init in-place constructor and (b) have the push_in_place method discussed above. Maybe in the future, if the pinned-init crate gains traction in the Rust embedded ecosystem (and Embassy in particular), we might be able to merge the Vec changes upstream into heapless (and merge our blmutex changes upstream too). But I don't find either of these to be a big issue.

Alternatives

Option 1: Do nothing

We can continue relying on const fn for the initialization of the Matter object (and pay with increased flash size usage). For Fabric and Session, we can either do the const trick described above (and rely on the compiler opt settings to do their job), or we can still fork heapless::Vec and introduce push_in_place - but without the convenience of pinned_init::init! we'll have to use a lot of unsafe code to in-place initialize the members of the Vec (Fabric and Session, and possibly others in the future).

Option 2: &mut-borrow large heapless::Vec instances

(This is what I tried originally.)

We can make the big heapless::Vec instances used in Matter no longer owned by Matter, but rather - borrowed in Matter by &mut references.

I.e. FabricMgr would become FabricMgr<'v> because it would contain a &'v mut heapless::Vec<Fabric> rather than owning that vector (or array) as it is now. Consequently, Matter<'a> would become Matter<'a, 'v>, Exchange<'a> would become Exchange<'a, 'v> and so on down the line. Why the existing covariant 'a lifetime cannot be merged with the new 'v lifetime is explained below.

Advantages:

  • Simple and obvious change. Solves the "flash size" problem because these large, initially-empty Vecs are the primary contributors to the flash size increase as they constitute the majority of the Matter object "data", even if they initially contain... nothing

Disadvantages:

  • Matter becomes even less ergonomic: now the user has to separately inject another set of buffers besides PooledBuffers. I.e. this change increases complexity. With the alternative suggested here, we can in fact merge PooledBuffers back into Matter at some point in future, which would reduce complexity
  • Even worse: the 'v lifetime from above, for the &'v mut heapless::Vec external buffers we'd be using, is invariant (as it is for a &mut ref). So we can't merge it with the existing, nice covariant 'a lifetime the Matter object has. No matter what we do - I even tried with object pools and whatnot - ultimately these are either &mut, or RefCell<&Pool>, or SomeInteriorMutabilityConstructLikeMutex<&Pool> non-mutable references, and because cells with interior mutability always result in invariant lifetimes, just like &mut, we always end up with 'v being invariant and thus unmergeable with 'a. (A small variance sketch follows after this list.)
    • We can of course hard-code both or either of 'a or 'v to be 'static, but that way we sacrifice the flexibility of Matter on platforms like embedded Linux, where none of the problems discussed here apply (stacks on Linux are 2MB by default) and where the user might want to allocate Matter and BasicInfoConfig and DevAttDataFetcher and its mDNS impl - and even the buffers - on-stack. That would be impossible with 'static.
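To make the variance point concrete, here is a minimal, self-contained illustration with hypothetical types (not the actual rs-matter ones): a lifetime that appears only behind shared references can be shortened freely, while one that appears behind a &mut reference cannot, which is exactly what prevents merging 'v into the covariant 'a.

// Covariant: 'a appears only behind shared references, so a value with a longer
// lifetime can be used where a shorter one is expected.
struct Covariant<'a>(&'a [&'a u8]);

// Invariant: 'a appears in the referent of a `&mut`, so the lifetime can be
// neither shortened nor lengthened.
#[allow(dead_code)]
struct Invariant<'a>(&'a mut Vec<&'a u8>);

fn shrink_covariant<'long: 'short, 'short>(x: Covariant<'long>) -> Covariant<'short> {
    x // compiles: covariance allows shrinking the lifetime
}

// fn shrink_invariant<'long: 'short, 'short>(x: Invariant<'long>) -> Invariant<'short> {
//     x // does NOT compile: `Invariant` is invariant in its lifetime
// }

fn main() {}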

Why is this PR still a draft?

pinned-init currently still uses core::alloc::AllocError, which is not in stable Rust yet (and for no_std this is the only nightly feature it needs).

It seems the author is open to changing the crate in a way where it no longer unconditionally depends on core::alloc::AllocError. This error type is anyway only used in the InPlaceInit trait, which is a decorator of Arc, Rc and Box (i.e. the alloc module) which we don't use/need, so perhaps we can just put - in pinned-init - the definition of InPlaceInit (and the usage of core::alloc::AllocError) behind the alloc feature flag?

UPDATE: Problem solved. The relevant PR addressing ^^^ got merged yesterday and we are now using it via the newly released pinned-init V0.0.8.

I also need to thoroughly test the in-place initialization - ideally on an MCU. To happen next week.

@ivmarkov (Contributor, Author) commented Jul 7, 2024

@kedars @andy31415 Not ready for merging yet, because the current version of pinned-init it depends on requires nightly. Hopefully we can lift this restriction from pinned-init soon, as I still need to test the changes end-to-end. In the meantime, please look into the "mini-RFC" in the PR description if you can.

Change is completely additive and incremental, but still important to understand and form an opinion on.

@ivmarkov (Contributor, Author) commented Jul 7, 2024

Not so important, but here are two screenshots:

  • MatterStack from rs-matter-stack initialized using const fn - you can see 50KB in the data segment
    (screenshot: matter_data)

  • MatterStack initialized with the newly introduced init infrastructure - the same 50KB, but now in .bss (as the data being initialized starts its life as MaybeUninit and only becomes "real" once we in-place initialize it with .uninit().init_with(MatterStack::init(...)))! :) A small sketch of the .data-vs-.bss effect follows below.
    (screenshot: matter_bss)
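A minimal sketch of that .data-vs-.bss effect (nothing rs-matter specific; the exact section placement depends on the target and linker script, but this is the typical behavior):

use core::mem::MaybeUninit;

// Typically lands in .bss: there is no initial image to store in flash; the startup
// code just zero-fills (or leaves uninitialized) the region in RAM.
static mut BIG_UNINIT: MaybeUninit<[u8; 50 * 1024]> = MaybeUninit::uninit();

// Typically lands in .data: the full 50KB initial image has to be stored in flash
// and copied into RAM at startup, because it is not all zeroes.
static mut BIG_PATTERN: [u8; 50 * 1024] = [0xAA; 50 * 1024];

fn main() {
    // Take raw addresses (not references) just so the statics are actually used.
    let bss_candidate = unsafe { core::ptr::addr_of!(BIG_UNINIT) };
    let data_candidate = unsafe { core::ptr::addr_of!(BIG_PATTERN) };
    println!("{:p} {:p}", bss_candidate, data_candidate);
}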

@ivmarkov (Contributor, Author) commented:

@kedars @andreilitvin In case you are wondering why there is no activity on this PR:

The reason is: before marking it "ready for merge", I need to prototype the follow-up PR which would (hopefully) bear the fruits of this one.

In a way, prototyping the subsequent PR would thus de-risk this one and prove the usefulness of the changes it introduces.

Aside from re-working how Fabrics and ACLs are initialized and built up during commissioning (an exercise primarily aiming at memory savings - particularly stack memory), I'm also working on improving the TLV framework in two major aspects:

  • Enhancing FromTLV to support - in addition to the existing from_tlv method - a new init_from_tlv method which allows in-place initialization of large structures, a la C++. Which is what the current PR is all about. For example, this means we can in-place initialize the Fabric object from its TLV-serialized buffer without materializing it on-stack first and then moving it into the Vec of fabrics.
  • Reducing the size of our friend - TLVElement (the representation of partially-parsed TLV data) - from 40 bytes to 16 bytes (on x64; similar reduction ratio for 32-bit MCU archs). This one is interesting actually. While it might seem that I'm trying to cut the lawn with nail clippers, that is only at first sight. The problem is, we are full of TLV-derived structures (IM ReadReq, SubscribeReq, WriteReq and quite a few in SC too - Cert, to name the worst one) which are themselves full of partially-parsed TLV data in the form of TLVElement or TLVArray. So for those (which are created on the stack and moved around quite a bit - and oh well, this is unavoidable, as this is their primary use case!) we might see a 2x reduction if we can shrink the TLVElement and TLVArray sizes. Hence why this exercise is important. Cert, for example, currently weighs more than 400 bytes on x64. By removing the static Vec<T, N> instances in it and then reducing the TLVElement sizes, we can go down to something like 112 bytes on x64, which is quite an improvement. (A rough size sketch of the "fat pointer" idea follows below.)
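And here is the rough size sketch of the "TLVElement as a fat pointer" idea (a purely hypothetical layout, not the actual rs-matter type):

// If the element only keeps a reference to the not-yet-parsed TLV bytes, its size
// is exactly one fat pointer: 16 bytes on 64-bit targets, 8 bytes on 32-bit MCUs.
#[allow(dead_code)]
struct TlvElementSketch<'a> {
    data: &'a [u8],
}

fn main() {
    assert_eq!(
        core::mem::size_of::<TlvElementSketch<'_>>(),
        2 * core::mem::size_of::<usize>()
    );
}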

@ivmarkov (Contributor, Author) commented Aug 29, 2024

I think this is ready for merge (and it has grown a bit after I did the module reshuffling in the utils namespace suggested by @andy31415 - not that the suggestions were a bad idea, quite the contrary).

The fruits of this effort will be shown in subsequent PRs, of course. For now, the new init-in-place machinery is mostly latent, but its time is soon to come.

@ivmarkov ivmarkov marked this pull request as ready for review August 29, 2024 18:44
@ivmarkov ivmarkov requested review from kedars and andy31415 August 29, 2024 18:45
@ivmarkov (Contributor, Author) commented:

I took the liberty of squashing my changes into a single commit, or else I - for the life of me - couldn't rebase my forthcoming TLV-related changes on top.

Hopefully no big deal, as I doubt anybody would be reviewing the changes in this PR commit by commit (and it does not make sense).

Update the example; more ergonomic init of UnsafeCell and MaybeUninit

More constructors

In-place ctr for the built-in Mdns too

enable no_std for pinned-init

In-place init for BTP

Re-export pinned-init as its API is unstable; document

Initializer for GATT peripheral

Patch pinned-init with a forked version that does not need nightly

zeroed in pinned-init master no longer has the E generic

Use upstream pinned-init

In-place initializer for State

Stop pretending that ContainerInit has any other use case besides UnsafeCell

Maybe and other extensions

(Code review feedback) Move/split storage and sync types into their dedicated modules

Fix astro and zeroconf

Restore a larger stack for now, until in-place-init is fully utilized

Remove commented out code
@ivmarkov (Contributor, Author) commented Sep 3, 2024

@kedars This PR also contains the small changes from #202 and #203 so we can also just merge this one, and close the other two.

@ivmarkov (Contributor, Author) commented Sep 6, 2024

@kedars @andy31415 Can somebody spend some time reviewing this?

It is a precondition for - and currently a blocker of - the following PRs down my pipeline:

  • Fabrics & ACLs rewrite/improvements. Results in ~ 8K memory reduction (500 bytes per session, for our default of up to 16 sessions)
  • TLV rewrite/improvements. Yet to assess the memory savings, but they should be significant, as TLVElement is now just a fat pointer, and we now have TLV structures which are re-created completely on the fly. E.g. the Cert struct goes down from 400 bytes to 16 bytes. Similar savings for a lot of structs in the IM layer too (ReadReq, WriteReq, InvokeReq, etc.)

@ivmarkov (Contributor, Author) commented Sep 6, 2024

... Also a lot of temporary on-stack buffers with large sizes in the SC layer (800 bytes each) are now eliminated, but that depends a bit on the new TLV framework...

@ivmarkov ivmarkov merged commit 1e0117b into project-chip:main Sep 14, 2024
12 checks passed