
stm32: Ringbuffer rewrite #3336

Merged
merged 11 commits into from
Oct 16, 2024

Conversation

liarokapisv
Contributor

@liarokapisv liarokapisv commented Sep 15, 2024

The current ring buffer implementation is reportedly a bit flaky and hard to audit.

This PR aims to rewrite the ring buffer module with a simpler implementation that is easier to audit and has clear semantics.
The new module includes stateful property tests which can encode and test the semantics of the module to ensure correctness.

I have been testing this implementation with a real-life DSP project, for both the USARTX code and the I2S codec processing. I of course welcome others to test the PR to get feedback.

One point worth discussing is the semantics of the new and clear methods of WritableDmaRingBuffer.
After these calls, the ring buffer must assume some existing length; otherwise it will Underrun on the next write call.
Currently, these calls force a length equal to the capacity, giving the DMA the maximum leeway to catch up, but we may want to make this configurable. The disadvantage is that the previous contents of the underlying buffer are used directly, at an offset the user does not control. This is not ideal due to the lack of user control, but it is not fixable without changing the interface.

I am also a bit skeptical about the recently introduced write_immediate buffer method.

I think another approach is worth experimenting with: directly constructing/resetting an empty WritableDmaRingBuffer and then hydrating it with a version of write that does not produce an Underrun error.

A possible issue with the PR is that it removes get_complete_count from the DmaCtrl trait, since it uses reset_complete_count instead. It may be worth reverting this to squeeze out some extra performance.

As always, thanks for the great project and I hope this PR is useful.

@liarokapisv liarokapisv changed the title stm32: initial support for alternative ringbuffer implementation stm32: Ringbuffer rewrite Sep 15, 2024
@liarokapisv liarokapisv force-pushed the stm32-alternative-ringbuffer-impl branch 2 times, most recently from 9b5195c to 0e5d6c7 Compare September 15, 2024 16:37
@liarokapisv liarokapisv marked this pull request as ready for review September 15, 2024 16:57
@liarokapisv
Contributor Author

liarokapisv commented Sep 15, 2024

@xoviat @rmja Your feedback would be invaluable!

@Dirbaio
Member

Dirbaio commented Sep 16, 2024

let's see if ringbuffered uart tests pass

bender run

@liarokapisv
Contributor Author

liarokapisv commented Sep 16, 2024

Another thing: the state machine stays in the Underrun state until the user calls clear, so clients that don't handle Underrun may get stuck. (This is explicitly defined in the stateful property tests here.) This may be unwanted. We could clear implicitly instead, but then we need to decide what happens to the cleared write buffer queue. Do we empty it? Do we assume it is full on clear? Half full? The current clear implementation for the write buffer assumes it is full with the previous buffer contents.
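The sticky-error behavior described above can be modeled in a few lines. This is a hypothetical sketch of the documented semantics, not the actual embassy-stm32 API: once Underrun is reported, every subsequent write fails until clear is called.

```rust
// Hypothetical model of the documented semantics. Names are
// illustrative stand-ins, not the real embassy-stm32 types.

#[derive(Debug, PartialEq)]
enum Error {
    Underrun,
}

struct ModelWriter {
    underrun: bool,
}

impl ModelWriter {
    fn new() -> Self {
        Self { underrun: false }
    }

    /// Pretend the DMA caught up with the writer.
    fn trigger_underrun(&mut self) {
        self.underrun = true;
    }

    fn write(&mut self, buf: &[u8]) -> Result<usize, Error> {
        if self.underrun {
            // Sticky error: the caller must explicitly clear.
            Err(Error::Underrun)
        } else {
            Ok(buf.len())
        }
    }

    /// Reset the error state; the real implementation also re-seeds
    /// the write index to give the DMA maximum leeway.
    fn clear(&mut self) {
        self.underrun = false;
    }
}
```

An implicit clear would remove the stuck state, but would have to pick an assumed fill level for the write queue, which is exactly the open question.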

@liarokapisv
Contributor Author

bender run

@embassy-ci

embassy-ci bot commented Sep 17, 2024

run: permission denied

@Dirbaio
Member

Dirbaio commented Sep 17, 2024

bender run

@liarokapisv liarokapisv force-pushed the stm32-alternative-ringbuffer-impl branch from 0e5d6c7 to b965288 Compare September 17, 2024 21:42
.write_index
.diff(self.cap(), &mut self.read_index)
.try_into()
.unwrap();
Contributor

@peterkrull peterkrull Sep 16, 2024


This causes a panic for me because the write index seems to get zeroed out before the read index, causing the isize -> usize cast to fail. Presumably that should not happen in the first place, and I am still investigating what is causing the issue for me.

Unless a panic here is meant to signify that some invariant has been broken, it could be changed to

match self.write_index.diff(self.cap(), &mut self.read_index).try_into() {
    Ok(diff) if diff <= self.cap() => Ok(diff),
    _ => Err(OverrunError),
}

Contributor Author

@liarokapisv liarokapisv Sep 18, 2024


The implicit invariant is that the read index is always behind the write index. This should not happen, since we only advance the read index up to the diff between it and the write index.
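The invariant can be illustrated with the wrapping index arithmetic: treating each index as a completion count plus an in-buffer position, the diff from read to write should always be non-negative. This is an illustrative model; the type and method names are assumptions, not the actual embassy-stm32 internals.

```rust
// Illustrative model of the index arithmetic. A negative diff means
// the read index got ahead of the write index: a broken invariant.

#[derive(Clone, Copy)]
struct Index {
    complete_count: usize,
    pos: usize,
}

impl Index {
    /// Total element count represented by this index for a buffer of
    /// capacity `cap`.
    fn total(&self, cap: usize) -> usize {
        self.complete_count * cap + self.pos
    }
}

/// Signed distance from `read` to `write`.
fn diff(cap: usize, write: Index, read: Index) -> isize {
    write.total(cap) as isize - read.total(cap) as isize
}
```

The `try_into().unwrap()` in the snippet under review is exactly the point where a negative diff surfaces as a panic.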

@dvdsk
Contributor

dvdsk commented Sep 19, 2024

Could the ringbuffer be extended with an is_empty() or len() method?

That would make it easy to implement read_ready() for the RingbufferUart and other ring-buffer-backed IO.

@liarokapisv
Contributor Author

I think it makes sense, will do.

embassy-stm32/src/dma/ringbuffer/mod.rs (3 outdated review threads, resolved)
@liarokapisv
Contributor Author

@peterkrull @dvdsk

I simplified the implementation by inlining some methods.
I also renamed margin to len and split the auto-normalization functionality of diff, along with the update-dma-index logic, into a separate sync-index method. This lets the len method take a &self instead of a &mut self.
I also added auto-clear functionality: it fast-forwards the index in the read version and gives a buffer's worth of leeway in the writer version. The latter may need changes; I always welcome feedback.

@dvdsk
Contributor

dvdsk commented Sep 22, 2024

I also renamed margin to len

Thanks 👍 I'll get onto adding ReadReady once this is merged.

@liarokapisv liarokapisv force-pushed the stm32-alternative-ringbuffer-impl branch from 2232ac3 to 628dd36 Compare September 23, 2024 00:22
@peterkrull
Contributor

That would make it easy to implement read_ready() for the RingbufferUart and other ringbuffer backed IO

What is the expected behavior of read_ready()? With len now being read-only, it does not fetch whether the hardware DMA state has advanced; it only returns the state from the last time the writer index was synced.

@liarokapisv
Contributor Author

liarokapisv commented Sep 23, 2024

That would make it easy to implement read_ready() for the RingbufferUart and other ringbuffer backed IO

What is the expected behavior of read_ready()? With len now being read-only, it does not fetch whether the hardware DMA state has advanced; it only returns the state from the last time the writer index was synced.

Yeah, I was thinking this might be problematic; I can revert to a mutating version. The read-only version is not updated until a new read/write, so it doesn't make much sense in retrospect.

@dvdsk
Contributor

dvdsk commented Sep 23, 2024

What is the expected behavior of read_ready()?

For RingBufferedUartRx specifically, it would mean the next call to read will never return Poll::Pending. More generally, it means that read should not block.

As I interpret the docs of ReadReady, that does not go both ways. Since it's meant to allow non-blocking reading, a false negative should be fine. So read_ready returning false while an immediate call to read would not return Poll::Pending is fine, I think. It should not update only when read and write are called, however. It takes &mut self, so making it mutating should not be a problem.

@liarokapisv
Contributor Author

What is the expected behavior of read_ready()?

For RingBufferedUartRx specifically, it would mean the next call to read will never return Poll::Pending. More generally, it means that read should not block.

As I interpret the docs of ReadReady, that does not go both ways. Since it's meant to allow non-blocking reading, a false negative should be fine. So read_ready returning false while an immediate call to read would not return Poll::Pending is fine, I think. It should not update only when read and write are called, however. It takes &mut self, so making it mutating should not be a problem.

In that case, I think making len take a &mut self is probably better, since it needs to update internal state to report the actual running length. I reverted the read-only changes; you should now be able to use len directly to implement ReadReady.
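Under these semantics, ReadReady falls out of a mutating len. The sketch below uses stand-in types (the real DmaCtrl trait and ring buffer differ); it only shows the shape of the idea: sync against the DMA position, then report readiness.

```rust
// Sketch of backing a ReadReady-style check with a mutating `len`.
// `DmaCtrl` and `Ring` are simplified stand-ins, not the real
// embassy-stm32 types.

trait DmaCtrl {
    /// Transfers remaining until the DMA wraps (NDTR-style counter).
    fn get_remaining_transfers(&self) -> usize;
}

struct Ring {
    cap: usize,
    read_pos: usize,
    write_pos: usize,
}

impl Ring {
    /// Mutating `len`: re-syncs the cached write index against the
    /// hardware counter before reporting how many elements are
    /// readable. (This simplified modulo version cannot distinguish
    /// a completely full buffer from an empty one.)
    fn len(&mut self, dma: &impl DmaCtrl) -> usize {
        self.write_pos = self.cap - dma.get_remaining_transfers();
        (self.write_pos + self.cap - self.read_pos) % self.cap
    }
}

/// In the spirit of embedded-io's ReadReady: `true` means the next
/// read will not block; false negatives are acceptable.
fn read_ready(ring: &mut Ring, dma: &impl DmaCtrl) -> bool {
    ring.len(dma) > 0
}
```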

@liarokapisv
Contributor Author

Two main issues that I believe still remain to be properly discussed:

Overrun recovery

Currently, the read ring-buffer version directly syncs to the current DMA location. I think this is pretty straightforward and covers most use cases properly.

The write version, on the other hand, syncs buf.len steps ahead of the current DMA location. This is somewhat arbitrary: it gives the user the most leeway to get back on track, but it also adds latency to subsequent writes and may lead to sync issues between parallel read and write calls.
This is similar to what happens at ring buffer creation, since at that point the write ring buffer is full.
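As a rough illustration of the trade-off (the names and the example numbers are assumptions, not taken from the PR): seeding the write index a full buffer ahead maximizes catch-up leeway but costs up to one buffer of latency.

```rust
/// Where the write index lands after a reset, for a DMA currently at
/// `dma_pos` in a buffer of `cap` elements: a full buffer ahead, the
/// maximum catch-up leeway.
fn reset_write_index(dma_pos: usize, cap: usize) -> usize {
    dma_pos + cap
}

/// Worst-case added latency in seconds for sample period `dt`:
/// one full buffer of samples.
fn added_latency(dt: f64, cap: usize) -> f64 {
    dt * cap as f64
}
```

For example, at a 48 kHz sample rate a 256-element buffer adds at most 256/48000 s of latency, which is the figure a user would weigh when sizing the buffer.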

Broken Invariants Detection

The implementation assumes that the RingBuffers are only used while the DMA is both 1) enabled and 2) not modified other than through the DmaCtrl interface. If the DMA stream is reset while the buffer is in operation, or if the ringbuffer is used while the stream is not enabled, then these assumptions are broken.

Some of this can be detected through negative diff results. At that point we could either panic (what currently happens) or propagate a new error type and make this recoverable. That would prevent issues like those caused in #3356, but would also require modifying other ringbuffer clients without any obvious way to recover. Another possible approach is to fold this error case into the Overrun error path, perhaps also logging the detected broken invariant.
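One possible shape for the recoverable-error option, sketched with assumed names (the PR's actual error types may differ): classify the raw index diff into the normal Overrun path and a separate broken-invariant case.

```rust
// Assumed error shape for illustration; not the PR's actual types.

#[derive(Debug, PartialEq)]
enum RingError {
    /// DMA outpaced the reader (the normal error path).
    Overrun,
    /// The DMA index moved in a way the model forbids, e.g. the
    /// stream was reset behind the ring buffer's back.
    DmaUnsynced,
}

/// Classify a raw diff: a negative value means the invariant broke,
/// a value larger than the capacity means the DMA lapped the reader.
fn classify(diff: isize, cap: usize) -> Result<usize, RingError> {
    match usize::try_from(diff) {
        Ok(d) if d <= cap => Ok(d),
        Ok(_) => Err(RingError::Overrun),
        Err(_) => Err(RingError::DmaUnsynced),
    }
}
```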

@peterkrull
Contributor

For the invariant thing, I am still most in favor of just returning an error. An Overrun error would be fine for me, but perhaps someone more experienced can chime in on whether a new type of error is warranted. Then add a doc comment to ReadableDmaRingBuffer::clear() saying it should only be called after the DMA is already enabled/started, for the most predictable behavior.

Another little thing: dma_sync can be simplified by assigning directly to complete_count and pos. I would also rename completion_count to complete_count, since that is what it is called elsewhere.

fn dma_sync(&mut self, cap: usize, dma: &mut impl DmaCtrl) {
    let first_pos = cap - dma.get_remaining_transfers();
    self.complete_count += dma.reset_complete_count();
    self.pos = cap - dma.get_remaining_transfers();

    // If the latter call to get_remaining_transfers() returned a smaller value than the first, the dma
    // has wrapped around between calls and we must check if the complete count also incremented.
    if self.pos < first_pos {
        self.complete_count += dma.reset_complete_count();
    }
}
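The wrap-around race that the comment in the snippet above describes can be exercised with a scripted mock. This is a self-contained adaptation of the suggestion (the mock and its scripted readings are assumptions for illustration):

```rust
/// Stand-in for the DMA control handle used by `dma_sync`.
trait DmaCtrl {
    fn get_remaining_transfers(&mut self) -> usize;
    /// Completions since the last call; resets the counter.
    fn reset_complete_count(&mut self) -> usize;
}

#[derive(Default)]
struct DmaState {
    pos: usize,
    complete_count: usize,
}

impl DmaState {
    fn dma_sync(&mut self, cap: usize, dma: &mut impl DmaCtrl) {
        let first_pos = cap - dma.get_remaining_transfers();
        self.complete_count += dma.reset_complete_count();
        self.pos = cap - dma.get_remaining_transfers();

        // If the second position reading is behind the first, the DMA
        // wrapped between the two reads: pick up the extra completion.
        if self.pos < first_pos {
            self.complete_count += dma.reset_complete_count();
        }
    }
}
```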

As for the WritableDmaRingBuffer I have only started looking at it now

@peterkrull
Contributor

Apart from the initial added delay, what is the drawback of just giving the most leeway when starting/clearing the WritableDmaRingBuffer? Presumably whoever decides the length of the DMA buffer takes into account the acceptable delay of the whole system and the worst-case execution time. If a possible latency of dt*dma_buf.len() is too long, then shouldn't the buffer just be shorter? Also, if the user did in fact overrun just prior, they might need all the help they can get to catch up.

Unless there is some clear immediate need for a custom initial offset, defaulting to the maximum is probably fine.

@liarokapisv
Contributor Author

My main problematic case goes something like this:
The user, in a loop, repeatedly calls read_exact.await, processes, then write_exact.await. Clearing will then also affect the read call, potentially leading to cascading overrun issues.

This is recoverable, it just requires some thought in the application logic. I agree that giving the max length is a good default, which is why I made it so; I just wanted some feedback before committing.

@liarokapisv
Contributor Author

liarokapisv commented Sep 23, 2024

Also, some bikeshedding: should clear be renamed to sync or something? Clear makes complete sense for the read version, where len will be reset to 0, but is perhaps counterintuitive for the write version, which skips buf.len steps ahead of the DMA.

@peterkrull
Contributor

Could both not just be called reset? The doc comment already describes them as a reset

@liarokapisv liarokapisv marked this pull request as ready for review October 1, 2024 16:06
@liarokapisv
Contributor Author

liarokapisv commented Oct 2, 2024

I think I have addressed most of my issues.
I would still like some feedback on the current ringbuffer Error -> Overrun conversion in the DmaUnsynced case. With the last update, it only logs an error message and silently converts to Overrun. I was thinking it might be worth panicking, since it implies an embassy driver issue, but that may be too intrusive for users, especially if it only happens as an edge case.
In any case, I don't get any DmaUnsynced errors with my latest changes.

@peterkrull
Contributor

peterkrull commented Oct 2, 2024

I would very much prefer not to panic. It would really suck to have my drone fall out of the sky just because one of my RC packets (for whatever reason) caused an invariant we have tried to model in the buffer not to be upheld. Especially when one lost packet and a DMA restart later is all it takes to get back on track. It would be another matter if the entire peripheral locked up unrecoverably, but that is not the case, so I prefer the error.

Also, can the if..if..else at (and the same for the writer)

https://github.com/embassy-rs/embassy/pull/3336/files#diff-79e995a8dc7e2d1878dc6f7a4f665c9aea320f6f0e589e178769ed329a570337R225-R233

be combined into an if..else if..else?

@liarokapisv
Contributor Author

liarokapisv commented Oct 2, 2024

I agree with the reasoning for avoiding the panic. Also simplified the if/else as requested.

@Dirbaio
Member

Dirbaio commented Oct 3, 2024

bender run

@liarokapisv liarokapisv force-pushed the stm32-alternative-ringbuffer-impl branch 3 times, most recently from f783a07 to 3ccceac Compare October 3, 2024 17:39
@Dirbaio
Member

Dirbaio commented Oct 3, 2024

bender run

@liarokapisv
Contributor Author

@Dirbaio I don't think I have anything else to address. I am open to feedback but otherwise I think this mostly does it. @peterkrull Do you agree?

@peterkrull
Contributor

I do not have anything else to add

@liarokapisv
Contributor Author

liarokapisv commented Oct 9, 2024

There is an issue with the (unchanged) read/write_exact calls: they only poll once before yielding (probably to ensure forward progress). If one reads/writes less than half the DMA buffer in a loop and there are no spurious wake-ups, it is almost guaranteed that the user will get overruns, because the method simply does not get to run enough times to fill the buffer.

The above was wrong, the original code was fine.

@liarokapisv liarokapisv force-pushed the stm32-alternative-ringbuffer-impl branch 2 times, most recently from 6cab295 to 28d0353 Compare October 15, 2024 15:42
@lulf lulf added this pull request to the merge queue Oct 16, 2024
Merged via the queue into embassy-rs:main with commit bcfbaaa Oct 16, 2024
6 checks passed