core/debug: printf what asked to print #20168

kfessel · 2023-12-12T08:21:23Z

Contribution description

Removes the unhelpful stack-size test from the debug print macro.
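For context, here is a simplified sketch of the kind of guard being removed. It uses functions instead of the variadic debug macro for brevity. The type fake_thread_t, the threshold PRINTF_MIN_STACK, and all function names are hypothetical stand-ins, not the actual definitions in core/lib/include/debug.h. The old variant refuses to print when the active thread's stack looks too small for printf(), while the new one just prints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a thread descriptor; only the field we need. */
typedef struct {
    size_t stack_size;   /* stack size of the thread in bytes */
} fake_thread_t;

/* Hypothetical threshold: bytes of stack assumed necessary for printf(). */
#define PRINTF_MIN_STACK 512u

/* Old guard (sketch): allow printing before any thread runs (active == NULL)
 * or when the active thread's stack is at least PRINTF_MIN_STACK bytes. */
static bool old_guard_allows_printf(const fake_thread_t *active)
{
    return (active == NULL) || (active->stack_size >= PRINTF_MIN_STACK);
}

/* Old behaviour: the requested message is replaced by a warning when the
 * guard says no. */
static void debug_print_old(const fake_thread_t *active, const char *msg)
{
    assert(msg != NULL);
    if (old_guard_allows_printf(active)) {
        printf("%s", msg);
    }
    else {
        puts("Cannot debug, stack too small.");
    }
}

/* New behaviour: print what was asked to print; a real overflow is left to
 * the MPU stack guard and the stack canary check discussed in this thread. */
static void debug_print_new(const char *msg)
{
    assert(msg != NULL);
    printf("%s", msg);
}
```

The trade-off is that the new variant can overflow a small stack, which must then be caught by other means.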

Testing procedure

Issues/PRs references

this is much more sane than #20166

riot-ci · 2023-12-12T09:51:52Z

Murdock results

✔️ PASSED

b39fbe2 core/debug: printf what asked to print

Success	Failures	Total	Runtime
8082	0	8082	10m:54s


kaspar030 · 2023-12-12T11:26:52Z

core/lib/include/debug.h

@@ -44,17 +44,7 @@ extern "C" {
 */
 #ifdef DEVELHELP


this whole ifdef case can go

OlegHahm · 2023-12-12T20:15:33Z

Can I haz an explanation why this check is no longer needed?

maribu · 2023-12-12T20:30:55Z

With a switch to picolibc we wouldn't get stack overflows on printf() that easily.

With newlib, this can still happen. We do have the MPU-based stack overflow detection and the heuristic in place in addition, so stack overflows should at least get caught. Both "the debug info you wanted to see cannot be shown until you increase the stack size" and "stack overflow" lead to the same result: bump the stack size and flash again. So not much is lost, even when using newlib.

OlegHahm · 2023-12-12T20:47:34Z

> With a switch to picolibc we wouldn't get stack overflows on printf() that easily.

Not that easily, but still possible, right? And as far as I understand, newlib is still the default, or am I wrong?

> With newlib, this can still happen. We do have the MPU-based stack overflow detection and the heuristic in place in addition, so stack overflows should at least get caught.

On how many platforms do we have MPU support? And what is the "heuristic" one?

maribu · 2023-12-12T21:56:00Z

Most Cortex-M boards have an MPU, but on some we don't use it due to bugs, if I recall correctly.

Some RISC-V MCUs also have an MPU (there it is called PMP, Physical Memory Protection). I'm not sure whether we use that to guard the stack, though. @teufelchen should know.

The heuristic is THREAD_CREATE_STACKTEST, which fills the stack with canary values (the address of each memory location is used as the canary for that location). The heuristic can fail if the canary is still intact despite the stack overflow (the odds of overwriting it with exactly that value by chance are 1 to 2^32-1 on 32-bit systems). In practice the heuristic works pretty well, but it is enabled by default only with DEVELHELP, if I recall correctly.
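Here is a minimal host-side sketch of the address-as-canary idea. It assumes a downward-growing stack simulated by a plain uintptr_t array. The function names are hypothetical, and this is not RIOT's implementation. Each word is filled with its own address, and a changed lowest word counts as an overflow. Counting untouched words from the bottom gives an estimate of the unused stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill every word of the simulated stack with its own address. */
static void stacktest_fill(uintptr_t *stack, size_t words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < words; i++) {
        stack[i] = (uintptr_t)&stack[i];
    }
}

/* Overflow heuristic: the lowest word is the last one a downward-growing
 * stack reaches, so if it no longer holds its own address, something wrote
 * past the end of the stack. */
static int stacktest_lowest_word_intact(const uintptr_t *stack)
{
    return stack[0] == (uintptr_t)&stack[0];
}

/* Usage estimate: count untouched canary words from the bottom up. */
static size_t stacktest_free_words(const uintptr_t *stack, size_t words)
{
    size_t free_words = 0;
    while (free_words < words &&
           stack[free_words] == (uintptr_t)&stack[free_words]) {
        free_words++;
    }
    return free_words;
}
```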

OlegHahm · 2023-12-13T07:11:47Z

As far as I recall, these MPUs (or at least some of them) can only protect a limited number of memory segments, i.e., they cannot be used for arbitrary stack protection.

Regarding STACKTEST: it doesn't provide any stack protection; it only shows stack usage. This feature was already in place when this macro was introduced, but it didn't help that much.

I totally agree that this macro hack is ugly as fuck, but it was introduced for a reason, and I'm not convinced that reason has vanished.

maribu · 2023-12-13T07:25:52Z

@OlegHahm:

https://github.com/RIOT-OS/RIOT/blob/master/core/sched.c#L126-L139

kaspar030 · 2023-12-13T08:15:59Z

> The heuristic can fail if the canary is still intact despite the stack overflow (the odds of overwriting it with exactly that value by chance are 1 to 2^32-1 on 32-bit systems). In practice the heuristic works pretty well, but it is enabled by default only with DEVELHELP, if I recall correctly.

Actually, the chance of overflowing the stack without triggering the test is much higher, because something might just allocate a huge buffer on the stack, only modifying the SP without writing to it. The test only detects a change of the exact canary word, not anything below it.
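The blind spot can be reproduced in a small host-side simulation. The layout, sizes, and names are all hypothetical. A 64-word stack sits directly above 64 words of unrelated memory. A function moves the stack pointer down by 100 words without writing, and its first store lands below the stack. The neighbouring memory gets corrupted, yet the single-word canary check still passes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS   128u  /* simulated RAM: mem[0..127] */
#define STACK_START 64u   /* the thread's stack is mem[64..127] */

static uintptr_t mem[MEM_WORDS];

/* Fill the stack region with address canaries (stacktest pattern). */
static void fill_canaries(void)
{
    for (size_t i = STACK_START; i < MEM_WORDS; i++) {
        mem[i] = (uintptr_t)&mem[i];
    }
}

/* The single-word check: only the lowest stack word is inspected. */
static int canary_intact(void)
{
    return mem[STACK_START] == (uintptr_t)&mem[STACK_START];
}

/* Simulate a function that "allocates" buf_words on the stack by only
 * moving the stack pointer down, then writes to the lowest buffer word.
 * Returns the new stack pointer as an index into mem. */
static size_t alloc_and_touch_lowest(size_t sp, size_t buf_words)
{
    assert(buf_words <= sp);
    size_t new_sp = sp - buf_words;  /* only the SP changes */
    mem[new_sp] = 0xdeadbeef;        /* first write lands at the bottom */
    return new_sp;
}
```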

kaspar030 · 2023-12-13T08:18:28Z

> As far as I recall, these MPUs (or at least some of them) can only protect a limited number of memory segments, i.e., they cannot be used for arbitrary stack protection.

What we do here is also switch segments on every task switch.
I'm not sure, though, whether we have a segment for the ISR stack.

maribu · 2023-12-13T08:44:34Z

> Actually, the chance of overflowing the stack without triggering the test is much higher, because something might just allocate a huge buffer on the stack, only modifying the SP without writing to it.

That is likely one of the vectors an attacker would try; it would also work for "jumping over" the MPU guard space.

The Linux kernel uses a similar mechanism to detect stack overflows. I think it has at least two unmapped pages after the stack ("after" in terms of the direction the stack grows) in the virtual address space, causing segfaults on most stack overflows. But jumping over the guard space is possible there as well. I think GCC can be told to touch at least one byte in every page it allocates on the stack to prevent this attack vector (-fstack-clash-protection), but binaries have to be compiled with that option. We might want to enable that as well (but with the MPU guard area size instead of the page size)?
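The probing idea can be sketched as a host-side simulation with a hypothetical memory layout. GCC's real counterpart is -fstack-clash-protection, whose generated code is not reproduced here. The key property is that consecutive probes are never further apart than the guard size, so a large allocation cannot step over the guard without touching it. Here the probe interval equals a hypothetical MPU guard size rather than a page size, as suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout (byte offsets): a no-access guard sits directly below
 * the stack, occupying [GUARD_LO, GUARD_LO + GUARD_SIZE). */
#define GUARD_LO   1024u
#define GUARD_SIZE 64u

static bool in_guard(size_t addr)
{
    return addr >= GUARD_LO && addr < GUARD_LO + GUARD_SIZE;
}

/* Allocation without probing: the first real access happens at the new SP,
 * which may already lie below the guard. Returns true if it would fault. */
static bool alloc_unprobed(size_t sp, size_t bytes)
{
    assert(bytes <= sp);
    return in_guard(sp - bytes);
}

/* Allocation with probing: touch one byte at least every GUARD_SIZE bytes
 * while moving SP down, so no single step can jump over the guard. */
static bool alloc_probed(size_t sp, size_t bytes)
{
    assert(bytes <= sp);
    size_t target = sp - bytes;
    while (sp - target > GUARD_SIZE) {
        sp -= GUARD_SIZE;
        if (in_guard(sp)) {
            return true;   /* this probe hits the guard: fault */
        }
    }
    return in_guard(target);
}
```

With the stack pointer at 1200 and a 400-byte allocation, the unprobed version lands at 800, below the guard, without faulting. The probed version touches 1136 and then 1072, which is inside the guard.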

kaspar030 · 2023-12-13T09:28:04Z

I also experimented with enlarging the canary value to a proper red zone (bringing non-MPU checks closer to what the MPU does), but never got around to opening a PR for it. This is the branch.

maribu · 2023-12-13T09:35:49Z

I like the idea, but I don't like zero being the magic number there. I think GCC, for example, writes zero when asked to touch at least one byte in every page of a stack allocation. The stacktest approach would also make it harder to guess the correct magic value, because it depends on the memory layout. (But then again, the memory layout is reproducible and static within the same firmware.)

Maybe we could also just check one word on each context switch, and have the idle thread check the red zones of all stacks before going to sleep? Or is the context switch overhead not as bad as I assume?
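Here is a sketch of the suggestion above, combining a multi-word red zone with the stacktest pattern instead of zero. REDZONE_WORDS and all names are hypothetical, and this is not the code on the experimental branch. The context switch compares only the topmost red-zone word, which is the first one a contiguous overflow reaches. A full pass over all words, for example from the idle thread, also catches writes that skipped it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical red zone size in words at the bottom of each stack. */
#define REDZONE_WORDS 8u

/* Fill the red zone with address-derived canaries instead of zero. */
static void redzone_fill(uintptr_t *stack_bottom)
{
    assert(stack_bottom != NULL);
    for (size_t i = 0; i < REDZONE_WORDS; i++) {
        stack_bottom[i] = (uintptr_t)&stack_bottom[i];
    }
}

/* Cheap check for every context switch: only the topmost red-zone word. */
static bool redzone_quick_ok(const uintptr_t *stack_bottom)
{
    size_t top = REDZONE_WORDS - 1;
    return stack_bottom[top] == (uintptr_t)&stack_bottom[top];
}

/* Thorough check, e.g. from the idle thread before sleeping: every word. */
static bool redzone_full_ok(const uintptr_t *stack_bottom)
{
    for (size_t i = 0; i < REDZONE_WORDS; i++) {
        if (stack_bottom[i] != (uintptr_t)&stack_bottom[i]) {
            return false;
        }
    }
    return true;
}
```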

kaspar030 · 2023-12-13T09:40:50Z

> I like the idea, but I don't like zero being the magic number there.

I guess re-using the stack test approach makes sense.

> Or is the context switch overhead not as bad as I assume?

TBH, I don't remember. I worked on that on the side, like, three years ago. I'd assume the overhead to be prohibitive for production, though it might not be that bad. (Assuming a word-wise memory compare takes one cycle, in theory we're looking at only about +10-20 cycles per context switch, which would be <5-10% overhead.)
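Spelled out, the estimate assumes one cycle per compared word and a baseline context switch of roughly 200 cycles. That baseline is not stated in the thread; it is only what the quoted percentages imply.

```latex
c_{\text{check}} \approx N_{\text{words}} \cdot 1\,\text{cycle}, \qquad
\text{overhead} = \frac{c_{\text{check}}}{c_{\text{switch}}}, \qquad
\frac{10}{200} = 5\,\%, \quad \frac{20}{200} = 10\,\%
```

So checking a 10-20 word red zone on every switch stays below 5-10% whenever the switch itself costs 200 cycles or more.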

OlegHahm · 2023-12-13T10:12:53Z

Regarding this PR:
The reason for this check was that enabling the debug macro can easily cause a stack overflow because of newlib's high stack usage in printf(). In that case, I would assume the chances are actually quite good that the existing scheduler stack test catches it. Hence, the check may indeed be obsolete.

So, how about removing this check and seeing whether people start to run into this type of problem on a regular basis? If they do, we can still revert the change.

OlegHahm · 2024-10-29T20:35:32Z

As stated above: from my perspective, let's give it a try.


kfessel requested a review from kaspar030 as a code owner December 12, 2023 08:21

github-actions bot added the "Area: core" label on Dec 12, 2023

kfessel requested a review from maribu December 12, 2023 08:30

benpicco added the "Type: enhancement" and "CI: ready for build" labels on Dec 12, 2023


mguetschow requested review from OlegHahm and maribu and removed request for maribu October 29, 2024 20:17
