Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/tlsf: Fix the way system functions are overriden. #12031

Merged
merged 2 commits into from
Sep 17, 2019

Conversation

jcarrano
Copy link
Contributor

Contribution description

I screwed up

The correct way to overrride the malloc family of functions in newlib-nano is to provide the *_r (reentrant) variants. Newlib implements the "normal" functions on top of these (see the newlib source code). Also, internally it calls the *_r functions when allocating buffers.

If only the "normal" non-reentrant functions are provided this will mean that some of the code will still use the vanilla newlib allocator. Furthermore, if one uses the whole heap as a pool for TLSF then the system may in the best case crash as there is no enough memory for its internall allocations or in the worst case function eratically (this depends on how the heap reserved, there is an upcomming series of commits in that direction).

This commit splits the handling between newlib and native. It also prepares the ground for future work on the pool initialization.

Right now I could only test this in ARM and native and I cannot ensure it will work on other platforms.

Replacing the system's memory allocator is not something that can be taken lightly and will inevitably require diving into the depths of the libc. Therefore I would say that using TLSF as a system wide allocator is ATM supported officially only on those plaftorms.

Reviewing

This looks like lot of code but most of it is just renaming things an moving stuff around.

There is a bit of duplicate code but the alternative was to either do a hack with macros or litter the code with ifdefs, which can only get worse in my next PR where I add initialization routines that differ according to the platform.

Testing procedure

Aside from reading the newlib sources, you can see the issue in a live system using the debugger.

Compile any example (with or without tlsf-malloc), grab a debugger and place a breakpoint in sbrk and _sbrk_r. Doing a backtrace will reveal it gets called by _malloc_r.

Issues/PRs references

Original PR: #9006

A (void*) function was declared as (void**) because one of the void pointers
was hidden behind a typedef. Because of the way a void* works, this has no
consequences, but it is confusing.
@jcarrano jcarrano added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: pkg Area: External package ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Aug 19, 2019
@jcarrano
Copy link
Contributor Author

@cladmi Would you care to take a look at the build system part?

pkg/tlsf/Makefile.dep Show resolved Hide resolved
pkg/tlsf/contrib/Makefile.include Outdated Show resolved Hide resolved
@jcarrano jcarrano force-pushed the tlsf-malloc-fix-override branch from d21cd79 to 8e6a4ff Compare August 19, 2019 16:39
@jcarrano jcarrano added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Aug 26, 2019
@jcarrano jcarrano requested a review from gschorcht August 26, 2019 16:31
@jcarrano
Copy link
Contributor Author

I nobody can review TLSF then I propose we delete it. There are two issues, though:

  • Removing the entire package would require reworking Lua.
  • If we remove only tlsf-malloc I'm afraid someone will reintroduce it at some point in the future and make this same mistakes all over again.

@gschorcht
Copy link
Contributor

@jcarrano I'm out of office. I will try to take a look on it tomorrow.

@benpicco
Copy link
Contributor

benpicco commented Aug 28, 2019

So just to clarify: On newlib systems the reentrant versions are to be provided whereas on native (glibc) the non-reentrant have to be implemented.

Then this PR is pretty straightforward, it just converts the old code to the new & proper ways. Maybe add a comment why it is not sufficient to also just implement the reentrant versions on native.

@benpicco
Copy link
Contributor

I tested this both on esp32-wroom-32 and nucleo-l031k6 with tests/malloc, both ran fine. However, it appears as if the same binary is generated no matter if I set USEPKG += tlsf or USEMODULE += tlsf-malloc or neither.

I also ran examples/lua_basic and examples/lua_REPL on the esp32.
The basic example works fine, the REPL example loops at

 Welcome to the interactive interpreter
 L> 
 Exited. status: No errors, return code 0
 This is Lua: starting interactive session
 …

but it also does that on master.

@jcarrano
Copy link
Contributor Author

@benpicco the lua examples are not affected by this, since they use tslf functions directly (i.e. they do not replace the system malloc).

The issue with tests/malloc is weird. I do not get the same results with examples/default. Let me investigate.

@jcarrano
Copy link
Contributor Author

@benpicco I found the issue: I deleted the UNDEF.

You can see nothing gets overriden by doing:

arm-none-eabi-nm -l bin/nucleo-l031k6/default.elf | grep _malloc_r

which is wrong.

I'm pushing the fix.

@benpicco
Copy link
Contributor

benpicco commented Sep 12, 2019

Hm, I tested it again with nucleo-l031k6 and esp32-wroom-32.
Now with USEMODULE += tlsf-malloc a different binary is generated for both platforms.

I remembered that without #12032 a manual initialization of the heap is needed, so I added

--- a/tests/malloc/main.c
+++ b/tests/malloc/main.c
@@ -70,8 +70,12 @@ void free_memory(struct node *head)
     }
 }
 
+#include "tlsf-malloc.h"
+static char heap[4096];
 int main(void)
 {
+    tlsf_add_global_pool(heap, sizeof(heap));
+
     while (1) {
         struct node *head = malloc(sizeof(struct node));
         total += sizeof(struct node);

But when I flash it, neither of the two boards will print anything.

$ arm-none-eabi-nm -l bin/nucleo-l031k6/tests_malloc.elf | grep _malloc_r 
08000110 T _malloc_r	/home/benpicco/dev/RIOT/pkg/tlsf/contrib/newlib.c:56

@jcarrano
Copy link
Contributor Author

@benpicco turns out that at main() it is too late to initialize the memory pool - the system has already crashed. If I rebase #12032 on top of this it works.

@jcarrano
Copy link
Contributor Author

jcarrano commented Sep 12, 2019

Since this is hard to test on its own, I'm thinking of adding the nm -l $(ELFFILE) | grep _malloc_r test to the tests/pkg_tlsf_malloc test in #12032 .

Edit: for a similar (at least in spirit) approach, look at #9599.

#define ATTR_MALIGN __attribute__((alloc_align(1), alloc_size(2), malloc))
#define ATTR_REALLOC __attribute__((alloc_size(2)))

#else /* No GNU C -> no alias attribute */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't there be at least a warning then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't #warning a GNU C extension? At least that's the warning I always get if I use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is wrong, it should read:

Suggested change
#else /* No GNU C -> no alias attribute */
#else /* No GNU C -> no *alloc attribute */

The warning is not necessary: the "alloc" attributes are just hints to the compiler. In fact, I'm not 100% sure they help here.

@benpicco
Copy link
Contributor

Hm, I still can't get it to work with test/malloc.

nucleo-l031k6 crashes at

2019-09-13 00:18:23,222 - INFO # main(): This is RIOT! (Version: 2019.10-devel-836-g1d522-tlsf-early-init)
2019-09-13 00:18:23,228 - INFO # Free 1024 Bytes at 0x0x6863fbd1, total -1024
2019-09-13 00:18:23,228 - INFO # 
2019-09-13 00:18:23,234 - INFO # Context before hardfault:

and esp32-wroom-32 doesn't print anything after the bootloader. native just exits after allocating 8 MiB.

I rebased #12032 on top of this on top of master.

1d522180d (HEAD -> tlsf-early-init) pkg/tlsf: Early initialization of memory pool.
6cb4e5129 tests/pkg_tlsf_malloc: Test TSLF is correctly initialized.
dbebd4741 fixup! reintrodude UNDEF.
319e5c6ee fixup! replace warning with errors.
3b9e5cabc pkg/tlsf: Fix the way system functions are overriden.
1008938d0 pkg/tlsf: fix double pointer.
66ce29d94 (origin/master, origin/HEAD, master) Merge pull request #12051 from OTAkeys/fix/isotp

@jcarrano
Copy link
Contributor Author

I tested with BOARD=nucleo-f103rb and it works for me.

@benpicco
Copy link
Contributor

benpicco commented Sep 13, 2019

Ok, it's also working on samr21-xpro and same54-xpro.

On esp8266-esp-12x I get

/opt/esp/newlib-xtensa/xtensa-lx106-elf/lib/libc.a(lib_a-mallocr.o): In function `_malloc_r':
/root/newlib/xtensa-lx106-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/mallocr.c:2328: multiple definition of `_malloc_r'
/home/benpicco/dev/RIOT/tests/malloc/bin/esp8266-esp-12x/tlsf-malloc/newlib.o:newlib.c:(.text._malloc_r+0x10): first defined here
collect2: error: ld returned 1 exit status

but I suppose this might be fixed by #11108

I guess nucleo-l031k6 then just has too little RAM (although it fails in a strange way) and on esp comes with it's very own newlib and might do strange things there. @gschorcht may know more about it.

But this is all unrelated to this PR.

@benpicco
Copy link
Contributor

Please squash.
The PR makes sense and is working.

Let's not block this on unrelated breakage.

The correct way to overrride the malloc family of functions in newlib-nano is
to provide the *_r (reentrant) variants. Newlib implements the "normal"
functions on top of these (see the newlib source code). Also, internally it calls
the *_r functions when allocating buffers.

If only the "normal" non-reentrant functions are provided this will mean that
some of the code will still use the vanilla newlib allocator. Furthermore, if
one uses the whole heap as a pool for TLSF then the system may in the best case
crash as there is no enough memory for its internall allocations or in the worst
case function eratically (this depends on how the heap reserved, there is an
upcomming series of commits in that direction).

This commit splits the handling between newlib and native. It also prepares the
ground for future work on the pool initialization.

Right now I could only test this in ARM and native and I cannot ensure it will
work on other platforms. Replacing the system's memory allocator is not something
that can be taken lightly and will inevitably require diving into the depths of
the libc. Therefore I would say that using TLSF as a system wide allocator is ATM
supported officially only on those plaftorms.

Testing:

Aside from reading the newlib sources, you can see the issue in a live system
using the debugger.

Compile any example (with or without tlsf-malloc), grab a debugger and place
a breakpoint in sbrk and _sbrk_r. Doing a backtrace will reveal it gets called
by _malloc_r.
@jcarrano jcarrano force-pushed the tlsf-malloc-fix-override branch from cf09e4a to cc907fa Compare September 17, 2019 19:19
@jcarrano
Copy link
Contributor Author

@benpicco done

Copy link
Contributor

@benpicco benpicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR makes sense and has been tested.
Things are still broken around this package, just a bit less than before - but another PR is already addressing that.

@benpicco benpicco merged commit f020951 into RIOT-OS:master Sep 17, 2019
@kb2ma kb2ma added this to the Release 2019.10 milestone Sep 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: pkg Area: External package ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants