Add option to force I2S routines into IRAM #9115

kleinesfilmroellchen · 2024-03-30T23:18:28Z

Basic Infos

This issue complies with the issue POLICY doc.
I have read the documentation at readthedocs and the issue is not addressed there.
I have tested that the issue is present in current master branch (aka latest git). (The relevant code is identical between master and latest release)
I have searched the issue tracker for a similar issue.
If there is a stack dump, I have decoded it. (There is none.)
I have filled out all fields below.

Platform

Hardware: ESP8266EX (Wemos/Lolin D1 mini)
Core Version: 3.1.2
Development Env: Arduino IDE
Operating System: Windows

Settings in IDE

Module: Wemos D1 mini r2
Flash Mode: dio
Flash Size: 4MB, no FS
MMU: 16KB/48KB with shared second heap
lwip Variant: v2 Lower Memory
Flash Frequency: 80Mhz
CPU Frequency: 160MHz
Upload Using: SERIAL
Upload Speed: 921600

Problem Description

In situations where a very high CPU load is present, and the I2S engine is constantly running to play audio, it is important that I2S can run as fast as possible to not slow down other parts of the system. If memory load is additionally significantly restrained, xtensa-gcc can sometimes decide to move code in and out of IRAM for no apparent reason. For me, this happened when several unrelated functions increased in size significantly - suddenly the I2S routines weren't in IRAM anymore, or something else was going on with un-inlining, I am not quite certain. (Some bloaty information that may provide further clues is here.) This slowed down the high-load procedure significantly, and

The fix for this problem was simply to force all virtual functions of I2SClass into IRAM, like so:

  // from Stream
  virtual int IRAM_ATTR available();
  virtual int IRAM_ATTR read(); // Blocking, will wait for incoming data
  virtual int IRAM_ATTR peek(); // Blocking, will wait for incoming data   
  virtual void IRAM_ATTR flush();

  // from Print (see notes on write() methods below)
  virtual size_t IRAM_ATTR write(uint8_t);
  virtual size_t IRAM_ATTR write(const uint8_t *buffer, size_t size);
  virtual int IRAM_ATTR availableForWrite();

Despite my initial expectations, this made xtensa-gcc happy again and the code returned to previous performance.

To avoid having to edit base libraries for this to work, I'd prefer if there was the option to enable these attributes via a define, for those who need it. This would mean no functional changes to everyone who doesn't need it, providing any user full control over IRAM code.

Note: I would like to not receive recommendations for "better" hardware specific to my project. Not only is there no direct replacement for the D1 mini given my peripherals, but I am operating far away enough from the ESP8266's performance and memory limit that it's manageable without weird tricks (not that it's comfortable either, but if you want comfortable programming you become a web developer). I am asking for an option that allows me to optimize my code, and I consider carefully moving functions in and out of IRAM to be a proper optimization technique on architectures like the ESP8266. The inner demoscene kid in me is also having some fun here.

MCVE

This exact commit; note that it requires specific hardware which is laid out in the repository. It doesn't really get more minimal than this, since the issue only presents itself when the CPU load is extremely high.

Debug Messages

(None. I can't even enable the serial console here.)

The text was updated successfully, but these errors were encountered:

mcspr · 2024-03-31T16:20:33Z

If memory load is additionally significantly restrained, xtensa-gcc can sometimes decide to move code in and out of IRAM for no apparent reason. For me, this happened when several unrelated functions increased in size significantly - suddenly the I2S routines weren't in IRAM anymore

Can you point to the specific instance of that happening? Bloaty output does not really explain what got moved where, or what functions are expected to be inline that are not.

I2S implementation is already in .cpp and not in header. afaik, we don't have lto enabled to change it into sometimes-inline or am I missing something?
Neither implementations have IRAM_ATTR near functions, so everything i2s_... / I2SClass::... stays on flash by default.

kleinesfilmroellchen · 2024-04-02T13:21:08Z

Can you point to the specific instance of that happening? Bloaty output does not really explain what got moved where, or what functions are expected to be inline that are not.

I may not be properly discerning the difference in the binaries here, it's fairly opaque and I'm having a hard time telling where which functions live even when looking at load addresses and segment ranges. I can provide both binaries if required (but have to scrub WiFi passwords from them first) if you want to do your own analysis.

I2S implementation is already in .cpp and not in header. afaik, we don't have lto enabled to change it into sometimes-inline or am I missing something? Neither implementations have IRAM_ATTR near functions, so everything i2s_... / I2SClass::... stays on flash by default.

compile_commands.json says -fno-inline-functions for most files, but not all core files. Excuse me for not looking through all 400 commands :)

mcspr · 2024-04-02T16:48:43Z

compile_commands.json says -fno-inline-functions for most files, but not all core files. Excuse me for not looking through all 400 commands :)

...why look there instead of build definitions? 🙃

For Arduino IDE / arduino-cli

Arduino/platform.txt

Lines 78 to 96 in 685f2c9

    
           compiler.libc.path={runtime.platform.path}/tools/sdk/libc/xtensa-lx106-elf 
        
           compiler.cpreprocessor.flags=-D__ets__ -DICACHE_FLASH -U__STRICT_ANSI__ -D_GNU_SOURCE -DESP8266 {build.debug_optim} {build.opt.flags} "-I{compiler.sdk.path}/include" "-I{compiler.sdk.path}/{build.lwip_include}" "-I{compiler.libc.path}/include" "-I{build.path}/core" 
        
           # support precompiled libraries in IDE v1.8.6+ 
        
           compiler.libraries.ldflags= 
        
           compiler.c.cmd=xtensa-lx106-elf-gcc 
        
           compiler.c.flags=-c "{compiler.warning_flags}-cflags" -std=gnu17 {build.stacksmash_flags} -g -free -fipa-pta -Werror=return-type -Wpointer-arith -Wno-implicit-function-declaration -Wl,-EL -fno-inline-functions -nostdlib -mlongcalls -mtext-section-literals -falign-functions=4 -MMD -ffunction-sections -fdata-sections {build.exception_flags} {build.sslflags} {build.mmuflags} {build.non32xferflags} {build.iramfloat} 
        
           compiler.S.cmd=xtensa-lx106-elf-gcc 
        
           compiler.S.flags=-c -g -x assembler-with-cpp -MMD -mlongcalls "-I{runtime.tools.xtensa-lx106-elf-gcc.path}/include/" 
        
           compiler.c.elf.flags=-g "{compiler.warning_flags}-cflags" -nostdlib -Wl,--no-check-sections -u app_entry {build.float} -Wl,-static "-L{compiler.sdk.path}/lib" "-L{compiler.sdk.path}/lib/{build.sdk}" "-L{build.path}" "-L{compiler.libc.path}/lib" "-Tlocal.eagle.flash.ld" -Wl,--gc-sections -Wl,-wrap,system_restart_local -Wl,-wrap,spi_flash_read 
        
           compiler.c.elf.cmd=xtensa-lx106-elf-gcc 
        
           compiler.c.elf.libs=-lhal -lphy -lpp -lnet80211 {build.lwip_lib} -lwpa -lcrypto -lmain -lwps -lbearssl -lespnow -lsmartconfig -lairkiss -lwpa2 {build.stdcpp_lib} -lm -lc -lgcc 
        
           compiler.cpp.cmd=xtensa-lx106-elf-g++ 
        
           compiler.cpp.flags=-c "{compiler.warning_flags}-cppflags" {build.stacksmash_flags} -g -free -fipa-pta -Werror=return-type -mlongcalls -mtext-section-literals -fno-rtti -falign-functions=4 {build.stdcpp_level} -MMD -ffunction-sections -fdata-sections {build.exception_flags} {build.sslflags} {build.mmuflags} {build.non32xferflags} {build.iramfloat}

For PlatformIO

Arduino/tools/platformio-build.py

Lines 61 to 174 in 685f2c9

    
           env.Append( 
        
               ASFLAGS=[ 
        
                   "-mlongcalls", 
        
                   "-mtext-section-literals", 
        
               ], 
        
               ASPPFLAGS=[ 
        
                   "-x", "assembler-with-cpp", 
        
               ], 
        
               # General options that are passed to the C compiler (C only; not C++) 
        
               CFLAGS=[ 
        
                   "-std=gnu17", 
        
                   "-Wpointer-arith", 
        
                   "-Wno-implicit-function-declaration", 
        
                   "-Wl,-EL", 
        
                   "-fno-inline-functions", 
        
                   "-nostdlib" 
        
               ], 
        
               # General options that are passed to the C and C++ compilers 
        
               CCFLAGS=[ 
        
                   "-Os",  # optimize for size 
        
                   "-mlongcalls", 
        
                   "-mtext-section-literals", 
        
                   "-falign-functions=4", 
        
                   "-U__STRICT_ANSI__", 
        
                   "-ffunction-sections", 
        
                   "-fdata-sections", 
        
                   "-Wall", 
        
                   "-Werror=return-type", 
        
                   "-free", 
        
                   "-fipa-pta" 
        
               ], 
        
               # General options that are passed to the C++ compiler 
        
               CXXFLAGS=[ 
        
                   "-fno-rtti", 
        
                   "-std=gnu++17" 
        
               ], 
        
               # General user options passed to the linker 
        
               LINKFLAGS=[ 
        
                   "-Os", 
        
                   "-nostdlib", 
        
                   "-Wl,--no-check-sections", 
        
                   "-Wl,-static", 
        
                   "-Wl,--gc-sections", 
        
                   "-Wl,-wrap,system_restart_local", 
        
                   "-Wl,-wrap,spi_flash_read", 
        
                   "-u", "app_entry", 
        
                   "-u", "_printf_float", 
        
                   "-u", "_scanf_float", 
        
                   "-u", "_DebugExceptionVector", 
        
                   "-u", "_DoubleExceptionVector", 
        
                   "-u", "_KernelExceptionVector", 
        
                   "-u", "_NMIExceptionVector", 
        
                   "-u", "_UserExceptionVector" 
        
               ], 
        
               # A platform independent specification of C preprocessor definitions as either: 
        
               # - -DFLAG as "FLAG" 
        
               # - -DFLAG=VALUE as ("FLAG", "VALUE") 
        
               CPPDEFINES=[ 
        
                   ("F_CPU", "$BOARD_F_CPU"), 
        
                   "__ets__", 
        
                   "ICACHE_FLASH", 
        
                   "_GNU_SOURCE", 
        
                   ("ARDUINO", 10805), 
        
                   ("ARDUINO_BOARD", '\\"PLATFORMIO_%s\\"' % env.BoardConfig().id.upper()), 
        
                   ("ARDUINO_BOARD_ID", '\\"%s\\"' % env.BoardConfig().id), 
        
                   "FLASHMODE_${BOARD_FLASH_MODE.upper()}", 
        
                   "LWIP_OPEN_SRC" 
        
               ], 
        
               # The list of directories that the C preprocessor will search for include directories 
        
               CPPPATH=[ 
        
                   join(FRAMEWORK_DIR, "tools", "sdk", "include"), 
        
                   join(FRAMEWORK_DIR, "cores", env.BoardConfig().get("build.core")), 
        
                   join(platform.get_package_dir("toolchain-xtensa"), "include") 
        
               ], 
        
               # The list of directories that will be searched for libraries 
        
               LIBPATH=[ 
        
                   join("$BUILD_DIR", "ld"),  # eagle.app.v6.common.ld 
        
                   join(FRAMEWORK_DIR, "tools", "sdk", "lib"), 
        
                   join(FRAMEWORK_DIR, "tools", "sdk", "ld") 
        
               ], 
        
               # LIBS is set at the bottom of the builder script 
        
               # where we know about all system libraries to be included 
        
               LIBSOURCE_DIRS=[ 
        
                   join(FRAMEWORK_DIR, "libraries") 
        
               ], 
        
               BUILDERS=dict( 
        
                   ElfToBin=Builder( 
        
                       action=env.VerboseAction(" ".join([ 
        
                           '"$PYTHONEXE"', 
        
                           '"%s"' % join(FRAMEWORK_DIR, "tools", "elf2bin.py"), 
        
                           "--eboot", '"%s"' % join( 
        
                               FRAMEWORK_DIR, "bootloaders", "eboot", "eboot.elf"), 
        
                           "--app", "$SOURCE", 
        
                           "--flash_mode", "$BOARD_FLASH_MODE", 
        
                           "--flash_freq", "${__get_board_f_flash(__env__)}", 
        
                           "--flash_size", "${__get_flash_size(__env__)}", 
        
                           "--path", '"%s"' % join( 
        
                               platform.get_package_dir("toolchain-xtensa"), "bin"), 
        
                           "--out", "$TARGET" 
        
                       ] + gzip_switch), "Building $TARGET"), 
        
                       suffix=".bin" 
        
                   ) 
        
               ) 
        
           )

-fno-inline-functions only for .c files and does not apply to i2s code

I may not be properly discerning the difference in the binaries here, it's fairly opaque and I'm having a hard time telling where which functions live even when looking at load addresses and segment ranges. I can provide both binaries if required (but have to scrub WiFi passwords from them first) if you want to do your own analysis.

Yet again, I would not know where to look as I still not quite sure what is the actual issue. Suppose IRAM for I2S can be changed as suggested above, but class change only applies to the class and not the i2s_... implementation funcs it calls.

iirc manually pre-caching hot path of the app might be the solution
#6628
#6559
(ref. https://github.com/esp8266/Arduino/blob/master/cores/esp8266/core_esp8266_spi_utils.cpp /PRECACHE/, /precache/, etc.)

Or, rewriting .ld script to forcibly move certain funcs to always be in IRAM. Mind the MMU option, though, as your app already severely limits IRAM.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add option to force I2S routines into IRAM #9115

Add option to force I2S routines into IRAM #9115

kleinesfilmroellchen commented Mar 30, 2024

mcspr commented Mar 31, 2024

kleinesfilmroellchen commented Apr 2, 2024

mcspr commented Apr 2, 2024

Add option to force I2S routines into IRAM #9115

Add option to force I2S routines into IRAM #9115

Comments

kleinesfilmroellchen commented Mar 30, 2024

Basic Infos

Platform

Settings in IDE

Problem Description

MCVE

Debug Messages

mcspr commented Mar 31, 2024

kleinesfilmroellchen commented Apr 2, 2024

mcspr commented Apr 2, 2024