Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add syslog logger #1267

Merged
merged 4 commits into from
Sep 26, 2024
Merged

Add syslog logger #1267

merged 4 commits into from
Sep 26, 2024

Conversation

ranma
Copy link

@ranma ranma commented Sep 20, 2024

Implements syslog logging.

See issue #1194

Copy link
Member

@schlimmchen schlimmchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very promising! There are a couple of issues, though. The string-copying must be addressed for sure.

src/MessageOutput.cpp Outdated Show resolved Hide resolved
src/MessageOutput.cpp Outdated Show resolved Hide resolved
src/SyslogLogger.cpp Outdated Show resolved Hide resolved
include/Configuration.h Outdated Show resolved Hide resolved
include/SyslogLogger.h Outdated Show resolved Hide resolved
include/defaults.h Outdated Show resolved Hide resolved
This implements RFC5424 version of the protocol.

I didn't use https://github.com/arcao/Syslog since the protocol itself
is trivial and most of the libraries functionality is not needed here.
The library also doesn't support setting the PROCID field, which is set
to a random id to indicate a reboot here.
@madmartin
Copy link

I want to let you know that I have tested it yesterday and was quite happy with the result!
The only downside I noticed is that every log line has to "??" at the and - I dont really understand if the new push on this branch will change this (have no possibility to test it today),
Just want to give you a thumbs up!
And I can delete my unfinished "add_syslog" branch in my local git....

@ranma
Copy link
Author

ranma commented Sep 26, 2024

The only downside I noticed is that every log line has to "??" at the and - I dont really understand if the new push on this

Yes, that is fixed by if (c != '\r' && c != '\n') {. Initially I just dropped all control characters, when I added the replacement with '?' I initially forgot to still drop '\r' and '\n' an re-pushed the update shortly after.

@ranma ranma requested a review from schlimmchen September 26, 2024 18:46
@schlimmchen
Copy link
Member

I am not yet happy with this. Look at what the BOM does:

image

Have a look at the wireshark trace: syslog.zip

I guess, since we did decide to replace non-ASCII non-human-readable chars with question marks, we can also ditch the BOM.

Better:
image

Then, I don't understand why we would choose KERNEL and DEBUG. I think facility USER and level INFO are better suited.

Other than that this is awesome ❤️

I am merging this, including the changes I made. Please don't hesitate to push back if these changes need to be discussed.

@schlimmchen schlimmchen merged commit 20159f3 into hoylabs:development Sep 26, 2024
8 checks passed
@ranma
Copy link
Author

ranma commented Sep 27, 2024

I am not yet happy with this. Look at what the BOM does:

At least my version of rsyslogd handles the BOM fine. Though it looks like technically it is only required if UTF-8 characters are present, so since we are filtering those out it can be dropped: If a syslog application encodes MSG in UTF-8, the string MUST start with the Unicode byte order mask (BOM), which for UTF-8 is ABNF %xEF.BB.BF.

Then, I don't understand why we would choose KERNEL and DEBUG. I think facility USER and level INFO are better suited.

Either works I guess :)
Since OpenDTU is basically running almost baremetal with very little "OS" in the way, I used KERNEL.

@ranma ranma deleted the syslog branch September 27, 2024 19:39
@ranma
Copy link
Author

ranma commented Sep 28, 2024

Weirdly enough, I just got some corruption:

2024-09-28T15:34:15.014995+02:00 OpenDTU-347AF0 O#000#000#000#000TU[7c9a7353] Network connected

Though after rebooting the ESP it is now fine:

2024-09-28T15:34:39.786338+02:00 OpenDTU-347AF0 OpenDTU[13bc85e2] Network connected

Given that exactly 4 bytes were zero, I wonder if something did a stray word write that happened to hit the fixed _header string...

@schlimmchen
Copy link
Member

Unfortunately, I am also seeing issues. And I think they are related to the syslogger implementation. However, that's just a guess.

Guru Meditation Error: Core  1 panic'ed (Unhandled debug exception). 
Debug exception reason: Stack canary watchpoint triggered (async_tcp) 
Core  1 register dump:
PC      : 0x42117cad  PS      : 0x00060236  A0      : 0x82112bdc  A1      : 0x3fcbc160  
A2      : 0x3fcbc4f8  A3      : 0x00000086  A4      : 0x00000001  A5      : 0x0000000c  
A6      : 0x3fcbc170  A7      : 0x00000008  A8      : 0x00000001  A9      : 0x3fcbc170  
A10     : 0x0000005a  A11     : 0x0000002a  A12     : 0x00000078  A13     : 0x3c1b35a0  
A14     : 0x00000078  A15     : 0xff000000  SAR     : 0x0000000b  EXCCAUSE: 0x00000001  
EXCVADDR: 0x00000000  LBEG    : 0x400556d5  LEND    : 0x400556e5  LCOUNT  : 0xffffffff  


Backtrace: 0x42117caa:0x3fcbc160 0x42112bd9:0x3fcbc470 0x420761e6:0x3fcbc530 0x420601fd:0x3fcbc550 0x420602fe:0x3fcbc5e0 0x420609f2:0x3fcbc620 0x4205bd63:0x3fcbc750 0x4204f2d6:0x3fcbc780 0x4205c1f5:0x3fcbc7a0 0x42
04f2fd:0x3fcbc7e0 0x4205c1f5:0x3fcbc800 0x4204f2fd:0x3fcbc840 0x4205c1f5:0x3fcbc860 0x4204f2fd:0x3fcbc8a0 0x4205c1f5:0x3fcbc8c0 0x4204f2fd:0x3fcbc900 0x4205c1f5:0x3fcbc920 0x4204f2fd:0x3fcbc960 0x4205c1f5:0x3fcbc9
80 0x4204f2fd:0x3fcbc9c0 0x4205c1f5:0x3fcbc9e0 0x4204f2fd:0x3fcbca20 0x4205c1f5:0x3fcbca40 0x4204f2fd:0x3fcbca80 0x4205c1f5:0x3fcbcaa0 0x4204f2fd:0x3fcbcae0 0x4205c1f5:0x3fcbcb00 0x4204f2fd:0x3fcbcb40 0x4205c1f5:0
x3fcbcb60 0x4204f2fd:0x3fcbcba0 0x4205c1f5:0x3fcbcbc0 0x4204f2fd:0x3fcbcc00 0x4205c1f5:0x3fcbcc20 0x4204f2fd:0x3fcbcc60 0x4205c1f5:0x3fcbcc80 0x4204f2fd:0x3fcbccc0 0x4205c1f5:0x3fcbcce0 0x4204f2fd:0x3fcbcd20 0x420
5c1f5:0x3fcbcd40 0x4204f2fd:0x3fcbcd80 0x4205c1f5:0x3fcbcda0 0x4204f2fd:0x3fcbcde0 0x4205c1f5:0x3fcbce00 0x4204f2fd:0x3fcbce40 0x4205c1f5:0x3fcbce60 0x4204f2fd:0x3fcbcea0 0x4205c1f5:0x3fcbcec0 0x4204f2fd:0x3fcbcf0
0 0x4205c1f5:0x3fcbcf20 0x4204f2fd:0x3fcbcf60 0x4205c1f5:0x3fcbcf80 0x4204f2fd:0x3fcbcfc0 0x4205c1f5:0x3fcbcfe0 0x4204f2fd:0x3fcbd020 0x4205c1f5:0x3fcbd040 0x4204f2fd:0x3fcbd080 0x4205c1f5:0x3fcbd0a0 0x4204f2fd:0x
3fcbd0e0 0x4205c1f5:0x3fcbd100 0x4204f2fd:0x3fcbd140 0x4205c1f5:0x3fcbd160 0x4204f2fd:0x3fcbd1a0 0x4205c1f5:0x3fcbd1c0 0x4204f2fd:0x3fcbd200 0x4205c1f5:0x3fcbd220 0x4204f2fd:0x3fcbd260 0x4205c1f5:0x3fcbd280 0x4204
f2fd:0x3fcbd2c0 0x4205c1f5:0x3fcbd2e0 0x4204f2fd:0x3fcbd320 0x4205c1f5:0x3fcbd340 0x4204f2fd:0x3fcbd380 0x4205c1f5:0x3fcbd3a0 0x4204f2fd:0x3fcbd3e0 0x4205c1f5:0x3fcbd400 0x4204f2fd:0x3fcbd440 0x4205c1f5:0x3fcbd460
 0x4204f2fd:0x3fcbd4a0 0x4205c1f5:0x3fcbd4c0 0x4204f2fd:0x3fcbd500 0x4205c1f5:0x3fcbd520 0x4204f2fd:0x3fcbd560 0x4205c1f5:0x3fcbd580 0x4204f2fd:0x3fcbd5c0 0x4205c1f5:0x3fcbd5e0 0x4204f2fd:0x3fcbd620 0x4205c1f5:0x3
fcbd640 0x4204f2fd:0x3fcbd680 0x4205c1f5:0x3fcbd6a0 0x4204f2fd:0x3fcbd6e0 0x4205c1f5:0x3fcbd700 0x4204f2fd:0x3fcbd740 0x4205c1f5:0x3fcbd760 0x4204f2fd:0x3fcbd7a0 0x4205c1f5:0x3fcbd7c0 0x4204f2fd:0x3fcbd800 0x4205c
1f5:0x3fcbd820 0x4204f2fd:0x3fcbd860 0x4205c1f5:0x3fcbd880 0x4204f2fd:0x3fcbd8c0 0x4205c1f5:0x3fcbd8e0 |<-CONTINUES

ELF file SHA256: 18da0f49da6d08d2

This is the backtrace decoded (which reads like a recursion issue):

Error: '0x42' does not match the expected format.
0x42117caa: _svfprintf_r at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/newlib/newlib/libc/stdio/vfprintf.c:1543
0x42112bd9: sprintf at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/newlib/newlib/libc/stdio/sprintf.c:618
0x420761e6: MD5Builder::getChars(char*) at /home/schlimmchen/.platformio/packages/framework-arduinoespressif32/cores/esp32/MD5Builder.cpp:108 (discriminator 2)
0x420601fd: getMD5(unsigned char*, unsigned short, char*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebAuthentication.cpp:75
0x420602fe: stringMD5(String const&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebAuthentication.cpp:113
0x420609f2: checkDigestAuthentication(char const*, char const*, char const*, char const*, char const*, bool, char const*, char const*, char const*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebAuthentication.cpp:238 (discriminator 11)
0x4205bd63: AsyncWebServerRequest::authenticate(char const*, char const*, char const*, bool) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:742
0x4204f2d6: AuthenticationMiddleware::allowed(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:648
 (inlined by) AuthenticationMiddleware::allowed(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:647
 (inlined by) AuthenticationMiddleware::run(AsyncWebServerRequest*, std::function<void ()>) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:652
0x4205c1f5: AsyncMiddlewareChain::_runChain(AsyncWebServerRequest*, std::function<void ()>)::{lambda()#1}::operator()() const at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:623
 (inlined by) std::_Function_handler<void (), AsyncMiddlewareChain::_runChain(AsyncWebServerRequest*, std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297

I saw this twice now. The last time it was not a "stack canary watchpoint triggered" but something else. This could also be an upstream issue, but a quick search did not show anything recent that could be related. And it could be the same issue you saw where the header got corrupted. I don't know how the async_tcp task could have enough stack in the upstream project, but not in ours. Very strange.

Also: logrotate...

-rw-r----- 1 root adm 56M Sep 28 17:15 /var/log/syslog-udp.log

That's on the user, of course.

@ranma
Copy link
Author

ranma commented Sep 28, 2024

async_tcp

In theory this should be unrelated unless the udp packet sending is also handled on the async_tcp task. Haven't seen any Guru Meditation related to the syslog logging myself so far, but during development I only had it on intermittently.
I did flash it onto the production unit early and its been running stable with syslog enabled for 3 hours now (but I did disable some currently unneeded logging to keep the flood down a bit, so its basically only the DPL and NRF messages).

@schlimmchen
Copy link
Member

I often see these "could not send data" messages right before the ESP restarts:

[598606][E][WiFiUdp.cpp:185] endPacket(): could not send data: 12
[598612][E][WiFiUdp.cpp:185] endPacket(): could not send data: 12
Connecting to MQTT...
[598666][E][WiFiUdp.cpp:185] endPacket(): could not send data: 12
All missing
Nothing received, resend count exeeded

abort() was called at PC 0x42105fc3 on core 1


Backtrace: 0x403780de:0x3fcebca0 0x4037d615:0x3fcebcc0 0x403841d9:0x3fcebce0 0x42105fc3:0x3fcebd60 0x4210600a:0x3fcebd80 0x421061e5:0x3fcebda0 0x421062a0:0x3fcebdc0 0x4206d231:0x3fcebde0 0x4206d435:0x3fcebe00 0x42
06cf96:0x3fcebe40 0x42011f81:0x3fcebe70 0x42125ecd:0x3fcebec0 0x42054016:0x3fcebee0 0x42054191:0x3fcebf00 0x420541aa:0x3fcebf20 0x4207a9b5:0x3fcebf40

ELF file SHA256: 18da0f49da6d08d2

Backtrace:

0x403780de: panic_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:408
0x4037d615: esp_system_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:137
0x403841d9: abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/abort.c:46
0x42105fc3: __cxxabiv1::__terminate(void (*)()) at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
0x4210600a: std::terminate() at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57
0x421061e5: __cxa_allocate_exception at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:300
0x421062a0: operator new(unsigned int) at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/gcc/libstdc++-v3/libsupc++/new_op.cc:54
0x4206d231: __gnu_cxx::new_allocator<std::_List_node<ChannelNum_t> >::allocate(unsigned int, void const*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:111
 (inlined by) std::allocator_traits<std::allocator<std::_List_node<ChannelNum_t> > >::allocate(std::allocator<std::_List_node<ChannelNum_t> >&, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:436
 (inlined by) std::__cxx11::_List_base<ChannelNum_t, std::allocator<ChannelNum_t> >::_M_get_node() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:450
 (inlined by) std::_List_node<ChannelNum_t>* std::__cxx11::list<ChannelNum_t, std::allocator<ChannelNum_t> >::_M_create_node<ChannelNum_t const&>(ChannelNum_t const&) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:642
 (inlined by) void std::__cxx11::list<ChannelNum_t, std::allocator<ChannelNum_t> >::_M_insert<ChannelNum_t const&>(std::_List_iterator<ChannelNum_t>, ChannelNum_t const&) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:1903
 (inlined by) std::__cxx11::list<ChannelNum_t, std::allocator<ChannelNum_t> >::push_back(ChannelNum_t const&) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:1220
 (inlined by) StatisticsParser::getChannelsByType[abi:cxx11](ChannelType_t) const at /home/schlimmchen/Documents/OpenDTU-OnBattery/lib/Hoymiles/src/parser/StatisticsParser.cpp:297
0x4206d435: calcTotalYieldTotal(StatisticsParser*, unsigned char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/lib/Hoymiles/src/parser/StatisticsParser.cpp:394

I agree that these seem to be unrelated, but they started happening to me quite often and AFAIK right after I started using the Syslog feature.

@schlimmchen
Copy link
Member

schlimmchen commented Sep 28, 2024

image

I will fix this shortly.

Edit: Fixed in d0ba065.

@schlimmchen
Copy link
Member

One more (before I stop posting them here):

All missing
Nothing received, resend whole request
Setting Hostname... done
All missing
Nothing received, resend whole request
Configuring WiFi STA using existing credentials... done
[128312][E][WiFiClient.cpp:275] connect(): socket error on fd 48, errno: 113, "Software caused connection abort"
Configuring WiFi STA DHCP IP... done
[SyslogLogger] Disable
[SyslogLogger] Syslog not enabled
Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.

Core  1 register dump:
PC      : 0x4205c2d8  PS      : 0x00060630  A0      : 0x8205e99d  A1      : 0x3fcbfee0  
A2      : 0x3fccd260  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x3fcccb58  
A6      : 0x3fcf0b24  A7      : 0x00000000  A8      : 0x8205c2ed  A9      : 0x3fcbfec0  
A10     : 0x00000018  A11     : 0x3fcccf28  A12     : 0x00000000  A13     : 0x00000001  
A14     : 0x00060020  A15     : 0x00000001  SAR     : 0x0000001f  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000000  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0xffffffff  


Backtrace: 0x4205c2d5:0x3fcbfee0 0x4205e99a:0x3fcbff00 0x4203937a:0x3fcbff20 0x42039392:0x3fcbff40 0x4203939d:0x3fcbff60 0x4205dc7e:0x3fcbff80 0x4205fbf5:0x3fcbffa0 0x4205bfa2:0x3fcbffc0 0x4205bfaf:0x3fcbffe0 0x42
108f42:0x3fcc0000 0x421092d1:0x3fcc0030 0x421095e7:0x3fcc0060

ELF file SHA256: 18da0f49da6d08d2

I am unable to save the network settings using the web UI. Often, an exception like this will occur. They are "random", so I guess a stack overflow is likely.

I will look out for the task size of the async_tcp task. Hopefully, we can increase it easily. If I can then save the network settings reliably, I assume the issue is gone.

The Syslog feature could have something to do with it, as the async_tcp task does print stuff through MessageOutput.

@schlimmchen
Copy link
Member

Okay, this has nothing to do with the syslog feature.

I reverted it and tried saving the network settings and the ESP32 tripped immediately. It does not trip when I use the current development branch. So there is something wrong with the current state of support for multiple inverters in the DPL. Weird...

Sorry for blaming you, @ranma

Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants