Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loosing information when noobaa log level is set to all, when rate limit is on (rsyslog) #8606

Open
rkomandu opened this issue Dec 18, 2024 · 3 comments
Assignees
Labels

Comments

@rkomandu
Copy link
Collaborator

rkomandu commented Dec 18, 2024

Environment info

  • NooBaa Version: VERSION
  • Platform: Kubernetes 1.14.1 | minikube 1.1.1 | OpenShift 4.1 | other: specify

noobaa 5.17.1 stage upstream rpm (not related to this though exactly) + efix for DI issue
noobaa-core-5.17.1-20241211-stage_5_17_2.el9.ppc64le.rpm

Actual behavior

When DEBUGLEVEL : all in configuration, idea is to have all the logs collected , which will be useful for the debug purposes. However it was observed that due to rate limit on rsyslog (10min) interval, the logging is lost in noobaa.log

Warp / IO upload workload is running and FOFB (HA capability) is tested which observed the messages are missed as shown in rsyslog "supressed" eventhough the value should be "0" as per noobaa_syslog.conf file


 rsyslog.service - System Logging Service
     Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled; preset: enabled)
     Active: active (running) since Wed 2024-12-11 11:42:53 EST; 6 days ago
       Docs: man:rsyslogd(8)
             https://www.rsyslog.com/doc/
   Main PID: 991 (rsyslogd)
      Tasks: 3 (limit: 417394)
     Memory: 638.5M
        CPU: 1h 12min 56.454s
     CGroup: /system.slice/rsyslog.service
             └─991 /usr/sbin/rsyslogd -n

Dec 17 22:57:58 node-gui0 rsyslogd[991]: imjournal from <node-gui0:node>: begin to drop messages due to rate-limiting
Dec 17 23:00:43 node-gui0 rsyslogd[991]: imjournal: journal files changed, reloading...  [v8.2102.0-113.el9_2.2 try https://www.rsyslog.com/e/0 ]
Dec 17 23:00:43 node-gui0 rsyslogd[991]: imjournal: journal files changed, reloading...  [v8.2102.0-113.el9_2.2 try https://www.rsyslog.com/e/0 ]
Dec 17 23:02:57 node-gui0 rsyslogd[991]: imjournal: 15286 messages lost due to rate-limiting (20000 allowed within 600 seconds)
Dec 17 23:07:58 node-gui0 rsyslogd[991]: imjournal from <node-gui0:node>: begin to drop messages due to rate-limiting
Dec 17 23:12:58 node-gui0 rsyslogd[991]: imjournal: 16881 messages lost due to rate-limiting (20000 allowed within 600 seconds)
Dec 17 23:18:12 node-gui0 rsyslogd[991]: imjournal: journal files changed, reloading...  [v8.2102.0-113.el9_2.2 try https://www.rsyslog.com/e/0 ]
Dec 17 23:18:12 node-gui0 rsyslogd[991]: imjournal from <node-gui0:node>: begin to drop messages due to rate-limiting
Dec 17 23:18:12 node-gui0 rsyslogd[991]: imjournal: journal files changed, reloading...  [v8.2102.0-113.el9_2.2 try https://www.rsyslog.com/e/0 ]
Dec 17 23:27:37 node-gui0 rsyslogd[991]: imjournal: 9638 messages lost due to rate-limiting (20000 allowed within 600 seconds)

noobaa service file /etc/rsyslog.d/noobaa_syslog.conf

# NooBaa syslog setting

$umask 0000
$FileCreateMode 0644
$EscapeControlCharactersOnReceive off

$imjournalRatelimitInterval 0
$imjournalRatelimitBurst 0

# Provides UDP/TCP forwarding. The IP/DNS is the server's IP/DNS address
# This is an example of sending everything except NooBaa logs using UDP
# When changing this format make sure to change the relevant functions in os_utils
#if $syslogfacility-text != 'local0' then @192.168.1.108:514

# For servers
local0.*        /var/log/noobaa.log
&stop

# For events
local2.*        /var/log/noobaa_events.log
&stop

# For clients
local1.*        /var/log/client_noobaa.log
&stop

noobaa.log (for example)

Below the test assert/sigsegv issued at 08:32 and at 08:34:01 noobaa service restarted as per mmfs.log , however in noobaa.log on the node has entries from the 08:37, which indicates the noobaa startup messages are lost which could be key for us to debug the environment in case of issues.

Dec 16 08:27:08 node-gui0 [251554]: [nsfs/251554]    [L1] core.util.file_writer:: FileWriter._write_to_file: buffers 12 size 196608 offset 13008896
Dec 16 08:27:08 node-gui0 [251554]: 2024-12-16 08:27:08.985820 [PID-251554/TID-251608] [L1] FS::FSWorker::Execute: FileWritev _wrap->_path=/ibm/fvt_fs/s3user-18002-dir/warp-16dec-sigsegv-efix-long-mixed-womultipart/.noobaa-nsfs_6760280d930655555d130355/multipart-uploads/b44915bf-3453-4dce-8b9d-638e47792695/part
s-size-16777216 _total_len=458752 buffers_len=28 _offset=40222720  took: 0.042592 ms
Dec 16 08:27:08 node-gui0 [251554]: 2024-12-16 08:27:08.985839 [PID-251554/TID-251554] [L1] FS::FSWorker::Begin: FileWritev _wrap->_path=/ibm/fvt_fs/s3user-18002-dir/warp-16dec-sigsegv-efix-long-mixed-womultipart/.noobaa-nsfs_6760280d930655555d130355/multipart-uploads/644a4627-c4e5-49a9-a7c6-c9dbd16db03b/parts-
size-16777216 _total_len=196608 buffers_len=12 _offset=13008896
Dec 16 08:27:08 node-gui0 [251554]: 2024-12-16 08:27:08.985859 [PID-251554/TID-251600] [L1] FS::FSWorker::Execute: FileWritev _wrap->_path=/ibm/fvt_fs/s3u
ser-18002-dir/warp-16dec-sigsegv-efix-long-mixed-womultipart/.noobaa-nsfs_6760280d930655555d130355/multipart-uploads/644a4627-c4e5-49a9-a7c6-c9dbd16db03b/parts-size-16777216 _total_len=196608 buffers_len=12 _offset=13008896 _uid=18002 _gid=18000 _backend=GPFS
Dec 16 08:37:13 node-gui0 [295572]: [nsfs/295572]    [L1] core.util.buffer_utils:: BufferPool.get_buffer: sem value 3800039424 waiting_value 0 buffers length 1
Dec 16 08:37:13 node-gui0 [295572]: 2024-12-16 08:37:13.322273 [PID-295572/TID-295572] [L1] FS::FSWorker::Begin: FileRead _wrap->_path=/ibm/fvt_fs/s3user-
18002-dir/warp-16dec-sigsegv-efix-long-mixed-womultipart/eLyxipgo/66.5hm1LnVyXgfhbpWI.rnd _wrap->_fd=24 _pos=125829120 _offset=0 _len=8388608
Dec 16 08:37:13 node-gui0 [295572]: 2024-12-16 08:37:13.322304 [PID-295572/TID-295615] [L1] FS::FSWorker::Execute: FileRead _wrap->_path=/ibm/fvt_fs/s3use
r-18002-dir/warp-16dec-sigsegv-efix-long-mixed-womultipart/eLyxipgo/66.5hm1LnVyXgfhbpWI.rnd _wrap->_fd=24 _pos=125829120 _offset=0 _len=8388608 _uid=18002 _gi
d=18000 _backend=GPFS

Expected behavior

Shouldn't loose message logging to the noobaa.log file . Is there a way to ensure that no messages are lost when running in "all" mode , as the idea is to collect as much information as possible during enabled logging.

Steps to reproduce

Run IO workload
issue test assert / test sigsegv
observe noobaa.log and rsyslog status for these messages

More information - Screenshots / Logs / Other output

Will upload the relevant logs to the box folder https://ibm.ent.box.com/folder/298865317839

@rkomandu rkomandu added the NS-FS label Dec 18, 2024
@romayalon
Copy link
Contributor

@naveenpaul1 can you please take a look?

@naveenpaul1 naveenpaul1 self-assigned this Dec 18, 2024
@rkomandu
Copy link
Collaborator Author

@naveenpaul1 , as we discussed, could we fix this if there are relevant settings for the rsyslog env. Same we will try to move to the syslog-ng

@naveenpaul1
Copy link
Contributor

naveenpaul1 commented Jan 27, 2025

@rkomandu with these configurations, we are making sure that the log rate limit should not affect the rsyslog logs

$imjournalRatelimitInterval 0
$imjournalRatelimitBurst 0

Could you please also update the journald configuration that set rate limit for journald.
increase or remove limit for RateLimitBurst and RateLimitInterval for systemd-journald

config path : /etc/systemd/journald.conf
RateLimitBurst=100000 or RateLimitBurst=0(no limitation)
RateLimitInterval=0(no limitation)

doc : Referance

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants