Bug Report

Describe the bug
We have Loki deployed as the output of FluentBit, and owing to maintenance of the K8S cluster, the Loki stack was down as well (roughly 36 hours). After bringing Loki back online this morning, we found that some of the logs were lost.
Please correct me if there is any misunderstanding:
We noticed that the task_id stopped at 2047 even though incoming chunks were still being created; this appears to be by design: https://github.com/fluent/fluent-bit/blob/v3.2.6/include/fluent-bit/flb_config.h#L291. Since each chunk is handled by one task_id, and taking the chunk size into account, the maximum buffered data should be 2048 * 2MB = 4096MB (4G), or even less, even though storage.total_limit_size is set to 5G or larger. A sketch of a comparable configuration is shown below.
Similar issues: #8503, #8395
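For reference, a minimal sketch of a filesystem-buffered pipeline of this shape; the hosts, paths, and tail input below are placeholders, not our actual configuration:

```
[SERVICE]
    flush                    1
    # on-disk buffer location (placeholder path)
    storage.path             /var/log/flb-storage/
    storage.sync             normal
    storage.metrics          on
    # expose the monitoring API referenced later with curl
    http_server              On
    http_port                2020

[INPUT]
    name             tail
    tag              app.*
    # placeholder log path
    path             /var/log/app/*.log
    storage.type     filesystem

[OUTPUT]
    name                      loki
    match                     *
    # placeholder Loki endpoint
    host                      loki.example.local
    port                      3100
    # per-output cap on buffered data; in practice buffering stalls near
    # 2048 tasks (~4G with 2M chunks) before this limit is reached
    storage.total_limit_size  5G
```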
To Reproduce
Shut down the output plugin service, Loki in our case.
Wait until the task_id reaches 2047 (a rough shell sketch for watching this is shown after these steps).
Check whether any log files are missing after bringing the output plugin back up.
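A minimal sketch for observing the stall while reproducing, assuming Fluent Bit runs as a systemd unit named fluent-bit and buffers under /var/log/flb-storage/ (both are placeholders):

```
# the engine retry messages carry the task_id; during the outage the values
# stop growing once 2047 is reached
journalctl -u fluent-bit -f | grep -o 'task_id=[0-9]*'

# compare the on-disk buffer size against storage.total_limit_size
du -sh /var/log/flb-storage/tail.1/
```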
Expected behavior
FluentBit should buffer as much log data as the storage.total_limit_size setting allows while the output service is down.
Your Environment
Version used: 3.2.6
Environment name and version (e.g. Kubernetes? What version?): Linux
Server type and version: Linux
Operating System and version: RHEL 8.0
Just some updates:
1> As we observed, when using storage.type: filesystem, the local fs_chunk file sizes vary, from 4k or 36k up to the maximum of 2M. Do you know why and how this happens, given that we keep tailing the same log files?
2> As mentioned in #9966 (comment), question 2, the curl command shows total_chunks is 2099 this time, after about 40h without output (see the curl sketch below). Since the total file size (data/tail.1/*) is about 132M, we are still confused by the mismatch between the number of task_ids and the local chunk files.
3> Could we add a new configurable parameter, such as max_flb_task, in the service section? Its default value would be 2048, as defined in https://github.com/fluent/fluent-bit/blob/v3.2.6/include/fluent-bit/flb_config.h#L291, and it could be changed as needed. The actual number of task_ids would then depend on whichever of storage.total_limit_size and max_flb_task is reached first (see the illustration below). Any concerns or suggestions about this change?
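For 2>, the total_chunks figure comes from the built-in monitoring endpoint; a sketch, assuming http_server and storage.metrics are enabled on the default port 2020:

```
# storage metrics from the built-in HTTP server
curl -s http://127.0.0.1:2020/api/v1/storage
# the reply includes storage_layer.chunks.total_chunks, which is where the
# 2099 figure above was read from
```

For 3>, in configuration terms the proposal would look roughly like this (max_flb_task is hypothetical and does not exist in current releases):

```
[SERVICE]
    # hypothetical option: would replace the hard-coded 2048 task map size
    # referenced in flb_config.h
    max_flb_task 4096
```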