[DISCUSS] Make otel as a default logsEngine #614
Yess!!! Please make this happen! It's time for us to push the filelog receiver to the forefront! We already use it with any Splunk Enterprise or Splunk Cloud customer looking to monitor Kubernetes!
I'd like to see open-telemetry/opentelemetry-collector-contrib#17846 resolved first. To a lesser extent, open-telemetry/opentelemetry-collector-contrib#17308 should be fixed as well.
Makes sense! I'll close the PR for now.
On that first issue about filelog rotation, I really think we should be careful about taking it on as our problem. We need to research with the cloud providers whether that file rotation limit can be raised, because the current setting is unreasonably small. I would also say that making a file-based buffer for the persistent queue an option is needed as well.
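For reference, here is a sketch of the kubelet settings that govern this container-log rotation. The field names come from KubeletConfiguration; the 100Mi value is purely illustrative of the kind of increase being discussed, not a recommendation:

```yaml
# KubeletConfiguration fragment (sketch). The small 10Mi default is what makes
# rotation of a busy container's log happen within seconds.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: "100Mi"  # illustrative bump from the 10Mi default
containerLogMaxFiles: 5       # rotated files kept per container (the default)
```

On managed clusters these fields are usually only adjustable through the provider's node-pool configuration, which is presumably why reaching out to the cloud vendors comes up in this thread.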
Actually, increasing the log's max size can make the problem I described in that issue worse. It will be more visible on a fresh installation, since none of the log files will have been consumed yet.
I am already working on that one (https://github.com/harshit-splunk/splunk-otel-collector-chart/tree/persistent_queue).
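For illustration, a rough sketch of what a file-backed persistent sending queue could look like in the collector configuration; the directory, the splunk_hec placeholders, and the overall shape are assumptions here, and the linked branch is the authoritative implementation:

```yaml
extensions:
  file_storage:
    directory: /var/addon/splunk/exporter_queue  # assumed host-mounted directory

receivers:
  filelog:
    include: [/var/log/pods/*/*/*.log]

exporters:
  splunk_hec:
    token: "${SPLUNK_HEC_TOKEN}"                                  # placeholder
    endpoint: "https://hec.example.com:8088/services/collector"   # placeholder
    sending_queue:
      enabled: true
      storage: file_storage  # back the exporter queue with the file_storage extension

service:
  extensions: [file_storage]
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [splunk_hec]
```

With the queue persisted on disk, data buffered during a backend outage should survive a collector restart, which is the point of making it an option.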
Interesting, so you think leaving smaller chunks and just finding them faster is all we need? If we can't keep up, couldn't one just increase resources on the collector? Or would the answer be somewhere in the middle, where rotation maybe doesn't go to 1Gi but gets to a point where it isn't happening in a matter of seconds? I wonder if there's an impact on the node/kubelet from having to handle those rotation jobs at the speed of light. Rotation happening that fast also risks those files rotating off the node completely, which may undermine the bit of persistence we can work with in case of issues or failure. Let me know if you think I should reach out to the cloud vendors to campaign for some control for users.
Yup. Once any file is consumed, the agent goes ahead and consumes the next available file; the list of available files is refreshed at each poll interval. The example I gave earlier will not happen frequently, since it is bursty in nature, so the agent will keep up eventually.
Exactly, the max size should be large enough to give the agent some breathing room to catch up.
I don't have the expertise to answer this, but from my observation using
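For context, a minimal filelog receiver sketch of the settings involved in this polling behavior; the values shown are illustrative, not necessarily the chart's defaults:

```yaml
receivers:
  filelog:
    include: [/var/log/pods/*/*/*.log]  # container logs written and rotated by the kubelet
    start_at: beginning                 # on a fresh install, read discovered files from the start
    poll_interval: 200ms                # the list of matching files is refreshed on every poll
    max_concurrent_files: 1024          # cap on how many matched files are read at once
```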
@harshit-splunk does this fix address fluent/fluentd#3882? If yes, can you please let me know how I could test it and which parameters need to be set?
@SriramDuvvuri you can already use otel logs collection instead of fluentd. See https://github.com/signalfx/splunk-otel-collector-chart/blob/main/docs/advanced-configuration.md#logs-collection
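As a minimal example, assuming the `logsEngine` value described in the linked doc:

```yaml
# values.yaml fragment (sketch): switch the chart's log collection from
# fluentd to the native OpenTelemetry filelog receiver.
logsEngine: otel
```

The same can be set on the command line with `helm upgrade --install ... --set logsEngine=otel`.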
The filelog receiver was recently promoted to beta (open-telemetry/opentelemetry-collector-contrib#15355).

Also, I have observed an increasing number of issues with the in_tail fluentd plugin: it simply stops collecting data and only resumes when the process is restarted manually. The root cause is not yet known. Listing some of the issues:

So, it may be a good time to make otel the default logsEngine.