Add fallback/file buffering when requests can't be sent #155
Comments
Hi @alex-lawrence-conf, thanks for the ticket and PR! The mechanism you're describing is very similar to what's being discussed in serilog/serilog#1791 (comment) - just a sketch, but the central idea is to implement fallbacks using shared infrastructure that multiple sinks could benefit from. The surface syntax for the most common usage would probably end up being something like: .WriteTo.FallbackChain(wt => wt.OpenTelemetry(...), wt => wt.File(...)) (Naming TBC.) In this scenario, the sink configured by There's a bit involved, and some help rolling this out would be welcome, but I probably need to whip up a spike to get the ball rolling since there are multiple use cases we had in mind for the mechanism. I'll try to loop back in the next few days 🤞
Thanks @nblumhardt for the response. This mechanism sounds like the perfect solution for what I am looking for. I think the majority of what I've done here is generalisable by design and could fairly easily be extracted to a fallback chain in the way you describe. From what I've seen while building the fallback for this module, an appropriate design choice would be to have sinks opt in to the fallback by exposing their own extensions on a new fallback version of the type returned by Either that, or we could just expose a result type on the sink emissions in a new IFailureLogSink interface using that enum as you've described, and then the fallback chain infrastructure in Serilog can determine its own retry protocol and pacing profile for the different methods. Just a few design questions:
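using System;

public struct LogEmissionState
{
    // Outcome of a single emission attempt.
    private enum EmissionState
    {
        Failed,
        Success,
        Retry,
    }

    // Optional callback used to retry the emission.
    private Action? _retryAction;
    private EmissionState _emissionState;

    // ... factory methods for the combinations
}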
@nblumhardt a quick point about why this use case wouldn't be generalisable: what I've implemented captures the exact request that would have been sent, as either Protobuf or NDJson. We'd lose all of the OpenTelemetry information captured and transformed if it just sends the original LogEvent down a chain external to this sink.
Hi @alsi-lawr; serilog/serilog#2108 implements a spike of this; although it might take a bit of shuffling of code, hopefully the need to carry through some OTel-specific things like resource attributes can be worked around. Keen for some feedback if you have a chance to look in on it. Thanks for the nudge on this!
@nblumhardt is there scope to implement the fallback features in this package now? I can work on integrating some of the OTel-specific things from my fork into the new framework, and work on implementing the It should be trivial to then implement the Is there a contribution style guide, by the way?
Thanks for the ping! Returning to this, I'm not 100% clear on how usage would look. Could you possibly post a short snippet showing what the configuration (in C#) would look like, and how the data would flow between sinks? For resource attributes, I'm wondering if the solution might instead just be to use enrichers on the fallback sinks to explicitly add these as regular properties 🤔
The way I approached this was by re-emitting a LogEvent with the OTLP message as a single property and an empty message template. I'm envisioning a simpler solution that just does exactly what you're suggesting. And a second that does what I've implemented already as a
Thanks for the follow-up. I think to do anything specific for fallbacks in this sink we'd need to dig a bit deeper into examples of what's possible with the default approach and how the modified version would differ. If the fallback chain was targeting an OTLP sink, all of the OTLP info would/could just be re-added by specifying the same resource attributes on the fallback sink. If the fallback chain was targeting a different sink then OTLP message bodies probably wouldn't be useful. I might be missing something; let me know if so. Cheers!
So, my use case is that we need a fallback that retains all information at the moment of logging, as it's being used for financial auditing. Currently, this is logged to disk as a backup mechanism if the connection to our otel-collector goes down. We then have a secondary process that ingests those binary logs (newline-delimited gRPC messages) and attempts retries. It was important that these logs persist in all situations, without any reasonable possibility of loss. Currently, this isn't possible without forking the project, as we have no way to hook into the gRPC call with all the context and the pre-built message on failure. Re-generating the message could lose important information we require for financial audit logs. With the default implementation, we just get the same context as the incoming log message before resource information is attached, so I'd need to maintain mirrored logic for the log enrichment done in this package in order to get close to the original message in another sink.
Thanks for the follow-up! Perhaps for this use-case a local, buffering, OTLP forwarder service and
Is your feature request related to a problem? Please describe.
When the specified OTLP endpoint is unreachable or otherwise incapable of receiving the request, it would be great to have support for a fallback to file (or a custom sink). This will ensure that no data loss would occur in the event of a failure to export, as this sink is being used in critical auditing infrastructure.
Proposal:
Change the IExporter interface for exports on log service requests to return some kind of information about the success/failure state of the exports. This can then feed into the sink to make a decision about whether to reroute the logs to a secondary sink (or keep it as a filesystem-only fallback), or to continue ignoring the response.
Expose an option to configure either a filesystem fallback location or a secondary sink fallback.
Edit: upon looking into it, we'd also need to catch exceptions in the case of unreachable gRPC endpoints.
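To make the proposal above more concrete, here is a minimal sketch of the idea, not the package's actual code. Every name in it (ExportResult, IResultExporter, IFallback, FallbackExporter) is hypothetical and invented for illustration, and the request is represented as a plain byte array rather than the real OTLP request type. It assumes the exporter reports an outcome, treats any thrown exception (for example from an unreachable endpoint) as a failure, and hands the unmodified request to the fallback.

```csharp
using System;

// Hypothetical result type; not part of Serilog.Sinks.OpenTelemetry.
public enum ExportResult { Success, Failed }

// Hypothetical variant of an exporter interface that reports the outcome.
public interface IResultExporter
{
    ExportResult Export(byte[] request);
}

// Hypothetical fallback target, e.g. a file writer or a secondary sink.
public interface IFallback
{
    void Write(byte[] request);
}

public sealed class FallbackExporter
{
    private readonly IResultExporter _primary;
    private readonly IFallback _fallback;

    public FallbackExporter(IResultExporter primary, IFallback fallback)
    {
        _primary = primary ?? throw new ArgumentNullException(nameof(primary));
        _fallback = fallback ?? throw new ArgumentNullException(nameof(fallback));
    }

    // Returns true when the primary accepted the request,
    // false when the request was rerouted to the fallback.
    public bool Export(byte[] request)
    {
        ExportResult result;
        try
        {
            result = _primary.Export(request);
        }
        catch (Exception)
        {
            // An unreachable endpoint may surface as an exception
            // rather than a status, so treat it as a failed export.
            result = ExportResult.Failed;
        }

        if (result == ExportResult.Success)
        {
            return true;
        }

        // Pass the exact bytes that would have been sent, so nothing is lost.
        _fallback.Write(request);
        return false;
    }
}
```

Forwarding the already-built request, rather than the original log event, is what would keep the fallback copy identical to what the endpoint would have received.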
Describe alternatives you've considered
Additional context
I'd be happy to have a crack at implementing this for the filesystem-only approach if there's nothing already in the works.