[otlp] Grpc Status check and retry #6000
…angaraj/opentelemetry-dotnet into rajrang/otlpAddOthExp
{
    var nextRetryDelayMilliseconds = retryDelayMilliseconds;

    if (IsDeadlineExceeded(response.DeadlineUtc))
Most of the code in this method is the same as in TryGetRetryResult.
...xporter.OpenTelemetryProtocol/Implementation/Transmission/OtlpExporterTransmissionHandler.cs
...Telemetry.Exporter.OpenTelemetryProtocol/Implementation/ExportClient/ExportClientResponse.cs
...metry.Exporter.OpenTelemetryProtocol/Implementation/ExportClient/BaseOtlpGrpcExportClient.cs
Outdated
...metry.Exporter.OpenTelemetryProtocol/Implementation/ExportClient/Grpc/GrpcProtocolHelpers.cs
Outdated
@@ -39,7 +39,7 @@ public override ExportClientResponse SendExportRequest(OtlpCollector.ExportMetri
 {
     OpenTelemetryProtocolExporterEventSource.Log.FailedToReachCollector(this.Endpoint, ex);

-    return new ExportClientGrpcResponse(success: false, deadlineUtc: deadlineUtc, exception: ex);
+    return new ExportClientGrpcResponse(false, deadlineUtc, ex, null, null);
nit: Easier to read if you use named parameters.
return new ExportClientGrpcResponse(success: false, deadlineUtc: deadlineUtc, exception: ex, status: null, grpcStatusDetailsHeader: null);
This will be removed as a follow-up, hence kept simple.
trailingHeaders = httpResponse.TrailingHeaders();
status = GrpcProtocolHelpers.GetResponseStatus(httpResponse, trailingHeaders);
We did this work on line 66; do we need to do it again?
The second call is necessary because trailing headers might not be fully available until the response stream is consumed. While we retrieve the status initially, gRPC often sends critical information like error details or final statuses in trailing headers, which can only be reliably accessed after reading the response body. This additional check ensures we handle cases where the gRPC status or errors are deferred until the stream is processed, improving robustness and accuracy. This follows the same logic as the grpc-dotnet library: https://github.com/grpc/grpc-dotnet/blob/5a58c24efc1d0b7c5ff88e7b0582ea891b90b17f/src/Grpc.Net.Client/Internal/GrpcCall.cs#L465
👍 I pushed a commit to capture that info as a comment
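The reasoning in this thread can be illustrated without an HTTP stack. In gRPC, a trailers-only response (for example, an immediate error) carries grpc-status in the response headers; otherwise the status only arrives in the trailing headers, which should be inspected after the body has been read. This is a minimal sketch under stated assumptions: headers are modeled as plain dictionaries, and GrpcStatusResolver and ResolveStatusCode are hypothetical names, not the exporter's GrpcProtocolHelpers API.

```csharp
using System.Collections.Generic;

// Hypothetical helper illustrating where the gRPC status can be found.
// Headers are modeled as plain dictionaries instead of HttpResponseMessage.
public static class GrpcStatusResolver
{
    // HTTP/2 header names are lowercase, so an exact-match lookup is used here.
    public const string StatusHeader = "grpc-status";

    // Returns the numeric gRPC status code, or null if no grpc-status was found.
    // trailingHeaders should only be inspected after the response body has been
    // read, because trailers are not reliably available before that point.
    public static int? ResolveStatusCode(
        IReadOnlyDictionary<string, string> responseHeaders,
        IReadOnlyDictionary<string, string> trailingHeaders)
    {
        // Trailers-only response: the status arrives in the response headers.
        if (TryParseStatus(responseHeaders, out int code))
        {
            return code;
        }

        // Normal response: the status arrives in the trailers, after the body.
        if (TryParseStatus(trailingHeaders, out code))
        {
            return code;
        }

        return null;
    }

    private static bool TryParseStatus(IReadOnlyDictionary<string, string> headers, out int code)
    {
        code = 0;
        return headers != null
            && headers.TryGetValue(StatusHeader, out string raw)
            && int.TryParse(raw, out code);
    }
}
```

Checking the headers first and the trailers second is why a second lookup is still needed once the stream has been consumed: the first lookup only succeeds for trailers-only responses.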
}

public Status? Status { get; }
Could we possibly switch this to also be a non-nullable Status? I think it would improve code readability to always have a status. It seems there are only a couple of spots where null gets passed, while most paths define/pass something. Could we introduce canned/static codes for the null ones? Not going to block on this; if you agree, consider it a follow-up/refactor 😄
LGTM
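One possible shape for the non-nullable Status suggested in the review, assuming canned static values replace the null arguments. Everything here (Status, StatusCode, NotReceived, GrpcExportResponseSketch) is a hypothetical stand-in rather than the exporter's actual types, and using Unknown for the canned "no status received" value is itself an assumption.

```csharp
using System;

// Hypothetical stand-in for a gRPC status code enum (subset of the gRPC codes).
public enum StatusCode
{
    OK = 0,
    Cancelled = 1,
    Unknown = 2,
    Unavailable = 14,
}

// Hypothetical minimal Status value type; not the exporter's real Status.
public readonly struct Status
{
    // Canned value for a successful call.
    public static readonly Status DefaultSuccess = new Status(StatusCode.OK, string.Empty);

    // Assumed canned value for paths that currently pass null (for example, the
    // collector could not be reached, so no grpc-status was ever received).
    public static readonly Status NotReceived = new Status(StatusCode.Unknown, "No gRPC status received.");

    public Status(StatusCode code, string detail)
    {
        this.Code = code;
        this.Detail = detail ?? string.Empty;
    }

    public StatusCode Code { get; }

    public string Detail { get; }
}

// Hypothetical response type showing a non-nullable Status property.
public sealed class GrpcExportResponseSketch
{
    public GrpcExportResponseSketch(bool success, DateTime? deadlineUtc, Exception exception, Status status)
    {
        this.Success = success;
        this.DeadlineUtc = deadlineUtc;
        this.Exception = exception;
        this.Status = status;
    }

    public bool Success { get; }

    public DateTime? DeadlineUtc { get; }

    public Exception Exception { get; }

    // Always populated, so callers can read Status.Code without a null check.
    public Status Status { get; }
}
```

With this shape, the failure path in SendExportRequest would pass a canned value such as Status.NotReceived instead of null, and retry code could switch on Status.Code directly.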
Fixes part of #5730
Design discussion issue #
Changes
The current logic checks the HTTP response status for gRPC and makes retry decisions based on it, which is incorrect. We need to read the grpc-status header to determine the status of the gRPC call. These changes are based on the following reference: https://github.com/grpc/grpc-dotnet/blob/5a58c24efc1d0b7c5ff88e7b0582ea891b90b17f/src/Grpc.Net.Client/Internal/GrpcCall.cs#L465

Additionally, the grpc-status-details-bin header value needs to be deserialized during retries to determine the GrpcRetryDelay for the retry logic.

Updated both the in-memory and persistent storage tests to confirm there is no change in behavior.
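A sketch of a retry decision driven by the grpc-status code rather than the HTTP status. The retryable set is assumed to follow the OTLP/gRPC failure-handling guidance (CANCELLED, DEADLINE_EXCEEDED, ABORTED, OUT_OF_RANGE, UNAVAILABLE, DATA_LOSS, plus RESOURCE_EXHAUSTED only when the server supplies retry information). GrpcCode and GrpcRetryPolicySketch are hypothetical names, and decoding grpc-status-details-bin into a delay is assumed to happen elsewhere and is not shown.

```csharp
using System;

// gRPC status codes with their numeric values from the gRPC specification.
public enum GrpcCode
{
    Ok = 0, Cancelled = 1, Unknown = 2, InvalidArgument = 3, DeadlineExceeded = 4,
    NotFound = 5, AlreadyExists = 6, PermissionDenied = 7, ResourceExhausted = 8,
    FailedPrecondition = 9, Aborted = 10, OutOfRange = 11, Unimplemented = 12,
    Internal = 13, Unavailable = 14, DataLoss = 15, Unauthenticated = 16,
}

// Hypothetical retry policy keyed on grpc-status, not on the HTTP status code
// (a gRPC failure is typically still delivered with HTTP 200).
public static class GrpcRetryPolicySketch
{
    // serverRetryDelay: delay assumed to have been decoded elsewhere from the
    // retry information in grpc-status-details-bin; null if the server sent none.
    // currentDelayMs: the caller's own backoff value, used when no server delay exists.
    public static bool ShouldRetry(GrpcCode code, TimeSpan? serverRetryDelay, int currentDelayMs, out int nextDelayMs)
    {
        nextDelayMs = currentDelayMs;

        switch (code)
        {
            case GrpcCode.Cancelled:
            case GrpcCode.DeadlineExceeded:
            case GrpcCode.Aborted:
            case GrpcCode.OutOfRange:
            case GrpcCode.Unavailable:
            case GrpcCode.DataLoss:
                break;

            case GrpcCode.ResourceExhausted:
                // Only retryable when the server indicates recovery is possible.
                if (serverRetryDelay is null)
                {
                    return false;
                }

                break;

            default:
                // Ok and all other codes are not retried.
                return false;
        }

        // Prefer the server-supplied delay when present.
        if (serverRetryDelay is TimeSpan delay)
        {
            nextDelayMs = (int)delay.TotalMilliseconds;
        }

        return true;
    }
}
```

Preferring the server-supplied delay over the local backoff value reflects the intent of using GrpcRetryDelay from grpc-status-details-bin; how the actual transmission handler combines it with its own backoff and deadline checks may differ.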
Merge requirement checklist
CHANGELOG.md files updated for non-trivial changes