Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[otlp] Grpc Status check and retry #6000

Merged

Conversation

rajkumar-rangaraj
Copy link
Contributor

Fixes Part of #5730
Design discussion issue #

Changes

Please provide a brief description of the changes here.

The current logic checks the HTTP response status for gRPC and makes retry decisions based on it, which is incorrect. We need to read the grpc-status header to understand the status of the gRPC call. These changes are based on the following reference: https://github.com/grpc/grpc-dotnet/blob/5a58c24efc1d0b7c5ff88e7b0582ea891b90b17f/src/Grpc.Net.Client/Internal/GrpcCall.cs#L465

Additionally, the grpc-status-details-bin header value needs to be deserialized during retries to determine the GrpcRetryDelay for the retry logic.

Updated both in-memory and persistent storage tests to confirm there is no change in behavior.

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes
  • Changes in public API reviewed (if applicable)

@rajkumar-rangaraj rajkumar-rangaraj requested a review from a team as a code owner November 23, 2024 01:14
{
var nextRetryDelayMilliseconds = retryDelayMilliseconds;

if (IsDeadlineExceeded(response.DeadlineUtc))
Copy link
Contributor Author

@rajkumar-rangaraj rajkumar-rangaraj Nov 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the code in this method is same as TryGetRetryResult

@github-actions github-actions bot added the pkg:OpenTelemetry.Exporter.OpenTelemetryProtocol Issues related to OpenTelemetry.Exporter.OpenTelemetryProtocol NuGet package label Nov 23, 2024
Copy link

codecov bot commented Nov 23, 2024

Codecov Report

Attention: Patch coverage is 50.32680% with 76 lines in your changes missing coverage. Please review.

Project coverage is 85.00%. Comparing base (f9a0b4c) to head (6c582eb).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...ation/ExportClient/ProtobufOtlpGrpcExportClient.cs 29.87% 54 Missing ⚠️
...tation/OpenTelemetryProtocolExporterEventSource.cs 64.28% 10 Missing ⚠️
...yProtocol/Implementation/ExportClient/OtlpRetry.cs 66.66% 8 Missing ⚠️
...mentation/ExportClient/Grpc/GrpcProtocolHelpers.cs 66.66% 2 Missing ⚠️
...rotocol/Implementation/ExportClient/Grpc/Status.cs 0.00% 1 Missing ⚠️
...on/Transmission/OtlpExporterTransmissionHandler.cs 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #6000      +/-   ##
==========================================
+ Coverage   84.00%   85.00%   +0.99%     
==========================================
  Files         275      275              
  Lines       12582    12710     +128     
==========================================
+ Hits        10570    10804     +234     
+ Misses       2012     1906     -106     
Flag Coverage Δ
unittests-Project-Experimental 84.89% <50.32%> (+0.90%) ⬆️
unittests-Project-Stable 84.78% <50.32%> (+0.79%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...mentation/ExportClient/BaseOtlpGrpcExportClient.cs 84.00% <100.00%> (+5.05%) ⬆️
...mentation/ExportClient/ExportClientGrpcResponse.cs 100.00% <100.00%> (ø)
...mplementation/ExportClient/ExportClientResponse.cs 100.00% <ø> (ø)
...ementation/ExportClient/OtlpGrpcLogExportClient.cs 83.33% <100.00%> (ø)
...tation/ExportClient/OtlpGrpcMetricsExportClient.cs 83.33% <100.00%> (ø)
...entation/ExportClient/OtlpGrpcTraceExportClient.cs 83.33% <100.00%> (-8.34%) ⬇️
...ation/ExportClient/ProtobufOtlpHttpExportClient.cs 90.90% <100.00%> (+90.90%) ⬆️
...mission/ProtobufOtlpExporterTransmissionHandler.cs 87.09% <100.00%> (+87.09%) ⬆️
...rotocol/Implementation/ExportClient/Grpc/Status.cs 36.36% <0.00%> (+36.36%) ⬆️
...on/Transmission/OtlpExporterTransmissionHandler.cs 73.52% <0.00%> (-20.59%) ⬇️
... and 4 more

... and 11 files with indirect coverage changes

@@ -39,7 +39,7 @@ public override ExportClientResponse SendExportRequest(OtlpCollector.ExportMetri
{
OpenTelemetryProtocolExporterEventSource.Log.FailedToReachCollector(this.Endpoint, ex);

return new ExportClientGrpcResponse(success: false, deadlineUtc: deadlineUtc, exception: ex);
return new ExportClientGrpcResponse(false, deadlineUtc, ex, null, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Easier to read if you use named parameters.

return new ExportClientGrpcResponse(success: false, deadlineUtc: deadlineUtc, exception: ex, status: null, grpcStatusDetailsHeader: null);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be removed as follow up, hence kept simple.

Comment on lines +91 to +92
trailingHeaders = httpResponse.TrailingHeaders();
status = GrpcProtocolHelpers.GetResponseStatus(httpResponse, trailingHeaders);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did this work on line 66 do we need to do it again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second call is necessary because trailing headers might not be fully available until the response stream is consumed. While we retrieve the status initially, gRPC often sends critical information like error details or final statuses in trailing headers, which can only be reliably accessed after reading the response body. This additional check ensures we handle cases where the gRPC status or errors are deferred until the stream is processed, improving robustness and accuracy. This follows the same logic as in Grpc library - https://github.com/grpc/grpc-dotnet/blob/5a58c24efc1d0b7c5ff88e7b0582ea891b90b17f/src/Grpc.Net.Client/Internal/GrpcCall.cs#L465

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I pushed a commit to capture that info as a comment

}

public Status? Status { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we possibly switch this to also be non-nullable Status? I think it would improve the code readability to always have a status. Seems like there are only a couple spots where null gets passed but most paths define/pass something. We could introduce canned/static codes for the null ones? Not going to block for this, if you agree consider as maybe a follow-up/refactor 😄

Copy link
Member

@CodeBlanch CodeBlanch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@CodeBlanch CodeBlanch merged commit 88d2ad6 into open-telemetry:main Nov 26, 2024
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg:OpenTelemetry.Exporter.OpenTelemetryProtocol Issues related to OpenTelemetry.Exporter.OpenTelemetryProtocol NuGet package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants