-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DO NOT MERGE] Add tracing instrumentation #289
base: main
Are you sure you want to change the base?
Conversation
Can one of the admins verify this patch? |
3 similar comments
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
FYI, this is how a request currently looks like in my example project: InstrumentationSystem.bootstrap(FakeTracer())
let client = HTTPClient(eventLoopGroupProvider: .createNew)
var context = BaggageContext()
// Extracted from a prior request
context.fakeTraceID = "123456789"
var request = try HTTPClient.Request(url: "https://httpbin.org/anything", method: .POST, body: .string(#"{"hello":"World"}"#))
request.headers.add(name: "Content-Type", value: "application/json")
let response = try client
.execute(request: request, context: context)
.wait() In the background, [
"http.request_content_length": TracingInstrumentation.SpanAttribute.int(17),
"http.status_code": TracingInstrumentation.SpanAttribute.int(200),
"http.target": TracingInstrumentation.SpanAttribute.string("/anything"),
"http.host": TracingInstrumentation.SpanAttribute.string("httpbin.org"),
"http.status_text": TracingInstrumentation.SpanAttribute.string("OK"),
"http.scheme": TracingInstrumentation.SpanAttribute.string("https"),
"http.response_content_length": TracingInstrumentation.SpanAttribute.int(433),
"http.method": TracingInstrumentation.SpanAttribute.string("POST")
] It also added a trace header to the request containing the |
public func get(url: String, context: BaggageContext, deadline: NIODeadline? = nil) -> EventLoopFuture<Response> { | ||
return self.execute(.GET, url: url, context: context, deadline: deadline) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for backward compatibility, the context should be optional or have default (static?) value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, contexts should never be defaulted to nothing/empty - that leads to fragile APIs where dropping the traces becomes the default operation, rather than doing the right thing -- keep passing along contexts. We want our APIs to help people do the right thing. Note in a same way, Go's context values must never be defaulted to empty, we're in the same boat.
It's another discussion how and when this PR will be merged though; I do strongly believe contexts must not be optional, as that leads to breaking traces accidentally (been there, done that). See discussions in https://forums.swift.org/t/server-distributed-tracing/37464/10 and the guidelines in https://github.com/slashmo/gsoc-swift-baggage-context#argument-namingpositioning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. This will, however, affect integration in a big way.
Please see my comment about an integration branch below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure it will, it breaks API, we're aware.
My wording here was not very nice, sorry about that.
What I mean that this is a PoC branch and it's breaking here. An adoption plan will be laid forward, see https://forums.swift.org/t/the-context-passing-problem/39162/27 (AHC plan incomming there)
deadline: NIODeadline? = nil, | ||
logger originalLogger: Logger?) -> Task<Delegate.Response> { | ||
let logger = (originalLogger ?? HTTPClient.loggingDisabled).attachingRequestInformation(request, requestID: globalRequestID.add(1)) | ||
// TODO: net.peer.ip / Not required, but recommended |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that its typically propagated in X-Forwarded-For
header
Awesome 👍 |
do you think its possible and makes sense to create |
Maybe an WDYT @Lukasa @artemredkin @tomerd ? |
public func delete(url: String, deadline: NIODeadline? = nil, logger: Logger) -> EventLoopFuture<Response> { | ||
return self.execute(.DELETE, url: url, deadline: deadline, logger: logger) | ||
public func delete(url: String, context: BaggageContext, deadline: NIODeadline? = nil) -> EventLoopFuture<Response> { | ||
return self.execute(.DELETE, url: url, context: context, deadline: deadline) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's an interesting question here: so... deadline
was defaulted in the public API, but on execute it's not...
Technically our rules say "context last, before defaulted" so... Technically this would mean self.execute(.DELETE, url: url, deadline: deadline, context: context)
I wonder if it matters / if it is worth it to be strict about this hm...
span.attributes.http.host = request.host | ||
if let requestContentLength = request.body?.length { | ||
span.attributes.http.requestContentLength = requestContentLength | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is http.url
attribute in OTel example
http.url | "https://example.com:8080/webshop/articles/4?s=1" |
---|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just reread the Otel definition for HTTP client spans. Your correct, the combination of host/scheme/target is not their recommended way for client spans: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/http.md#http-client
One of the following sets of attributes is required (in order of usual preference unless for a particular web client/framework it is known that some other set is preferable for some reason; all strings must be non-empty):
- http.url
- http.scheme, http.host, http.target
- http.scheme, net.peer.name, net.peer.port, http.target
- http.scheme, net.peer.ip, net.peer.port, http.target
Therefore I'll be replacing scheme/host/target with the full url. Thanks for pointing this out 👍
I'm fine with that. By the way, right now this patch doesn't just have API impact, it's API breaking. I'd like to find a way to avoid that if we can. |
We briefly chatted about the reasoning of having it API breaking here: slashmo/gsoc-swift-baggage-context#8 |
I don't think that conversation does a good job of making the case. |
The topic is a bit deeper than "this breaks API so it's bad", it is what is the end goal we want to head towards, and if we agree on this, we can talk more how we get there. This was touched upon in various places but as Tanner just opened up https://forums.swift.org/t/the-context-passing-problem/39162 which also relates to this, perhaps we use that forums thread to discuss this again, and then decide how to roll changes into the client here -- sounds good? I do not think the end goal should be "context is optional" that causes all kinds of headaches in "where did I drop my trace / context" that are hard to debug (been there, it's a pain), and confirmed by others in: https://forums.swift.org/t/server-distributed-tracing/37464/28?u=ktoso (last replies, and also lessons learnt from how the Go context works and is used IMO); and here where we're in agreement with Johannes that those APIs will change in the future #227 (comment) Long story short: Yes, context parameter replaces logger parameter, because of the insistence of the "pass a logger" yet context is the actual object that has tracing metadata so so it is the actual thing that needs passing around more than a logger even, but since we also want to pass loggers, we need e.g. a composite context protocol, like designed here: slashmo/gsoc-swift-baggage-context#9 - so we can pass one parameter, and not logger + context all the time. I believe it was very much known up front with Johannes's work and merging the logger parameter that there would have to be an API breaking change once tracing/logging is "done", I talked with Johannes about this at length back then. As mentioned above, let's first continue in the thread that Tanner kicked off, agree there and loop back here, sounds good? |
I added a do-not-merge marker, perhaps that was partially the confusion here as well? This PR is not intended to be merged as is. |
4d519b6
to
aee6ed8
Compare
@@ -64,7 +64,8 @@ extension HTTPClientTests { | |||
("testNoResponseWithIgnoreErrorForSSLUncleanShutdown", testNoResponseWithIgnoreErrorForSSLUncleanShutdown), | |||
("testWrongContentLengthForSSLUncleanShutdown", testWrongContentLengthForSSLUncleanShutdown), | |||
("testWrongContentLengthWithIgnoreErrorForSSLUncleanShutdown", testWrongContentLengthWithIgnoreErrorForSSLUncleanShutdown), | |||
("testEventLoopArgument", testEventLoopArgument), | |||
// TODO: Comment back in once failure was resolved | |||
// ("testEventLoopArgument", testEventLoopArgument), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs investigation as it currently leads to a failing precondition. Commented out, for now, to be able to run the full test-suite.
@@ -88,7 +89,8 @@ extension HTTPClientTests { | |||
("testUncleanShutdownCancelsTasks", testUncleanShutdownCancelsTasks), | |||
("testDoubleShutdown", testDoubleShutdown), | |||
("testTaskFailsWhenClientIsShutdown", testTaskFailsWhenClientIsShutdown), | |||
("testRaceNewRequestsVsShutdown", testRaceNewRequestsVsShutdown), | |||
// TODO: Comment back in once failure was resolved | |||
// ("testRaceNewRequestsVsShutdown", testRaceNewRequestsVsShutdown), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs investigation as it currently leads to a failing assertion. Commented out, for now, to be able to run the full test-suite.
extension SpanStatus { | ||
/// Map status code to canonical code according to OTel spec | ||
/// | ||
/// - SeeAlso: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/http.md#status | ||
init(_ responseStatus: HTTPResponseStatus) { | ||
switch responseStatus.code { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think mapping of HTTP status code (UInt) to SpanStatus
is very much reusable and as such should be provided in TracingInstrumentation
, sth like
SpanStatus(code: UInt, message: String?)
otherwise each library making HTTP calls and not using AHC will need to map it on its own
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, this reminded me I had an not submitted review here, yeah this seems like a good candidate to move up into OpenTelemetryInstrumentationSupport/OpenTelemetrySemanticConventions 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or not such a good candidate after all 😁👉 slashmo/gsoc-swift-tracing#134
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(seems I forgot to hit submit on a review here I did a while ago, submitted)
@@ -88,7 +88,7 @@ extension HTTPClientTests { | |||
("testUncleanShutdownCancelsTasks", testUncleanShutdownCancelsTasks), | |||
("testDoubleShutdown", testDoubleShutdown), | |||
("testTaskFailsWhenClientIsShutdown", testTaskFailsWhenClientIsShutdown), | |||
("testRaceNewRequestsVsShutdown", testRaceNewRequestsVsShutdown), | |||
// ("testRaceNewRequestsVsShutdown", testRaceNewRequestsVsShutdown), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
self = SpanStatus(canonicalCode: .unknown, message: responseStatus.reasonPhrase) | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this should move to OpenTelemetryInstrumentationSupport
?
Btw, been thinking if that should be called OpenTelemetrySemanticConventions
?
Just as a heads up, the main development branch has been changed to This PR has been re-targeted to main and should just work. However when performing rebases etc please keep this in mind -- you may want to fetch the main branch and rebase onto the |
let span = TestSpan(operationName: operationName, | ||
kind: kind, | ||
startTimestamp: timestamp, | ||
context: context.baggage) | ||
recordedSpans.append(span) | ||
return span | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the context should be updated, see slashmo/gsoc-swift-tracing#126 (comment)
var request = request | ||
InstrumentationSystem.instrument.inject(span.context.baggage, into: &request.headers, using: HTTPHeadersInjector()) | ||
|
||
let logger = context.logger.attachingRequestInformation(request, requestID: globalRequestID.add(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(copied from soto-project/soto-core#348 (comment))
how about sth like:
private struct RequestMetadata: CustomStringConvertible {
var requestId: Int
var service: String
var operation: String
var description: String {
"aws-request-id=\(requestId) aws-service=\(service) aws-operation=\(operation)"
}
}
private enum RequestKey: BaggageContextKey {
typealias Value = RequestMetadata
var name: String { "AWSRequest" }
}
then updating the baggage like:
context.baggage[RequestKey.self] = RequestMetadata(requestId: Self.globalRequestID.add(1),
service: serviceConfig.service,
operation: "signURL")
instead of updating the logger directly (current):
context.logger = context.logger.attachingRequestId(Self.globalRequestID.add(1), operation: "signURL", service: serviceConfig.service)
45e6adf
to
577f205
Compare
577f205
to
1a98075
Compare
Very much WIP.