[azservicebus] Enable distributed tracing #23860

karenychen · 2024-12-11T01:21:34Z

- The purpose of this PR is explained in this or a referenced issue.
- The PR does not update generated files.
  - These files are managed by the codegen framework at Azure/autorest.go.
- Tests are included and/or updated for code changes.
- Updates to module CHANGELOG.md are included.
- MIT license headers are included in each file.

github-actions · 2024-12-11T01:21:50Z

Thank you for your contribution @karenychen! We will review the pull request and get back to you soon.

sdk/messaging/azservicebus/internal/tracing/fake_tracing.go

sdk/messaging/azservicebus/internal/constants.go

karenychen · 2024-12-11T19:26:22Z

Hi @lmolkova! I have a small question about diagnostic-id, as described in https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-end-to-end-tracing?tabs=net-standard-sdk-2

When users set the diagnostic-id, the .NET SDK seems to hook it up to a ReceiveMessages trace, linking the list of diagnostic-ids from the messages to the Receive trace (code here). I might be misunderstanding how .NET does it, but I am wondering what the diagnostic-ids enable for us?

richardpark-msft

Looks great so far, got some questions for the other experts in the crowd, but from an SB perspective it looks great.

sdk/messaging/azservicebus/sender_unit_test.go

sdk/messaging/azservicebus/client_test.go

sdk/messaging/azservicebus/client.go

richardpark-msft · 2024-12-11T19:32:36Z

sdk/messaging/azservicebus/tracing.go

+}
+
+func getSpanAttributesForMessage(message *Message) []tracing.Attribute {
+	attrs := []tracing.Attribute{}


Suggested change

-	attrs := []tracing.Attribute{}
+	var attrs []tracing.Attribute

I swear there's some linter that complains if you pre-init the slice and it's not technically needed anyways.
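
For reference, a minimal sketch of why the pre-initialization is not needed: a nil slice works with `len`, `range`, and `append`. The `Attribute` type and the attribute key below are stand-ins for illustration, not the PR's actual definitions.

```go
package main

import "fmt"

// Attribute is a hypothetical stand-in for tracing.Attribute.
type Attribute struct {
	Key   string
	Value any
}

// collectAttrs appends to a nil slice; append allocates on first use.
func collectAttrs(messageID string) []Attribute {
	var attrs []Attribute // nil, but safe to append to
	if messageID != "" {
		// The key name is illustrative only.
		attrs = append(attrs, Attribute{Key: "messaging.message.id", Value: messageID})
	}
	return attrs
}

func main() {
	empty := collectAttrs("")
	fmt.Println(len(empty), empty == nil)
	fmt.Println(len(collectAttrs("abc")))
}
```

One visible difference remains: `encoding/json` marshals a nil slice as `null` and an empty one as `[]`, which only matters if the slice is serialized.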

richardpark-msft · 2024-12-11T19:33:59Z

sdk/messaging/azservicebus/sender.go

+	)
+	defer func() { endSpan(err) }()
+
+	err = s.links.Retry(ctx, EventSender, "SendMessageBatch", func(ctx context.Context, lwid *internal.LinksWithID, args *utils.RetryFnArgs) error {
 		return lwid.Sender.Send(ctx, batch.toAMQPMessage(), nil)


@lmolkova, in cases like this where I have retries, should the span reporting be in the retry loop, or outside, like we have here?

The way it works in our HTTP SDKs is that the method span is above the retry layer, and its child HTTP span is below the retry layer. So you'd have spans like this.

Some method call span
    HTTP span retry 1
    HTTP span retry 2

I presume we'd want to do the same thing here.

@jhendrixMSFT I did a bit of digging -- the semantic conventions for HTTP have examples of how the spans should look for retries: https://opentelemetry.io/docs/specs/semconv/http/http-spans/#http-client-authorization-retry-examples. However, for messaging systems there is not much information besides this chunk:

I am not sure if this implies the messaging spans are 1 per operation?

Pinging @lmolkova, the authority for this kind of thing :)

Although I'd say that we'd do it the same way as what @jhendrixMSFT outlined above - start a span when the whole retry() function starts, and then a new sub-span within each retry operation.
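
The layout discussed in this thread (one span above the retry layer, one child span per attempt) could be sketched roughly as follows. The `recorder` type is a toy that only records names and parents; it is an assumption for illustration and not the azcore tracing API.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient is a made-up error used to force one retry.
var errTransient = errors.New("transient failure")

// span is a toy record of a started span; it is not the azcore tracing type.
type span struct {
	name   string
	parent string
}

// recorder collects spans in start order so the hierarchy can be inspected.
type recorder struct{ spans []span }

func (r *recorder) start(name, parent string) {
	r.spans = append(r.spans, span{name: name, parent: parent})
}

// sendWithRetry starts one operation span above the retry loop and one
// child span per attempt inside it.
func sendWithRetry(r *recorder, maxAttempts int, attempt func(n int) error) error {
	const op = "SendMessageBatch"
	r.start(op, "")

	var err error
	for n := 1; n <= maxAttempts; n++ {
		r.start(fmt.Sprintf("attempt %d", n), op)
		if err = attempt(n); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	r := &recorder{}
	// Hypothetical operation that fails once, then succeeds.
	err := sendWithRetry(r, 3, func(n int) error {
		if n < 2 {
			return errTransient
		}
		return nil
	})
	fmt.Println("err:", err)
	for _, s := range r.spans {
		fmt.Printf("%q parent=%q\n", s.name, s.parent)
	}
}
```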

richardpark-msft · 2024-12-11T19:36:12Z

sdk/messaging/azservicebus/sender.go

+	ctx, endSpan := s.startSpan(ctx, "ScheduleAMQPAnnotatedMessages", tracing.ScheduleOperationName,
+		tracing.Attribute{Key: tracing.BatchMessageCount, Value: int64(len(messages))},
+	)
+	defer func() { endSpan(err) }()


@jhendrixMSFT, would it be worth building this pattern (via a callback, probably) into the tracing library? It can be internal, but it seems like everyone's going to do the "last error gets passed to endSpan before block ends" pattern.

If not, @karenychen, we can build a helper function - maybe we'd stick it right in the retry function to make things easier since we're passing very similar information to both.

Maybe it goes in the sdk/internal module?

Synced with Richard offline -- we are moving this to the Retry() layer :)
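
A rough sketch of the callback idea raised in this thread, assuming a hypothetical `withSpan` helper. The `startSpanFn` and `endSpanFn` types mirror the `endSpan(err)` shape in the diff, but the real signatures in the PR may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errSend is a made-up error for the failing case.
var errSend = errors.New("send failed")

// endSpanFn mirrors the endSpan(err) shape used in the diff.
type endSpanFn func(err error)

// startSpanFn is a hypothetical hook that starts a span and returns its end function.
type startSpanFn func(ctx context.Context, name string) (context.Context, endSpanFn)

// withSpan runs fn inside a span and passes fn's error to endSpan,
// so callers cannot forget the deferred call.
func withSpan(ctx context.Context, start startSpanFn, name string, fn func(ctx context.Context) error) (err error) {
	ctx, endSpan := start(ctx, name)
	defer func() { endSpan(err) }()
	return fn(ctx)
}

// runDemo records what each span was ended with.
func runDemo() []string {
	var ended []string
	start := func(ctx context.Context, name string) (context.Context, endSpanFn) {
		return ctx, func(err error) { ended = append(ended, fmt.Sprintf("%s: %v", name, err)) }
	}
	ctx := context.Background()
	_ = withSpan(ctx, start, "ok", func(context.Context) error { return nil })
	_ = withSpan(ctx, start, "failed", func(context.Context) error { return errSend })
	return ended
}

func main() {
	fmt.Println(runDemo())
}
```

The named result `err` is what lets the deferred closure observe the value returned by `fn`.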

sdk/messaging/azservicebus/sender.go


karenychen · 2024-12-12T18:57:04Z

sdk/messaging/azservicebus/liveTestHelpers_test.go

@@ -186,10 +187,11 @@ func deleteSubscription(t *testing.T, ac *admin.Client, topicName string, subscr
 // and fails tests otherwise.
 func peekSingleMessageForTest(t *testing.T, receiver *Receiver) *ReceivedMessage {
 	var msg *ReceivedMessage
+	// TODO


General question: is it possible for me to test the traces outside of the local unit tests too? Are there instructions on how I can run the live tests (and potentially the stress tests)?

You can run the same suite of live tests. I need to update the sample.env to list them properly, but for now you can see the ones we use here: https://github.com/karenychen/azure-sdk-for-go/blob/bd50b2a1e4c72b1d22ba11314d315d939796c201/sdk/messaging/azservicebus/internal/test/test_helpers.go#L81

Just create a .env file and place it in the azservicebus folder, and run go test.

Also, you can add /azp run go - azservicebus as a comment on your PR to run it as part of your CI.


richardpark-msft · 2025-01-14T01:28:21Z

sdk/messaging/azservicebus/internal/amqpLinks.go

+	Tracer() tracing.Tracer
+
+	// SetTracer sets the tracer for the AMQPLinks instance.
+	SetTracer(tracing.Tracer)


Does AMQPLinks need to own a Tracer object, or should it just be passed in as an argument to each function call?

For this I am open to your preference :) Currently the tracer starts a span at the Retry() function level. So we can either have it in the amqpLink layer, or keep it in the Sender/Receiver layer and pass it as an argument to each function call all the way down to the Retry() function level. Which option do you think is more appropriate here?

> passing it as an argument to each function call all the way down to the Retry() function level

I'd prefer this, just to eliminate any potential race conditions with state.

I know the argument list is getting pretty gnarly with Retry(), and we can work on that (separately).

Let me know if it gets too gnarly though.

Updated to move the tracers 1 level up. Now they live in Sender, Receiver and Namespace

richardpark-msft · 2025-01-14T01:30:39Z

sdk/messaging/azservicebus/internal/namespace.go

@@ -433,7 +434,7 @@ func (ns *Namespace) startNegotiateClaimRenewer(ctx context.Context,
 				return
 			case <-time.After(nextClaimAt):
 				for {
-					err := utils.Retry(refreshCtx, exported.EventAuth, "NegotiateClaimRefresh", func(ctx context.Context, args *utils.RetryFnArgs) error {
+					err := utils.Retry(refreshCtx, tracing.NewNoOpTracer(), exported.EventAuth, "NegotiateClaimRefresh", func(ctx context.Context, args *utils.RetryFnArgs) error {


Is the NoOpTracer here temporary or is there a reason we shouldn't trace this?

Claim negotiation is not covered by the OTel conventions for Service Bus (https://opentelemetry.io/docs/specs/semconv/messaging/azure-messaging/), so I left it on a no-op tracer.

Definitely open to adding it if we want to support it though

Hm. I think we could consider it a request, just not user initiated. We can figure it out later, but maybe we could file a separate issue so we pick it up later.

It's definitely useful info.

Ah! Just noticed we can use tracing.SpanKind to denote that it is an internal span.

https://github.com/Azure/azure-sdk-for-go/blob/af2aacb0bf5b03231cba3fdc08e330469f297cd4/sdk/azcore/tracing/constants.go#L7C1-L27C2

Updated the code to include this everywhere: Sender spans will have Producer span kind, and Receiver spans will have Consumer span kind. The NegotiateClaim function has the Internal span kind.
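
As an illustration of the outcome described above, here is a sketch in which the tracer is passed into the retry function as an argument and each caller picks a span kind. `SpanKind`, `Tracer`, `noOpTracer`, and `retry` are local stand-ins defined only for this example; the real types live in azcore/tracing and the PR's internal packages and may look different.

```go
package main

import "fmt"

// SpanKind is a local stand-in for the span kind constants in azcore/tracing.
type SpanKind int

const (
	SpanKindInternal SpanKind = iota + 1
	SpanKindProducer
	SpanKindConsumer
)

// Tracer is a hypothetical minimal tracer; Start returns a function that ends the span.
type Tracer interface {
	Start(name string, kind SpanKind) (end func(err error))
}

// noOpTracer satisfies Tracer but records nothing.
type noOpTracer struct{}

func (noOpTracer) Start(string, SpanKind) func(error) { return func(error) {} }

// recordingTracer remembers the kind of every span it starts.
type recordingTracer struct{ kinds map[string]SpanKind }

func (t *recordingTracer) Start(name string, kind SpanKind) func(error) {
	t.kinds[name] = kind
	return func(error) {}
}

// retry receives the tracer as an argument rather than reading it from shared state.
func retry(tracer Tracer, name string, kind SpanKind, fn func() error) (err error) {
	end := tracer.Start(name, kind)
	defer func() { end(err) }()
	return fn()
}

// runKinds starts one span per operation and returns the recorded kinds.
func runKinds() map[string]SpanKind {
	t := &recordingTracer{kinds: map[string]SpanKind{}}
	ok := func() error { return nil }
	_ = retry(t, "SendMessageBatch", SpanKindProducer, ok)
	_ = retry(t, "ReceiveMessages", SpanKindConsumer, ok)
	_ = retry(t, "NegotiateClaimRefresh", SpanKindInternal, ok)
	_ = retry(noOpTracer{}, "untraced", SpanKindInternal, ok) // nothing is recorded
	return t.kinds
}

func main() {
	fmt.Println(runKinds())
}
```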

richardpark-msft · 2025-01-14T01:48:50Z

sdk/messaging/azservicebus/internal/tracing/tracing.go

+
+type SetAttributesFn func([]Attribute) []Attribute
+
+func NewSpanConfig(name SpanName, options ...SetAttributesFn) *SpanConfig {


I think this indirection (for SetAttributesFn) is a bit odd - it looks like attributes ...Attribute would cover what we use it for, and helps a bit in showing that's the only thing we use it for.

Switched this one up a bit to directly return a list of attributes, let me know what you think!
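
For comparison, the variadic form suggested in the review might look like the sketch below. This is not the PR's final code: the `SpanConfig` fields, the `Attribute` type, and the attribute key are assumptions made for the example.

```go
package main

import "fmt"

// Attribute is a hypothetical stand-in for tracing.Attribute.
type Attribute struct {
	Key   string
	Value any
}

// SpanName mirrors the name type seen in the diff.
type SpanName string

// SpanConfig holds what a span needs at start; the fields are assumed.
type SpanConfig struct {
	Name       SpanName
	Attributes []Attribute
}

// NewSpanConfig takes attributes directly instead of SetAttributesFn callbacks.
func NewSpanConfig(name SpanName, attributes ...Attribute) *SpanConfig {
	// Copy so that a caller passing attrs... does not share its backing array.
	return &SpanConfig{Name: name, Attributes: append([]Attribute(nil), attributes...)}
}

func main() {
	cfg := NewSpanConfig("Send",
		Attribute{Key: "messaging.batch.message_count", Value: int64(2)}, // illustrative key
	)
	fmt.Println(cfg.Name, len(cfg.Attributes))
}
```

The signature itself now shows that attributes are the only thing being configured, which was the point of the review comment.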

richardpark-msft · 2025-01-14T01:52:44Z

sdk/messaging/azservicebus/receiver.go

@@ -361,6 +369,12 @@ func (r *Receiver) DeadLetterMessage(ctx context.Context, message *ReceivedMessa
 }

 func (r *Receiver) receiveMessagesImpl(ctx context.Context, maxMessages int, options *ReceiveMessagesOptions) ([]*ReceivedMessage, error) {
+	var err error


If you move this span code into ReceiveMessages it'll be less tricky since you'll only have a single spot where you need to log the error, and we don't have to be wary of assigning err in all the code here.
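
A small sketch of the hazard behind this comment, using made-up function names: when the span lives next to the implementation, a `:=` inside a block shadows `err` and the deferred `endSpan(err)` sees nil. Putting the span in the outer method with a named result leaves one place where the error is reported.

```go
package main

import (
	"errors"
	"fmt"
)

var errReceive = errors.New("receive failed")

// receiveImpl stands in for receiveMessagesImpl; it always fails here.
func receiveImpl() (int, error) { return 0, errReceive }

// spanInImpl keeps the span next to the implementation details. The := in the
// if statement shadows err, so the deferred closure still sees nil.
func spanInImpl(endSpan func(error)) error {
	var err error
	defer func() { endSpan(err) }()

	if _, err := receiveImpl(); err != nil {
		return err
	}
	return nil
}

// spanInCaller puts the span in the outer method. The named result is set by
// the return statement before the deferred call runs.
func spanInCaller(endSpan func(error)) (err error) {
	defer func() { endSpan(err) }()
	_, err = receiveImpl()
	return err
}

// reported returns the error that f passed to endSpan.
func reported(f func(endSpan func(error)) error) error {
	var got error
	_ = f(func(err error) { got = err })
	return got
}

func main() {
	fmt.Println("span in impl reported:", reported(spanInImpl))
	fmt.Println("span in caller reported:", reported(spanInCaller))
}
```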

richardpark-msft

It's coming along really well. I left some comments, but I'm really liking it.


karenychen added 2 commits December 10, 2024 17:19

add internal tracing wrapper and fake tracer for UT

feffdf8

set up tracer in SB client and traces in sender methods

2f4a2b2

github-actions bot added the Community Contribution, customer-reported, and Service Bus labels Dec 11, 2024

karenychen commented Dec 11, 2024


karenychen added 2 commits December 10, 2024 17:57

add more unit tests

7c2e4ce

linting

23d7e94

richardpark-msft requested review from richardpark-msft, jhendrixMSFT and lmolkova December 11, 2024 19:19

jhendrixMSFT reviewed Dec 11, 2024


richardpark-msft reviewed Dec 11, 2024


karenychen added 6 commits December 11, 2024 15:40

move matcher to sdk/internal folder and add callback function for starting a span

1be63f7

address comments and moved startspan snippet to retrier layer

b026074

reverting some files

3d07a0a

added receiver traces and some UT

1bb68ed

add session traces

e653088

linting

992d9b9

karenychen commented Dec 12, 2024


karenychen added 3 commits December 12, 2024 12:24

reverting some files

af668bf

Merge remote-tracking branch 'origin/main' into karenchen/azservicebus-tracing

eaf0389

linting

bd50b2a

richardpark-msft reviewed Jan 14, 2025


karenychen added 3 commits January 16, 2025 17:58

move tracer to sender/receiver level, refractor SetAttrFn patter, add unit test

ab81102

include span kind in our spans and tests + more unit test

09f971e

add internal tracing for NegotiateClaim

14c6576

karenychen requested a review from richardpark-msft January 17, 2025 04:29
