
[BUG] Batch.TryAdd returns true on premium with large message support but sending the batch throws an InvalidOperationException #44261

Closed
danielmarbach opened this issue May 27, 2024 · 19 comments

@danielmarbach
Contributor

Library name and version

Azure.Messaging.ServiceBus

Describe the bug

The batching capability works and takes the tier's maximum message size restriction into account as long as you never send more than the tier's standard size. On the premium tier the default maximum message size is 1024 KB. When you enable large message support, batching keeps working as long as you stay within 1024 KB, even though the documentation hints that "batching is not supported". Once you add more than 1024 KB of data, the batch accepts it because the MaximumMessageSize reported by the queue is higher, but sending the batch then fails with an exception message that doesn't really tell you what's going on:

Unhandled exception. System.InvalidOperationException: The link 'G11:2654476:amqps://mypremiumnamespace.servicebus.windows.net/-4f31a45f;0:5:6' is force detached by the broker because publisher(link155) received a batch message with no data in it. Detach origin: Publisher.
For troubleshooting information, see https://aka.ms/azsdk/net/servicebus/exceptions/troubleshoot.
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.SendBatchInternalAsync(AmqpMessage batchMessage, TimeSpan timeout, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.SendBatchInternalAsync(IReadOnlyCollection`1 messages, TimeSpan timeout, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.<>c.<<SendBatchAsync>b__21_0>d.MoveNext()
--- End of stack trace from previous location ---
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.<>c__22`1.<<RunOperation>b__22_0>d.MoveNext()
--- End of stack trace from previous location ---
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1,TResult](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken, Boolean logTimeoutRetriesAsVerbose)
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.SendBatchAsync(ServiceBusMessageBatch messageBatch, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.ServiceBusSender.SendMessagesAsync(ServiceBusMessageBatch messageBatch, CancellationToken cancellationToken)
   at Batching.Program.Main(String[] args) in /Users/danielmarbach/Projects/AzureServiceBus.DeepDive/Batching/Program.cs:line 54
   at Batching.Program.Main(String[] args) in /Users/danielmarbach/Projects/AzureServiceBus.DeepDive/Batching/Program.cs:line 35
   at Batching.Program.<Main>(String[] args)

Expected behavior

Unfortunately, as of today I'm not aware of any good way to support large message sizes. The whole idea of TryAdd on the batch was to take the tier into account, but it only works as long as you stay within the tier's default parameters and falls apart once you don't.

Without good support from the SDK, we would have to go back to reimplementing some sort of manual message size measurement (which we moved away from because we now have the batch). Only by guesstimating message sizes manually (we don't have access to the underlying AMQP message size), somehow figuring out the tier (I don't recall whether that is even possible), and checking the maximum message size on the destination entity could we figure out whether a message can be sent in a batch or has to be sent on its own.

The client behavior is far from ideal: it works until it doesn't, and then the exception gives no good indication of what's actually going on.

Actual behavior

TryAdd returns true even though the batch can never be sent once it crosses the 1024 KB default size limit, despite the queue accepting larger content.

I'm aware that from a service standpoint things work as described, because the documentation clearly says large message support doesn't support batching. Yet the client-to-service interaction kind of "pretends" to support batching when effectively it shouldn't.

Reproduction Steps

  1. Create a premium namespace
  2. Run https://github.com/danielmarbach/AzureServiceBus.DeepDive/blob/batching-repro/Batching/Program.cs
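
In essence, the linked program does something like this minimal sketch (connection string, queue name, and payload sizes are illustrative):

```csharp
using System;
using Azure.Messaging.ServiceBus;

await using var client = new ServiceBusClient("<premium-namespace-connection-string>");
await using ServiceBusSender sender = client.CreateSender("<large-message-queue>");

// The client sizes the batch from the MaximumMessageSize reported on the AMQP
// link, which is larger than 1024 KB when large message support is enabled.
using ServiceBusMessageBatch batch = await sender.CreateMessageBatchAsync();

// TryAddMessage keeps returning true well past the 1024 KB batch limit.
var payload = new byte[256 * 1024];
while (batch.SizeInBytes <= 2 * 1024 * 1024
       && batch.TryAddMessage(new ServiceBusMessage(payload)))
{
}

// Throws the InvalidOperationException shown above.
await sender.SendMessagesAsync(batch);
```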

Environment

No response

@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Bus labels May 27, 2024

Thank you for your feedback. Tagging and routing to the team member best able to assist.

@danielmarbach
Contributor Author

danielmarbach commented May 27, 2024

Would it be acceptable for CreateMessageBatchAsync to throw an exception when the MaximumMessageSize is larger than the tier's default size (I think the client knows that)? The client could cache that and always throw on attempts to create batches. With that in place it would at least be possible to discover that the sender is interacting with a queue that has large message size support, and the client code could fall back to always doing individual sends.

One benefit of this approach is that, should the service ever support batch sends in the future, the code path would no longer throw and no additional changes would be required from an SDK usage perspective (the custom exception type could be obsoleted and removed over multiple versions).
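
A hypothetical usage sketch of that proposal, assuming `sender` and `messages` from the surrounding code (the thrown exception type is a stand-in; no such behavior exists today):

```csharp
try
{
    // Under the proposal, this would throw for entities whose MaximumMessageSize
    // exceeds the tier default, i.e. when large message support is enabled.
    using ServiceBusMessageBatch batch = await sender.CreateMessageBatchAsync();
    foreach (ServiceBusMessage message in messages)
    {
        batch.TryAddMessage(message); // normal batching path
    }
    await sender.SendMessagesAsync(batch);
}
catch (InvalidOperationException) // stand-in for the proposed exception type
{
    // Fall back to individual sends, which large message support does allow.
    foreach (ServiceBusMessage message in messages)
    {
        await sender.SendMessageAsync(message);
    }
}
```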

@danielmarbach
Contributor Author

@JoshLove-msft I pushed an idea in PR #44272

@jsquire
Member

jsquire commented May 28, 2024

Hi @danielmarbach. Thanks for reaching out and we regret that you're experiencing difficulties.

This is related to the outstanding ask on the Service Bus service to provide metadata to the client indicating the maximum size allowed for a batch, the maximum number of messages allowed in a batch, and the maximum size of an individual message. Currently, the service provides only the maximum size of an individual message. Without that data, the client cannot make informed decisions about service limitations or surface them client-side to improve the developer experience.

In this particular scenario, it's a bit more nuanced than "you can never send batches with large message support enabled" - you can send batches, just not batches larger than the standard message size limit for the SKU. Because the service returns only the maximum size of a message allowed on the AMQP link, there's no way for the client to know what is permitted without hardcoding assumptions about "the regular maximum message size".

Hardcoding this information in the client isn't something that we are able to do, as it violates the Azure SDK guidelines and is a service implementation detail that may change.

Because the additional metadata requests were made via internal dependency asks, there is no existing public issue for tracking. As a result, I've opened the following in the Service Bus repository for visibility:

@jsquire
Member

jsquire commented May 28, 2024

//cc: @EldertGrootenboer

@jsquire jsquire added issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. and removed needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team labels May 28, 2024

Hi @danielmarbach. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

@danielmarbach
Contributor Author

@jsquire Thanks for the response. I totally understand the reasoning behind closing the PR. I remember we had other discussions around the maximum allowed number of messages in a batch, which would also require more metadata from the service. That's all fine and understandable.

From the perspective of the current documentation would you agree that

https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-premium-messaging#large-messages-support

especially

Batching isn't supported.

should then be more nuanced? I read this as "it is not supported", meaning creating a batch should also not be allowed, while effectively, as I pointed out and you confirmed, it is supported with "caveats" ;) Would you agree?

When it comes to the concrete problem at hand, do you have any good ideas for how the situation could be made better for consumers of the SDK? Just as the SDK lacks this knowledge, any higher-level framework, library, or middleware built on top of the SDK faces similar challenges, which makes it hard to implement a cohesive fallback to non-batched interactions. Unfortunately the annotated message also gives no indication of the potential message size, since that is only available at the lower levels. If the annotated message carried some message size information, or ServiceBus(Received)Message exposed it in one form or another, users of the SDK could decide to hardcode certain assumptions until the SDK supports things more transparently, without having to reimplement parts of the "sizing" logic already available in the SDK.

@jsquire
Member

jsquire commented May 29, 2024

@jsquire Thanks for the response. I totally understand the reasoning behind closing the PR. I remember we had other discussions around the maximum allowed number of messages in a batch, which would also require more metadata from the service. That's all fine and understandable.

From the perspective of the current documentation would you agree that

https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-premium-messaging#large-messages-support

especially

Batching isn't supported.

should then be more nuanced? I read this as "it is not supported", meaning creating a batch should also not be allowed, while effectively, as I pointed out and you confirmed, it is supported with "caveats" ;) Would you agree?

Agreed. The text of that article does not match the actual service behavior. I think additional context should be provided there.

@spelluru, @EldertGrootenboer: Please see Daniel's observation on the large messages article. Can we add some additional content calling out that you can send batches, but that those batches are limited to 1 MB in size?

@jsquire
Member

jsquire commented May 29, 2024

When it comes to the concrete problem at hand, do you have any good ideas for how the situation could be made better for consumers of the SDK? Just as the SDK lacks this knowledge, any higher-level framework, library, or middleware built on top of the SDK faces similar challenges, which makes it hard to implement a cohesive fallback to non-batched interactions. Unfortunately the annotated message also gives no indication of the potential message size, since that is only available at the lower levels. If the annotated message carried some message size information, or ServiceBus(Received)Message exposed it in one form or another, users of the SDK could decide to hardcode certain assumptions until the SDK supports things more transparently, without having to reimplement parts of the "sizing" logic already available in the SDK.

@danielmarbach:

Yeah, not gonna lie - this is an ugly scenario and not a great developer experience. There's an unfortunate disconnect in the service behavior between the maximum size of a message and the maximum size of a batch that was only introduced with the large message feature - which overlooked the discrepancy and doesn't provide the right context hints downstream.

The "right" fix for this is for the service to provide information on the link that details its limitation so that the client can set the batch size limit when CreateMessageBatchAsync is called. A more pedantic work-around would be for the service to simply use the existing mechanism to communicate the max batch size rather than max message size, which would fix the developer experience using the SDK because the client uses that as the batch size limit and does not attempt to validate/enforce size constraints on loose messages.
(@EldertGrootenboer - maybe this is an interim quick fix that can be considered?)

Outside of that we have two options that I see:

  • Callers pass CreateMessageBatchOptions with the max size set to 1024 KB when creating a batch for entities with large message support enabled (see the sketch after this list). This caps the batch at the current implicit size limit allowed by the service and prevents accidental overage. The downside is, of course, that the burden of passing the options, and the risk of hardcoding, falls to developers.

  • The client library intentionally deviates from the Azure SDK guidelines and hardcodes the 1024 KB limit when MaximumMessageSize exceeds it. This allows for safe batches, assuming the service behavior does not change, and removes the need for developers to be aware of the issue. The downside is that any change to the service behavior breaks the SDK (again): applications that worked yesterday may start to fail today with no changes to app code or dependencies. The additional challenge is that we would need a dispensation from each language architect to allow the deviation.
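
A minimal sketch of the first option, assuming the 1024 KB standard batch limit (the hardcoded constant is the caller's assumption, not something the SDK exposes):

```csharp
// Cap the batch ourselves, since the link's reported MaximumMessageSize
// overstates what a single batch may carry on a large-message-enabled entity.
var options = new CreateMessageBatchOptions
{
    MaxSizeInBytes = 1024 * 1024 // hardcoded assumption: 1024 KB batch limit
};

using ServiceBusMessageBatch batch = await sender.CreateMessageBatchAsync(options);

// TryAddMessage now returns false before the batch exceeds 1024 KB, so the
// caller can start a new batch or send the oversized message individually.
```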

Truthfully, I think neither of these is wonderful. I believe the first is the better approach, as the application retains full control to update/fix the behavior if a service change breaks it, and the root cause of such a failure is more discoverable because the limit is passed explicitly.

Thoughts?

@danielmarbach
Contributor Author

The downside is, of course, that the burden of passing the options, and the risk of hardcoding, falls to developers.

That would be OK as a temporary workaround until this is properly addressed, although I think the SDK team, together with the service team, is much closer to the "pulse of change" and could react faster to service changes.

For us, though, the problem is a little more complex. We are middleware / an abstraction on top of the SDK, not "specific code written under the assumption that a given service configuration is present in the environment where the code runs". As far as I understand, there is no way for us to know whether we are running against a premium namespace, nor whether it has large message support enabled, without using the admin client to query the queue properties, which requires manage rights. We need to be able to run in environments without manage rights.

So it seems even setting that explicit limit might be a challenge, or am I missing something?


github-actions bot commented Jun 6, 2024

Hi @danielmarbach, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.


@danielmarbach
Contributor Author

/unresolve

@github-actions github-actions bot reopened this Jun 29, 2024
@github-actions github-actions bot added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. labels Jun 29, 2024
@danielmarbach
Contributor Author

Waiting for feedback on

So it seems even setting that explicit limit might be a challenge, or am I missing something?

@jsquire
Member

jsquire commented Jun 30, 2024

So it seems even setting that explicit limit might be a challenge, or am I missing something?

Sorry, I missed this when it came in. I agree: this challenge exists both for the clients and for the developers/applications using them. There is no context available to the client when establishing a connection/link that allows discovery of the SKU or the configuration of the entity. As you mentioned, that is available only via the AdminClient, which queries an HTTP-based service and requires manage rights. The other clients can't access this information for the same reasons you've mentioned.

That leaves us with just the maximum message size as a hint. Your point that it is available to the AMQP-based clients but not to the applications using them is a salient one. For middleware scenarios where you don't own the Azure resource and cannot inspect it - yeah. There's no great option there.

I think that's a reasonable enough scenario to justify going with option #2 in the short term, where the client intentionally caps the batch size. It's a bad option, and I don't like it. But all we have at this point is bad options.

@jsquire jsquire added feature-request This issue requires a new behavior in the product in order to be resolved. and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jun 30, 2024
@jsquire jsquire modified the milestones: Backlog, 2024-08 Jun 30, 2024
@jsquire
Member

jsquire commented Jun 30, 2024

I'll take a look at adding this in for the August release and see if I can get approval for the deviation.

@danielmarbach
Contributor Author

Maybe a middle-ground solution is to expose the message size of the link as a property on the sender? It would require issuing an operation that ensures the link is created. With that information available on the API surface, it would be possible to make the decision you mentioned in

Callers pass CreateMessageBatchOptions with the max size set to 1024 KB when creating a batch for entities with large message support enabled.

on the SDK usage side, and you wouldn't be violating the SDK guidelines, since you would be exposing information that is already available at the lower levels anyway. I would prefer something built into the SDK because of

I think the SDK team, together with the service team, is much closer to the "pulse of change" and could react faster to service changes.

but this would at least give us a way to move forward. Furthermore, you might be able to deprecate that property later to guide users away from it once the service has better adaptive behavior built in.
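
A hypothetical shape for that idea, purely illustrative (no such member exists on ServiceBusSender today):

```csharp
// Hypothetical API: expose the link's negotiated maximum message size.
// Reading it would force the sender's AMQP link to be created first.
long linkMaxMessageSize = await sender.GetMaximumMessageSizeAsync(); // hypothetical

var options = new CreateMessageBatchOptions
{
    // If the entity reports more than the standard 1024 KB, large message
    // support is presumably enabled, so cap the batch at 1024 KB.
    MaxSizeInBytes = Math.Min(linkMaxMessageSize, 1024 * 1024)
};

using ServiceBusMessageBatch batch = await sender.CreateMessageBatchAsync(options);
```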

@jsquire
Member

jsquire commented Jul 2, 2024

Good thought, but I definitely think it'll be easier for us to sneak in the behavior, with a plan for removing it when the service feature lands, than it would be to get approval for a new public API member. If the service had an AMQP operation to expose runtime properties, as Event Hubs does, we could hide it there - but a bare "get me the max message size" hanging off the client would be odd.

@jsquire
Member

jsquire commented Jul 11, 2024

Fixed by #44917, which will be included in our July release next week.

(I also snuck the 4,500-item constraint into the checks while we're waiting on the service metadata to be available.)
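
With that change, the failure moves from send time to add time. A rough sketch of the expected behavior on a large-message-enabled entity (sizes illustrative):

```csharp
// After the fix, the client caps the default batch size at the standard 1 MB
// limit (and 4,500 messages) even when the link reports a larger
// MaximumMessageSize for the entity.
using ServiceBusMessageBatch batch = await sender.CreateMessageBatchAsync();

// A 2 MB message can still be sent individually, but it no longer fits a batch:
bool added = batch.TryAddMessage(new ServiceBusMessage(new byte[2 * 1024 * 1024]));
// added == false, instead of a confusing InvalidOperationException at send time
```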
