
[BUG] Batch.TryAdd returns true on premium with large message support but sending the batch throws an InvalidOperationException #44261

Closed
danielmarbach opened this issue May 27, 2024 · 19 comments

@danielmarbach
Contributor

Library name and version

Azure.Messaging.ServiceBus

Describe the bug

The batching capability works and takes the tier's maximum message size restriction into account as long as you never send more than the tier's standard size. On the premium tier the default maximum message size is 1024 KB. When you enable large message support, batching keeps working as long as you stay within 1024 KB, even though the documentation hints that "batching is not supported". Once you add more than 1024 KB of data, the batch accepts it because the MaximumMessageSize reported by the queue is higher, but sending the batch then fails with an exception message that doesn't really tell you what's going on:

Unhandled exception. System.InvalidOperationException: The link 'G11:2654476:amqps://mypremiumnamespace.servicebus.windows.net/-4f31a45f;0:5:6' is force detached by the broker because publisher(link155) received a batch message with no data in it. Detach origin: Publisher.
For troubleshooting information, see https://aka.ms/azsdk/net/servicebus/exceptions/troubleshoot.
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.SendBatchInternalAsync(AmqpMessage batchMessage, TimeSpan timeout, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.SendBatchInternalAsync(IReadOnlyCollection`1 messages, TimeSpan timeout, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.<>c.<<SendBatchAsync>b__21_0>d.MoveNext()
--- End of stack trace from previous location ---
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.<>c__22`1.<<RunOperation>b__22_0>d.MoveNext()
--- End of stack trace from previous location ---
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1,TResult](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken, Boolean logTimeoutRetriesAsVerbose)
   at Azure.Messaging.ServiceBus.ServiceBusRetryPolicy.RunOperation[T1](Func`4 operation, T1 t1, TransportConnectionScope scope, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Amqp.AmqpSender.SendBatchAsync(ServiceBusMessageBatch messageBatch, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.ServiceBusSender.SendMessagesAsync(ServiceBusMessageBatch messageBatch, CancellationToken cancellationToken)
   at Batching.Program.Main(String[] args) in /Users/danielmarbach/Projects/AzureServiceBus.DeepDive/Batching/Program.cs:line 54
   at Batching.Program.Main(String[] args) in /Users/danielmarbach/Projects/AzureServiceBus.DeepDive/Batching/Program.cs:line 35
   at Batching.Program.<Main>(String[] args)

Expected behavior

Unfortunately, as of today I'm not aware of any good way to support large message sizes. The whole idea of TryAdd on the batch was to take the tier into account, but it only works as long as you stay within the tier's default parameters and falls apart once you don't.

Without good support from the SDK, we would have to go back to reimplementing some sort of manual message size measurement (which we moved away from because we now have the batch). Only by guesstimating message sizes manually (we don't have access to the underlying AMQP message size), somehow figuring out the tier (I don't recall whether that is even possible), and checking the maximum message size on the destination entity could we figure out whether a message can be sent in a batch or has to be sent on its own.

The client behavior is far from ideal: it works until it doesn't, and then the exception gives no good indication of what's actually going on.

Actual behavior

TryAdd returns true even though the batch can never be sent once it crosses the 1024 KB default size limit, despite the queue accepting larger content.

I'm aware that from a service standpoint things work as described, because the documentation clearly says large message support doesn't support batching. Yet the client-to-service interaction kind of "pretends" to support batching when effectively it shouldn't.

Reproduction Steps

  1. Create a premium namespace
  2. Run https://github.com/danielmarbach/AzureServiceBus.DeepDive/blob/batching-repro/Batching/Program.cs
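
In essence, the linked program does something like this minimal sketch (connection string, queue name, and payload sizes are illustrative):

```csharp
using System;
using Azure.Messaging.ServiceBus;

await using var client = new ServiceBusClient("<premium-namespace-connection-string>");
await using ServiceBusSender sender = client.CreateSender("<large-message-queue>");

// The client sizes the batch from the MaximumMessageSize reported on the AMQP
// link, which is larger than 1024 KB when large message support is enabled.
using ServiceBusMessageBatch batch = await sender.CreateMessageBatchAsync();

// TryAddMessage keeps returning true well past the 1024 KB batch limit.
var payload = new byte[256 * 1024];
while (batch.SizeInBytes <= 2 * 1024 * 1024
       && batch.TryAddMessage(new ServiceBusMessage(payload)))
{
}

// Throws the InvalidOperationException shown above.
await sender.SendMessagesAsync(batch);
```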

Environment

No response

@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Bus labels May 27, 2024

Thank you for your feedback. Tagging and routing to the team member best able to assist.

@danielmarbach
Contributor Author

danielmarbach commented May 27, 2024

Would it be acceptable for CreateMessageBatchAsync to throw an exception when the MaximumMessageSize is larger than the tier's default size (I think the client knows that)? The client could cache that and always throw on attempts to create batches. With that in place it would at least be possible to discover that the sender is interacting with a queue that has large message size support, and the client code could fall back to always doing individual sends.

One benefit of this approach is that, should the service ever support batch sends in the future, the code path would no longer throw and no additional changes would be required from an SDK usage perspective (the custom exception type could be obsoleted and removed over multiple versions).
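
A hypothetical usage sketch of that proposal, assuming `sender` and `messages` from the surrounding code (the thrown exception type is a stand-in; no such behavior exists today):

```csharp
try
{
    // Under the proposal, this would throw for entities whose MaximumMessageSize
    // exceeds the tier default, i.e. when large message support is enabled.
    using ServiceBusMessageBatch batch = await sender.CreateMessageBatchAsync();
    foreach (ServiceBusMessage message in messages)
    {
        batch.TryAddMessage(message); // normal batching path
    }
    await sender.SendMessagesAsync(batch);
}
catch (InvalidOperationException) // stand-in for the proposed exception type
{
    // Fall back to individual sends, which large message support does allow.
    foreach (ServiceBusMessage message in messages)
    {
        await sender.SendMessageAsync(message);
    }
}
```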

@danielmarbach
Contributor Author

@JoshLove-msft I pushed an idea in PR #44272

@jsquire
Member

jsquire commented May 28, 2024

Hi @danielmarbach. Thanks for reaching out and we regret that you're experiencing difficulties.

This is related to the outstanding ask on the Service Bus service to provide metadata to the client indicating the maximum size allowed for a batch, the maximum number of messages allowed in a batch, and the maximum size of an individual message. Currently, the service provides only the maximum size of an individual message. Without that data, the client cannot make informed decisions about service limitations or surface them client-side to improve the developer experience.

In this particular scenario, it's a bit more nuanced than "you can never send batches with large message support enabled" - you can send batches, just not batches larger than the standard message size limit for the SKU. Because the service returns only the maximum size of a message allowed on the AMQP link, there's no way for the client to know what is permitted without hardcoding assumptions about "the regular maximum message size".

Hardcoding this information in the client isn't something that we are able to do, as it violates the Azure SDK guidelines and is a service implementation detail that may change.

Because the additional metadata requests were made via internal dependency asks, there is no existing public issue for tracking. As a result, I've opened the following in the Service Bus repository for visibility:

@jsquire
Member

jsquire commented May 28, 2024

//cc: @EldertGrootenboer

@jsquire jsquire added issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. and removed needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team labels May 28, 2024

Hi @danielmarbach. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

@danielmarbach
Contributor Author

@jsquire Thanks for the response. I totally understand the reasoning behind closing the PR. I remember we had other discussions around the maximum allowed number of messages in a batch, which would also require more metadata from the service. That's all fine and understandable.

From the perspective of the current documentation would you agree that

https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-premium-messaging#large-messages-support

especially

Batching isn't supported.

should then be more nuanced? I read this as "it is not supported", meaning creating a batch should also not be allowed, while effectively, as I pointed out and you confirmed, it is supported with "caveats" ;) Would you agree?

When it comes to the concrete problem at hand, do you have any good ideas for how the situation could be made better for consumers of the SDK? Just as the SDK lacks this knowledge, any higher-level framework, library, or middleware built on top of the SDK faces similar challenges, which makes it hard to implement a cohesive fallback to non-batched interactions. Unfortunately the annotated message also gives no indication of the potential message size, since that is only available at the lower levels. If the annotated message carried some message size information, or ServiceBus(Received)Message exposed it in one form or another, users of the SDK could decide to hardcode certain assumptions until the SDK supports things more transparently, without having to reimplement parts of the "sizing" logic already available in the SDK.

@jsquire
Member

jsquire commented May 29, 2024

@jsquire Thanks for the response. I totally understand the reasoning behind closing the PR. I remember we had other discussions around the maximum allowed number of messages in a batch, which would also require more metadata from the service. That's all fine and understandable.

From the perspective of the current documentation would you agree that

https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-premium-messaging#large-messages-support

especially

Batching isn't supported.

should then be more nuanced? I read this as "it is not supported", meaning creating a batch should also not be allowed, while effectively, as I pointed out and you confirmed, it is supported with "caveats" ;) Would you agree?

Agreed. The text of that article does not match the actual service behavior. I think additional context should be provided there.

@spelluru, @EldertGrootenboer: Please see Daniel's observation on the large messages article. Can we add some additional content calling out that you can send batches, but that those batches are limited to 1 MB in size?

@jsquire
Member

jsquire commented May 29, 2024

When it comes to the concrete problem at hand, do you have any good ideas for how the situation could be made better for consumers of the SDK? Just as the SDK lacks this knowledge, any higher-level framework, library, or middleware built on top of the SDK faces similar challenges, which makes it hard to implement a cohesive fallback to non-batched interactions. Unfortunately the annotated message also gives no indication of the potential message size, since that is only available at the lower levels. If the annotated message carried some message size information, or ServiceBus(Received)Message exposed it in one form or another, users of the SDK could decide to hardcode certain assumptions until the SDK supports things more transparently, without having to reimplement parts of the "sizing" logic already available in the SDK.

@danielmarbach:

Yeah, not gonna lie - this is an ugly scenario and not a great developer experience. There's an unfortunate disconnect in the service behavior between the maximum size of a message and the maximum size of a batch that was only introduced with the large message feature - which overlooked the discrepancy and doesn't provide the right context hints downstream.

The "right" fix for this is for the service to provide information on the link that details its limitation so that the client can set the batch size limit when CreateMessageBatchAsync is called. A more pedantic work-around would be for the service to simply use the existing mechanism to communicate the max batch size rather than max message size, which would fix the developer experience using the SDK because the client uses that as the batch size limit and does not attempt to validate/enforce size constraints on loose messages.
(@EldertGrootenboer - maybe this is an interim quick fix that can be considered?)

Outside of that we have two options that I see:

  • Callers pass CreateMessageBatchOptions with the max size set to 1024 KB when creating a batch for entities with large message support enabled (see the sketch after this list). This caps the batch at the current implicit size limit allowed by the service and prevents accidental overage. The downside is, of course, that the burden of passing the options, and the risk of hardcoding, falls to developers.

  • The client library intentionally deviates from the Azure SDK guidelines and hardcodes the 1024 KB limit when MaximumMessageSize exceeds it. This allows for safe batches, assuming the service behavior does not change, and removes the need for developers to be aware of the issue. The downside is that any change to the service behavior breaks the SDK (again): applications that worked yesterday may start to fail today with no changes to app code or dependencies. The additional challenge is that we would need a dispensation from each language architect to allow the deviation.
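
A minimal sketch of the first option, assuming the 1024 KB standard batch limit (the hardcoded constant is the caller's assumption, not something the SDK exposes):

```csharp
// Cap the batch ourselves, since the link's reported MaximumMessageSize
// overstates what a single batch may carry on a large-message-enabled entity.
var options = new CreateMessageBatchOptions
{
    MaxSizeInBytes = 1024 * 1024 // hardcoded assumption: 1024 KB batch limit
};

using ServiceBusMessageBatch batch = await sender.CreateMessageBatchAsync(options);

// TryAddMessage now returns false before the batch exceeds 1024 KB, so the
// caller can start a new batch or send the oversized message individually.
```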

Truthfully, I think neither of these is wonderful. I believe the first is the better approach, as the application retains full control to update/fix the behavior if a service change breaks it, and the root cause of such a failure is more discoverable because the limit is passed explicitly.

Thoughts?

@danielmarbach
Contributor Author

The downside is, of course, that the burden of passing the options, and the risk of hardcoding, falls to developers.

That would be OK as a temporary workaround until this is properly addressed, although I think the SDK team, together with the service team, is much closer to the "pulse of change" and could react faster to service changes.

For us, though, the problem is a little more complex. We are middleware / an abstraction on top of the SDK, not "specific code written under the assumption that a given service configuration is present in the environment where the code runs". As far as I understand, there is no way for us to know whether we are running against a premium namespace, nor whether it has large message support enabled, without using the admin client to query the queue properties, which requires manage rights. We need to be able to run in environments without manage rights.

So it seems even setting that explicit limit might be a challenge, or am I missing something?


github-actions bot commented Jun 6, 2024

Hi @danielmarbach, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.


@danielmarbach
Contributor Author

/unresolve

@github-actions github-actions bot reopened this Jun 29, 2024
@github-actions github-actions bot added needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team and removed issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. labels Jun 29, 2024
@danielmarbach
Contributor Author

Waiting for feedback on

So it seems even setting that explicit limit might be a challenge, or am I missing something?

@jsquire
Member

jsquire commented Jun 30, 2024

So it seems even setting that explicit limit might be a challenge, or am I missing something?

Sorry, I missed this when it came in. I agree: this challenge exists both for the clients and for the developers/applications using them. There is no context available to the client when establishing a connection/link that allows discovery of the SKU or the configuration of the entity. As you mentioned, that is available only via the AdminClient, which queries an HTTP-based service and requires manage rights. The other clients can't access this information for the same reasons you've mentioned.

That leaves us with just the maximum message size as a hint. Your point that it is available to the AMQP-based clients but not to the applications using them is a salient one. For middleware scenarios where you don't own the Azure resource and cannot inspect it - yeah. There's no great option there.

I think that's a reasonable enough scenario to justify going with option #2 in the short term, where the client intentionally caps the batch size. It's a bad option, and I don't like it. But all we have at this point is bad options.

@jsquire jsquire added feature-request This issue requires a new behavior in the product in order to be resolved. and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jun 30, 2024
@jsquire jsquire modified the milestones: Backlog, 2024-08 Jun 30, 2024
@jsquire
Member

jsquire commented Jun 30, 2024

I'll take a look at adding this in for the August release and see if I can get approval for the deviation.

@danielmarbach
Contributor Author

Maybe a middle-ground solution is to expose the message size of the link as a property on the sender? It would require issuing an operation that ensures the link is created. With that information available on the API surface, it would be possible to make the decision you mentioned in

Callers pass CreateMessageBatchOptions with the max size set to 1024 KB when creating a batch for entities with large message support enabled.

on the SDK usage side, and you wouldn't be violating the SDK guidelines, since you would be exposing information that is already available at the lower levels anyway. I would prefer something built into the SDK because of

I think the SDK team, together with the service team, is much closer to the "pulse of change" and could react faster to service changes.

but this would at least give us a way to move forward. Furthermore, you might be able to deprecate that property later to guide users away from it once the service has better adaptive behavior built in.
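
A hypothetical shape for that idea, purely illustrative (no such member exists on ServiceBusSender today):

```csharp
// Hypothetical API: expose the link's negotiated maximum message size.
// Reading it would force the sender's AMQP link to be created first.
long linkMaxMessageSize = await sender.GetMaximumMessageSizeAsync(); // hypothetical

var options = new CreateMessageBatchOptions
{
    // If the entity reports more than the standard 1024 KB, large message
    // support is presumably enabled, so cap the batch at 1024 KB.
    MaxSizeInBytes = Math.Min(linkMaxMessageSize, 1024 * 1024)
};

using ServiceBusMessageBatch batch = await sender.CreateMessageBatchAsync(options);
```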

@jsquire
Member

jsquire commented Jul 2, 2024

Good thought, but I definitely think it'll be easier for us to sneak in the behavior, with a plan for removing it when the service feature lands, than it would be to get approval for a new public API member. If the service had an AMQP operation to expose runtime properties, as Event Hubs does, we could hide it there - but a bare "get me the max message size" hanging off the client would be odd.

@jsquire
Member

jsquire commented Jul 11, 2024

Fixed by #44917, which will be included in our July release next week.

(I also snuck the 4,500-item constraint into the checks while we're waiting on the service metadata to be available.)
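
With that change, the failure moves from send time to add time. A rough sketch of the expected behavior on a large-message-enabled entity (sizes illustrative):

```csharp
// After the fix, the client caps the default batch size at the standard 1 MB
// limit (and 4,500 messages) even when the link reports a larger
// MaximumMessageSize for the entity.
using ServiceBusMessageBatch batch = await sender.CreateMessageBatchAsync();

// A 2 MB message can still be sent individually, but it no longer fits a batch:
bool added = batch.TryAddMessage(new ServiceBusMessage(new byte[2 * 1024 * 1024]));
// added == false, instead of a confusing InvalidOperationException at send time
```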
