You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We are continuously addressing and improving the SDK, if possible, make sure the problem persist in the latest SDK version.
Describe the bug
Including the same partial partition key (that is already in the aggregate count query WHERE clause) in the query options hurts the query performance. Was I not supposed to pass the partition key in there at all? I thought we were supposed to pass the partition key wherever possible.
To Reproduce
Suppose I have a Cosmos DB "CountTest" with a collection "Subpartitioned" with a 2-part hierarchical partition key /TenantId, /id and I want to count all documents belonging to a specific tenant, say "1". A query SELECT VALUE COUNT(1) FROM c WHERE c["TenantId"] = "1" is quick and cheap, so far so good.
To my surprise, if I execute this query using the latest SDK and pass new PartitionKeyBuilder().Add("1").Build() in the QueryRequestOptions the index utilization goes down significantly (down to 10% on the test data described here, down to 0% in my real life deployment). And on bigger data sets the query become prohibitively expensive.
Consider the following code.
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Linq;
var cosmosClient = new CosmosClient(Environment.GetEnvironmentVariable("CosmosConnectionString"));
var container = cosmosClient.GetDatabase("CountTest").GetContainer("Subpartitioned");
var expensiveResponse = await container.GetItemLinqQueryable<Document>(requestOptions: new QueryRequestOptions
{
PartitionKey = new PartitionKeyBuilder()
.Add("1")
.Build()
})
.Where(doc => doc.TenantId == "1")
.CountAsync();
var cheapResponse = await container.GetItemLinqQueryable<Document>(requestOptions: new QueryRequestOptions
{
PartitionKey = null
})
.Where(doc => doc.TenantId == "1")
.CountAsync();
Console.WriteLine($"I would expect {expensiveResponse.RequestCharge} to be the same as {cheapResponse.RequestCharge} but it's much higher.");
public class Document
{
public string TenantId { get; set; }
public string id { get; set; }
}
As a workaround, I can omit the PartitionKey from the QueryRequestOptions and the index utilization goes back to 100% and RU cost back down to single digits.
Expected behavior
I would expect the count query index utilization to be the same if the partition key is inferred from the query as if it's explicitly passed in the request options. Passing the partition key in the options may seem a bit pointless, but I would not expect it to make performance worse.
Actual behavior
Explicitly passing the (partial) partition key appears to hurt index utilization and thus query charge. Based on the diagnostics, the database has to retrieve each document it counts. Though on the output there's only the one document with a number.
Environment summary
SDK Version: 3.35.0
OS Version: Windows 10
Additional context
Obviously, this is only issue with subpartitions/hierarchical partition keys, as originally one could only pass the entire partition key and not just a part of it. Furthermore, I haven't observed any index utilization issues with normal document-fetching queries. Appears to be only a problem with the aggregated ones. It doesn't matter whether I use LINQ or the string based QueryDefinition.
The text was updated successfully, but these errors were encountered:
vit-svoboda
changed the title
Passing parttial partition key in QueryRequestOptions hurts COUNT index utilization
Passing partial partition key in QueryRequestOptions hurts COUNT index utilization
Jun 23, 2023
@vit-svoboda What was discovered was that when using Partial Partition Keys with Queries, it fails to correctly route the query in some cases (when the partial Partition Key spans multiple physical partitions). With this bug still unresolved, it is hard to understand if the problem you are facing is really an issue with the query execution or really just linked to the routing problem.
I just retested the issue using the latest SDK (v3.37.1) where the possibly related issues mentioned above have been fixed. But my issue still appears to be there.
Describe the bug
Including the same partial partition key (that is already in the aggregate count query WHERE clause) in the query options hurts the query performance. Was I not supposed to pass the partition key in there at all? I thought we were supposed to pass the partition key wherever possible.
To Reproduce
Suppose I have a Cosmos DB
"CountTest"
with a collection"Subpartitioned"
with a 2-part hierarchical partition key/TenantId
,/id
and I want to count all documents belonging to a specific tenant, say"1"
. A querySELECT VALUE COUNT(1) FROM c WHERE c["TenantId"] = "1"
is quick and cheap, so far so good.To my surprise, if I execute this query using the latest SDK and pass
new PartitionKeyBuilder().Add("1").Build()
in theQueryRequestOptions
the index utilization goes down significantly (down to 10% on the test data described here, down to 0% in my real life deployment). And on bigger data sets the query become prohibitively expensive.Consider the following code.
As a workaround, I can omit the
PartitionKey
from theQueryRequestOptions
and the index utilization goes back to 100% and RU cost back down to single digits.Expected behavior
I would expect the count query index utilization to be the same if the partition key is inferred from the query as if it's explicitly passed in the request options. Passing the partition key in the options may seem a bit pointless, but I would not expect it to make performance worse.
Actual behavior
Explicitly passing the (partial) partition key appears to hurt index utilization and thus query charge. Based on the diagnostics, the database has to retrieve each document it counts. Though on the output there's only the one document with a number.
Environment summary
SDK Version: 3.35.0
OS Version: Windows 10
Additional context
Obviously, this is only issue with subpartitions/hierarchical partition keys, as originally one could only pass the entire partition key and not just a part of it. Furthermore, I haven't observed any index utilization issues with normal document-fetching queries. Appears to be only a problem with the aggregated ones. It doesn't matter whether I use LINQ or the string based
QueryDefinition
.The text was updated successfully, but these errors were encountered: