Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing partial partition key in QueryRequestOptions hurts COUNT index utilization #3936

Open
vit-svoboda opened this issue Jun 23, 2023 · 4 comments

Comments

@vit-svoboda
Copy link

We are continuously addressing and improving the SDK, if possible, make sure the problem persist in the latest SDK version.

Describe the bug
Including the same partial partition key (that is already in the aggregate count query WHERE clause) in the query options hurts the query performance. Was I not supposed to pass the partition key in there at all? I thought we were supposed to pass the partition key wherever possible.

To Reproduce
Suppose I have a Cosmos DB "CountTest" with a collection "Subpartitioned" with a 2-part hierarchical partition key /TenantId, /id and I want to count all documents belonging to a specific tenant, say "1". A query SELECT VALUE COUNT(1) FROM c WHERE c["TenantId"] = "1" is quick and cheap, so far so good.
To my surprise, if I execute this query using the latest SDK and pass new PartitionKeyBuilder().Add("1").Build() in the QueryRequestOptions the index utilization goes down significantly (down to 10% on the test data described here, down to 0% in my real life deployment). And on bigger data sets the query become prohibitively expensive.

Consider the following code.

using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Linq;

var cosmosClient = new CosmosClient(Environment.GetEnvironmentVariable("CosmosConnectionString"));
var container = cosmosClient.GetDatabase("CountTest").GetContainer("Subpartitioned");

var expensiveResponse = await container.GetItemLinqQueryable<Document>(requestOptions: new QueryRequestOptions
    {
        PartitionKey = new PartitionKeyBuilder()
            .Add("1")
            .Build()
    })
    .Where(doc => doc.TenantId == "1")
    .CountAsync();

var cheapResponse = await container.GetItemLinqQueryable<Document>(requestOptions: new QueryRequestOptions
    {
        PartitionKey = null
    })
    .Where(doc => doc.TenantId == "1")
    .CountAsync();

Console.WriteLine($"I would expect {expensiveResponse.RequestCharge} to be the same as {cheapResponse.RequestCharge} but it's much higher.");

public class Document
{
    public string TenantId { get; set; }
    public string id { get; set; }
}

As a workaround, I can omit the PartitionKey from the QueryRequestOptions and the index utilization goes back to 100% and RU cost back down to single digits.

Expected behavior
I would expect the count query index utilization to be the same if the partition key is inferred from the query as if it's explicitly passed in the request options. Passing the partition key in the options may seem a bit pointless, but I would not expect it to make performance worse.

Actual behavior
Explicitly passing the (partial) partition key appears to hurt index utilization and thus query charge. Based on the diagnostics, the database has to retrieve each document it counts. Though on the output there's only the one document with a number.

Environment summary
SDK Version: 3.35.0
OS Version: Windows 10

Additional context
Obviously, this is only issue with subpartitions/hierarchical partition keys, as originally one could only pass the entire partition key and not just a part of it. Furthermore, I haven't observed any index utilization issues with normal document-fetching queries. Appears to be only a problem with the aggregated ones. It doesn't matter whether I use LINQ or the string based QueryDefinition.

@vit-svoboda vit-svoboda changed the title Passing parttial partition key in QueryRequestOptions hurts COUNT index utilization Passing partial partition key in QueryRequestOptions hurts COUNT index utilization Jun 23, 2023
@ealsur ealsur added the QUERY label Jun 26, 2023
@ealsur
Copy link
Member

ealsur commented Jun 26, 2023

See #3933 and #3934

There is currently an issue with Queries and Subpartitioning when the Partition Key is partial.

@vit-svoboda
Copy link
Author

Does that mean my observation regarding request options was incorrect and the entire thing won't work reliably outside of the specific case I tested?

@ealsur
Copy link
Member

ealsur commented Jul 3, 2023

@vit-svoboda What was discovered was that when using Partial Partition Keys with Queries, it fails to correctly route the query in some cases (when the partial Partition Key spans multiple physical partitions). With this bug still unresolved, it is hard to understand if the problem you are facing is really an issue with the query execution or really just linked to the routing problem.

@akotalwar akotalwar self-assigned this Aug 28, 2023
@vit-svoboda
Copy link
Author

I just retested the issue using the latest SDK (v3.37.1) where the possibly related issues mentioned above have been fixed. But my issue still appears to be there.

@akotalwar akotalwar assigned adityasa and unassigned akotalwar Jan 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants