Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A targeted multi-partition query on a sub-partitioned container becomes a fan-out cross-partition query if the filter condition is always false #4916

Open
alesk-kontent opened this issue Nov 28, 2024 · 0 comments
Labels

Comments

@alesk-kontent
Copy link

Describe the bug

When I run a targeted multi-partition query on a sub-partitioned container, the query becomes a fan-out cross-partition query if the following conditions are met:

  • The filter condition is always false.
  • The filter condition contains the first part of the partition key.
  • The query options do not contain a partition key.

To reproduce

I wrote a console application to reproduce the issue. Please download the solution, open the Program.cs file and specify a connection string to an Azure Cosmos DB for NoSQL account. The application creates a new database with a container Subpartitioned. It also upserts random data, runs a query (with and without a partition key in the request options) and displays relevant information. You can run the application several times; it will only create resources and upsert data once.

The container contains 100 documents (articles) with the following properties:

Id : string
CustomerId : Guid
Label : string
Title : string

The value of the Id property is a random GUID. The value of the CustomerId and Label properties is always the same. The value of the Title property is unique for each document.

The indexing policy is automatic and consistent. As for the partition key, it has two components, customerId and id, so there are 100 logical partitions with one document. The maximum throughput is 20,000 RU/s so the container has two physical partitions, one of which is empty.

The query is SELECT * FROM c WHERE c.customerId = @customerId AND c.label = @label AND c.label <> @label and the application runs it twice:

  • Request options contain a partition key with the customer id.
  • Request options do not contain a partition key.

You can also simplify the query to SELECT * FROM c WHERE c.customerId = @customerId AND false.

I can see the following results:

Container: Subpartitioned
Partition key: ["df165b31-7641-4664-9549-37862ed806ee"]
Request charge: 2,25
Physical partitions: 1

Container: Subpartitioned
Partition key:
Request charge: 4,50
Physical partitions: 0, 1

Expected behavior

As the filter condition contains the first component of the partition key, the request charge should be the same and the SDK should only read data only from one physical partition.

Actual behavior

Without a partition key in the request options the query becomes a fan-out cross-partition query so the request charge is higher as the SDK reads data from all physical partitions.

Environment summary

SDK Version: 3.46.0
OS Version: Windows 11 Enterprise (10.0.22631 Build 22631)

Additional context

It happens in both Direct and Gateway mode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants