
DOC-734 | AQL optimization: COLLECT ... AGGREGATE can utilize persistent index #732

Open: wants to merge 1 commit into `main`

34 changes: 34 additions & 0 deletions site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md
@@ -1257,6 +1257,40 @@ to some extent.
See the [`COLLECT` operation](../../aql/high-level-operations/collect.md#disableindex)
for details.

---

<small>Introduced in: v3.12.5</small>

The `use-index-for-collect` optimizer rule has been further extended.
Queries with a `COLLECT` operation that has an `AGGREGATE` clause can now
utilize a persistent index, provided that the `AGGREGATE` clause exclusively
refers to attributes covered by the index and to no other variables.

Reading the data for aggregations from the index instead of from the stored
documents can significantly improve performance if there are only a few
distinct values.

```aql
FOR doc IN coll
COLLECT a = doc.a AGGREGATE b = MAX(doc.b)
RETURN { a, b }
```

If there is a persistent index over the attributes `a` and `b`, the query
explain output shows an `IndexCollectNode` when the optimization is applied:

```aql
Execution plan:
 Id   NodeType            Par   Est.   Comment
  1   SingletonNode                1   * ROOT
 10   IndexCollectNode          4999   - FOR doc IN coll COLLECT a = doc.`a` AGGREGATE b = MAX(doc.`b`)   /* full index scan */
  6   CalculationNode      ✓    4999   - LET #5 = { "a" : a, "b" : b }   /* simple expression */
  7   ReturnNode                4999   - RETURN #5

Indexes used:
 By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields   Stored values   Ranges
 10   idx_1836452431376941056   persistent   coll
```
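For contrast, here is a minimal sketch of a query that does not meet the condition described above. It assumes the same persistent index over `a` and `b` without any stored values, plus a hypothetical attribute `c` that this index does not cover. Because the `AGGREGATE` clause refers to `doc.c`, the `use-index-for-collect` rule is not expected to produce an `IndexCollectNode` for this query. Checking the explain output is the reliable way to confirm the behavior for a given setup.

```aql
// Assumption: persistent index over the attributes a and b only,
// with no stored values. The attribute c is hypothetical and not
// covered by this index, so the AGGREGATE clause does not
// exclusively refer to indexed attributes.
FOR doc IN coll
  COLLECT a = doc.a AGGREGATE maxC = MAX(doc.c)
  RETURN { a, maxC }
```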

## Indexing

### Multi-dimensional indexes
34 changes: 34 additions & 0 deletions site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md (identical to the 3.12 hunk above)