Skip to content

Conversation

@tonyd3
Copy link

@tonyd3 tonyd3 commented Aug 5, 2025

Refactor Iterator API and Remove Old Iterators

This pull request introduces a significant refactoring of the core iterator API, replacing the legacy IndexIterator with a new, unified QueryIterator interface. This change aims to streamline iteration patterns and remove deprecated code, resulting in substantial deletions across various modules. The refactoring also includes the removal of the ConcurrentSearchCtx key management, simplifying concurrency handling within the system.

Key Changes

• Replaced IndexIterator with QueryIterator across core modules, including inverted_index, hybrid_reader, geo_index, and aggregate.
• Removed IndexReader and its associated functions (IR_Read, IR_SkipTo, IR_Free, IR_Rewind) from inverted_index`.c` and inverted_index.h.
• Removed IdListIterator and its related functions, leading to the deletion of src/id_list.c.
• Removed the old index_iterator`.h` header, replaced by `src/iterators/`iterator_api`.h`. • Removed ConcurrentSearchCtxandConcurrentKeyCtxstructures and their management functions, simplifying concurrent key handling. • UpdatedHybridIteratorto fully utilize the newQueryIteratorinterface, including changes to its `Read` andSkipTo logic and internal state management. • Renamed and adapted `geometry/`QueryIterator to CPPQueryIterator to align with the new C-style QueryIterator base struct.
• Adjusted DocTable's field expiration predicate checks, removing DocTable_HasExpiration and DocTable_VerifyFieldExpirationPredicate in favor of DocTable_CheckFieldExpirationPredicate.
• Modified debug commands (DumpInvertedIndex, DumpNumericIndex, DumpTagIndex) to use the new QueryIterator for displaying index contents.
• Updated AREQ (Aggregate Request) structure and pipeline building logic to integrate with the new QueryIterator as the root iterator.

Affected Areas

• src/inverted_index/ (inverted_index.c, inverted_index.h)
• src/iterators/ (hybrid_reader.c, empty_iterator.c, empty_iterator.h, iterator_api.h)
• src/id_list.c (deleted)
• src/index_iterator.h (deleted)
• src/index.h (deleted)
• src/geometry/ (query_iterator.cpp, query_iterator.hpp, geometry_api.cpp, geometry_api.h)
• src/concurrent_ctx.c, src/concurrent_ctx.h
• src/debug_commands.c
• src/geo_index.c, src/geo_index.h
• src/doc_table.c, src/doc_table.h
• src/aggregate/ (aggregate.h, aggregate_request.c, aggregate_exec.c)
• src/indexer.c


This summary was automatically generated by @propel-code-bot

@propel-code-bot propel-code-bot bot changed the title Guyav replace old iterators Refactor: Replace IndexIterator with unified QueryIterator API Aug 5, 2025
Comment on lines +89 to +92
static void insertResultToHeap_Aggregate(HybridIterator *hr, RSIndexResult *child_res,
RSIndexResult *vec_res, double *upper_bound) {

AggregateResult_AddChild(res, vec_res);
AggregateResult_AddChild(res, child_res);
RSIndexResult *hit = IndexResult_DeepCopy(res);
IndexResult_ResetAggregate(res); // Reset the current result.
ResultMetrics_Add(hit, hr->base.ownKey, RS_NumVal(vec_res->data.num.value));
RSIndexResult *res = NewHybridResult();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CriticalError]

Potential memory leak in insertResultToHeap_Aggregate

The function creates copies using IndexResult_DeepCopy() and marks them as copies, but if mmh_insert() or mmh_exchange_max() fail, these copied results could leak. Consider adding error handling.

static void insertResultToHeap_Aggregate(HybridIterator *hr, RSIndexResult *child_res,
                                         RSIndexResult *vec_res, double *upper_bound) {
  RSIndexResult *res = NewHybridResult();
  if (!res) return; // Handle allocation failure
  
  RSIndexResult *vec_copy = IndexResult_DeepCopy(vec_res);
  RSIndexResult *child_copy = IndexResult_DeepCopy(child_res);
  if (!vec_copy || !child_copy) {
    IndexResult_Free(res);
    IndexResult_Free(vec_copy);
    IndexResult_Free(child_copy);
    return;
  }
  
  AggregateResult_AddChild(res, vec_copy);
  AggregateResult_AddChild(res, child_copy);
  // ... rest of function
}
Copy Context

Comment on lines +818 to +821
IteratorStatus status = iter->SkipTo(iter, startId);

// Continue reading the rest
while (status == ITERATOR_OK) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CriticalError]

There's a potential logic error here. If iter->SkipTo(iter, startId) returns ITERATOR_NOTFOUND, the while loop will not be executed, even though the iterator is positioned at the next valid element. This will cause documents added since the fork to be missed by the HLL update.

The loop should execute if status is ITERATOR_OK or ITERATOR_NOTFOUND after the SkipTo call. You can normalize ITERATOR_NOTFOUND to ITERATOR_OK before the loop, as the iterator is already positioned at the correct element to start reading from.

Copy Context


IndexResult_ConcatMetrics(*vec_res, child_res); // Pass child metrics, if there are any
ResultMetrics_Add(*vec_res, hr->base.ownKey, RS_NumVal((*vec_res)->data.num.value));
ResultMetrics_Add(*vec_res, hr->ownKey, RS_NumVal((*vec_res)->data.num.value));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CriticalError]

The HybridIterator struct has its own ownKey field, which shadows the ownKey field in the embedded QueryIterator base. The ownKey is set on base.ownKey during query parsing, but here you are accessing hr->ownKey, which is uninitialized. This will likely lead to a crash or incorrect behavior when adding metrics.

You should use hr->base.ownKey instead.

Copy Context

AggregateResult_AddChild(res, IndexResult_DeepCopy(vec_res));
AggregateResult_AddChild(res, IndexResult_DeepCopy(child_res));
res->isCopy = true; // Mark as copy, so when we free it, it will also free its children.
ResultMetrics_Add(res, hr->ownKey, RS_NumVal(vec_res->data.num.value));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CriticalError]

Similar to the issue in insertResultToHeap_Metric, you are accessing hr->ownKey which shadows hr->base.ownKey and is uninitialized. This should be hr->base.ownKey to correctly access the key set during query parsing.

Copy Context

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants