You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A critical memory corruption occurs in form_reverse_links_ when new_neighbors (a view of context.top) becomes invalidated during iteration. new_neighbors is a view of context.top, during traversal of new_neighbors, context.top may be modified to complete the function refine_, therefore corrupt the 'new_neighbors' as it is just a view.
Subsequent accesses to new_neighbors after refine_ function on a corrupted/stale memory range, leading to a violation of HNSW algorithm
In a word, the following code will cause some correctness problem, it will lead to some wrong links among nodes that will never appear in original HNSW algorithm.
form_reverse_links_ function is as follows:
template <typename value_at, typename metric_at>
void form_reverse_links_( //
metric_at&& metric, compressed_slot_t new_slot, candidates_view_t new_neighbors, value_at&& value,
level_t level, context_t& context) usearch_noexcept_m {
top_candidates_t& top = context.top_candidates;
std::size_t const connectivity_max = level ? config_.connectivity : config_.connectivity_base;
// Reverse links from the neighbors:
for (auto new_neighbor : new_neighbors) {
compressed_slot_t close_slot = new_neighbor.slot;
if (close_slot == new_slot)
continue;
node_lock_t close_lock = node_lock_(close_slot);
node_t close_node = node_at_(close_slot);
neighbors_ref_t close_header = neighbors_(close_node, level);
// The node may have no neighbors only in one case, when it's the first one in the index,
// but that is problematic to track in multi-threaded environments, where the order of insertion
// is not guaranteed.
// usearch_assert_m(close_header.size() || new_slot == 1, "Possible corruption - isolated node");
usearch_assert_m(close_header.size() <= connectivity_max, "Possible corruption - overflow");
usearch_assert_m(close_slot != new_slot, "Self-loops are impossible");
usearch_assert_m(level <= close_node.level(), "Linking to missing level");
// If `new_slot` is already present in the neighboring connections of `close_slot`
// then no need to modify any connections or run the heuristics.
if (close_header.size() < connectivity_max) {
close_header.push_back(new_slot);
continue;
}
// To fit a new connection we need to drop an existing one.
top.clear();
usearch_assert_m((top.capacity() >= (close_header.size() + 1)),
"The memory must have been reserved in `add`");
top.insert_reserved({context.measure(value, citerator_at(close_slot), metric), new_slot});
for (compressed_slot_t successor_slot : close_header)
top.insert_reserved(
{context.measure(citerator_at(close_slot), citerator_at(successor_slot), metric), successor_slot});
// Export the results:
close_header.clear();
candidates_view_t top_view =
refine_(metric, connectivity_max, top, context, context.computed_distances_in_reverse_refines);
usearch_assert_m(top_view.size(), "This would lead to isolated nodes");
for (std::size_t idx = 0; idx != top_view.size(); idx++)
close_header.push_back(top_view[idx].slot);
}
}
Steps to reproduce
Write a test program, trying to continuely add the vectors into usearch index one by one, ~1000 vectors.
Set a breakpoint in funtion form_reverse_link_, here is accurate code postion: link
Run the test program and it will hit the breakpoint.
Watch the elements in new_neighbors when you reach the breakpoint.
Then run the program step by step, you will find the change of the new_neighbours, which should not be changed.
Expected behavior
During the execution of the function form_reverse_link_, the new_neighbors should never be changed.
USearch version
latest
Operating System
Ubuntu 24.04
Hardware architecture
x86
Which interface are you using?
C++ implementation
Contact Details
No response
Are you open to being tagged as a contributor?
I am open to being mentioned in the project .git history as a contributor
Is there an existing issue for this?
I have searched the existing issues
Code of Conduct
I agree to follow this project's Code of Conduct
The text was updated successfully, but these errors were encountered:
Hi @ycw66! When I've shared the code for USearch, I didn't expect people to go that deep into it's logic, so I'm very excited to see this issue the proposed #575 patch. I hope this waits a few days until the next week, was planning to refactor a few bits and pieces in the USearch and your suggestions would fit right in 😉
Hi @ycw66! When I've shared the code for USearch, I didn't expect people to go that deep into it's logic, so I'm very excited to see this issue the proposed #575 patch. I hope this waits a few days until the next week, was planning to refactor a few bits and pieces in the USearch and your suggestions would fit right in 😉
Thank you Ash! USearch is a popular high-performance library for vector search. I am trying to adapt USearch to our project, a universal data layer, to build the distributed vector database. Our system have a universal data layer can provide the ability of auto-scale, cache, ACID transaction and fault-tolerance. I just want to adapt the USearch to our universal data layer. When I do this, I need slightly modify the original code of USearch and find this bug.
Hi @ycw66! When I've shared the code for USearch, I didn't expect people to go that deep into it's logic, so I'm very excited to see this issue the proposed #575 patch. I hope this waits a few days until the next week, was planning to refactor a few bits and pieces in the USearch and your suggestions would fit right in 😉
Hi ash, I wanna know if this bug is verified and fixed?
Describe the bug
Problem statement
A critical memory corruption occurs in form_reverse_links_ when
new_neighbors
(a view of context.top) becomes invalidated during iteration.new_neighbors
is a view ofcontext.top
, during traversal ofnew_neighbors
,context.top
may be modified to complete the functionrefine_
, therefore corrupt the 'new_neighbors' as it is just a view.Subsequent accesses to
new_neighbors
afterrefine_
function on a corrupted/stale memory range, leading to a violation of HNSW algorithmIn a word, the following code will cause some correctness problem, it will lead to some wrong links among nodes that will never appear in original HNSW algorithm.
form_reverse_links_
function is as follows:Steps to reproduce
form_reverse_link_
, here is accurate code postion: linknew_neighbors
when you reach the breakpoint.new_neighbours
, which should not be changed.Expected behavior
During the execution of the function
form_reverse_link_
, thenew_neighbors
should never be changed.USearch version
latest
Operating System
Ubuntu 24.04
Hardware architecture
x86
Which interface are you using?
C++ implementation
Contact Details
No response
Are you open to being tagged as a contributor?
.git
history as a contributorIs there an existing issue for this?
Code of Conduct
The text was updated successfully, but these errors were encountered: