Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Wrong memory override lead to a violation of HNSW algorithm #576

Open
3 tasks done
ycw66 opened this issue Mar 21, 2025 · 3 comments
Open
3 tasks done

Bug: Wrong memory override lead to a violation of HNSW algorithm #576

ycw66 opened this issue Mar 21, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@ycw66
Copy link

ycw66 commented Mar 21, 2025

Describe the bug

Problem statement

A critical memory corruption occurs in form_reverse_links_ when new_neighbors (a view of context.top) becomes invalidated during iteration. new_neighbors is a view of context.top, during traversal of new_neighbors, context.top may be modified to complete the function refine_, therefore corrupt the 'new_neighbors' as it is just a view.

Subsequent accesses to new_neighbors after refine_ function on a corrupted/stale memory range, leading to a violation of HNSW algorithm

In a word, the following code will cause some correctness problem, it will lead to some wrong links among nodes that will never appear in original HNSW algorithm.

form_reverse_links_ function is as follows:

 template <typename value_at, typename metric_at>
    void form_reverse_links_( //
        metric_at&& metric, compressed_slot_t new_slot, candidates_view_t new_neighbors, value_at&& value,
        level_t level, context_t& context) usearch_noexcept_m {

        top_candidates_t& top = context.top_candidates;
        std::size_t const connectivity_max = level ? config_.connectivity : config_.connectivity_base;

        // Reverse links from the neighbors:
        for (auto new_neighbor : new_neighbors) {
            compressed_slot_t close_slot = new_neighbor.slot;
            if (close_slot == new_slot)
                continue;
            node_lock_t close_lock = node_lock_(close_slot);
            node_t close_node = node_at_(close_slot);
            neighbors_ref_t close_header = neighbors_(close_node, level);

            // The node may have no neighbors only in one case, when it's the first one in the index,
            // but that is problematic to track in multi-threaded environments, where the order of insertion
            // is not guaranteed.
            // usearch_assert_m(close_header.size() || new_slot == 1, "Possible corruption - isolated node");
            usearch_assert_m(close_header.size() <= connectivity_max, "Possible corruption - overflow");
            usearch_assert_m(close_slot != new_slot, "Self-loops are impossible");
            usearch_assert_m(level <= close_node.level(), "Linking to missing level");

            // If `new_slot` is already present in the neighboring connections of `close_slot`
            // then no need to modify any connections or run the heuristics.
            if (close_header.size() < connectivity_max) {
                close_header.push_back(new_slot);
                continue;
            }

            // To fit a new connection we need to drop an existing one.
            top.clear();
            usearch_assert_m((top.capacity() >= (close_header.size() + 1)),
                             "The memory must have been reserved in `add`");
            top.insert_reserved({context.measure(value, citerator_at(close_slot), metric), new_slot});
            for (compressed_slot_t successor_slot : close_header)
                top.insert_reserved(
                    {context.measure(citerator_at(close_slot), citerator_at(successor_slot), metric), successor_slot});

            // Export the results:
            close_header.clear();
            candidates_view_t top_view =
                refine_(metric, connectivity_max, top, context, context.computed_distances_in_reverse_refines);
            usearch_assert_m(top_view.size(), "This would lead to isolated nodes");
            for (std::size_t idx = 0; idx != top_view.size(); idx++)
                close_header.push_back(top_view[idx].slot);
        }
    }

Steps to reproduce

  1. Write a test program, trying to continuely add the vectors into usearch index one by one, ~1000 vectors.
  2. Set a breakpoint in funtion form_reverse_link_, here is accurate code postion: link
  3. Run the test program and it will hit the breakpoint.
  4. Watch the elements in new_neighbors when you reach the breakpoint.
  5. Then run the program step by step, you will find the change of the new_neighbours, which should not be changed.

Expected behavior

During the execution of the function form_reverse_link_, the new_neighbors should never be changed.

USearch version

latest

Operating System

Ubuntu 24.04

Hardware architecture

x86

Which interface are you using?

C++ implementation

Contact Details

No response

Are you open to being tagged as a contributor?

  • I am open to being mentioned in the project .git history as a contributor

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct
@ashvardanian
Copy link
Contributor

Hi @ycw66! When I've shared the code for USearch, I didn't expect people to go that deep into it's logic, so I'm very excited to see this issue the proposed #575 patch. I hope this waits a few days until the next week, was planning to refactor a few bits and pieces in the USearch and your suggestions would fit right in 😉

@ycw66
Copy link
Author

ycw66 commented Mar 21, 2025

Hi @ycw66! When I've shared the code for USearch, I didn't expect people to go that deep into it's logic, so I'm very excited to see this issue the proposed #575 patch. I hope this waits a few days until the next week, was planning to refactor a few bits and pieces in the USearch and your suggestions would fit right in 😉

Thank you Ash! USearch is a popular high-performance library for vector search. I am trying to adapt USearch to our project, a universal data layer, to build the distributed vector database. Our system have a universal data layer can provide the ability of auto-scale, cache, ACID transaction and fault-tolerance. I just want to adapt the USearch to our universal data layer. When I do this, I need slightly modify the original code of USearch and find this bug.

@ycw66
Copy link
Author

ycw66 commented Apr 1, 2025

Hi @ycw66! When I've shared the code for USearch, I didn't expect people to go that deep into it's logic, so I'm very excited to see this issue the proposed #575 patch. I hope this waits a few days until the next week, was planning to refactor a few bits and pieces in the USearch and your suggestions would fit right in 😉

Hi ash, I wanna know if this bug is verified and fixed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants