Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix types in qv_partition_with_scores() and train_no_init() #546

Merged
merged 4 commits into from
Oct 16, 2024

Conversation

jparismorgan
Copy link
Contributor

@jparismorgan jparismorgan commented Oct 14, 2024

First change

Fix types in qv_partition_with_scores() and train_no_init(). I noticed this during debugging when I had an int8 feature vector. When I printed out high_scores, I wouldn't see the value of the score:

      std::ceil(reassign_ratio_ * static_cast<float>(num_partitions_)): 1
      heap_size: 6
      high_scores.dump(): 
      low_degrees.dump(): 
      inserting [0] (0, 0)
            high_scores.dump(): (, 0) 
      inserting [1] (0, 1)
            high_scores.dump(): (, 0) (, 1) 
      inserting [2] (0, 2)
            high_scores.dump(): (, 0) (, 1) (, 2) 
      inserting [3] (0, 3)
            high_scores.dump(): (, 0) (, 1) (, 2) (, 3) 
      degrees (5): [1, 1, 1, 1, 0]
      high_scores.dump(): (, 0) (, 1) (, 2) (, 3) 
      low_degrees.dump(): 
      high_scores.dump(): (, 0) (, 1) (, 2) (, 3) 
      low_degrees.dump(): (1, 0) (1, 1) (1, 2) (1, 3) (0, 4) 
      high_scores.dump(): (, 2) (, 3) (, 1) (, 0) 
      low_degrees.dump(): (0, 4) (1, 3) (1, 1) (1, 2) (1, 0) 
      i: 0 low_degrees: (4, 0) high_scores: (2, 2, )
      i: 1 low_degrees: (3, 1) high_scores: (3, 3, )
      i: 2 low_degrees: (1, 1) high_scores: (1, 1, )
      i: 3 low_degrees: (2, 1) high_scores: (0, 0, )

With this change, we do see the value:

      static_cast<float>(num_partitions_): 5
      std::ceil(reassign_ratio_ * static_cast<float>(num_partitions_)): 1
      heap_size: 6
      high_scores.dump(): 
      low_degrees.dump(): 
      inserting [0] (0, 0)
            high_scores.dump(): (0, 0) 
      inserting [1] (0, 1)
            high_scores.dump(): (0, 0) (0, 1) 
      inserting [2] (0, 2)
            high_scores.dump(): (0, 0) (0, 1) (0, 2) 
      inserting [3] (0, 3)
            high_scores.dump(): (0, 0) (0, 1) (0, 2) (0, 3) 
      degrees (5): [1, 1, 1, 1, 0]
      high_scores.dump(): (0, 0) (0, 1) (0, 2) (0, 3) 
      low_degrees.dump(): 
      high_scores.dump(): (0, 0) (0, 1) (0, 2) (0, 3) 
      low_degrees.dump(): (1, 0) (1, 1) (1, 2) (1, 3) (0, 4) 
      high_scores.dump(): (0, 2) (0, 3) (0, 1) (0, 0) 
      low_degrees.dump(): (0, 4) (1, 3) (1, 1) (1, 2) (1, 0) 
      i: 0 low_degrees: (4, 0) high_scores: (2, 2, 0)
      i: 1 low_degrees: (3, 1) high_scores: (3, 3, 0)
      i: 2 low_degrees: (1, 1) high_scores: (1, 1, 0)
      i: 3 low_degrees: (2, 1) high_scores: (0, 0, 0)

I also believe it's correct because our scores are float, not the type of the input vectors the user provided.

Second change

We also fix an asan error:

=================================================================
==25141==ERROR: AddressSanitizer: container-overflow on address 0x000107a041b0 at pc 0x000103530d7c bp 0x00016da4fcd0 sp 0x00016da4f480
READ of size 16 at 0x000107a041b0 thread T0
    #0 0x103530d78 in __asan_memcpy+0x394 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x50d78)
    #1 0x1023f2aec in void train_no_init<Matrix<float, Kokkos::layout_left, unsigned long>, Matrix<float, Kokkos::layout_left, unsigned long>, _l2_distance::sum_of_squares_distance>(Matrix<float, Kokkos::layout_left, unsigned long> const&, Matrix<float, Kokkos::layout_left, unsigned long>&, unsigned long, unsigned long, unsigned int, float, unsigned long, float, _l2_distance::sum_of_squares_distance) kmeans.h:336
    #2 0x1023b54e0 in CATCH2_INTERNAL_TEST_1() unit_kmeans.cc:386
    #3 0x10272a418 in Catch::TestInvokerAsFunction::invoke() const catch_test_case_registry_impl.cpp:149
    #4 0x1026e6bb4 in Catch::TestCaseHandle::invoke() const catch_test_case_info.hpp:115
    #5 0x1026e6944 in Catch::RunContext::invokeActiveTestCase() catch_run_context.cpp:538
    #6 0x1026dff90 in Catch::RunContext::runCurrentTest(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) catch_run_context.cpp:501
    #7 0x1026de634 in Catch::RunContext::runTest(Catch::TestCaseHandle const&) catch_run_context.cpp:232
    #8 0x10256f134 in Catch::(anonymous namespace)::TestGroup::execute() catch_session.cpp:110
    #9 0x10256cfd8 in Catch::Session::runInternal() catch_session.cpp:332
    #10 0x10256c4cc in Catch::Session::run() catch_session.cpp:263
    #11 0x102470fbc in int Catch::Session::run<char>(int, char const* const*) catch_session.hpp:41
    #12 0x102470cc0 in main catch_main.cpp:36
    #13 0x19db060dc  (<unknown module>)

0x000107a041b0 is located 16 bytes inside of 96-byte region [0x000107a041a0,0x000107a04200)
allocated by thread T0 here:
    #0 0x10354174c in wrap__Znwm+0x74 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x6174c)
    #1 0x1023b8494 in void* std::__1::__libcpp_operator_new[abi:ue170006]<unsigned long>(unsigned long) new:298
    #2 0x1023b8418 in std::__1::__libcpp_allocate[abi:ue170006](unsigned long, unsigned long) new:324
    #3 0x102426340 in std::__1::allocator<std::__1::tuple<float, unsigned long>>::allocate[abi:ue170006](unsigned long) allocator.h:114
    #4 0x102425ff4 in std::__1::__allocation_result<std::__1::allocator_traits<std::__1::allocator<std::__1::tuple<float, unsigned long>>>::pointer> std::__1::__allocate_at_least[abi:ue170006]<std::__1::allocator<std::__1::tuple<float, unsigned long>>>(std::__1::allocator<std::__1::tuple<float, unsigned long>>&, unsigned long) allocate_at_least.h:55
    #5 0x102428040 in std::__1::__split_buffer<std::__1::tuple<float, unsigned long>, std::__1::allocator<std::__1::tuple<float, unsigned long>>&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<std::__1::tuple<float, unsigned long>>&) __split_buffer:379
    #6 0x102427870 in std::__1::__split_buffer<std::__1::tuple<float, unsigned long>, std::__1::allocator<std::__1::tuple<float, unsigned long>>&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<std::__1::tuple<float, unsigned long>>&) __split_buffer:375
    #7 0x102424678 in std::__1::vector<std::__1::tuple<float, unsigned long>, std::__1::allocator<std::__1::tuple<float, unsigned long>>>::reserve(unsigned long) vector:1581
    #8 0x102423ed0 in fixed_min_tuple_heap<std::__1::tuple<float, unsigned long>, std::__1::greater<float>>::fixed_min_tuple_heap<unsigned long>(unsigned long, std::__1::greater<float>) fixed_min_heap.h:111
    #9 0x102423dcc in fixed_min_pair_heap<float, unsigned long, std::__1::greater<float>>::fixed_min_pair_heap<unsigned long>(unsigned long, std::__1::greater<float>) fixed_min_heap.h:310
    #10 0x1023f9614 in fixed_min_pair_heap<float, unsigned long, std::__1::greater<float>>::fixed_min_pair_heap<unsigned long>(unsigned long, std::__1::greater<float>) fixed_min_heap.h:310
    #11 0x1023f1ed0 in void train_no_init<Matrix<float, Kokkos::layout_left, unsigned long>, Matrix<float, Kokkos::layout_left, unsigned long>, _l2_distance::sum_of_squares_distance>(Matrix<float, Kokkos::layout_left, unsigned long> const&, Matrix<float, Kokkos::layout_left, unsigned long>&, unsigned long, unsigned long, unsigned int, float, unsigned long, float, _l2_distance::sum_of_squares_distance) kmeans.h:279
    #12 0x1023b54e0 in CATCH2_INTERNAL_TEST_1() unit_kmeans.cc:386
    #13 0x10272a418 in Catch::TestInvokerAsFunction::invoke() const catch_test_case_registry_impl.cpp:149
    #14 0x1026e6bb4 in Catch::TestCaseHandle::invoke() const catch_test_case_info.hpp:115
    #15 0x1026e6944 in Catch::RunContext::invokeActiveTestCase() catch_run_context.cpp:538
    #16 0x1026dff90 in Catch::RunContext::runCurrentTest(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) catch_run_context.cpp:501
    #17 0x1026de634 in Catch::RunContext::runTest(Catch::TestCaseHandle const&) catch_run_context.cpp:232
    #18 0x10256f134 in Catch::(anonymous namespace)::TestGroup::execute() catch_session.cpp:110
    #19 0x10256cfd8 in Catch::Session::runInternal() catch_session.cpp:332
    #20 0x10256c4cc in Catch::Session::run() catch_session.cpp:263
    #21 0x102470fbc in int Catch::Session::run<char>(int, char const* const*) catch_session.hpp:41
    #22 0x102470cc0 in main catch_main.cpp:36
    #23 0x19db060dc  (<unknown module>)

HINT: if you don't care about these errors you may set ASAN_OPTIONS=detect_container_overflow=0.
If you suspect a false positive see also: https://github.com/google/sanitizers/wiki/AddressSanitizerContainerOverflow.
SUMMARY: AddressSanitizer: container-overflow (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x50d78) in __asan_memcpy+0x394
Shadow bytes around the buggy address:
  0x000107a03f00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x000107a03f80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x000107a04000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x000107a04080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x000107a04100: fa fa fa fa 00 00 00 00 00 00 fc fc fc fc fc fc
=>0x000107a04180: fa fa fa fa 00 00[fc]fc fc fc fc fc fc fc fc fc
  0x000107a04200: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
  0x000107a04280: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
  0x000107a04300: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
  0x000107a04380: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
  0x000107a04400: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==25141==ABORTING

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
unit_kmeans is a Catch2 v3.3.2 host application.
Run with -? for options

-------------------------------------------------------------------------------
test kmeans train_no_init number of centroids exceeds data points
-------------------------------------------------------------------------------
/Users/parismorgan/repo/TileDB-Vector-Search-5/src/include/test/unit_kmeans.cc:378
...............................................................................

/Users/parismorgan/repo/TileDB-Vector-Search-5/src/include/test/unit_kmeans.cc:378: FAILED:
due to a fatal error condition:
  SIGABRT - Abort (abnormal termination) signal

===============================================================================
test cases: 1 | 1 failed
assertions: 1 | 1 failed

[1]    25141 abort      ./src/build/include/test/unit_kmeans

That was happening because when there are less partitions then vectors (i.e. high_scores is smaller than low_degrees), we would still index

for (size_t i = 0; i < size(low_degrees) && std::get<0>(low_degrees[i]) <= lower_degree_bound; ++i) {
        auto [degree, zero_part] = low_degrees[i];
        // THIS WAS AN ERROR IF size(high_scores) < size(low_degrees)
        auto [score, high_vector_id] = high_scores[i];

With the change in this PR we no longer have an asan error:

(TileDB-Vector-Search-5) ~/repo/TileDB-Vector-Search-5 cmake --build ./src/build --target unit_kmeans && ./src/build/include/test/unit_kmeans
[ 94%] Built target Catch2
[ 97%] Built target Catch2WithMain
[ 97%] Built target docopt
[100%] Building CXX object include/test/CMakeFiles/unit_kmeans.dir/unit_kmeans.cc.o
[100%] Linking CXX executable unit_kmeans
[100%] Built target unit_kmeans
Randomness seeded to: 3728485368
[fixed_min_tuple_heap@ctor] k: 6
[fixed_min_tuple_heap@ctor] k: 6
[fixed_min_tuple_heap@ctor] k: 6
[fixed_min_tuple_heap@ctor] k: 6
===============================================================================
test cases: 1 | 1 passed
assertions: - none -

Testing

Tests pass.

@jparismorgan jparismorgan marked this pull request as ready for review October 15, 2024 20:52
Copy link
Collaborator

@NikolaosPapailiou NikolaosPapailiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

@jparismorgan jparismorgan merged commit 4f58b38 into main Oct 16, 2024
6 checks passed
@jparismorgan jparismorgan deleted the jparismorgan/fix-types-in-kmeans-train branch October 16, 2024 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants