[BUG] IVF-PQ 100M Illegal Memory Access #665

Closed
tarang-jain opened this issue Feb 6, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@tarang-jain
Contributor

tarang-jain commented Feb 6, 2025

Running cuvs_ivf_pq (through cuvs-bench) on large datasets such as deep-100M fails during the index build with a CUDA error (full log below).
How to reproduce:
python -m cuvs_bench.run --dataset deep-100M --dataset-path /home/datasets/ --algorithms cuvs_ivf_pq --groups base --build

I also had to add deep-100M's config to python/cuvs_bench/cuvs_bench/config/datasets/datasets.yaml:

- name: deep-100M
  dims: 96
  base_file: deep-100M/base.1B.fbin
  distance: euclidean
  groundtruth_neighbors_file: deep-100M/groundtruth.public.10K.ibin
  query_file: deep-100M/query.public.10K.fbin
  subset_size: 100000000
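This entry appears to rely on subset_size to restrict the 1B-row deep base file to 100M vectors. Below is a minimal sketch of a pre-flight check for such an entry, assuming the big-ann-benchmarks .fbin layout (a little-endian uint32 row count and uint32 dimension count, followed by float32 data). read_fbin_header and check_dataset_entry are hypothetical helpers, not part of cuvs-bench, and the YAML entry is passed as a plain dict to avoid a PyYAML dependency.

```python
import os
import struct


def read_fbin_header(path):
    """Return (n_rows, n_dims) from a .fbin file.

    Assumption: big-ann-benchmarks layout, i.e. two little-endian uint32
    values (rows, dims) followed by rows * dims float32 values.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) != 8:
        raise ValueError(f"{path}: file too short for a .fbin header")
    n_rows, n_dims = struct.unpack("<II", header)
    return n_rows, n_dims


def check_dataset_entry(entry, dataset_root):
    """Compare a datasets.yaml entry (as a dict) with its base file header.

    Returns a list of human-readable problems; empty means no mismatch found.
    """
    base_path = os.path.join(dataset_root, entry["base_file"])
    n_rows, n_dims = read_fbin_header(base_path)
    problems = []
    if n_dims != entry["dims"]:
        problems.append(f"dims mismatch: config {entry['dims']}, file {n_dims}")
    subset = entry.get("subset_size")
    if subset is not None and subset > n_rows:
        problems.append(f"subset_size {subset} exceeds {n_rows} rows in file")
    return problems
```

For example, calling check_dataset_entry with the entry above and /home/datasets/ as the root should return an empty list if base.1B.fbin holds at least 100M rows of 96 dimensions.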

Error:

dataset: deep-100M
dim: 96
distance: euclidean
gpu_driver_version: 12.2
gpu_gpuDirectRDMASupported: 1
gpu_hostNativeAtomicSupported: 0
gpu_mem_bus_width: 5120
gpu_mem_freq: 2619000000.000000
gpu_mem_global_size: 85176483840
gpu_mem_shared_size: 233472
gpu_name: NVIDIA H100 80GB HBM3
gpu_pageableMemoryAccess: 0
gpu_pageableMemoryAccessUsesHostPageTables: 0
gpu_runtime_version: 12.8
gpu_sm_count: 132
gpu_sm_freq: 1980000000.000000
host_cores_used: 56
host_cpu_freq_max: 3800000000
host_cpu_freq_min: 800000000
host_pagesize: 4096
host_processors_sysconf: 224
host_processors_used: 224
host_total_ram_size: 2164185759744
host_total_swap_size: 0
n_records: 100000000
using ivf_pq::index_params nrows 100000000, dim 96, n_lists 1024, pq_dim 64
[2025-02-06 01:05:07.452] [RAFT] [info] inside extend execution
[2025-02-06 01:05:12.324] [RAFT] [info] finished extend execution
[2025-02-06 01:06:02.454] [RAFT] [info] inside batch destructor
CUDA call='cudaStreamSynchronize(stream_)' at file=/home/cuvs/cpp/src/neighbors/ivf_flat/../detail/ann_utils.cuh line=415 failed with an illegal instruction was encountered
[2025-02-06 01:06:02.454] [RAFT] [info] inside batch destructor
-------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                               Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------------------
cuvs_ivf_pq.nlist1024.pq_dim64.pq_bits8.ratio10.niter25/process_time/real_time ERROR OCCURRED: 'CUDA error encountered at: file=/home/miniconda3/envs/env/include/raft/core/interruptible.hpp line=303: call='query_result', Reason=cudaErrorIllegalInstruction:an illegal instruction was encountered
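The benchmark name appears to encode the build parameters: nlist1024 and pq_dim64 match n_lists 1024 and pq_dim 64 in the index_params log line, and the name adds pq_bits8. The rough sketch below, written under those assumptions, parses the name and estimates the data sizes involved at this scale. parse_bench_name and ivf_pq_footprint are hypothetical helpers; the naming rule is inferred from this single name, and the issue does not say what ratio10 and niter25 control.

```python
import re

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def parse_bench_name(name):
    """Split a cuvs-bench benchmark name into (algorithm, {param: int}).

    Hypothetical helper: assumes the '<algo>.<key><int>...' pattern seen in
    this log and drops the '/process_time/real_time' suffix.
    """
    config = name.split("/", 1)[0]
    algo, *parts = config.split(".")
    params = {}
    for part in parts:
        m = re.fullmatch(r"([a-z_]+)(\d+)", part)
        if m is None:
            raise ValueError(f"unrecognised component: {part!r}")
        params[m.group(1)] = int(m.group(2))
    return algo, params


def ivf_pq_footprint(n_rows, dim, pq_dim, pq_bits, dtype_bytes=4):
    """Back-of-the-envelope sizes for the inputs in this log.

    Ignores IVF list padding, cluster centers, codebooks and temporary
    workspace, so real memory usage is higher.
    """
    raw_elements = n_rows * dim
    return {
        "raw_elements": raw_elements,
        "raw_bytes": raw_elements * dtype_bytes,
        # each vector is compressed to pq_dim codes of pq_bits bits
        "pq_code_bytes": n_rows * pq_dim * pq_bits // 8,
        "raw_elements_exceed_int32": raw_elements > INT32_MAX,
    }


name = "cuvs_ivf_pq.nlist1024.pq_dim64.pq_bits8.ratio10.niter25/process_time/real_time"
algo, params = parse_bench_name(name)
# n_rows and dim are taken from the "using ivf_pq::index_params" log line
sizes = ivf_pq_footprint(100_000_000, 96, params["pq_dim"], params["pq_bits"])
```

With these numbers the raw float32 data (38.4 GB) and the PQ codes (6.4 GB) together stay below the reported gpu_mem_global_size of 85,176,483,840 bytes, while the flat element count of 9.6 billion is above INT32_MAX. Neither figure identifies the cause of the failure; they only show the scale at which this run operates.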
tarang-jain added the bug (Something isn't working) label Feb 6, 2025
@tarang-jain
Contributor Author

This was fixed.
