Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FAISS 1.7.2 and FAISS 1.7.3 giving different AUROC (97 vs 21) results even when running on the same GPU configuration. #4216

Open
anindyasdas opened this issue Feb 27, 2025 · 0 comments

Comments

@anindyasdas
Copy link

I am getting huge difference in ROC, when running on two different version on Faiss. In the work, I am extracting features from BERT and populating the index of FaissDB and finding the nearest neighbor

2025-02-27 16:39:38,081 - loader.py:66 - Successfully loaded faiss.
FAISS Version: 1.7.2
Numpy Version: 1.26.4
PyTorch Version: 2.3.0
GPU Enabled: False
GPU Architecture: NVIDIA A100-SXM4-40GB
extracting from layers [-1]
FAISS Version: 1.7.2
Numpy Version: 1.26.4
PyTorch Version: 2.3.0
GPU Enabled: True
GPU Architecture: NVIDIA A100-SXM4-40GB
Extraction embedding, computing support features: 100%|¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦| 3367/3367 [00:41<00:00, 81.41it/s]
total_feat: (444444, 768)
self.percentage 0.1
Subsampling...: 100%|¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦| 44444/44444 [00:46<00:00, 953.39it/s]
total_feat2: (44444, 768)
fitting the data
Data Type of Features: float32
create index dimension 768
class name: GpuIndexFlatL2
data fitted
Loading test data...
872 872 872
Infering.....: 100%|¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦| 436/436 [00:05<00:00, 82.68it/s]
auroc: 0.9748070325900515

####################################

2025-02-27 16:42:26,698 - loader.py:56 - Successfully loaded faiss with AVX2 support.
FAISS Version: 1.7.3
Numpy Version: 1.24.3
PyTorch Version: 2.4.1+cu121
GPU Enabled: False
GPU Architecture: NVIDIA A100-SXM4-40GB
extracting from layers [-1]
FAISS Version: 1.7.3
Numpy Version: 1.24.3
PyTorch Version: 2.4.1+cu121
GPU Enabled: True
GPU Architecture: NVIDIA A100-SXM4-40GB
Extraction embedding, computing support features: 100%|¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦| 3367/3367 [00:33<00:00, 100.00it/s]
total_feat: (444444, 768)
self.percentage 0.1
Subsampling...: 100%|¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦| 44444/44444 [00:40<00:00, 1104.26it/s]
total_feat2: (44444, 768)
fitting the data
Data Type of Features: float32
create index dimension 768
class name: GpuIndexFlatL2
data fitted
Loading test data...
872 872 872
Infering.....: 100%|¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦| 436/436 [00:04<00:00, 98.95it/s]
auroc: 0.20863418712472448

`class FaissNN(object):
def init(self, on_gpu: bool = False, num_workers: int = 4) -> None:
"""FAISS Nearest neighbourhood search.

    Args:
        on_gpu: If set true, nearest neighbour searches are done on GPU.
        num_workers: Number of workers to use with FAISS for similarity search.
    """
    faiss.omp_set_num_threads(num_workers)
    faiss.GpuIndexFlatConfig().useFloat16 = False
    self.on_gpu = on_gpu
    self.search_index = None
    print("FAISS Version:", faiss.__version__)
    print("Numpy Version:", np.__version__)
    print("PyTorch Version:", torch.__version__)
    print("GPU Enabled:", self.on_gpu)
    print("GPU Architecture:", torch.cuda.get_device_name(0))

def _gpu_cloner_options(self):
    return faiss.GpuClonerOptions()

def _index_to_gpu(self, index):
    if self.on_gpu:
        # For the non-gpu faiss python package, there is no GpuClonerOptions
        # so we can not make a default in the function header.
        return faiss.index_cpu_to_gpu(
            faiss.StandardGpuResources(), 0, index, self._gpu_cloner_options()
        )
    return index

def _index_to_cpu(self, index):
    if self.on_gpu:
        return faiss.index_gpu_to_cpu(index)
    return index

def _create_index(self, dimension):
    print("create index", "dimension", dimension)
    if self.on_gpu:
        return faiss.GpuIndexFlatL2(
            faiss.StandardGpuResources(), dimension, faiss.GpuIndexFlatConfig()
        )
    return faiss.IndexFlatL2(dimension)

def fit(self, features: np.ndarray) -> None:
    """
    Adds features to the FAISS search index.

    Args:
        features: Array of size NxD.
    """
    print("Data Type of Features:", features.dtype)
    if self.search_index:
        self.reset_index()
    #print("features.shape:", features.shape) #features.shape: (44444, 768)
    self.search_index = self._create_index(features.shape[-1]) #index dimension is token dimension
    print("class name:", self.search_index.__class__.__name__)
    self._train(self.search_index, features)
    self.search_index.add(features)

def _train(self, _index, _features):
    pass

def run(
    self,
    n_nearest_neighbours,
    query_features: np.ndarray,
    index_features: np.ndarray = None,
) -> Union[np.ndarray, np.ndarray, np.ndarray]:
    """
    Returns distances and indices of nearest neighbour search.

    Args:
        query_features: Features to retrieve.
        index_features: [optional] Index features to search in.
    """
    #print("index_features:", index_features)
    if index_features is None: # if index features during query is NONE, search created index features
        return self.search_index.search(query_features, n_nearest_neighbours)

    # Build a search index just for this search. # if index_features are not NULL
    search_index = self._create_index(index_features.shape[-1])
    print("Run:", search_index, index_features)
    self._train(search_index, index_features)
    search_index.add(index_features)
    print("Index Description:", self.search_index)
    return search_index.search(query_features, n_nearest_neighbours)

def save(self, filename: str) -> None:
    faiss.write_index(self._index_to_cpu(self.search_index), filename)

def load(self, filename: str) -> None:
    self.search_index = self._index_to_gpu(faiss.read_index(filename))

def reset_index(self):
    if self.search_index:
        self.search_index.reset()
        self.search_index = None`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant