Let's add GPU acceleration with a PyTorch index. Dot product and cosine similarity both reduce to a matrix multiplication, so hardware accelerators are a natural fit here. With 32 GB of VRAM, a single GPU could hold roughly 22 million MiniLM embeddings (384 dimensions at f32 precision, i.e. 1536 bytes per vector).
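As a minimal sketch of what such an index could look like (the class name and API below are hypothetical, not anything already in this repository): normalize the corpus once so that cosine similarity becomes a plain dot product, then compute the full similarity matrix with one matmul and pick the best matches with `torch.topk`.

```python
import torch


class BruteForceGpuIndex:
    """Illustrative flat GPU index; hypothetical API, not the repo's.

    Stores L2-normalized embeddings on the target device so that
    cosine similarity reduces to a single matrix multiplication.
    """

    def __init__(self, embeddings: torch.Tensor, device: str = "cuda"):
        # Normalize once so dot product == cosine similarity.
        self.vectors = torch.nn.functional.normalize(embeddings, dim=1).to(device)

    def search(self, queries: torch.Tensor, k: int = 10):
        q = torch.nn.functional.normalize(queries, dim=1).to(self.vectors.device)
        # (num_queries, dim) @ (dim, num_vectors) -> similarity matrix
        scores = q @ self.vectors.T
        # torch.topk selects the k best matches per query on the GPU.
        top_scores, top_indices = torch.topk(scores, k, dim=1)
        return top_scores, top_indices
```

Passing `device="cpu"` works the same way for testing without a GPU; only the `.to(device)` placement changes.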
I've been implementing and using pretty much the same ideas you're describing, in TensorFlow and in Java.
I've also done the same thing with PyTorch, and covered top-k retrieval, batch processing, dynamic batching, and so on.
If you take a look at my code and agree with the direction I think the implementation should go, I'll contribute to this repository.
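For the batch-processing part, one common pattern (a sketch under assumed names, not the commenter's actual code) is to chunk the queries so the intermediate score matrix never exceeds a fixed number of rows, which bounds peak GPU memory:

```python
import torch


def batched_topk_search(corpus: torch.Tensor, queries: torch.Tensor,
                        k: int = 10, batch_size: int = 1024):
    """Illustrative batched top-k search over a pre-normalized corpus.

    Processes queries in chunks of at most batch_size rows, so the
    (batch_size, num_vectors) score matrix stays within a memory budget.
    """
    all_scores, all_indices = [], []
    for start in range(0, queries.shape[0], batch_size):
        chunk = queries[start:start + batch_size]
        scores = chunk @ corpus.T            # (chunk_rows, num_vectors)
        s, i = torch.topk(scores, k, dim=1)  # best k matches per query
        all_scores.append(s)
        all_indices.append(i)
    return torch.cat(all_scores), torch.cat(all_indices)
```

Dynamic batching would pick `batch_size` from the free VRAM and corpus size rather than hard-coding it.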