awesome-vector-embeddings

A collection of resources for vector embeddings.

Vector datasets / embedding dumps

  1. khellific/anidb-series-embeddings at main : https://huggingface.co/datasets/khellific/anidb-series-embeddings/tree/main
  2. Hacker News OpenAI Embeddings | Kaggle : https://www.kaggle.com/datasets/julien040/hacker-news-openai-embeddings
  3. Cohere/wikipedia-22-12-en-embeddings · Datasets at Hugging Face : https://huggingface.co/datasets/Cohere/wikipedia-22-12-en-embeddings
  4. Glove Embeddings | Kaggle : https://www.kaggle.com/datasets/anmolkumar/glove-embeddings
  5. NLPL word embeddings repository : http://vectors.nlpl.eu/repository/
  6. RxRx19a COVID-19 Image Embeddings | Kaggle : https://www.kaggle.com/datasets/tunguz/rxrx19a
  7. Gensim Word Embeddings | Kaggle : https://www.kaggle.com/datasets/iezepov/gensim-embeddings-dataset
  8. 130k Images (512x512) - Universal Image Embeddings | Kaggle : https://www.kaggle.com/datasets/rhtsingh/130k-images-512x512-universal-image-embeddings
  9. Pre-trained Word Vectors for Spanish | Kaggle : https://www.kaggle.com/datasets/rtatman/pretrained-word-vectors-for-spanish
  10. Embeddings: GloVe, Crawl, etc. | torch cached | Kaggle : https://www.kaggle.com/datasets/leighplt/embeddings-glove-crawl-torch-cached
  11. fasttext embeddings | Kaggle : https://www.kaggle.com/datasets/abhishek/fasttext
  12. OpenAI Embeddings for New York Times Articles | Kaggle : https://www.kaggle.com/datasets/dilwong/openai-embeddings-for-new-york-times-articles?resource=download
  13. GitHub - erikbern/ann-benchmarks: Benchmarks of approximate nearest neighbor libraries in Python : https://github.com/erikbern/ann-benchmarks/tree/main#data-sets

Related efforts

  1. Get early access to the Largest Embedding Marketplace : https://www.embedding.store/
  2. THE ALEXANDRIA INDEX : https://alex.macrocosm.so/download