Code for the "Build Your Own Search Engine"
What we will do:
- Use FAQ documents from free online courses
- Create a search engine for retrieving these documents
- Later, use the search results in a Q&A RAG system
- Preparing the Environment
- Basics of Text Search
  - Basics of Information Retrieval
  - Introduction to vector spaces, bag of words, and TF-IDF
- Implementing Basic Text Search
  - TF-IDF scoring with sklearn
  - Keyword filtering using pandas
  - Creating a class for relevance search
- Embeddings and Vector Search
  - Vector embeddings
  - Word2Vec and other approaches for word embeddings
  - LSA (Latent Semantic Analysis) for document embeddings
  - Implementing vector search with LSA
  - BERT embeddings
- Combining Text and Vector Search
- Practical Implementation Aspects and Tools
  - Real-world implementation tools:
    - Inverted indexes for text search
    - LSH for vector search (using random projections)
  - Technologies:
    - Lucene/Elasticsearch for text search
    - FAISS and other vector databases
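The "Implementing Basic Text Search" steps (TF-IDF scoring with sklearn, keyword filtering using pandas, and a class for relevance search) can be sketched together as follows. This is a minimal sketch, not the workshop's actual code. It assumes FAQ records are dicts with hypothetical `course`, `question`, and `text` fields, and the class name `TextSearch` and its `filters`/`boost` parameters are illustrative choices. The toy documents are made up.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel


class TextSearch:
    """Hypothetical relevance-search class: one TF-IDF index per text field."""

    def __init__(self, text_fields):
        self.text_fields = text_fields
        self.vectorizers = {}
        self.matrices = {}
        self.df = None

    def fit(self, records, vectorizer_params=None):
        params = vectorizer_params or {}
        self.df = pd.DataFrame(records)
        for field in self.text_fields:
            vectorizer = TfidfVectorizer(**params)
            # Rows are documents, columns are terms weighted by TF-IDF.
            self.matrices[field] = vectorizer.fit_transform(self.df[field])
            self.vectorizers[field] = vectorizer
        return self

    def search(self, query, filters=None, boost=None, num_results=5):
        filters = filters or {}
        boost = boost or {}
        score = np.zeros(len(self.df))
        for field in self.text_fields:
            q = self.vectorizers[field].transform([query])
            # TfidfVectorizer L2-normalizes rows, so this dot product is cosine similarity.
            field_score = linear_kernel(self.matrices[field], q).flatten()
            score += boost.get(field, 1.0) * field_score
        # Keyword filtering with pandas: zero out rows whose field != value.
        for field, value in filters.items():
            mask = (self.df[field] == value).to_numpy()
            score = score * mask
        # Highest scores first; drop documents that did not match at all.
        idx = [i for i in np.argsort(-score)[:num_results] if score[i] > 0]
        return self.df.iloc[idx].to_dict(orient="records")


# Hypothetical toy FAQ records, for illustration only.
docs = [
    {"course": "course-a", "question": "How do I install Docker?",
     "text": "Install Docker Desktop and check with docker run hello-world."},
    {"course": "course-a", "question": "Can I join late?",
     "text": "Yes, you can still submit homework after the start date."},
    {"course": "course-b", "question": "How do I install Python?",
     "text": "Use Anaconda or pyenv to install Python."},
]

index = TextSearch(text_fields=["question", "text"]).fit(docs)
results = index.search("install docker", filters={"course": "course-a"},
                       boost={"question": 3.0})
```

Summing per-field scores with a `boost` weight is one simple way to let a match in the question count more than a match in the answer text. Filtering by multiplying with a boolean mask keeps the ranking logic in NumPy while pandas handles the exact-match comparison.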
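The "LSA for document embeddings" and "Implementing vector search with LSA" items can be illustrated with scikit-learn's `TruncatedSVD` applied to a TF-IDF matrix, which is the usual way to compute LSA in sklearn. This is a sketch under assumptions: the four-document corpus is made up, and `n_components=2` is chosen only because the corpus is tiny (real setups typically use far more dimensions). The function name `vector_search` is hypothetical.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

# Hypothetical toy corpus: two Docker entries and two Python entries.
documents = [
    "docker container image",
    "docker container run",
    "python package install",
    "python package pip",
]

# Step 1: sparse TF-IDF document-term matrix.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)

# Step 2: LSA = truncated SVD of the TF-IDF matrix. Each document becomes a
# dense vector; rows are normalized so a dot product equals cosine similarity.
svd = TruncatedSVD(n_components=2, random_state=1)
doc_embeddings = normalize(svd.fit_transform(X))


def vector_search(query, k=2):
    """Return indices of the k documents closest to the query in LSA space."""
    query_embedding = normalize(svd.transform(vectorizer.transform([query])))
    scores = doc_embeddings @ query_embedding[0]
    return np.argsort(-scores)[:k]


top = vector_search("docker container")
```

The query goes through the same two steps as the documents (TF-IDF, then the fitted SVD projection), so documents and query live in the same dense space.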
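For "Combining Text and Vector Search", one common approach is a weighted sum of the keyword (TF-IDF) score and the vector (LSA) score. The sketch below assumes that approach. The mixing weight `alpha`, the function name `hybrid_search`, and the toy corpus are illustrative assumptions, not necessarily how the workshop combines the two.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

# Hypothetical toy corpus.
documents = [
    "docker container image",
    "docker container run",
    "python package install",
    "python package pip",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)  # rows are L2-normalized TF-IDF vectors
svd = TruncatedSVD(n_components=2, random_state=1)
doc_embeddings = normalize(svd.fit_transform(X))  # LSA embeddings, unit length


def hybrid_search(query, alpha=0.5, k=2):
    """Rank documents by alpha * text score + (1 - alpha) * vector score."""
    q_tfidf = vectorizer.transform([query])
    # Keyword side: cosine similarity in the sparse TF-IDF space.
    text_scores = (X @ q_tfidf.T).toarray().ravel()
    # Vector side: cosine similarity in the dense LSA space.
    q_emb = normalize(svd.transform(q_tfidf))
    vector_scores = doc_embeddings @ q_emb[0]
    scores = alpha * text_scores + (1 - alpha) * vector_scores
    return np.argsort(-scores)[:k]
```

In this toy corpus the LSA scores for the two Docker entries are essentially equal, so for a query like `"docker image"` the TF-IDF term is what ranks the entry mentioning "image" first. In practice the two score types can have different ranges (LSA cosines can be negative), so `alpha` usually needs tuning or the scores need normalizing first.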
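The "Inverted indexes for text search" item refers to the data structure behind engines such as Lucene: a map from each term to the set of documents containing it. A minimal sketch in plain Python, assuming a simple lowercase word tokenizer and boolean AND queries (the helper names `tokenize`, `build_inverted_index`, and `search_and` are hypothetical):

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on word characters (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document ids (postings) that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_and(index, query):
    """Return ids of documents that contain every query term."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    # Intersecting postings only touches documents sharing a query term,
    # instead of scanning the whole collection.
    return set.intersection(*postings)


# Hypothetical toy corpus.
documents = [
    "docker container image",
    "docker container run",
    "python package install",
    "python package pip",
]
index = build_inverted_index(documents)
```

Production indexes store sorted postings lists with term frequencies and positions so they can also compute scores like TF-IDF or BM25 and answer phrase queries; sets are used here only to keep the idea visible.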
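The "LSH for vector search (using random projections)" item can be sketched as follows: draw random hyperplanes, record on which side of each hyperplane a vector falls, and use that bit pattern as a bucket key, so vectors with a small angle between them tend to share a bucket. This is a minimal single-table sketch assuming NumPy. The class name `RandomProjectionLSH` and its `n_bits`/`seed` parameters are hypothetical.

```python
from collections import defaultdict

import numpy as np


class RandomProjectionLSH:
    """Hypothetical single-table LSH index for cosine similarity."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        # Each row is the normal vector of one random hyperplane.
        self.planes = rng.normal(size=(n_bits, dim))
        self.buckets = defaultdict(list)

    def _hash(self, v):
        # One bit per hyperplane: which side of it the vector lies on.
        bits = (self.planes @ v) >= 0
        return tuple(bits.tolist())

    def add(self, item_id, v):
        self.buckets[self._hash(v)].append(item_id)

    def query(self, v):
        """Return candidate ids from the query's bucket (to be re-ranked exactly)."""
        return list(self.buckets.get(self._hash(v), []))


rng = np.random.default_rng(42)
v = rng.normal(size=16)
lsh = RandomProjectionLSH(dim=16, n_bits=8)
lsh.add("doc-0", v)
```

Because the hash depends only on the direction of a vector, `lsh.query(3.0 * v)` lands in the same bucket as `v`, while `-v` flips every bit. More bits give smaller, more precise buckets but miss more true neighbours; libraries usually compensate by using several hash tables and re-ranking the candidates with exact cosine similarity.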