This repository contains the code for a Streamlit application that uses a hybrid retrieval approach to rank resumes against a provided job description. Try the live app at https://resume-ranking.streamlit.app/
- Hybrid Retrieval: Combines vector embeddings and BM25 similarity for robust search.
- Sentence Transformers: Leverages pre-trained sentence transformers for efficient text representation.
- Streamlit Integration: Provides a user-friendly interface for job description input and ranked resume display.
- Optional Reranking: Implements a sentence transformer-based reranking step for potentially improved results (requires additional model configuration).
- Offline Persistence: Saves the built index to disk for faster loading on subsequent application runs.
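To make the hybrid retrieval idea concrete, the sketch below fuses a vector-based ranking and a BM25 ranking into a single list using reciprocal rank fusion (RRF). This is an illustrative assumption: the repository may combine the two retrievers differently, and the function name `reciprocal_rank_fusion`, the constant `k=60`, and the resume IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked highly by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of the two retrievers, best match first.
vector_ranking = ["resume_b", "resume_a", "resume_c"]
bm25_ranking = ["resume_a", "resume_d", "resume_b"]

fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
print(fused)
```

RRF only needs rank positions, so it avoids having to normalize BM25 scores and cosine similarities onto a common scale. A weighted sum of normalized scores is a reasonable alternative if you want to tune how much each retriever contributes.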
- Python 3.7+
- Streamlit
- llama-index
- faiss
- sentence-transformers
- (Optional) Additional libraries for specific reranking models
- Clone this repository.
- Create a new virtual environment (recommended).
- Navigate to the project directory.
- Install the required dependencies using `pip install -r requirements.txt`.
- Ensure you have the necessary data (resumes) in the `data/resumes` directory.
- Run the application using `streamlit run main.py`.
- Enter a job description in the text area provided.
- Click the "Rank Resumes" button.
- The application will display the top 10 most relevant resumes based on the combined retrieval strategy.
- (Optional) The application will also display the top 10 reranked results if a reranking model is specified.
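The optional reranking step can be pictured as rescoring the retrieved resumes against the job description and keeping the best ones. The sketch below is a minimal illustration, not the application's actual code: `rerank` and `token_overlap_scores` are hypothetical names, and the toy word-overlap scorer stands in for a real model. With sentence-transformers installed, a cross-encoder's `predict` method, which also takes a list of text pairs, could be passed in as the scorer instead.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(
    query: str,
    candidates: Sequence[str],
    score_pairs: Callable[[List[Pair]], Sequence[float]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Rescore (query, candidate) pairs and return the top_k candidates."""
    pairs = [(query, text) for text in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(text, float(score)) for text, score in ranked[:top_k]]


def token_overlap_scores(pairs: List[Pair]) -> List[float]:
    """Toy stand-in scorer: Jaccard overlap between lowercase word sets."""
    scores = []
    for query, text in pairs:
        q, t = set(query.lower().split()), set(text.lower().split())
        scores.append(len(q & t) / len(q | t) if q | t else 0.0)
    return scores


job_description = "python machine learning engineer"
retrieved = [
    "Java backend developer",
    "Machine learning engineer with Python experience",
    "Python developer",
]

reranked = rerank(job_description, retrieved, token_overlap_scores, top_k=2)
for text, score in reranked:
    print(f"{score:.2f}  {text}")
```

Passing the scorer in as a callable keeps the reranking logic independent of any particular model, so a different reranker can be swapped in without touching the sorting code.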
- The code utilizes pre-trained models for sentence embedding and reranking (if enabled). Download the desired models and update the configuration accordingly.
- The current implementation uses a basic text cleaning step. You can customize this process for more advanced text pre-processing based on your data characteristics.
- The application is designed to be a starting point for building a resume ranking system. You can further customize and extend it based on your specific needs.
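As an example of the kind of text cleaning mentioned above, here is a minimal sketch of a pre-processing function. It is an assumption about what a "basic" cleaning step might look like, not the repository's implementation: the name `clean_text` and the specific rules (dropping URLs and email addresses, keeping `+`, `#`, and `.` so that skills such as C++, C#, and .NET survive) are illustrative choices.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Normalize resume or job-description text before indexing."""
    # Fold compatibility characters (e.g. full-width forms, ligatures).
    text = unicodedata.normalize("NFKC", raw)
    # Drop URLs and email addresses, which rarely help relevance.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    text = re.sub(r"\S+@\S+", " ", text)
    # Remove punctuation except characters that appear in skill names.
    text = re.sub(r"[^\w\s+#.]", " ", text)
    # Collapse runs of whitespace, including newlines and tabs.
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()


sample = "  Senior C++ Dev\n\tContact: jane@example.com | https://example.com/cv  "
print(clean_text(sample))
```

Because periods are kept, sentence-final punctuation stays attached to the preceding word. If that hurts BM25 matching on your data, strip trailing periods as an extra step.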