This is an assignment completed as part of onboarding to the SearchAssist team. Refer to this for full details of the requirements.
- Python: 3.9.7
- Flask: 3.0.3
- Celery: 5.4.0 (opalescent)
- Elasticsearch: 8.15.0
- RabbitMQ: 3.8.2
- MongoDB: 4.4.29 (shell)
- Insert JSON into Elasticsearch: http://localhost:5000/insert
- Retrieve the status of a task: http://localhost:5000/get_status/<task_id>
- Delete a record: http://localhost:5000/delete?index=&field=&value=
- Return brief stats (the number of unique genres, actors, and directors in the entire dataset): http://localhost:5000/stats?index=movies
- Run an exact-match search on any field: http://localhost:5000/exact_search?index=&field=&value=
- Insert JSON with embeddings into Elasticsearch: http://localhost:5000/insert_emb
- Answer a question based on the ingested data: http://localhost:5000/vector_search?index=&size=&query=
Embeddings are computed with bge-m3, and answers are generated with gpt-4o-mini-2024-07-18.
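The endpoints listed above can be exercised from a small client. The following is a minimal sketch using only the standard library, not part of the project itself. It assumes the query endpoints accept GET requests and that `get_status` returns a JSON object with a `status` key holding a Celery state name; neither is confirmed by this README, so adjust to match the actual responses. The helper names (`build_url`, `get_json`, `wait_for_task`) are hypothetical.

```python
import json
import time
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:5000"  # host and port from the endpoint list


def build_url(path, **params):
    """Return BASE_URL/path, with params encoded as a query string."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return f"{url}?{urlencode(params)}" if params else url


def get_json(url, timeout=10):
    """Send a GET request and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_task(task_id, fetch=get_json, interval=1.0, max_polls=30):
    """Poll /get_status/<task_id> until the task is no longer in progress.

    Assumption: the response is a JSON object with a "status" key that
    uses Celery state names (PENDING, STARTED, RETRY, SUCCESS, FAILURE).
    """
    url = build_url(f"get_status/{quote(task_id, safe='')}")
    for _ in range(max_polls):
        result = fetch(url)
        if result.get("status") not in ("PENDING", "STARTED", "RETRY"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still pending after {max_polls} polls")
```

For example, `get_json(build_url("stats", index="movies"))` requests the stats endpoint for the `movies` index. The `fetch` parameter lets the polling loop be tested without a running server.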
Install RabbitMQ, Elasticsearch, and MongoDB.
Create a virtual environment and install the dependencies from the requirements.txt file.
pip install -r requirements.txt
sudo systemctl enable rabbitmq-server
sudo systemctl start rabbitmq-server
./bin/elasticsearch
celery -A tasks worker --loglevel=info --concurrency=2
python receive.py
python run.py
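Before starting the worker and the Flask app, it can help to confirm that the backing services are reachable. The sketch below is a hypothetical preflight check, not part of the project. It assumes each service runs on localhost with its default port (5672 for RabbitMQ, 9200 for Elasticsearch, 27017 for MongoDB); change `SERVICES` if your installation differs.

```python
import socket

# Default ports; an assumption about the local installation.
SERVICES = {
    "RabbitMQ (AMQP)": ("localhost", 5672),
    "Elasticsearch (HTTP)": ("localhost", 9200),
    "MongoDB": ("localhost", 27017),
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'up' if ok else 'DOWN'}")
```

An open port only shows that something is listening; it does not verify that the service is healthy or correctly configured.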