This project builds an email question-answering AI using retrieval-augmented generation (RAG) and the Solar LLM. Thanks to Solar's powerful embedding model, our RAG pipeline substantially improved the accuracy of our answers. We also use self-querying, which draws on metadata to answer queries more precisely.
In addition, we tried de-identifying personal information to prevent it from leaking to the LLM server.
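As a rough illustration of the de-identification idea, here is a minimal sketch that masks email addresses and phone numbers with placeholders before text is sent to a remote LLM, then restores them locally. It is an assumption-based example, not the project's implementation (see `utils/indexing_deidentification.py` for that). The regular expressions and the names `deidentify` and `reidentify` are hypothetical and cover only simple cases.

```python
import re

# Hypothetical patterns for illustration only; real PII detection
# needs much broader coverage (names, addresses, ID numbers, ...).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def deidentify(text: str) -> tuple[str, dict[str, str]]:
    """Mask emails and phone numbers; return masked text and the mapping."""
    mapping: dict[str, str] = {}  # placeholder -> original value
    reverse: dict[str, str] = {}  # original value -> placeholder

    def mask(label: str, pattern: re.Pattern, source: str) -> str:
        def replace(match: re.Match) -> str:
            value = match.group(0)
            if value not in reverse:
                # Reuse one placeholder per distinct value.
                placeholder = f"[{label}_{len(mapping) + 1}]"
                reverse[value] = placeholder
                mapping[placeholder] = value
            return reverse[value]

        return pattern.sub(replace, source)

    masked = mask("EMAIL", EMAIL_RE, text)
    masked = mask("PHONE", PHONE_RE, masked)
    return masked, mapping


def reidentify(text: str, mapping: dict[str, str]) -> str:
    """Restore the original values in text that contains placeholders."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text
```

Only the masked text would leave the machine; the mapping stays local so that placeholders in the model's answer can be swapped back afterwards.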
Python 3.11.2
https://github.com/LimePencil/Email-RAG/blob/main/requirements.txt
pip install -r requirements.txt
Place your email JSON file in the data/graph_rag/ directory.
*Unfortunately, due to privacy concerns, we cannot provide our dataset.
As a result, there may be some discrepancies from our test environment.
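Since the dataset is not included, it can help to confirm that your own file is found and parses before running the rest of the pipeline. The following is a minimal sketch under stated assumptions: the function name `load_email_files` is hypothetical, and it only checks that the files are valid JSON, because the schema the project expects is not documented here.

```python
import json
from pathlib import Path


def load_email_files(directory: str = "data/graph_rag") -> dict:
    """Load every *.json file in `directory`, keyed by file name."""
    folder = Path(directory)
    if not folder.is_dir():
        raise FileNotFoundError(f"Expected a directory at {folder}")

    loaded = {}
    for path in sorted(folder.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            loaded[path.name] = json.load(handle)

    if not loaded:
        raise ValueError(f"No .json files found in {folder}")
    return loaded
```

Calling `load_email_files()` from the repository root raises an error if the directory is missing or empty, and a `json.JSONDecodeError` if a file is malformed.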
https://github.com/LimePencil/Email-RAG/blob/main/utils/db_upload.py
Upload embeddings to Elastic Cloud
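To give a sense of what an upload step like this involves, here is a minimal sketch that turns emails into bulk-index actions carrying an embedding vector. It is not the code in `utils/db_upload.py`. The function name `build_bulk_actions`, the `embed` callable, the index name, and the field names (`id`, `body`, `text`, `embedding`, `metadata`) are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator


def build_bulk_actions(
    emails: Iterable[dict],
    embed: Callable[[str], list[float]],
    index_name: str = "emails",  # hypothetical index name
) -> Iterator[dict]:
    """Yield one bulk-index action per email.

    `embed` stands in for whatever function returns the embedding
    vector for a piece of text (for example, a call to an embedding API).
    """
    for position, email in enumerate(emails):
        body = email["body"]  # assumed field holding the email text
        yield {
            "_index": index_name,
            # Fall back to the position when the email has no id.
            "_id": email.get("id", str(position)),
            "_source": {
                "text": body,
                "embedding": embed(body),
                # Keep the remaining fields as metadata for filtering.
                "metadata": {k: v for k, v in email.items() if k != "body"},
            },
        }
```

Actions in this shape can be passed to `elasticsearch.helpers.bulk(client, actions)` from the official Python client, where `client` is an `Elasticsearch` instance connected to your Elastic Cloud deployment. Storing metadata next to the vector is what allows metadata-based filtering at query time.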
https://github.com/LimePencil/Email-RAG/blob/main/utils/indexing_deidentification.py
- Upstage Document OCR
- Upstage Layout Analyzer
- Embedding: Solar embedding-1-Large
- LLM: Solar-mini-chat
- Groundedness Check: Solar-1-mini-groundedness-check
$ uvicorn main:app --reload