QAnything (Question and Answer based on Anything) is a local knowledge base question-answering system designed to support a wide range of file formats and databases, allowing for offline installation and use.
With QAnything, you can simply drop any locally stored file of any format and receive accurate, fast, and reliable answers.
Currently supported formats include: PDF (pdf), Word (docx), PPT (pptx), XLS (xlsx), Markdown (md), Email (eml), TXT (txt), Image (jpg, jpeg, png), CSV (csv), Web links (html), and more formats coming soon…
- Data security: supports installation and use with the network cable unplugged throughout the whole process.
- Cross-language QA support: freely switch between Chinese and English QA, regardless of the language of the document.
- Massive data QA support: two-stage retrieval ranking solves the degradation problem of large-scale data retrieval; the more data, the better the performance.
- High-performance, production-grade system: directly deployable for enterprise applications.
- User-friendly: no cumbersome configuration; one-click installation and deployment, ready to use.
- Multi knowledge base QA: supports selecting multiple knowledge bases for Q&A.
In scenarios with a large volume of knowledge base data, the advantages of a two-stage approach are very clear. If only first-stage embedding retrieval is used, retrieval quality degrades as the data volume increases, as indicated by the green line in the following graph. With second-stage reranking, however, accuracy increases steadily: the more data, the better the performance.
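As a rough illustration of the two-stage idea described above (not QAnything's actual implementation), the sketch below uses a bag-of-words cosine similarity as a stand-in for the embedding retriever and a word-order-aware bigram overlap as a stand-in for the reranker. All function names and both scoring functions are assumptions made for this example; the real system relies on the BCEmbedding models instead.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def first_stage(query, docs, k):
    """Stage 1: cheap similarity over the whole corpus, keep the top-k candidates."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def bigrams(text):
    words = text.lower().split()
    return set(zip(words, words[1:]))


def rerank_score(query, doc):
    """Toy 'reranker': scores the (query, doc) pair jointly via shared word bigrams."""
    return len(bigrams(query) & bigrams(doc))


def two_stage_search(query, docs, k=3, n=1):
    """Stage 2: rescore only the k candidates from stage 1 and return the best n."""
    candidates = first_stage(query, docs, k)
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:n]


if __name__ == "__main__":
    corpus = [
        "compose docker install",
        "how to install docker compose on linux",
        "cooking pasta at home",
    ]
    print(first_stage("install docker compose", corpus, k=2))
    print(two_stage_search("install docker compose", corpus, k=2, n=1))
```

In this toy corpus the first stage ranks the scrambled three-word document highest because word order is invisible to it, while the second stage promotes the document that contains the query phrase in order. The expensive scorer only runs on the k candidates, which is what keeps the approach affordable as the corpus grows.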
QAnything uses the retrieval component BCEmbedding, which is distinguished by its bilingual and crosslingual proficiency. BCEmbedding excels at bridging the gap between Chinese and English, achieving:
- High performance on Semantic Representation Evaluations in MTEB;
- A new benchmark in RAG Evaluations in LlamaIndex.
Model | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | Avg |
---|---|---|---|---|---|---|---|
bge-base-en-v1.5 | 37.14 | 55.06 | 75.45 | 59.73 | 43.05 | 37.74 | 47.20 |
bge-base-zh-v1.5 | 47.60 | 63.72 | 77.40 | 63.38 | 54.85 | 32.56 | 53.60 |
bge-large-en-v1.5 | 37.15 | 54.09 | 75.00 | 59.24 | 42.68 | 37.32 | 46.82 |
bge-large-zh-v1.5 | 47.54 | 64.73 | 79.14 | 64.19 | 55.88 | 33.26 | 54.21 |
jina-embeddings-v2-base-en | 31.58 | 54.28 | 74.84 | 58.42 | 41.16 | 34.67 | 44.29 |
m3e-base | 46.29 | 63.93 | 71.84 | 64.08 | 52.38 | 37.84 | 53.54 |
m3e-large | 34.85 | 59.74 | 67.69 | 60.07 | 48.99 | 31.62 | 46.78 |
bce-embedding-base_v1 | 57.60 | 65.73 | 74.96 | 69.00 | 57.29 | 38.95 | 59.43 |
- For more evaluation details, please check the Embedding Models Evaluation Summary.
Model | Reranking | Avg |
---|---|---|
bge-reranker-base | 57.78 | 57.78 |
bge-reranker-large | 59.69 | 59.69 |
bce-reranker-base_v1 | 60.06 | 60.06 |
- For more evaluation details, please check the Reranker Models Evaluation Summary.
NOTE:
- In the WithoutReranker setting, our bce-embedding-base_v1 outperforms all the other embedding models.
- With the embedding model fixed, our bce-reranker-base_v1 achieves the best performance.
- The combination of bce-embedding-base_v1 and bce-reranker-base_v1 is SOTA.
- If you want to use embedding and rerank separately, please refer to BCEmbedding.
The open source version of QAnything is based on QwenLM and has been fine-tuned on a large number of professional question-answering datasets, which greatly improves its question-answering ability. If you need to use it for commercial purposes, please follow the license of QwenLM. For more details, please refer to: QwenLM
Star us on GitHub to be notified instantly of new releases!
- Try QAnything Online
- Try read.youdao.com
- Use only our BCEmbedding (embedding & rerank)
- FAQ
- 2024-01-29: Support for custom large models, including the OpenAI API and other open-source large models, with a minimum GPU requirement of GTX 1050Ti, greatly improving deployment, debugging, and user experience. See more: v1.2.0
- 2024-01-23: Enable rerank by default and fix various issues when starting on Windows. See more: v1.1.1
- 2024-01-18: Support one-click startup, support Windows deployment, improve PDF, XLSX, HTML parsing efficiency. See more: v1.1.0
System | Required item | Minimum Requirement | Note |
---|---|---|---|
Linux | NVIDIA GPU Memory | >= 4GB (use OpenAI API) | Minimum: GTX 1050Ti (use OpenAI API); Recommended: RTX 3090 |
Linux | NVIDIA Driver Version | >= 525.105.17 | |
Linux | Docker version | >= 20.10.5 | Docker install |
Linux | docker compose version | >= 2.23.3 | docker compose install |
Linux | git-lfs | | git-lfs install |
System | Required item | Minimum Requirement | Note |
---|---|---|---|
Windows with WSL Ubuntu Subsystem | NVIDIA GPU Memory | >= 4GB (use OpenAI API) | Minimum: GTX 1050Ti (use OpenAI API); Recommended: RTX 3090 |
Windows with WSL Ubuntu Subsystem | GEFORCE EXPERIENCE | >= 546.33 | GEFORCE EXPERIENCE download |
Windows with WSL Ubuntu Subsystem | Docker Desktop | >= 4.26.1 (131620) | Docker Desktop for Windows |
Windows with WSL Ubuntu Subsystem | git-lfs | | git-lfs install |
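Before cloning, it can help to compare the installed tool versions against the Linux minimums listed above. The following is a minimal sketch: the minimum values come from the requirements table, while the helper names (`parse_version`, `meets_minimum`) and the sample version strings are hypothetical and only serve as illustration.

```python
import re

# Minimum versions copied from the Linux requirements table.
MINIMUMS = {
    "nvidia_driver": "525.105.17",
    "docker": "20.10.5",
    "docker_compose": "2.23.3",
}


def parse_version(text):
    """Extract the first dotted version number from text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(found, minimum):
    """Return True if the version in `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


if __name__ == "__main__":
    # Sample strings for illustration; on a real machine they would come from
    # commands such as `docker --version` or `docker compose version`.
    samples = {
        "nvidia_driver": "535.54.03",
        "docker": "Docker version 24.0.7, build afdd53b",
        "docker_compose": "Docker Compose version v2.23.3",
    }
    for name, reported in samples.items():
        status = "ok" if meets_minimum(reported, MINIMUMS[name]) else "too old"
        print(f"{name}: {reported} -> {status}")
```

Comparing tuples of integers rather than raw strings avoids the classic pitfall where "20.9" sorts after "20.10" lexicographically.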
git clone https://github.com/netease-youdao/QAnything.git
- QAnything_Startup_Usage
- Get detailed usage of the LLM interface by running: bash ./run.sh -h
cd QAnything
bash run.sh # Start on GPU 0 by default.
(Note) If automatic download fails, you can manually download the model from one of the three addresses below.
modelscope: https://modelscope.cn/models/netease-youdao/QAnything
wisemodel: https://wisemodel.cn/models/Netease_Youdao/qanything
huggingface: https://huggingface.co/netease-youdao/QAnything
(Optional) Specify GPU startup
cd QAnything
bash ./run.sh -c local -i 0 -b default # gpu id 0
(Optional) Specify GPU startup - Recommended for Windows 10/Windows 11 WSL2 users
# For Windows OS: Need to enter the **WSL2** environment.
# Step 1. Download the public LLM model (e.g., Qwen-7B-QAnything) and save to "/path/to/QAnything/assets/custom_models"
# (Optional) Download Qwen-7B-QAnything from ModelScope: https://www.modelscope.cn/models/netease-youdao/Qwen-7B-QAnything
# (Optional) Download Qwen-7B-QAnything from Huggingface: https://huggingface.co/netease-youdao/Qwen-7B-QAnything
cd QAnything/assets/custom_models
git clone https://huggingface.co/netease-youdao/Qwen-7B-QAnything
# Step 2. Execute the service startup command. Here we use "-b hf" to specify the Huggingface transformers backend.
cd ../../
bash ./run.sh -c local -i 0 -b hf -m Qwen-7B-QAnything -t qwen-7b-qanything
(Optional) Specify GPU startup - Recommended for GPU Compute Capability >= 8.6 and VRAM >= 24GB
# GPU Compute Capability: https://developer.nvidia.com/cuda-gpus
# Step 1. Download the public LLM model (e.g., Qwen-7B-QAnything) and save to "/path/to/QAnything/assets/custom_models"
# (Optional) Download Qwen-7B-QAnything from ModelScope: https://www.modelscope.cn/models/netease-youdao/Qwen-7B-QAnything
# (Optional) Download Qwen-7B-QAnything from Huggingface: https://huggingface.co/netease-youdao/Qwen-7B-QAnything
cd QAnything/assets/custom_models
git clone https://huggingface.co/netease-youdao/Qwen-7B-QAnything
# Step 2. Execute the service startup command. Here we use "-b vllm" to specify the vllm backend.
cd ../../
bash ./run.sh -c local -i 0 -b vllm -m Qwen-7B-QAnything -t qwen-7b-qanything -p 1 -r 0.85
(Optional) Specify multi-GPU startup
cd QAnything
bash ./run.sh -c local -i 0,1 -b default # GPU IDs: 0,1. Please confirm how many GPUs are available; startup supports at most two cards.
After successful installation, you can experience the application by entering the following addresses in your web browser.
- Front end address: http://your_host:5052/qanything/
If you want to access the API, please refer to the following address:
- API address: http://your_host:8777/api/
- For detailed API documentation, please refer to the QAnything API documentation
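The two addresses above differ only in host, port, and path, so they can be derived from a single host name. The sketch below is one way to do that; the ports and paths are taken from the addresses listed above, while `service_urls` and `is_reachable` are hypothetical helper names, and the reachability check only tells you whether something answers on that URL, not whether the service is healthy.

```python
import urllib.error
import urllib.request

FRONTEND_PORT = 5052  # from the front end address above
API_PORT = 8777       # from the API address above


def service_urls(host):
    """Build the front end and API base URLs for a given host."""
    return {
        "frontend": f"http://{host}:{FRONTEND_PORT}/qanything/",
        "api": f"http://{host}:{API_PORT}/api/",
    }


def is_reachable(url, timeout=3.0):
    """Return True if any HTTP response comes back from url within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    for name, url in service_urls("localhost").items():
        print(name, url, "reachable" if is_reachable(url) else "not reachable")
```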
If you want to view the relevant logs, please check the log files in the QAnything/logs/debug_logs directory.
- debug.log
  - User request processing log
- sanic_api.log
  - Backend service running log
- llm_embed_rerank_tritonserver.log (single card deployment)
  - LLM embedding and rerank tritonserver service startup log
- llm_tritonserver.log (multi-card deployment)
  - LLM tritonserver service startup log
- embed_rerank_tritonserver.log (multi-card deployment or use of the OpenAI interface)
  - Embedding and rerank tritonserver service startup log
- rerank_server.log
  - Rerank service running log
- ocr_server.log
  - OCR service running log
- npm_server.log
  - Front-end service running log
- llm_server_entrypoint.log
  - LLM intermediate server running log
- fastchat_logs/*.log
  - FastChat service running log
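When troubleshooting, it is often enough to look at the last few lines of each of the log files listed above. The sketch below collects them in one pass; it assumes the logs are plain UTF-8 text under QAnything/logs/debug_logs, and the helper names `tail` and `tail_all` are hypothetical.

```python
from collections import deque
from pathlib import Path


def tail(path, n=20):
    """Return the last n lines of a text file without loading it all into memory."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


def tail_all(log_dir, n=20):
    """Map each *.log file under log_dir (including subfolders) to its last n lines."""
    root = Path(log_dir)
    return {
        path.relative_to(root).as_posix(): tail(path, n)
        for path in sorted(root.rglob("*.log"))
    }


if __name__ == "__main__":
    # Path taken from the text above; adjust it if the repository lives elsewhere.
    for name, lines in tail_all("QAnything/logs/debug_logs", n=5).items():
        print(f"== {name} ==")
        print("\n".join(lines))
```

Using `rglob` rather than `glob` means files in the fastchat_logs subfolder are picked up as well.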
If you are on Windows 11, you need to enter the WSL environment first.
bash close.sh
Demo videos: multi_paper_qa.mp4, information_extraction.mp4, various_files_qa.mp4, web_qa.mp4
If you need to access the API, please refer to the QAnything API documentation.
We appreciate your interest in contributing to our project. Whether you're fixing a bug, improving an existing feature, or adding something completely new, your contributions are welcome!
Welcome to the QAnything Discord community
Scan the QR code below to join the WeChat group.
If you need to contact our team privately, please reach out to us via the following email:
Reach out to the maintainer at one of the following places:
- GitHub issues
- Contact options listed on this GitHub profile
QAnything is licensed under the Apache 2.0 License.
QAnything adopts dependencies from the following:
- Thanks to our BCEmbedding for the excellent embedding and rerank model.
- Thanks to Qwen for strong base language models.
- Thanks to Triton Inference Server for providing great open source inference serving.
- Thanks to FastChat for providing a fully OpenAI-compatible API server.
- Thanks to FasterTransformer and vllm for highly optimized LLM inference backends.
- Thanks to Langchain for the wonderful LLM application framework.
- Thanks to Langchain-Chatchat for the inspiration provided on local knowledge base Q&A.
- Thanks to Milvus for the excellent semantic search library.
- Thanks to PaddleOCR for its easy-to-use OCR library.
- Thanks to Sanic for the powerful web service framework.