
[Question]: Improving Performance #4673

Open
Vova-Ponomarenko1 opened this issue Jan 28, 2025 · 2 comments
Labels
question Further information is requested

Comments

Vova-Ponomarenko1 commented Jan 28, 2025

Hi, I noticed that during chunk search, RAG doesn’t fully utilize the CPU, which results in prolonged information retrieval.

As a suggestion, you could implement multithreading.

(This becomes noticeable when working with multiple databases containing large volumes of data or small chunk sizes.)

Thank you!

Vova-Ponomarenko1 added the question label Jan 28, 2025
yingfeng (Member) commented:

Hi, if you mean retrieval performance, you can click the bulb icon above the response. If the retrieval latency is too long, say several thousand or even tens of thousands of milliseconds, it might be caused by the reranker; you could try removing the reranker, because it does not improve the ranking much. By default, the reranker runs on the CPU, which is quite slow.

If you mean chunking performance, you can configure more task executors in entrypoint.sh to increase parallelism.
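A rough sketch of what raising the task-executor parallelism in entrypoint.sh could look like. The variable name, loop, and script path below are illustrative assumptions, not the actual file contents; check the entrypoint.sh shipped with your RAGFlow version for the real worker loop before editing it.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: spawn several task-executor workers
# instead of one. WS and the task_executor.py path are assumptions.
WS=4   # number of chunking workers (hypothetically 1 by default)

for ((i = 0; i < WS; i++)); do
  # Each worker gets its own index so jobs can be sharded across them.
  python rag/svr/task_executor.py "$i" &
done
wait   # keep the container's entrypoint alive while workers run
```

More workers help only if the host has spare CPU cores; oversubscribing can slow chunking down instead.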

Regarding multithreading during search: it's not necessary, because in most cases the retrieval itself finishes within a single second. So you first need to figure out which component is actually taking the time.
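To figure out which component is slow, a minimal sketch of per-stage wall-clock timing. The stage functions below (embed_query, vector_search, rerank) are placeholders, not RAGFlow APIs; substitute the real calls from your pipeline.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Measure wall-clock time of the enclosed block and print it.
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed * 1000:.1f} ms")

# Hypothetical pipeline stages; replace with your actual calls.
def embed_query(q):
    time.sleep(0.01)

def vector_search(q):
    time.sleep(0.02)

def rerank(hits):
    time.sleep(0.05)

with timed("embedding"):
    embed_query("example question")
with timed("vector search"):
    vector_search("example question")
with timed("rerank"):
    rerank([])
```

Whichever stage dominates the total is the one worth optimizing; a two-minute "Searching..." usually points at one stage, not at a lack of threads.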

Vova-Ponomarenko1 (Author) commented:

Well, I'm not using a reranker; I just processed the files, connected the database to the assistant, and asked a question, and the "Searching..." step took almost 2 minutes.

P.S. The databases are really large; some files contain up to 45k chunks.
