Hi, if you mean the retrieval performance, you can click the bulb icon above the response answer to check it. If the retrieval latency is too long, say several thousand or even tens of thousands of milliseconds, it might be caused by the reranker; you could try removing the reranker, because it does not help the ranking much. By default, the reranker runs on CPU, which is pretty slow.
If you mean the chunking performance, you can configure more task executors in entrypoint.sh to increase the parallelism.
Regarding multithreading during search, it is not necessary, because in most cases the retrieval itself should finish within a second. So you first need to figure out which component is actually taking the time.
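One rough way to find the slow stage, assuming you can invoke the pipeline stages yourself, is to time each one separately. The stage functions below (`embed_query`, `search_chunks`, `rerank`) are dummy stand-ins, not the actual API; swap in the real calls from your pipeline:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print the wall-clock time spent inside the block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.2f}s")

# Dummy stand-ins so the sketch runs as-is; replace with the real pipeline calls.
def embed_query(q):            return [0.0] * 768
def search_chunks(vec, top_k): return ["chunk"] * top_k
def rerank(q, chunks):         return chunks

question = "example question"
with timed("embedding"):
    vec = embed_query(question)
with timed("vector search"):
    chunks = search_chunks(vec, top_k=8)
with timed("rerank"):
    ranked = rerank(question, chunks)
```

Whichever label prints the largest number is the component worth tuning first.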
Well, I’m not using a reranker; I just processed the files, connected the database to the assistant, and asked a question, and the "Searching..." step alone took almost 2 minutes.
P.S. The databases are really large; some files contain up to 45k chunks.
Hi, I noticed that during chunk search, RAG doesn’t fully utilize the CPU, which results in prolonged information retrieval.
As a suggestion, you could implement multithreading; a rough sketch of what I mean is at the end of this post.
(This becomes noticeable when working with multiple databases containing large volumes of data or small chunk sizes.)
Thank you!
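For illustration, here is roughly what I have in mind, assuming the searches against the different databases are independent: the same query could be fanned out across the knowledge bases with a thread pool instead of searching them one by one. The `search_kb` function and the knowledge-base names are placeholders, not the actual API:

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Placeholder for the real per-database retrieval call.
def search_kb(kb_name, query, top_k=8):
    time.sleep(1.0)  # simulate a slow single-database search
    return [f"{kb_name}: chunk {i}" for i in range(top_k)]

def search_all(kb_names, query, top_k=8):
    # Fan the query out to every knowledge base concurrently,
    # then merge the per-database results into one candidate list.
    with ThreadPoolExecutor(max_workers=len(kb_names)) as pool:
        futures = [pool.submit(search_kb, name, query, top_k) for name in kb_names]
        results = []
        for fut in futures:
            results.extend(fut.result())
    return results

if __name__ == "__main__":
    start = time.perf_counter()
    chunks = search_all(["kb_a", "kb_b", "kb_c"], "example question")
    print(f"{len(chunks)} chunks in {time.perf_counter() - start:.2f}s")
```

With the simulated 1-second delay above, the three databases are searched in about one second total instead of three.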