[Feature Request]: Document Retrieval control #115
Comments
Thanks @Jayaprakash8887. I read through the LangChain thread that you linked, and I also looked at LangChain's documentation. We have hybrid search enabled, as shown in this example below -- relevant source code of the retriever service
Here is a reference to the indexer code. I looked at LangChain's source code for indexing and retrieving as well. At the moment we are using LangChain defaults. I am unclear what exactly you want to change as per your point 1 --
Regarding point 2
Regarding point 3
Explaining a use case with MarqoDB as the vector store, as I have yet to try document retrieval with PGVector.

When I ask the query "How can I make my child aware of good touch and bad touch", Marqo fetches the nearest documents containing information about "touch".

When the limit (top_k_docs) is 5, the output below shows that the document that actually discusses good/safe touch and bad/unsafe touch is not retrieved at all. When the limit (top_k_docs) is 20, that document is retrieved, though it may not be the one with the top similarity score.

At present, the retriever code fetches top_k_docs documents by similarity but does not return the scores of those documents. If the scores can also be retrieved, then, as you mentioned, I can implement the filtering of documents by similarity score in the FSM file.

Payload 1:
Output of Payload 1:

Payload 2:
Output of Payload 2:
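The score-based filtering described above could look roughly like the following once scores are available alongside the documents. This is a minimal sketch and not the retriever's actual code: `Doc`, `filter_by_score`, `min_score`, `max_docs`, and the sample scores are all hypothetical. LangChain's vector store interface defines a `similarity_search_with_score` method that returns (document, score) pairs, but whether a higher score means a closer match depends on the store, so the sketch simply assumes that higher is better.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document chunk."""
    text: str


def filter_by_score(
    scored_docs: List[Tuple[Doc, float]],
    min_score: float,
    max_docs: int,
) -> List[Tuple[Doc, float]]:
    """Keep documents scoring at least min_score, best first.

    Assumption: a higher score means a closer match. For a store that
    returns a distance instead, invert the comparison and the sort order.
    """
    kept = [(doc, score) for doc, score in scored_docs if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_docs]


# Made-up candidates, as if fetched with a wide limit (e.g. top_k_docs = 20).
candidates = [
    (Doc("Sense of touch in infants"), 0.81),
    (Doc("Safe touch and unsafe touch"), 0.62),
    (Doc("Touch screens for kids"), 0.40),
]

# Keep everything above the threshold, capped at 5 documents.
selected = filter_by_score(candidates, min_score=0.6, max_docs=5)
```

Fetching a wide candidate set first and then thresholding keeps a relevant document that is not the top match, while still dropping clearly unrelated ones. The threshold value would need tuning against real queries.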
@KaranrajM, as discussed just now, could you please work on this?
@KaranrajM to work on this. Or Arun/Karan to work internally and check if new joiners can take this up.
@KaranrajM and @DevvStrange, any update on this please?
@KaranrajM and @DevvStrange, any update on this please?
Is your feature request related to a problem? Please describe.
As mentioned in the PGVector documentation, the nearest documents are fetched for a given query. However, the fetched documents may not include the intended ones, where the necessary information actually exists, if those documents are not among the nearest.
Describe the solution you'd like
Suggesting this because we have faced the issue of not being able to get the intended document from the VectorDB. We followed the above approach to increase the accuracy of fetching the intended documents.
Note: langchain-ai/langchain#13437
Additional context
No response