improve multi doc retrieval #11346
@@ -57,7 +57,7 @@ def get_nodes_from_node(self, node: TextNode) -> List[BaseNode]:
         self.extract_table_summaries(table_elements)
         # convert into nodes
         # will return a list of Nodes and Index Nodes
-        return self.get_nodes_from_elements(elements)
+        return self.get_nodes_from_elements(elements, node.id_)
So it's not really a filename, but this assumes the node id is the file name?
Should we instead just inherit all metadata?
node.id_ is a filename if the node parser is getting documents from local files.
By "inherit all metadata", do you mean we could pass all the metadata from the base nodes down to the inherited nodes? Base nodes sometimes have empty metadata, though. The only useful information on the base node is that id_ is the filename, so that is the only information worth passing to the inherited nodes.
Sorry, maybe we should have a quick call so I can understand this better.
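To make the two options in the exchange above concrete, here is a minimal sketch in plain Python. It does not use the llama_index classes: `SketchNode`, `derive_with_filename`, `derive_with_inherited_metadata`, and the `"filename"` metadata key are all hypothetical names chosen for illustration, and the diff itself only shows that `node.id_` is passed along.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchNode:
    """Hypothetical stand-in for a node; not the llama_index class."""

    id_: str
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def derive_with_filename(base: SketchNode, chunks: List[str]) -> List[SketchNode]:
    # Option A: record only the base node's id_ under an assumed "filename" key.
    return [
        SketchNode(id_=f"{base.id_}_{i}", text=chunk, metadata={"filename": base.id_})
        for i, chunk in enumerate(chunks)
    ]


def derive_with_inherited_metadata(base: SketchNode, chunks: List[str]) -> List[SketchNode]:
    # Option B: copy whatever metadata the base node carries (possibly nothing).
    return [
        SketchNode(id_=f"{base.id_}_{i}", text=chunk, metadata=dict(base.metadata))
        for i, chunk in enumerate(chunks)
    ]


# A base node loaded from a local file, with empty metadata as described above.
base = SketchNode(id_="report.pdf", text="full document text")
option_a = derive_with_filename(base, ["table summary", "paragraph"])
option_b = derive_with_inherited_metadata(base, ["table summary", "paragraph"])
```

With an empty base metadata dict, option B leaves the derived nodes with nothing to filter on, which is the concern raised in the reply. Option A only helps when id_ really is a file name, which is the reviewer's concern.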
llama-index-core/llama_index/core/node_parser/relational/base_element.py
* improve multi doc retrieval * cr * cr * cr * cr
Description
Index the filename into each node's metadata (for the node parser) so that we can filter or retrieve documents by file name.
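As a rough illustration of the filtering this enables, the sketch below selects nodes by file name using plain dictionaries. The `"filename"` key, the `filter_by_filename` helper, and the sample data are assumptions made for this example; they are not taken from the PR or from the llama_index API.

```python
from typing import Any, Dict, List


def filter_by_filename(nodes: List[Dict[str, Any]], filename: str) -> List[Dict[str, Any]]:
    # Keep only the nodes whose metadata records the requested file name.
    # Nodes without metadata or without the key are skipped rather than raising.
    return [
        node for node in nodes
        if node.get("metadata", {}).get("filename") == filename
    ]


# Hypothetical nodes parsed from two local files, plus one with no metadata.
nodes = [
    {"text": "Q1 revenue table", "metadata": {"filename": "q1_report.pdf"}},
    {"text": "Q2 revenue table", "metadata": {"filename": "q2_report.pdf"}},
    {"text": "Q1 summary", "metadata": {"filename": "q1_report.pdf"}},
    {"text": "orphan chunk", "metadata": {}},
]

q1_nodes = filter_by_filename(nodes, "q1_report.pdf")
```

In a multi-document index, a filter like this would narrow retrieval to a single source file before or after similarity search.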
Fixes # (issue)
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Suggested Checklist:
make format; make lint to appease the lint gods