After I retrieve the desired node using QueryFusionRetriever, how do I retrieve the previous and next nodes of the node? #14806
Comments
To retrieve the previous and next nodes of a node obtained using QueryFusionRetriever, you can use the get_forward_nodes and get_backward_nodes functions:

    from llama_index.core.postprocessor.node import (
        get_backward_nodes,
        get_forward_nodes,
    )

    def retrieve_prev_next_nodes(node_with_score, num_nodes, docstore):
        # Retrieve next nodes
        next_nodes = get_forward_nodes(node_with_score, num_nodes, docstore)
        # Retrieve previous nodes
        prev_nodes = get_backward_nodes(node_with_score, num_nodes, docstore)
        return prev_nodes, next_nodes

    # Example usage
    # Assuming you have a node_with_score object, num_nodes, and a docstore instance
    prev_nodes, next_nodes = retrieve_prev_next_nodes(node_with_score, num_nodes, docstore)

This code snippet directly calls the get_forward_nodes and get_backward_nodes functions.
Is there no better way? Because I am using mixed retrieval, there will be multiple docstores. Is there a way to avoid using a docstore?
No, there is no way to retrieve the previous and next nodes without using a docstore when using mixed retrieval.
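With several docstores in play, one option is to try each store in turn until one of them recognizes the node ID. The sketch below only illustrates that idea: `DictDocstore` is a hypothetical stand-in that mimics a `get_document(node_id, raise_error=False)` style lookup, and `find_node` is an illustrative helper name, not a LlamaIndex function.

```python
from typing import Dict, Iterable, Optional


class DictDocstore:
    """Hypothetical stand-in for a docstore: maps node IDs to node text."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self._docs = dict(docs)

    def get_document(self, node_id: str, raise_error: bool = True) -> Optional[str]:
        doc = self._docs.get(node_id)
        if doc is None and raise_error:
            raise ValueError(f"doc_id {node_id} not found.")
        return doc


def find_node(node_id: str, docstores: Iterable[DictDocstore]) -> Optional[str]:
    """Return the first match for node_id across several docstores, else None."""
    for store in docstores:
        doc = store.get_document(node_id, raise_error=False)
        if doc is not None:
            return doc
    return None


# Two stores, one per underlying index (contents are made up for illustration)
first_store = DictDocstore({"a1": "chunk from the first index"})
second_store = DictDocstore({"b1": "chunk from the second index"})
print(find_node("b1", [first_store, second_store]))
```

The same loop shape should carry over to real docstores, as long as each one can report a missing ID without raising.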
I see that BaseNode has prev_node and next_node. How do I use them?
To retrieve the previous and next nodes of a node obtained using QueryFusionRetriever, you can access its prev_node and next_node attributes:

    # Assuming you have a node object from QueryFusionRetriever
    current_node = node_with_score.node

    # Retrieve the previous node
    previous_node = current_node.prev_node

    # Retrieve the next node
    next_node = current_node.next_node

    # Check if previous_node and next_node are not None before accessing their attributes
    if previous_node is not None:
        print(f"Previous Node ID: {previous_node.node_id}")
        # Access other attributes of previous_node as needed

    if next_node is not None:
        print(f"Next Node ID: {next_node.node_id}")
        # Access other attributes of next_node as needed

This code snippet directly accesses the prev_node and next_node attributes of the current node.
prev_node and next_node return a RelatedNodeInfo object. How do I get the body text of previous_node?
To get the body text of previous_node, look up the node in the docstore using the node_id stored in the RelatedNodeInfo object:

    # Assuming you have a node object from QueryFusionRetriever
    current_node = node_with_score.node

    # Retrieve the previous node info (a RelatedNodeInfo object, or None)
    previous_node_info = current_node.prev_node

    # Function to get the body text of a node using its node_id
    def get_node_body_text(node_id, docstore):
        # raise_error=False returns None for a missing ID instead of raising
        node = docstore.get_document(node_id, raise_error=False)
        if node is not None:
            return node.get_content()
        else:
            return "Node not found."

    # Check if previous_node_info is not None before accessing its attributes
    if previous_node_info is not None:
        previous_node_body_text = get_node_body_text(previous_node_info.node_id, docstore)
        print(f"Previous Node Body Text: {previous_node_body_text}")
    else:
        print("No previous node found.")

This code snippet retrieves the previous node from the docstore by its node_id and prints its content.
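The same lookup can be repeated to walk several neighbors in each direction. Below is a minimal sketch of that walk, assuming each node records the IDs of its previous and next neighbors. `StubNode`, the plain-dict store, and `collect_neighbor_texts` are hypothetical stand-ins used to keep the example self-contained; they are not LlamaIndex classes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StubNode:
    """Hypothetical stand-in for a text node with prev/next relationship IDs."""

    node_id: str
    text: str
    prev_id: Optional[str] = None
    next_id: Optional[str] = None


def collect_neighbor_texts(
    start_id: str, store: Dict[str, StubNode], num_nodes: int = 1
) -> Dict[str, List[str]]:
    """Follow prev/next IDs up to num_nodes steps in each direction."""
    result: Dict[str, List[str]] = {"previous": [], "next": []}
    for key, attr in (("previous", "prev_id"), ("next", "next_id")):
        current = store.get(start_id)
        for _ in range(num_nodes):
            neighbor_id = getattr(current, attr) if current else None
            if neighbor_id is None or neighbor_id not in store:
                break  # chain ends or the neighbor is missing from the store
            current = store[neighbor_id]
            result[key].append(current.text)
    # Previous texts were gathered walking backwards; restore reading order.
    result["previous"].reverse()
    return result


store = {
    "n1": StubNode("n1", "first chunk", next_id="n2"),
    "n2": StubNode("n2", "second chunk", prev_id="n1", next_id="n3"),
    "n3": StubNode("n3", "third chunk", prev_id="n2"),
}
print(collect_neighbor_texts("n2", store))
```

Stopping as soon as an ID is absent from the store keeps the walk safe when only part of a document was indexed.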
Is there a better way to get the text of the previous node and the next node?
To get the text of the previous node and the next node, you can use the following JavaScript DOM-traversal methods, _getNextNode and _getPreviousNode. Here is an example of how you can implement these methods:

    /**
     * Traverse the DOM from node to node, starting at the node passed in.
     * Pass true for the second parameter to indicate this node itself
     * (and its kids) are going away, and we want the next node over.
     *
     * Calling this in a loop will traverse the DOM depth-first.
     */
    _getNextNode: function (node, ignoreSelfAndKids) {
        // First check for kids if those aren't being ignored
        if (!ignoreSelfAndKids && node.firstElementChild) {
            return node.firstElementChild;
        }
        // Then for siblings...
        if (node.nextElementSibling) {
            return node.nextElementSibling;
        }
        // And finally, move up the parent chain *and* find a sibling
        // (because this is depth-first traversal, we will have already
        // seen the parent nodes themselves).
        do {
            node = node.parentNode;
        } while (node && !node.nextElementSibling);
        return node && node.nextElementSibling;
    },

    /**
     * Traverse the DOM from node to node, starting at the node passed in.
     * Pass true for the second parameter to indicate this node itself
     * (and its kids) are going away, and we want the previous node over.
     *
     * Calling this in a loop will traverse the DOM depth-first.
     */
    _getPreviousNode: function (node, ignoreSelfAndKids) {
        // First check for kids if those aren't being ignored
        if (!ignoreSelfAndKids && node.lastElementChild) {
            return node.lastElementChild;
        }
        // Then for siblings...
        if (node.previousElementSibling) {
            return node.previousElementSibling;
        }
        // And finally, move up the parent chain *and* find a sibling
        // (because this is depth-first traversal, we will have already
        // seen the parent nodes themselves).
        do {
            node = node.parentNode;
        } while (node && !node.previousElementSibling);
        return node && node.previousElementSibling;
    },

    /**
     * Get the text content of the previous and next nodes.
     * @param {Node} node The current node.
     * @return {Object} An object containing the text of the previous and next nodes.
     */
    getAdjacentNodeTexts: function (node) {
        const previousNode = this._getPreviousNode(node, true);
        const nextNode = this._getNextNode(node, true);
        return {
            previousText: previousNode ? previousNode.textContent : null,
            nextText: nextNode ? nextNode.textContent : null,
        };
    }

Alternatively, if you are working with HTML documents, you can use the HTMLNodeParser class:

    class HTMLNodeParser(NodeParser):
        """HTML node parser.

        Splits a document into Nodes using custom HTML splitting logic.

        Args:
            include_metadata (bool): whether to include metadata in nodes
            include_prev_next_rel (bool): whether to include prev/next relationships
        """

        tags: List[str] = Field(
            default=DEFAULT_TAGS, description="HTML tags to extract text from."
        )

        @classmethod
        def from_defaults(
            cls,
            include_metadata: bool = True,
            include_prev_next_rel: bool = True,
            callback_manager: Optional[CallbackManager] = None,
            tags: Optional[List[str]] = DEFAULT_TAGS,
        ) -> "HTMLNodeParser":
            callback_manager = callback_manager or CallbackManager([])
            return cls(
                include_metadata=include_metadata,
                include_prev_next_rel=include_prev_next_rel,
                callback_manager=callback_manager,
                tags=tags,
            )

        @classmethod
        def class_name(cls) -> str:
            """Get class name."""
            return "HTMLNodeParser"

        def _parse_nodes(
            self,
            nodes: Sequence[BaseNode],
            show_progress: bool = False,
            **kwargs: Any,
        ) -> List[BaseNode]:
            all_nodes: List[BaseNode] = []
            nodes_with_progress = get_tqdm_iterable(nodes, show_progress, "Parsing nodes")
            for node in nodes_with_progress:
                nodes = self.get_nodes_from_node(node)
                all_nodes.extend(nodes)
            return all_nodes

        def get_nodes_from_node(self, node: BaseNode) -> List[TextNode]:
            """Get nodes from document."""
            try:
                from bs4 import BeautifulSoup
            except ImportError:
                raise ImportError("bs4 is required to read HTML files.")

            text = node.get_content(metadata_mode=MetadataMode.NONE)
            soup = BeautifulSoup(text, "html.parser")
            html_nodes = []
            last_tag = None
            current_section = ""

            tags = soup.find_all(self.tags)
            for tag in tags:
                tag_text = self._extract_text_from_tag(tag)
                if tag.name == last_tag or last_tag is None:
                    last_tag = tag.name
                    current_section += f"{tag_text.strip()}\n"
                else:
                    html_nodes.append(
                        self._build_node_from_split(
                            current_section.strip(), node, {"tag": last_tag}
                        )
                    )
                    last_tag = tag.name
                    current_section = f"{tag_text}\n"

            if current_section:
                html_nodes.append(
                    self._build_node_from_split(
                        current_section.strip(), node, {"tag": last_tag}
                    )
                )
            return html_nodes

        def _extract_text_from_tag(self, tag: "Tag") -> str:
            from bs4 import NavigableString

            texts = []
            for elem in tag.children:
                if isinstance(elem, NavigableString):
                    if elem.strip():
                        texts.append(elem.strip())
                elif elem.name in self.tags:
                    continue
                else:
                    texts.append(elem.get_text().strip())
            return "\n".join(texts)

        def _build_node_from_split(
            self,
            text_split: str,
            node: BaseNode,
            metadata: dict,
        ) -> TextNode:
            """Build node from single text split."""
            node = build_nodes_from_splits([text_split], node, id_func=self.id_func)[0]
            if self.include_metadata:
                node.metadata = {**node.metadata, **metadata}
            return node

This class can be used to parse HTML documents into nodes and include previous/next relationships, making it easier to navigate and process document nodes [2][3].
Do you know how to define a postprocessor for RecursiveRetriever + QueryFusionRetriever that retrieves both the previous node and the next node, without using RetrieverQueryEngine?
To define a postprocessor that retrieves both the previous node and the next node when using RecursiveRetriever with QueryFusionRetriever, you can use PrevNextNodePostprocessor. Here is an example of how you can define and use this postprocessor:

    from llama_index.core.postprocessor import PrevNextNodePostprocessor
    from llama_index.core.storage.docstore import SimpleDocumentStore

    # Initialize your document store (it must contain the nodes you indexed,
    # for example the docstore attached to your index)
    docstore = SimpleDocumentStore()

    # Define the postprocessor
    postprocessor = PrevNextNodePostprocessor(docstore=docstore, num_nodes=1, mode="both")

    # Assuming you have a list of nodes retrieved by RecursiveRetriever or QueryFusionRetriever
    retrieved_nodes = [...]  # Replace with your actual list of nodes

    # Postprocess the nodes to get both previous and next nodes
    processed_nodes = postprocessor.postprocess_nodes(retrieved_nodes)

    # Extract the body text of these nodes
    body_texts = [node.node.get_content() for node in processed_nodes]

    # Now `body_texts` contains the body text of the retrieved nodes, including both previous and next nodes
Can you put RecursiveRetriever in the CustomRetriever and implement the postprocessor setting in _retrieve()?
To integrate RecursiveRetriever into a CustomRetriever and apply the postprocessor inside _retrieve(), you can use the following:

    from typing import List, Optional

    from llama_index.core.base.base_retriever import BaseRetriever
    from llama_index.core.postprocessor import PrevNextNodePostprocessor
    from llama_index.core.postprocessor.types import BaseNodePostprocessor
    from llama_index.core.retrievers import RecursiveRetriever
    from llama_index.core.schema import NodeWithScore, QueryBundle
    from llama_index.core.storage.docstore import SimpleDocumentStore

    class CustomRetriever(BaseRetriever):
        def __init__(self, retriever: RecursiveRetriever, node_postprocessors: Optional[List[BaseNodePostprocessor]] = None):
            # Note: BaseRetriever.__init__ is not called here
            self._retriever = retriever
            self._node_postprocessors = node_postprocessors or []

        def _apply_node_postprocessors(self, nodes: List[NodeWithScore], query_bundle: QueryBundle) -> List[NodeWithScore]:
            for node_postprocessor in self._node_postprocessors:
                nodes = node_postprocessor.postprocess_nodes(nodes, query_bundle=query_bundle)
            return nodes

        def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
            nodes = self._retriever.retrieve(query_bundle)
            return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)

        async def _aretrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
            nodes = await self._retriever.aretrieve(query_bundle)
            return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)

    # Example usage
    # Initialize your document store (it must contain the nodes you indexed)
    docstore = SimpleDocumentStore()

    # Define the postprocessor
    postprocessor = PrevNextNodePostprocessor(docstore=docstore, num_nodes=1, mode="both")

    # Configure the RecursiveRetriever (assuming you have an index object)
    retriever = RecursiveRetriever(
        "vector",
        retriever_dict={"vector": index.as_retriever(similarity_top_k=10)},
    )

    # Create the CustomRetriever with the RecursiveRetriever and postprocessor
    custom_retriever = CustomRetriever(retriever=retriever, node_postprocessors=[postprocessor])

    # Create a QueryBundle
    query_bundle = QueryBundle(query_str="What did the author do growing up?")

    # Retrieve nodes
    nodes = custom_retriever._retrieve(query_bundle)

    # Print the body text of the retrieved nodes
    for node_with_score in nodes:
        print(node_with_score.node.get_content())
A problem occurred: QueryFusionRetriever raises an error when CustomRetriever is used.
To resolve the error, make sure CustomRetriever calls the BaseRetriever initializer. Here is an example of how you can modify the CustomRetriever class:

    from typing import Dict, List, Optional

    from llama_index.core.base.base_retriever import BaseRetriever
    from llama_index.core.postprocessor import PrevNextNodePostprocessor
    from llama_index.core.postprocessor.types import BaseNodePostprocessor
    from llama_index.core.retrievers import RecursiveRetriever
    from llama_index.core.schema import NodeWithScore, QueryBundle
    from llama_index.core.storage.docstore import SimpleDocumentStore

    class CustomRetriever(BaseRetriever):
        def __init__(self, retriever: RecursiveRetriever, node_postprocessors: Optional[List[BaseNodePostprocessor]] = None, object_map: Optional[Dict] = None):
            # Let BaseRetriever set up its own attributes first
            super().__init__(object_map=object_map)
            self._retriever = retriever
            self._node_postprocessors = node_postprocessors or []

        def _apply_node_postprocessors(self, nodes: List[NodeWithScore], query_bundle: QueryBundle) -> List[NodeWithScore]:
            for node_postprocessor in self._node_postprocessors:
                nodes = node_postprocessor.postprocess_nodes(nodes, query_bundle=query_bundle)
            return nodes

        def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
            nodes = self._retriever.retrieve(query_bundle)
            return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)

        async def _aretrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
            nodes = await self._retriever.aretrieve(query_bundle)
            return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)

    # Example usage
    # Initialize your document store (it must contain the nodes you indexed)
    docstore = SimpleDocumentStore()

    # Define the postprocessor
    postprocessor = PrevNextNodePostprocessor(docstore=docstore, num_nodes=1, mode="both")

    # Configure the RecursiveRetriever (assuming you have an index object)
    retriever = RecursiveRetriever(
        "vector",
        retriever_dict={"vector": index.as_retriever(similarity_top_k=10)},
    )

    # Create the CustomRetriever with the RecursiveRetriever and postprocessor
    custom_retriever = CustomRetriever(retriever=retriever, node_postprocessors=[postprocessor], object_map={})

    # Create a QueryBundle
    query_bundle = QueryBundle(query_str="What did the author do growing up?")

    # Retrieve nodes through the public entry point
    nodes = custom_retriever.retrieve(query_bundle)

    # Print the body text of the retrieved nodes
    for node_with_score in nodes:
        print(node_with_score.node.get_content())

This modification ensures that the BaseRetriever initializer runs, so attributes such as object_map are set before the retriever is used.
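The effect of the super().__init__() call can be reproduced in plain Python. The sketch below uses hypothetical stand-in classes (`BaseStub`, `BrokenChild`, `FixedChild`), not LlamaIndex classes. It assumes only that the base class creates attributes in its initializer and that its public method later reads them.

```python
from typing import Dict, Optional


class BaseStub:
    """Hypothetical stand-in for a base class that sets state in __init__."""

    def __init__(self, object_map: Optional[Dict] = None) -> None:
        self.object_map = object_map or {}

    def retrieve(self, query: str) -> str:
        # The public method relies on attributes created by __init__.
        return f"{query} (objects: {len(self.object_map)})"


class BrokenChild(BaseStub):
    def __init__(self) -> None:
        # Base initializer is skipped, so self.object_map never exists.
        self.name = "broken"


class FixedChild(BaseStub):
    def __init__(self, object_map: Optional[Dict] = None) -> None:
        super().__init__(object_map=object_map)
        self.name = "fixed"


try:
    BrokenChild().retrieve("q")
except AttributeError as exc:
    print("broken:", exc)

print("fixed:", FixedChild().retrieve("q"))
```

Calling a private method directly can hide this kind of problem, because only the public method touches the attributes that the base initializer sets.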
How do I obtain the previous and subsequent nodes after using the QueryFusionRetriever search?
To obtain the previous and subsequent nodes after using the QueryFusionRetriever search, you can run PrevNextNodePostprocessor on the results:

    from llama_index.core.postprocessor import PrevNextNodePostprocessor
    from llama_index.core.storage.docstore import SimpleDocumentStore

    # Initialize your document store (it must contain the nodes you indexed)
    docstore = SimpleDocumentStore()

    # `results` is the output of the QueryFusionRetriever search;
    # `retriever` is assumed to be your QueryFusionRetriever instance
    results = retriever.retrieve("What did the author do growing up?")

    # Initialize the postprocessor
    postprocessor = PrevNextNodePostprocessor(docstore=docstore, num_nodes=1, mode="both")

    # Postprocess the nodes to get previous and next nodes
    processed_nodes = postprocessor.postprocess_nodes(results)

    # Function to find previous and subsequent nodes
    def get_prev_and_next_nodes(nodes, target_node):
        prev_node = None
        next_node = None
        for i, node in enumerate(nodes):
            # Compare by node ID so that differing scores do not prevent a match
            if node.node.node_id == target_node.node.node_id:
                if i > 0:
                    prev_node = nodes[i - 1]
                if i < len(nodes) - 1:
                    next_node = nodes[i + 1]
                break
        return prev_node, next_node

    # Example usage
    target_node = results[2]  # Assuming you are interested in the node at index 2
    prev_node, next_node = get_prev_and_next_nodes(processed_nodes, target_node)
    print("Previous Node:", prev_node)
    print("Next Node:", next_node)

This code snippet initializes the PrevNextNodePostprocessor, post-processes the retrieved nodes, and then looks up the neighbors of the target node in the processed list.
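When several fused results come from the same document, their neighbor windows can overlap. Here is a small sketch of one way to expand hits and deduplicate them, assuming you know the node IDs in document order. `expand_hits`, `ordered_ids`, and `window` are hypothetical names for illustration only.

```python
from typing import List, Sequence


def expand_hits(
    hit_ids: Sequence[str], ordered_ids: Sequence[str], window: int = 1
) -> List[str]:
    """Return each hit plus `window` neighbors per side, deduplicated, in document order."""
    position = {node_id: i for i, node_id in enumerate(ordered_ids)}
    keep = set()
    for node_id in hit_ids:
        if node_id not in position:
            continue  # hit does not belong to this document
        i = position[node_id]
        first = max(0, i - window)
        last = min(len(ordered_ids) - 1, i + window)
        keep.update(range(first, last + 1))
    return [ordered_ids[i] for i in sorted(keep)]


ordered = ["n1", "n2", "n3", "n4", "n5", "n6"]
print(expand_hits(["n1", "n5"], ordered))
```

Returning IDs in document order, not score order, keeps adjacent chunks together when their text is later concatenated.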
Question Validation
Question
After I use QueryFusionRetriever to find the node I want, how do I obtain the nodes before and after it without using RetrieverQueryEngine + PrevNextNodePostprocessor?