After I retrieve the desired node using QueryFusionRetriever, how do I retrieve the previous and next nodes of the node? #14806

Open
1 task done
hingkan opened this issue Jul 18, 2024 · 17 comments
Labels
question Further information is requested

hingkan commented Jul 18, 2024

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

After I retrieve the node I want with QueryFusionRetriever, how do I get the nodes immediately before and after it, without using RetrieverQueryEngine + PrevNextNodePostprocessor?

@hingkan hingkan added the question Further information is requested label Jul 18, 2024

dosubot bot commented Jul 18, 2024

To retrieve the previous and next nodes of a node obtained using QueryFusionRetriever without using RetrieverQueryEngine and PrevNextNodePostprocessor, you can directly use the get_forward_nodes and get_backward_nodes functions. Here is how you can do it:

from llama_index.core.postprocessor.node import get_backward_nodes, get_forward_nodes

def retrieve_prev_next_nodes(node_with_score, num_nodes, docstore):
    # Retrieve next nodes (returned as a dict keyed by node ID)
    next_nodes = get_forward_nodes(node_with_score, num_nodes, docstore)

    # Retrieve previous nodes (returned as a dict keyed by node ID)
    prev_nodes = get_backward_nodes(node_with_score, num_nodes, docstore)

    return prev_nodes, next_nodes

# Example usage
# Assuming you have a node_with_score object, num_nodes, and a docstore instance
prev_nodes, next_nodes = retrieve_prev_next_nodes(node_with_score, num_nodes, docstore)

This code snippet directly calls the get_forward_nodes and get_backward_nodes functions to retrieve the previous and next nodes, bypassing the need for RetrieverQueryEngine and PrevNextNodePostprocessor [1].

To continue talking to Dosu, mention @dosu.


hingkan commented Jul 19, 2024

Is there no better way? Because I am using mixed retrieval, there are multiple docstores. Is there a way to do this without a docstore?


dosubot bot commented Jul 19, 2024

No, there is no way to retrieve the previous and next nodes without using a docstore when using mixed retrieval with QueryFusionRetriever. The PrevNextNodePostprocessor and AutoPrevNextNodePostprocessor classes both rely on a BaseDocumentStore to fetch additional nodes based on their relationships [1].
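
If several docstores are involved, as in the mixed-retrieval setup described above, one possible pattern is to try each docstore in turn for a given node ID. The sketch below is only an illustration under stated assumptions, not LlamaIndex code: `DictDocstore` is a hypothetical stand-in that merely mimics the `get_document(doc_id, raise_error=False)` call shape, and `find_in_docstores` is a helper name invented here.

```python
from typing import Dict, Iterable, Optional


class DictDocstore:
    """Hypothetical stand-in for a docstore: maps node IDs to text."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self._docs = docs

    def get_document(self, doc_id: str, raise_error: bool = True) -> Optional[str]:
        doc = self._docs.get(doc_id)
        if doc is None and raise_error:
            raise ValueError(f"doc_id {doc_id} not found.")
        return doc


def find_in_docstores(node_id: str, docstores: Iterable[DictDocstore]) -> Optional[str]:
    """Return the first match for node_id across several docstores, else None."""
    for store in docstores:
        doc = store.get_document(node_id, raise_error=False)
        if doc is not None:
            return doc
    return None


# Two separate stores, as you might have with one store per retriever.
stores = [DictDocstore({"a": "alpha"}), DictDocstore({"b": "beta"})]
print(find_in_docstores("b", stores))
print(find_in_docstores("missing", stores))
```

With real docstores the same loop would return node objects rather than strings, so the caller would still need to read the text from the returned node.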


hingkan commented Jul 19, 2024

I see that BaseNode has prev_node and next_node attributes. How do I use them?


dosubot bot commented Jul 19, 2024

To retrieve the previous and next nodes of a node obtained using QueryFusionRetriever without relying on a docstore, you can use the prev_node and next_node attributes directly from the BaseNode class. Here is an example of how you can achieve this:

# Assuming you have a node object from QueryFusionRetriever
current_node = node_with_score.node

# Retrieve the previous node
previous_node = current_node.prev_node

# Retrieve the next node
next_node = current_node.next_node

# Check if previous_node and next_node are not None before accessing their attributes
if previous_node is not None:
    print(f"Previous Node ID: {previous_node.node_id}")
    # Access other attributes of previous_node as needed

if next_node is not None:
    print(f"Next Node ID: {next_node.node_id}")
    # Access other attributes of next_node as needed

This code snippet directly accesses the prev_node and next_node attributes of the current_node to retrieve the previous and next nodes, respectively [1].


hingkan commented Jul 19, 2024

prev_node and next_node return RelatedNodeInfo objects. How do I get the body text of previous_node?


dosubot bot commented Jul 19, 2024

To get the body text of previous_node (which is a RelatedNodeInfo object), you need to use the node_id from RelatedNodeInfo to fetch the actual node from the document store and then retrieve its content. Here is how you can do it:

# Assuming you have a node object from QueryFusionRetriever
current_node = node_with_score.node

# Retrieve the previous node info
previous_node_info = current_node.prev_node

# Function to get the body text of a node using its node_id
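def get_node_body_text(node_id, docstore):
    # raise_error=False makes get_document return None for a missing ID
    # instead of raising, so the fallback below is actually reachable.
    node = docstore.get_document(node_id, raise_error=False)
    if node is not None:
        return node.get_content()
    return "Node not found."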

# Check if previous_node_info is not None before accessing its attributes
if previous_node_info is not None:
    previous_node_body_text = get_node_body_text(previous_node_info.node_id, docstore)
    print(f"Previous Node Body Text: {previous_node_body_text}")
else:
    print("No previous node found.")

This code snippet retrieves the previous_node_info from the current_node, uses its node_id to fetch the actual node from the docstore, and then retrieves the body text of that node [1][2].
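
The same lookup generalises to collecting several neighbours in each direction. The following is a minimal sketch of that idea using plain Python stand-ins, not the LlamaIndex classes: `StubNode`, its `prev_id`/`next_id` fields, and `neighbour_texts` are hypothetical names, and the plain dict plays the role of a docstore.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StubNode:
    """Hypothetical node that stores neighbour IDs, not neighbour objects."""
    node_id: str
    text: str
    prev_id: Optional[str] = None
    next_id: Optional[str] = None


def neighbour_texts(
    node: StubNode, store: Dict[str, StubNode], num_nodes: int = 1
) -> Dict[str, List[str]]:
    """Follow prev/next IDs through the store, up to num_nodes steps each way."""
    result: Dict[str, List[str]] = {"prev": [], "next": []}
    for direction, attr in (("prev", "prev_id"), ("next", "next_id")):
        cursor_id = getattr(node, attr)
        for _ in range(num_nodes):
            if cursor_id is None or cursor_id not in store:
                break
            neighbour = store[cursor_id]
            result[direction].append(neighbour.text)
            cursor_id = getattr(neighbour, attr)
    # Put the previous texts back into document order (farthest first).
    result["prev"].reverse()
    return result


store = {
    "n1": StubNode("n1", "first", next_id="n2"),
    "n2": StubNode("n2", "second", prev_id="n1", next_id="n3"),
    "n3": StubNode("n3", "third", prev_id="n2"),
}
print(neighbour_texts(store["n2"], store))
```

The point of the sketch is that each neighbour reference is only an ID, so every step in either direction costs one store lookup.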


hingkan commented Jul 19, 2024

Is there a better way to get the text of the previous node and the next node?


dosubot bot commented Jul 19, 2024

To get the text of the previous node and the next node, you can use the following methods from the provided code:

  1. _getNextNode: This method is used to traverse the DOM and get the next node.
  2. _getPreviousNode: This method is not explicitly defined in the provided code, but you can create a similar method to traverse the DOM and get the previous node.

Here is an example of how you can implement these methods:

/**
 * Traverse the DOM from node to node, starting at the node passed in.
 * Pass true for the second parameter to indicate this node itself
 * (and its kids) are going away, and we want the next node over.
 *
 * Calling this in a loop will traverse the DOM depth-first.
 */
_getNextNode: function (node, ignoreSelfAndKids) {
  // First check for kids if those aren't being ignored
  if (!ignoreSelfAndKids && node.firstElementChild) {
    return node.firstElementChild;
  }
  // Then for siblings...
  if (node.nextElementSibling) {
    return node.nextElementSibling;
  }
  // And finally, move up the parent chain *and* find a sibling
  // (because this is depth-first traversal, we will have already
  // seen the parent nodes themselves).
  do {
    node = node.parentNode;
  } while (node && !node.nextElementSibling);
  return node && node.nextElementSibling;
},

/**
 * Traverse the DOM from node to node, starting at the node passed in.
 * Pass true for the second parameter to indicate this node itself
 * (and its kids) are going away, and we want the previous node over.
 *
 * Calling this in a loop will traverse the DOM depth-first.
 */
_getPreviousNode: function (node, ignoreSelfAndKids) {
  // First check for kids if those aren't being ignored
  if (!ignoreSelfAndKids && node.lastElementChild) {
    return node.lastElementChild;
  }
  // Then for siblings...
  if (node.previousElementSibling) {
    return node.previousElementSibling;
  }
  // And finally, move up the parent chain *and* find a sibling
  // (because this is depth-first traversal, we will have already
  // seen the parent nodes themselves).
  do {
    node = node.parentNode;
  } while (node && !node.previousElementSibling);
  return node && node.previousElementSibling;
},

/**
 * Get the text content of the previous and next nodes.
 * @param {Node} node The current node.
 * @return {Object} An object containing the text of the previous and next nodes.
 */
getAdjacentNodeTexts: function (node) {
  const previousNode = this._getPreviousNode(node, true);
  const nextNode = this._getNextNode(node, true);
  return {
    previousText: previousNode ? previousNode.textContent : null,
    nextText: nextNode ? nextNode.textContent : null,
  };
}

You can use the getAdjacentNodeTexts method to get the text content of the previous and next nodes. This method uses the _getPreviousNode and _getNextNode methods to traverse the DOM and find the adjacent nodes [1].

Alternatively, if you are working with HTML documents, you can use the HTMLNodeParser class from the LlamaIndex framework to split a document into nodes and include previous/next relationships. Here is an example of how you can use it:

class HTMLNodeParser(NodeParser):
    """HTML node parser.

    Splits a document into Nodes using custom HTML splitting logic.

    Args:
        include_metadata (bool): whether to include metadata in nodes
        include_prev_next_rel (bool): whether to include prev/next relationships

    """

    tags: List[str] = Field(
        default=DEFAULT_TAGS, description="HTML tags to extract text from."
    )

    @classmethod
    def from_defaults(
        cls,
        include_metadata: bool = True,
        include_prev_next_rel: bool = True,
        callback_manager: Optional[CallbackManager] = None,
        tags: Optional[List[str]] = DEFAULT_TAGS,
    ) -> "HTMLNodeParser":
        callback_manager = callback_manager or CallbackManager([])

        return cls(
            include_metadata=include_metadata,
            include_prev_next_rel=include_prev_next_rel,
            callback_manager=callback_manager,
            tags=tags,
        )

    @classmethod
    def class_name(cls) -> str:
        """Get class name."""
        return "HTMLNodeParser"

    def _parse_nodes(
        self,
        nodes: Sequence[BaseNode],
        show_progress: bool = False,
        **kwargs: Any,
    ) -> List[BaseNode]:
        all_nodes: List[BaseNode] = []
        nodes_with_progress = get_tqdm_iterable(nodes, show_progress, "Parsing nodes")

        for node in nodes_with_progress:
            nodes = self.get_nodes_from_node(node)
            all_nodes.extend(nodes)

        return all_nodes

    def get_nodes_from_node(self, node: BaseNode) -> List[TextNode]:
        """Get nodes from document."""
        try:
            from bs4 import BeautifulSoup
        except ImportError:
            raise ImportError("bs4 is required to read HTML files.")

        text = node.get_content(metadata_mode=MetadataMode.NONE)
        soup = BeautifulSoup(text, "html.parser")
        html_nodes = []
        last_tag = None
        current_section = ""

        tags = soup.find_all(self.tags)
        for tag in tags:
            tag_text = self._extract_text_from_tag(tag)
            if tag.name == last_tag or last_tag is None:
                last_tag = tag.name
                current_section += f"{tag_text.strip()}\n"
            else:
                html_nodes.append(
                    self._build_node_from_split(
                        current_section.strip(), node, {"tag": last_tag}
                    )
                )
                last_tag = tag.name
                current_section = f"{tag_text}\n"

        if current_section:
            html_nodes.append(
                self._build_node_from_split(
                    current_section.strip(), node, {"tag": last_tag}
                )
            )

        return html_nodes

    def _extract_text_from_tag(self, tag: "Tag") -> str:
        from bs4 import NavigableString

        texts = []
        for elem in tag.children:
            if isinstance(elem, NavigableString):
                if elem.strip():
                    texts.append(elem.strip())
            elif elem.name in self.tags:
                continue
            else:
                texts.append(elem.get_text().strip())
        return "\n".join(texts)

    def _build_node_from_split(
        self,
        text_split: str,
        node: BaseNode,
        metadata: dict,
    ) -> TextNode:
        """Build node from single text split."""
        node = build_nodes_from_splits([text_split], node, id_func=self.id_func)[0]

        if self.include_metadata:
            node.metadata = {**node.metadata, **metadata}

        return node

This class can be used to parse HTML documents into nodes and include previous/next relationships, making it easier to navigate and process document nodes [2][3].


hingkan commented Jul 19, 2024

Do you know how to define a postprocessor for RecursiveRetriever + QueryFusionRetriever that retrieves both the previous node and the next node, without using RetrieverQueryEngine?


dosubot bot commented Jul 19, 2024

To define a postprocessor that retrieves both the previous node and the next node when using RecursiveRetriever and QueryFusionRetriever, you can use the PrevNextNodePostprocessor class with the mode set to "both". This class is designed to fetch additional nodes based on their relationships in the document store.

Here is an example of how you can define and use this postprocessor:

from llama_index.core.postprocessor import PrevNextNodePostprocessor
from llama_index.core.storage.docstore import SimpleDocumentStore

# Initialize your document store (it must contain the nodes you indexed)
docstore = SimpleDocumentStore()

# Define the postprocessor
postprocessor = PrevNextNodePostprocessor(docstore=docstore, num_nodes=1, mode="both")

# Assuming you have a list of nodes retrieved by RecursiveRetriever or QueryFusionRetriever
retrieved_nodes = [...]  # Replace with your actual list of nodes

# Postprocess the nodes to get both previous and next nodes
# (call the public postprocess_nodes method rather than the private one)
processed_nodes = postprocessor.postprocess_nodes(retrieved_nodes)

# Extract the body text of these nodes; nodes expose text via get_content()
body_texts = [node.node.get_content() for node in processed_nodes]

# Now `body_texts` contains the body text of the retrieved nodes, including both previous and next nodes

In this example:

  • SimpleDocumentStore is used as the document store implementation.
  • retrieved_nodes should be the list of nodes you have retrieved using RecursiveRetriever or QueryFusionRetriever.

The PrevNextNodePostprocessor will handle fetching the previous and next nodes based on the relationships defined in the document store, and the body_texts list will contain the body text of these nodes [1][2][3][4].


hingkan commented Jul 19, 2024

Can you put RecursiveRetriever inside a CustomRetriever and apply the postprocessor in _retrieve()?


dosubot bot commented Jul 19, 2024

To integrate RecursiveRetriever into the CustomRetriever and implement the postprocessor setting in the _retrieve method, you can follow this example:

from typing import List, Optional
from llama_index.core.base.base_retriever import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.postprocessor import PrevNextNodePostprocessor

class CustomRetriever(BaseRetriever):
    def __init__(self, retriever: RecursiveRetriever, node_postprocessors: Optional[List[BaseNodePostprocessor]] = None):
        self._retriever = retriever
        self._node_postprocessors = node_postprocessors or []

    def _apply_node_postprocessors(self, nodes: List[NodeWithScore], query_bundle: QueryBundle) -> List[NodeWithScore]:
        for node_postprocessor in self._node_postprocessors:
            nodes = node_postprocessor.postprocess_nodes(nodes, query_bundle=query_bundle)
        return nodes

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        nodes = self._retriever.retrieve(query_bundle)
        return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)

    async def _aretrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        nodes = await self._retriever.aretrieve(query_bundle)
        return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)

# Example usage
# Use the docstore that actually holds your nodes; `index` is assumed to be
# an existing index object.
docstore = index.docstore

# Define the postprocessor
postprocessor = PrevNextNodePostprocessor(docstore=docstore, num_nodes=1, mode="both")

# Configure the RecursiveRetriever: it takes a root ID and a dict of retrievers
retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": index.as_retriever(similarity_top_k=10)},
)

# Create the CustomRetriever with the RecursiveRetriever and postprocessor
custom_retriever = CustomRetriever(retriever=retriever, node_postprocessors=[postprocessor])

# Create a QueryBundle
query_bundle = QueryBundle(query_str="What did the author do growing up?")

# Retrieve nodes
nodes = custom_retriever._retrieve(query_bundle)

# Print the body text of the retrieved nodes
for node_with_score in nodes:
    print(node_with_score.node.get_content())

In this example:

  • CustomRetriever wraps around RecursiveRetriever and applies node postprocessors to the retrieved nodes.
  • The _apply_node_postprocessors method applies each postprocessor in the list to the nodes.
  • The _retrieve and _aretrieve methods call the corresponding methods on the underlying retriever and then apply the postprocessors to the results.
  • PrevNextNodePostprocessor is used to fetch both previous and next nodes based on their relationships in the document store [1][2][3][4][5].


hingkan commented Jul 20, 2024

To integrate RecursiveRetriever into the CustomRetriever and implement the postprocessor setting in the _retrieve method, you can follow this example:

from typing import List, Optional
from llama_index.core.base.base_retriever import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.legacy.postprocessor.node import PrevNextNodePostprocessor

class CustomRetriever(BaseRetriever):
    def __init__(self, retriever: RecursiveRetriever, node_postprocessors: Optional[List[BaseNodePostprocessor]] = None):
        self._retriever = retriever
        self._node_postprocessors = node_postprocessors or []

    def _apply_node_postprocessors(self, nodes: List[NodeWithScore], query_bundle: QueryBundle) -> List[NodeWithScore]:
        for node_postprocessor in self._node_postprocessors:
            nodes = node_postprocessor.postprocess_nodes(nodes, query_bundle=query_bundle)
        return nodes

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        nodes = self._retriever.retrieve(query_bundle)
        return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)

    async def _aretrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        nodes = await self._retriever.aretrieve(query_bundle)
        return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)

# Example usage
# Initialize your document store
docstore = SimpleDocumentStore()

# Define the postprocessor
postprocessor = PrevNextNodePostprocessor(docstore=docstore, num_nodes=1, mode="both")

# Configure the RecursiveRetriever
retriever = RecursiveRetriever(
    index=index,  # Assuming you have an index object
    similarity_top_k=10,
)

# Create the CustomRetriever with the RecursiveRetriever and postprocessor
custom_retriever = CustomRetriever(retriever=retriever, node_postprocessors=[postprocessor])

# Create a QueryBundle
query_bundle = QueryBundle(query_str="What did the author do growing up?")

# Retrieve nodes
nodes = custom_retriever._retrieve(query_bundle)

# Print the body text of the retrieved nodes
for node_with_score in nodes:
    print(node_with_score.node.body)

In this example:

  • CustomRetriever wraps around RecursiveRetriever and applies node postprocessors to the retrieved nodes.
  • The _apply_node_postprocessors method applies each postprocessor in the list to the nodes.
  • The _retrieve and _aretrieve methods call the corresponding methods on the underlying retriever and then apply the postprocessors to the results.
  • PrevNextNodePostprocessor is used to fetch both previous and next nodes based on their relationships in the document store [1][2][3][4][5].
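To make the last point concrete, here is a minimal sketch of what "fetching previous and next nodes based on their relationships" amounts to. It is a simplified model, not the library's implementation: the `store` dict, its `prev`/`next` keys, and `expand_with_neighbours` are hypothetical names used only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a document store: node id -> ids of its neighbours.
Store = Dict[str, Dict[str, Optional[str]]]


def expand_with_neighbours(node_id: str, store: Store, num_nodes: int = 1) -> List[str]:
    """Return up to `num_nodes` previous ids, the node id, and up to `num_nodes` next ids, in document order."""
    before: List[str] = []
    cursor = store[node_id]["prev"]
    # Walk backwards along the "prev" links
    while cursor is not None and len(before) < num_nodes:
        before.append(cursor)
        cursor = store[cursor]["prev"]

    after: List[str] = []
    cursor = store[node_id]["next"]
    # Walk forwards along the "next" links
    while cursor is not None and len(after) < num_nodes:
        after.append(cursor)
        cursor = store[cursor]["next"]

    # `before` was collected nearest-first, so reverse it to restore document order
    return list(reversed(before)) + [node_id] + after


store: Store = {
    "a": {"prev": None, "next": "b"},
    "b": {"prev": "a", "next": "c"},
    "c": {"prev": "b", "next": "d"},
    "d": {"prev": "c", "next": None},
}

print(expand_with_neighbours("b", store, num_nodes=1))  # ['a', 'b', 'c']
print(expand_with_neighbours("a", store, num_nodes=2))  # ['a', 'b', 'c']
```

The walk stops early at the start or end of a document, which is why the first node only gains forward neighbours.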

To continue talking to Dosu, mention @dosu.

I ran into a problem: an error occurs when CustomRetriever is used with QueryFusionRetriever.
ERROR: AttributeError: Error running coroutine: 'CustomRetriever' object has no attribute 'object_map'


dosubot bot commented Jul 20, 2024

The AttributeError: Error running coroutine: 'CustomRetriever' object has no attribute 'object_map' error occurs because CustomRetriever defines its own __init__ without calling super().__init__(). BaseRetriever's __init__ is what sets the object_map attribute, and the base class reads it when retrieve() or aretrieve() is called, which is how QueryFusionRetriever invokes the retrievers it wraps. To resolve it, call super().__init__() from CustomRetriever.__init__ so that object_map exists.

Here is an example of how you can modify the CustomRetriever class to include the object_map attribute:

from typing import List, Optional, Dict
from llama_index.core.base.base_retriever import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core.postprocessor import PrevNextNodePostprocessor

class CustomRetriever(BaseRetriever):
    def __init__(self, retriever: RecursiveRetriever, node_postprocessors: Optional[List[BaseNodePostprocessor]] = None, object_map: Optional[Dict] = None):
        super().__init__(object_map=object_map)
        self._retriever = retriever
        self._node_postprocessors = node_postprocessors or []

    def _apply_node_postprocessors(self, nodes: List[NodeWithScore], query_bundle: QueryBundle) -> List[NodeWithScore]:
        for node_postprocessor in self._node_postprocessors:
            nodes = node_postprocessor.postprocess_nodes(nodes, query_bundle=query_bundle)
        return nodes

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        nodes = self._retriever.retrieve(query_bundle)
        return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)

    async def _aretrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        nodes = await self._retriever.aretrieve(query_bundle)
        return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)

# Example usage (assumes `index` is an existing index object)
# Use the docstore that holds the indexed nodes (assumes the index keeps its nodes there);
# a freshly created SimpleDocumentStore() is empty and has no neighbours to fetch
docstore = index.docstore

# Define the postprocessor
postprocessor = PrevNextNodePostprocessor(docstore=docstore, num_nodes=1, mode="both")

# Configure the RecursiveRetriever: it takes a root id and a dict of named retrievers
retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": index.as_retriever(similarity_top_k=10)},
)

# Create the CustomRetriever with the RecursiveRetriever and postprocessor
custom_retriever = CustomRetriever(retriever=retriever, node_postprocessors=[postprocessor], object_map={})

# Create a QueryBundle
query_bundle = QueryBundle(query_str="What did the author do growing up?")

# Retrieve nodes through the public retrieve() method
nodes = custom_retriever.retrieve(query_bundle)

# Print the text of the retrieved nodes
for node_with_score in nodes:
    print(node_with_score.node.get_content())

This modification ensures that the CustomRetriever class has the object_map attribute, which should resolve the AttributeError [1][2].
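The underlying Python behaviour can be reproduced without the library. The sketch below uses hypothetical stand-in classes (`Base`, `Broken`, `Fixed`), not LlamaIndex code, and assumes only that the base class sets an attribute in `__init__` and reads it later in a public method.

```python
class Base:
    def __init__(self, object_map=None):
        # The attribute only exists once this initializer has run
        self.object_map = object_map or {}

    def retrieve(self, query):
        # Base-class logic that depends on the attribute set in __init__
        _ = self.object_map
        return self._retrieve(query)


class Broken(Base):
    def __init__(self, inner):
        # Base.__init__ is never called, so self.object_map is never created
        self._inner = inner

    def _retrieve(self, query):
        return self._inner(query)


class Fixed(Base):
    def __init__(self, inner, object_map=None):
        # Calling the parent initializer creates self.object_map
        super().__init__(object_map=object_map)
        self._inner = inner

    def _retrieve(self, query):
        return self._inner(query)


def echo(query):
    return [query]


try:
    Broken(echo).retrieve("q")
except AttributeError as exc:
    print("Broken failed:", type(exc).__name__)

print("Fixed returned:", Fixed(echo).retrieve("q"))
```

Calling `_retrieve` directly on `Broken` would still work, which is why the problem only shows up once another component goes through the public method.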



hingkan commented Aug 1, 2024

How do I obtain the previous and next nodes after searching with QueryFusionRetriever?


dosubot bot commented Aug 1, 2024

To obtain the previous and subsequent nodes after using the QueryFusionRetriever search, you can utilize the PrevNextNodePostprocessor class. This class allows you to fetch additional nodes from the document store based on the relationships of the nodes. Here is an example of how you might achieve this:

from typing import List, Optional, Tuple

from llama_index.core.postprocessor import PrevNextNodePostprocessor
from llama_index.core.schema import NodeWithScore

# Assumes `index` is the index that was searched and `retriever` is your QueryFusionRetriever.
# Use the docstore that holds the indexed nodes (assumes the index keeps its nodes there);
# a freshly created SimpleDocumentStore() is empty and has no neighbours to fetch
docstore = index.docstore

# Run the search through the public retrieve() method
results = retriever.retrieve("What did the author do growing up?")

# Initialize the postprocessor
postprocessor = PrevNextNodePostprocessor(docstore=docstore, num_nodes=1, mode="both")

# Postprocess the nodes to pull in previous and next nodes
processed_nodes = postprocessor.postprocess_nodes(results)

# Function to find the nodes adjacent to a target node in the processed list
def get_prev_and_next_nodes(
    nodes: List[NodeWithScore], target_node: NodeWithScore
) -> Tuple[Optional[NodeWithScore], Optional[NodeWithScore]]:
    prev_node = None
    next_node = None
    for i, node in enumerate(nodes):
        # Compare by node id rather than by object equality
        if node.node.node_id == target_node.node.node_id:
            if i > 0:
                prev_node = nodes[i - 1]
            if i < len(nodes) - 1:
                next_node = nodes[i + 1]
            break
    return prev_node, next_node

# Example usage
target_node = results[2]  # Assumes the search returned at least three nodes
prev_node, next_node = get_prev_and_next_nodes(processed_nodes, target_node)

print("Previous Node:", prev_node)
print("Next Node:", next_node)

This code snippet initializes the PrevNextNodePostprocessor with the document store and the desired mode ("both" to get both previous and next nodes). It then processes the nodes to include the previous and next nodes and defines a function get_prev_and_next_nodes to find the previous and subsequent nodes relative to a given target node. You can use this function to navigate through the results obtained from the QueryFusionRetriever search [1][2].
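One caveat with looking at list positions: when results come from several documents, the entry next to the target in the list is not necessarily its neighbour in the source document. A possible alternative is to read the relationships on the node itself. The sketch below is duck-typed and makes assumptions that should be checked against your installed version: that the node exposes `prev_node` and `next_node` (each either `None` or an object with a `node_id`), and that the docstore exposes `get_node(node_id)`. `direct_neighbours` is a hypothetical helper, and the `FakeStore` and `SimpleNamespace` objects are stand-ins so the example runs on its own.

```python
from types import SimpleNamespace
from typing import Any, Optional, Tuple


def direct_neighbours(node: Any, docstore: Any) -> Tuple[Optional[Any], Optional[Any]]:
    """Look up the immediate previous and next nodes via the node's own relationships."""
    prev_info = node.prev_node  # assumed: None or an object carrying `node_id`
    next_info = node.next_node
    prev_node = docstore.get_node(prev_info.node_id) if prev_info is not None else None
    next_node = docstore.get_node(next_info.node_id) if next_info is not None else None
    return prev_node, next_node


class FakeStore:
    """Stand-in docstore for the demo; only models get_node()."""

    def __init__(self, nodes):
        self._nodes = nodes

    def get_node(self, node_id):
        return self._nodes[node_id]


# Three stand-in nodes chained as first <-> middle <-> last
first = SimpleNamespace(node_id="n1", prev_node=None, next_node=SimpleNamespace(node_id="n2"))
middle = SimpleNamespace(
    node_id="n2",
    prev_node=SimpleNamespace(node_id="n1"),
    next_node=SimpleNamespace(node_id="n3"),
)
last = SimpleNamespace(node_id="n3", prev_node=SimpleNamespace(node_id="n2"), next_node=None)
fake_store = FakeStore({"n1": first, "n2": middle, "n3": last})

prev_found, next_found = direct_neighbours(middle, fake_store)
print(prev_found.node_id, next_found.node_id)  # n1 n3
```

With real retrieval results you would pass `node_with_score.node` from the QueryFusionRetriever output together with the docstore that holds the indexed nodes.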
