wip improved object retrieval #10513

Merged: 7 commits, Feb 16, 2024

logan-markewich
Collaborator

@logan-markewich logan-markewich commented Feb 8, 2024

A WIP approach to improve recursive retrieval.

Basically, if index_node.obj is serializable, just store it directly in the vector db/storage layer; no mapping is needed.

Example Usage

from llama_index import StorageContext, VectorStoreIndex, load_index_from_storage
from llama_index.schema import IndexNode, TextNode

index = VectorStoreIndex(nodes=[TextNode(text="bad_node")])

# a retriever isn't serializable, so it is maintained as a mapping
bad_node = IndexNode(index_id="bad1", obj=index.as_retriever(), text="bad summary")

# a TextNode is serializable, no mapping needed
good_node = IndexNode(index_id="good1", obj=TextNode(text="good_node"), text="good_summary")

index2 = VectorStoreIndex(
    nodes=[good_node],
    objects=[bad_node],
)
nodes = index2.as_retriever(verbose=True).retrieve("test")

print(nodes[0].text)  # -> "good_node"
print(nodes[1].text)  # -> "bad_node"

# save (writes to the default persist directory, ./storage)
index2.storage_context.persist()

# loading -- need to provide unserializable objects at load time
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index2 = load_index_from_storage(storage_context, objects=[bad_node])
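The split described above (serializable objects go into storage, everything else stays in an in-memory mapping) can be sketched without llama_index at all. This is only an illustration, not the PR's implementation: `SketchIndexNode`, `is_serializable`, and `split_for_storage` are hypothetical names, and "serializable" is assumed here to mean "`json.dumps` succeeds".

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class SketchIndexNode:
    """Hypothetical stand-in for IndexNode; not the llama_index class."""
    index_id: str
    text: str
    obj: Any = None


def is_serializable(obj: Any) -> bool:
    """Assumption: 'serializable' means json.dumps succeeds on the object."""
    try:
        json.dumps(obj)
        return True
    except (TypeError, ValueError):
        return False


def split_for_storage(
    nodes: List[SketchIndexNode],
) -> Tuple[List[dict], Dict[str, Any]]:
    """Return (records to persist, in-memory map of index_id -> object)."""
    records: List[dict] = []
    object_map: Dict[str, Any] = {}
    for node in nodes:
        record = {"index_id": node.index_id, "text": node.text, "obj": None}
        if is_serializable(node.obj):
            record["obj"] = node.obj  # stored alongside the node itself
        else:
            object_map[node.index_id] = node.obj  # must be re-supplied at load time
        records.append(record)
    return records, object_map


good = SketchIndexNode("good1", "good_summary", obj={"text": "good_node"})
bad = SketchIndexNode("bad1", "bad summary", obj=lambda query: ["bad_node"])

records, object_map = split_for_storage([good, bad])
print(json.dumps(records))  # every record is safe to write to storage
print(sorted(object_map))   # ids whose objects live only in memory
```

In this sketch the record for "bad1" is persisted without its object, which is why the real example has to pass `objects=[bad_node]` again when loading.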

TODO

  • unit tests
  • confirm it works with vector dbs

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Feb 8, 2024
llama_index/core/base_retriever.py
@@ -144,7 +167,7 @@ def _handle_recursive_retrieval(
             node = n.node
             score = n.score or 1.0
             if isinstance(node, IndexNode):
-                obj = self.object_map.get(node.index_id, None)
+                obj = node.obj or self.object_map.get(node.index_id, None)
Collaborator

Help me understand: is object_map now only used for retrievers/query engines?

If it's a Node, it should now be serialized/deserialized directly on the IndexNode, right?

At a high level, once we make retrievers/query engines serializable, I was thinking object_map would go away and we'd replace it with a proper docstore.

Collaborator Author

Yes, if query engines/retrievers were serializable, this would go away.

Right now, unserializable index nodes have to be passed in under the objects kwarg. From there, we build a map of index id to object.

That lets us serialize and retrieve the index node without its object.

When an index node is retrieved and carries no object of its own, we check the object map for it.
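The lookup order in the changed line (`node.obj or self.object_map.get(node.index_id, None)`) can be illustrated with a small stand-alone sketch. It is not the library's `_handle_recursive_retrieval`: `resolve_hits`, `bad_retriever`, and the dict-shaped hits are hypothetical, and falling back to the node's own text when no object is found is an assumption made for the example.

```python
from typing import Any, Dict, List


def resolve_hits(
    hits: List[Dict[str, Any]], object_map: Dict[str, Any], query: str
) -> List[str]:
    """Resolve each retrieved hit to text, preferring the object stored on the hit."""
    results: List[str] = []
    for hit in hits:
        # Same order as the diff: the deserialized object first, then the map.
        obj = hit.get("obj") or object_map.get(hit["index_id"], None)
        if obj is None:
            results.append(hit["text"])  # assumption: no object, keep the node's text
        elif callable(obj):
            results.extend(obj(query))   # retriever stand-in: recurse into it
        else:
            results.append(obj["text"])  # a node that was stored with the hit
    return results


def bad_retriever(query: str) -> List[str]:
    """Hypothetical unserializable object, supplied again at load time."""
    return ["bad_node"]


hits = [
    {"index_id": "good1", "text": "good_summary", "obj": {"text": "good_node"}},
    {"index_id": "bad1", "text": "bad summary", "obj": None},
    {"index_id": "plain1", "text": "plain text", "obj": None},
]
object_map = {"bad1": bad_retriever}

resolved = resolve_hits(hits, object_map, "test")
print(resolved)  # -> ['good_node', 'bad_node', 'plain text']
```

Only "bad1" needs the map here; "good1" resolves from the object that was stored with it, which is the case the PR removes the mapping for.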

@logan-markewich logan-markewich merged commit 3546490 into main Feb 16, 2024
8 checks passed
@logan-markewich logan-markewich deleted the logan/serialize_recursive_retriever branch February 16, 2024 23:16
@hatianzhang
Contributor

Nice, will update some notebooks!

Dominastorm pushed a commit to uptrain-ai/llama_index that referenced this pull request Feb 28, 2024
anoopshrma pushed a commit to anoopshrma/llama_index that referenced this pull request Mar 2, 2024
Izukimat pushed a commit to Izukimat/llama_index that referenced this pull request Mar 29, 2024
3 participants