[Question]: Storing indices on persistent storage and loading #14467
The issue with your Knowledge Graph (KG) index returning empty responses when loaded from persistent storage, despite the storage index files appearing to be correctly populated, could be due to the graph store's data not being restored alongside the index struct. In the provided JSON structure, the `index_struct` itself is populated:

```json
{
  "index_struct": {
    "text": null,
    "doc_id": "446680de-660e-421a-8fdc-ed244bd466ce",
    "embedding": null,
    "extra_info": null,
    "table": {
      ...
    }
  }
}
```

The `table` field holds the keyword-to-node mapping. Additionally, ensure that the graph store attached to your storage context actually contains the triplet data. To resolve this, ensure that the graph store is passed in explicitly when building the storage context. Here is an example of how the storage context and query engine can be wired together:

```python
storage_context = StorageContext(graph_store=your_graph_store)
query_engine = KnowledgeGraphQueryEngine(storage_context=storage_context)
```

Make sure that `your_graph_store` holds the triplet data. Additionally, if you are using the `SimpleGraphStore`, the KG retriever upgrades previously saved indices whose graph store is empty:

```python
if (
    len(self.index_struct.table) > 0
    and isinstance(self.graph_store, SimpleGraphStore)
    and len(self.graph_store._data.graph_dict) == 0
):
    logger.warning("Upgrading previously saved KG index to new storage format.")
    self.graph_store._data.graph_dict = self.index_struct.rel_map
```

This ensures that the graph store's `graph_dict` is populated from the saved `rel_map`. Moreover, you can refer to the following test, which verifies that a composed graph can be persisted and reloaded:

```python
def test_load_graph_from_storage_simple(
    documents: List[Document],
    tmp_path: Path,
    mock_service_context: ServiceContext,
) -> None:
    storage_context = StorageContext.from_defaults()
    vector_index_1 = VectorStoreIndex.from_documents(
        documents=documents,
        storage_context=storage_context,
        service_context=mock_service_context,
    )
    vector_index_2 = VectorStoreIndex.from_documents(
        documents=documents,
        storage_context=storage_context,
        service_context=mock_service_context,
    )
    summary_index = SummaryIndex.from_documents(
        documents=documents,
        storage_context=storage_context,
        service_context=mock_service_context,
    )
    graph = ComposableGraph.from_indices(
        SummaryIndex,
        children_indices=[vector_index_1, vector_index_2, summary_index],
        index_summaries=["vector index 1", "vector index 2", "summary index"],
        storage_context=storage_context,
        service_context=mock_service_context,
    )
    query_engine = graph.as_query_engine()
    response = query_engine.query("test query")

    storage_context.persist(str(tmp_path))

    new_storage_context = StorageContext.from_defaults(persist_dir=str(tmp_path))
    new_graph = load_graph_from_storage(
        new_storage_context, root_id=graph.root_id, service_context=mock_service_context
    )
    new_query_engine = new_graph.as_query_engine()
    new_response = new_query_engine.query("test query")
    assert str(response) == str(new_response)
```

This test ensures that the graph is correctly loaded from storage and that the query responses remain consistent [1][2][3][4].
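To make the upgrade step concrete, here is a dependency-free sketch that uses plain dicts in place of LlamaIndex's classes. Note that `upgrade_kg_storage` and the literal field values are illustrative, not part of the library; only the field names (`table`, `rel_map`, `graph_dict`) mirror the snippet above.

```python
def upgrade_kg_storage(index_struct: dict, graph_store_data: dict) -> bool:
    """Copy the saved rel_map into the graph store's graph_dict when the
    index struct has triplets but the graph store is empty (the same
    condition the retriever snippet above checks). Returns True if an
    upgrade was performed."""
    if index_struct.get("table") and not graph_store_data.get("graph_dict"):
        graph_store_data["graph_dict"] = dict(index_struct.get("rel_map", {}))
        return True
    return False


# Example: a saved index struct with one triplet, and an empty graph store.
saved = {
    "table": {"Software": ["node-1"]},
    "rel_map": {"Software": [["Mention in", "Footnotes"]]},
}
store = {"graph_dict": {}}
upgraded = upgrade_kg_storage(saved, store)
```

After the call, `store["graph_dict"]` holds the triplets, which is exactly the state the KG query path needs; a second call is a no-op because the store is no longer empty.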
I'm still confused. How can I update my script to ensure my `rel_map` is populated when creating and storing my persistent storage?
To ensure your `rel_map` is populated when creating and storing your index, you can refresh it whenever a triplet is upserted into the `SimpleGraphStore`:

```python
from typing import Any, Dict, List, Optional

import fsspec
# SimpleGraphStoreData and GraphStore come from llama_index's simple
# graph store module; the exact import path depends on your version.


class SimpleGraphStore(GraphStore):
    """Simple Graph Store.

    In this graph store, triplets are stored within a simple, in-memory
    dictionary.

    Args:
        data (Optional[SimpleGraphStoreData]): data object containing
            the triplets. See SimpleGraphStoreData for more details.
    """

    def __init__(
        self,
        data: Optional[SimpleGraphStoreData] = None,
        fs: Optional[fsspec.AbstractFileSystem] = None,
        **kwargs: Any,
    ) -> None:
        """Initialize params."""
        self._data = data or SimpleGraphStoreData()
        self._fs = fs or fsspec.filesystem("file")
        self.rel_map: Dict[str, List[List[str]]] = {}

    def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
        """Add triplet and update rel_map."""
        if subj not in self._data.graph_dict:
            self._data.graph_dict[subj] = []
        if [rel, obj] not in self._data.graph_dict[subj]:
            self._data.graph_dict[subj].append([rel, obj])
        # Refresh rel_map after upserting the triplet.
        self.rel_map = self.get_rel_map(subjs=[subj])

    def get_rel_map(
        self, subjs: Optional[List[str]] = None, depth: int = 2, limit: int = 30
    ) -> Dict[str, List[List[str]]]:
        """Get depth-aware rel map."""
        return self._data.get_rel_map(subjs=subjs, depth=depth, limit=limit)


# Example usage
graph_store = SimpleGraphStore()
graph_store.upsert_triplet("Software", "Mention in", "Footnotes")
print(graph_store.rel_map)
```

In this example, after upserting the triplet, the `rel_map` attribute is refreshed from the store's data, so it is populated before you persist.
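The same upsert-then-refresh pattern can be shown without any llama_index dependencies. `MiniGraphStore` below is a hypothetical stand-in (not a library class) that keeps a depth-1 `rel_map` in sync on every upsert:

```python
from typing import Dict, List


class MiniGraphStore:
    """Dependency-free stand-in for SimpleGraphStore, illustrating how
    rel_map can be kept in sync with the underlying graph_dict. Depth-1
    only; the real get_rel_map walks the graph to a configurable depth."""

    def __init__(self) -> None:
        self.graph_dict: Dict[str, List[List[str]]] = {}
        self.rel_map: Dict[str, List[List[str]]] = {}

    def upsert_triplet(self, subj: str, rel: str, obj: str) -> None:
        self.graph_dict.setdefault(subj, [])
        if [rel, obj] not in self.graph_dict[subj]:
            self.graph_dict[subj].append([rel, obj])
        # Refresh rel_map for the touched subject so it is never stale.
        self.rel_map[subj] = list(self.graph_dict[subj])


store = MiniGraphStore()
store.upsert_triplet("Software", "Mention in", "Footnotes")
```

Duplicate upserts are ignored by the membership check, so `graph_dict` and `rel_map` stay deduplicated.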
Hey @logan-markewich, I have tried to understand the above from dosubot, but I am struggling to implement it correctly. I would just like to be able to load my created vector and graph indices from the persistent directory without having to connect to my NebulaGraph server. Is this even possible?
Question
Hello, I am running into some issues loading my indices from my persistent store.
The following script saves both of my vector and graph indices:
Then the following function in my querying script attempts to load these indices; however, the KG index always returns empty responses for some reason:
Additionally, when I run my querying script I have included debugging to give me more information, which I have appended here as well:
Does anyone know why this could be happening?
I have also inspected the files within the storage index and they seem to be correctly populated.
Thanks for your time!