[Question]: RAG implementation from existing knowledge graph #17810
Comments
Hey @laurasck14! I'm here to help you with any coding conundrums. Let's tackle this challenge together! To implement a RAG from an existing knowledge graph using
By following these steps, you should be able to implement a RAG from an existing knowledge graph and ensure embeddings are generated for all nodes [1][2].
@laurasck14 If your property graph was created WITHOUT llama-index, your options are limited. You can create a retriever from an existing graph, but likely only the TextToCypherRetriever or the CypherTemplateRetriever will work.
If your graph was already created with llama-index, any retriever will work fine.
In either case, to initialize from an existing graph like nebula, all you need is
@dosu @logan-markewich thank you for your answer. Is there a way to generate embeddings from the existing graph, using the text associated with the nodes and their metadata? The idea would be to retrieve relevant nodes based on semantic similarity, but also to take the node embeddings into account and traverse the graph to different depths depending on the question asked.
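The retrieval idea described here (pick seed nodes by similarity, then widen the search by a number of hops that depends on the question) can be pictured with a small sketch. This is plain Python, not LlamaIndex API: the `adjacency` dict, the `expand` helper, and the seed list are hypothetical stand-ins, and the seeds are assumed to come from whatever similarity search is used.

```python
from collections import deque


def expand(adjacency, seeds, max_depth):
    """Breadth-first expansion from seed nodes, up to max_depth hops.

    Returns a dict mapping each reached node ID to its hop distance
    from the nearest seed.
    """
    depth_of = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        # Do not expand past the requested depth.
        if depth_of[current] == max_depth:
            continue
        for neighbor in adjacency.get(current, ()):
            if neighbor not in depth_of:
                depth_of[neighbor] = depth_of[current] + 1
                queue.append(neighbor)
    return depth_of


# Hypothetical toy graph: a - b - c - d (undirected chain).
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

# Assumption: "a" was returned by a similarity search over node embeddings.
seeds = ["a"]

shallow = expand(adjacency, seeds, max_depth=1)  # narrow, specific question
deep = expand(adjacency, seeds, max_depth=3)     # broad, exploratory question
```

How `max_depth` is chosen per question (a fixed rule, a classifier, or an LLM call) is left open here.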
To generate embeddings from an existing graph using the text associated with the nodes and their metadata, and to retrieve relevant nodes based on semantic similarity while considering node embeddings and traversing the graph at different depths, you can use the Here's a general approach:
Here's a simplified code snippet demonstrating the setup:
from llama_index.core.indices.tree.select_leaf_embedding_retriever import TreeSelectLeafEmbeddingRetriever
from llama_index.core.indices.tree.base import TreeIndex
from llama_index.core.base.embeddings.base import BaseEmbedding
from llama_index.vector_stores.oceanbase import OceanBaseVectorStore
from llama_index.core.vector_stores.types import VectorStoreQuery, MetadataFilters, MetadataFilter

# Assume `tree_index` is your TreeIndex and `embed_model` is your embedding model
retriever = TreeSelectLeafEmbeddingRetriever(
    index=tree_index,
    embed_model=embed_model,
    child_branch_factor=2,  # Adjust based on how many child nodes to consider
)

# Assume `oceanbase_store` is your OceanBaseVectorStore instance
# Add node embeddings to the vector store
# (`node_embeddings` is assumed to be a list of nodes with their `embedding` field already set)
oceanbase_store.add(node_embeddings)

# Query with embedding similarity
query_embedding = [1.0, 0.0, 0.0]  # Example query embedding
result = oceanbase_store.query(VectorStoreQuery(query_embedding=query_embedding, similarity_top_k=1))

# Query with metadata filters
filters = MetadataFilters(
    filters=[MetadataFilter(key="author", value="Marie Curie", operator="==")]
)
result_with_filters = oceanbase_store.query(VectorStoreQuery(filters=filters, similarity_top_k=1))

This setup allows you to retrieve nodes based on semantic similarity, considering both the text and metadata of the nodes, and traverse the graph at different depths depending on the query [1][2][3][4][5].
@laurasck14 there's not really an easy way to generate embeddings for an existing graph. It would be pretty complex: you need to pull down the data to embed, embed it, and then either put the embeddings back into the graph (if supported) or into a standalone vector store that references IDs from your graph.
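The steps in this comment (pull the data, embed it, keep the vectors in a standalone store keyed by graph IDs) might look roughly like the sketch below. Everything in it is an assumption made for illustration: `graph_nodes` is toy data standing in for rows pulled from the graph, `toy_embed` is a bag-of-words placeholder rather than a real embedding model, and `IdKeyedVectorStore` is a hypothetical in-memory class, not a LlamaIndex vector store. The text to embed is built from metadata plus any node text, so nodes without text still get a vector.

```python
import math
import re
from collections import Counter


def node_to_text(node):
    """Build the string to embed: 'key: value' metadata lines plus optional text."""
    lines = [f"{key}: {value}" for key, value in node["metadata"].items()]
    if node.get("text"):
        lines.append(node["text"])
    return "\n".join(lines)


def toy_embed(text):
    """Placeholder embedding: bag-of-words counts (NOT a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IdKeyedVectorStore:
    """Hypothetical standalone store: vectors are keyed by the graph's node IDs."""

    def __init__(self):
        self._vectors = {}

    def add(self, node_id, vector):
        self._vectors[node_id] = vector

    def query(self, query_vector, top_k=2):
        scored = [(nid, cosine(query_vector, vec)) for nid, vec in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


# Step 1 (assumed): rows pulled down from the existing graph. Toy data only.
graph_nodes = [
    {"id": "n1", "metadata": {"node_type": "disease", "node_name": "example disease"},
     "text": "toy description of a skin condition"},
    {"id": "n2", "metadata": {"node_type": "drug", "node_name": "example drug"},
     "text": ""},
    {"id": "n3", "metadata": {"node_type": "gene", "node_name": "example gene"},
     "text": "toy description of a gene"},
]

# Steps 2 and 3: embed each node and store the vector under its graph ID.
store = IdKeyedVectorStore()
for node in graph_nodes:
    store.add(node["id"], toy_embed(node_to_text(node)))

# The returned IDs can then be used to look the nodes up in the graph itself.
hits = store.query(toy_embed("skin condition"), top_k=1)
```

Keeping the vectors outside the graph avoids depending on whether the graph database supports vector properties, at the cost of keeping the two stores in sync.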
Question
I am currently trying to implement RAG from an EXISTING property graph using LlamaIndex. That is, I already have nodes and edges, so I do not need to extract paths from text or embed documents; I need to embed the nodes themselves.
I have tried two methods in order to make this work:
1. Define nodes using TextNode with their corresponding relationships
Nodes have a format like:
TextNode(id_='99853', embedding=None, metadata={'node_index': '99853', 'node_id': '43959', 'node_type': 'disease', 'node_name': 'pseudolymphoma', 'node_source': 'MONDO', 'mondo_id': '43959', 'mondo_name': 'pseudolymphoma'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.CHILD: '5'>: [RelatedNodeInfo(node_id='35769', node_type=None, metadata={'relation': 'disease_disease', 'display_relation': 'parent-child'}, hash=None)]}, metadata_template='{key}: {value}', metadata_separator='\n', text="A neoplastic process...(shortened)", mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}')
Not all nodes have text, but all do have metadata.
The nodes and edges are stored in a property graph store from NebulaGraph, and the index is created as:
However, it seems the nodes are being treated as if they were documents, since graph_store.json appears to be empty and the nodes appear in the index's docstore.json.
I have also tried defining the nodes as EntityNodes and ChunkNodes, upserting them manually into the Nebula graph store, and constructing the index from there using .from_existing, but that doesn't work either.
2. Using the KnowledgeGraphIndex function, although it's deprecated
This method seems to generate an index from existing nodes without treating them as text:
However, when processing some nodes no embeddings are generated:
Generating embeddings: 0it [00:00, ?it/s]
and when using the index as_retriever or as chat_engine, the response says there is no information about that in the database, even though there are nodes referring to it.
My questions:
PropertyGraphIndex and NebulaGraphPropertyStore? If so, what would be the correct way to implement it? (All the documentation I found is based on generating a graph from text documents.)
KnowledgeGraphIndex how can I ensure embeddings are generated for all nodes?
I'm kind of new to NLP and implementing RAG, so any help or guidance is very much appreciated! :)