Updated what is graphrag, knowledge graph from the field guide

graphrag · Nov 27, 2024 · b01e837
1 parent 2907a06
commit b01e837

Showing 2 changed files with 72 additions and 3 deletions.
diff --git a/src/content/docs/concepts/intro-to-graphrag.md b/src/content/docs/concepts/intro-to-graphrag.md
@@ -75,4 +75,45 @@ A modular RAG system contains more complex patterns, which require orchestration
 * Answer generation
 
 Here, we want to focus on the *retrieval phase* and compile a catalog of the most-often referenced GraphRAG retrieval patterns and their required graph patterns. 
-Please note that the patterns here are not an exhaustive list.
+Please note that the patterns here are not an exhaustive list.
+
+### How to GraphRAG
+
+If you’re looking to implement the retrievers discussed here using the [neo4j-graphrag package](https://neo4j.com/blog/graphrag-python-package/), [LangChain](https://neo4j.com/labs/genai-ecosystem/langchain/), or [LlamaIndex](https://neo4j.com/labs/genai-ecosystem/llamaindex/), check their GraphRAG retriever integrations.
+<!-- Neo4j Vector 
+We won’t cover setting up your Python project for Neo4j-based retrievers here, as that’s well-documented elsewhere (e.g. the GraphAcademy Courses mentioned below).
+-->
+
+Here we focus on the intriguing part: using the `retrieval_query` to implement the GraphRAG patterns we discuss. The details of each pattern will include the corresponding query.
+
+Remember, when crafting your query, there’s an invisible “first part” that performs the search operation to find your entry points into the graph. This search can be vector, full-text, spatial, hybrid, or filter-based.
+The search part returns the found nodes and their similarity scores, which you can then use in your retrieval query to execute further traversals. 
+<!-- In the retrieval query you can also use additional custom parameters and the `$embedding` parameter for the question embedding. 
+-->
+
+Here is an example of how this might look (taken from the LangChain Neo4j Vector documentation):
+
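+```python
+# Import paths depend on your LangChain version; these assume the
+# langchain-neo4j and langchain-openai packages are installed.
+from langchain_neo4j import Neo4jVector
+from langchain_openai import OpenAIEmbeddings
+
+# Placeholder connection details; replace with your own.
+url = "bolt://localhost:7687"
+username = "neo4j"
+password = "password"
+
+# Runs after the vector search; `node` and `score` come from the search step.
+retrieval_query = """
+RETURN "Name: " + node.name AS text, score, {source:node.url} AS metadata
+"""
+retrieval_example = Neo4jVector.from_existing_index(
+    OpenAIEmbeddings(),
+    url=url,
+    username=username,
+    password=password,
+    index_name="person_index",
+    retrieval_query=retrieval_query,
+)
+retrieval_example.similarity_search("Jon Snow", k=1)
+```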
+
+In the above example, a vector similarity search is executed on the existing index `person_index` using the user input `"Jon Snow"`. For the single best-matching node (`k=1`), the query returns the `name`, the `score`, and some `metadata`.
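As a minimal sketch of the "further traversals" that a retrieval query can execute, the following variant follows relationships from each node found by the search. The `KNOWS` relationship type and the `Person` label are hypothetical and not part of the original example. The `node` and `score` variables and the three returned columns (`text`, `score`, `metadata`) mirror the LangChain example. The helper function is only an illustrative sanity check, not part of any library.

```python
# Hypothetical schema: (:Person)-[:KNOWS]->(:Person). Adjust to your own graph.
# `node` and `score` are provided by the preceding search step.
retrieval_query = """
OPTIONAL MATCH (node)-[:KNOWS]->(friend:Person)
WITH node, score, collect(friend.name) AS friends
RETURN "Name: " + node.name AS text,
       score,
       {source: node.url, knows: friends} AS metadata
"""

REQUIRED_COLUMNS = ("text", "score", "metadata")


def returns_required_columns(query: str) -> bool:
    """Rough check that the final RETURN clause mentions each expected column."""
    return_clause = query.rsplit("RETURN", 1)[-1]
    return all(column in return_clause for column in REQUIRED_COLUMNS)


print(returns_required_columns(retrieval_query))
```

Passing this string as `retrieval_query=` to `Neo4jVector.from_existing_index` would, under these assumptions, add the names of connected people to the metadata of each result.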
+
+<!-- todo continue -->
+
+## Further Reading
+
+* [Neo4j GraphAcademy: Build a Neo4j-backed Chatbot using Python](https://graphacademy.neo4j.com/courses/llm-chatbot-python/) 
+* [Integrating Neo4j into the LangChain ecosystem](https://towardsdatascience.com/integrating-neo4j-into-the-langchain-ecosystem-df0e988344d2)
+* [Neo4j GraphAcademy: Mastering Retrieval-Augmented Generation (RAG)](https://graphacademy.neo4j.com/courses/genai-workshop-graphrag/)
diff --git a/src/content/docs/concepts/intro-to-knowledge-graphs.md b/src/content/docs/concepts/intro-to-knowledge-graphs.md
@@ -5,9 +5,37 @@ description: Self-descriptive, interconnected data structure for knowledge repre
 
 ![GraphRAG Overview](../../../assets/images/graphrag-diagram.svg)
 
-## About GraphRAG
+## Intro to Knowledge Graphs
 
-GraphRAG is Retrieval Augmented Generation (RAG) using a Knowledge Graph. 
+A [knowledge graph model](https://neo4j.com/blog/what-is-knowledge-graph/) is especially suitable for representing both structured and unstructured data with connected elements. 
+Unlike traditional databases, knowledge graphs do not require a rigid schema and allow a more flexible data model. 
+The graph model allows efficient storage, management, querying, and processing of the richness of real-world information. 
+In a RAG system, the knowledge graph serves as the flexible memory companion to the language skills of LLMs, such as summarization, translation, and extraction.
+
+In a knowledge graph, facts and entities are represented as *nodes* with attributes, connected by typed *relationships* that can also carry attributes for qualification. 
+This graph model can scale from a simple family tree to the complete digital twin of a company encompassing employees, customers, processes, products, partnerships, and resources, with millions or billions of connections.
+
+Graph structures can originate from various sources, such as a structured business domain, (hierarchical) document representations, and signals computed by graph algorithms.
+
+
+When we dive into retrieval patterns, we notice how the most advanced techniques rely on the connections within the data. 
+Whether it’s metadata filtering, like searching for articles by a specific author or on a particular topic, or parent-child retrievers, which navigate back to the parent of a text chunk to provide breadth to the LLM for context-enriched answers, these methods leverage the relationships between the data to be retrieved.
+
+Typically, these implementations rely heavily on client-side data structures and extensive Python code connecting the different pieces of information. 
+However, in a graph database, establishing real relationships and querying them with simple patterns is much more efficient.
+
+In the graph pattern of almost every retrieval pattern, you will see the following types of elements:
+
+* entity or domain nodes that represent your application domain
+* domain relationships
+* document nodes that represent the unstructured documents ingested into the graph
+* chunk nodes that represent the text chunks of those documents
+
+![Chunk Node](../../../assets/images/element-chunk-node.svg)
+
+Chunk nodes are the basis for most of the GraphRAG patterns and have at least the following two properties: `text` and `embedding`. The `text` property contains the human-readable text string of the chunk, and `embedding` contains the calculated embedding of that text.
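A minimal sketch of a chunk node as a Python data structure, together with a parameterized Cypher statement that could store it. The `Chunk` label, the `ChunkNode` class, and the sample values are illustrative assumptions. Only the `text` and `embedding` properties come from the description of chunk nodes.

```python
from dataclasses import asdict, dataclass


@dataclass
class ChunkNode:
    """Hypothetical in-memory representation of a chunk node."""
    text: str               # human-readable text of the chunk
    embedding: list[float]  # vector computed from `text` by an embedding model


# Assumed label `Chunk`; the property names follow the chunk node description.
CREATE_CHUNK = "CREATE (c:Chunk {text: $text, embedding: $embedding}) RETURN c"

# Placeholder values; a real embedding typically has many more dimensions.
chunk = ChunkNode(text="Example chunk text.", embedding=[0.12, -0.03, 0.51])
params = asdict(chunk)  # query parameters matching $text and $embedding
print(params)
```

The `params` dictionary could then be passed alongside `CREATE_CHUNK` to whichever Neo4j driver or integration you use.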
+
+<!-- todo entity nodes -->
 
 ## Further reading