Ollama example needed, please~ ☕️ #241

Open
adamwuyu opened this issue Jan 7, 2025 · 3 comments

Comments

adamwuyu commented Jan 7, 2025

I really like the flexibility and high level of integration in this project, so I have been trying to integrate it with Ollama today. It is now 00:30 here and I still have not succeeded. Can you help me?

Question 1:
This error will appear several times during the parsing process:

LLM response has improper format {'nodes': [{'name': 'Configuring the Prompt', 'type': 'document_section', 'content': '
...
chunk_index=2

If this error occurs, the nodes and relationships for the corresponding chunk will not be generated, right?

Question 2:
Why does parsing slightly more content, such as a 50-page PDF, result in a 500 error?

Question 3:
Why does this error sometimes happen:

  File "/Users/adam/miniconda3/envs/lightrag/lib/python3.11/site-packages/neo4j/_sync/io/_common.py", line 254, in on_failure
    raise self._hydrate_error(metadata)
neo4j.exceptions.CypherTypeError: {code: Neo.ClientError.Statement.TypeError} {message: Property values can only be of primitive types or arrays thereof. Encountered: Map{title -> String("${movieTitle}"), score -> String("${score}")}.}
run_id='7ec2d19e-4c70-47c8-93d9-5b978228c242' result={'resolver': {'number_of_nodes_to_resolve': 0, 'number_of_created_nodes': None}}

Question 4:
In the latest version, is the best practice for using a local Ollama model to use

from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.llm.openai_llm import OpenAILLM

or:

from neo4j_graphrag.embeddings.ollama import OllamaEmbeddings
from neo4j_graphrag.llm.ollama_llm import OllamaLLM

With the former, parsing can start, but it causes problems 1, 2, and 3 mentioned above; with the latter, parsing cannot start at all, and this error is raised:

File "/Users/adam/miniconda3/envs/lightrag/lib/python3.11/site-packages/pydantic/main.py", line 214, in init
validated_self = self.pydantic_validator.validate_python(data, self_instance=self) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ pydantic_core._pydantic_core.ValidationError: 1 validation error for Neo4jNode embedding_properties.embedding.0 Input should be a valid number [type=float_type, input_value=[0.0035258962, 0.00050194...047494516, -0.006978964], input_type=list] For further information visit https://errors.pydantic.dev/2.10/v/float_type

adamwuyu commented Jan 7, 2025

This is my full source code; I hope it explains everything. Or could you please provide a working Ollama example?

import asyncio
from pathlib import Path

import neo4j
from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.llm.openai_llm import OpenAILLM
# from neo4j_graphrag.embeddings.ollama import OllamaEmbeddings
# from neo4j_graphrag.llm.ollama_llm import OllamaLLM
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.experimental.pipeline.pipeline import PipelineResult
from neo4j_graphrag.llm import LLMInterface

# Neo4j db infos
NEO4J_URI = "neo4j://localhost:7687"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "neo4j"
OLLAMA_URL = "http://127.0.0.1:11434/v1"
LLM_MODEL = "qwen2.5:14b"
EMBEDDING_MODEL = "nomic-embed-text:latest"

# Neo4j db infos
URI = NEO4J_URI
AUTH = (NEO4J_USERNAME, NEO4J_PASSWORD)
DATABASE = "neo4j"
# Get the parent directory of the current file
root_dir = Path().resolve().parents[1]
print(root_dir)
file_path = root_dir / "data" / "neo4j-graphrag-python pages 1-36.pdf"
print(file_path)

embedder = OpenAIEmbeddings(
    base_url=OLLAMA_URL,
    api_key="None",
    model=EMBEDDING_MODEL,
)

llm = OpenAILLM(
    base_url=OLLAMA_URL,
    api_key="None",
    model_name=LLM_MODEL,
    model_params={
        "max_tokens": 2000,
        "response_format": {"type": "json_object"},
    },
)

# Connect to the Neo4j database
driver = GraphDatabase.driver(
    NEO4J_URI,
    auth=(NEO4J_USERNAME, NEO4J_PASSWORD),
    connection_timeout=200,
    max_connection_pool_size=50,
)


# Instantiate Entity and Relation objects. This defines the
# entities and relations the LLM will be looking for in the text.
NODES_FROM_PDF = [
    'Chunk', 'Class', 'Component', 'Document', 'Person', 'Package', 'Organization', 
    'Parameter', 'Node', 'Date', 'Configuration', 'House', 'PythonClient', 'Method', 
    'Argument', 'Property', 'Entity', 'Concept', 'Interface', 'Function', 'System', 
    'Submodule', 'LLM', 'Service', 'Authentification', 'DriverInstance', 'FilePath', 
    'KGBuilder', 'Planet', 'EntityAndRelationExtractor', 'SimpleKGBuilder', 
    'Attribute', 'PythonPackage', 'Variable', 'Algorithm', 
    'Driver', 'Retriever', 'Schema', 'Project', 'Version', 'Field', 
    'SchemaDefinition', 'API Key', 'URI', 'AUTH', 'INDEX_NAME', 'Label', 
    'Neo4jConfig', 'LLMConfig', 'EmbedderConfig'
]
# Prompt: You are a developer of neo4j and neo4j-graphrag. If you were to organize this knowledge
# into a knowledge graph, please add nodes beyond the ones above and output them as a separate Python list.
NODES_FROM_CHATGPT = [
    'Graph', 'Relationship', 'Query', 'Cypher', 'Index', 'Transaction', 
    'GraphAlgorithm', 'DataModel', 'GraphVisualization', 'DataIngestion', 
    'ETL', 'API', 'Driver', 'Deployment', 'Scalability', 'Performance', 
    'Security', 'Backup', 'Restore', 'Monitoring', 'Analytics', 
    'MachineLearning', 'Recommendation', 'DataScience', 'GraphTheory', 
]
ENTITIES = NODES_FROM_PDF + NODES_FROM_CHATGPT

RELATIONS = ["INHERITED_FROM", "REQUIRE", "HAS", "IS_INSTANCE_OF", "USES", "BELONGS_TO", "CONNECTED_TO", "PROVIDES", "DEPENDS_ON", "CONTAINS"]
POTENTIAL_SCHEMA = [
    ("Class", "HAS", "Property"),
    ("Person", "BELONGS_TO", "Organization"),
    ("API", "REQUIRE", "API Key"),
    ("Document", "HAS", "Component"),
    ("Chunk", "CONTAINS", "Node"),
    ("Method", "IS_INSTANCE_OF", "Function"),
    ("Package", "HAS", "Submodule"),
    ("Configuration", "REQUIRE", "Schema"),
    ("Driver", "USES", "API"),
    ("Graph", "HAS", "Node"),
    ("Entity", "HAS", "Attribute"),
    ("Project", "BELONGS_TO", "Organization"),
    ("GraphAlgorithm", "DEPENDS_ON", "DataModel"),
]

async def define_and_run_pipeline(
    neo4j_driver: neo4j.Driver,
    llm: LLMInterface,
    embedder: OpenAIEmbeddings,
) -> PipelineResult:
    kg_builder = SimpleKGPipeline(
        llm=llm,
        driver=neo4j_driver,
        embedder=embedder,
        entities=ENTITIES,
        relations=RELATIONS,
        potential_schema=POTENTIAL_SCHEMA,
        neo4j_database=DATABASE,
    )
    return await kg_builder.run_async(file_path=str(file_path))


async def main() -> PipelineResult:
    with neo4j.GraphDatabase.driver(URI, auth=AUTH) as driver:
        res = await define_and_run_pipeline(driver, llm, embedder)
    await llm.async_client.close()
    return res


if __name__ == "__main__":
    res = asyncio.run(main())
    print(res)

stellasia (Contributor) commented

Hi @adamwuyu,

Thank you for reaching out and for sharing your code! I'll try to answer your questions below:

  1. Yes, sometimes the LLM fails to produce valid JSON, or produces JSON in a format the library does not expect. In that case, only the 'Chunk' node is created (in the "lexical graph"), but no entities are created for that chunk (see the first sketch after this list for a way to spot such chunks).

  2. I can't say a lot about that, we have not experienced such errors. Two ideas:

    • check whether there is something in the Ollama configuration that can help
    • also try a larger chunk size to reduce the number of calls (see the second sketch after this list)
  3. Thanks for reporting this one, we will investigate.

  4. Both methods should work the same. The error you're seeing in the second case is likely a bug; we will also take a look.
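
For point 1, a minimal way to spot chunks that produced no entities, assuming the lexical graph uses the default FROM_CHUNK relationship from each extracted entity back to its Chunk node (adjust the relationship type if your graph differs). It reuses the driver and DATABASE from the script above:

# Count Chunk nodes with no extracted entities attached to them.
# Assumption: the default lexical-graph relationship type is FROM_CHUNK.
records, _, _ = driver.execute_query(
    "MATCH (c:Chunk) WHERE NOT (c)<-[:FROM_CHUNK]-() "
    "RETURN count(c) AS chunks_without_entities",
    database_=DATABASE,
)
print(records[0]["chunks_without_entities"])

For point 2, a minimal sketch of feeding a larger chunk size into the pipeline, assuming SimpleKGPipeline accepts a text_splitter argument and that FixedSizeSplitter lives at the path shown; the chunk_size and chunk_overlap values are only illustrative:

from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import (
    FixedSizeSplitter,
)

# Larger chunks mean fewer LLM calls per document.
text_splitter = FixedSizeSplitter(chunk_size=4000, chunk_overlap=200)

kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=driver,
    embedder=embedder,
    entities=ENTITIES,
    relations=RELATIONS,
    potential_schema=POTENTIAL_SCHEMA,
    text_splitter=text_splitter,
    neo4j_database=DATABASE,
)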

stellasia mentioned this issue Jan 14, 2025
stellasia (Contributor) commented

Hi,

Regarding the error you reported with the Ollama classes, it was a bug that has been fixed in version 1.4.2, released just now.
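
After upgrading, a minimal sketch of wiring the Ollama classes into the script above, assuming the constructors accept the model names as shown (the Ollama client defaults to the local server at http://127.0.0.1:11434):

from neo4j_graphrag.embeddings.ollama import OllamaEmbeddings
from neo4j_graphrag.llm.ollama_llm import OllamaLLM

# Local Ollama models, using the names from the script above.
embedder = OllamaEmbeddings(model="nomic-embed-text:latest")
llm = OllamaLLM(model_name="qwen2.5:14b")

The rest of the pipeline (SimpleKGPipeline, schema, driver) can stay as in the script above.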
