text2graph_llm
text2graph_llm is an experimental tool that uses Large Language Models (LLMs) to convert text into structured graph representations by identifying and extracting relationship triplets. This repository is still in development and may change frequently.
- Extract Relationship Triplets: Automatically identifies and extracts (subject, predicate, object) triplets from text, converting natural language to a structured graph. Currently, "subject" is limited to location names and "object" to stratigraphic names.
- Integrate Macrostrat Entity Information: Enhances entity recognition by incorporating additional data from the Macrostrat database, which improves graph accuracy and detail.
- Incorporate Geo-location Data: Adds geo-location data from external APIs to the graph, enhancing context and utility of the relationships.
- Traceable Source Information (Provenance): Implements the W3C PROV-O standard to ensure the credibility and traceability of source information.
- Support Turtle (TTL) Format: Offers graph data in Turtle (TTL) format, a human-readable serialization that eases interpretation and sharing (see the sketch after this list).
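To make the triplet, provenance, and TTL features concrete, here is a minimal sketch of what a single extracted relationship with PROV-O provenance could look like, built with rdflib. The predicate, namespace, entity names, and source document are all hypothetical; the service's actual vocabulary and output shape may differ.

```python
# Minimal sketch only: the namespace, predicate, and entity names are
# hypothetical, not the service's actual vocabulary.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import PROV, RDFS

EX = Namespace("http://example.org/")

g = Graph()
g.bind("ex", EX)
g.bind("prov", PROV)

location = EX["Nevada"]          # subject: a location name
strat = EX["GolcondaFormation"]  # object: a stratigraphic name (hypothetical)

# The (subject, predicate, object) relationship triplet.
g.add((location, EX.hasStratigraphicUnit, strat))
g.add((strat, RDFS.label, Literal("Golconda Formation")))

# PROV-O provenance (simplified: attached to the object entity here).
g.add((strat, PROV.wasDerivedFrom, EX["source_paper_123"]))

print(g.serialize(format="turtle"))
```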
Explore our interactive demo
The demo uses a cached LLM graph for faster processing. However, the hydration step (retrieving entity details) is still processed in real time; we are working on caching this step as well.
```python
import requests

API_ENDPOINT = "http://cosmos0002.chtc.wisc.edu:4510/llm_graph"
API_KEY = "Email [email protected] to request an API key if you need access."

headers = {"Content-Type": "application/json", "Api-Key": API_KEY}
data = {
    "query": "Gold mines in Nevada.",
    "top_k": 1,
    "ttl": True,  # Whether to return the graph in Turtle (TTL) format
    "hydrate": False,  # Fetch additional data from external services (e.g., GPS). Rate limits make this very slow; do not use with top_k > 3.
}

response = requests.post(API_ENDPOINT, headers=headers, json=data)
response.raise_for_status()
print(response.json())
```
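For repeated queries, the request above can be wrapped in a small helper. This is just a convenience sketch continuing from the snippet above (it reuses `API_ENDPOINT` and `headers`); the payload fields are taken from the example request, and the defaults are otherwise arbitrary.

```python
# Convenience wrapper around the request above; payload fields come from
# the example request, default values are arbitrary choices.
def llm_graph(query: str, top_k: int = 1, ttl: bool = True, hydrate: bool = False) -> dict:
    payload = {"query": query, "top_k": top_k, "ttl": ttl, "hydrate": hydrate}
    response = requests.post(API_ENDPOINT, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()

print(llm_graph("Gold mines in Nevada.", top_k=1))
```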
For convenience, you can use this notebook.
For developers
Steps to set up the environment:
- Open the project in VSCode.
- Press `F1` and select `Reopen in Container` to set up the dev environment using the dev-container.
- Copy the `.env` file from the shared Google Drive to the project root.
- Copy the extracted graph cache data from Google Drive to `app_data/`.
- Run `docker-compose up` in bash to deploy locally (a smoke-test sketch follows this list).
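Once the stack is up, a quick way to check the local deployment is to hit the same endpoint path as the hosted service. This is a hedged sketch: the local port (4510, mirrored from the hosted endpoint above) and the API key handling are assumptions, not documented values for local runs.

```python
# Hypothetical smoke test: port 4510 and the payload fields are assumed
# from the hosted example above, not documented for local deployments.
import requests

resp = requests.post(
    "http://localhost:4510/llm_graph",
    headers={"Content-Type": "application/json", "Api-Key": "your-key"},
    json={"query": "Gold mines in Nevada.", "top_k": 1, "ttl": False, "hydrate": False},
)
resp.raise_for_status()
print(resp.json())
```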
Running Batch Inference on CHTC:
- Update `text2graph_llm_chtc` Container: Ensure the base package is updated and pushed to ghcr.io. See the package script for details.
- Create ID Pickle: Ensure the data package storing the document IDs (e.g., `./chtc/geoarchive_paragraph_ids.pkl`) is up to date.
- Add `.env` File: Ensure `./chtc/.env` contains the required credentials. See this example.
- Initialize Turso DB: Run `hard_reset` in `./chtc/db.py` to initialize the Turso DB (see the sketch after this list).
- Log in to CHTC Submit Node: Log in to your CHTC submit node (e.g., `ap2001.chtc.wisc.edu`).
- Update Test Job Container Name: Update the container name in the test job here and run `condor_submit`.
- Verify Turso Data Reception: Ensure Turso is properly receiving data.
- Submit Full Job: Submit the job using this job file and update the Docker container in this file.
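As referenced in the Turso step above, initializing the database might look like the following. The import path and the no-argument call are assumptions about `./chtc/db.py` (run from the repository root), not a documented interface.

```python
# Hypothetical invocation: assumes hard_reset in ./chtc/db.py takes no
# arguments and reads the Turso credentials from ./chtc/.env.
from chtc.db import hard_reset

hard_reset()  # drop and recreate the Turso tables (assumed behavior)
```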