rag-woped-integration

Integration of Retrieval Augmented Generation for WoPeD.


Table of Contents

🚀 Getting Started
📦 Requirements
⚙️ Local Setup
🧱 Architecture Overview
🧩 Technologies Used
Branching & Commit: Process in VS Code
Branching and commit via Terminal

🚀 Getting Started

This project is a modular, testable Retrieval-Augmented Generation (RAG) API for WoPeD, built with Flask, LangChain, and ChromaDB and structured using hexagonal architecture.

📦 Requirements

Python 3.10+
pip
(Optional) Docker/Podman
OpenAI API Key (or compatible LLM provider)

⚙️ Local Setup

Clone the repository:

git clone https://github.com/your-username/rag-api.git

Create and activate a virtual environment:

python -m venv venv

Linux/macOS: source venv/bin/activate

Windows: venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

New dependencies should be added to requirements.txt.

Run the Flask application

python main.py

When using the REST API controller, pass the query as the query URL parameter of the /rag/ endpoint, e.g. /rag/?query=X.
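A minimal client sketch for the /rag/ endpoint, assuming the API runs locally on port 5000 (the port mapped in the Docker commands) and accepts GET requests. The response format is not documented here, so the helper returns the raw body. build_rag_url and ask_rag are hypothetical helper names; only Python's standard library is used.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: local instance on port 5000, /rag/ answers GET requests
BASE_URL = "http://localhost:5000"


def build_rag_url(query: str, base_url: str = BASE_URL) -> str:
    """Return the /rag/ URL with the text passed as the `query` parameter."""
    # urlencode percent-encodes spaces, '?', '&' and other special characters
    return f"{base_url}/rag/?{urlencode({'query': query})}"


def ask_rag(query: str, base_url: str = BASE_URL) -> str:
    """Send the query to a running instance and return the raw response body."""
    with urlopen(build_rag_url(query, base_url), timeout=30) as response:
        return response.read().decode("utf-8")
```

Encoding the parameter matters because free-text questions usually contain spaces and characters such as ? or & that would otherwise break the URL.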

Run using Docker (alternative)

docker build -t rag-api . or podman build -t rag-api .

docker run -p 5000:5000 rag-api or podman run -p 5000:5000 rag-api

Run with Logging
Logging is enabled automatically when the application starts, with warning as the default level. To change the default, set the LOG_LEVEL environment variable in app/config/config.env. To set the level for a single run, pass --loglevel:

Flask
python main.py --loglevel <level>

Docker
docker run -p 5000:5000 rag-api --loglevel <level>

Possible values for <level>:
debug: displays all messages
info: displays informational messages, warnings, errors, and critical messages
warning: displays warnings, errors, and critical messages
error: displays errors and critical messages
critical: displays only critical messages
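The documented levels map one-to-one onto Python's logging constants. The sketch below shows one way main.py could resolve the level; the function name resolve_log_level and the precedence (command-line flag first, then LOG_LEVEL, then warning) are assumptions, not the project's confirmed behavior.

```python
import argparse
import logging
import os

# Documented <level> values mapped to the standard logging constants
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
    "critical": logging.CRITICAL,
}


def resolve_log_level(argv=None, env=None) -> int:
    """Return the level from --loglevel, else LOG_LEVEL, else warning (assumed order)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(add_help=False)
    # type=str.lower runs before the choices check, so "DEBUG" is accepted too
    parser.add_argument("--loglevel", type=str.lower, choices=list(LEVELS))
    args, _unknown = parser.parse_known_args(argv)
    name = args.loglevel or env.get("LOG_LEVEL", "warning").strip().lower()
    # Unknown LOG_LEVEL values fall back to warning instead of crashing
    return LEVELS.get(name, logging.WARNING)


# In main.py this could be applied once at startup:
# logging.basicConfig(level=resolve_log_level())
```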

🧱 Architecture Overview

This project follows a hexagonal architecture pattern for a clean, testable, and maintainable Retrieval-Augmented Generation (RAG) API.

🎯 ApplicationService - Central Entry Point

The ApplicationService is the main entry point and orchestrator of the entire system:

Technology-agnostic: Contains only business logic, no infrastructure dependencies
Coordinates workflows: PDF processing, RAG pipeline, document management
Delegates to domain services: Each service handles specific business concerns
Singleton pattern: Centralized access point configured in main.py

🏗️ Architecture Layers

🔹 Core Layer

ApplicationService.py: Main business orchestrator and entry point
ports/: Abstract interfaces (DatabasePort, RAGPort, PDFLoaderPort, QueryExtractorPort)
services/: Business logic services; they also log successful operations, while detailed error handling stays in the infrastructure adapters
- DatabaseService: Document CRUD operations
- RAGService: Retrieval & augmentation pipeline
- PDFService: Document processing workflows
- QueryExtractionService: Diagram preprocessing logic
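The retrieval and augmentation steps of the RAGService can be illustrated in plain Python. This is a simplified sketch, not the project's LangChain/ChromaDB code: it scores documents by cosine similarity, keeps those at or above a similarity threshold, returns the top k (mirroring the "similarity threshold" and "result count" settings), and prepends them to the question. The metric ChromaDB actually uses depends on how its collection is configured, and the prompt layout is an assumption.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, documents, k=3, threshold=0.5):
    """Return up to k (text, score) pairs scoring >= threshold, best first.

    `documents` is a list of (text, embedding) pairs.
    """
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in documents]
    hits = [pair for pair in scored if pair[1] >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:k]


def augment(question: str, hits) -> str:
    """Prepend the retrieved passages to the question as context for the LLM."""
    context = "\n\n".join(text for text, _score in hits)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Toy 2-D "embeddings"; real ones come from the configured embedding model
docs = [
    ("A place holds tokens.", [1.0, 0.0]),
    ("A transition consumes and produces tokens.", [0.8, 0.6]),
    ("Unrelated text.", [0.0, 1.0]),
]
hits = retrieve([1.0, 0.0], docs, k=2, threshold=0.5)
prompt = augment("What does a place do?", hits)
```

The threshold drops weak matches even when fewer than k documents remain, which keeps irrelevant context out of the prompt.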

🔹 Infrastructure Layer

Infrastructure adapters handle technical implementation and detailed error handling:

db/:
- DatabaseAdapter: ChromaDB integration via LangChainClient
- PDFLoader: Document parsing and chunking
rag/:
- RAGAdapter: RAG pipeline implementation using LangChain
- LangchainClient: LangChain vector operations
preprocessing/:
- BpmnQueryExtractor: BPMN diagram processing
- PnmlQueryExtractor: PNML diagram processing

🔹 Presentation Layer

controller/: REST API endpoints via Flask (RESTController.py)

🧠 Diagram Preprocessing Feature

Key Feature: Improves RAG search quality by extracting meaningful business terms from BPMN/PNML diagrams.

How it works:

Detection: Automatically detects diagram format (BPMN vs PNML)
Extraction: Extracts business-relevant text (activity names, events, etc.)
Technical Filtering: Removes technical IDs (task_12j0pib, p1, t3, etc.)
Structural Filtering: Removes tool/system terms (woped, designer, start, end, etc.)

Configuration: Set ENABLE_DIAGRAM_PREPROCESSING=true/false in app/config/config.env
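The four preprocessing steps can be sketched with xml.etree.ElementTree, which the project lists for BPMN/PNML parsing. The ID pattern and the stop list below are assumptions modeled on the examples above (task_12j0pib, p1, t3; woped, designer, start, end); the real BpmnQueryExtractor and PnmlQueryExtractor may use different rules, and detect_format and extract_terms are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Assumed stop list of tool/structural terms, based on the examples above
STRUCTURAL_TERMS = {"woped", "designer", "start", "end"}
# Assumed shape of technical IDs: prefix_suffix (task_12j0pib) or p1, t3
TECHNICAL_ID = re.compile(r"^(?:[a-z]+_[0-9a-z]+|[pt]\d+)$", re.IGNORECASE)


def local_name(tag: str) -> str:
    """Drop the XML namespace: '{ns}task' -> 'task'."""
    return tag.rsplit("}", 1)[-1]


def detect_format(root: ET.Element) -> str:
    """Step 1: BPMN roots are <definitions> in a BPMN namespace, PNML roots are <pnml>."""
    name = local_name(root.tag).lower()
    if name == "pnml":
        return "pnml"
    if name == "definitions" and "bpmn" in root.tag.lower():
        return "bpmn"
    raise ValueError(f"unsupported diagram root: {root.tag}")


def extract_terms(xml_text: str) -> list[str]:
    root = ET.fromstring(xml_text)
    fmt = detect_format(root)
    raw = []
    # Step 2: collect labels (BPMN: name attributes; PNML: <name><text> elements)
    for elem in root.iter():
        if fmt == "bpmn" and elem.get("name"):
            raw.append(elem.get("name"))
        elif fmt == "pnml" and local_name(elem.tag) == "name":
            raw.extend(child.text for child in elem
                       if local_name(child.tag) == "text" and child.text)
    terms = []
    for value in (v.strip() for v in raw):
        # Step 3: drop technical IDs; step 4: drop tool/structural terms
        if not value or TECHNICAL_ID.match(value) or value.lower() in STRUCTURAL_TERMS:
            continue
        if value not in terms:  # keep the first occurrence, preserve order
            terms.append(value)
    return terms
```

With ENABLE_DIAGRAM_PREPROCESSING=false, a pipeline built this way would simply skip extract_terms and search with the raw query.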

⚙️ Configuration Management

Most settings are controlled through environment variables in app/config/config.env:

Database settings: ChromaDB persist directory configuration
ChromaDB settings: Telemetry, connection parameters
Document processing: Chunk size, overlap, PDF directory
RAG pipeline: Similarity threshold, result count, embedding model
Diagram preprocessing: Enable/disable semantic extraction
Server configuration: Host, port settings
Logging: Log level configuration

See app/config/config.env for all available configuration options.
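config.env uses plain KEY=VALUE lines. A dependency-free sketch of reading such a file and interpreting a boolean flag like ENABLE_DIAGRAM_PREPROCESSING is shown below; parse_env and get_bool are hypothetical names, and the project may load the file differently (for example with python-dotenv).

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Strip surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_bool(values: dict[str, str], key: str, default: bool = False) -> bool:
    """Interpret common truthy spellings; missing keys return the default."""
    raw = values.get(key)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Example content; both keys are the ones named in this README
sample = """
# Diagram preprocessing
ENABLE_DIAGRAM_PREPROCESSING=true
LOG_LEVEL=warning
"""
config = parse_env(sample)
```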

🧪 Comprehensive Test Coverage

The suite contains 108 tests in total:

Unit Tests (`tests/unit/`)

Services: Business logic testing (database, RAG, PDF, query extraction)
Adapters: Infrastructure testing (LangChain, extractors, loaders)
Presentation: API endpoint validation

Integration Tests (`tests/integration/`)

End-to-End workflows: Complete RAG pipeline, diagram preprocessing, PDF ingestion
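Because the services depend only on ports, unit tests can replace the infrastructure adapters with test doubles. The sketch below uses unittest.mock.Mock in pytest style; DatabaseService and its add_documents method are hypothetical stand-ins, not the project's real signatures.

```python
from unittest.mock import Mock


class DatabaseService:
    """Hypothetical stand-in for a core service that depends only on a port."""

    def __init__(self, db_port):
        self._db = db_port

    def add_documents(self, docs: list[str]) -> int:
        if not docs:
            return 0  # nothing to store, so the port is never called
        self._db.add(docs)
        return len(docs)


def test_add_documents_delegates_to_port():
    port = Mock()  # replaces the ChromaDB adapter; no database needed
    service = DatabaseService(port)
    assert service.add_documents(["a", "b"]) == 2
    port.add.assert_called_once_with(["a", "b"])


def test_empty_input_skips_port():
    port = Mock()
    assert DatabaseService(port).add_documents([]) == 0
    port.add.assert_not_called()
```

pytest collects functions whose names start with test_, so a file like this could be run with pytest tests/unit/.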

🧩 Technologies Used

Flask for REST API exposure and web server
LangChain for RAG orchestration and document processing
ChromaDB as vector database for semantic search
HuggingFace Transformers for embedding models and AI/ML components
xml.etree.ElementTree (Python standard library) for BPMN/PNML XML parsing
PyPDF for PDF document processing and text extraction
pytest for comprehensive unit and integration testing
Hexagonal Architecture for clean separation of concerns and testability

Branching & Commit: Process in VS Code

Create new branch

Go to the Source Control panel (or press Ctrl + Shift + G).

Click the menu icon and select "Branch" --> "Create Branch".

Type a name for the branch and press ENTER. You are now on the new branch.

- To switch between branches, click the name of your current branch in the status bar at the bottom left and select the branch you want to switch to.

Stage changes

If you want to save your changes locally to your current branch, go to the Source Control panel.
Click the plus symbol next to "Changes" to stage all your changes.

Enter a commit message that summarizes the changes you made.

Click Commit or press Ctrl + Enter.

Keep in mind that a commit only saves your changes locally on your machine. To make the committed changes available to other developers, you also need to push them. You can commit regularly while you work; you don't need to push after every commit.

Push changes

Go to the Source Control panel
click the menu icon
select "Pull, Push" --> "Push"

Branching and commit via Terminal

Create new branch

Navigate to your project folder. (If you opened the project folder in your IDE, the IDE terminal already starts in the correct folder.)
Type git checkout -b name-of-your-branch in your terminal to create a new branch and switch to it.

Commit changes

Type git add . and press ENTER to stage all your changes.
Type git commit -m "Your commit message goes here" and press ENTER to commit your changes locally.

Push changes

Type git push origin name-of-your-branch and press ENTER to push your committed changes to the repository.
