woped/rag

rag-woped-integration

Integration of Retrieval Augmented Generation for WoPeD.

Table of Contents

  1. 🚀 Getting Started
  2. 📦 Requirements
  3. ⚙️ Local Setup
  4. 🧱 Architecture Overview
  5. 🧩 Technologies Used
  6. Branching & Commit: Process in VS Code
  7. Branching and commit via Terminal

🚀 Getting Started

This project is a modular, testable Retrieval-Augmented Generation (RAG) API built with Flask, LangChain, and ChromaDB, and structured using hexagonal architecture.


📦 Requirements

  • Python 3.10+
  • pip
  • (Optional) Docker/Podman
  • OpenAI API Key (or compatible LLM provider)

⚙️ Local Setup

  1. Clone the repository

git clone https://github.com/your-username/rag-api.git

  2. Create a virtual environment

python -m venv venv

source venv/bin/activate

Windows: venv\Scripts\activate

  3. Install dependencies

pip install -r requirements.txt

New dependencies should be added to the file "requirements.txt".

  4. Run the Flask application

python main.py

  • When using the REST API controller, pass the query as a URL parameter on the /rag/ path: /rag/?query=X

  5. Run using Docker (alternative)

docker build -t rag-api . or podman build -t rag-api .

docker run -p 5000:5000 rag-api or podman run -p 5000:5000 rag-api

  6. Run with Logging
    The default log level is warning and is active automatically when the application starts. The default can be changed by setting the LOG_LEVEL environment variable in app/config/config.env.

    Flask
    python main.py --loglevel <level>

    Docker
    docker run -p 5000:5000 rag-api --loglevel <level>

    Possible values for <level>:
    debug: displays all messages
    info: displays infos, warnings, errors, and critical messages
    warning: displays warnings, errors, and critical messages
    error: displays errors and critical messages
    critical: displays only critical messages
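
The sketch below shows one way the --loglevel flag and the LOG_LEVEL variable could be mapped onto Python's standard logging levels. It is not taken from main.py: the function name resolve_log_level and the rule that the command-line flag wins over the environment variable are assumptions made for illustration.

```python
import argparse
import logging

# The five level names listed above, mapped to the stdlib logging constants.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
    "critical": logging.CRITICAL,
}


def resolve_log_level(argv, env):
    """Return a logging level from CLI arguments and environment values.

    Assumed precedence (hypothetical): --loglevel, then LOG_LEVEL, then warning.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--loglevel", type=str.lower, choices=sorted(LEVELS))
    # parse_known_args ignores unrelated arguments instead of failing on them.
    args, _unknown = parser.parse_known_args(argv)
    name = args.loglevel or env.get("LOG_LEVEL", "warning")
    # Unknown names in the environment fall back to the documented default.
    return LEVELS.get(name.strip().lower(), logging.WARNING)
```

A caller could then configure logging once at startup with logging.basicConfig(level=resolve_log_level(sys.argv[1:], os.environ)).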

🧱 Architecture Overview

This project follows a hexagonal architecture pattern for a clean, testable, and maintainable Retrieval-Augmented Generation (RAG) API.

🎯 ApplicationService - Central Entry Point

The ApplicationService is the main entry point and orchestrator of the entire system:

  • Technology-agnostic: Contains only business logic, no infrastructure dependencies
  • Coordinates workflows: PDF processing, RAG pipeline, document management
  • Delegates to domain services: Each service handles specific business concerns
  • Singleton pattern: Centralized access point configured in main.py

🏗️ Architecture Layers

🔹 Core Layer

  • ApplicationService.py: Main business orchestrator and entry point
  • ports/: Abstract interfaces (DatabasePort, RAGPort, PDFLoaderPort, QueryExtractorPort)
  • services/: Business logic services that handle success logging
    • DatabaseService: Document CRUD operations
    • RAGService: Retrieval & augmentation pipeline
    • PDFService: Document processing workflows
    • QueryExtractionService: Diagram preprocessing logic
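
To illustrate how a port and a service relate in this layer, here is a minimal sketch. Only the names DatabasePort and DatabaseService come from the list above; the method names (add, get, add_document, get_document) and the in-memory adapter are hypothetical and do not reflect the project's actual interfaces.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class DatabasePort(ABC):
    """Abstract interface the core layer depends on (method names are assumed)."""

    @abstractmethod
    def add(self, doc_id: str, text: str) -> None: ...

    @abstractmethod
    def get(self, doc_id: str) -> Optional[str]: ...


class InMemoryDatabaseAdapter(DatabasePort):
    """Hypothetical adapter; a real one would talk to ChromaDB instead."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def get(self, doc_id: str) -> Optional[str]:
        return self._docs.get(doc_id)


class DatabaseService:
    """Business logic only: it validates input and delegates storage to the port."""

    def __init__(self, port: DatabasePort) -> None:
        self._port = port

    def add_document(self, doc_id: str, text: str) -> None:
        if not text.strip():
            raise ValueError("document text must not be empty")
        self._port.add(doc_id, text)

    def get_document(self, doc_id: str) -> Optional[str]:
        return self._port.get(doc_id)
```

Because the service only sees the abstract port, the storage backend can be swapped without touching the business logic.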

🔹 Infrastructure Layer

Infrastructure adapters handle technical implementation and detailed error handling:

  • db/:
    • DatabaseAdapter: ChromaDB integration via LangChainClient
    • PDFLoader: Document parsing and chunking
  • rag/:
    • RAGAdapter: RAG pipeline implementation using LangChain
    • LangchainClient: LangChain vector operations
  • preprocessing/:
    • BpmnQueryExtractor: BPMN diagram processing
    • PnmlQueryExtractor: PNML diagram processing
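
The PDFLoader entry above mentions chunking. As a rough illustration of the idea, the following plain-Python sketch splits text into fixed-size, overlapping chunks. It is a simplified stand-in, not the LangChain splitter the project uses, and the function name chunk_text is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.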

🔹 Presentation Layer

  • controller/: REST API endpoints via Flask (RESTController.py)
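
As a small client-side sketch, the snippet below builds a request URL for the /rag/?query=X path mentioned in Local Setup and fetches the raw response body. The host localhost and port 5000 are assumptions based on the docker run example; the helper names build_rag_url and ask are hypothetical, and the response format is not specified here, so the body is returned as plain text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_rag_url(query: str, base_url: str = "http://localhost:5000") -> str:
    """Return the /rag/ URL with the query safely percent-encoded."""
    return f"{base_url.rstrip('/')}/rag/?{urlencode({'query': query})}"


def ask(query: str, base_url: str = "http://localhost:5000") -> str:
    """Send the query to a running instance and return the raw response text."""
    with urlopen(build_rag_url(query, base_url), timeout=30) as response:
        return response.read().decode("utf-8")
```

Encoding the query with urlencode matters because diagram-derived queries often contain spaces and punctuation.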

🧠 Diagram Preprocessing Feature

Key Feature: Improves RAG search quality by extracting meaningful business terms from BPMN/PNML diagrams.

How it works:

  1. Detection: Automatically detects diagram format (BPMN vs PNML)
  2. Extraction: Extracts business-relevant text (activity names, events, etc.)
  3. Technical Filtering: Removes technical IDs (task_12j0pib, p1, t3, etc.)
  4. Structural Filtering: Removes tool/system terms (woped, designer, start, end, etc.)
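
The four steps above can be sketched with xml.etree.ElementTree as follows. This is a simplified illustration, not the project's BpmnQueryExtractor or PnmlQueryExtractor: the function names, the ID pattern, and the list of structural terms are assumptions, and real diagrams may need more careful rules.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Assumed filter rules, modelled on the examples given above.
STRUCTURAL_TERMS = {"woped", "designer", "start", "end"}
TECHNICAL_ID = re.compile(r"^(?:[a-z]+_(?=[0-9a-z]*\d)[0-9a-z]+|[pt]\d+)$", re.IGNORECASE)


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}task' down to 'task'."""
    return tag.rsplit("}", 1)[-1]


def detect_format(root: ET.Element) -> str:
    """Step 1: BPMN files have a <definitions> root, PNML files a <pnml> root."""
    name = local_name(root.tag).lower()
    if name == "definitions":
        return "bpmn"
    if name == "pnml":
        return "pnml"
    return "unknown"


def extract_terms(xml_text: str) -> List[str]:
    root = ET.fromstring(xml_text)
    fmt = detect_format(root)
    terms: List[str] = []
    for element in root.iter():
        # Step 2: BPMN labels live in 'name' attributes, PNML labels in <text> nodes.
        if fmt == "bpmn":
            value = element.get("name")
        elif fmt == "pnml" and local_name(element.tag) == "text":
            value = element.text
        else:
            value = None
        value = (value or "").strip()
        if not value:
            continue
        # Step 3: drop technical IDs; step 4: drop tool and structural terms.
        if TECHNICAL_ID.match(value) or value.lower() in STRUCTURAL_TERMS:
            continue
        if value not in terms:
            terms.append(value)
    return terms
```

The remaining terms can then be joined into a search query, so the vector search sees business vocabulary rather than XML markup.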

Configuration: Set ENABLE_DIAGRAM_PREPROCESSING=true/false in app/config/config.env

⚙️ Configuration Management

The system is highly configurable through environment variables in app/config/config.env:

  • Database settings: ChromaDB persist directory configuration
  • ChromaDB settings: Telemetry, connection parameters
  • Document processing: Chunk size, overlap, PDF directory
  • RAG pipeline: Similarity threshold, result count, embedding model
  • Diagram preprocessing: Enable/disable semantic extraction
  • Server configuration: Host, port settings
  • Logging: Log level configuration

See app/config/config.env for all available configuration options.
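
For readers unfamiliar with .env-style files, the following sketch parses KEY=VALUE lines and converts a flag such as ENABLE_DIAGRAM_PREPROCESSING to a boolean. It is an illustrative standard-library parser, not the loader the project actually uses; parse_env, as_bool, and the CHUNK_SIZE key in the usage note are hypothetical.

```python
from typing import Dict, Optional


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Surrounding quotes are optional in most .env dialects.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def as_bool(value: Optional[str], default: bool = False) -> bool:
    """Interpret common truthy spellings; a missing value yields the default."""
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}
```

For example, parse_env("LOG_LEVEL=debug\nCHUNK_SIZE = 500") yields a dictionary of strings, and numeric settings still need an explicit int(...) conversion.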

🧪 Comprehensive Test Coverage

108 total tests ensuring system reliability:

Unit Tests (tests/unit/)

  • Services: Business logic testing (database, RAG, PDF, query extraction)
  • Adapters: Infrastructure testing (LangChain, extractors, loaders)
  • Presentation: API endpoint validation

Integration Tests (tests/integration/)

  • End-to-End workflows: Complete RAG pipeline, diagram preprocessing, PDF ingestion
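
A unit test for a service typically replaces the infrastructure behind a port with a stub. The sketch below shows that pattern in pytest style with plain assert statements. The class shapes are hypothetical and not copied from tests/unit/: the Hit type, the search method, and the assumption that a higher score means a closer match are all illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    text: str
    score: float  # assumed: higher means more similar


class StubRAGPort:
    """Test double returning canned hits instead of querying a vector store."""

    def __init__(self, hits: List[Hit]) -> None:
        self._hits = hits

    def search(self, query: str) -> List[Hit]:
        return list(self._hits)


class RAGService:
    """Hypothetical service applying a similarity threshold and a result limit."""

    def __init__(self, port, threshold: float, max_results: int) -> None:
        self._port = port
        self._threshold = threshold
        self._max_results = max_results

    def retrieve(self, query: str) -> List[Hit]:
        hits = [h for h in self._port.search(query) if h.score >= self._threshold]
        hits.sort(key=lambda h: h.score, reverse=True)
        return hits[: self._max_results]


def test_retrieve_applies_threshold_and_limit():
    port = StubRAGPort([Hit("a", 0.9), Hit("b", 0.4), Hit("c", 0.7), Hit("d", 0.8)])
    service = RAGService(port, threshold=0.5, max_results=2)
    assert [h.text for h in service.retrieve("any query")] == ["a", "d"]
```

Because no database or model is involved, a test like this runs in milliseconds and fails only when the business rule itself changes.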

🧩 Technologies Used

  • Flask for REST API exposure and web server
  • LangChain for RAG orchestration and document processing
  • ChromaDB as vector database for semantic search
  • HuggingFace Transformers for embedding models and AI/ML components
  • xml.etree.ElementTree (Python standard library) for BPMN/PNML XML parsing
  • PyPDF for PDF document processing and text extraction
  • pytest for comprehensive unit and integration testing
  • Hexagonal Architecture for clean separation of concerns and testability

Branching & Commit: Process in VS Code

  1. Create new branch
  • Go to the Source Control panel (or press Ctrl + Shift + G)
  • Click the menu icon and select "Branch" --> "Create Branch"
  • Type a name for the branch and press ENTER; you are now on the new branch
  • To switch between branches, click the icon that shows your current branch and select the branch you want to change to
  2. Stage and commit changes
  • To save your changes locally to your current branch, go to the Source Control panel
  • Select the plus symbol to stage all your changes
  • Enter a commit message that summarizes the changes you made
  • Click Commit or press Ctrl + Enter

Keep in mind that a commit only saves your changes locally on your machine. To make the committed changes available to other developers, you also need to push them. You can commit regularly while you work; you don't need to push after every commit.

  3. Push changes
  • Go to the Source Control panel
  • Click the menu icon
  • Select "Pull, Push" --> "Push"

Branching and commit via Terminal

  1. Create new branch
  • Navigate to your project folder (if you opened the project folder in your IDE, the IDE terminal is already in the correct folder)
  • Type git checkout -b name-of-your-branch in your terminal; this creates a new branch and switches to it
  2. Commit changes
  • Type git add . in your terminal and press ENTER; this stages all your changes
  • Type git commit -m "Your commit message goes here" and press ENTER; this commits your changes locally
  3. Push changes
  • Type git push origin name-of-your-branch and press ENTER; this pushes your committed changes to the repository
