Integration of Retrieval Augmented Generation for WoPeD.
=======
- Getting Started
- Requirements
- Local Setup
- Architecture Overview
- Technologies Used
- Branching & Commit: Process in VS Code
- Branching and commit via Terminal
This project is a modular, testable Retrieval-Augmented Generation (RAG) API built with Flask, LangChain, and ChromaDB, structured using a hexagonal architecture.
- Python 3.10+
- pip
- (Optional) Docker/Podman
- OpenAI API Key (or compatible LLM provider)
- Clone the repository:
  `git clone https://github.com/your-username/rag-api.git`
- Create and activate a virtual environment:
  `python -m venv venv`
  `source venv/bin/activate`
  On Windows: `venv\Scripts\activate`
- Install dependencies:
  `pip install -r requirements.txt`
- Run the Flask application:
  `python main.py`
  - When using the REST API controller, pass the query as a parameter on the `/rag/` path: `/rag/?query=X`
- Run using Docker or Podman (alternative):
  `docker build -t rag-api .` or `podman build -t rag-api .`
  `docker run -p 5000:5000 rag-api` or `podman run -p 5000:5000 rag-api`
- Run with logging:
  The default log level is `warning` and is active automatically when the application starts. The default can be changed by setting the `LOG_LEVEL` environment variable in `app/config/config.env`.
  Flask: `python main.py --loglevel <level>`
  Docker: `docker run -p 5000:5000 rag-api --loglevel <level>`
  Possible values for `<level>`:
  - `debug`: displays all messages
  - `info`: displays infos, warnings, errors, and critical messages
  - `warning`: displays warnings, errors, and critical messages
  - `error`: displays errors and critical messages
  - `critical`: displays only critical messages
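One way the log level could be resolved from these inputs is sketched below. It is an illustration, not the project's code: the function name `resolve_log_level` is hypothetical, the sketch assumes the `--loglevel` flag takes precedence over `LOG_LEVEL`, and it reads `LOG_LEVEL` from the process environment instead of parsing `app/config/config.env`.

```python
import argparse
import logging
import os

# Level names accepted by --loglevel, mapped to the standard logging constants.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
    "critical": logging.CRITICAL,
}


def resolve_log_level(argv=None, env=None):
    """Return the numeric log level: CLI flag, else LOG_LEVEL, else warning.

    Assumption: the flag takes precedence over the environment variable.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--loglevel", choices=sorted(LEVELS), default=None)
    # Ignore any other arguments the application may define.
    args, _unknown = parser.parse_known_args(argv)
    name = args.loglevel or env.get("LOG_LEVEL", "warning").lower()
    # Unknown names fall back to the documented default.
    return LEVELS.get(name, logging.WARNING)


if __name__ == "__main__":
    logging.basicConfig(level=resolve_log_level())
    log = logging.getLogger(__name__)
    log.warning("shown at the default level")
    log.info("hidden unless the level is info or debug")
```

Each level also shows everything more severe than itself, which is why `warning` includes errors and critical messages.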
This project follows a hexagonal architecture pattern for a clean, testable, and maintainable Retrieval-Augmented Generation (RAG) API.
The ApplicationService is the main entry point and orchestrator of the entire system:
- Technology-agnostic: Contains only business logic, no infrastructure dependencies
- Coordinates workflows: PDF processing, RAG pipeline, document management
- Delegates to domain services: Each service handles specific business concerns
- Singleton pattern: Centralized access point configured in `main.py`
- `ApplicationService.py`: Main business orchestrator and entry point
- `ports/`: Abstract interfaces (DatabasePort, RAGPort, PDFLoaderPort, QueryExtractorPort)
- `services/`: Business logic services that handle success logging
  - `DatabaseService`: Document CRUD operations
  - `RAGService`: Retrieval & augmentation pipeline
  - `PDFService`: Document processing workflows
  - `QueryExtractionService`: Diagram preprocessing logic
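To illustrate how a port and a core service relate in this layout, here is a minimal, self-contained sketch. Only the names `RAGPort` and `RAGService` come from the list above. The `retrieve` method, its signature, the prompt format, the default threshold, and the `InMemoryRAGAdapter` stand-in are all hypothetical, and the real adapter wraps LangChain and ChromaDB.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class RAGPort(ABC):
    """Port: the abstract interface the core depends on."""

    @abstractmethod
    def retrieve(self, query: str, k: int) -> List[Tuple[str, float]]:
        """Return up to k (text, similarity) pairs, best match first."""


class RAGService:
    """Core service: business logic only, no infrastructure imports."""

    def __init__(self, port: RAGPort, threshold: float = 0.5, k: int = 3):
        self._port = port
        self._threshold = threshold
        self._k = k

    def augment(self, query: str) -> str:
        """Prepend sufficiently similar chunks to the query as context."""
        hits = self._port.retrieve(query, self._k)
        context = [text for text, score in hits if score >= self._threshold]
        if not context:
            return query
        return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query


class InMemoryRAGAdapter(RAGPort):
    """Stand-in adapter for illustration; scores by shared lowercase words."""

    def __init__(self, chunks: List[str]):
        self._chunks = chunks

    def retrieve(self, query: str, k: int) -> List[Tuple[str, float]]:
        words = set(query.lower().split())
        scored = []
        for chunk in self._chunks:
            overlap = words & set(chunk.lower().split())
            scored.append((chunk, len(overlap) / max(len(words), 1)))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


service = RAGService(InMemoryRAGAdapter([
    "A Petri net consists of places and transitions",
    "Flask serves the REST API",
]))
print(service.augment("what are places and transitions"))
```

Because `RAGService` only sees the port, a test can pass in a stand-in like the one above, while production code passes in the real adapter.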
Infrastructure adapters handle technical implementation and detailed error handling:
- `db/`:
  - `DatabaseAdapter`: ChromaDB integration via LangChainClient
  - `PDFLoader`: Document parsing and chunking
- `rag/`:
  - `RAGAdapter`: RAG pipeline implementation using LangChain
  - `LangchainClient`: LangChain vector operations
- `preprocessing/`:
  - `BpmnQueryExtractor`: BPMN diagram processing
  - `PnmlQueryExtractor`: PNML diagram processing
- `controller/`: REST API endpoints via Flask (`RESTController.py`)
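With the server running, the REST controller can be queried over HTTP. The sketch below builds the request URL for the `/rag/` path with a `query` parameter. It assumes a local server on port 5000 (the port published in the container commands) and a GET request. The helper names are hypothetical, and since the response format is not described here, the body is returned as raw text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # assumption: host and port of a local run


def build_rag_url(query: str, base_url: str = BASE_URL) -> str:
    """Return the /rag/ URL with the query safely percent-encoded."""
    return f"{base_url}/rag/?{urlencode({'query': query})}"


def ask(query: str, base_url: str = BASE_URL, timeout: float = 30.0) -> str:
    """Send a GET request (assumed method) and return the body as text."""
    with urlopen(build_rag_url(query, base_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Prints the URL only; call ask(...) once the server is running.
    print(build_rag_url("What is a Petri net?"))
```

Encoding the parameter with `urlencode` keeps spaces and characters such as `?` in the question from breaking the URL.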
Key Feature: Improves RAG search quality by extracting meaningful business terms from BPMN/PNML diagrams.
How it works:
- Detection: Automatically detects diagram format (BPMN vs PNML)
- Extraction: Extracts business-relevant text (activity names, events, etc.)
- Technical Filtering: Removes technical IDs (`task_12j0pib`, `p1`, `t3`, etc.)
- Structural Filtering: Removes tool/system terms (`woped`, `designer`, `start`, `end`, etc.)
Configuration: Set ENABLE_DIAGRAM_PREPROCESSING=true/false in app/config/config.env
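The detection, extraction, and filtering steps could be sketched as follows with `xml.etree.ElementTree`. This is not the project's extractor. The regular expression for technical IDs, the stop-word set, and the function names are assumptions chosen to match the examples given. Format detection here simply checks the root element.

```python
import re
import xml.etree.ElementTree as ET

# Assumed patterns: generated IDs such as task_12j0pib, p1, t3.
TECHNICAL_ID = re.compile(r"^(?:[a-z]+_[0-9a-z]+|[pt]\d+)$", re.IGNORECASE)
# Assumed structural terms, taken from the examples in the list above.
STRUCTURAL_TERMS = {"woped", "designer", "start", "end"}


def local_name(tag: str) -> str:
    """Strip an XML namespace such as {http://...}task down to task."""
    return tag.rsplit("}", 1)[-1]


def extract_terms(xml_text: str) -> list:
    """Return business-relevant labels from a BPMN or PNML document."""
    root = ET.fromstring(xml_text)
    is_pnml = local_name(root.tag) == "pnml"  # detection by root element
    candidates = []
    for element in root.iter():
        if is_pnml:
            # PNML keeps labels as <name><text>...</text></name>.
            if local_name(element.tag) == "name":
                candidates.extend(t.text or "" for t in element.iter()
                                  if local_name(t.tag) == "text")
        else:
            # BPMN keeps labels in the name attribute of flow elements.
            candidates.append(element.get("name", ""))
    terms = []
    for label in (c.strip() for c in candidates):
        if not label or TECHNICAL_ID.match(label):
            continue  # technical filtering
        if label.lower() in STRUCTURAL_TERMS:
            continue  # structural filtering
        if label not in terms:
            terms.append(label)
    return terms


bpmn = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="Process_1">
    <startEvent id="StartEvent_1" name="Start"/>
    <task id="task_12j0pib" name="Check invoice"/>
    <task id="task_0ab" name="task_0ab"/>
    <endEvent id="EndEvent_1" name="End"/>
  </process>
</definitions>"""
print(extract_terms(bpmn))  # ['Check invoice']
```

The remaining terms, rather than the raw XML, would then be used as the search query.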
The system is highly configurable through environment variables in app/config/config.env:
- Database settings: ChromaDB persist directory configuration
- ChromaDB settings: Telemetry, connection parameters
- Document processing: Chunk size, overlap, PDF directory
- RAG pipeline: Similarity threshold, result count, embedding model
- Diagram preprocessing: Enable/disable semantic extraction
- Server configuration: Host, port settings
- Logging: Log level configuration
See app/config/config.env for all available configuration options.
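As an illustration of how such settings might be read, the sketch below parses `KEY=VALUE` lines in the style of an env file and converts the `ENABLE_DIAGRAM_PREPROCESSING` entry to a boolean. The helpers are hypothetical and deliberately minimal (no quoting or variable expansion), and the project may load its configuration differently. Only the two variable names shown appear in this README, and the example values are invented.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


example = """
# Illustrative contents only, not the real config.env
ENABLE_DIAGRAM_PREPROCESSING=true
LOG_LEVEL=warning
"""
settings = parse_env(example)
preprocessing_enabled = as_bool(
    settings.get("ENABLE_DIAGRAM_PREPROCESSING", "false")
)
print(preprocessing_enabled, settings["LOG_LEVEL"])  # True warning
```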
108 total tests ensuring system reliability:
- Services: Business logic testing (database, RAG, PDF, query extraction)
- Adapters: Infrastructure testing (LangChain, extractors, loaders)
- Presentation: API endpoint validation
- End-to-End workflows: Complete RAG pipeline, diagram preprocessing, PDF ingestion
- Flask for REST API exposure and web server
- LangChain for RAG orchestration and document processing
- ChromaDB as vector database for semantic search
- HuggingFace Transformers for embedding models and AI/ML components
- xml.etree.ElementTree (Python standard library) for BPMN/PNML XML parsing
- PyPDF for PDF document processing and text extraction
- pytest for comprehensive unit and integration testing
- Hexagonal Architecture for clean separation of concerns and testability
Branching & Commit: Process in VS Code
- Create new branch
  - Go to the Source Control panel (or press Ctrl + Shift + G)
  - Click the menu icon and select "Branch" --> "Create Branch"
  - Type a name for the branch and press ENTER; you are now on the new branch
- Stage and commit changes
  - To save your changes locally to your current branch, go to the Source Control panel
  - Select the plus symbol to stage all your changes
  - Enter a commit message that summarizes the changes you made
  - Click Commit or press Ctrl + Enter

Keep in mind that a commit only saves your changes locally on your machine. To make the committed changes available to other developers, you also need to push them. You can commit regularly while you work, but you don't need to push after every commit.
- Push changes
  - Go to the Source Control panel
  - Click the menu icon
  - Select "Pull, Push" --> "Push"
Branching and commit via Terminal
- Create new branch
  - Navigate to your project folder (if you opened the project folder in your IDE, the IDE terminal is already in the correct folder)
  - Type `git checkout -b name-of-your-branch` in your terminal and press ENTER; this creates a new branch and switches to it
- Commit changes
  - Type `git add .` in your terminal and press ENTER; this stages all your changes
  - Type `git commit -m "Your Commit message goes here"` and press ENTER; this commits your changes locally
- Push changes
  - Type `git push origin name-of-your-branch` and press ENTER; this pushes your committed changes to the repository





