OrionExplorer/data-view

📄 DataView API

DataView is an API-based platform designed for secure and efficient conversion of emails and attachments to PDF files.
It focuses on data security, seamless integration, and a flexible credit-based billing system tailored to each user.

The system provides endpoints for:

  • Converting raw email content to PDF with embedded security features
  • Converting various document attachments to PDF for secure viewing
  • Managing API usage through a credit-based billing model

📚 Table of Contents

  1. 📊 Use Cases for DataView API
  2. 🚨 Error Handling
  3. 🔑 Authentication
  4. 💡 Final Notes
  5. 📊 Usage
  6. 💳 Billing System
  7. 🚀 Technologies Used
  8. 📦 Deployment Architecture
  9. ⚙️ Deployment Instructions
  10. ⚡ Scaling LibreOffice Instances
  11. 📜 License

📊 Use Cases for DataView API

This section describes common use cases for interacting with the DataView API, focusing on the following endpoints:

  • /api/v1/email-to-pdf/ – Convert emails to PDF
  • /api/v1/attachment-to-pdf/ – Convert attachments to PDF
  • /api/v1/download/ – Download converted PDF files

These endpoints support both simple and advanced workflows, making DataView flexible for different business scenarios.

⚙️ Using the mode Parameter

The mode parameter controls the format of the API response. It is passed as a query-string parameter in the URL, including on POST requests:

  • mode=file_id (default) – Returns a file_id for downloading the PDF later.
  • mode=inline_pdf – Returns the PDF directly in the response.
  • mode=base64_pdf – Returns the PDF as a Base64-encoded string in JSON.

Example:

POST /api/v1/email-to-pdf/?mode=inline_pdf

Security Note: When using mode=inline_pdf or mode=base64_pdf, the generated PDF is not stored on the server. The file is processed in memory and deleted immediately after the response is sent. This approach helps meet data protection regulations such as GDPR, HIPAA, and ISO/IEC 27001, ensuring that sensitive data is not retained unnecessarily.
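To illustrate how the three modes differ on the client side, here is a minimal Python sketch that turns a raw response body into something usable. It assumes that the pdf_base64 key shown for /api/v1/attachment-to-pdf/ is used by every endpoint that supports base64_pdf; parse_conversion_response is a hypothetical helper, not part of DataView.

```python
import base64
import json
from typing import Union


def parse_conversion_response(mode: str, body: bytes) -> Union[str, bytes]:
    """Interpret a DataView response body according to the requested mode (sketch)."""
    if mode == "file_id":
        # JSON such as {"file_id": "abc123xyz"}; the PDF is fetched later from /api/v1/download/
        return json.loads(body)["file_id"]
    if mode == "inline_pdf":
        # The body already is the PDF (Content-Type: application/pdf)
        return body
    if mode == "base64_pdf":
        # Assumed JSON shape {"pdf_base64": "..."}, as in the attachment example
        return base64.b64decode(json.loads(body)["pdf_base64"])
    raise ValueError(f"unsupported mode: {mode!r}")


# "JVBERi0xLjQK" is the Base64 form of a PDF header
pdf = parse_conversion_response("base64_pdf", b'{"pdf_base64": "JVBERi0xLjQK"}')
print(pdf)  # b'%PDF-1.4\n'
```

Keeping the mode-to-parsing mapping in one place lets a client switch modes without touching the rest of its code.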


📧 /api/v1/email-to-pdf/ – Email to PDF Conversion

This endpoint converts an entire email into a PDF file, preserving key metadata (subject, sender, recipient, date) and the body content.

✅ 1. Use Case: Converting EML Files to PDF

Request:

POST /api/v1/email-to-pdf/?mode=file_id
Content-Type: multipart/form-data
X-API-KEY: your_api_key

--boundary
Content-Disposition: form-data; name="file"; filename="email.eml"
Content-Type: message/rfc822

<email content>
--boundary--

Response:

{
  "file_id": "abc123xyz"
}
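For clients without a multipart library, the request above can be reproduced with Python's standard library alone. This is a sketch under assumptions: the host http://data-view.local:9393 comes from the deployment section, the response is the JSON shown above, and encode_multipart / upload_eml are illustrative names.

```python
import json
import uuid
from urllib.request import Request, urlopen

BASE_URL = "http://data-view.local:9393"  # assumption: Nginx host/port from the deployment section


def encode_multipart(field: str, filename: str, content_type: str, payload: bytes) -> tuple[bytes, str]:
    """Encode a single file part as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex  # random boundary, so it will not collide with the payload in practice
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload_eml(eml_bytes: bytes, api_key: str) -> str:
    """POST an .eml file with mode=file_id and return the file_id from the JSON response."""
    body, ctype = encode_multipart("file", "email.eml", "message/rfc822", eml_bytes)
    req = Request(f"{BASE_URL}/api/v1/email-to-pdf/?mode=file_id", data=body,
                  headers={"x-api-key": api_key, "Content-Type": ctype}, method="POST")
    with urlopen(req) as resp:  # performs the network call
        return json.loads(resp.read())["file_id"]
```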

✅ 2. Use Case: Sending Email Data as JSON

Request:

POST /api/v1/email-to-pdf/?mode=inline_pdf
Content-Type: application/json
X-API-KEY: your_api_key

{
  "subject": "Project Update",
  "sender": "[email protected]",
  "recipient": "[email protected]",
  "body": "Hello Bob, here is the project update."
}

Response:

  • Returns the PDF directly with Content-Type: application/pdf.
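A sketch of the JSON variant using only the standard library. The field names come from the example above; the host, the helper names, and the placeholder addresses alice@example.com / bob@example.com are assumptions for illustration.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://data-view.local:9393"  # assumption: Nginx host/port from the deployment section


def email_json_request(subject: str, sender: str, recipient: str, body: str, api_key: str) -> Request:
    """Build a POST to /api/v1/email-to-pdf/?mode=inline_pdf with a JSON email payload."""
    payload = json.dumps({"subject": subject, "sender": sender,
                          "recipient": recipient, "body": body}).encode("utf-8")
    return Request(f"{BASE_URL}/api/v1/email-to-pdf/?mode=inline_pdf", data=payload,
                   headers={"x-api-key": api_key, "Content-Type": "application/json"},
                   method="POST")


def fetch_pdf(req: Request) -> bytes:
    """Send the request and return the PDF bytes, refusing any non-PDF response."""
    with urlopen(req) as resp:
        ctype = resp.headers.get_content_type()
        if ctype != "application/pdf":
            raise ValueError(f"expected application/pdf, got {ctype}")
        return resp.read()


# Placeholder addresses; replace with real ones.
req = email_json_request("Project Update", "alice@example.com", "bob@example.com",
                         "Hello Bob, here is the project update.", "your_api_key")
# pdf_bytes = fetch_pdf(req)  # uncomment to perform the actual network call
```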

📎 /api/v1/attachment-to-pdf/ – Attachment to PDF Conversion

This endpoint is designed to convert document attachments (e.g., DOCX, XLSX, CSV) into PDF files for secure viewing.

✅ 1. Use Case: Converting Uploaded Files

Request:

POST /api/v1/attachment-to-pdf/?mode=file_id
Content-Type: multipart/form-data
X-API-KEY: your_api_key

--boundary
Content-Disposition: form-data; name="file"; filename="document.docx"
Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document

<binary file content>
--boundary--

Response:

{
  "file_id": "def456uvw"
}

✅ 2. Use Case: Sending Attachment as Base64

Request:

POST /api/v1/attachment-to-pdf/?mode=base64_pdf
Content-Type: application/json
X-API-KEY: your_api_key

{
  "filename": "report.docx",
  "content": "BASE64_ENCODED_CONTENT"
}

Response:

{
  "pdf_base64": "JVBERi0xLjQKJ... (truncated)"
}
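On the client side, building this JSON body is a single Base64 encoding step. A minimal sketch, assuming the server expects standard (not URL-safe) Base64 with padding; attachment_payload is a hypothetical helper name.

```python
import base64
import json


def attachment_payload(filename: str, data: bytes) -> bytes:
    """JSON body for /api/v1/attachment-to-pdf/: original filename plus Base64-encoded bytes."""
    return json.dumps({
        "filename": filename,
        "content": base64.b64encode(data).decode("ascii"),  # standard alphabet, padded
    }).encode("utf-8")


# Usage: read the document from disk and build the request body.
# with open("report.docx", "rb") as f:
#     body = attachment_payload("report.docx", f.read())
```

Base64 makes the request body roughly a third larger than the raw file.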

📥 /api/v1/download/ – Download Converted PDFs

This endpoint allows users to download previously converted PDF files using the unique file_id received after conversion.

✅ 1. Use Case: Downloading a Converted PDF

Users can request the file in different response formats using the mode parameter.

Available Modes:

  • mode=inline_pdf – Returns the PDF directly in the response.
  • mode=base64_pdf – Returns the PDF as a Base64-encoded string in JSON.

Request:

GET /api/v1/download/abc123xyz?mode=inline_pdf
X-API-KEY: your_api_key

Response:

  • Returns the converted PDF file in the requested format.

✅ 2. Use Case: Downloading and Removing a Converted PDF

Users can request automatic deletion of the file after download by adding the remove=true query parameter to the request.

Request:

GET /api/v1/download/abc123xyz?mode=inline_pdf&remove=true
X-API-KEY: your_api_key

Response:

  • Returns the converted PDF file and deletes it from the server after successful download.
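Because the example combines mode and remove in one query string, it is easiest to let urlencode join them. A sketch; the host and the download_url name are assumptions, and the path format follows the example above (no trailing slash).

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://data-view.local:9393"  # assumption: Nginx host/port from the deployment section


def download_url(file_id: str, mode: str = "inline_pdf", remove: bool = False) -> str:
    """Build a /api/v1/download/ URL; remove=True asks the server to delete the file afterwards."""
    params = {"mode": mode}
    if remove:
        params["remove"] = "true"  # omitted when False, so the file stays on the server
    return f"{BASE_URL}/api/v1/download/{quote(file_id, safe='')}?{urlencode(params)}"


url = download_url("abc123xyz", remove=True)
# Send it with the API key header, e.g. urllib.request.Request(url, headers={"x-api-key": key})
```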

⚠️ Invalid Mode Handling

If a mode that this endpoint does not support is provided, such as file_id (which applies only to the conversion endpoints), the API returns an error:

Request:

GET /api/v1/download/abc123xyz?mode=file_id
X-API-KEY: your_api_key

Response:

{
  "error": "Mode \"file_id\" is invalid for this API endpoint."
}

Security Note: If the remove parameter is not provided or set to false, the file remains on the server until manually deleted.
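Since a wrong mode costs a round trip, a client can reject it before sending. The sketch below encodes the mode support described in this section; it assumes that both conversion endpoints accept all three modes and that the download endpoint accepts only inline_pdf and base64_pdf. ALLOWED_MODES and check_mode are hypothetical names.

```python
# Assumed mode support per endpoint, derived from the descriptions in this README
ALLOWED_MODES = {
    "/api/v1/email-to-pdf/": {"file_id", "inline_pdf", "base64_pdf"},
    "/api/v1/attachment-to-pdf/": {"file_id", "inline_pdf", "base64_pdf"},
    "/api/v1/download/": {"inline_pdf", "base64_pdf"},
}


def check_mode(endpoint: str, mode: str) -> None:
    """Raise ValueError locally for a mode the endpoint would reject."""
    if mode not in ALLOWED_MODES[endpoint]:
        # Same wording as the server's error, so both failure paths read alike
        raise ValueError(f'Mode "{mode}" is invalid for this API endpoint.')
```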


🚨 Error Handling

❌ 1. Insufficient Credits

If the user's account does not have enough credits to process the request:

{
  "error": "Insufficient credits. Please top up your account.",
  "required_credits": 10,
  "available_credits": 5
}

❌ 2. Invalid API Key

When an incorrect or missing API key is provided:

{
  "error": "Invalid API key."
}

❌ 3. File Not Found (Download Endpoint)

If the provided file_id does not exist or the user lacks permission:

{
  "error": "File does not exist or you do not have access."
}
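Each error above is a JSON object with an error key, and the credit error adds two numeric fields. A minimal sketch of turning such bodies into Python exceptions; the exception classes are hypothetical, and the only status code this README specifies is 402 for insufficient credits (see the Billing System section).

```python
import json


class DataViewError(Exception):
    """Error carried in a {"error": "..."} response body (hypothetical client class)."""


class InsufficientCredits(DataViewError):
    """Credit error that keeps the required/available amounts reported by the API."""

    def __init__(self, message: str, required: float, available: float) -> None:
        super().__init__(message)
        self.required = required
        self.available = available


def raise_for_error(status: int, body: bytes) -> None:
    """Raise a DataViewError subclass for 4xx/5xx responses; do nothing on success."""
    if status < 400:
        return
    try:
        payload = json.loads(body)
    except ValueError:  # non-JSON error page, e.g. from the reverse proxy
        raise DataViewError(f"HTTP {status}") from None
    if not isinstance(payload, dict):
        payload = {}
    message = payload.get("error", f"HTTP {status}")
    # Insufficient credits (402) is the only error with extra fields
    if "required_credits" in payload:
        raise InsufficientCredits(message, payload["required_credits"], payload.get("available_credits"))
    raise DataViewError(message)
```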

🔑 Authentication

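All API requests require an x-api-key header for authentication. HTTP header names are case-insensitive, so the X-API-KEY spelling used in the examples above is equivalent.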

-H "x-api-key: YOUR_API_KEY"

Ensure that the API key has sufficient credits for both uploads and downloads, as DataView uses a credit-based billing system.


💡 Final Notes

  • Efficient Data Flow: DataView supports both direct file uploads and JSON-based automation for seamless integrations.
  • Flexible Formats: Compatible with common file types, ensuring smooth document conversions.
  • Secure Access: API key authentication and credit-based billing for controlled usage.

📊 Usage

🔑 API Key Management

The core model for authentication and billing is the ApiKey model, which is linked to a specific User.
Each ApiKey includes the following properties:

  • api_key – Auto-generated key used for authenticating API requests via the custom header x-api-key.
  • credits – A virtual currency used to track API usage. Credits are consumed based on data transfers during both uploads and downloads.
  • billing_credit_cost – Defines the cost (in credits) for each data chunk processed.
  • billing_chunk_kb – Specifies the size of each data chunk (in KB) used for billing.
  • billing_min_chunk_kb – Defines the minimum data size (in KB) that will be billed, even if the actual data is smaller.

📌 API Versioning

DataView API uses versioning to ensure backward compatibility while allowing for continuous improvements and feature updates.

🚀 Current Version: v1

All API endpoints are prefixed with the version number:

/api/v1/email-to-pdf/
/api/v1/attachment-to-pdf/
/api/v1/download/<file_id>/

📊 Versioning Strategy:

  • Major Versions (v1, v2, ...): Introduced when backward-incompatible changes are made.
  • Minor Versions (v1.1, v1.2, ...): For adding new features in a backward-compatible manner.
  • Patch Versions (v1.1.1, v1.1.2, ...): Bug fixes and security updates without affecting functionality.

⚙️ How to Use Versions:

Simply include the version number in the API URL:

curl -X POST "http://data-view.local/api/v1/email-to-pdf/" \
     -H "x-api-key: YOUR_API_KEY" \
     -F "file=@/path/to/email.eml"

💳 Billing System

DataView uses a credit-based billing system to track API usage:

  • Credits are consumed for both uploading data (e.g., emails, attachments) and downloading converted PDFs.
  • Billing is user-specific, based on the API key associated with each account.
  • Credit consumption is proportional to data size and processing demands.

If there are insufficient credits, API requests will return an error with status code 402 (Payment Required).


💡 Billing Example

Consider the following scenario where a user uploads a document and later downloads the converted PDF file.

Scenario:

  • Upload: A document with a size of 5 MB (5120 KB) is uploaded for conversion.
  • Download: The converted PDF file has a size of 1.2 MB (1229 KB) and is downloaded.

Billing Configuration:

  • Chunk Size (KB): 10 – credits are charged for every 10 KB of data transferred.
  • Credit Cost per Chunk: 0.1 – each 10 KB chunk costs 0.1 credit.
  • Minimum Chunk Size (KB): 10 – the minimum data size charged is 10 KB, even for smaller files.
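Assuming the minimum size is applied before dividing and partial chunks are rounded up (as in the download calculation that follows), the charge for a single transfer of s KB can be summarised as below, where m is billing_min_chunk_kb, c is billing_chunk_kb and p is billing_credit_cost:

```latex
\text{chunks} = \left\lceil \frac{\max(s,\; m)}{c} \right\rceil,
\qquad
\text{credits} = \text{chunks} \times p
```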


📤 Upload Calculation (5 MB file):

  1. File Size: 5120 KB
  2. Chunks: 5120 KB ÷ 10 KB = 512 chunks
  3. Credits Charged: 512 × 0.1 = 51.2 credits

📥 Download Calculation (1.2 MB file):

  1. File Size: 1229 KB
  2. Chunks: 1229 KB ÷ 10 KB = 123 chunks (rounded up)
  3. Credits Charged: 123 × 0.1 = 12.3 credits

✅ Total Credits Charged

  • Upload: 51.2 credits
  • Download: 12.3 credits

Total Credits Used: 63.5 credits


⚠️ Important Notes:

  • Credits are calculated before processing the request.
  • If the user does not have enough credits, the API will return an error before uploading or downloading starts.
  • Minimum chunk size applies: Even files smaller than 10 KB will be charged as 1 full chunk (0.1 credit).
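The calculation can be reproduced in a few lines. This sketch assumes the minimum size is applied before dividing into chunks and that partial chunks are rounded up (as in the 123-chunk download above); Decimal avoids binary floating-point drift in credit amounts. The parameter names mirror the ApiKey billing fields, but the function itself is illustrative, not DataView's code.

```python
from decimal import Decimal


def credits_for_transfer(size_kb: int,
                         billing_chunk_kb: int = 10,
                         billing_credit_cost: Decimal = Decimal("0.1"),
                         billing_min_chunk_kb: int = 10) -> Decimal:
    """Credits for one transfer under the chunk rules described above (sketch)."""
    billed_kb = max(size_kb, billing_min_chunk_kb)  # small transfers are billed at the minimum size
    chunks = -(-billed_kb // billing_chunk_kb)      # ceiling division: a partial chunk counts in full
    return chunks * billing_credit_cost


upload = credits_for_transfer(5120)    # 512 chunks
download = credits_for_transfer(1229)  # 123 chunks (rounded up)
print(upload, download, upload + download)  # 51.2 12.3 63.5
```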

🚨 Example Error (Insufficient Credits):

{
  "error": "Insufficient credits. Please top up your account.",
  "required_credits": 63.5,
  "available_credits": 40.0
}

  • required_credits – Number of credits needed to process the request.
  • available_credits – Current credit balance of the API key.

In this case, the user had 40 credits, which is insufficient for the total cost of 63.5 credits.
The request will be rejected until the user tops up their credits.


📈 API Request Billing History

Every API request is logged using the ApiKeyCreditHistory model. This provides detailed tracking of API usage for billing purposes.

Tracked Information:

  • API Key used for the request
  • Date and Time of the request
  • Requested Endpoint with query parameters
  • Response Size (in KB)
  • Chunk Size (from the model’s billing property at the time of the request)
  • Number of Chunks generated
  • Credit Cost per Chunk (from the model’s billing property)
  • Total Credits Charged for the request
  • Credit Balance Before Charge
  • IP Address of the requester
  • Unique Request Identifier (as stored in system logs)

💳 Top-up Credits

To add credits to an API key, the system uses the ApiKeyCreditTopUp model.

Top-up Details Tracked:

  • API Key to which the credits are applied
  • Credits Added to the API key’s current balance
  • Date and Time of the top-up transaction

Credits are immediately available after a successful top-up, allowing uninterrupted API usage.
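To show how history entries and top-ups relate to the balance, here is an in-memory sketch. It is not the Django models: HistoryEntry and CreditLedger are hypothetical, store only a subset of the tracked fields, and reuse the chunk rule from the billing example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal


@dataclass
class HistoryEntry:
    """A subset of the fields listed for ApiKeyCreditHistory (illustrative only)."""
    endpoint: str
    size_kb: int
    chunk_kb: int
    chunks: int
    credit_cost: Decimal
    credits_charged: Decimal
    balance_before: Decimal
    created_at: datetime


@dataclass
class CreditLedger:
    """In-memory stand-in for an API key's balance, charge history and top-ups."""
    credits: Decimal
    billing_chunk_kb: int = 10
    billing_credit_cost: Decimal = Decimal("0.1")
    billing_min_chunk_kb: int = 10
    history: list[HistoryEntry] = field(default_factory=list)
    top_ups: list[tuple[Decimal, datetime]] = field(default_factory=list)

    def charge(self, endpoint: str, size_kb: int) -> HistoryEntry:
        billed_kb = max(size_kb, self.billing_min_chunk_kb)
        chunks = -(-billed_kb // self.billing_chunk_kb)  # round partial chunks up
        cost = chunks * self.billing_credit_cost
        if cost > self.credits:
            # Rejected before any work is done; the API reports this as 402 Payment Required
            raise ValueError(f"insufficient credits: required {cost}, available {self.credits}")
        # Snapshot the billing settings in effect at the time of the request
        entry = HistoryEntry(endpoint, size_kb, self.billing_chunk_kb, chunks,
                             self.billing_credit_cost, cost, self.credits,
                             datetime.now(timezone.utc))
        self.credits -= cost
        self.history.append(entry)
        return entry

    def top_up(self, amount: Decimal) -> None:
        """Record a top-up; the credits are usable immediately."""
        self.top_ups.append((amount, datetime.now(timezone.utc)))
        self.credits += amount
```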


🚀 Technologies Used

Core Technologies

  • Django 5.1.5 – Backend framework for API management
  • Gunicorn – WSGI HTTP server for running the Django application
  • Nginx – Reverse proxy for handling incoming HTTP(S) requests

Document Conversion & Processing

  • LibreOffice (headless mode) – For converting document formats to PDF
  • soffice – Command-line interface for LibreOffice
  • WeasyPrint 64.0 – HTML/CSS to PDF converter for rendering email content
  • BeautifulSoup4 4.12.3 – HTML parsing and data extraction from email content

Database & Storage

  • PostgreSQL 17.2 – Relational database for managing user data and billing
  • Volumes:
    • postgres_data – Persistent data storage for PostgreSQL
    • shared_files – Shared volume between containers for file management

Containerization & Orchestration

  • Docker and Docker Compose – For containerizing and orchestrating services
  • Bridge Network – Custom Docker network app-network to facilitate inter-container communication

Security & API Management

  • API Key Authentication – Secure access to API endpoints
  • .env Files – Environment variable management for secure configuration (.env.data-view.prod, .env.data-view-db.prod)
  • HTTPS (via Nginx reverse proxy) – Secure data transmission (configurable)

📦 Deployment Architecture

The system is composed of multiple Docker containers:

  • core – Runs the Django application using Gunicorn
    • Exposes port 8888 internally
    • Depends on PostgreSQL and LibreOffice services
  • libreoffice – Headless LibreOffice for document conversion
    • Exposes port 5000 for internal communication
  • db – PostgreSQL 17.2 database for data storage
    • Mapped to port 5455 for database management
  • nginx – Reverse proxy for handling incoming HTTP(S) requests
    • Exposes external port 9393 for public API access

The containers communicate via the app-network (bridge network).


⚙️ Deployment Instructions

  1. Build and run the containers:
     docker-compose up --build -d
  2. Access the API via Nginx:
     http://data-view.local:9393/
  3. Check the running containers:
     docker-compose ps

⚡ Scaling LibreOffice Instances

DataView supports dynamic scaling of LibreOffice instances to improve performance and handle high loads.
This can be achieved by using the --scale option in Docker Compose:

docker compose up -d --scale libreoffice=N

where N is the number of desired LibreOffice instances.

🔄 How Scaling Works

  • DataView automatically detects active LibreOffice instances and distributes conversion requests across them.
  • If no instance is available, the API returns a 503 Service Unavailable error.
  • Load balancing is handled dynamically, ensuring efficient resource usage.

This allows the system to adapt to workload spikes while maintaining high availability and fault tolerance. 🚀
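The README does not describe how instances are detected or selected. Purely as an illustration of the behaviour listed above, this sketch rotates through instances round-robin, skips any that fail a health probe, and raises an error that a view could map to 503. The instance addresses and the is_alive callable are hypothetical; port 5000 is the one exposed by the libreoffice container.

```python
import itertools
from typing import Callable, Sequence


class ServiceUnavailable(Exception):
    """No LibreOffice instance is available; the API would answer 503 here."""


class RoundRobinDispatcher:
    """Pick the next healthy instance in rotation (illustrative, not DataView's implementation)."""

    def __init__(self, instances: Sequence[str], is_alive: Callable[[str], bool]) -> None:
        self._instances = list(instances)
        self._is_alive = is_alive
        self._counter = itertools.count()  # advances once per pick

    def pick(self) -> str:
        n = len(self._instances)
        start = next(self._counter)
        for offset in range(n):  # try each instance at most once per pick
            candidate = self._instances[(start + offset) % n]
            if self._is_alive(candidate):
                return candidate
        raise ServiceUnavailable("no LibreOffice instance available")


dispatcher = RoundRobinDispatcher(["libreoffice-1:5000", "libreoffice-2:5000"], lambda host: True)
```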

📃 Further Reading

For more details on scaling containers, including information about allocated CPU/RAM resources, refer to the Docker documentation.


📜 License

This project is licensed under the Elastic License 2.0 (ELv2).

  • Usage: Free for personal, internal, and non-commercial use.
  • Commercial SaaS Use: Requires a separate commercial license.
  • Restrictions: You may not offer the DataView project as a SaaS or hosted service without explicit permission.

See LICENSE for full details.
