DataView is an API-based platform designed for secure and efficient conversion of emails and attachments to PDF files.
It focuses on data security, seamless integration, and a flexible credit-based billing system tailored to each user.
The system provides endpoints for:
- Converting raw email content to PDF with embedded security features
- Converting various document attachments to PDF for secure viewing
- Managing API usage through a credit-based billing model
Contents:
- Use Cases for DataView API
- Error Handling
- Authentication
- Final Notes
- Usage
- Billing System
- Technologies Used
- Deployment Architecture
- Deployment Instructions
- Scaling LibreOffice Instances
- License
This section describes common use cases for interacting with the DataView API, focusing on the following endpoints:
- /api/v1/email-to-pdf/: Convert emails to PDF
- /api/v1/attachment-to-pdf/: Convert attachments to PDF
- /api/v1/download/: Download converted PDF files
These endpoints support both simple and advanced workflows, making DataView flexible for different business scenarios.
The mode query parameter controls the format of the API response. It is appended to the request URL:
- mode=file_id (default): Returns a file_id for downloading the PDF later.
- mode=inline_pdf: Returns the PDF directly in the response.
- mode=base64_pdf: Returns the PDF as a Base64-encoded string in JSON.
Example:
POST /api/v1/email-to-pdf/?mode=inline_pdf
Security Note: When using mode=inline_pdf or mode=base64_pdf, the generated PDF is not stored on the server. The file is processed in memory and discarded as soon as the response is sent. This helps meet the requirements of data protection regulations and standards such as GDPR, HIPAA, and ISO/IEC 27001 by ensuring that sensitive data is not retained unnecessarily.
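Because each mode produces a differently shaped response, a client typically branches on the mode it requested. A minimal sketch, assuming the response bodies shown in the examples of this section ({"file_id": ...} for file_id, raw PDF bytes for inline_pdf, {"pdf_base64": ...} for base64_pdf); extract_pdf is a hypothetical helper name, not part of DataView:

```python
import base64
import json


def extract_pdf(mode: str, body: bytes):
    """Interpret a conversion response body according to the requested mode.

    Returns the file_id string for mode=file_id, or the PDF bytes otherwise.
    """
    if mode == "file_id":
        # JSON such as {"file_id": "abc123xyz"}; the PDF is downloaded later.
        return json.loads(body)["file_id"]
    if mode == "inline_pdf":
        # The body is the PDF itself (Content-Type: application/pdf).
        return body
    if mode == "base64_pdf":
        # JSON such as {"pdf_base64": "JVBERi0x..."}; decode back to bytes.
        return base64.b64decode(json.loads(body)["pdf_base64"])
    raise ValueError(f"Unsupported mode: {mode!r}")
```

Checking that the returned bytes start with b"%PDF" is a cheap sanity test before writing the file to disk.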
The /api/v1/email-to-pdf/ endpoint converts an entire email into a PDF file, preserving key metadata (subject, sender, recipient, date) and the body content.
Request:
POST /api/v1/email-to-pdf/?mode=file_id
Content-Type: multipart/form-data; boundary=boundary
X-API-KEY: your_api_key

--boundary
Content-Disposition: form-data; name="file"; filename="email.eml"
Content-Type: message/rfc822

<email content>
--boundary--
Response:
{
"file_id": "abc123xyz"
}
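The same upload can be assembled programmatically. The following is a minimal sketch using only Python's standard library; it builds (but does not send) the multipart request shown above. The function name build_email_upload and the base URL http://data-view.local:9393 (taken from the deployment instructions) are illustrative assumptions, not part of the DataView codebase.

```python
import urllib.request
import uuid
from urllib.parse import urlencode


def build_email_upload(base_url: str, api_key: str, eml: bytes,
                       filename: str = "email.eml",
                       mode: str = "file_id") -> urllib.request.Request:
    """Build a multipart/form-data POST for /api/v1/email-to-pdf/ (hypothetical helper)."""
    # A random boundary string that is very unlikely to occur inside the email.
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: message/rfc822\r\n\r\n",  # blank line separates part headers from content
        eml,
        f"\r\n--{boundary}--\r\n".encode(),       # closing boundary
    ])
    url = f"{base_url}/api/v1/email-to-pdf/?{urlencode({'mode': mode})}"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-API-KEY": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending it is then a matter of urllib.request.urlopen(req); with mode=file_id, the JSON response carries the file_id.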
Request:
POST /api/v1/email-to-pdf/?mode=inline_pdf
Content-Type: application/json
X-API-KEY: your_api_key
{
"subject": "Project Update",
"sender": "[email protected]",
"recipient": "[email protected]",
"body": "Hello Bob, here is the project update."
}
Response:
- Returns the PDF directly with Content-Type: application/pdf.
The /api/v1/attachment-to-pdf/ endpoint converts document attachments (e.g., DOCX, XLSX, CSV) into PDF files for secure viewing.
Request:
POST /api/v1/attachment-to-pdf/?mode=file_id
Content-Type: multipart/form-data; boundary=boundary
X-API-KEY: your_api_key

--boundary
Content-Disposition: form-data; name="file"; filename="document.docx"
Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document

<binary file content>
--boundary--
Response:
{
"file_id": "def456uvw"
}
Request:
POST /api/v1/attachment-to-pdf/?mode=base64_pdf
Content-Type: application/json
X-API-KEY: your_api_key
{
"filename": "report.docx",
"content": "BASE64_ENCODED_CONTENT"
}
Response:
{
"pdf_base64": "JVBERi0xLjQKJ... (truncated)"
}
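For the JSON variant, the attachment bytes have to be Base64-encoded before they are placed in the content field. A standard-library sketch that builds this request without sending it; build_attachment_json and the base URL are illustrative assumptions:

```python
import base64
import json
import urllib.request


def build_attachment_json(base_url: str, api_key: str, filename: str,
                          content: bytes, mode: str = "base64_pdf") -> urllib.request.Request:
    """Build a JSON POST for /api/v1/attachment-to-pdf/ (hypothetical helper)."""
    payload = {
        "filename": filename,
        # Binary file content must be Base64-encoded to travel inside JSON.
        "content": base64.b64encode(content).decode("ascii"),
    }
    return urllib.request.Request(
        f"{base_url}/api/v1/attachment-to-pdf/?mode={mode}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
    )
```

The pdf_base64 field of the response is decoded the same way in reverse, with base64.b64decode.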
The /api/v1/download/<file_id>/ endpoint lets users download previously converted PDF files using the unique file_id received after conversion. The response format is selected with the mode parameter.
Available Modes:
- mode=inline_pdf: Returns the PDF directly in the response.
- mode=base64_pdf: Returns the PDF as a Base64-encoded string in JSON.
Request:
GET /api/v1/download/abc123xyz?mode=inline_pdf
X-API-KEY: your_api_key
Response:
- Returns the converted PDF file in the requested format.
Users can request automatic deletion of the file after download by adding remove=true to the query string.
Request:
GET /api/v1/download/abc123xyz?mode=inline_pdf&remove=true
X-API-KEY: your_api_key
Response:
- Returns the converted PDF file and deletes it from the server after successful download.
If an unsupported mode such as file_id is provided, the API returns an error:
Request:
GET /api/v1/download/abc123xyz?mode=file_id
X-API-KEY: your_api_key
Response:
{
"error": "Mode \"file_id\" is invalid for this API endpoint."
}
Security Note: If the remove parameter is omitted or set to false, the file remains on the server until it is deleted manually.
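The rules for the download endpoint (only inline_pdf and base64_pdf are accepted; remove=true deletes the file after download) can be enforced on the client before a request is sent. In this sketch, build_download_url is a hypothetical helper, and the path format follows the request examples above:

```python
from urllib.parse import quote, urlencode

# Modes accepted by /api/v1/download/; file_id is rejected by the server.
DOWNLOAD_MODES = {"inline_pdf", "base64_pdf"}


def build_download_url(base_url: str, file_id: str, mode: str = "inline_pdf",
                       remove: bool = False) -> str:
    """Return a download URL, failing early on modes the endpoint rejects."""
    if mode not in DOWNLOAD_MODES:
        raise ValueError(f'Mode "{mode}" is invalid for this API endpoint.')
    params = {"mode": mode}
    if remove:
        # Ask the server to delete the file once it has been downloaded.
        params["remove"] = "true"
    return f"{base_url}/api/v1/download/{quote(file_id, safe='')}?{urlencode(params)}"
```

Passing remove=True for one-off downloads avoids leaving converted files on the server.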
If the account does not have enough credits to process the request, the API responds with status 402 (Payment Required):
{
"error": "Insufficient credits. Please top up your account.",
"required_credits": 10,
"available_credits": 5
}
When an incorrect or missing API key is provided:
{
"error": "Invalid API key."
}
If the provided file_id does not exist or the user lacks permission to access it:
{
"error": "File does not exist or you do not have access."
}
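Because every error in this section is returned as a JSON object with an error field (plus required_credits and available_credits for credit errors), a client can handle them uniformly. This is an assumption-based sketch: DataViewError and raise_for_error are hypothetical names, and since only the 402 status for insufficient credits is documented (in the billing section), the helper relies on the JSON body rather than on specific status codes.

```python
import json


class DataViewError(Exception):
    """Raised for any error response returned by the API (hypothetical)."""

    def __init__(self, status: int, message: str, required=None, available=None):
        super().__init__(message)
        self.status = status
        self.required = required    # required_credits, if present
        self.available = available  # available_credits, if present


def raise_for_error(status: int, body: bytes) -> None:
    """Do nothing for successful responses; raise DataViewError otherwise."""
    if status < 400:
        return
    try:
        data = json.loads(body)
    except ValueError:  # also covers JSONDecodeError and UnicodeDecodeError
        data = {}
    if not isinstance(data, dict):
        data = {}
    raise DataViewError(
        status,
        data.get("error", f"HTTP {status}"),
        data.get("required_credits"),
        data.get("available_credits"),
    )
```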
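All API requests require an x-api-key header for authentication. HTTP header names are case-insensitive, so the X-API-KEY spelling used in the request examples above is equivalent: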
-H "x-api-key: YOUR_API_KEY"
Ensure that the API key has sufficient credits for both uploads and downloads, as DataView uses a credit-based billing system.
- Efficient Data Flow: DataView supports both direct file uploads and JSON-based automation for seamless integrations.
- Flexible Formats: Compatible with common file types, ensuring smooth document conversions.
- Secure Access: API key authentication and credit-based billing for controlled usage.
The core model for authentication and billing is the ApiKey model, which is linked to a specific User.
Each ApiKey includes the following properties:
- api_key: Auto-generated key used for authenticating API requests via the custom x-api-key header.
- credits: A virtual currency used to track API usage. Credits are consumed based on data transfers during both uploads and downloads.
- billing_credit_cost: Defines the cost (in credits) for each data chunk processed.
- billing_chunk_kb: Specifies the size of each data chunk (in KB) used for billing.
- billing_min_chunk_kb: Defines the minimum data size (in KB) that will be billed, even if the actual data is smaller.
DataView API uses versioning to ensure backward compatibility while allowing for continuous improvements and feature updates.
All API endpoints are prefixed with the version number:
/api/v1/email-to-pdf/
/api/v1/attachment-to-pdf/
/api/v1/download/<file_id>/
- Major Versions (v1, v2, ...): Introduced when backward-incompatible changes are made.
- Minor Versions (v1.1, v1.2, ...): For adding new features in a backward-compatible manner.
- Patch Versions (v1.1.1, v1.1.2, ...): Bug fixes and security updates without affecting functionality.
Simply include the version number in the API URL:
curl -X POST "http://data-view.local/api/v1/email-to-pdf/" \
-H "x-api-key: YOUR_API_KEY" \
-F "file=@/path/to/email.eml"
DataView uses a credit-based billing system to track API usage:
- Credits are consumed for both uploading data (e.g., emails, attachments) and downloading converted PDFs.
- Billing is user-specific, based on the API key associated with each account.
- Credit consumption is proportional to data size and processing demands.
If there are insufficient credits, API requests will return an error with status code 402 (Payment Required).
Consider the following scenario where a user uploads a document and later downloads the converted PDF file.
- Upload: A document with a size of 5 MB (5120 KB) is uploaded for conversion.
- Download: The converted PDF file has a size of 1.2 MB (1229 KB) and is downloaded.
- Chunk Size (KB): 10. Credits are charged for every 10 KB of data transferred.
- Credit Cost per Chunk: 0.1. Each 10 KB chunk costs 0.1 credit.
- Minimum Chunk Size (KB): 10. The minimum data size charged is 10 KB, even for smaller files.
Upload cost:
- File Size: 5120 KB
- Chunks: 5120 KB ÷ 10 KB = 512 chunks
- Credits Charged: 512 × 0.1 = 51.2 credits

Download cost:
- File Size: 1229 KB
- Chunks: 1229 KB ÷ 10 KB = 122.9, rounded up to 123 chunks
- Credits Charged: 123 × 0.1 = 12.3 credits

Total:
- Upload: 51.2 credits
- Download: 12.3 credits
Total Credits Used: 63.5 credits
- Credits are calculated before processing the request.
- If the user does not have enough credits, the API will return an error before uploading or downloading starts.
- Minimum chunk size applies: Even files smaller than 10 KB will be charged as 1 full chunk (0.1 credit).
{
"error": "Insufficient credits. Please top up your account.",
"required_credits": 63.5,
"available_credits": 40.0
}
- required_credits: Number of credits needed to process the request.
- available_credits: Current credit balance of the API key.
In this case, the user had 40 credits, which is insufficient for the total cost of 63.5 credits.
The request will be rejected until the user tops up their credits.
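The calculation above can be reproduced in a few lines. This sketch assumes the minimum is applied by raising the billed size to billing_min_chunk_kb and then rounding up to whole chunks, which matches the worked example; the exact server-side rounding is an assumption. decimal.Decimal is used so that amounts such as 0.1 credit add up exactly. credits_for and check_credits are hypothetical names whose defaults mirror the example's billing settings.

```python
from decimal import ROUND_CEILING, Decimal

INSUFFICIENT = "Insufficient credits. Please top up your account."


def credits_for(size_kb: int, chunk_kb: int = 10, min_chunk_kb: int = 10,
                credit_cost: str = "0.1") -> Decimal:
    """Credits charged for transferring size_kb of data (hypothetical helper)."""
    # Data smaller than the minimum is billed as if it were min_chunk_kb.
    billable_kb = max(Decimal(size_kb), Decimal(min_chunk_kb))
    # A partial chunk is rounded up to a whole chunk.
    chunks = (billable_kb / Decimal(chunk_kb)).to_integral_value(rounding=ROUND_CEILING)
    return chunks * Decimal(credit_cost)


def check_credits(available: Decimal, required: Decimal):
    """Return (402, error body) when the balance is too low, else None."""
    if available < required:
        return 402, {
            "error": INSUFFICIENT,
            "required_credits": float(required),
            "available_credits": float(available),
        }
    return None
```

With the example's figures, credits_for(5120) + credits_for(1229) gives 63.5, and check_credits(Decimal("40.0"), ...) produces the error body shown above.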
Every API request is logged using the ApiKeyCreditHistory model, which provides detailed tracking of API usage for billing purposes. Each entry records:
- API Key used for the request
- Date and Time of the request
- Requested Endpoint with query parameters
- Response Size (in KB)
- Chunk Size (from the ApiKey's billing_chunk_kb at the time of the request)
- Number of Chunks generated
- Credit Cost per Chunk (from the ApiKey's billing_credit_cost)
- Total Credits Charged for the request
- Credit Balance Before Charge
- IP Address of the requester
- Unique Request Identifier (as stored in system logs)
To add credits to an API key, the system uses the ApiKeyCreditTopUp model. Each top-up records:
- API Key to which the credits are applied
- Credits Added to the API key's current balance
- Date and Time of the top-up transaction
Credits are immediately available after a successful top-up, allowing uninterrupted API usage.
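To make the bookkeeping concrete, here is an in-memory sketch of how the balance, credit history entries, and top-ups relate. It is not DataView's Django code: Ledger and HistoryEntry are hypothetical stand-ins for the ApiKey, ApiKeyCreditHistory, and ApiKeyCreditTopUp models, and only a subset of the logged fields is kept.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal


@dataclass
class HistoryEntry:
    """Subset of the fields logged per request (hypothetical stand-in)."""
    endpoint: str
    response_size_kb: int
    chunks: int
    credits_charged: Decimal
    balance_before: Decimal
    created_at: datetime


@dataclass
class Ledger:
    """In-memory stand-in for one API key's balance, history, and top-ups."""
    balance: Decimal = Decimal("0")
    history: list[HistoryEntry] = field(default_factory=list)
    top_ups: list[tuple[Decimal, datetime]] = field(default_factory=list)

    def top_up(self, credits: Decimal) -> None:
        # Credits become available immediately after the top-up is recorded.
        self.top_ups.append((credits, datetime.now(timezone.utc)))
        self.balance += credits

    def charge(self, endpoint: str, response_size_kb: int, chunks: int,
               cost_per_chunk: Decimal) -> HistoryEntry:
        cost = chunks * cost_per_chunk
        if cost > self.balance:
            # In this sketch, rejected requests are not recorded in the history.
            raise ValueError("Insufficient credits. Please top up your account.")
        entry = HistoryEntry(endpoint, response_size_kb, chunks, cost,
                             self.balance, datetime.now(timezone.utc))
        self.balance -= cost
        self.history.append(entry)
        return entry
```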
- Django 5.1.5: Backend framework for API management
- Gunicorn: WSGI HTTP server for running the Django application
- Nginx: Reverse proxy for handling incoming HTTP(S) requests
- LibreOffice (headless mode): For converting document formats to PDF
- soffice: Command-line interface for LibreOffice
- WeasyPrint 64.0: HTML/CSS to PDF converter for rendering email content
- BeautifulSoup4 4.12.3: HTML parsing and data extraction from email content
- PostgreSQL 17.2: Relational database for managing user data and billing
- Volumes: postgres_data (persistent data storage for PostgreSQL) and shared_files (volume shared between containers for file management)
- Docker: For containerizing and orchestrating services
- Bridge Network: Custom Docker network app-network to facilitate inter-container communication
- API Key Authentication: Secure access to API endpoints
- .env Files: Environment variable management for secure configuration (.env.data-view.prod, .env.data-view-db.prod)
- HTTPS (via Nginx reverse proxy): Secure data transmission (configurable)
The system is composed of multiple Docker containers:
- core: Runs the Django application using Gunicorn
  - Exposes port 8888 internally
  - Depends on the PostgreSQL and LibreOffice services
- libreoffice: Headless LibreOffice for document conversion
  - Exposes port 5000 for internal communication
- db: PostgreSQL 17.2 database for data storage
  - Mapped to port 5455 for database management
- nginx: Reverse proxy for handling incoming HTTP(S) requests
  - Exposes external port 9393 for public API access
The containers communicate via the app-network (bridge network).
- Build and run the containers:
docker-compose up --build -d
- Access the API via Nginx:
http://data-view.local:9393/
- Check running containers:
docker-compose ps
DataView supports dynamic scaling of LibreOffice instances to improve performance and handle high loads.
This can be achieved with the --scale option in Docker Compose:
docker compose up -d --scale libreoffice=N
where N is the desired number of LibreOffice instances.
- DataView automatically detects active LibreOffice instances and distributes conversion requests across them.
- If no instance is available, the API returns a 503 Service Unavailable error.
- Load balancing is handled dynamically, ensuring efficient resource usage.
This allows the system to adapt to workload spikes while maintaining high availability and fault tolerance.
For more details on scaling containers, including information about allocated CPU/RAM resources, refer to the Docker documentation.
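How DataView detects instances and distributes requests is not described here. The sketch below shows one plausible strategy, round-robin selection that skips unhealthy instances and signals a 503 when none is available. RoundRobin, NoInstanceAvailable, the instance names, and the is_healthy callback (for example, a TCP check against port 5000) are hypothetical.

```python
import itertools
from typing import Callable, Sequence


class NoInstanceAvailable(Exception):
    """Maps to a 503 Service Unavailable response."""
    status = 503


class RoundRobin:
    """Rotate through LibreOffice instances, skipping unhealthy ones."""

    def __init__(self, instances: Sequence[str], is_healthy: Callable[[str], bool]):
        self._instances = list(instances)
        self._is_healthy = is_healthy
        self._cycle = itertools.cycle(self._instances)

    def pick(self) -> str:
        # Try each instance at most once per call, continuing from the last pick.
        for _ in range(len(self._instances)):
            candidate = next(self._cycle)
            if self._is_healthy(candidate):
                return candidate
        raise NoInstanceAvailable("503 Service Unavailable: no LibreOffice instance available")
```

Round-robin keeps the selection cheap and spreads load evenly when conversions take similar time; a least-busy strategy would need per-instance load tracking.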
This project is licensed under the Elastic License 2.0 (ELv2).
- Usage: Free for personal, internal, and non-commercial use.
- Commercial SaaS Use: Requires a separate commercial license.
- Restrictions: You may not offer the DataView project as a SaaS or hosted service without explicit permission.
See LICENSE for full details.