DataPorter is an open-source, JSON-configuration–driven tool for validating, transforming, and migrating bulk Excel data — designed as a developer-friendly and feature-rich alternative to YoBulk.dev.
Most data migration/validation tools are either UI-only, closed-source, or limited in scope.
DataPorter is different:
- ⚡ Automation-first → JSON Configuration
- 📊 Excel-native → Validate & transform Excel out-of-the-box
- 🔄 Validation + Transformation + Migration → all in one pipeline
- 🔓 Truly open-source (GPL-3.0)
Feature | YoBulk.dev | DataPorter |
---|---|---|
Open-source | Partially | ✅ 100% GPL-3.0 |
Config style | UI-only | ✅ JSON configs |
Excel validation | ✅ Basic | ✅ Advanced (schema, regex, rules) |
Data transformation | ❌ Not supported | ✅ Built-in (through json configured rules) |
Bulk migration | ❌ Limited | ✅ Scalable |
Extensibility | ❌ Closed | ✅ Pluggable |
- Excel File Validator → enforce schema, regex, ranges, uniqueness
- Transformer Engine → map, clean, and normalize bulk data
- JSON-driven Configs → no vendor lock-in, reusable rules
- CLI & Library Support → run standalone or embed in apps
- Extensible I/O → start with Excel, extend to CSV, JSON, DBs
- Enterprise-scale Ready → repeatable, automation-friendly pipelines
DataPorter is in active development. Below is the breakdown of completed, in-progress, and planned components.
This status section is intended both for users (to know what works today) and contributors (to see where help is needed).
- Excel Processor
- Implemented using Apache POI and Streaming Reader for handling large Excel files efficiently.
- Supports streaming read instead of loading entire workbook into memory.
- Sheet Processor
- Iterates through sheets with streaming support.
- Provides hooks for sheet-level validation and transformation.
- Row Processor
- Iterates through rows sequentially.
- Provides hooks for row-level validation and transformation.
Validation is structured at three levels:
- Workbook-level
- Sheet-level
- Row-level
- Rules not implemented yet
(planned: sheet count limits, sheet naming conventions, metadata checks)
- Implemented Rules:
emptyRowsValidation
→ detects completely empty rowsminRows
→ ensures minimum row countmaxRows
→ ensures maximum row count
- Planned:
- Data type validation (string, number, date, boolean, etc.)
- Regex validation (e.g., email, phone number patterns)
- Range validation (e.g., numeric min/max)
- Uniqueness checks (across rows)
- Nullability rules
- Column Mapping
- Map input column → output column
- Data Normalization
- Trim, lowercase/uppercase, date format conversion
- Computed Fields
- Generate new fields based on formulas or transformations
- Chained Transformations
- Apply multiple transformations in sequence
- Basic JSON parsing (read config file)
- Validation rules defined in JSON
- Transformation rules defined in JSON
- Schema validation for configuration files
- Input: Excel (XLSX, streaming)
- Input: CSV
- Input: JSON
- Input: Database connectors
- Output: Excel
- Output: CSV
- Output: JSON
- Output: Database connectors
- Unit tests implemented for:
- Loading and Parsing Configuration
- Sheet-level validation rules (
emptyRowsValidation
,minRows
,maxRows
) - Processor flow
- Validation flow
- Integration tests for multi-level validation
- Benchmark tests for large Excel files
- Code coverage reporting (CI/CD integration)
- Developer Guide
- End-user Guide
- Config reference (all validation & transformation rules)
- Example configs for real-world datasets