🚀 DataPorter

DataPorter is an open-source, JSON-configuration–driven tool for validating, transforming, and migrating bulk Excel data — designed as a developer-friendly and feature-rich alternative to YoBulk.dev.

✨ Why DataPorter?

Most data migration/validation tools are either UI-only, closed-source, or limited in scope.
DataPorter is different:

⚡ Automation-first → JSON Configuration
📊 Excel-native → Validate & transform Excel out-of-the-box
🔄 Validation + Transformation + Migration → all in one pipeline
🔓 Truly open-source (GPL-3.0)

🆚 YoBulk vs DataPorter

Feature	YoBulk.dev	DataPorter
Open-source	Partially	✅ 100% GPL-3.0
Config style	UI-only	✅ JSON configs
Excel validation	✅ Basic	✅ Advanced (schema, regex, rules)
Data transformation	❌ Not supported	✅ Built-in (through json configured rules)
Bulk migration	❌ Limited	✅ Scalable
Extensibility	❌ Closed	✅ Pluggable

🚀 Features

Excel File Validator → enforce schema, regex, ranges, uniqueness
Transformer Engine → map, clean, and normalize bulk data
JSON-driven Configs → no vendor lock-in, reusable rules
CLI & Library Support → run standalone or embed in apps
Extensible I/O → start with Excel, extend to CSV, JSON, DBs
Enterprise-scale Ready → repeatable, automation-friendly pipelines

🛠 Development Status

DataPorter is in active development. Below is the breakdown of completed, in-progress, and planned components.
This status section is intended both for users (to know what works today) and contributors (to see where help is needed).

1️⃣ Core Processing Engine

Excel Processor
- Implemented using Apache POI and Streaming Reader for handling large Excel files efficiently.
- Supports streaming read instead of loading entire workbook into memory.
Sheet Processor
- Iterates through sheets with streaming support.
- Provides hooks for sheet-level validation and transformation.
Row Processor
- Iterates through rows sequentially.
- Provides hooks for row-level validation and transformation.

2️⃣ Validation Framework

Validation is structured at three levels:

Workbook-level
Sheet-level
Row-level

Workbook-level Validation

Rules not implemented yet
(planned: sheet count limits, sheet naming conventions, metadata checks)

Sheet-level Validation

Implemented Rules:
- emptyRowsValidation → detects completely empty rows
- minRows → ensures minimum row count
- maxRows → ensures maximum row count

Row-level Validation

Planned:
- Data type validation (string, number, date, boolean, etc.)
- Regex validation (e.g., email, phone number patterns)
- Range validation (e.g., numeric min/max)
- Uniqueness checks (across rows)
- Nullability rules

3️⃣ Transformation Engine

Column Mapping
- Map input column → output column
Data Normalization
- Trim, lowercase/uppercase, date format conversion
Computed Fields
- Generate new fields based on formulas or transformations
Chained Transformations
- Apply multiple transformations in sequence

4️⃣ Configuration (JSON-driven)

Basic JSON parsing (read config file)
Validation rules defined in JSON
Transformation rules defined in JSON
Schema validation for configuration files

6️⃣ Input / Output Formats

7️⃣ Testing & Quality

Unit tests implemented for:
- Loading and Parsing Configuration
- Sheet-level validation rules (emptyRowsValidation, minRows, maxRows)
- Processor flow
- Validation flow
Integration tests for multi-level validation
Benchmark tests for large Excel files
Code coverage reporting (CI/CD integration)

8️⃣ Documentation

Developer Guide
End-user Guide
Config reference (all validation & transformation rules)
Example configs for real-world datasets

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
gradle/wrapper		gradle/wrapper
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build.gradle		build.gradle
gradlew		gradlew
gradlew.bat		gradlew.bat
settings.gradle		settings.gradle

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🚀 DataPorter

✨ Why DataPorter?

🆚 YoBulk vs DataPorter

🚀 Features

🛠 Development Status

1️⃣ Core Processing Engine

2️⃣ Validation Framework

Workbook-level Validation

Sheet-level Validation

Row-level Validation

3️⃣ Transformation Engine

4️⃣ Configuration (JSON-driven)

6️⃣ Input / Output Formats

7️⃣ Testing & Quality

8️⃣ Documentation

About

Uh oh!

Languages

License

mharisraza/DataPorter

Folders and files

Latest commit

History

Repository files navigation

🚀 DataPorter

✨ Why DataPorter?

🆚 YoBulk vs DataPorter

🚀 Features

🛠 Development Status

1️⃣ Core Processing Engine

2️⃣ Validation Framework

Workbook-level Validation

Sheet-level Validation

Row-level Validation

3️⃣ Transformation Engine

4️⃣ Configuration (JSON-driven)

6️⃣ Input / Output Formats

7️⃣ Testing & Quality

8️⃣ Documentation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Languages