A flexible data transformation library with a plugin system for Python.
Tukuy (meaning "to transform" or "to become" in Quechua) is a powerful and extensible data transformation library that makes it easy to manipulate, validate, and extract data from various formats. With its plugin architecture, Tukuy provides a unified interface for working with text, HTML, JSON, dates, numbers, and more.
- 🧩 Plugin System: Easily extend functionality with custom plugins
- 🔄 Chainable Transformers: Compose multiple transformations in sequence
- 🧪 Type-safe Transformations: With built-in validation
- 📊 Rich Set of Built-in Transformers:
- 📝 Text manipulation (case conversion, trimming, regex, etc.)
- 🌐 HTML processing and extraction
- 📅 Date parsing and calculations
- 🔢 Numerical operations
- ✅ Data validation
- 📋 JSON parsing and extraction
- 🔍 Pattern-based Data Extraction: Extract structured data from HTML and JSON
- 🛡️ Error Handling: Comprehensive error handling with detailed messages
pip install tukuy
from tukuy import TukuyTransformer
# Create transformer
TUKUY = TukuyTransformer()
# Basic text transformation
text = " Hello World! "
result = TUKUY.transform(text, [
"strip",
"lowercase",
{"function": "truncate", "length": 5}
])
print(result) # "hello..."
# HTML transformation
html = "<div>Hello <b>World</b>!</div>"
result = TUKUY.transform(html, [
"strip_html_tags",
"lowercase"
])
print(result) # "hello world!"
# Date transformation
date_str = "2023-01-01"
age = TUKUY.transform(date_str, [
{"function": "age_calc"}
])
print(age) # 1
# Validation
email = "[email protected]"
valid = TUKUY.transform(email, ["email_validator"])
print(valid) # "[email protected]" or None if invalid
Tukuy uses a plugin system to organize transformers into logical groups and make it easy to extend functionality.
- 📝 text: Basic text transformations (strip, lowercase, regex, etc.)
- 🌐 html: HTML manipulation and extraction
- 📅 date: Date parsing and calculations
- ✅ validation: Data validation and formatting
- 🔢 numerical: Number manipulation and calculations
- 📋 json: JSON parsing and extraction
You can create custom plugins by extending the TransformerPlugin
class:
from tukuy.plugins import TransformerPlugin
from tukuy.base import ChainableTransformer
class ReverseTransformer(ChainableTransformer[str, str]):
def validate(self, value: str) -> bool:
return isinstance(value, str)
def _transform(self, value: str, context=None) -> str:
return value[::-1]
class MyPlugin(TransformerPlugin):
def __init__(self):
super().__init__("my_plugin")
@property
def transformers(self):
return {
'reverse': lambda _: ReverseTransformer('reverse')
}
# Usage
TUKUY = TukuyTransformer()
TUKUY.register_plugin(MyPlugin())
result = TUKUY.transform("hello", ["reverse"]) # "olleh"
See the example plugin for a more detailed example.
Plugins can implement initialize()
and cleanup()
methods for setup and teardown:
class MyPlugin(TransformerPlugin):
def initialize(self) -> None:
super().initialize()
# Load resources, connect to databases, etc.
def cleanup(self) -> None:
super().cleanup()
# Close connections, free resources, etc.
Tukuy provides powerful pattern-based extraction capabilities for both HTML and JSON data.
pattern = {
"properties": [
{
"name": "title",
"selector": "h1",
"transform": ["strip", "lowercase"]
},
{
"name": "links",
"selector": "a",
"attribute": "href",
"type": "array"
}
]
}
data = TUKUY.extract_html_with_pattern(html, pattern)
pattern = {
"properties": [
{
"name": "user",
"selector": "data.user",
"properties": [
{
"name": "name",
"selector": "fullName",
"transform": ["strip"]
}
]
}
]
}
data = TUKUY.extract_json_with_pattern(json_str, pattern)
Tukuy is designed to handle a wide range of data transformation scenarios:
- 🌐 Web Scraping: Extract structured data from HTML pages
- 📊 Data Cleaning: Normalize and validate data from various sources
- 🔄 Format Conversion: Transform data between different formats
- 📝 Text Processing: Apply complex text transformations
- 🔍 Data Extraction: Extract specific information from complex structures
- ✅ Validation: Ensure data meets specific criteria
- 🔗 Chain Transformations: Use chained transformations to avoid intermediate objects
- 🧩 Use Built-in Transformers: Built-in transformers are optimized for performance
- 🔍 Be Specific with Selectors: More specific selectors are faster to process
- 🛠️ Custom Transformers: For performance-critical operations, create custom transformers
- 📦 Batch Processing: Process data in batches for better performance
Tukuy provides comprehensive error handling with detailed error messages:
from tukuy.exceptions import ValidationError, TransformationError, ParseError
try:
result = TUKUY.transform(data, transformations)
except ValidationError as e:
print(f"Validation failed: {e}")
except ParseError as e:
print(f"Parsing failed: {e}")
except TransformationError as e:
print(f"Transformation failed: {e}")
Contributions are welcome! Here's how you can help:
- 🍴 Fork the repository
- 🌿 Create a feature branch (
git checkout -b feature/amazing-feature
) - 💻 Make your changes
- ✅ Run tests with
pytest
- 📝 Update documentation if needed
- 🔄 Commit your changes (
git commit -m 'Add amazing feature'
) - 🚀 Push to the branch (
git push origin feature/amazing-feature
) - 🔍 Open a Pull Request