"Transform Data, Unleash Potential"
To provide a seamless, powerful, and flexible document conversion platform that empowers businesses and developers to transform data across multiple formats with unprecedented ease and efficiency.
We envision a world where data flows freely between formats, breaking down barriers of communication and enabling intelligent, automated document processing.
- Flexibility: Create a modular document conversion ecosystem
- Efficiency: Minimize manual data transformation efforts
- Accessibility: Make complex document conversions simple
- Extensibility: Support continuous innovation in document processing
The examples below demonstrate the versatility of Text2Doc in solving real-world data transformation challenges across industries. The library provides a flexible, powerful solution for:
- Automating complex reporting processes
- Ensuring data consistency and accuracy
- Simplifying data extraction and transformation
- Supporting multiple output formats
- Maintaining data privacy and compliance
Manual creation of sales reports is time-consuming and error-prone, requiring data extraction, formatting, and distribution.
Automated pipeline that extracts sales data, transforms it, and generates professional reports.
```python
from text2doc import DocumentPipeline

def generate_sales_report():
    pipeline = DocumentPipeline("monthly_sales_report")

    # Extract last month's sales, aggregated by category
    pipeline.add_stage('sql', {
        'connection_string': 'postgresql://sales_database',
        'query': '''
            SELECT
                product_category,
                SUM(quantity) as total_quantity,
                SUM(total_price) as revenue,
                AVG(unit_price) as avg_price
            FROM sales
            WHERE sale_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
            GROUP BY product_category
        '''
    })

    # Keep only the ten highest-revenue categories
    pipeline.add_stage('json', {
        'transformations': [
            {'sort_by': 'revenue'},
            {'top_n': 10}
        ]
    })

    pipeline.add_stage('html', {
        'template': 'sales_report_template.html'
    })
    pipeline.add_stage('pdf')
    pipeline.add_stage('print', {
        'printer': 'management_reports_printer'
    })

    pipeline.execute()
```
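Conceptually, the `sort_by` and `top_n` transformations in the `json` stage amount to ordering the result rows and truncating the list. A plain-Python sketch (independent of the Text2Doc API; the field names follow the query above, the sample values are made up):

```python
def sort_by(rows, key):
    """Order a list of row dicts by a field, highest value first."""
    return sorted(rows, key=lambda row: row[key], reverse=True)

def top_n(rows, n):
    """Keep only the first n rows."""
    return rows[:n]

rows = [
    {'product_category': 'toys', 'revenue': 1200},
    {'product_category': 'books', 'revenue': 4800},
    {'product_category': 'games', 'revenue': 3100},
]
leaders = top_n(sort_by(rows, 'revenue'), 2)
# leaders keeps only 'books' and 'games', in that order
```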
Difficulty in tracking and analyzing customer support interactions across multiple channels.
Consolidate support ticket data from various sources and generate comprehensive analysis reports.
```python
from text2doc import DocumentPipeline

def support_ticket_analysis():
    pipeline = DocumentPipeline("support_ticket_insights")

    # Aggregate this quarter's tickets by category
    pipeline.add_stage('sql', {
        'connection_string': 'postgresql://support_db',
        'query': '''
            SELECT
                category,
                COUNT(*) as ticket_count,
                AVG(resolution_time) as avg_resolution_time,
                COUNT(CASE WHEN status = 'resolved' THEN 1 END) as resolved_tickets
            FROM support_tickets
            WHERE created_at >= DATE_TRUNC('quarter', CURRENT_DATE)
            GROUP BY category
        '''
    })

    # Derive the resolution rate per category
    pipeline.add_stage('json', {
        'transformations': [
            {'calculate_percentages': {
                'resolved_percentage': 'resolved_tickets / ticket_count * 100'
            }}
        ]
    })

    pipeline.add_stage('html', {
        'template': 'support_analysis_template.html'
    })
    pipeline.add_stage('pdf')

    report = pipeline.execute()
```
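The `resolved_percentage` expression above is a simple derived column. As a plain-Python illustration of the formula (not Text2Doc code; the zero-ticket guard is an assumption worth having in any real implementation):

```python
def resolved_percentage(resolved_tickets, ticket_count):
    """Share of tickets resolved, as a percentage; 0 for empty categories."""
    if ticket_count == 0:
        return 0.0
    return resolved_tickets / ticket_count * 100

rate = resolved_percentage(45, 60)  # 75.0
```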
Complex inventory tracking across multiple warehouses and product lines.
Create dynamic inventory reports with real-time data aggregation and visualization.
```python
from text2doc import DocumentPipeline

def inventory_management_report():
    pipeline = DocumentPipeline("inventory_status_report")

    # Aggregate stock levels per warehouse and category
    pipeline.add_stage('sql', {
        'connection_string': 'mysql://inventory_system',
        'query': '''
            SELECT
                warehouse_location,
                product_category,
                SUM(stock_quantity) as total_stock,
                SUM(CASE WHEN stock_quantity < reorder_point THEN 1 ELSE 0 END) as low_stock_items,
                AVG(stock_value) as avg_stock_value
            FROM inventory
            GROUP BY warehouse_location, product_category
        '''
    })

    pipeline.add_stage('json', {
        'transformations': [
            {'flag_low_stock': 'total_stock < 100'},
            {'calculate_total_value': 'total_stock * avg_stock_value'}
        ]
    })

    pipeline.add_stage('html', {
        'template': 'inventory_report_template.html',
        'chart_type': 'pie'
    })
    pipeline.add_stage('pdf')

    # Print warning labels for low-stock items
    pipeline.add_stage('zpl', {
        'label_type': 'inventory_warning'
    })

    pipeline.execute()
```
Generating standardized financial reports that meet regulatory requirements.
Automated pipeline to extract, transform, and format financial data for compliance reporting.
```python
from text2doc import DocumentPipeline

def financial_compliance_report():
    pipeline = DocumentPipeline("quarterly_financial_report")

    # Aggregate the current calendar quarter's statements
    pipeline.add_stage('sql', {
        'connection_string': 'postgresql://financial_db',
        'query': '''
            SELECT
                account_type,
                SUM(total_revenue) as revenue,
                SUM(total_expenses) as expenses,
                SUM(net_profit) as net_profit,
                AVG(profit_margin) as avg_profit_margin
            FROM financial_statements
            WHERE quarter = EXTRACT(QUARTER FROM CURRENT_DATE)
            GROUP BY account_type
        '''
    })

    pipeline.add_stage('json', {
        'transformations': [
            {'validate_compliance_rules': True},
            {'calculate_ratios': [
                'debt_to_equity_ratio',
                'current_ratio'
            ]}
        ]
    })

    pipeline.add_stage('html', {
        'template': 'financial_compliance_template.html',
        'watermark': 'CONFIDENTIAL'
    })
    pipeline.add_stage('pdf', {
        'encryption': True
    })

    report = pipeline.execute()
```
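The two ratios requested from the `calculate_ratios` step are standard balance-sheet metrics. A plain-Python sketch of the formulas (not Text2Doc code; the argument names and sample figures are illustrative):

```python
def debt_to_equity_ratio(total_liabilities, shareholder_equity):
    """Leverage: how much of the business is financed by debt vs. equity."""
    return total_liabilities / shareholder_equity

def current_ratio(current_assets, current_liabilities):
    """Liquidity: ability to cover short-term obligations."""
    return current_assets / current_liabilities

leverage = debt_to_equity_ratio(500_000, 250_000)   # 2.0
liquidity = current_ratio(300_000, 150_000)         # 2.0
```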
Complex tracking of shipments, inventory movement, and logistics performance.
Create comprehensive logistics reports with detailed tracking and performance metrics.
```python
from text2doc import DocumentPipeline

def logistics_performance_report():
    pipeline = DocumentPipeline("logistics_tracking_report")

    # Aggregate the past month's shipments per partner
    pipeline.add_stage('sql', {
        'connection_string': 'postgresql://logistics_db',
        'query': '''
            SELECT
                shipping_partner,
                COUNT(*) as total_shipments,
                AVG(delivery_time) as avg_delivery_time,
                SUM(CASE WHEN status = 'delayed' THEN 1 ELSE 0 END) as delayed_shipments
            FROM shipment_tracking
            WHERE shipment_date >= CURRENT_DATE - INTERVAL '1 month'
            GROUP BY shipping_partner
        '''
    })

    pipeline.add_stage('json', {
        'transformations': [
            {'calculate_performance_score': True},
            {'rank_shipping_partners': 'avg_delivery_time'}
        ]
    })

    pipeline.add_stage('html', {
        'template': 'logistics_performance_template.html',
        'include_charts': True
    })
    pipeline.add_stage('pdf')
    pipeline.add_stage('zpl', {
        'label_type': 'shipping_performance'
    })

    pipeline.execute()
```
Generating anonymized patient reports while maintaining data privacy and compliance.
Create a pipeline that extracts, anonymizes, and reports patient data securely.
```python
from text2doc import DocumentPipeline

def anonymized_patient_report():
    pipeline = DocumentPipeline("patient_data_report")

    # Aggregate the past three months of treatments by department
    pipeline.add_stage('sql', {
        'connection_string': 'postgresql://medical_records',
        'query': '''
            SELECT
                department,
                COUNT(*) as patient_count,
                AVG(treatment_duration) as avg_treatment_time,
                SUM(treatment_cost) as total_treatment_cost
            FROM patient_records
            WHERE treatment_date >= CURRENT_DATE - INTERVAL '3 months'
            GROUP BY department
        '''
    })

    # Strip identifying fields before rendering
    pipeline.add_stage('json', {
        'transformations': [
            {'anonymize_data': True},
            {'remove_personal_identifiers': ['patient_id']}
        ]
    })

    pipeline.add_stage('html', {
        'template': 'patient_report_template.html',
        'compliance_mode': 'HIPAA'
    })
    pipeline.add_stage('pdf', {
        'encryption': True,
        'access_controls': True
    })

    pipeline.execute()
```
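Dropping identifiers such as `patient_id` is the simplest form of anonymization; a common alternative is one-way pseudonymization, which keeps records linkable without exposing the identifier. A minimal standard-library sketch of the idea (illustrative only, not the Text2Doc implementation; the salt value is a placeholder):

```python
import hashlib

def pseudonymize(record, fields, salt='report-salt'):
    """Replace identifying fields with a salted SHA-256 token."""
    safe = dict(record)
    for field in fields:
        if field in safe:
            digest = hashlib.sha256((salt + str(safe[field])).encode()).hexdigest()
            safe[field] = digest[:12]  # short, non-reversible token
    return safe

row = {'patient_id': 'P-1042', 'department': 'cardiology'}
safe_row = pseudonymize(row, ['patient_id'])
# safe_row['patient_id'] is now an opaque token; other fields are untouched
```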
```
text2doc/
│
├── text2doc/                    # Core Library
│   ├── __init__.py              # Package initialization
│   │
│   ├── core/                    # Conversion Components
│   │   ├── base_converter.py    # Base conversion logic
│   │   ├── sql_converter.py     # SQL to data converter
│   │   ├── json_converter.py    # JSON transformations
│   │   ├── html_converter.py    # HTML rendering
│   │   ├── pdf_converter.py     # PDF generation
│   │   ├── zpl_converter.py     # ZPL label printing
│   │   └── print_converter.py   # Printing utilities
│   │
│   ├── pipeline/                # Pipeline Management
│   │   ├── base_pipeline.py     # Core pipeline logic
│   │   └── document_pipeline.py # Document conversion pipeline
│   │
│   ├── utils/                   # Utility Modules
│   │   ├── config_manager.py    # Configuration handling
│   │   ├── logger.py            # Logging utilities
│   │   ├── exceptions.py        # Custom exceptions
│   │   └── scheduler.py         # Pipeline scheduling
│   │
│   ├── gui/                     # Graphical Interfaces
│   │   ├── main_window.py       # Main application window
│   │   ├── converter_panel.py   # Conversion interface
│   │   └── pipeline_builder.py  # Pipeline creation UI
│   │
│   └── cli/                     # Command Line Interface
│       └── main.py              # CLI entry point
│
├── frontend/                    # React Configuration UI
│   ├── src/
│   │   ├── App.js
│   │   └── PipelineConfigApp.js
│   ├── Dockerfile
│   └── package.json
│
├── backend/                     # Flask Backend
│   ├── app.py
│   ├── Dockerfile
│   └── requirements.txt
│
├── examples/                    # Usage Examples
│   ├── simple_conversion.py
│   ├── pipeline_example.py
│   └── advanced_pipeline.py
│
├── tests/                       # Testing Suite
│   ├── test_converters.py
│   ├── test_pipeline.py
│   └── test_config.py
│
├── docs/                        # Documentation
│   ├── index.md
│   ├── installation.md
│   └── usage.md
│
├── setup.py
├── pyproject.toml
└── docker-compose.yml
```
- SQL to various formats
- JSON transformation
- HTML rendering
- PDF generation
- ZPL label printing
- Modular stage-based conversions
- Flexible configuration
- Error handling
- Logging and monitoring
- Cron-based scheduling
- Retry mechanisms
- Notification support
- Multi-process execution
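A retry mechanism with backoff is the usual way to make scheduled pipelines resilient to transient failures (a database briefly unreachable, a printer offline). A generic sketch of the pattern (plain Python; `task` stands in for any pipeline's `execute`, and the delay parameters are illustrative defaults):

```python
import time

def run_with_retries(task, max_attempts=3, base_delay=1.0):
    """Call task(); on failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)
```

With `base_delay=1.0` the waits are 1 s, then 2 s, before the final attempt; a notification hook would slot naturally into the `except` branch.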
- Web-based configuration
- CLI support
- Graphical pipeline builder
- Python
- Flask
- React
- SQLAlchemy
- Jinja2
- Pandas
- WeasyPrint
- Modularity: Each component should be independent and replaceable
- Configurability: Maximum flexibility for diverse use cases
- Performance: Efficient data processing
- Reliability: Robust error handling and logging
- Python 3.8+
- pip
- Docker (optional)
```bash
pip install text2doc
```
```bash
docker-compose up
```
```python
from text2doc import DocumentPipeline

pipeline = DocumentPipeline("sales_report")
pipeline.add_stage('sql')
pipeline.add_stage('json')
pipeline.add_stage('html')
pipeline.add_stage('pdf')

report = pipeline.execute()
```
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to your branch
- Open a Pull Request
Apache License 2.0
- GitHub: https://github.com/text2doc/python
- Email: [email protected]
- Slack Channel
- Discussion Forums
- Regular Meetups
- Machine Learning Integration
- More Converter Types
- Enhanced Scheduling
- Cloud Service Support
Remember: Data transformation is not just about changing formats; it's about unlocking the potential hidden within your information.