"Transform Data, Unleash Potential"
To provide a seamless, powerful, and flexible document conversion platform that empowers businesses and developers to transform data across multiple formats with unprecedented ease and efficiency.
We envision a world where data flows freely between formats, breaking down barriers of communication and enabling intelligent, automated document processing.
- Flexibility: Create a modular document conversion ecosystem
- Efficiency: Minimize manual data transformation efforts
- Accessibility: Make complex document conversions simple
- Extensibility: Support continuous innovation in document processing
These examples demonstrate the versatility of Text2Doc in solving real-world data transformation challenges across various industries. The library provides a flexible, powerful solution for:
- Automating complex reporting processes
- Ensuring data consistency and accuracy
- Simplifying data extraction and transformation
- Supporting multiple output formats
- Maintaining data privacy and compliance
Manual creation of sales reports is time-consuming and error-prone, requiring data extraction, formatting, and distribution.
Automated pipeline that extracts sales data, transforms it, and generates professional reports.
from text2doc import DocumentPipeline
def generate_sales_report():
pipeline = DocumentPipeline("monthly_sales_report")
pipeline.add_stage('sql', {
'connection_string': 'postgresql://sales_database',
'query': '''
SELECT
product_category,
SUM(quantity) as total_quantity,
SUM(total_price) as revenue,
AVG(unit_price) as avg_price
FROM sales
WHERE sale_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
GROUP BY product_category
'''
})
pipeline.add_stage('json', {
'transformations': [
{'sort_by': 'revenue'},
{'top_n': 10}
]
})
pipeline.add_stage('html', {
'template': 'sales_report_template.html'
})
pipeline.add_stage('pdf')
pipeline.add_stage('print', {
'printer': 'management_reports_printer'
})
pipeline.execute()
Difficulty in tracking and analyzing customer support interactions across multiple channels.
Consolidate support ticket data from various sources and generate comprehensive analysis reports.
from text2doc import DocumentPipeline
def support_ticket_analysis():
pipeline = DocumentPipeline("support_ticket_insights")
pipeline.add_stage('sql', {
'connection_string': 'postgresql://support_db',
'query': '''
SELECT
category,
COUNT(*) as ticket_count,
AVG(resolution_time) as avg_resolution_time,
COUNT(CASE WHEN status = 'resolved' THEN 1 END) as resolved_tickets
FROM support_tickets
WHERE created_at >= DATE_TRUNC('quarter', CURRENT_DATE)
GROUP BY category
'''
})
pipeline.add_stage('json', {
'transformations': [
{'calculate_percentages': {
'resolved_percentage': 'resolved_tickets / ticket_count * 100'
}}
]
})
pipeline.add_stage('html', {
'template': 'support_analysis_template.html'
})
pipeline.add_stage('pdf')
report = pipeline.execute()
Complex inventory tracking across multiple warehouses and product lines.
Create dynamic inventory reports with real-time data aggregation and visualization.
from text2doc import DocumentPipeline
def inventory_management_report():
pipeline = DocumentPipeline("inventory_status_report")
pipeline.add_stage('sql', {
'connection_string': 'mysql://inventory_system',
'query': '''
SELECT
warehouse_location,
product_category,
SUM(stock_quantity) as total_stock,
SUM(CASE WHEN stock_quantity < reorder_point THEN 1 ELSE 0 END) as low_stock_items,
AVG(stock_value) as avg_stock_value
FROM inventory
GROUP BY warehouse_location, product_category
'''
})
pipeline.add_stage('json', {
'transformations': [
{'flag_low_stock': 'total_stock < 100'},
{'calculate_total_value': 'total_stock * avg_stock_value'}
]
})
pipeline.add_stage('html', {
'template': 'inventory_report_template.html',
'chart_type': 'pie'
})
pipeline.add_stage('pdf')
pipeline.add_stage('zpl', {
'label_type': 'inventory_warning'
})
pipeline.execute()
Generating standardized financial reports that meet regulatory requirements.
Automated pipeline to extract, transform, and format financial data for compliance reporting.
from text2doc import DocumentPipeline
def financial_compliance_report():
pipeline = DocumentPipeline("quarterly_financial_report")
pipeline.add_stage('sql', {
'connection_string': 'postgresql://financial_db',
'query': '''
SELECT
account_type,
SUM(total_revenue) as revenue,
SUM(total_expenses) as expenses,
SUM(net_profit) as net_profit,
AVG(profit_margin) as avg_profit_margin
FROM financial_statements
WHERE quarter = CURRENT_QUARTER
GROUP BY account_type
'''
})
pipeline.add_stage('json', {
'transformations': [
{'validate_compliance_rules': True},
{'calculate_ratios': [
'debt_to_equity_ratio',
'current_ratio'
]}
]
})
pipeline.add_stage('html', {
'template': 'financial_compliance_template.html',
'watermark': 'CONFIDENTIAL'
})
pipeline.add_stage('pdf', {
'encryption': True
})
report = pipeline.execute()
Complex tracking of shipments, inventory movement, and logistics performance.
Create comprehensive logistics reports with detailed tracking and performance metrics.
from text2doc import DocumentPipeline
def logistics_performance_report():
pipeline = DocumentPipeline("logistics_tracking_report")
pipeline.add_stage('sql', {
'connection_string': 'postgresql://logistics_db',
'query': '''
SELECT
shipping_partner,
COUNT(*) as total_shipments,
AVG(delivery_time) as avg_delivery_time,
SUM(CASE WHEN status = 'delayed' THEN 1 ELSE 0 END) as delayed_shipments
FROM shipment_tracking
WHERE shipment_date >= DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)
GROUP BY shipping_partner
'''
})
pipeline.add_stage('json', {
'transformations': [
{'calculate_performance_score': True},
{'rank_shipping_partners': 'avg_delivery_time'}
]
})
pipeline.add_stage('html', {
'template': 'logistics_performance_template.html',
'include_charts': True
})
pipeline.add_stage('pdf')
pipeline.add_stage('zpl', {
'label_type': 'shipping_performance'
})
pipeline.execute()
Generating anonymized patient reports while maintaining data privacy and compliance.
Create a pipeline that extracts, anonymizes, and reports patient data securely.
from text2doc import DocumentPipeline
def anonymized_patient_report():
pipeline = DocumentPipeline("patient_data_report")
pipeline.add_stage('sql', {
'connection_string': 'postgresql://medical_records',
'query': '''
SELECT
department,
COUNT(*) as patient_count,
AVG(treatment_duration) as avg_treatment_time,
SUM(treatment_cost) as total_treatment_cost
FROM patient_records
WHERE treatment_date >= DATE_SUB(CURRENT_DATE, INTERVAL 3 MONTH)
GROUP BY department
'''
})
pipeline.add_stage('json', {
'transformations': [
{'anonymize_data': True},
{'remove_personal_identifiers': ['patient_id']}
]
})
pipeline.add_stage('html', {
'template': 'patient_report_template.html',
'compliance_mode': 'HIPAA'
})
pipeline.add_stage('pdf', {
'encryption': True,
'access_controls': True
})
pipeline.execute()
text2doc/
│
├── text2doc/ # Core Library
│ ├── __init__.py # Package initialization
│ │
│ ├── core/ # Conversion Components
│ │ ├── base_converter.py # Base conversion logic
│ │ ├── sql_converter.py # SQL to data converter
│ │ ├── json_converter.py # JSON transformations
│ │ ├── html_converter.py # HTML rendering
│ │ ├── pdf_converter.py # PDF generation
│ │ ├── zpl_converter.py # ZPL label printing
│ │ └── print_converter.py # Printing utilities
│ │
│ ├── pipeline/ # Pipeline Management
│ │ ├── base_pipeline.py # Core pipeline logic
│ │ └── document_pipeline.py# Document conversion pipeline
│ │
│ ├── utils/ # Utility Modules
│ │ ├── config_manager.py # Configuration handling
│ │ ├── logger.py # Logging utilities
│ │ ├── exceptions.py # Custom exceptions
│ │ └── scheduler.py # Pipeline scheduling
│ │
│ ├── gui/ # Graphical Interfaces
│ │ ├── main_window.py # Main application window
│ │ ├── converter_panel.py # Conversion interface
│ │ └── pipeline_builder.py # Pipeline creation UI
│ │
│ └── cli/ # Command Line Interface
│ └── main.py # CLI entry point
│
├── frontend/ # React Configuration UI
│ ├── src/
│ │ ├── App.js
│ │ └── PipelineConfigApp.js
│ ├── Dockerfile
│ └── package.json
│
├── backend/ # Flask Backend
│ ├── app.py
│ ├── Dockerfile
│ └── requirements.txt
│
├── examples/ # Usage Examples
│ ├── simple_conversion.py
│ ├── pipeline_example.py
│ └── advanced_pipeline.py
│
├── tests/ # Testing Suite
│ ├── test_converters.py
│ ├── test_pipeline.py
│ └── test_config.py
│
├── docs/ # Documentation
│ ├── index.md
│ ├── installation.md
│ └── usage.md
│
├── setup.py
├── pyproject.toml
└── docker-compose.yml
- SQL to various formats
- JSON transformation
- HTML rendering
- PDF generation
- ZPL label printing
- Modular stage-based conversions
- Flexible configuration
- Error handling
- Logging and monitoring
- Cron-based scheduling
- Retry mechanisms
- Notification support
- Multi-process execution
- Web-based configuration
- CLI support
- Graphical pipeline builder
- Python
- Flask
- React
- SQLAlchemy
- Jinja2
- Pandas
- WeasyPrint
- Modularity: Each component should be independent and replaceable
- Configurability: Maximum flexibility for diverse use cases
- Performance: Efficient data processing
- Reliability: Robust error handling and logging
- Python 3.8+
- pip
- Docker (optional)
pip install text2doc
docker-compose up
from text2doc import DocumentPipeline
pipeline = DocumentPipeline("sales_report")
pipeline.add_stage('sql')
pipeline.add_stage('json')
pipeline.add_stage('html')
pipeline.add_stage('pdf')
report = pipeline.execute()
- Fork the repository
- Create feature branch
- Commit changes
- Push to branch
- Create Pull Request
Apache License 2.0
- GitHub: https://github.com/text2doc/python
- Email: [email protected]
- Slack Channel
- Discussion Forums
- Regular Meetups
- Machine Learning Integration
- More Converter Types
- Enhanced Scheduling
- Cloud Service Support
Remember: Data transformation is not just about changing formats—it's about unlocking the potential hidden within your information.