Skip to content

OpenPecha/webuddhist_text_upload_pipeline

Repository files navigation

Payload Generator Pecha

A Python application for uploading text segments, table of contents, and commentary mappings to the WebBuddhist API. This project provides tools for processing and uploading Buddhist texts with their associated metadata and structure.

πŸ“‹ Table of Contents

πŸš€ Features

  • Segment Upload: Upload text segments to WebBuddhist API
  • Table of Contents Upload: Upload structured TOC with segment references
  • Text Mapping: Map commentary texts to root texts
  • Metadata Upload: Upload text metadata and information
  • Comprehensive Testing: Full test suite with mocking
  • Logging: Detailed logging for all operations
  • Error Handling: Robust error handling and validation

πŸ“ Project Structure

payload_generator_pecha/
β”œβ”€β”€ README.md                           # This file
β”œβ”€β”€ requirements.txt                    # Python dependencies
β”œβ”€β”€ .venv/                             # Virtual environment
β”œβ”€β”€ utils.py                           # Utility functions
β”œβ”€β”€ segment_uploader_webuddhist.py     # Main segment uploader
β”œβ”€β”€ toc_uploader_webuddhist.py         # Table of contents uploader
β”œβ”€β”€ src/
β”‚   └── metadata_uploader.py           # Metadata uploader
β”œβ”€β”€ mapping/
β”‚   β”œβ”€β”€ text_mapping.py                # Text mapping functionality
β”‚   β”œβ”€β”€ mapping_models.py              # Data models for mapping
β”‚   └── mapping_data/                  # Mapping data files
β”œβ”€β”€ test/
β”‚   β”œβ”€β”€ test_segment_uploader_webuddhist.py  # Test suite
β”‚   └── data/                          # Test data files
β”‚       β”œβ”€β”€ dummy_segment_upload_payload.json
β”‚       β”œβ”€β”€ dummy_api_response.json
β”‚       └── dummy_segment_content_with_segment_id.json
β”œβ”€β”€ data/
β”‚   └── metadata.json                  # Project metadata
β”œβ”€β”€ [text_name]/                       # Text-specific directories
β”‚   β”œβ”€β”€ [text_name]_payload/           # Payload files
β”‚   β”‚   β”œβ”€β”€ [text_name]_root_text_segment_payload.json
β”‚   β”‚   β”œβ”€β”€ [text_name]_root_text_toc_payload.json
β”‚   β”‚   └── [text_name]_commentary_[1-3]_text_segment_payload.json
β”‚   └── [text_name]_api_response/      # API response files
β”‚       └── [text_name]_segment_content_with_segment_id.json
└── pecha_segment_upload_payload/      # Pre-generated payloads
    β”œβ”€β”€ choejuk_root_text.json
    β”œβ”€β”€ choejuk_commentary_1.json
    └── ... (other text files)

πŸ”§ Prerequisites

  • Python 3.12 or higher
  • Virtual environment support
  • Internet connection for API calls
  • WebBuddhist API credentials

πŸ“¦ Installation

1. Clone the Repository

git clone <repository-url>
cd payload_generator_pecha

2. Create Virtual Environment

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

If requirements.txt doesn't exist, install manually:

pip install requests pytest Levenshtein pydantic

4. Verify Installation

python -m pytest --version
python -c "import requests; print('Dependencies installed successfully')"

βš™οΈ Configuration

Environment Variables Setup

The application uses environment variables for configuration to keep sensitive information secure.

1. Create Environment File

# Copy the template file
cp env.template .env

# Edit the .env file with your actual values
nano .env  # or use your preferred editor

2. Configure Your .env File

# WebBuddhist API Configuration
WEBUDDHIST_API_BASE_URL=https://your-api-domain.com
WEBUDDHIST_SEGMENTS_ENDPOINT=/api/v1/segments
WEBUDDHIST_TOC_ENDPOINT=/api/v1/texts/table-of-content
WEBUDDHIST_AUTH_ENDPOINT=/api/v1/auth/login

# Optional: Pre-configure credentials (not recommended for production)
# [email protected]
# WEBUDDHIST_PASSWORD=your-password

# Environment
ENVIRONMENT=development
LOG_LEVEL=INFO

3. Security Notes

  • Never commit your .env file to version control
  • The .env file is already in .gitignore
  • For production, use proper secret management systems
  • If credentials are not in .env, the application will prompt for them

Authentication

You'll need WebBuddhist API credentials. You can either:

  1. Set them in .env file (not recommended for production)
  2. Enter them when prompted (recommended for security)

🎯 Usage

Segment Upload

Upload text segments to the WebBuddhist API.

Command Line Usage:

python segment_uploader_webuddhist.py

Interactive Prompts:

  1. Text name: Enter the name of your text (e.g., heart_sutra, diamond_cutter)
  2. Root or commentary: Enter root or commentary_1, commentary_2, commentary_3
  3. Email/Password: Enter your WebBuddhist API credentials

Required File Structure:

[text_name]/
β”œβ”€β”€ [text_name]_payload/
β”‚   └── [text_name]_[root_or_commentary]_text_segment_payload.json
└── [text_name]_api_response/  # Created automatically
    └── [text_name]_[root_or_commentary]_segment_content_with_segment_id.json

Example:

$ python segment_uploader_webuddhist.py
Enter the text name: heart_sutra
Enter the root or commentary_[1,2,3]: root
Enter your email: [email protected]
Enter your password: ********

This will:

  1. Read heart_sutra/heart_sutra_payload/heart_sutra_root_text_segment_payload.json
  2. Upload segments to the API
  3. Save response to heart_sutra/heart_sutra_api_response/heart_sutra_root_segment_content_with_segment_id.json
  4. Log activities to segment_upload_log.txt

Table of Contents Upload

Upload table of contents with segment references.

Command Line Usage:

python toc_uploader_webuddhist.py

Prerequisites:

  • Segments must be uploaded first (requires segment ID mapping file)
  • TOC payload file must exist

Required Files:

[text_name]/
β”œβ”€β”€ [text_name]_payload/
β”‚   └── [text_name]_[root_or_commentary]_text_toc_payload.json
└── [text_name]_api_response/
    └── [text_name]_[root_or_commentary]_segment_content_with_segment_id.json

Text Mapping

Map commentary texts to root texts.

Command Line Usage:

python mapping/text_mapping.py

Metadata Upload

Upload text metadata.

Command Line Usage:

python src/metadata_uploader.py

πŸ“„ File Structure Requirements

Segment Payload Format

{
    "text_id": "uuid-string",
    "segments": [
        {
            "content": "Text content here\n",
            "type": "source",
            "mapping": []
        }
    ]
}

TOC Payload Format

{
    "text_id": "uuid-string",
    "sections": [
        {
            "title": "Chapter 1",
            "segments": [
                {
                    "segment_id": "segment-content-to-match"
                }
            ]
        }
    ]
}

πŸ§ͺ Testing

Run All Tests

# Activate virtual environment first
source .venv/bin/activate

# Run tests with pytest
python -m pytest

# Run with verbose output
python -m pytest -v

# Run specific test file
python -m pytest test/test_segment_uploader_webuddhist.py -v

Run Tests with Coverage

python -m pytest --cov=segment_uploader_webuddhist test/

Test Structure

The test suite includes:

  • 16 comprehensive test cases
  • Mock external dependencies (API calls, file I/O, user input)
  • Error handling tests (network errors, file permissions, invalid data)
  • Edge case testing (empty data, malformed responses)
  • Integration tests (full workflow testing)

Test Data

Test data is stored in test/data/:

  • dummy_segment_upload_payload.json - Sample input payload
  • dummy_api_response.json - Mock API response
  • dummy_segment_content_with_segment_id.json - Expected output format

πŸ“Š Logging

Log Files

  • segment_upload_log.txt - Segment upload activities
  • toc_upload_log.txt - TOC upload activities

Log Format

2024-01-01 12:00:00,000 [INFO] Uploading segments to webuddhist
2024-01-01 12:00:01,000 [INFO] Segments uploaded, 200
2024-01-01 12:00:01,100 [INFO] Segments uploaded successfully for text_id: uuid-string

🌐 API Endpoints

The application uses configurable API endpoints defined in your .env file:

Segments Endpoint

  • URL: {WEBUDDHIST_API_BASE_URL}{WEBUDDHIST_SEGMENTS_ENDPOINT}
  • Method: POST
  • Headers: Authorization: Bearer <token>
  • Body: JSON payload with segments

Table of Contents Endpoint

  • URL: {WEBUDDHIST_API_BASE_URL}{WEBUDDHIST_TOC_ENDPOINT}
  • Method: POST
  • Headers: Authorization: Bearer <token>
  • Body: JSON payload with TOC structure

Authentication Endpoint

  • URL: {WEBUDDHIST_API_BASE_URL}{WEBUDDHIST_AUTH_ENDPOINT}
  • Method: POST
  • Body: JSON with email and password

πŸ” Troubleshooting

Common Issues

1. Module Import Errors

ModuleNotFoundError: No module named 'segment_uploader_webuddhist'

Solution: Make sure virtual environment is activated and you're in the project root:

source .venv/bin/activate
cd /path/to/payload_generator_pecha
python -m pytest

2. File Not Found Errors

FileNotFoundError: [Errno 2] No such file or directory: 'text_name/text_name_payload/...'

Solution: Ensure your file structure matches the expected format:

  • Create the text directory: mkdir [text_name]
  • Create payload directory: mkdir [text_name]/[text_name]_payload
  • Place your JSON payload file in the correct location

3. Authentication Errors

401 Unauthorized

Solution:

  • Verify your WebBuddhist API credentials
  • Check if your account has the necessary permissions
  • Ensure the API endpoints are correct

4. Network Connection Issues

ConnectionError: Failed to establish a new connection

Solution:

  • Check your internet connection
  • Verify the API endpoints are accessible
  • Check if there are any firewall restrictions

Debug Mode

For detailed debugging, you can modify the logging level in the scripts:

logging.basicConfig(level=logging.DEBUG)

Testing Without API Calls

Use the test suite to verify functionality without making actual API calls:

python -m pytest test/test_segment_uploader_webuddhist.py::TestSegmentUploader::test_upload_segments_to_webuddhist_success -v

πŸ“ Example Workflow

Complete Upload Process

  1. Prepare your text files:

    mkdir heart_sutra
    mkdir heart_sutra/heart_sutra_payload
    # Place your segment payload JSON file
  2. Upload segments:

    python segment_uploader_webuddhist.py
    # Enter: heart_sutra
    # Enter: root
    # Enter credentials
  3. Upload table of contents:

    python toc_uploader_webuddhist.py
    # Enter: heart_sutra  
    # Enter: root
    # Enter credentials
  4. Verify uploads:

    # Check log files
    tail segment_upload_log.txt
    tail toc_upload_log.txt
    
    # Check generated response files
    ls heart_sutra/heart_sutra_api_response/

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass: python -m pytest
  5. Submit a pull request

πŸ“„ License

[Add your license information here]

πŸ“ž Support

For issues and questions:

  1. Check the troubleshooting section
  2. Review the test cases for usage examples
  3. Check log files for detailed error information
  4. Create an issue in the repository

Note: Always test with small datasets first before uploading large collections of texts.

About

No description, website, or topics provided.

Resources

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages