
Serverless multi-language audio/video transcription API using AWS Transcribe, Lambda & S3. Supports English, Chinese, Malay & Indonesian with REST endpoints and auto-deployment.


MyGovHub-Goodbye-World/transcribe-api


AWS Transcribe API

A serverless AWS Lambda function that provides multi-language audio/video transcription services using AWS Transcribe. This service supports English, Chinese, Malay, and Indonesian languages and accepts S3 URLs for audio/video files.

© 2025 Goodbye World team, for use in the Great AI Hackathon Malaysia 2025.

✅ Deployment Status

Service: Successfully deployed and operational
Last Updated: October 1, 2025

Live API Endpoints:

  • POST https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/transcribe
  • GET https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/status
  • GET https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/health
  • POST https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/process-url

Infrastructure:

  • ✅ Lambda Functions: 4 functions deployed (25 MB each)
  • ✅ S3 Bucket: Auto-created wenhao1223-transcribe-aws-transcribe-api-dev-20251001
  • ✅ IAM Roles: Auto-generated with proper Transcribe and S3 permissions
  • ✅ API Gateway: Configured with CORS support
  • ✅ Security: Private bucket with encryption enabled
  • ✅ Cost Optimization: 30-day lifecycle policy for auto-cleanup

🎤 Features

  • 🤖 Automatic Language Detection: Automatically detects single or multiple languages in audio files
  • 🌍 Multi-Language Support: Seamlessly handles mixed-language conversations and language switching
  • 🎯 Smart Language Filtering: Optional candidate languages for improved accuracy and speed
  • 🗣️ Language Support: English (en-us), Chinese (zh-cn), Malay (ms-my), Indonesian (id-id)
  • ☁️ S3 Integration: Accept S3 URLs for audio/video files
  • 🌐 RESTful API: Simple HTTP endpoints for transcription operations
  • ⏱️ Real-time Status: Check transcription job status and retrieve results
  • ⚡ Synchronous Processing: Process URLs and get immediate results
  • 📊 Enhanced Response: Returns transcript with detected language information
  • 🔗 CORS Enabled: Frontend-friendly with CORS support
  • 🗑️ Auto S3 Cleanup: Automatic deletion of old transcripts after 30 days

🚀 Quick Test Commands

# 1. Health check
curl "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/health"

# 2. Quick transcribe with automatic language detection
curl -X POST "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/process-url" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-s3-bucket.s3.amazonaws.com/sample-audio.m4a"}'

# 3. Quick transcribe with candidate languages (optional optimization)
curl -X POST "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/process-url" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-s3-bucket.s3.amazonaws.com/sample-audio.m4a", "candidate_languages": ["en-us", "zh-cn"]}'

# 4. Start async transcription job (for very long audio files)
curl -X POST "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/transcribe" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-bucket.s3.amazonaws.com/audio.mp3", "language": "en-us"}'

# 5. Check job status (replace JOB_NAME with actual job name)
curl "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/status?job_name=JOB_NAME"

📋 API Endpoints

1. Health Check

GET /health

Check if the service is running and get supported languages.

curl "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/health"

Response Format:

{
  "status": {
    "statusCode": 200,
    "message": "Service is healthy"
  },
  "data": {
    "service": "aws-transcribe-api",
    "supported_languages": ["en-us", "zh-cn", "ms-my", "id-id"],
    "language_detection": "automatic_multi_language_by_default",
    "features": {
      "automatic_language_identification": true,
      "multi_language_support": true,
      "single_and_mixed_language_audio": true,
      "speaker_labeling": true,
      "alternative_transcriptions": true,
      "candidate_language_filtering": true
    },
    "timestamp": "2023-12-01T12:34:56.789Z",
    "request_id": "test-request-id-123"
  }
}
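Every response example in this README wraps its payload in the same `status` / `data` envelope. A minimal client-side sketch for unwrapping it could look like the following. `ApiError` and `unwrap` are hypothetical names introduced here for illustration, not part of the project, and the assumption that any non-2xx `statusCode` should raise is ours.

```python
from typing import Any


class ApiError(Exception):
    """Raised when the envelope reports a non-2xx status (hypothetical helper)."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


def unwrap(envelope: dict[str, Any]) -> dict[str, Any]:
    """Return the `data` payload, or raise ApiError for a non-2xx status."""
    status = envelope.get("status", {})
    # Examples in this README show statusCode both as 200 and as "200",
    # so coerce to int before comparing.
    code = int(status.get("statusCode", 500))
    if not 200 <= code < 300:
        raise ApiError(code, str(status.get("message", "")))
    return envelope.get("data", {})


health = {
    "status": {"statusCode": 200, "message": "Service is healthy"},
    "data": {
        "service": "aws-transcribe-api",
        "supported_languages": ["en-us", "zh-cn", "ms-my", "id-id"],
    },
}
print(unwrap(health)["supported_languages"])
```

The same helper can be applied to the responses of the other endpoints, since they share the envelope shape.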

2. Start Transcription Job

POST /transcribe

Start a new asynchronous transcription job for an audio/video file.

Request Body:

{
  "url": "https://your-bucket.s3.amazonaws.com/audio-file.mp3",
  "language": "en-us"
}

Response Format:

{
  "status": {
    "statusCode": 200,
    "message": "Transcription job started successfully"
  },
  "data": {
    "job_name": "transcribe_job_20231201_123456_abc123",
    "job_status": "IN_PROGRESS",
    "language_code": "en-us",
    "media_url": "https://your-bucket.s3.amazonaws.com/audio-file.mp3",
    "creation_time": "2023-12-01T12:34:56.321000+00:00",
    "estimated_completion_time": "Processing time varies based on audio length"
  }
}
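As a rough illustration of the request above, the following sketch builds (but does not send) the POST /transcribe call using only the standard library. `build_transcribe_request` and `SUPPORTED_LANGUAGES` are hypothetical names, and validating the language on the client side is an assumption made for this example, not something the API requires.

```python
import json
import urllib.request

# Language codes listed as supported in this README.
SUPPORTED_LANGUAGES = {"en-us", "zh-cn", "ms-my", "id-id"}


def build_transcribe_request(
    base_url: str, media_url: str, language: str
) -> urllib.request.Request:
    """Build a POST /transcribe request without sending it."""
    lang = language.lower()
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    body = json.dumps({"url": media_url, "language": lang}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/transcribe",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_transcribe_request(
    "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev",
    "https://your-bucket.s3.amazonaws.com/audio-file.mp3",
    "en-us",
)
# Sending is left to the caller, e.g. urllib.request.urlopen(req, timeout=30).
print(req.get_method(), req.full_url)
```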

3. Check Status

GET /status?job_name=<job_name>

Check the status of a transcription job and retrieve results if completed.

Response Format:

{
  "status": {
    "statusCode": 200,
    "message": "Job status retrieved successfully"
  },
  "data": {
    "job_name": "transcribe_job_20231201_123456_abc123",
    "status": "COMPLETED",
    "creation_time": "2023-12-01T12:34:56.789000+00:00",
    "completion_time": "2023-12-01T12:36:45.123000+00:00", // null if status is 'IN_PROGRESS'
    "language_code": "en-US",
    "transcript": "Hello, this is the transcribed text from your audio file.", // undefined if status is 'IN_PROGRESS'
    "transcript_uri": "https://s3.amazonaws.com/bucket/transcript.json", // undefined if status is 'IN_PROGRESS'
    "identitfied_language_code": "en-US" // undefined if status is 'IN_PROGRESS'
  }
}
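Because /transcribe is asynchronous, a client typically polls /status until the job reaches a terminal state. The sketch below shows one way to structure that loop. `wait_for_job` is a hypothetical helper, the HTTP call is injected as `fetch_status` so the loop itself stays testable, and it is an assumption that this API reports a `FAILED` status in the same `status` field that carries `COMPLETED`.

```python
import time
from typing import Any, Callable

# COMPLETED appears in the example above; FAILED is an AWS Transcribe job
# state that this API is assumed to pass through unchanged.
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_job(
    fetch_status: Callable[[str], dict[str, Any]],
    job_name: str,
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job is terminal; raise TimeoutError if it never is."""
    for _ in range(max_attempts):
        data = fetch_status(job_name)
        if data.get("status") in TERMINAL_STATES:
            return data
        sleep(interval_s)
    raise TimeoutError(f"job {job_name!r} still running after {max_attempts} polls")


# Stand-in for a real GET /status call: two pending polls, then a result.
_responses = iter([
    {"status": "IN_PROGRESS"},
    {"status": "IN_PROGRESS"},
    {"status": "COMPLETED", "transcript": "Hello"},
])
result = wait_for_job(lambda name: next(_responses), "demo-job", sleep=lambda s: None)
print(result["transcript"])
```

In real use, `fetch_status` would issue the GET request and return the `data` object of the response.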

4. Process URL (Quick Transcribe) 🆕 Enhanced with Automatic Language Detection

POST /process-url

Process an S3 URL and return the completed transcript immediately with automatic multi-language detection. No need to specify languages - AWS Transcribe automatically handles single or multiple languages in your audio.

Simple Request (Automatic Detection):

{
  "url": "https://your-bucket.s3.amazonaws.com/audio-file.mp3"
}

Request with Language Candidates (Optional - Improves Accuracy):

{
  "url": "https://your-bucket.s3.amazonaws.com/audio-file.mp3",
  "candidate_languages": ["en-us", "zh-cn", "ms-my"]
}

Response Format:

{
  "status": {
    "statusCode": 200,
    "message": "Transcription completed successfully"
  },
  "data": {
    "transcript": "Hello, 你好, mixed language transcript.",
    "detected_languages": [
      {"LanguageCode": "en-US", "DurationInSeconds": 120.5},
      {"LanguageCode": "zh-CN", "DurationInSeconds": 45.2}
    ],
    "language_identification": [
      {"LanguageCode": "en-US", "Score": 0.95},
      {"LanguageCode": "zh-CN", "Score": 0.85}
    ]
  }
}
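The `detected_languages` list in this response can be reduced to a per-language share of the audio. The following is only a sketch: `language_shares` is a hypothetical helper, and it assumes the `data` object has the shape shown in the example above.

```python
from typing import Any


def language_shares(data: dict[str, Any]) -> dict[str, float]:
    """Map each detected language code to its fraction of the detected duration."""
    durations: dict[str, float] = {}
    for entry in data.get("detected_languages", []):
        code = entry["LanguageCode"]
        seconds = float(entry.get("DurationInSeconds") or 0.0)
        durations[code] = durations.get(code, 0.0) + seconds
    total = sum(durations.values())
    if total == 0:
        return {}
    return {code: seconds / total for code, seconds in durations.items()}


sample = {
    "transcript": "Hello, 你好, mixed language transcript.",
    "detected_languages": [
        {"LanguageCode": "en-US", "DurationInSeconds": 120.5},
        {"LanguageCode": "zh-CN", "DurationInSeconds": 45.2},
    ],
}
shares = language_shares(sample)
# The language with the largest share is the dominant one.
print(max(shares, key=shares.get))
```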

Key Features:

  • 🤖 Zero Configuration: Works out of the box with any supported language
  • 🌍 Multi-Language Ready: Handles language switching automatically
  • 🎯 Optional Optimization: Use candidate_languages for better accuracy
  • ⚡ Simple API: Just provide the URL, everything else is automatic

Usage Examples:

# Using the sample audio file
curl -X POST "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/process-url" \
  -H "Content-Type: application/json" \
  -d "{\"url\": \"https://your-s3-bucket-id.s3.us-east-1.amazonaws.com/sample-audio.m4a\"}"

# Using your own S3 URL
curl -X POST "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/process-url" \
  -H "Content-Type: application/json" \
  -d "{\"url\": \"https://your-bucket.s3.amazonaws.com/audio-file.mp3\"}"

Features:

  • ✅ Synchronous: Returns completed transcript immediately
  • ✅ Automatic Detection: Detects language(s) automatically
  • ✅ No Job Tracking: No need to check status separately
  • ✅ Quick Response: Best for shorter audio files
  • ⚠️ Timeout: May time out for very long audio files (use the /transcribe endpoint instead)

🚀 Supported Languages

| Language | Code | AWS Transcribe Code |
| --- | --- | --- |
| English (US) | en-us | en-US |
| Chinese (Simplified) | zh-cn | zh-CN |
| Malay (Malaysia) | ms-my | ms-MY |
| Indonesian | id-id | id-ID |
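The mapping between the API's lowercase codes and the AWS Transcribe codes can be expressed as a small lookup. This is an illustrative sketch only: `API_TO_AWS` and `to_aws_code` are hypothetical names, and the project's own handler may perform the conversion differently.

```python
# API code -> AWS Transcribe code, taken from the Supported Languages list.
API_TO_AWS = {
    "en-us": "en-US",
    "zh-cn": "zh-CN",
    "ms-my": "ms-MY",
    "id-id": "id-ID",
}


def to_aws_code(api_code: str) -> str:
    """Convert an API language code to the AWS Transcribe form."""
    key = api_code.strip().lower()
    try:
        return API_TO_AWS[key]
    except KeyError:
        raise ValueError(f"unsupported language code: {api_code!r}") from None


print(to_aws_code("zh-cn"))
```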

🤖 Automatic Language Detection

How It Works

The API now automatically detects and handles languages using AWS Transcribe's multi-language identification:

Default Behavior (Zero Configuration):

{
  "url": "https://bucket.s3.amazonaws.com/audio.mp3"
}
  • ✅ Single Languages: Automatically detects dominant language (English, Chinese, Malay, Indonesian)
  • ✅ Multiple Languages: Handles language switching within the same audio
  • ✅ Mixed Conversations: Perfect for international meetings or multilingual content
  • ✅ No Setup Required: Just provide the URL, everything else is automatic

Optional Optimization with Candidate Languages:

{
  "url": "https://bucket.s3.amazonaws.com/audio.mp3",
  "candidate_languages": ["en-us", "zh-cn"]
}
  • 🎯 Improved Accuracy: Narrows detection to specific languages you expect
  • ⚡ Faster Processing: Reduces detection time by limiting language options
  • 🎪 Smart Filtering: Only considers languages you specify
Benefits of Automatic Detection

  • 🔄 Backward Compatible: Existing code works without changes
  • 🌐 Universal: Handles any combination of supported languages
  • 📊 Detailed Results: Returns confidence scores and language identification info
  • ⚡ Optimized: Uses AWS Transcribe's latest multi-language capabilities

🛠️ Setup and Deployment

Prerequisites

  • Node.js 18+
  • Python 3.10+
  • AWS CLI configured with proper credentials
  • Serverless Framework 4.x

Quick Deployment

  1. Install dependencies:

    npm install
    pip install -r requirements.txt
  2. Deploy to AWS:

    serverless deploy

The deployment will automatically:

  • ✅ Create S3 bucket for transcription outputs
  • ✅ Set up IAM roles with proper permissions
  • ✅ Deploy Lambda functions
  • ✅ Configure API Gateway with CORS
  • ✅ Set up lifecycle policies for cost optimization

Alternative Deployment Script

# Install, test, and deploy using the deployment script
python deploy_lambda.py install
python deploy_lambda.py test
python deploy_lambda.py deploy

Environment Variables (.env)

Create a .env file in the project root to configure AWS credentials and API endpoints:

# AWS Credentials
AWS_ACCESS_KEY_ID=your_access_key_here
AWS_SECRET_ACCESS_KEY=your_secret_key_here
AWS_REGION1=us-east-1  # Default AWS region

# LAMBDA API Endpoint (optional - for testing deployed APIs)
TRANSCRIBE_API_BASE_URL=https://your-api-id.execute-api.us-east-1.amazonaws.com/dev

TRANSCRIBE_API_URL=https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/transcribe
TRANSCRIBE_PROCESS_URL_API_URL=https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/process-url
TRANSCRIBE_STATUS_API_URL=https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/status
TRANSCRIBE_HEALTH_API_URL=https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/health

SAMPLE_S3_AUDIO_URL=https://wenhao1223-sample-test-dev.s3.us-east-1.amazonaws.com/sample-audio-en.m4a
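Since every endpoint URL is the base URL plus a fixed path, a test script could derive them from `TRANSCRIBE_API_BASE_URL` alone. The following sketch assumes the variables have already been loaded into the process environment (Python does not read .env files on its own), and `endpoints_from_env` is a hypothetical helper rather than something test_lambda.py is known to contain.

```python
import os
from typing import Mapping

# Endpoint paths documented in this README.
PATHS = ("transcribe", "status", "health", "process-url")


def endpoints_from_env(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Derive the four endpoint URLs from TRANSCRIBE_API_BASE_URL."""
    base = env.get("TRANSCRIBE_API_BASE_URL", "").rstrip("/")
    if not base:
        raise KeyError("TRANSCRIBE_API_BASE_URL is not set")
    return {path: f"{base}/{path}" for path in PATHS}


urls = endpoints_from_env(
    {"TRANSCRIBE_API_BASE_URL": "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/"}
)
print(urls["process-url"])
```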

🧪 Testing

Using Python Test Script

# Install requests library (if using HTTP tests)
pip install requests

# Generate HTML test interface (no API URL required)
python test_lambda.py --create-html

# Generate HTML test interface with pre-filled API URL
python test_lambda.py --create-html --api-url https://your-api-id.execute-api.us-east-1.amazonaws.com/dev

# Run automated tests
python test_lambda.py --api-url https://your-api-id.execute-api.us-east-1.amazonaws.com/dev

# Test with specific file
python test_lambda.py --api-url https://your-api-id.execute-api.us-east-1.amazonaws.com/dev --file media/sample-audio-en.m4a

Test Script Options:

  • --create-html: Generate an interactive HTML test interface (API URL optional)
  • --api-url: API Gateway URL (required for testing, optional for HTML generation)
  • --file: Specific file to upload (optional, creates test file if not provided)

Web Interface Testing

Open test_lambda.html in your browser for an interactive testing interface with:

  • Health check testing
  • File upload and transcription
  • Status monitoring
  • Real-time response display

Manual Testing with curl

Quick test commands:

# 1. Health Check
curl "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/health"

# 2. Quick Transcribe (immediate result)
curl -X POST "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/process-url" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-s3-bucket.s3.amazonaws.com/sample-audio.m4a"}'

# 3. Start async transcription job
curl -X POST "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/transcribe" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-bucket.s3.amazonaws.com/audio.mp3", "language": "en-us"}'

# 4. Check job status (replace JOB_NAME with actual job name)
curl "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/status?job_name=JOB_NAME"

Note: Replace your-s3-bucket and your-bucket with actual S3 bucket names containing your audio files.

🤖 Language Detection Behavior

Understanding how the automatic language detection works in different scenarios:

Single Language Detection:

When AWS Transcribe detects a single dominant language:

// AWS Internal Response Format
{
  "LanguageCode": "en-US",
  "IdentifiedLanguageScore": {
    "LanguageCode": "en-US", 
    "Score": 0.95
  }
}

Multi-Language Detection:

When AWS Transcribe detects multiple languages in the same audio:

// AWS Internal Response Format  
{
  "LanguageCodes": [
    {"LanguageCode": "en-US", "DurationInSeconds": 120.5},
    {"LanguageCode": "zh-CN", "DurationInSeconds": 45.2}
  ],
  "LanguageIdSettings": {...}
}
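The two shapes above differ in which key carries the language information, so code that reads only `LanguageCode` fails on multi-language jobs. A minimal sketch of a normaliser that accepts both shapes follows. `summarize_languages` is a hypothetical name and this is not the project's actual handler code. The score is accepted either as the nested object shown above or as a bare number, since it is an assumption which form the handler actually sees.

```python
from typing import Any


def summarize_languages(job: dict[str, Any]) -> dict[str, Any]:
    """Normalise single- and multi-language job info into one shape."""
    if "LanguageCodes" in job:  # multi-language job
        detected = [
            {
                "LanguageCode": item["LanguageCode"],
                "DurationInSeconds": item.get("DurationInSeconds"),
            }
            for item in job["LanguageCodes"]
        ]
    elif "LanguageCode" in job:  # single-language job
        detected = [{"LanguageCode": job["LanguageCode"], "DurationInSeconds": None}]
    else:
        detected = []

    score = job.get("IdentifiedLanguageScore")
    if isinstance(score, dict):  # nested form, as shown above
        score = score.get("Score")

    return {
        "detected_language": detected[0]["LanguageCode"] if detected else None,
        "detected_languages": detected,
        "score": score,
    }


single = {"LanguageCode": "en-US", "IdentifiedLanguageScore": {"LanguageCode": "en-US", "Score": 0.95}}
multi = {"LanguageCodes": [
    {"LanguageCode": "en-US", "DurationInSeconds": 120.5},
    {"LanguageCode": "zh-CN", "DurationInSeconds": 45.2},
]}
print(summarize_languages(single)["detected_language"])
print(len(summarize_languages(multi)["detected_languages"]))
```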

API Response Examples:

Single Language Detected:

{
  "status": {
    "statusCode": "200",
    "message": "Transcription completed successfully"
  },
  "data": {
    "message": "Hello, this is the transcript text.",
    "detected_language": "en-US",
    "detected_languages": [
      {"LanguageCode": "en-US", "DurationInSeconds": 120.5}
    ],
    "language_identification": [
      {"LanguageCode": "en-US", "Score": 0.95}
    ]
  }
}

Multiple Languages Detected:

{
  "status": {
    "statusCode": "200",
    "message": "Transcription completed successfully"
  },
  "data": {
    "message": "Hello, 你好, mixed language transcript.",
    "detected_language": "en-US",
    "detected_languages": [
      {"LanguageCode": "en-US", "DurationInSeconds": 120.5},
      {"LanguageCode": "zh-CN", "DurationInSeconds": 45.2}
    ],
    "language_identification": [
      {"LanguageCode": "en-US", "Score": 0.95},
      {"LanguageCode": "zh-CN", "Score": 0.85}
    ]
  }
}

Key Points:

  • 🤖 Automatic: No configuration required, works out of the box
  • 🌍 Universal: Handles single and multi-language scenarios seamlessly
  • 🎯 Optimizable: Use candidate_languages for improved accuracy
  • 📊 Detailed: Returns confidence scores and timing information

📁 Project Structure

transcribe-api/
├── lambda_handler.py      # Main Lambda function handlers
├── serverless.yml         # Serverless Framework configuration
├── requirements.txt       # Python dependencies
├── package.json          # Node.js dependencies
├── deploy_lambda.py      # Deployment automation script
├── test_lambda.py        # Local testing script
├── test_lambda.html      # Web testing interface (use `python test_lambda.py --create-html` to generate)
├── .env                  # Environment variables (local)
├── README.md            # This documentation
└── media/               # Sample audio files for testing
    ├── sample-audio-en.m4a
    └── sample-audio-mix.m4a

🔧 Configuration

Serverless Framework Configuration

The serverless.yml includes:

  • Auto IAM Roles: Automatic creation with Transcribe and S3 permissions
  • S3 Bucket: Auto-creation with security and lifecycle policies
  • API Gateway: RESTful endpoints with CORS support
  • Environment Variables: Automatic configuration
  • Security: Private bucket with encryption enabled
  • Cost Optimization: 30-day lifecycle rules

IAM Permissions

The auto-generated IAM role includes:

# Transcribe permissions
- transcribe:StartTranscriptionJob
- transcribe:GetTranscriptionJob  
- transcribe:ListTranscriptionJobs
- transcribe:DeleteTranscriptionJob

# S3 permissions
- s3:GetObject, s3:PutObject, s3:DeleteObject
- s3:ListBucket, s3:GetBucketLocation

# CloudWatch Logs permissions
- logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents

Error Handling

Common error scenarios:

  • 400 Bad Request: Missing URL, invalid language, malformed JSON
  • 404 Not Found: Transcription job not found
  • 500 Internal Server Error: AWS service errors, unexpected exceptions
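To make the 400 cases concrete, here is a sketch of how a handler might validate a /transcribe body and produce an error in the envelope format used throughout this README. Everything in it is an assumption for illustration: `error_response` and `parse_transcribe_body` are hypothetical names, the wildcard CORS header and the `"data": null` field are guesses, and the real lambda_handler.py may behave differently.

```python
import json
from typing import Any, Optional

# Language codes listed as supported in this README.
SUPPORTED_LANGUAGES = {"en-us", "zh-cn", "ms-my", "id-id"}


def error_response(status_code: int, message: str) -> dict[str, Any]:
    """Build an API Gateway proxy-style response carrying the error envelope."""
    return {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": "*",  # assumed CORS setting
        },
        "body": json.dumps(
            {"status": {"statusCode": status_code, "message": message}, "data": None}
        ),
    }


def parse_transcribe_body(
    raw_body: Optional[str],
) -> tuple[Optional[dict[str, Any]], Optional[dict[str, Any]]]:
    """Return (payload, None) when valid, or (None, error_response) when not."""
    try:
        payload = json.loads(raw_body or "")
    except json.JSONDecodeError:
        return None, error_response(400, "Malformed JSON")
    if not isinstance(payload, dict) or not payload.get("url"):
        return None, error_response(400, "Missing URL")
    language = payload.get("language")
    if language is not None and str(language).lower() not in SUPPORTED_LANGUAGES:
        return None, error_response(400, "Invalid language")
    return payload, None


payload, error = parse_transcribe_body('{"url": "https://b.s3.amazonaws.com/a.mp3", "language": "xx"}')
print(error["statusCode"], json.loads(error["body"])["status"]["message"])
```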

🔒 Security Features

  • Auto IAM Roles: Least privilege access with auto-generated policies
  • Private S3 Bucket: No public access, encryption enabled
  • CORS Configuration: Proper cross-origin resource sharing
  • Input Validation: URL and parameter validation
  • No Hardcoded Credentials: Uses AWS IAM roles and environment variables

📈 Monitoring and Logging

  • CloudWatch Logs: Automatic logging for all Lambda functions
  • Health Endpoint: Service status and availability monitoring
  • Error Tracking: Detailed error responses and logging
  • Performance Metrics: Lambda duration and invocation metrics

💸 Cost Optimization

  • Lifecycle Policies: Auto-delete transcripts after 30 days
  • Efficient Packaging: Minimal deployment size
  • Resource Limits: Optimized memory and timeout settings
  • Pay-per-use: Only pay for actual transcription usage

🗑️ Cleanup

To remove the entire deployment and all resources:

# Using Serverless Framework
serverless remove

# Or using the deployment script
python deploy_lambda.py remove

This will delete:

  • Lambda functions
  • API Gateway
  • S3 bucket and all contents
  • IAM roles and policies
  • CloudWatch log groups

🆘 Troubleshooting

Common Issues:

  1. 403 Forbidden Errors:

    • Ensure S3 URLs are accessible
    • Check IAM permissions (auto-generated roles should work)
  2. Timeout Issues:

    • Use /process-url for shorter audio files
    • Use /transcribe + /status for longer files
  3. Invalid Language Codes:

    • Use supported codes: en-us, zh-cn, ms-my, id-id
  4. S3 URL Format:

    • Ensure URLs are public S3 URLs or pre-signed URLs
    • Format: https://bucket-name.s3.region.amazonaws.com/file-key
  5. 🔧 Multi-Language Detection Issues:

    • Error: "Unexpected error: 'LanguageCode'"
    • Cause: AWS Transcribe returns different response formats for single vs multi-language jobs
    • Solution: Updated code handles both LanguageCode (single) and LanguageCodes (multi) fields
    • Status: ✅ Fixed in latest deployment
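For the S3 URL format in item 4, a quick local check can catch malformed URLs before they reach the API. The sketch below parses the virtual-hosted style shown there (`https://bucket-name.s3.region.amazonaws.com/file-key`, with the region optional). `parse_s3_url` and `S3Location` are hypothetical names, and other S3 URL styles are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import unquote, urlparse

# bucket.s3.amazonaws.com or bucket.s3.<region>.amazonaws.com
_HOST_RE = re.compile(
    r"^(?P<bucket>.+?)\.s3(?:\.(?P<region>[a-z0-9-]+))?\.amazonaws\.com$"
)


class S3Location(NamedTuple):
    bucket: str
    region: Optional[str]
    key: str


def parse_s3_url(url: str) -> S3Location:
    """Split a virtual-hosted S3 HTTPS URL into bucket, region and key."""
    parsed = urlparse(url)
    match = _HOST_RE.match(parsed.hostname or "")
    # The query string (e.g. a pre-signed signature) is not part of the key.
    key = unquote(parsed.path.lstrip("/"))
    if parsed.scheme != "https" or match is None or not key:
        raise ValueError(f"not a virtual-hosted S3 URL: {url!r}")
    return S3Location(match["bucket"], match["region"], key)


print(parse_s3_url("https://bucket-name.s3.us-east-1.amazonaws.com/file-key"))
```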

Debug Steps:

  1. Test health endpoint: GET /health - Check service status
  2. Check CloudWatch logs - View detailed error messages in AWS Console
  3. Validate S3 URL accessibility - Ensure files are accessible
  4. Use test_lambda.html - Interactive debugging interface
  5. Test with known working files - Use English audio first to verify setup
  6. Check deployment status - Ensure latest code is deployed with serverless deploy

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Test changes locally using test_lambda.py
  4. Test web interface using test_lambda.html
  5. Submit a pull request

Last Updated: October 1, 2025
Service Status: ✅ Fully Operational
Infrastructure: Auto-managed via Serverless Framework
