High-performance async Python library for AWS Kinesis with production-ready resharding support.
Producer:

```python
from kinesis import Producer

async with Producer(stream_name="my-stream") as producer:
    await producer.put({"message": "hello world"})
```
Consumer:

```python
from kinesis import Consumer

async with Consumer(stream_name="my-stream") as consumer:
    async for message in consumer:
        print(message)
```
New to async-kinesis? Check out our comprehensive Getting Started guide for step-by-step tutorials and examples.
- Key Features
- Installation
- Basic Usage
- Configuration
- Rate Limiting & API Optimization
- Shard Management
- Architecture Comparison
- Checkpointers
- Processors
- Benchmark/Example
- Development & Testing
- Documentation
- Production-Ready Resharding: Automatic shard discovery and topology management
- Async/Await Native: Built for modern Python async patterns
- High Performance: Queue-based architecture with configurable batching
- AWS Best Practices: Parent-child shard ordering and proper error handling
- Rate Limit Optimization: Skip DescribeStream or use ListShards for better API limits
- Stream Addressing: Support for both stream names and ARNs
- Custom Partition Keys: Control record distribution and ordering across shards
- Multi-Consumer Support: Redis-based checkpointing with heartbeats
- Flexible Processing: Pluggable serialization (JSON, MessagePack, KPL)
- Operational Visibility: Rich monitoring APIs for production debugging
- Metrics & Observability: Optional Prometheus/CloudWatch metrics with zero overhead when disabled
Unlike basic Kinesis libraries, async-kinesis provides enterprise-grade resharding capabilities:
- Automatic discovery of new shards during resharding operations
- Parent-child ordering enforcement following AWS best practices
- Graceful handling of closed shards and iterator expiration
- Real-time monitoring with detailed topology and status reporting
- Seamless coordination between multiple consumer instances
Architecture Details | Why Another Library?
```bash
pip install async-kinesis

# With optional dependencies
pip install async-kinesis[prometheus]  # For Prometheus metrics
pip install async-kinesis[redis]       # For Redis checkpointing
pip install async-kinesis[dynamodb]    # For DynamoDB checkpointing
```
As required by boto3, set the standard AWS credential environment variables:

- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`

The `stream_name` argument to consumer and producer accepts either a StreamName like `test`, or a full StreamARN like `arn:aws:kinesis:eu-central-1:842404475894:stream/test`.
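Both addressing styles work anywhere `stream_name` is accepted; for example:

```python
from kinesis import Consumer

# By stream name
consumer = Consumer(stream_name="test")

# By full stream ARN
consumer = Consumer(stream_name="arn:aws:kinesis:eu-central-1:842404475894:stream/test")
```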
```python
from kinesis import Producer

async with Producer(stream_name="test") as producer:
    # Put item onto queue to be flushed via put_records()
    await producer.put({'my': 'data'})

    # Use custom partition key for record distribution control
    await producer.put({'user_id': '123', 'action': 'login'}, partition_key='user-123')
```
With Metrics:

```python
from kinesis import Producer, PrometheusMetricsCollector

# Enable Prometheus metrics (optional)
metrics = PrometheusMetricsCollector(namespace="my_app")

async with Producer(stream_name="test", metrics_collector=metrics) as producer:
    await producer.put({'my': 'data'})
    # Metrics are automatically collected: throughput, errors, batch sizes, etc.
```
Control shard distribution and maintain record ordering by specifying custom partition keys:
```python
async with Producer(stream_name="my-stream") as producer:
    # Group related records with the same partition key
    await producer.put({'user_id': '123', 'action': 'login'}, partition_key='user-123')
    await producer.put({'user_id': '123', 'action': 'view_page'}, partition_key='user-123')

    # Different partition key for a different user
    await producer.put({'user_id': '456', 'action': 'login'}, partition_key='user-456')

    # Use the default generated partition key
    await producer.put({'system': 'health_check'})  # Auto-generated key
```
Key Benefits:

- Ordered processing: Records with the same partition key go to the same shard, maintaining order
- Load distribution: Spread records across shards by using different partition keys
- Grouping: Keep related records together (e.g., all events for a user)
- Backward compatible: Existing code without partition keys continues to work
Validation (see the sketch below):

- Maximum 256 bytes when UTF-8 encoded
- Supports Unicode characters
- Cannot be an empty string
- Must be a string type
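As a rough illustration, the rules above amount to checks like the following (a hypothetical client-side helper, not part of the library's API; the library performs its own validation):

```python
def validate_partition_key(key):
    # Hypothetical sketch mirroring the documented rules
    if not isinstance(key, str):
        raise TypeError("partition key must be a string")
    if not key:
        raise ValueError("partition key cannot be an empty string")
    if len(key.encode("utf-8")) > 256:  # Unicode is allowed, but at most 256 encoded bytes
        raise ValueError("partition key exceeds 256 bytes when UTF-8 encoded")

validate_partition_key("user-123")  # OK
validate_partition_key("ü" * 200)   # raises ValueError: 400 bytes when UTF-8 encoded
```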
Aggregation Behavior:

- SimpleAggregator: Records with different partition keys are batched separately
- KPL Aggregators: Custom partition keys are not supported (raises `ValueError`)
- Other Aggregators: Automatically group records by partition key within batches
AWS Kinesis APIs have strict rate limits that can impact performance in high-throughput or multi-consumer scenarios. async-kinesis provides optimization options to work around these constraints:
- DescribeStream: 10 operations/second account-wide (shared across all applications)
- ListShards: 100 operations/second per stream (much higher limit)
Skip DescribeStream entirely (best for pre-provisioned streams):

```python
# Assumes stream exists and is active - no API calls during startup
async with Consumer(stream_name="my-stream", skip_describe_stream=True) as consumer:
    async for message in consumer:
        print(message)
```
Use the ListShards API for better rate limits:

```python
# Uses ListShards (100 ops/s) instead of DescribeStream (10 ops/s)
async with Consumer(stream_name="my-stream", use_list_shards=True) as consumer:
    async for message in consumer:
        print(message)
```
Combined approach for maximum optimization:

```python
# Skip startup calls + use ListShards for any later shard discovery
async with Consumer(
    stream_name="my-stream",
    skip_describe_stream=True,  # Skip startup API calls
    use_list_shards=True        # Use ListShards if shard refresh needed
) as consumer:
    async for message in consumer:
        print(message)
```
| Scenario | Recommendation | Reason |
|---|---|---|
| Single consumer, known stream | `skip_describe_stream=True` | Eliminates rate limit issues entirely |
| Multiple consumers | `use_list_shards=True` | 10x better rate limits than DescribeStream |
| Short-lived consumers | `skip_describe_stream=True` | Fastest startup, no API overhead |
| Stream discovery needed | `use_list_shards=True` | Need shard info but want better limits |
| Production deployment | Both options | Maximum resilience to rate limiting |
Note: `skip_describe_stream=True` assumes the stream exists and is active. Use it only when you're certain the stream is available.
| Parameter | Default | Description |
|---|---|---|
| session | None | AioSession (to use a non-default profile etc.) |
| region_name | None | AWS region |
| buffer_time | 0.5 | Buffer time in seconds before auto-flushing records |
| put_rate_limit_per_shard | 1000 | "A single shard can ingest up to 1 MiB of data per second (including partition keys) or 1,000 records per second for writes" |
| put_bandwidth_limit_per_shard | 1024 | KB per second; the maximum is 1024 per shard (i.e. 1 MiB). Keep below this to minimize ProvisionedThroughputExceeded errors * |
| batch_size | 500 | "Each PutRecords request can support up to 500 records" |
| max_queue_size | 10000 | The put() method will block when the queue is at max |
| after_flush_fun | None | Async function to call after each flush (i.e. put_records()) call |
| processor | JsonProcessor() | Record aggregator/serializer. The default is JSON without aggregation. Note this is highly inefficient, as each record can be up to 1 MiB |
| retry_limit | None | How many connection attempts should be made before raising an exception |
| expo_backoff | None | Exponential backoff when a connection attempt fails |
| expo_backoff_limit | 120 | Maximum number of seconds exponential backoff can grow to |
| skip_describe_stream | False | Skip DescribeStream API calls for better rate limits (assumes the stream exists and is active) |
| use_list_shards | False | Use the ListShards API instead of DescribeStream for better rate limits (100 ops/s vs 10 ops/s) |
| create_stream | False | Create a Kinesis stream based on the stream_name argument. Ignored if the stream already exists |
| create_stream_shards | 1 | Number of shards for the new stream. Ignored if the stream already exists |
- Throughput exceeded: the docs (for Java/KPL, see https://docs.aws.amazon.com/streams/latest/dev/kinesis-producer-adv-retries-rate-limiting.html) state: "You can lower this limit to reduce spamming due to excessive retries. However, the best practice is for each producer to retry for maximum throughput aggressively and to handle any resulting throttling determined as excessive by expanding the capacity of the stream and implementing an appropriate partition key strategy."

Even though our default here is to limit at this threshold (1024 KB/s), in practice the effective threshold seems lower (~80%). If you wish to avoid excessive throttling, or have multiple producers on a stream, you will want to set this quite a bit lower.
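For instance, a producer tuned for a busy shared stream might combine several of these options (a sketch with illustrative values only; see the table above for each parameter's meaning):

```python
from kinesis import Producer

async with Producer(
    stream_name="my-stream",
    region_name="eu-central-1",
    buffer_time=0.5,                    # flush at least every 0.5 s
    batch_size=500,                     # max records per PutRecords request
    put_bandwidth_limit_per_shard=800,  # stay well below the 1024 KB/s shard limit
    retry_limit=5,                      # raise after 5 failed connection attempts
    expo_backoff=2,                     # exponential backoff on connection failure
    expo_backoff_limit=60,              # cap backoff at 60 seconds
) as producer:
    await producer.put({'my': 'data'})
```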
```python
from kinesis import Consumer

async with Consumer(stream_name="test") as consumer:
    # Consumer continues to wait for new messages after catching up
    async for item in consumer:
        print(item)
```
Options (descriptions in quotes are Kinesis limits as per the AWS docs):
| Arg | Default | Description |
|---|---|---|
| session | None | AioSession (to use a non-default profile etc.) |
| region_name | None | AWS region |
| max_queue_size | 10000 | The fetch() task will block when the queue is at max |
| max_shard_consumers | None | Maximum number of shards to use. None = all |
| record_limit | 10000 | Number of records to fetch with get_records() |
| sleep_time_no_records | 2 | Number of seconds to sleep when caught up |
| iterator_type | TRIM_HORIZON | Default shard iterator type for new/unknown shards (i.e. start from the start of the stream). Alternatives are "LATEST" (i.e. end of stream) and "AT_TIMESTAMP" (i.e. a particular point in time; requires the timestamp arg) |
| shard_fetch_rate | 1 | Number of fetches per second (max = 5). 1 is recommended, as it allows multiple consumers without hitting the limit |
| checkpointer | MemoryCheckPointer() | Checkpointer to use |
| processor | JsonProcessor() | Record aggregator/serializer. Must match the processor used by Producer() |
| retry_limit | None | How many connection attempts should be made before raising an exception |
| expo_backoff | None | Exponential backoff when a connection attempt fails |
| expo_backoff_limit | 120 | Maximum number of seconds exponential backoff can grow to |
| skip_describe_stream | False | Skip DescribeStream API calls for better rate limits (assumes the stream exists and is active) |
| use_list_shards | False | Use the ListShards API instead of DescribeStream for better rate limits (100 ops/s vs 10 ops/s) |
| create_stream | False | Create a Kinesis stream based on the stream_name argument. Ignored if the stream already exists |
| create_stream_shards | 1 | Number of shards for the new stream. Ignored if the stream already exists |
| timestamp | None | Timestamp to start reading the stream from. Used with iterator type "AT_TIMESTAMP" |
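As a sketch, a consumer that replays from a point in time while staying friendly to shared rate limits might look like this (illustrative values; `timestamp` is assumed to accept a datetime, as with boto3):

```python
from datetime import datetime, timezone
from kinesis import Consumer

async with Consumer(
    stream_name="my-stream",
    iterator_type="AT_TIMESTAMP",
    timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc),  # required by AT_TIMESTAMP
    record_limit=10000,       # records per get_records() call
    shard_fetch_rate=1,       # 1 fetch/s leaves headroom for other consumers
    sleep_time_no_records=2,  # idle sleep (seconds) when caught up
) as consumer:
    async for item in consumer:
        print(item)
```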
The consumer includes sophisticated shard management capabilities for handling Kinesis stream resharding. Our approach combines automatic topology management with lightweight coordination, providing production-ready resharding support without the overhead of external databases or complex lease management systems.
- Dynamic detection: Automatically discovers new shards during resharding operations
- Configurable refresh: Checks for new shards every 60 seconds (configurable via `_shard_refresh_interval`)
- State preservation: Maintains existing shard iterators and statistics when discovering new shards
- Resharding events: Proactively detects and logs resharding events when child shards appear
- Topology mapping: Automatically builds parent-child relationship maps from shard metadata
- Sequential consumption: Enforces reading parent shards before child shards to maintain data order
- Exhaustion tracking: Tracks when parent shards are fully consumed to enable child consumption
- Smart allocation: Only allocates child shards after their parents are exhausted or closed
- Graceful closure: Properly handles shards that reach end-of-life (`NextShardIterator` returns null)
- Resource cleanup: Automatically deallocates closed shards from checkpointers
- Parent transitioning: Marks exhausted parent shards to enable their children for consumption
- Coordination: Enables other consumers to take over child shards immediately
- Iterator expiration: Automatically recreates expired shard iterators from last known position
- Missing shards: Handles cases where shards are deleted or become unavailable
- Connection recovery: Robust retry logic with exponential backoff
- Topology resilience: Maintains parent-child relationships even during connection failures
Use `consumer.get_shard_status()` to get comprehensive operational visibility:
```python
status = consumer.get_shard_status()

# Overall shard metrics
print(f"Total shards: {status['total_shards']}")
print(f"Active shards: {status['active_shards']}")
print(f"Closed shards: {status['closed_shards']}")
print(f"Allocated shards: {status['allocated_shards']}")

# Topology information (resharding awareness)
print(f"Parent shards: {status['parent_shards']}")
print(f"Child shards: {status['child_shards']}")
print(f"Exhausted parents: {status['exhausted_parents']}")

# Parent-child relationships
for parent, children in status['topology']['parent_child_map'].items():
    print(f"Parent {parent} → Children {children}")

# Per-shard details with topology info
for shard in status['shard_details']:
    print(f"Shard {shard['shard_id']}: "
          f"allocated={shard['is_allocated']}, "
          f"parent={shard['is_parent']}, "
          f"child={shard['is_child']}, "
          f"can_allocate={shard['can_allocate']}")
```
This provides detailed information about shard allocation, closure status, parent-child relationships, and iterator health for production monitoring and debugging resharding scenarios.
External Database Coordination: Some enterprise solutions rely on external databases (like DynamoDB) for lease management and shard coordination. While sophisticated, this approach requires additional infrastructure, increases operational complexity, and adds latency to shard allocation decisions.
Manual Thread-per-Shard: Basic implementations use simple iterator-based loops with one thread per shard. These require manual resharding detection, lack parent-child ordering enforcement, and provide limited coordination between multiple consumers.
Reactive Closed Shard Detection: Most basic approaches only detect closed shards when `NextShardIterator` returns null, without proactive resharding awareness or topology management.
Lightweight Async Coordination: Uses Redis-based checkpointing with heartbeat mechanisms for multi-consumer coordination without requiring external database infrastructure.
Proactive Topology Management: Automatically builds and maintains parent-child shard relationships from AWS metadata, enforcing proper consumption ordering according to AWS best practices.
Intelligent Allocation: Only allocates child shards after their parents are exhausted, preventing data ordering issues during resharding events.
Production-Ready Error Recovery: Comprehensive handling of iterator expiration, connection failures, and shard state transitions with exponential backoff and automatic recovery.
- Operational Simplicity: No additional database infrastructure required beyond optional Redis for multi-consumer scenarios
- AWS Best Practices: Automatic compliance with parent-child consumption ordering
- Async Performance: Built for modern async/await patterns with non-blocking operations
- Resource Efficiency: Intelligent shard allocation reduces unnecessary resource consumption
- Production Monitoring: Rich operational visibility for debugging and monitoring resharding scenarios
Checkpointers manage shard allocation and progress tracking across multiple consumer instances.
The default checkpointer (only useful for single-consumer testing):

```python
MemoryCheckPointer()
```
Redis-based checkpointer for distributed consumers:

```python
RedisCheckPointer(
    name="consumer-group",
    session_timeout=60,
    heartbeat_frequency=15,
    is_cluster=False
)
```
Requirements:

- Install: `pip install async-kinesis[redis]`
- Environment: `REDIS_HOST` (and optionally `REDIS_PORT`, `REDIS_PASSWORD`, `REDIS_DB`)
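Wiring a checkpointer into a consumer is just the `checkpointer` argument from the options table; a minimal sketch, assuming the import path matches the other examples:

```python
from kinesis import Consumer, RedisCheckPointer

# Each consumer instance in the group coordinates shard allocation via Redis
checkpointer = RedisCheckPointer(name="consumer-group", heartbeat_frequency=15)

async with Consumer(stream_name="my-stream", checkpointer=checkpointer) as consumer:
    async for message in consumer:
        print(message)
```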
DynamoDB-based checkpointer for serverless deployments:

```python
DynamoDBCheckPointer(
    name="consumer-group",
    table_name=None,        # Optional: defaults to kinesis-checkpoints-{name}
    session_timeout=60,
    heartbeat_frequency=15,
    create_table=True,      # Auto-create table if needed
    ttl_hours=24            # Automatic cleanup of old records
)
```
Requirements:

- Install: `pip install async-kinesis[dynamodb]`
- AWS credentials with DynamoDB permissions
Benefits over Redis:
- No infrastructure to manage
- Pay-per-request pricing (no idle costs)
- Automatic scaling
- Built-in backup and recovery
DynamoDB Checkpointing Guide - Detailed setup and configuration
Aggregation enables batching multiple records together to use the stream more efficiently. Refer to https://aws.amazon.com/blogs/big-data/implementing-efficient-and-reliable-producers-with-the-amazon-kinesis-producer-library/
| Class | Aggregator | Serializer | Description |
|---|---|---|---|
| StringProcessor | SimpleAggregator | StringSerializer | Single string record |
| JsonProcessor | SimpleAggregator | JsonSerializer | Single JSON record |
| JsonLineProcessor | NewlineAggregator | JsonSerializer | Multiple JSON records separated by newline characters |
| JsonListProcessor | ListAggregator | JsonSerializer | Multiple JSON records returned as a list |
| MsgpackProcessor | NetstringAggregator | MsgpackSerializer | Multiple msgpack records framed with the netstring protocol (https://en.wikipedia.org/wiki/Netstring) |
| KPLJsonProcessor | KPLAggregator | JsonSerializer | Multiple JSON records in a KPL aggregated record (https://github.com/awslabs/amazon-kinesis-producer/blob/master/aggregation-format.md) |
| KPLStringProcessor | KPLAggregator | StringSerializer | Multiple string records in a KPL aggregated record (https://github.com/awslabs/amazon-kinesis-producer/blob/master/aggregation-format.md) |
Note you can easily define your own processor, as it's simply a class inheriting an Aggregator + Serializer:

```python
class MsgpackProcessor(Processor, NetstringAggregator, MsgpackSerializer):
    pass
```
Just define a new Serializer class with serialize() and deserialize() methods.
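For example, a custom serializer might look like this (a hypothetical sketch; `CompactJsonSerializer` is not part of the library):

```python
import json

class CompactJsonSerializer:
    # Hypothetical serializer: compact JSON without extra whitespace
    def serialize(self, item):
        return json.dumps(item, separators=(",", ":")).encode("utf-8")

    def deserialize(self, data):
        return json.loads(data)
```

It can then be combined with an aggregator exactly as in the `MsgpackProcessor` example above.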
Note:

- JSON will use `ujson` if installed (`pip install ujson`)
- Msgpack requires `pip install msgpack` to be installed
- KPL requires `pip install aws-kinesis-agg` to be installed
See benchmark.py for code
The benchmark tool allows you to test the performance of different processors with AWS Kinesis streams.
```bash
# Run with default settings (50k records)
python benchmark.py

# Dry run without creating AWS resources
python benchmark.py --dry-run

# Custom parameters
python benchmark.py --records 10000 --shards 2 --processors json msgpack

# Generate markdown output
python benchmark.py --markdown

# Run multiple iterations
python benchmark.py --iterations 3
```
50k items of approx. 1 KB (Python) in size, using a single shard:
| Processor | Iteration | Python Bytes | Kinesis Bytes | Time (s) | Records/s | Python kB/s | Kinesis kB/s |
|---|---|---|---|---|---|---|---|
| StringProcessor | 1 | 2.7 MB | 50.0 MB | 51.2 | 977 | 53.9 | 1000.0 |
| JsonProcessor | 1 | 2.7 MB | 50.0 MB | 52.1 | 960 | 52.9 | 982.5 |
| JsonLineProcessor | 1 | 2.7 MB | 41.5 MB | 43.5 | 1149 | 63.4 | 976.7 |
| JsonListProcessor | 1 | 2.7 MB | 2.6 MB | 2.8 | 17857 | 982.1 | 946.4 |
| MsgpackProcessor | 1 | 2.7 MB | 33.9 MB | 35.8 | 1397 | 77.0 | 969.8 |
Note: MsgpackProcessor performance varies significantly with record size. While it appears slower here with small records due to netstring framing overhead (~9%) and msgpack serialization costs, it can be faster than JSON processors with larger, complex data structures where msgpack's binary efficiency provides benefits.
Choose the optimal processor based on your use case:
| Use Case | Recommended Processor | Reason |
|---|---|---|
| High-frequency small messages (<500 bytes) | JsonLineProcessor | Minimal aggregation overhead, simple parsing |
| Individual JSON messages | JsonProcessor | Maximum compatibility, no aggregation complexity |
| Batch processing arrays | JsonListProcessor | Highest throughput for compatible consumers |
| Large complex data (>1KB) | MsgpackProcessor | Binary efficiency outweighs overhead |
| Raw text/logs | StringProcessor | Minimal processing overhead |
| Binary data or deeply nested objects | MsgpackProcessor | Compact binary representation |
| Real-time streaming | JsonLineProcessor | Simple, fast parsing with good compression |
| Bandwidth-constrained environments | MsgpackProcessor | Smaller payload sizes |
Performance Testing: Use the benchmark tool with different `--record-size-kb` and `--processors` options to determine the best processor for your specific data patterns.
Uses LocalStack for integration testing:
```bash
# Run full test suite via Docker
docker-compose up --abort-on-container-exit --exit-code-from test

# Local development setup
docker-compose up kinesis redis
pip install -r test-requirements.txt
pytest
```
This project uses automated code formatting and linting:
```bash
# Install development tools
pip install -r test-requirements.txt

# Run formatting and linting
black .
isort .
flake8 .

# Or use pre-commit hooks
pre-commit install
pre-commit run --all-files
```
Some tests require actual AWS Kinesis. Create a `.env` file:

```
TESTING_USE_AWS_KINESIS=1
```
Comprehensive resharding test suite available:

```bash
# Unit tests (no AWS required)
python tests/resharding/test_resharding_simple.py

# Integration tests (requires LocalStack)
python tests/resharding/test_resharding_integration.py

# Production testing (requires AWS)
python tests/resharding/resharding_test.py --scenario scale-up-small
```
- Getting Started Guide - Step-by-step tutorials for beginners
- Common Patterns - Real-world use cases and examples
- Metrics & Observability - Prometheus integration and monitoring
- DynamoDB Checkpointing - DynamoDB checkpointer setup and best practices
- Troubleshooting Guide - Solutions for common issues
- Architecture Details - Technical deep dive into the implementation
- Why Another Library? - Comparison with other Kinesis libraries