High-performance PDF-to-image converter that works around Windows threading issues
A drop-in replacement and optimization layer for pdf2image that solves Windows threading bottlenecks and provides multiple conversion strategies for different use cases.
The root cause most likely lies in Poppler itself and deserves investigation there; in the meantime, this is a quick workaround for memory errors and slow conversions.
The popular pdf2image library suffers from significant performance issues on Windows when using multiple threads (often 2-4x slower than single-threading). This optimizer:
- **Fixes Windows threading bottlenecks** - automatically detects Windows and adjusts accordingly
- **Faster conversions** on Windows systems
- **Multiple optimization strategies** for different file sizes and use cases
- **Drop-in replacement** - works with existing pdf2image code
- **Memory-efficient processing** for large PDFs
- **Auto-detection** of the optimal conversion strategy
Speed improvement of the fastest strategy vs. the slowest: 4.9x

Benchmark results:

| Strategy | Time (s) | Pages | Success | Pages/sec |
|---|---|---|---|---|
| `single` | 878.5 | 3096 | yes | 3.52 |
| `batch` | 1024.4 | 3091 | yes | 3.02 |
| `async` | 245.1 | 3096 | yes | 12.60 |
| `memory` | 1207.1 | 3096 | yes | 2.56 |
Benchmarked on Windows 10, 13th Gen Intel(R) Core(TM) i5-13500H, 32GB RAM
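As a quick sanity check, the headline speedup can be reproduced from the benchmark output above (the dictionary shape matches what `benchmark_strategies` returns; the numbers are just the rounded values from the table):

```python
# Reproduce the "4.9x vs slowest" figure from the benchmark output above.
results = {
    "single": {"time": 878.54, "pages": 3096, "success": True},
    "batch": {"time": 1024.44, "pages": 3091, "success": True},
    "async": {"time": 245.07, "pages": 3096, "success": True},
    "memory": {"time": 1207.13, "pages": 3096, "success": True},
}

times = {name: r["time"] for name, r in results.items() if r["success"]}
fastest, slowest = min(times, key=times.get), max(times, key=times.get)
print(f"{fastest} is {times[slowest] / times[fastest]:.1f}x faster than {slowest}")
# -> async is 4.9x faster than memory
```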
Install the Python dependencies:

```bash
pip install pdf2image pillow
```

Additional requirements:

- Windows: download Poppler for Windows and add it to PATH
- Linux: `sudo apt-get install poppler-utils`
- macOS: `brew install poppler`
NB: run the benchmarks first and then pick the strategy that gives the best results for your documents.
```python
from pdf_optimizer import apply_patch

# Apply optimization patch once
apply_patch()

# Now use pdf2image normally - it's automatically optimized!
from pdf2image import convert_from_path

images = convert_from_path("document.pdf")  # faster on Windows!
```

```python
from pdf_optimizer import convert_pdf_fast

# Auto-optimized conversion
images = convert_pdf_fast("document.pdf", dpi=200, fmt="png")
print(f"Converted {len(images)} pages")
```

```python
from pdf_optimizer import convert_pdf_memory_efficient

# For large PDFs - saves to disk instead of loading into memory
paths = convert_pdf_memory_efficient("large_document.pdf", dpi=150)
```

```python
from pdf_optimizer import OptimizedPDFConverter

converter = OptimizedPDFConverter()

# Choose a specific optimization strategy
images = converter.convert_from_path_optimized(
    "document.pdf",
    optimization_strategy="batch",  # single, batch, async, memory
    batch_size=5,
    dpi=300,
    fmt="png",
)
```

```python
from pdf_optimizer import benchmark_strategies

results = benchmark_strategies("test.pdf")

# Find the fastest strategy
fastest = min((k for k, v in results.items() if v['success']),
              key=lambda x: results[x]['time'])
print(f"Fastest strategy: {fastest}")
```

```python
from pdf_optimizer import get_pdf_info

info = get_pdf_info("document.pdf")
print(f"Pages: {info['pages']}, Size: {info['size_mb']:.1f}MB")
```

The optimizer automatically selects the best strategy, but you can choose one manually:
| Strategy | Best For | Description |
|---|---|---|
| `auto` | All files | Auto-detects the optimal strategy (recommended) |
| `single` | Small files (≤5 pages) | Single-threaded processing |
| `batch` | Medium files (6-20 pages) | Processes pages in small batches |
| `async` | Unix systems - or Windows ;-) | Async I/O optimization |
| `memory` | Large files (>50MB) | Memory-efficient streaming |
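For intuition, the `auto` strategy can be pictured as a simple dispatch on file size and page count along the lines below; the thresholds mirror the table above, but the function itself is a hypothetical sketch, not the optimizer's actual internals.

```python
import os

def pick_strategy(pdf_path: str, page_count: int) -> str:
    """Illustrative auto-selection - thresholds taken from the table above."""
    size_mb = os.path.getsize(pdf_path) / (1024 * 1024)
    if size_mb > 50:
        return "memory"   # large files: stream pages to disk
    if page_count <= 5:
        return "single"   # small files: no batching overhead
    if page_count <= 20:
        return "batch"    # medium files: small batches
    return "async"        # everything else: async I/O
```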
The optimizer automatically applies these Windows optimizations:

- **Single threading**: forces `thread_count=1` to avoid subprocess overhead
- **pdftocairo**: uses `pdftocairo` instead of `pdftoppm` (often faster on Windows)
- **Process creation flags**: optimized subprocess creation with `CREATE_NO_WINDOW`
- **Batch processing**: automatic batching for large files to reduce memory pressure
- **Extended timeouts**: accounts for Windows process creation delays
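The first two tweaks can also be applied by hand with plain pdf2image, since it already exposes the relevant keyword arguments; the subprocess flags, batching, and timeout handling are internal to the optimizer and are not shown in this sketch.

```python
import platform
from pdf2image import convert_from_path

kwargs = {"dpi": 200, "fmt": "png"}
if platform.system() == "Windows":
    # Single-threading avoids per-page subprocess overhead on Windows,
    # and pdftocairo is often faster there than the default pdftoppm.
    kwargs.update(thread_count=1, use_pdftocairo=True)

images = convert_from_path("document.pdf", **kwargs)
```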
`convert_pdf_fast(pdf_path, **kwargs)` - quick conversion with auto-optimization.

- `pdf_path`: path to the PDF file
- `**kwargs`: standard pdf2image arguments (`dpi`, `fmt`, `output_folder`, etc.)
- Returns: a list of PIL Images
`convert_pdf_memory_efficient(pdf_path, **kwargs)` - memory-efficient conversion for large files.

- Returns: a list of file paths (the images are saved to disk instead of being held in memory)
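If you prefer to stay on plain pdf2image, a similar disk-backed workflow is possible through its `output_folder` and `paths_only` options; the helper below is a hypothetical illustration, not part of this package.

```python
import tempfile
from pdf2image import convert_from_path

def convert_to_paths(pdf_path, dpi=150):
    """Hypothetical helper: render pages to disk and return file paths
    instead of keeping every PIL image in memory."""
    out_dir = tempfile.mkdtemp(prefix="pdf_pages_")
    return convert_from_path(
        pdf_path,
        dpi=dpi,
        fmt="png",
        output_folder=out_dir,  # write page images straight to disk
        paths_only=True,        # return file paths, not PIL Image objects
    )
```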
`apply_patch()` - applies the optimization patch to the original `pdf2image.convert_from_path`.

- Returns: a boolean indicating success
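Conceptually, a drop-in patch like this can be built by wrapping the original function and injecting Windows-friendly defaults; the snippet below is a minimal sketch of that idea, not the package's actual implementation.

```python
import platform
import pdf2image

_original_convert_from_path = pdf2image.convert_from_path

def _patched_convert_from_path(pdf_path, **kwargs):
    # Inject Windows-friendly defaults unless the caller overrides them.
    if platform.system() == "Windows":
        kwargs.setdefault("thread_count", 1)
        kwargs.setdefault("use_pdftocairo", True)
    return _original_convert_from_path(pdf_path, **kwargs)

def apply_patch() -> bool:
    pdf2image.convert_from_path = _patched_convert_from_path
    return True
```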
`benchmark_strategies(pdf_path, strategies=None)` - benchmarks the different conversion strategies.

- `strategies`: list of strategies to test (default: all)
- Returns: a dictionary with per-strategy performance results
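The returned dictionary has the same shape as the benchmark output shown earlier; an equivalent timing loop could be written by hand on top of the converter API, roughly like this (a sketch, assuming the `OptimizedPDFConverter` interface shown above).

```python
import time
from pdf_optimizer import OptimizedPDFConverter

def benchmark(pdf_path, strategies=("single", "batch", "async", "memory")):
    converter = OptimizedPDFConverter()
    results = {}
    for strategy in strategies:
        start = time.perf_counter()
        try:
            pages = converter.convert_from_path_optimized(
                pdf_path, optimization_strategy=strategy)
            elapsed = time.perf_counter() - start
            results[strategy] = {
                "time": elapsed,
                "pages": len(pages),
                "success": True,
                "pages_per_second": len(pages) / elapsed,
            }
        except Exception:
            results[strategy] = {"time": None, "success": False}
    return results
```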
1. **ModuleNotFoundError: No module named 'pdf2image'**

   ```bash
   pip install pdf2image pillow
   ```

2. **Unable to get page count from pdf path**

   Install a PDF library for fallback page detection:

   ```bash
   pip install PyPDF2
   # or
   pip install pypdf
   # or
   pip install pdfplumber
   ```

3. **pdftoppm not found**

   Install Poppler:

   - Windows: download from the Poppler releases page
   - Linux: `sudo apt-get install poppler-utils`
   - macOS: `brew install poppler`
4. **Still slow on Windows?** Make sure the patch is applied:

   ```python
   from pdf_optimizer import apply_patch

   success = apply_patch()
   print(f"Patch applied: {success}")
   ```

Run the built-in tests:

```bash
python pdf_optimizer.py
```

This will:
- Check all dependencies
- Display optimization strategies
- Run benchmarks if a test PDF is found
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/pdf2image-optimizer.git
  cd pdf2image-optimizer
  ```

- Install dependencies:

  ```bash
  pip install pdf2image pillow PyPDF2
  ```

- Run the tests:

  ```bash
  python pdf_optimizer.py
  ```

MIT License - see the LICENSE file for details.
If this project helped you, please consider giving it a star! ⭐
- pdf2image - The original library this optimizes
- Poppler - The underlying PDF rendering library
- Pillow - Python Imaging Library
- Issues: Please use GitHub Issues for bug reports and feature requests
- Discussions: Use GitHub Discussions for questions and community support
- Email: For private inquiries (add your contact info)
Made with ❤️ for the Python community
Tired of slow PDF conversions on Windows? This optimizer has you covered!