Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizing ColPali for Retrieval at Scale #1323

Merged
merged 7 commits into from
Dec 2, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
145 changes: 145 additions & 0 deletions qdrant-landing/content/blog/colpali-optimization.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
---
title: "Optimizing ColPali for Retrieval at Scale, 13x Faster Results"
draft: false
slug: colpali-qdrant-optimization # Change this slug to your page slug if needed
short_description: Learn how we optimized ColPali, a document retrieval system, to be 13x faster for datasets of 20,000+ PDFs. Discover how pooling and reranking boost scalability while maintaining precision. # Change this
description: ColPali combines text and visual contexts for precise document retrieval, but scaling posed challenges. We achieved 13x faster retrieval using pooling and reranking, reducing vector counts by ~30x while retaining near-original accuracy. Explore the open-source demo to optimize your retrieval workflows! # Change this

preview_image: /blog/colpali-optimization/preview-2.png # Change this
social_preview_image: /blog/colpali-optimization/preview-2.png # Optional image used for link previews
# title_preview_image: /blog/Article-Image.png # Optional image used for blog post title
# small_preview_image: /blog/colpali-optimization.png # Optional image used for small preview in the list of blog posts

date: 2024-11-27T00:40:24-03:00
author: Evgeniya Sukhodolskaya # Change this
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved
featured: yes # if true, this post will be featured on the blog page
tags: # Change this, related by tags posts will be shown on the blog page
- colpali
- reranker
- optimization
- qdrant
- quantization
- multimodal


---

ColPali is a fascinating leap in document retrieval. Its precision in handling visually rich PDFs is phenomenal, but scaling it to handle real-world datasets is of course not without computational challenges.

Here's how we solved these challenges to make ColPali 13x faster without sacrificing the precision it’s known for.

sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved
## The Scaling Dilemma

ColPali generates **~1,030 vectors for just one page of a PDF.** While this is manageable for small-scale tasks, in a real-world production setting where you may need to store hundreds od thousands of PDFs, the challenge of scaling becomes significant.

Consider this scenario:

- **Dataset Size:** 20,000 PDF pages.
- **Vector Explosion:** Each page generates ~1,030 vectors of 128 dimensions.
- **Index Complexity:** Trillions of comparisons to build the index comparing every vector pair.
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved

Even advanced indexing algorithms like **HNSW** struggle with this scale, as computational costs grow quadratically.

sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved
We turned to a hybrid optimization strategy combining **pooling** (to reduce computational overhead) and **reranking** (to preserve accuracy).

Before we go any deeper, watch our [Webinar video](https://www.youtube.com/live/_h6SN1WwnLs?si=n8gwiIjJ5dnfucXC) for the full demo walkthrough.
<iframe width="560" height="315" src="https://www.youtube.com/embed/_h6SN1WwnLs?si=HqJTcwDbdk4j_L0Y&amp;start=922" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

For those eager to explore, the [codebase is available here](https://github.com/qdrant/demo-colpali-optimized).


## Two-Stage Retrieval Process

### Pooling

Pooling is well-known in machine learning as a way to compress data while keeping important information intact. For ColPali, we reduced ~1,030 vectors per page to just 38 vectors by pooling rows in the document's 32x32 grid.
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved

The most popular types of pooling are:

- **Mean Pooling:** Averages values across rows.
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved
- **Max Pooling:** Selects the maximum value for each feature.

32 vectors represent the pooled rows, while an additional 6 vectors encode contextual information derived from ColPali’s special tokens (e.g., <bos> for the beginning of the sequence, and task-specific instructions like “Describe the image”).
mrscoopers marked this conversation as resolved.
Show resolved Hide resolved

For our experiments, we chose to preserve these 6 additional vectors.

### The "ColPali as a Reranker" Experiment

Pooling drastically reduces retrieval costs, but there’s a risk of losing fine-grained precision. To address this, we implemented a **two-stage retrieval system** with the pooled vectors for the initial retrieval stage and the original ColPali model for reranking:

1. **Pooled Retrieval:** Quickly retrieves the top 200 candidates using lightweight pooled embeddings.
2. **Full Reranking:** Refines these candidates using the original, high-resolution embeddings, delivering the final top 20 results.

This approach delivers speed without compromising on retrieval quality.
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved

### Implementation

We created a custom dataset with over 20,000 unique PDF pages by merging:

- **ViDoRe Benchmark:** Designed for document retrieval evaluation.
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved
- **UFO Dataset:** Visually rich documents paired with synthetic queries.
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved
- **DocVQA Dataset:** A large set of document-derived Q&A pairs.

Each document was processed into 32x32 grids, generating both full-resolution and pooled embeddings.
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved

![](/blog/colpali-optimization/rows.png)
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved

These embeddings were stored in the **Qdrant vector database**, configured for speed:
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved

- **Full-Resolution Embeddings:** ~1,030 vectors per page.
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved
- **Pooled Embeddings:** Mean and max pooling variants.

All embeddings were kept in RAM to ensure consistent retrieval performance.
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved


### Experiment Setup

We evaluated retrieval quality using 1,000 task-specific queries and the retrieval process followed the two-stage approach:
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved
1. **Pooled embeddings** retrieved the top 200 candidates.
2. **Full-resolution embeddings** reranked these candidates to produce the final top 20 results.

To measure performance, we used:

- **NDCG@20:** Measures ranking quality (how well the top results align with expectations).
- **Recall@20:** Measures the overlap between pooled and full-resolution retrievals.
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved

## Results

The experiments gave us some very promissing results:

- **Speed:** Retrieval time improved **13x** compared to full-resolution embeddings alone.
- **Accuracy:** Mean pooling preserved retrieval quality nearly identical to the original ColPali.

### Metrics

| Pooling Type | NDCG@20 | Recall@20 |
|--------------|---------|-----------|
| **Mean** | 0.952 | 0.917 |
| **Max** | 0.759 | 0.656 |


Mean pooling offered the ideal balance, combining speed and precision, while max pooling delivered faster results but at the cost of noticeable quality degradation.
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved

### Other Insights

We also explored additional optimizations, including removing `<pad>` tokens, which slightly improved speed without significant quality loss, and applying binary quantization, which increased retrieval speed but introduced minor accuracy drops.
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved

## What’s Next?
Future experiments could push these results even further:

- Investigating column-wise pooling for additional compression.
- Testing hybrid half-precision (float16) vectors to balance memory use and speed.
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved
- Skipping special multivectors during prefetch to streamline retrieval.
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved
- Combining quantization with oversampling for even faster search.
sabrinaaquino marked this conversation as resolved.
Show resolved Hide resolved


### Try It Yourself

Curious to see this in action? Explore the full codebase and experiment with ColPali optimizations:

- **Demo Notebook:** [GitHub Repository](https://github.com/qdrant/demo-colpali-optimized)
- **Webinar Walkthrough:** [Watch Here](https://www.youtube.com/live/_h6SN1WwnLs?si=n8gwiIjJ5dnfucXC)

[Join the community](https://discord.com/invite/qdrant) and share your results!

---
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.