qdrant · sabrinaaquino · Dec 2, 2024 · Oct 22, 2024 · Nov 27, 2024 · Nov 27, 2024
diff --git a/qdrant-landing/content/blog/colpali-optimization.md b/qdrant-landing/content/blog/colpali-optimization.md
@@ -11,7 +11,7 @@ social_preview_image: /blog/colpali-optimization/preview-2.png # Optional image
 # small_preview_image: /blog/colpali-optimization.png # Optional image used for small preview in the list of blog posts
 
 date: 2024-11-27T00:40:24-03:00
-author: Evgeniya Sukhodolskaya # Change this
+author: Evgeniya Sukhodolskaya, Sabrina Aquino # Change this
 featured: yes # if true, this post will be featured on the blog page
 tags: # Change this, related by tags posts will be shown on the blog page
   - colpali
@@ -30,15 +30,22 @@ Here's how we solved these challenges to make ColPali 13x faster without sacrifi
 
 ## The Scaling Dilemma
 
-ColPali generates **~1,030 vectors for just one page of a PDF.** While this is manageable for small-scale tasks, in a real-world production setting where you may need to store hundreds od thousands of PDFs, the challenge of scaling becomes significant.
+ColPali generates **1,030 vectors for just one page of a PDF.** While this is manageable for small-scale tasks, in a real-world production setting where you may need to store hundreds of thousands of PDFs, the challenge of scaling becomes significant.
 
 Consider this scenario:
 
 - **Dataset Size:** 20,000 PDF pages.
-- **Vector Explosion:** Each page generates ~1,030 vectors of 128 dimensions.
-- **Index Complexity:** Trillions of comparisons to build the index comparing every vector pair.
+- **Number of Vectors:** Each page generates ~1,000 vectors of 128 dimensions.
 
-Even advanced indexing algorithms like **HNSW** struggle with this scale, as computational costs grow quadratically. 
+The total number of comparisons is calculated as:
+
+$$
+1,000 \cdot 1,000 \cdot 20,000 \cdot 128 = 2.56 \times 10^{12} \text{ comparisons!}
+$$
+
+
+
+That's trillions of comparisons needed to build the index. Even advanced indexing algorithms like **HNSW** struggle with this scale, as computational costs grow quadratically. 
 
 We turned to a hybrid optimization strategy combining **pooling** (to reduce computational overhead) and **reranking** (to preserve accuracy).
 
@@ -54,54 +61,52 @@ For those eager to explore, the [codebase is available here](https://github.com/
 
 Pooling is well-known in machine learning as a way to compress data while keeping important information intact. For ColPali, we reduced ~1,030 vectors per page to just 38 vectors by pooling rows in the document's 32x32 grid.
 
-The most popular types of pooling are:
+Max and mean pooling are the two most popular types, so we decided to test both approaches on the rows of the grid. Likewise, we could apply pooling on columns, which we plan to explore in the future.
 
 - **Mean Pooling:** Averages values across rows.
 - **Max Pooling:** Selects the maximum value for each feature.
 
-32 vectors represent the pooled rows, while an additional 6 vectors encode contextual information derived from ColPali’s special tokens (e.g., <bos> for the beginning of the sequence, and task-specific instructions like “Describe the image”).
+32 vectors represent the pooled rows, while the final 6 vectors encode contextual information derived from ColPali’s special tokens (e.g., `<bos>` for the beginning of the sequence, and task-specific instructions like “Describe the image”).
 
 For our experiments, we chose to preserve these 6 additional vectors.
 
 ### The "ColPali as a Reranker" Experiment
 
-Pooling drastically reduces retrieval costs, but there’s a risk of losing fine-grained precision. To address this, we implemented a **two-stage retrieval system** with the pooled vectors for the initial retrieval stage and the original ColPali model for reranking:
+Pooling drastically reduces retrieval costs, but there’s a risk of losing fine-grained precision. To address this, we implemented a **two-stage retrieval system**, where embeddings generated with ColPali were max/mean pooled by grid rows to create lightweight vectors for the initial retrieval stage, followed by reranking with the original high-resolution embeddings:
 
 1. **Pooled Retrieval:** Quickly retrieves the top 200 candidates using lightweight pooled embeddings.
 2. **Full Reranking:** Refines these candidates using the original, high-resolution embeddings, delivering the final top 20 results.
 
-This approach delivers speed without compromising on retrieval quality.
-
 ### Implementation
 
 We created a custom dataset with over 20,000 unique PDF pages by merging:
 
-- **ViDoRe Benchmark:** Designed for document retrieval evaluation.
+- **ViDoRe Benchmark:** Designed for evaluating PDF document retrieval.
 - **UFO Dataset:** Visually rich documents paired with synthetic queries.
 - **DocVQA Dataset:** A large set of document-derived Q&A pairs.
 
 Each document was processed into 32x32 grids, generating both full-resolution and pooled embeddings. 
 
 ![](/blog/colpali-optimization/rows.png)
 
-These embeddings were stored in the **Qdrant vector database**, configured for speed:
+These embeddings were stored in the **Qdrant vector database**:
 
 - **Full-Resolution Embeddings:** ~1,030 vectors per page.
 - **Pooled Embeddings:** Mean and max pooling variants.
 
-All embeddings were kept in RAM to ensure consistent retrieval performance.
+All embeddings were kept in RAM to avoid caching effects in the experiments related to retrieval speed.
 
 
 ### Experiment Setup
 
-We evaluated retrieval quality using 1,000 task-specific queries and the retrieval process followed the two-stage approach:
+We evaluated retrieval quality using 1,000 randomly sampled queries, and the retrieval process followed the two-stage approach:
 1. **Pooled embeddings** retrieved the top 200 candidates.
 2. **Full-resolution embeddings** reranked these candidates to produce the final top 20 results.
 
 To measure performance, we used:
 
 - **NDCG@20:** Measures ranking quality (how well the top results align with expectations).
-- **Recall@20:** Measures the overlap between pooled and full-resolution retrievals.
+- **Recall@20:** Measures the overlap between this method and the original ColPali retrieval.
 
 ## Results
 
@@ -118,17 +123,13 @@ The experiments gave us some very promising results:
 | **Max**      | 0.759   | 0.656     |
 
 
-Mean pooling offered the ideal balance, combining speed and precision, while max pooling delivered faster results but at the cost of noticeable quality degradation.
-
-### Other Insights
-
-We also explored additional optimizations, including removing `<pad>` tokens, which slightly improved speed without significant quality loss, and applying binary quantization, which increased retrieval speed but introduced minor accuracy drops.
+Mean pooling offered the ideal balance, combining speed and precision. Max pooling did not perform well enough to be considered viable since it sacrificed significant accuracy without delivering a meaningful speed advantage.
 
 ## What’s Next?
 Future experiments could push these results even further:
 
 - Investigating column-wise pooling for additional compression.
-- Testing hybrid half-precision (float16) vectors to balance memory use and speed.
+- Testing half-precision (float16) vectors to balance memory use and speed.
 - Skipping special multivectors during prefetch to streamline retrieval.
 - Combining quantization with oversampling for even faster search.
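
As an aside to this diff, the row pooling and two-stage retrieval described in the post can be sketched in a few lines of NumPy. This is a minimal in-memory illustration under stated assumptions, not the code from the linked repository: the function names (`pool_rows`, `maxsim`, `two_stage_search`) are hypothetical, the patch vectors are assumed to be stored row by row with the 6 special-token vectors last, and a brute-force loop stands in for the Qdrant index.

```python
import numpy as np

GRID = 32       # the post describes a 32x32 grid per page
N_SPECIAL = 6   # trailing special-token vectors, kept unpooled as in the post


def pool_rows(page: np.ndarray, mode: str = "mean") -> np.ndarray:
    """Pool a (GRID*GRID + N_SPECIAL, dim) page into (GRID + N_SPECIAL, dim)."""
    n_patches = GRID * GRID
    if page.shape[0] != n_patches + N_SPECIAL:
        raise ValueError(f"expected {n_patches + N_SPECIAL} vectors, got {page.shape[0]}")
    patches, special = page[:n_patches], page[n_patches:]
    # Assumption: patch vectors are laid out row by row (row-major order).
    grid = patches.reshape(GRID, GRID, -1)
    if mode == "mean":
        rows = grid.mean(axis=1)   # average the 32 vectors in each row
    elif mode == "max":
        rows = grid.max(axis=1)    # per-feature maximum within each row
    else:
        raise ValueError(f"unknown pooling mode: {mode}")
    return np.concatenate([rows, special], axis=0)


def maxsim(query: np.ndarray, doc: np.ndarray) -> float:
    """Late-interaction score: best dot product per query vector, summed."""
    return float((query @ doc.T).max(axis=1).sum())


def two_stage_search(query, pooled_pages, full_pages, prefetch=200, top_k=20):
    """Prefetch candidates with pooled vectors, then rerank with full vectors."""
    coarse = np.array([maxsim(query, p) for p in pooled_pages])
    candidates = np.argsort(-coarse)[:prefetch]
    fine = np.array([maxsim(query, full_pages[i]) for i in candidates])
    return candidates[np.argsort(-fine)[:top_k]].tolist()
```

With `prefetch=200` and `top_k=20` this mirrors the setup in the post. Setting `prefetch` to the number of pages reranks everything, so the result matches brute-force MaxSim over the full-resolution embeddings.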