Commit

Merge pull request milvus-io#1443 from wxywb/master
Correct ColPali name.
zc277584121 authored Oct 24, 2024
2 parents 057424b + 839181c commit c3415b9
Showing 1 changed file with 6 additions and 6 deletions.
@@ -4,15 +4,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"<a href=\"https://colab.research.google.com/github/milvus-io/bootcamp/blob/master/bootcamp/tutorials/quickstart/use_ColPALI_with_milvus.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a> <a href=\"https://github.com/milvus-io/bootcamp/blob/master/bootcamp/tutorials/quickstart/use_ColPALI_with_milvus.ipynb\" target=\"_blank\">\n",
+"<a href=\"https://colab.research.google.com/github/milvus-io/bootcamp/blob/master/bootcamp/tutorials/quickstart/use_ColPali_with_milvus.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a> <a href=\"https://github.com/milvus-io/bootcamp/blob/master/bootcamp/tutorials/quickstart/use_ColPali_with_milvus.ipynb\" target=\"_blank\">\n",
" <img src=\"https://img.shields.io/badge/View%20on%20GitHub-555555?style=flat&logo=github&logoColor=white\" alt=\"GitHub Repository\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
-"# Use ColPALI for Multi-Modal Retrieval with Milvus\n",
+"# Use ColPali for Multi-Modal Retrieval with Milvus\n",
"\n",
"Modern retrieval models typically use a single embedding to represent text or images. ColBERT, however, is a neural model that utilizes a list of embeddings for each data instance and employs a \"MaxSim\" operation to calculate the similarity between two texts. Beyond textual data, figures, tables, and diagrams also contain rich information, which is often disregarded in text-based information retrieval.\n",
"\n",
@@ -21,9 +21,9 @@
"$$\n",
"The MaxSim function compares a query with a document (what you're searching in) by looking at their token embeddings. For each word in the query, it picks the most similar word from the document (using cosine similarity or squared L2 distance) and sums these maximum similarities across all words in the query.\n",
"\n",
-"ColPALI is a method that combines ColBERT's multi-vector representation with PaliGemma (a multimodal large language model) to leverage its strong understanding capabilities. This approach enables a page with both text and images to be represented using a unified multi-vector embedding. The embeddings within this multi-vector representation can capture detailed information, improving the performance of retrieval-augmented generation (RAG) for multimodal data.\n",
+"ColPali is a method that combines ColBERT's multi-vector representation with PaliGemma (a multimodal large language model) to leverage its strong understanding capabilities. This approach enables a page with both text and images to be represented using a unified multi-vector embedding. The embeddings within this multi-vector representation can capture detailed information, improving the performance of retrieval-augmented generation (RAG) for multimodal data.\n",
"\n",
-" In this notebook, we refer to this kind of multi-vector representation as \"ColBERT embeddings\" for generality. However, the actual model being used is the **ColPALI model**. We will demonstrate how to use Milvus for multi-vector retrieval. Building on that, we will introduce how to use ColPALI for retrieving pages based on a given query.\n",
+" In this notebook, we refer to this kind of multi-vector representation as \"ColBERT embeddings\" for generality. However, the actual model being used is the **ColPali model**. We will demonstrate how to use Milvus for multi-vector retrieval. Building on that, we will introduce how to use ColPali for retrieving pages based on a given query.\n",
"\n"
]
},
@@ -52,7 +52,7 @@
"metadata": {},
"source": [
"## Prepare the data\n",
-"We will use PDF RAG as our example. You can download [ColBERT](https://arxiv.org/pdf/2004.12832) paper and put it into `./pdf`. ColPALI does not process text directly; instead, the entire page is rasterized into an image. The ColPALI model excels at understanding the textual information contained within these images. Therefore, we will convert each PDF page into an image for processing."
+"We will use PDF RAG as our example. You can download [ColBERT](https://arxiv.org/pdf/2004.12832) paper and put it into `./pdf`. ColPali does not process text directly; instead, the entire page is rasterized into an image. The ColPali model excels at understanding the textual information contained within these images. Therefore, we will convert each PDF page into an image for processing."
]
},
{
@@ -448,7 +448,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Finally, we retrieve the original page name. With ColPALI, we can retrieve multimodal documents without the need for complex processing techniques to extract text and images from the documents. By leveraging large vision models, more information—such as tables and figures—can be analyzed without significant information loss."
+"Finally, we retrieve the original page name. With ColPali, we can retrieve multimodal documents without the need for complex processing techniques to extract text and images from the documents. By leveraging large vision models, more information—such as tables and figures—can be analyzed without significant information loss."
]
}
],
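
Note on the MaxSim description in the notebook's introductory cell: the scoring rule it describes (for each query embedding, take the best match among the document embeddings, then sum those maxima) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the notebook's implementation; the helper names `cosine` and `max_sim` and the toy 2-D vectors are made up for the example, cosine similarity is chosen over the squared L2 option the cell also mentions, and all vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def max_sim(query_vecs, doc_vecs):
    """Sum, over query embeddings, of the best cosine match in the document."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-D embeddings, invented for illustration (not model output).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has a close match for each query vector
doc_b = [[1.0, 1.0]]                          # one vector, halfway between both

score_a = max_sim(query, doc_a)
score_b = max_sim(query, doc_b)
print(score_a, score_b)
```

Here `doc_a` scores higher than `doc_b` because every query vector finds an exactly aligned document vector, whereas `doc_b` offers only a partial match for each. Ranking pages by this score is the idea behind the multi-vector retrieval the notebook builds on Milvus.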
