Add multi-modal use case section (#8823)
jerryjliu authored Nov 10, 2023
1 parent 9f8a08d commit 336a88d
Showing 2 changed files with 40 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -93,6 +93,7 @@ Associated projects
use_cases/chatbots.md
use_cases/agents.md
use_cases/extraction.md
use_cases/multimodal.md

.. toctree::
:maxdepth: 2
39 changes: 39 additions & 0 deletions docs/use_cases/multimodal.md
@@ -0,0 +1,39 @@
# Multi-modal

LlamaIndex lets you build not only language-based applications but also **multi-modal** applications, combining language and images.

## Types of Multi-modal Use Cases

This space is actively being explored right now, but there are some fascinating use cases popping up.

### Multi-Modal RAG

All the core RAG concepts (indexing, retrieval, and synthesis) can be extended into the image setting.

- The input can be text or an image.
- The stored knowledge base can consist of text or images.
- The inputs to response generation can be text or images.
- The final response can be text or an image.
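The flow implied by these bullets can be sketched in plain Python. This is a toy illustration, not the LlamaIndex API: `Node`, `embed`, `retrieve`, and `synthesize` are hypothetical stand-ins, and the "embedder" is a stub where a real system would use a shared text/image embedding model (e.g. CLIP) and a multi-modal LLM for synthesis.

```python
# Toy sketch of multi-modal RAG: text and image nodes live in one index,
# retrieval runs over both, and synthesis sees mixed-modality sources.
# All names here are hypothetical stand-ins, not the LlamaIndex API.
from dataclasses import dataclass


@dataclass
class Node:
    content: str        # text, or a path/URI for an image
    modality: str       # "text" or "image"
    embedding: list     # vector in a (pretend) shared embedding space


def embed(content: str, modality: str) -> list:
    # Stub embedder: a real system would call a multi-modal model
    # (e.g. CLIP) so text and images land in the same vector space.
    return [float(len(content) % 7), 1.0 if modality == "image" else 0.0]


def retrieve(query: str, index: list, top_k: int = 2) -> list:
    q = embed(query, "text")
    # Nearest neighbours by squared distance; text AND image nodes are
    # both candidates because they share one embedding space.
    scored = sorted(index, key=lambda n: sum((a - b) ** 2 for a, b in zip(q, n.embedding)))
    return scored[:top_k]


def synthesize(query: str, nodes: list) -> str:
    # Stub synthesizer: a real system would hand text chunks plus raw
    # images to a multi-modal LLM (e.g. GPT-4V); here we just join sources.
    sources = ", ".join(f"{n.modality}:{n.content}" for n in nodes)
    return f"Answer to {query!r} grounded in [{sources}]"


index = [
    Node("solar panel spec sheet", "text", embed("solar panel spec sheet", "text")),
    Node("imgs/roof_diagram.png", "image", embed("imgs/roof_diagram.png", "image")),
]
question = "How are the panels mounted?"
print(synthesize(question, retrieve(question, index)))
```

Note that the final response here is text; a system whose output is an image would swap the synthesis step for an image-generating model.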

Check out our guides below:

```{toctree}
---
maxdepth: 1
---
/examples/multi_modal/gpt4v_multi_modal_retrieval.ipynb
[Old] Multi-modal retrieval with CLIP </examples/multi_modal/multi_modal_retrieval.ipynb>
```

### Retrieval-Augmented Image Captioning

Understanding an image often requires looking up information from a knowledge base. One flow here is retrieval-augmented image captioning: first caption the image with a multi-modal model, then refine the caption by retrieving from a text corpus.
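The two-step flow above can be sketched as follows. `caption_image`, `retrieve_passages`, and `refine_caption` are hypothetical stand-ins: in practice step 1 would call a multi-modal model (e.g. LLaVA or GPT-4V) on the image pixels, and step 2 would prompt a text LLM with the retrieved passages.

```python
# Toy sketch of retrieval-augmented image captioning:
# step 1 drafts a caption, step 2 refines it with retrieved text context.
def caption_image(image_path: str) -> str:
    # Step 1 (stub): a multi-modal model would produce this from pixels.
    return "a table of quarterly revenue figures"


def retrieve_passages(caption: str, corpus: list, top_k: int = 1) -> list:
    # Rank corpus passages by word overlap with the draft caption
    # (a real system would use embeddings here).
    words = set(caption.lower().split())
    scored = sorted(corpus, key=lambda p: -len(words & set(p.lower().split())))
    return scored[:top_k]


def refine_caption(caption: str, passages: list) -> str:
    # Step 2 (stub): a text LLM would rewrite the caption using the
    # retrieved context; here we just append it.
    return caption + " (context: " + "; ".join(passages) + ")"


corpus = [
    "Tesla 10-Q: quarterly revenue figures by segment",
    "Unrelated passage about solar roof tiles",
]
draft = caption_image("imgs/10q_table.png")
print(refine_caption(draft, retrieve_passages(draft, corpus)))
```

The linked Tesla 10-Q notebook follows the same shape, with LLaVA supplying the draft caption and a real retriever supplying the context.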

Check out our guides below:

```{toctree}
---
maxdepth: 1
---
/examples/multi_modal/llava_multi_modal_tesla_10q.ipynb
```
