Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
multimodal
multi-modality
large-language-models
llm
vision-language-model
llava
large-context
large-multimodal-models
Updated Mar 27, 2024 - Python