Feature request
I want to use the Video-LLaVA model (https://huggingface.co/LanguageBind/Video-LLaVA-Pretrain-7B) with outlines for constrained generation.
Can Video-LLaVA be used with Transformers so that I can do this?
This is the connector I want to use for outlines: dottxt-ai/outlines#728
Motivation
Video-LLaVA is a good open-source model for video-based question answering, and compatibility with libraries such as outlines through Hugging Face would be beneficial.
Your contribution
I have tried loading Video-LLaVA with transformers, but I keep getting an error. I hope this can be implemented.
When trying to use https://huggingface.co/LanguageBind/Video-LLaVA-Pretrain-7B, I get this error:
python3 video.py
config.json: 100%|█████████████████████████| 1.12k/1.12k [00:00<00:00, 1.11MB/s]
You are using a model of type llava to instantiate a model of type llama. This is not supported for all configurations of models and can yield errors.
Traceback (most recent call last):
  File "/Users/kamakshiramamurthy/Desktop/GSoC/outline/video.py", line 5, in <module>
    pipe = pipeline("text-generation", model="LanguageBind/Video-LLaVA-Pretrain-7B")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kamakshiramamurthy/miniconda3/envs/out_test/lib/python3.12/site-packages/transformers/pipelines/__init__.py", line 905, in pipeline
    framework, model = infer_framework_load_model(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kamakshiramamurthy/miniconda3/envs/out_test/lib/python3.12/site-packages/transformers/pipelines/base.py", line 292, in infer_framework_load_model
    raise ValueError(
ValueError: Could not load model LanguageBind/Video-LLaVA-Pretrain-7B with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>). See the original errors:

while loading with AutoModelForCausalLM, an error is thrown:
Traceback (most recent call last):
  File "/Users/kamakshiramamurthy/miniconda3/envs/out_test/lib/python3.12/site-packages/transformers/pipelines/base.py", line 279, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kamakshiramamurthy/miniconda3/envs/out_test/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers.models.llava.configuration_llava.LlavaConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, ElectraConfig, ErnieConfig, FalconConfig, FuyuConfig, GemmaConfig, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, LlamaConfig, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MptConfig, MusicgenConfig, MvpConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, StableLmConfig, TransfoXLConfig, TrOCRConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig.

while loading with LlamaForCausalLM, an error is thrown:
Traceback (most recent call last):
  File "/Users/kamakshiramamurthy/miniconda3/envs/out_test/lib/python3.12/site-packages/transformers/pipelines/base.py", line 279, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kamakshiramamurthy/miniconda3/envs/out_test/lib/python3.12/site-packages/transformers/modeling_utils.py", line 3234, in from_pretrained
    raise EnvironmentError(
OSError: LanguageBind/Video-LLaVA-Pretrain-7B does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
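To make the first failure concrete: AutoModelForCausalLM chooses a model class by looking up the type of the checkpoint's config object in a registry, and LlavaConfig simply isn't registered for causal LM. A toy sketch of that dispatch pattern (a simplified stand-in with hypothetical names, not the actual transformers internals):

```python
# Toy sketch of how an "Auto" class maps config types to model classes.
# All names here are illustrative, not the real transformers code.

class LlamaConfig: ...
class LlavaConfig: ...

class LlamaForCausalLM:
    def __init__(self, config):
        self.config = config

# Registry of supported config classes, analogous in spirit to the
# config-to-model mapping behind AutoModelForCausalLM.
CAUSAL_LM_REGISTRY = {LlamaConfig: LlamaForCausalLM}

def auto_model_for_causal_lm(config):
    """Dispatch on the config's type; unregistered configs raise ValueError."""
    model_cls = CAUSAL_LM_REGISTRY.get(type(config))
    if model_cls is None:
        raise ValueError(
            f"Unrecognized configuration class {type(config).__name__} "
            "for this kind of AutoModel."
        )
    return model_cls(config)
```

Here `auto_model_for_causal_lm(LlamaConfig())` succeeds, while `auto_model_for_causal_lm(LlavaConfig())` raises, mirroring the first traceback. The second failure is separate: the checkpoint repo doesn't ship weights under any of the standard filenames listed in the OSError, so even the LlamaForCausalLM fallback cannot load it.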
The instructions for using the Video-LLaVA model are on the model page of the checkpoint. If these aren't working, you can open a discussion on the model page detailing the issues you're encountering.
There isn't currently a model implementation that is compatible with the transformers library, either on the Hub or in this repo (to the best of my knowledge). If you would like it to be added, you can open a new model request in this repo. I'd suggest opening a discussion on the model page on the Hub too, as the authors might be interested in contributing this.
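For background on what the outlines integration would provide once a transformers-compatible implementation exists: constrained generation works by masking the model's next-token scores so that only tokens keeping the output inside a target pattern remain eligible. A minimal toy sketch of that idea (standalone and simplified, not the outlines API):

```python
import math

def constrained_greedy(logits_fn, allowed_fn, max_steps=10):
    """Greedy decoding with a constraint mask.

    logits_fn(prefix)  -> dict mapping token -> score (stand-in for a model)
    allowed_fn(prefix) -> set of tokens permitted next (the constraint)
    """
    prefix = ""
    for _ in range(max_steps):
        allowed = allowed_fn(prefix)
        if not allowed:
            break  # constraint satisfied: no further tokens permitted
        scores = logits_fn(prefix)
        # Mask step: disallowed tokens are excluded entirely, then take
        # the highest-scoring token among those that remain.
        best = max(allowed, key=lambda t: scores.get(t, -math.inf))
        prefix += best
    return prefix
```

For example, constraining the output to exactly three digits means `allowed_fn` returns the digit set until the prefix has length 3 and an empty set afterwards; whatever the model "prefers", the result is guaranteed to match the pattern.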