Scope of adding vision-language model? #937

innat · 2023-03-29T15:28:29Z


Originally posted here.

While following the current Kaggle competition (this one), I've noticed some popular vision-language models that could become part of the keras-cv or keras-nlp model components. But because they mix CV and NLP components, it is unclear which core API they belong in, keras-cv or keras-nlp. Here are some models (current SOTA):

A model like BLIP consists of a vision transformer plus BERT (with modifications, which are the paper's contribution); for BLIP-2, the text decoder may be T5 or something similar. Adding a model like BLIP to keras-nlp would be (comparatively) more straightforward than adding it to keras-cv, but I can't promise that.

cc. @mattdangerw @chenmoneygithub @bhack @abheesht17