Vision multimodal #369

jlamypoirier · 2025-09-26T03:31:26Z

✨ Description

An attempt at integrating multimodal vision models into main. Still a lot of work to do...

tscholak · 2025-10-04T13:16:19Z

fast_llm/layers/vision/config.py

+        hint=FieldHint.architecture,
+    )
+    # TODO: ====== Appropriate name?? ======
+    decoder: BlockSequenceConfig = Field(


tscholak · 2025-10-04T13:27:03Z

fast_llm/layers/vision/vision_encoder.py

+            peft=self._peft,
+        )
+        # TODO: ====== Appropriate name?? ======
+        self.decoder = self._config.decoder.get_layer(


tscholak · 2025-10-04T13:28:17Z

fast_llm/layers/vision/vision_encoder.py

+            peft=self._peft,
+        )
+        # TODO: ====== Hidden dim ======
+        self.adapter = self._config.adapter.get_layer(


I don't think we want to make the adapter part of the encoder, because the adapter's tensor shapes depend on the decoder. We also want to mix and match existing pretrained encoders and decoders...

jlamypoirier · 2025-10-06

It's basically the same with every module: their shapes all need to match. I'm organizing the modules so they manage their internal hidden shapes, while input and output shapes are managed by the parent modules (the hidden_dim argument), so in this case it makes sense to keep the adapter here.

The TODO refers to the MLP assuming matching input and output dimensions. That's an easy fix, but I haven't gotten to it yet.
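For illustration, here is a minimal sketch of the convention discussed in this thread: each config owns its internal sizes, and the parent passes boundary shapes down through a `hidden_dim`-style argument. All class names, fields, and `get_layer` signatures below are hypothetical stand-ins, not Fast-LLM's actual API, and the sketch only records weight shapes instead of allocating tensors. The optional `output_dim` is one possible way to lift the "matching input and output dimensions" assumption, not necessarily the fix that will land.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MLPConfig:
    """Hypothetical config: it only knows the MLP's internal width."""

    intermediate_size: int

    def get_layer(self, input_dim: int, output_dim: Optional[int] = None) -> "MLP":
        # Boundary shapes are supplied by the parent. Falling back to
        # input_dim mirrors the "matching dimensions" assumption in the TODO.
        return MLP(self, input_dim, input_dim if output_dim is None else output_dim)


class MLP:
    """Records weight shapes only; no tensors are allocated in this sketch."""

    def __init__(self, config: MLPConfig, input_dim: int, output_dim: int):
        self.weight_shapes = {
            "layer_1": (input_dim, config.intermediate_size),
            "layer_2": (config.intermediate_size, output_dim),
        }


@dataclass(frozen=True)
class VisionEncoderConfig:
    """Hypothetical config: the encoder's own hidden size plus its adapter."""

    hidden_size: int
    adapter: MLPConfig

    def get_layer(self, hidden_dim: int) -> "VisionEncoder":
        return VisionEncoder(self, hidden_dim)


class VisionEncoder:
    def __init__(self, config: VisionEncoderConfig, hidden_dim: int):
        # hidden_dim is the output width expected by the parent model
        # (the language model's hidden size); it is not stored in the config.
        self.adapter = config.adapter.get_layer(
            input_dim=config.hidden_size, output_dim=hidden_dim
        )


config = VisionEncoderConfig(hidden_size=1024, adapter=MLPConfig(intermediate_size=4096))
# The same encoder config paired with two language models of different widths.
small = config.get_layer(hidden_dim=2048)
large = config.get_layer(hidden_dim=4096)
print(small.adapter.weight_shapes)
print(large.adapter.weight_shapes)
```

In this sketch the encoder config never mentions the language model's width, so the same config can be reused with either parent; only the adapter's second projection changes shape.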

jlamypoirier added 9 commits September 22, 2025 17:34

clean history

ecd1918

Vision multimodal

9114ce2

Drop varlen mamba

a44642c

cleanup

ddf2143

cleanup

8ee7d5e

cleanup

43ca913

cleanup

15405a1

stuff

a3dc89d

stuff

414f87e

jlamypoirier mentioned this pull request Sep 26, 2025

Base model interface review #370

Merged

jlamypoirier added 2 commits September 26, 2025 16:25

stuff

4a21360

Merge branch 'jlp/mlp_block' into jlp/vision_multimodal

2180ea5

jlamypoirier changed the base branch from main to jlp/mlp_block September 26, 2025 20:29

Embeddings

bb7c62d

jlamypoirier mentioned this pull request Sep 30, 2025

[Workspace] Dev branch merge attempt #367

Draft

Model interface

47b9a44

Base automatically changed from jlp/mlp_block to main October 3, 2025 23:18

jlamypoirier added 5 commits October 3, 2025 19:33

Merge branch 'main' into jlp/vision_multimodal

09b0215

Fix merge

f31a313

model

3d84972

cleanup

8f8ef19

language_model

6084122

tscholak reviewed Oct 4, 2025

jlamypoirier added 3 commits October 6, 2025 16:27

fixes

4a96980

fixes

7854138

Merge branch 'jlp/language_model_block' into jlp/vision_multimodal

0350e17

jlamypoirier changed the base branch from main to jlp/language_model_block October 6, 2025 21:11

Base automatically changed from jlp/language_model_block to main October 6, 2025 22:38
