🚧 [WiP] Add Janus model #36053

yaswanth19 · 2025-02-05T16:39:56Z

What does this PR do?

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

yaswanth19 · 2025-02-05T17:03:34Z

@zucchini-nlp I’ve started working on Janus and would love to get some guidance. Right now, I’ve just created a skeleton and implemented the ImageProcessing class.

My first major hurdle is the CONFIG. The Janus config on the Hub is quite composite and non-standard. Standard values like hidden_size, num_attention_heads, etc., seem to be hardcoded in their implementation.

From a testing perspective, how should I approach writing the config class? Loading this config directly using AutoConfig.from_pretrained() doesn’t work.

Shall I write an ad hoc script to convert this config into a standard Hugging Face config (similar to convert_weights_to_hf.py, but for config)?

Config: https://huggingface.co/deepseek-ai/Janus-Pro-1B/blob/main/config.json
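A minimal sketch of the kind of ad hoc config conversion script described above. It assumes the Hub config nests per-component settings, and that values the original code hardcodes can be supplied as explicit defaults. The key names (`language_config`, `vision_config`, `params`), the `"janus"` model type, the default values, and the helper names `convert_config` / `convert_config_file` are all illustrative assumptions, not the verified Janus schema or the final transformers implementation.

```python
import json
from typing import Any

# Hypothetical values standing in for hyperparameters that the original code
# hardcodes in Python instead of storing in config.json (illustrative only).
HARDCODED_VISION_DEFAULTS: dict[str, Any] = {
    "hidden_size": 1024,
    "num_attention_heads": 16,
}


def convert_config(original: dict[str, Any]) -> dict[str, Any]:
    """Map a composite, nested config onto flat text/vision sub-configs.

    Assumes (hypothetically) that the original file nests language-model
    settings under "language_config" and vision settings under
    "vision_config" -> "params".
    """
    text_config = dict(original.get("language_config", {}))
    vision_params = original.get("vision_config", {}).get("params", {})
    # Explicit values from the file take precedence over the hardcoded defaults.
    vision_config = {**HARDCODED_VISION_DEFAULTS, **vision_params}
    return {
        "model_type": "janus",  # hypothetical model_type for the converted config
        "text_config": text_config,
        "vision_config": vision_config,
    }


def convert_config_file(src_path: str, dst_path: str) -> None:
    """Read the original config.json and write the converted version."""
    with open(src_path) as f:
        original = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(convert_config(original), f, indent=2)
```

Keeping the hardcoded values in one explicit dict makes it easy to check them against the original implementation when logits are compared later.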

zucchini-nlp · 2025-02-05T18:43:59Z

@yaswanth19 super cool to see a draft PR!

Yeah, that reminds me of how Molmo is modeled, also with hardcoded values for the configuration. I am not sure how you usually approach testing. If you plan to test by matching against the actual weights, then yes, converting the config will be helpful.

As a first step, I'd suggest getting the model code working and then converting the weights/config. The vision backbone should be very similar to existing CLIP models, and for the VQ part feel free to look at the Emu3 model. Once the model is converted, we can try to match logits and see in which modules the logits start to diverge. LMK if you are stuck anywhere and need help :)
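The logit-matching step can be sketched with PyTorch forward hooks (`nn.Module.register_forward_hook`): run the reference and the ported model on the same input, record every submodule's output, and report the first module whose output differs beyond a tolerance. This is a minimal sketch that assumes both models expose the same submodule names (when porting between codebases, a name mapping would usually be needed); `capture_outputs` and `first_divergence` are hypothetical helper names.

```python
import copy
from typing import Optional

import torch
from torch import nn


def capture_outputs(model: nn.Module, inputs: torch.Tensor) -> dict[str, torch.Tensor]:
    """Run `model` once and record the tensor output of every named submodule."""
    outputs: dict[str, torch.Tensor] = {}
    handles = []
    for name, module in model.named_modules():
        if name == "":  # skip the root module itself
            continue

        def hook(mod, args, out, name=name):
            # Insertion order follows the order in which modules finish running;
            # a module called several times keeps only its last output.
            if isinstance(out, torch.Tensor):
                outputs[name] = out.detach()

        handles.append(module.register_forward_hook(hook))
    try:
        with torch.no_grad():
            model(inputs)
    finally:
        for handle in handles:
            handle.remove()  # always detach hooks, even if the forward pass fails
    return outputs


def first_divergence(
    ref: nn.Module, new: nn.Module, inputs: torch.Tensor, atol: float = 1e-4
) -> Optional[str]:
    """Return the name of the first submodule whose outputs differ, or None."""
    ref_out = capture_outputs(ref, inputs)
    new_out = capture_outputs(new, inputs)
    for name, ref_tensor in ref_out.items():
        new_tensor = new_out.get(name)
        if new_tensor is None or new_tensor.shape != ref_tensor.shape:
            return name
        if not torch.allclose(ref_tensor, new_tensor, atol=atol):
            return name
    return None


if __name__ == "__main__":
    torch.manual_seed(0)
    ref = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
    new = copy.deepcopy(ref)
    with torch.no_grad():
        new[2].bias.add_(1.0)  # simulate a porting bug in the last layer
    print(first_divergence(ref, new, torch.randn(3, 4)))  # prints 2
```

Because children finish before their parents, the first mismatch reported is the most granular module where the outputs split, which is usually the place to start debugging.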

Btw, I would be very interested to see if Janus can handle interleaved generation of image + text in one go 👀 If that's possible, it would be super super nice.

geetu040 · 2025-02-06T05:20:08Z

Hi @yaswanth19, this is really nice. I am also really interested in the model, do you think I can collaborate with you on this one? I can help with the implementation.

yaswanth19 · 2025-02-06T05:31:18Z

@geetu040 Thanks for your interest and for offering to collaborate! This is quite an ambitious PR for me, and I'd like to take on the challenge of tackling it myself. That said, I'll definitely reach out if I get stuck or am unable to continue working on it :)

geetu040 · 2025-02-06T15:58:29Z

Sure, understood. I wish you the very best of luck.

Iterative generation using input embeds (7175c69)

yaswanth19 marked this pull request as draft February 5, 2025 16:40

yaswanth19 changed the title ~~Add janus model~~ 🚧 [WiP] Add janus model Feb 5, 2025

Add Janus model (5d6d37a)

yaswanth19 force-pushed the add-janus-model branch from 89d564a to 5d6d37a February 5, 2025 16:52

discard changes (fb1b57e)

yaswanth19 changed the title ~~🚧 [WiP] Add janus model~~ 🚧 [WiP] Add Janus model Feb 5, 2025
