
🚧 [WiP] Add Janus model #36053

Draft: wants to merge 3 commits into base: main


yaswanth19
Contributor

@yaswanth19 yaswanth19 commented Feb 5, 2025

What does this PR do?

Fixes #35928

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@yaswanth19 yaswanth19 marked this pull request as draft February 5, 2025 16:40
@yaswanth19 yaswanth19 changed the title Add janus model 🚧 [WiP] Add janus model Feb 5, 2025
@yaswanth19
Contributor Author

yaswanth19 commented Feb 5, 2025

@zucchini-nlp I’ve started working on Janus and would love to get some guidance. Right now, I’ve just created a skeleton and implemented the ImageProcessing class.

My first major hurdle is the CONFIG. The Janus config on the Hub is quite composite and non-standard. Standard values like hidden_size, num_attention_heads, etc., seem to be hardcoded in their implementation.

From a testing perspective, how should I approach writing the config class? Loading this config directly using AutoConfig.from_pretrained() doesn’t work.

Shall I write an ad hoc script to convert this config into a standard Hugging Face config (similar to convert_weights_to_hf.py, but for config)?

Config: https://huggingface.co/deepseek-ai/Janus-Pro-1B/blob/main/config.json
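An ad hoc config conversion of the kind described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the section names, the `cls`/`params` nesting, the `"janus"` model type, and every default value are hypothetical placeholders, not the actual contents of the Hub config or of the original modeling code.

```python
import json
from copy import deepcopy

# Hypothetical defaults standing in for values that are hardcoded in the
# original implementation instead of being stored in config.json.
HARDCODED_DEFAULTS = {
    "text_config": {"hidden_size": 2048, "num_attention_heads": 16},
    "vision_config": {"hidden_size": 1024, "num_attention_heads": 16},
}

# Hypothetical mapping from original section names to HF-style sub-config names.
SECTION_RENAMES = {
    "language_config": "text_config",
    "vision_config": "vision_config",
}


def convert_config(original: dict) -> dict:
    """Flatten a nested, non-standard config into HF-style sub-config dicts."""
    converted = {"model_type": "janus"}  # assumed model type name
    for old_name, new_name in SECTION_RENAMES.items():
        section = deepcopy(original.get(old_name, {}))
        # Lift values out of a nested "params" dict and drop the class name.
        params = section.pop("params", {})
        section.pop("cls", None)
        # Explicit values from the original config override the defaults.
        converted[new_name] = {
            **HARDCODED_DEFAULTS.get(new_name, {}),
            **section,
            **params,
        }
    return converted


# Made-up input for illustration only; not the real Janus config.
example_original = {
    "language_config": {"model_type": "llama", "vocab_size": 1000},
    "vision_config": {
        "cls": "SomeVisionTower",
        "params": {"image_size": 384, "hidden_size": 512},
    },
}

converted = convert_config(example_original)
print(json.dumps(converted, indent=2, sort_keys=True))
```

The resulting dict could be written out as `config.json` next to the converted weights, so that the config and weight conversion stay in a single script.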

@yaswanth19 yaswanth19 changed the title 🚧 [WiP] Add janus model 🚧 [WiP] Add Janus model Feb 5, 2025
@zucchini-nlp
Member

@yaswanth19 super cool to see a draft PR!

Yeah, that reminds me of how Molmo is modeled, also with hardcoded configuration values. I am not sure how you usually approach testing. If you plan to test by matching against the actual weights, then yes, converting the config will be helpful.

As a first step, I'd suggest getting the model code working and then converting the weights/config. The vision backbone should be very similar to existing CLIP models, and for the VQ part feel free to look at the Emu3 model here. Once the model is converted, we can try to match logits and see in which modules the outputs start to diverge. LMK if you get stuck anywhere and need help :)

Btw, I would be very interested to see if Janus can handle interleaved generation of image + text in one go 👀 If that's possible, would be super super nice
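The logit-matching step mentioned above could be sketched as follows. This is a framework-agnostic illustration, assuming the per-module outputs of both implementations have already been captured (for example with forward hooks) and flattened into lists of floats. The module names and numbers are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flat outputs."""
    if len(a) != len(b):
        raise ValueError(f"shape mismatch: {len(a)} vs {len(b)}")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def first_divergence(
    reference: Dict[str, Sequence[float]],
    candidate: Dict[str, Sequence[float]],
    order: Sequence[str],
    atol: float = 1e-4,
) -> Optional[Tuple[str, float]]:
    """Return the first module (in forward order) whose outputs differ by more than atol."""
    for name in order:
        diff = max_abs_diff(reference[name], candidate[name])
        if diff > atol:
            return name, diff
    return None


# Hypothetical captured activations from the original and the ported model.
forward_order = ["vision_model", "aligner", "language_model"]
original_outputs = {
    "vision_model": [0.10, 0.20, 0.30],
    "aligner": [0.50, -0.25],
    "language_model": [1.00, 2.00],
}
ported_outputs = {
    "vision_model": [0.10, 0.20, 0.30],
    "aligner": [0.50, -0.20],
    "language_model": [1.10, 2.30],
}

result = first_divergence(original_outputs, ported_outputs, forward_order)
print(result)
```

Checking modules in forward order matters here: once one module diverges, everything downstream will usually diverge too, so the first mismatch is the one worth debugging.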

@geetu040

geetu040 commented Feb 6, 2025

Hi @yaswanth19, this is really nice. I am also really interested in the model, do you think I can collaborate with you on this one? I can help with the implementation.

@yaswanth19
Contributor Author

@geetu040 Thanks for your interest and for offering to collaborate! This is quite an ambitious PR for me, and I’d like to take on the challenge of tackling it myself. That said, I’ll definitely reach out if I get stuck or am unable to continue working on it :)

@geetu040

geetu040 commented Feb 6, 2025

> @geetu040 Thanks for your interest and for offering to collaborate! This is quite an ambitious PR for me, and I’d like to take on the challenge of tackling it myself. That said, I’ll definitely reach out if I get stuck or am unable to continue working on it :)

Sure, understood. I wish you the best of luck.

Development

Successfully merging this pull request may close these issues.

Add Deepseek AI's Janus model
3 participants