🚧 [WiP] Add Janus model #36053
89d564a to 5d6d37a
@zucchini-nlp I’ve started working on Janus and would love to get some guidance. Right now, I’ve just created a skeleton and implemented the ImageProcessing class. My first major hurdle is the config. The Janus config on the Hub is quite composite and non-standard. Standard values like From a testing perspective, how should I approach writing the config class? Loading this config directly with AutoConfig.from_pretrained() doesn’t work. Should I write an ad hoc script to convert this config into a standard Hugging Face config (similar to convert_weights_to_hf.py, but for the config)? Config: https://huggingface.co/deepseek-ai/Janus-Pro-1B/blob/main/config.json
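For illustration, an ad hoc config conversion of the kind asked about above could look roughly like the sketch below. It assumes the original file nests each component's values under a `params` key next to a `cls` entry. The section names, the target names in `SECTION_MAP`, the `"janus"` model type, and every number are hypothetical placeholders, not values taken from the Hub config or from this PR.

```python
import json

# Hypothetical nested layout: some sub-configs wrap their values in "params".
# Keys and numbers are illustrative only, not copied from the Hub file.
RAW_CONFIG = {
    "model_type": "multi_modality",
    "language_config": {"model_type": "llama", "hidden_size": 2048},
    "vision_config": {"cls": "SomeVisionTower", "params": {"image_size": 384}},
    "gen_vision_config": {"cls": "SomeVQModel", "params": {"image_token_size": 16384}},
}

# Hypothetical mapping from original section names to target sub-config names.
SECTION_MAP = {
    "language_config": "text_config",
    "vision_config": "vision_config",
    "gen_vision_config": "vq_config",
}


def convert_config(raw):
    """Flatten each mapped section: drop "cls" and lift "params" to the top level."""
    converted = {"model_type": "janus"}  # assumed target model type
    for old_name, new_name in SECTION_MAP.items():
        section = dict(raw.get(old_name, {}))  # shallow copy, input stays untouched
        params = section.pop("params", {})
        section.pop("cls", None)
        section.update(params)
        converted[new_name] = section
    return converted


if __name__ == "__main__":
    print(json.dumps(convert_config(RAW_CONFIG), indent=2))
```

The converted dictionary could then be written to a new `config.json` alongside the converted weights, so that the config class only ever has to read the flattened layout.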
@yaswanth19 super cool to see a draft PR! Yeah, that reminds me of how Molmo is modeled, also with hardcoded values for configuration. I am not sure how you usually approach testing. If you test by matching against the actual weights, then yes, converting the config will be helpful. As a first step, I'd suggest getting the model code working and then converting the weights/config. The vision backbone should be very similar to existing CLIP models, and for the VQ part feel free to look at the Emu3 model here. Once the model is converted, we can try to match logits and see in which modules they start to diverge. LMK if you get stuck anywhere and need help :) Btw, I would be very interested to see if Janus can handle interleaved generation of image + text in one go 👀 If that's possible, it would be super super nice
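As a rough sketch of the "see where the logits start to diverge" step, the helper below walks two sets of per-module outputs in forward order and reports the first module whose values differ by more than a tolerance. It assumes the activations have already been captured (for example with forward hooks) and flattened into plain lists of floats. The function name, module names, and numbers are made up for illustration.

```python
def first_divergence(reference, candidate, atol=1e-4):
    """Return (module_name, max_abs_diff) for the first module that differs, else None.

    Both arguments map module names to flat lists of floats, inserted in
    forward-pass order (dicts preserve insertion order in Python 3.7+).
    """
    for name, ref_values in reference.items():
        cand_values = candidate[name]
        if len(ref_values) != len(cand_values):
            return name, float("inf")  # shape mismatch counts as divergence
        max_diff = max(
            (abs(r - c) for r, c in zip(ref_values, cand_values)), default=0.0
        )
        if max_diff > atol:
            return name, max_diff
    return None


# Toy activations: the two implementations agree until "layer0".
reference = {"embed": [0.1, 0.2], "layer0": [1.0, 2.0], "lm_head": [3.0, 4.0]}
candidate = {"embed": [0.1, 0.2], "layer0": [1.0, 2.5], "lm_head": [3.0, 9.0]}

if __name__ == "__main__":
    print(first_divergence(reference, candidate))
```

Reporting only the first mismatch is deliberate: later modules inherit the error, so their differences say little about where the port went wrong.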
Hi @yaswanth19, this is really nice. I am also really interested in the model. Do you think I could collaborate with you on this one? I can help with the implementation.
@geetu040 Thanks for your interest and for offering to collaborate! This is quite an ambitious PR for me, and I’d like to take on the challenge of tackling it myself. That said, I’ll definitely reach out if I get stuck or am unable to continue working on it :)
Sure, understood. I wish you the very best of luck.
What does this PR do?
Fixes #35928
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.