
Please release the code to support Wan 2.1 #8

Open
deepbeepmeep opened this issue Mar 1, 2025 · 8 comments

Comments

@deepbeepmeep

Hello

I would be very happy to add RIFLEx to a new Wan tool I am about to release. I would be grateful if you could release the code.

@gracezhao1997 (Collaborator)

Thank you for your attention! We are currently reorganizing the code and will release it soon. By the way, how do users typically use these models to generate videos, and which repositories should we support to facilitate this process?

@deepbeepmeep (Author)

These models are very recent, so it is hard to give you feedback, but Wan, Hunyuan and CogVideoX are the most popular at the moment.

I understand it is not easy to produce a plug-and-play RIFLEx demo given its code structure. In fact, we just need the RIFLEx parameters (k, N_k) for each Wan model. It would help if you could please share them.

@deepbeepmeep (Author)

I have used your identify tool with multiple values of N (the first observed repetition frame in latent space), and I keep getting 4 as the value of k. I then modified the equivalent of get_1d_rotary_pos_embed in Wan's code.
The result is not as good as with Hunyuan (repetition is still there, plus visual artifacts). Is 4 wrong for k? Any recommendation? Many thanks in advance.
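
For reference, the modification I made follows Hunyuan's convention, where freqs is the per-component frequency vector before the outer product with positions. A minimal self-contained sketch (get_1d_rotary_pos_embed simplified to its core; k=4 is the value the identify tool gave me, and L_test is a placeholder for the target number of latent frames):

import torch

def get_1d_rotary_pos_embed_riflex(dim, pos, theta=10000.0, L_test=66, k=4):
    # Standard RoPE per-component inverse frequencies: theta^(-2j/dim).
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2, dtype=torch.float64) / dim))
    # RIFLEx: pin the k-th component so it completes at most 0.9 of a
    # full cycle over the L_test latent frames being generated.
    freqs[k - 1] = 0.9 * 2 * torch.pi / L_test
    t = torch.arange(pos, dtype=torch.float64)
    freqs = torch.outer(t, freqs)                       # (pos, dim/2) angles
    return torch.polar(torch.ones_like(freqs), freqs)   # complex rotations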

@zhuhz22 (Collaborator) commented Mar 2, 2025

Hi @deepbeepmeep, we found k=6 in Wan2.1. We observed repetition at about 6.5s, where the latent frame index is about 21; for k=6, N_k is 26, so we chose k=6 as it is the nearest frequency.
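
A rough sketch of that selection rule, assuming standard RoPE frequencies theta^(-2(j-1)/dim) on the temporal axis (dim and the observed repetition index depend on the model's RoPE split and on your own measurement):

import torch

def intrinsic_periods(dim, theta=10000.0):
    # Period, in latent frames, of RoPE component j = 1..dim/2:
    # N_j = 2*pi / theta^(-2(j-1)/dim) = 2*pi * theta^(2(j-1)/dim)
    exponents = torch.arange(0, dim, 2, dtype=torch.float64) / dim
    return 2 * torch.pi * theta ** exponents

def pick_k(dim, n_observed, theta=10000.0):
    # Pick the component whose intrinsic period is nearest the observed
    # repetition length (above: ~21 latent frames -> k=6 with N_k = 26).
    periods = intrinsic_periods(dim, theta)
    return int(torch.argmin((periods - n_observed).abs())) + 1  # 1-based k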

Also, note that the meaning of freqs in Wan's code is different from Hunyuan's, so the code should be

import torch

def rope_params_riflex(max_seq_len, dim, theta=10000, L_test=30, k=6):
    assert dim % 2 == 0
    # Standard RoPE inverse frequencies: theta^(-2j/dim) for j = 0..dim/2-1.
    exponents = torch.arange(0, dim, 2, dtype=torch.float64).div(dim)
    inv_theta_pow = 1.0 / torch.pow(theta, exponents)

    # RIFLEx: pin the k-th component so it completes at most 0.9 of a
    # full cycle over the L_test latent frames being generated.
    inv_theta_pow[k-1] = 0.9 * 2 * torch.pi / L_test

    freqs = torch.outer(torch.arange(max_seq_len), inv_theta_pow)
    # Wan stores its RoPE table as complex rotations e^{i * angle}.
    freqs = torch.polar(torch.ones_like(freqs), freqs)
    return freqs

rather than freqs[k-1] = 0.9 * 2 * torch.pi / L_test.
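
For example, only the temporal RoPE table would be patched. A sketch, not Wan's actual code: the head-dim split below is an assumption based on Wan2.1's RoPE setup, rope_params is Wan's original unpatched table builder (assumed available), and target_latent_frames is a placeholder:

import torch

d = 128  # assumed per-head dim (hidden dim // num_heads)
target_latent_frames = 33  # placeholder: number of latent frames to generate

freqs = torch.cat([
    # ...the temporal axis gets the RIFLEx patch...
    rope_params_riflex(1024, d - 4 * (d // 6), L_test=target_latent_frames, k=6),
    # ...while the two spatial axes keep the original frequencies.
    rope_params(1024, 2 * (d // 6)),
    rope_params(1024, 2 * (d // 6)),
], dim=1)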

We're currently reorganizing the code, and will release support for Wan soon.

@deepbeepmeep (Author) commented Mar 2, 2025

Thank you. I integrated the code into my application Wan2.1GP:
https://github.com/deepbeepmeep/Wan2GP
It works nicely, but unless I did something wrong it does not seem to be as effective as with HunyuanVideo: this time the maximum is 8s, and there seems to be a higher chance of getting some repetition.

@markrmiller

@gracezhao1997

"By the way, how do users typically use these models to generate videos"

I think ComfyUI has got to be the most popular way...

@IntellectzProductions

Yes, please make sure ComfyUI is supported: Kijai's wrapper and the native node set.

@4ever-AI commented Mar 3, 2025

I also agree about ComfyUI. It would be great if you could support it.
