Added Eagle training support for Kimi-K2 #108
base: main
Conversation
- …hidden states generation (sgl-project#57)
- add local data path support and more assistant
- small refactor
- separate out the data-preprocess logic
- add support for qwen3 eagle train
- fix
- Update README.md
- fix
- fix and add test
- fix code style
- feat: add training scripts for qwen3-8B
- fix
- add 235B config
- fix chat template
- fix chat template
- updated badges
- Update README.md
- add wandb args check
- fix
- opt error log
- remove local

Co-authored-by: sleepcoo <[email protected]>
Co-authored-by: Yubo Wang <[email protected]>
Co-authored-by: lukec <[email protected]>
Can you fix the conflict?

I have resolved the conflict based on upstream/main and resubmitted the code.
```python
self.head_dim = getattr(
    config, "head_dim", config.hidden_size // config.num_attention_heads
)
<<<<<<< HEAD
```
Looks like this conflict snuck into the last commit.
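The fix is presumably just to drop the leftover marker and keep the fallback, along these lines:

```python
# Minimal sketch of the resolved lines: keep the getattr fallback and
# drop the stray "<<<<<<< HEAD" conflict marker.
self.head_dim = getattr(
    config, "head_dim", config.hidden_size // config.num_attention_heads
)
```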
@xuhaojie-2025 I'm trying to use this for kimi-k2-0905 but having a hard time getting it working: library issues, some stray bad lines, trust_remote_code missing in various places, an outdated kimi_k2.py with bad references to qk_head_dim, etc. I can struggle through, but I'm wondering whether you have an updated or functional branch/commit somewhere I can look at?
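For reference, the kind of loading calls the missing flag affects look roughly like this; the checkpoint path is a placeholder and this is only a sketch, not the PR's own code:

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/path/to/Kimi-K2-Instruct"  # placeholder local checkpoint path

# Kimi-K2 ships custom modeling/tokenizer code with the checkpoint, so every
# from_pretrained call that touches it needs trust_remote_code=True.
config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    torch_dtype="auto",
)
```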
Add support for Kimi-K2 Eagle training:

- Add a target model for Kimi-K2 in specforge/modeling/target/kimi_k2.py.
- Add a Kimi-K2 config in configs/kimi-k2-eagle3.json.
- Fix the chat template in specforge/data/template.py.
- Adapt hidden-state generation in specforge/data/preprocessing.py to Kimi-K2's special chat template.
- The Kimi-K2 tokenizer cannot automatically be loaded as a fast tokenizer, so a script is used to generate tokenizer.json, allowing it to go through the fast-tokenizer interface (a rough sketch of this kind of conversion is below).
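A rough sketch of what such a conversion might look like, using the generic slow-to-fast converter from transformers. This is not the PR's actual script, the model path is a placeholder, and Kimi-K2's custom tokenizer may need a dedicated converter:

```python
from transformers import AutoTokenizer, PreTrainedTokenizerFast
from transformers.convert_slow_tokenizer import convert_slow_tokenizer

MODEL_PATH = "/path/to/Kimi-K2-Instruct"  # placeholder local checkpoint path

# Load the slow (Python) tokenizer shipped with the checkpoint.
slow = AutoTokenizer.from_pretrained(
    MODEL_PATH, trust_remote_code=True, use_fast=False
)

# Convert it to a Rust-backed tokenizers.Tokenizer and serialize to tokenizer.json.
# This only works if a converter is registered for the tokenizer class;
# a custom tiktoken-style tokenizer may need its own converter.
backend = convert_slow_tokenizer(slow)
backend.save("tokenizer.json")

# The resulting file can then be loaded through the fast-tokenizer interface.
fast = PreTrainedTokenizerFast(
    tokenizer_file="tokenizer.json",
    bos_token=slow.bos_token,
    eos_token=slow.eos_token,
    pad_token=slow.pad_token,
)
print(fast("hello world").input_ids)
```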