
Conversation

xuhaojie-2025

add support for Kimi-K2 EAGLE training

  • add target model for Kimi-K2 in specforge/modeling/target/kimi_k2.py

  • add Kimi-K2 config in configs/kimi-k2-eagle3.json

  • fix chat template in specforge/data/template.py

  • adapt Kimi-K2's special chat template when generating hidden states in specforge/data/preprocessing.py

  • The Kimi-K2 tokenizer cannot be loaded as a fast tokenizer automatically, so a script generates tokenizer.json, which lets it go through the fast-tokenizer interface (see the sketch after this list).
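
For reference, here is a minimal sketch of what such a conversion could look like. This is not the script shipped in this PR, it assumes transformers' slow-to-fast converter can handle the Kimi-K2 slow tokenizer, and the Hub path is a placeholder.

```python
# Illustrative sketch only -- NOT the script from this PR.
# Assumes transformers' slow-to-fast converter supports the Kimi-K2 slow tokenizer.
from transformers import AutoTokenizer, PreTrainedTokenizerFast
from transformers.convert_slow_tokenizer import convert_slow_tokenizer

# Load the slow (Python) tokenizer that ships with the model's custom code.
slow = AutoTokenizer.from_pretrained(
    "moonshotai/Kimi-K2-Instruct",  # placeholder path
    trust_remote_code=True,
    use_fast=False,
)

# Convert it to a `tokenizers.Tokenizer` backend and serialize tokenizer.json.
backend = convert_slow_tokenizer(slow)
backend.save("tokenizer.json")

# The saved file can now be loaded through the fast-tokenizer interface.
fast = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
print(fast.encode("hello"))
```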

FlamingoPg and others added 14 commits July 23, 2025 02:49
…hidden states generation (sgl-project#57)

* add local data path support and more assistant

* small refactor

* separate out the data-preprocess logic
* add support for qwen3 eagle train

* fix

* Update README.md

* fix

* fix and add test

* fix code style

* feat: add training scripts for qwen3-8B
Co-authored-by: sleepcoo <[email protected]>

* fix

* add 235B config

* fix chat template

* fix chat template

---------

Co-authored-by: Yubo Wang <[email protected]>
* updated badges

* Update README.md

---------

Co-authored-by: lukec <[email protected]>
* add wandb args check

* fix

* opt error log

* remove local

@sleepcoo (Collaborator) commented Aug 4, 2025

Can you fix the conflict?

@xuhaojie-2025 (Author) commented

> Can you fix the conflict?

I have resolved the conflict based on upstream/main and resubmitted the code.

self.head_dim = getattr(
config, "head_dim", config.hidden_size // config.num_attention_heads
)
<<<<<<< HEAD

Looks like a merge-conflict marker snuck into the last commit.
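
As an aside (not part of this PR), a quick check like the following could flag leftover conflict markers before they land in a commit; the marker strings and file glob are just illustrative.

```python
# Side note, not part of this PR: flag leftover git conflict markers
# (like the "<<<<<<< HEAD" above) before they get committed.
from pathlib import Path

MARKERS = ("<<<<<<<", "=======", ">>>>>>>")

for path in Path(".").rglob("*.py"):
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        if line.startswith(MARKERS):
            print(f"{path}:{lineno}: {line.rstrip()}")
```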

@jondurbin commented

@xuhaojie-2025 I'm trying to use this for kimi-k2-0905 but having a hard time getting it working: library issues, some stray bad lines, trust_remote_code missing in various places, an outdated kimi_k2.py with bad references to qk_head_dim, etc. I can work through it, but do you have an updated or functional branch/commit somewhere I can look at?
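
For anyone hitting the trust_remote_code issue mentioned above, a hedged sketch of how the loads generally need to look for a Hub model that ships custom code (the Hub path and dtype choice are placeholders, not taken from this PR):

```python
# Hedged illustration of the trust_remote_code point above, not code from this PR.
# Kimi-K2 ships custom config/modeling/tokenizer code on the Hub, so the config,
# tokenizer, and model loads generally all need trust_remote_code=True.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_path = "moonshotai/Kimi-K2-Instruct-0905"  # placeholder

config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype="auto",
)
```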

