Support for Cohere models #248

nyxkrage · 2024-09-15T11:08:49Z

🚀 The feature, motivation and pitch

I would love to see support for the Cohere models (https://huggingface.co/CohereForAI/c4ai-command-r-08-2024 and https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024).
As far as I can tell, the FusedLinearCrossEntropy kernel should just need to support scaling the logits by the logit_scale from the config, though I'm unsure whether the rest of the kernels would work as is.
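
To make the logit scaling concrete, here is a minimal pure-Python sketch of a cross-entropy loss that multiplies the logits by a scale factor before the softmax. It is not the FusedLinearCrossEntropy kernel: the function name `cross_entropy`, its signature, and the value `0.0625` are illustrative assumptions, and it assumes the scale is applied by plain multiplication.

```python
import math

def cross_entropy(logits, target, logit_scale=1.0):
    """Cross-entropy for a single token (hypothetical helper).

    The logits are multiplied by logit_scale before the log-softmax,
    which is the assumed meaning of "scaling the logits".
    """
    scaled = [x * logit_scale for x in logits]
    # Numerically stable log-sum-exp: subtract the max before exponentiating.
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return lse - scaled[target]

logits = [2.0, 0.5, -1.0]
loss_unscaled = cross_entropy(logits, target=0)
# 0.0625 is only an example value; the real one comes from the model config.
loss_scaled = cross_entropy(logits, target=0, logit_scale=0.0625)
```

A scale below 1 flattens the distribution, so the loss changes even though the raw logits are identical. A kernel that skips the scaling would therefore not match the reference loss.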

Thanks for the work

Alternatives

No response

Additional context

No response

nyxkrage · 2024-09-15T14:30:44Z

OK, after some experimentation and editing of the tests, the SwiGLU and LayerNorm kernels pass the correctness tests when compared with the reference implementations from the Cohere modeling code. RoPE is different, though: the tests don't pass, yet from the error the outputs seem to contain the same values. ~~I assume it's something with how Cohere calculates the RoPE in float32 and downcasts after.~~ Seeing the comment on the rotate_half function in the Cohere modeling code that was just added, it seems obvious: Cohere slices by odds and evens rather than splitting in half.
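
The difference between the two conventions can be sketched in plain Python on a single head-dimension vector. This is an illustration only, not the kernel or the modeling code: the names `rotate_half_split`, `rotate_half_interleaved`, and `to_split_layout` are hypothetical, and the interleaved variant assumes each (even, odd) pair is mapped to (-odd, even).

```python
def rotate_half_split(x):
    """Split-in-half convention: first half and second half form the pairs."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]

def rotate_half_interleaved(x):
    """Odds-and-evens convention: adjacent elements form the pairs."""
    out = []
    for even, odd in zip(x[0::2], x[1::2]):
        out.extend([-odd, even])
    return out

def to_split_layout(x):
    """Reorder an interleaved vector so evens come first, then odds."""
    return x[0::2] + x[1::2]

x = [1.0, 2.0, 3.0, 4.0]
split = rotate_half_split(x)              # [-3.0, -4.0, 1.0, 2.0]
interleaved = rotate_half_interleaved(x)  # [-2.0, 1.0, -4.0, 3.0]

# The two conventions agree up to a permutation of the head dimension,
# which would explain seeing the "same values" in a different order.
same_up_to_permutation = (
    rotate_half_split(to_split_layout(x)) == to_split_layout(interleaved)
)
```

On the same input the two functions produce different outputs, so a kernel built for one convention fails an element-wise comparison against a reference that uses the other.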
