OK, after some experimentation and editing of the tests, the SwiGLU and LayerNorm kernels pass the correctness tests when compared with the reference implementations from the Cohere modeling code. RoPE is a different story: the tests don't pass, although from the error output it looks like the values are the same. My first guess was that it had something to do with Cohere calculating RoPE in float32 and downcasting afterwards. Having now seen the comment that was just added on the rotate_half function in the Cohere modeling code, the cause seems obvious: Cohere slices by odds and evens rather than splitting in half.
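To illustrate the difference, here is a minimal, framework-free sketch of the two rotation conventions on a single 1-D vector. The function names `rotate_half_split` and `rotate_half_interleaved` are made up for this example and are not the names used in either code base. The real implementations operate on the last dimension of a tensor, so treat this as an approximation of the idea rather than the actual kernels.

```python
def rotate_half_split(x):
    """Split-in-half convention: pair element i with element i + n/2."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    return [-v for v in x2] + list(x1)


def rotate_half_interleaved(x):
    """Odds-and-evens convention: pair element 2i with element 2i + 1."""
    evens, odds = x[0::2], x[1::2]
    out = []
    for a, b in zip(evens, odds):
        # Each adjacent pair (a, b) becomes (-b, a).
        out.extend([-b, a])
    return out


x = [1.0, 2.0, 3.0, 4.0]
print(rotate_half_split(x))        # [-3.0, -4.0, 1.0, 2.0]
print(rotate_half_interleaved(x))  # [-2.0, 1.0, -4.0, 3.0]
```

Both outputs contain the same magnitudes in different positions and with different pairings, so a kernel that assumes the split-in-half layout would not be expected to match a reference that uses the odds-and-evens layout.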
🚀 The feature, motivation and pitch
I would love to see support for the Cohere models. (https://huggingface.co/CohereForAI/c4ai-command-r-08-2024 & https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024)
As far as I can tell, the FusedLinearCrossEntropy kernel should just need to support scaling the logits by the logit_scale from the config, though I'm unsure whether the rest of the kernels would work as is.
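For reference, this is roughly what I mean by scaling the logits, as a plain-Python sketch for a single token. The function name `cross_entropy_with_logit_scale` is hypothetical, and the scale value used below is only illustrative. A fused kernel would apply the same multiplication inside the chunked linear + loss computation, and would also need to account for the scale in the backward pass, which this sketch does not cover.

```python
import math


def cross_entropy_with_logit_scale(logits, target, logit_scale=1.0):
    """Cross-entropy for one token, with logits multiplied by logit_scale first."""
    scaled = [z * logit_scale for z in logits]
    # Numerically stable log-sum-exp over the scaled logits.
    m = max(scaled)
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return lse - scaled[target]


logits = [4.0, 0.0, -2.0]
unscaled_loss = cross_entropy_with_logit_scale(logits, target=0)
scaled_loss = cross_entropy_with_logit_scale(logits, target=0, logit_scale=0.0625)
print(unscaled_loss, scaled_loss)
```

With logit_scale=1.0 this reduces to ordinary cross-entropy, so existing models would be unaffected if the scale defaulted to 1.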
Thanks for the work
Alternatives
No response
Additional context
No response