
[REQUEST] Support for the new Command-r7b #703

Closed · 3 tasks done · ciprianveg opened this issue Dec 22, 2024 · 7 comments

@ciprianveg
Problem

Hello, could we please get support for the 128k-context Command-r7b? I would like to use it both for fast tool selection based on the user prompt, before calling its bigger Command-r sibling, and as a draft model to accelerate Command-r or Command-r Plus.

Solution

An EXL2 quantization supporting the full 128k context that can also be used as a draft model.

Alternatives

No response

Explanation

Faster Command-r generation and almost instant tool selection.

Examples

No response

Additional context

No response

Acknowledgements

  • I have looked for similar requests before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will make my requests politely.
@turboderp
Member

I've added support in the dev branch, or at least an attempt at it. It appears the architecture is identical to Cohere with the exception of SWA on all but every 4th layer.

It seems to be coherent at least up to the model's native 8k context limit. Although the readme mentions a 128k context length, I don't see any hint of that in the model's config or in the HF implementation. Is it supposed to use YaRN?

@ciprianveg
Author

Hi, thank you for your work on this. Could it just be a config default-value issue, as it was for the previous Command-r? https://huggingface.co/CohereForAI/c4ai-command-r-v01/discussions/12

@turboderp
Member

It might be. You can always try giving it a longer max_seq_len to override the default. I tested it a bit as a plain instruct model, using the old Cohere template, and it does eventually turn incoherent, but I'm not sure it's supposed to be used that way; it might be fine producing tool calls from a 100k prompt.

@ciprianveg
Author

I don't know how to build ExLlama for Windows from the dev branch to be able to test... I am using it via tabbyAPI...
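For what it's worth, overriding the context length through tabbyAPI does not require building anything; it is set in tabbyAPI's config.yml. A rough sketch of the relevant section follows. The folder name is hypothetical, and exact key names vary between tabbyAPI versions, so check the sample config.yml shipped with your install:

```yaml
model:
  model_dir: models
  model_name: command-r7b-exl2   # hypothetical folder holding the EXL2 quantization
  max_seq_len: 131072            # override the model's default context, as suggested above
  draft:
    draft_model_name: command-r7b-exl2  # optionally reuse the small model as a draft
```

With a setup along these lines, the 128k override can be tested from tabbyAPI without a local dev-branch build.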

@turboderp
Member

There should be a new release shortly.

@turboderp
Member

0.2.7 is released now if you missed it. Feel free to open another issue if something is still broken.

@ciprianveg
Author

Thank you! I will test it and let you know if there are any issues!
