Description
Hey, thank you so much for the great model and this repo!
Would you be willing to add support for this chat format to llama-cpp-python, so that we can use function calling (and JSON mode) through its OpenAI-compatible server?
Right now, llama-cpp-python offers the only OpenAI-compatible server with constrained/grammar-based sampling for CPU that I am aware of. It has been very convenient to use with the functionary models, as it is plug-and-play with the openai client and very reliable thanks to the grammar sampling.
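For context, here is roughly what that plug-and-play usage looks like today: the standard openai client pointed at a locally running llama-cpp-python server. The base URL, model name, and tool schema below are just placeholders for my setup, not anything prescribed by the library:

```python
from openai import OpenAI

# The llama-cpp-python server listens on port 8000 by default;
# it ignores the API key, but the client requires one to be set.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-no-key-required",
)

response = client.chat.completions.create(
    model="functionary",  # whichever model the server was started with
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # placeholder tool for illustration
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)
```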
Besides functionary, the library already supports a format called chatml-function-calling, which might be close enough to the Hermes format that it could simply be adapted instead of writing something from scratch.
All that would need to be added to the library is a handler along those lines; a rough sketch follows.
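To make the request concrete, here is a sketch of what such a handler could look like, modeled on how the existing chatml-function-calling handler is registered in llama_cpp.llama_chat_format. The handler name, the exact keyword arguments, and the Hermes tag details are my assumptions, and the body is only an outline of the steps, not working code:

```python
from llama_cpp import llama_chat_format

# "hermes-function-calling" is a hypothetical name for this format;
# registration itself uses the same decorator as the existing handlers.
@llama_chat_format.register_chat_completion_handler("hermes-function-calling")
def hermes_function_calling_handler(llama, messages, tools=None, tool_choice=None, **kwargs):
    # 1. Render the Hermes prompt: a system message carrying the tool JSON
    #    schemas inside <tools>...</tools>, followed by the chat turns in
    #    ChatML-style <|im_start|>/<|im_end|> blocks.
    # 2. Build a GBNF grammar that constrains the model to emit either plain
    #    text or a valid <tool_call>{...}</tool_call> JSON object.
    # 3. Call llama.create_completion(...) with that grammar and parse any
    #    <tool_call> blocks back into OpenAI-style tool_calls.
    ...
```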
Thanks!