
Improve model chat template handling by parsing out the end of output token automatically #6

Open
crimson-knight opened this issue Dec 19, 2024 · 0 comments

@crimson-knight (Owner)

Models use special tokens to designate boundaries within prompts. Most models have a special "end of generation" token that marks the end of their generated output.

In llama models this is <|eot_id|>, and it must be specified on the model class as follows:

model = Llamero::BaseModel.new(model_name: "meta-llama-3-8b-instruct-Q6_K.gguf", chat_template_end_of_generation_token: "<|eot_id|>")

Without this specification, the structured response will fail to parse. It's also inconvenient to have to look up this token for each model and configure it by hand.
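A minimal sketch of what automatic detection could look like, assuming the chat template text is available when the model is loaded. The candidate list and the detect_end_of_generation_token helper below are hypothetical names for illustration only, not part of Llamero's current API:

# Hypothetical sketch: scan the chat template for a known end-of-generation
# token instead of requiring the caller to pass it explicitly.
TOKEN_CANDIDATES = ["<|eot_id|>", "<|im_end|>", "<|end|>", "<|endoftext|>", "</s>"]

def detect_end_of_generation_token(chat_template : String) : String?
  # Return the first known token that appears in the template, or nil if none match.
  TOKEN_CANDIDATES.find { |token| chat_template.includes?(token) }
end

# Example with a llama-3 style template: "<|eot_id|>" would be detected and
# could serve as the default when chat_template_end_of_generation_token is omitted.
template = "<|start_header_id|>assistant<|end_header_id|>\n\n{response}<|eot_id|>"
puts detect_end_of_generation_token(template) # => "<|eot_id|>"

If no candidate matches, the model could fall back to requiring the explicit chat_template_end_of_generation_token argument, so existing behavior is preserved.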
