Commit
Use model->gguf_kv for loading the template instead of using the C API. (ggerganov#10868)

* Bump model_template to 16384 bytes to support larger chat templates.

* Use `model->gguf_kv` for efficiency.
dranger003 authored and arthw committed Dec 20, 2024
1 parent 0cb43f2 commit 4852a5b
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions src/llama.cpp
@@ -22660,15 +22660,15 @@ int32_t llama_chat_apply_template(
     std::string curr_tmpl(tmpl == nullptr ? "" : tmpl);
     if (tmpl == nullptr) {
         GGML_ASSERT(model != nullptr);
-        // load template from model
-        std::vector<char> model_template(2048, 0); // longest known template is about 1200 bytes
-        std::string template_key = "tokenizer.chat_template";
-        int32_t res = llama_model_meta_val_str(model, template_key.c_str(), model_template.data(), model_template.size());
-        if (res < 0) {
+
+        // load template from model, if available
+        const auto & it = model->gguf_kv.find("tokenizer.chat_template");
+        if (it != model->gguf_kv.end() && it->second.size() > 0) {
+            curr_tmpl = it->second;
+        }
+        else {
             // worst case: there is no information about template, we will use chatml by default
-            curr_tmpl = "chatml"; // see llama_chat_apply_template_internal
-        } else {
-            curr_tmpl = std::string(model_template.data(), model_template.size());
+            curr_tmpl = "chatml"; // see llama_chat_apply_template_internal
         }
     }
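
The motivation for the change can be illustrated with a small standalone sketch. It is not the llama.cpp code: `kv_map`, `template_via_buffer`, and `template_via_map` are hypothetical names, and the metadata store is assumed to behave like a `std::unordered_map<std::string, std::string>`, which is what the use of `find()`, `end()`, and `it->second` in the diff suggests. The first function mimics the old fixed-buffer path in simplified form (using `snprintf`), the second mirrors the new direct lookup.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the model's key/value metadata (assumption:
// a string-to-string map, as the diff's find()/end()/second usage implies).
using kv_map = std::unordered_map<std::string, std::string>;

// Simplified sketch of the old approach: copy the value into a fixed-size
// buffer. Any value longer than buf_size - 1 bytes is silently truncated.
std::string template_via_buffer(const kv_map & kv, std::size_t buf_size) {
    const auto it = kv.find("tokenizer.chat_template");
    if (it == kv.end()) {
        return "chatml"; // no template stored: fall back to chatml
    }
    std::vector<char> buf(buf_size, 0);
    std::snprintf(buf.data(), buf.size(), "%s", it->second.c_str());
    return std::string(buf.data()); // reads up to the first NUL byte
}

// Sketch of the new approach: take the string straight from the map,
// so the template length is not limited by any buffer size.
std::string template_via_map(const kv_map & kv) {
    const auto it = kv.find("tokenizer.chat_template");
    if (it != kv.end() && !it->second.empty()) {
        return it->second;
    }
    return "chatml"; // missing or empty template: fall back to chatml
}
```

With a 5000-byte template stored under `tokenizer.chat_template`, `template_via_buffer(kv, 2048)` would return only the first 2047 bytes, while `template_via_map(kv)` returns the full string. Both fall back to `"chatml"` when the key is absent.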
