+/// A performance-friendly wrapper around [LlamaModel::get_chat_template] which is then
+/// fed into [LlamaModel::apply_chat_template] to convert a list of messages into an LLM
+/// prompt. Internally the template is stored as a CString to avoid round-trip conversions
+/// within the FFI.
+#[derive(Eq, PartialEq, Clone, PartialOrd, Ord, Hash)]
+pub struct LlamaChatTemplate(CString);
+
+impl LlamaChatTemplate {
+    /// Create a new template from a string. This can either be the name of a llama.cpp [chat template](https://github.com/ggerganov/llama.cpp/blob/8a8c4ceb6050bd9392609114ca56ae6d26f5b8f5/src/llama-chat.cpp#L27-L61)
+    /// like "chatml" or "llama3" or an actual Jinja template for llama.cpp to interpret.
+            Err(InternalChatTemplateError::RetryWithLargerBuffer(unexpected_len)) => panic!("Was told that the template length was {actual_len} but now it's {unexpected_len}"),
+        }
+    }
 }

 /// Loads a model from a file.
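
For orientation, here is a minimal sketch of constructing the new wrapper type. It assumes `LlamaChatTemplate::new` takes a `&str` and returns a `Result` that fails only if the string contains an interior NUL byte (since the template is stored as a CString); the `llama_cpp_2::model` import path is likewise an assumption, not part of this diff:

```rust
use llama_cpp_2::model::LlamaChatTemplate;

fn main() {
    // A named llama.cpp template...
    let _chatml = LlamaChatTemplate::new("chatml").expect("no interior NUL bytes");

    // ...or a full Jinja template string for llama.cpp to interpret.
    let _custom = LlamaChatTemplate::new(
        "{% for m in messages %}{{ m.role }}: {{ m.content }}\n{% endfor %}",
    )
    .expect("no interior NUL bytes");
}
```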
@@ -526,15 +607,25 @@ impl LlamaModel {
     /// Apply the model's chat template to some messages.
     /// See https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
     ///
-    /// `tmpl` of None means to use the default template provided by llama.cpp for the model
+    /// Unlike llama.cpp's own apply_chat_template, which silently falls back to the ChatML template when
+    /// given a null pointer for the template, this requires an explicit template to be specified. If you
+    /// want to use "chatml", just pass `LlamaChatTemplate::new("chatml")`, or any other template name or
+    /// template string.
+    ///
+    /// Use [Self::get_chat_template] to retrieve the template baked into the model (this is the preferred
+    /// mechanism, as using the wrong chat template can produce very unexpected responses from the LLM).
+    ///
+    /// You probably want to set `add_ass` to true so that the generated prompt ends with the opening tag
+    /// of the assistant. If you fail to leave this hanging chat tag, the model will likely generate one
+    /// into the output itself, and the output may contain other unexpected content as well.
     ///
     /// # Errors
     /// There are many ways this can fail. See [`ApplyChatTemplateError`] for more information.
     #[tracing::instrument(skip_all)]
     pub fn apply_chat_template(
         &self,
-        tmpl: Option<String>,
-        chat: Vec<LlamaChatMessage>,
+        tmpl: &LlamaChatTemplate,
+        chat: &[LlamaChatMessage],
         add_ass: bool,
     ) -> Result<String, ApplyChatTemplateError> {
         // Buffer is twice the length of messages per their recommendation
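
To show how the changed signature is meant to be called, here is a hedged usage sketch. It assumes `get_chat_template` now takes no arguments and returns a `Result` carrying a `LlamaChatTemplate`, that `LlamaChatMessage::new(role, content)` takes owned `String`s and returns a `Result`, and that the relevant error types implement `std::error::Error`; the module paths and the `build_prompt` helper are illustrative, not part of this diff:

```rust
use llama_cpp_2::model::{LlamaChatMessage, LlamaChatTemplate, LlamaModel};

fn build_prompt(model: &LlamaModel) -> Result<String, Box<dyn std::error::Error>> {
    // Prefer the template baked into the model; fall back to the named "chatml" template.
    let template = model
        .get_chat_template()
        .unwrap_or_else(|_| LlamaChatTemplate::new("chatml").expect("\"chatml\" is a valid name"));

    let chat = vec![
        LlamaChatMessage::new("system".to_string(), "You are a helpful assistant.".to_string())?,
        LlamaChatMessage::new("user".to_string(), "What is the capital of France?".to_string())?,
    ];

    // add_ass = true leaves the prompt hanging on the assistant's opening tag,
    // so the model's next tokens are generated as the assistant's reply.
    let prompt = model.apply_chat_template(&template, &chat, true)?;
    Ok(prompt)
}
```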