- Clear previous chat messages when `LLMInference::load_model` is called
is called - Allow rendering non-ASCII characters on the chat interface generated by the LLMs/SLMs
- Show token generation speed (in tokens/second) for the latest message in the chat interface (#1)