Releases: mistralai/mistral-common
Patch Release v1.5.1
What's Changed
- [From model] Make from model much stricter by @patrickvonplaten in #65
Full Changelog: v1.5.0...v1.5.1
1.5.0 - Mistral Tokenizer v7 (new System Prompt + Fn calling)
Mistral's newest tokenizer has two major improvements:
System prompt
Similar to other tokenization schemes, the system prompt is now treated as a "normal" message encapsulated by [SYSTEM_PROMPT] ... [/SYSTEM_PROMPT].
E.g.
from mistral_common.protocol.instruct.messages import (
    UserMessage,
    SystemMessage,
    AssistantMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
# Load Mistral tokenizer
tokenizer = MistralTokenizer.v7()
# Tokenize a list of messages
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            SystemMessage(content="You are a funny AI assistant. Always make jokes."),
            UserMessage(content="What's the weather like today in Paris"),
        ],
        model="joker",
    )
)
tokens, text = tokenized.tokens, tokenized.text
print(text)
# <s>[SYSTEM_PROMPT]▁You▁are▁a▁funny▁AI▁assistant.▁Always▁make▁jokes.[/SYSTEM_PROMPT][INST]▁What's▁the▁weather▁like▁today▁in▁Paris[/INST]
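To make the v7 layout concrete, here is a minimal sketch that rebuilds the printed text above by hand. The helpers `sp_segment` and `render_v7_chat` are hypothetical and not part of mistral_common: they only mimic how the rendered string looks (each text segment starts with "▁" and every space is shown as "▁") and do no real tokenization.

```python
SPIECE_UNDERLINE = "\u2581"  # "▁": SentencePiece's visible marker for a space


def sp_segment(text: str) -> str:
    """Render a text segment as it appears in the printed output:
    a leading "▁", then the text with every space replaced by "▁".
    Hypothetical helper; real tokenization is done by the tokenizer model."""
    return SPIECE_UNDERLINE + text.replace(" ", SPIECE_UNDERLINE)


def render_v7_chat(system: str, user: str) -> str:
    """Hypothetical sketch of the v7 layout for one system + one user message."""
    return (
        "<s>"
        + "[SYSTEM_PROMPT]" + sp_segment(system) + "[/SYSTEM_PROMPT]"
        + "[INST]" + sp_segment(user) + "[/INST]"
    )


text = render_v7_chat(
    "You are a funny AI assistant. Always make jokes.",
    "What's the weather like today in Paris",
)
print(text)
```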
Improve function calling
A new [TOOL_CONTENT] token has been added which, if the model is trained with it correctly, should improve the accuracy of function calling.
from mistral_common.protocol.instruct.messages import (
    UserMessage,
    SystemMessage,
    AssistantMessage,
    ToolMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import (
    Function,
    Tool,
)
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
# Load Mistral tokenizer
tokenizer = MistralTokenizer.v7()
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters={
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                )
            )
        ],
        messages=[
            UserMessage(content="What's the weather like today in Paris"),
            AssistantMessage(content="", tool_calls=[
                {
                    "id": "bbc5b7ede",
                    "type": "function",
                    "function": {
                        "name": "weather",
                        "arguments": '{"location": "Paris", "format": "celsius"}',
                    },
                }
            ]),
            ToolMessage(content="24 degrees celsius", tool_call_id="bbc5b7ede"),
        ],
        model="joker",
    )
)
tokens, text = tokenized.tokens, tokenized.text
# Print the tokenized text
print(text)
# <s>[AVAILABLE_TOOLS]▁[{"type":▁"function",▁"function":▁{"name":▁"get_current_weather",▁"description":▁"Get▁the▁current▁weather",▁"parameters":▁{"type":▁"object",▁"properties":▁{"location":▁{"type":▁"string",▁"description":▁"The▁city▁and▁state,▁e.g.▁San▁Francisco,▁CA"},▁"format":▁{"type":▁"string",▁"enum":▁["celsius",▁"fahrenheit"],▁"description":▁"The▁temperature▁unit▁to▁use.▁Infer▁this▁from▁the▁users▁location."}},▁"required":▁["location",▁"format"]}}}][/AVAILABLE_TOOLS][INST]▁What's▁the▁weather▁like▁today▁in▁Paris[/INST][TOOL_CALLS]▁[{"name":▁"weather",▁"arguments":▁{"location":▁"Paris",▁"format":▁"celsius"},▁"id":▁"bbc5b7ede"}]</s>[TOOL_RESULTS]▁bbc5b7ede[TOOL_CONTENT]▁24▁degrees▁celsius[/TOOL_RESULTS]
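The end of the printed text shows how a tool call and its result are laid out. As a rough sketch, the hypothetical helpers below (not part of mistral_common) reproduce just those two segments from the example's values: the call is serialized with `json.dumps`, and the call id and the tool output sit on either side of the new [TOOL_CONTENT] token. Note that the `arguments` string from the request is rendered as a parsed JSON object.

```python
import json

SPIECE_UNDERLINE = "\u2581"  # "▁"


def sp_segment(text: str) -> str:
    # Leading "▁" plus "▁" for every space, matching the printed output.
    return SPIECE_UNDERLINE + text.replace(" ", SPIECE_UNDERLINE)


def render_tool_call(name: str, arguments: str, call_id: str) -> str:
    # The arguments arrive as a JSON string but are rendered as a JSON object.
    call = {"name": name, "arguments": json.loads(arguments), "id": call_id}
    return "[TOOL_CALLS]" + sp_segment(json.dumps([call])) + "</s>"


def render_tool_result(call_id: str, content: str) -> str:
    # [TOOL_CONTENT] separates the call id from the tool's output.
    return (
        "[TOOL_RESULTS]" + sp_segment(call_id)
        + "[TOOL_CONTENT]" + sp_segment(content)
        + "[/TOOL_RESULTS]"
    )


call = render_tool_call("weather", '{"location": "Paris", "format": "celsius"}', "bbc5b7ede")
result = render_tool_result("bbc5b7ede", "24 degrees celsius")
print(call + result)
```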
Patch release - v1.4.4
Make sure that broken cv2 installations in user environments (which, sadly, happen more often than not) don't prevent users from using text-only models.
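One general way to keep a broken optional dependency from breaking unrelated code paths is to guard the import and fail only when the dependency is actually needed. The sketch below illustrates that pattern; it is not the code from #58, and `require_cv2` is a hypothetical helper.

```python
from types import ModuleType
from typing import Optional, Tuple


def _try_import_cv2() -> Tuple[Optional[ModuleType], Optional[BaseException]]:
    try:
        import cv2
    except Exception as exc:  # broad on purpose: broken installs can raise more than ImportError
        return None, exc
    return cv2, None


cv2, CV2_IMPORT_ERROR = _try_import_cv2()


def require_cv2() -> ModuleType:
    """Return cv2, or raise only when an image code path actually needs it."""
    if cv2 is None:
        raise ImportError(
            "cv2 is required for image processing but could not be imported"
        ) from CV2_IMPORT_ERROR
    return cv2
```

Text-only code never calls `require_cv2`, so a broken cv2 install only surfaces when an image is processed.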
What's Changed
- [cv2] Make sure broken environments don't lead to errors by @patrickvonplaten in #58
Full Changelog: v1.4.3...v1.4.4
Patch release v1.4.3 - Make cv2 install optional
As per the discussion in vllm-project/vllm#8650, cv2 is now an optional dependency.
What's Changed
- [Error Message] Improve error message by @patrickvonplaten in #54
- Make cv2 optional by @patrickvonplaten in #56
Full Changelog: v1.4.2...v1.4.3
Patch release v1.4.2
Send a user agent when downloading images from hosts that require one, e.g.:
from mistral_common.protocol.instruct.messages import (
    UserMessage,
    TextChunk,
    ImageURLChunk,
    ImageChunk,
)
from PIL import Image
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
tokenizer = MistralTokenizer.from_model("pixtral")
url_dog = "https://picsum.photos/id/237/200/300"
url_mountain = "https://picsum.photos/seed/picsum/200/300"
url1 = "https://upload.wikimedia.org/wikipedia/commons/d/da/2015_Kaczka_krzy%C5%BCowka_w_wodzie_%28samiec%29.jpg"
url2 = "https://upload.wikimedia.org/wikipedia/commons/7/77/002_The_lion_king_Snyggve_in_the_Serengeti_National_Park_Photo_by_Giles_Laurent.jpg"
# tokenize image urls and text
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            UserMessage(
                content=[
                    TextChunk(text="Can this animal"),
                    ImageURLChunk(image_url=url1),
                    TextChunk(text="live here?"),
                    ImageURLChunk(image_url=url2),
                ]
            )
        ],
        model="pixtral",
    )
)
tokens, text, images = tokenized.tokens, tokenized.text, tokenized.images
# Count the number of tokens
print("# tokens", len(tokens))
print("# images", len(images))
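For illustration, here is a minimal standard-library sketch of the idea behind this fix: attaching a User-Agent header to the image download request. The header value and the helper names are assumptions; #51 may implement this differently.

```python
import urllib.request

# Hypothetical User-Agent value; some image hosts reject requests without one.
USER_AGENT = "mistral-common-example/0.1"


def build_image_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    # urllib stores header names via str.capitalize(), i.e. "User-agent".
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_image_bytes(url: str, timeout: float = 10.0) -> bytes:
    # Performs the network request; not executed in this sketch.
    with urllib.request.urlopen(build_image_request(url), timeout=timeout) as response:
        return response.read()


req = build_image_request("https://picsum.photos/id/237/200/300")
print(req.get_header("User-agent"))
```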
What's Changed
- typo: Correct "recieved" to "received" in MistralRequestValidator by @CharlesCNorton in #21
- Add headers to download images by @ywang96 in #51
- [Tests] Fix expected value by @patrickvonplaten in #50
- 1.4.2 release by @patrickvonplaten in #53
New Contributors
- @CharlesCNorton made their first contribution in #21
- @ywang96 made their first contribution in #51
Full Changelog: v1.4.1...v1.4.2
Patch release v1.4.1 - Use cv2 resize instead of PIL
Resizing images with cv2 gives significantly better results than PIL when running Pixtral inference, so this patch release switches image resizing to cv2, as shown in bae45b2.
v1.4.0 - Mistral common goes 🖼️
Pixtral is out!
Mistral common has image support! You can now pass images and URLs alongside text in the user message.
pip install --upgrade mistral_common
Images
You can encode images as follows:
from mistral_common.protocol.instruct.messages import (
    UserMessage,
    TextChunk,
    ImageURLChunk,
    ImageChunk,
)
from PIL import Image
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
tokenizer = MistralTokenizer.from_model("pixtral")
image = Image.new('RGB', (64, 64))
# tokenize images and text
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            UserMessage(
                content=[
                    TextChunk(text="Describe this image"),
                    ImageChunk(image=image),
                ]
            )
        ],
        model="pixtral",
    )
)
tokens, text, images = tokenized.tokens, tokenized.text, tokenized.images
# Count the number of tokens
print("# tokens", len(tokens))
print("# images", len(images))
Image URLs
You can also pass image URLs, which will be downloaded automatically:
url_dog = "https://picsum.photos/id/237/200/300"
url_mountain = "https://picsum.photos/seed/picsum/200/300"
# tokenize image urls and text
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            UserMessage(
                content=[
                    TextChunk(text="Can this animal"),
                    ImageURLChunk(image_url=url_dog),
                    TextChunk(text="live here?"),
                    ImageURLChunk(image_url=url_mountain),
                ]
            )
        ],
        model="pixtral",
    )
)
tokens, text, images = tokenized.tokens, tokenized.text, tokenized.images
# Count the number of tokens
print("# tokens", len(tokens))
print("# images", len(images))
ImageData
You can also pass images encoded as base64:
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            UserMessage(
                content=[
                    TextChunk(text="What is this?"),
                    ImageURLChunk(image_url="...
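A base64-encoded image is typically passed as a data URL. The sketch below builds one from raw bytes with the standard library, assuming `image_url` accepts a URL of the form `data:<mime type>;base64,<data>`; `to_data_url` and `from_data_url` are hypothetical helpers, not part of mistral_common.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def from_data_url(data_url: str) -> bytes:
    """Inverse of to_data_url: strip the header and decode the payload."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(payload)


url = to_data_url(b"abc")
print(url)
```

Under that assumption, an image file's contents `png_bytes` would be passed as `ImageURLChunk(image_url=to_data_url(png_bytes))`.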
Patch release 1.3.4 - Loosen pydantic requirement
In this patch release, the pydantic requirement is loosened to <= 3.0.0, as reported in multiple issues, e.g.:
Tekkenizer
The new Tekkenizer class is based on OpenAI's tiktoken and supports the new Mistral-Nemo model.
The Tekkenizer is always used with tokenizer version v3 or higher.
Examples:
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
tokenizer = MistralTokenizer.v3(is_tekken=True)
tokenizer = MistralTokenizer.from_model("...")
Function calling (just like before)
# Import needed packages:
from mistral_common.protocol.instruct.messages import (
UserMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import (
Function,
Tool,
)
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
# Load Mistral tokenizer
model_name = "..."
tokenizer = MistralTokenizer.from_model(model_name)
# Tokenize a list of messages
tokenized = tokenizer.encode_chat_completion(
ChatCompletionRequest(
tools=[
Tool(
function=Function(
name="get_current_weather",
description="Get the current weather",
parameters={
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
)
)
],
messages=[
UserMessage(content="What's the weather like today in Paris"),
],
model=model_name,
)
)
tokens, text = tokenized.tokens, tokenized.text
# Count the number of tokens
print(len(tokens))
What's Changed
- v1.3.0 by @patrickvonplaten in #27
Full Changelog: v1.3.0...v1.3.1
Patch release: Fix FIM tokenizer
As reported in https://huggingface.co/mistralai/Codestral-22B-v0.1/discussions/10, the wrong tokenizer was used for FIM. This patch release fixes this, so the following now works correctly:
from mistral_common.tokens.instruct.request import FIMRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
tokenizer = MistralTokenizer.v3()
tokenized = tokenizer.encode_fim(FIMRequest(prompt="def f(", suffix="return a + b"))
assert tokenized.text == "<s>[SUFFIX]▁return▁a▁+▁b[PREFIX]▁def▁f("
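The expected string shows the FIM layout for this tokenizer: the suffix comes first, after [SUFFIX], followed by the prompt after [PREFIX]. Here is a minimal sketch that reproduces that string by hand; `sp_segment` and `render_fim` are hypothetical helpers that only mimic the rendered text, not mistral_common's implementation.

```python
SPIECE_UNDERLINE = "\u2581"  # "▁"


def sp_segment(text: str) -> str:
    # Leading "▁" plus "▁" for every space, as in the expected string.
    return SPIECE_UNDERLINE + text.replace(" ", SPIECE_UNDERLINE)


def render_fim(prompt: str, suffix: str) -> str:
    # Suffix-first ordering, matching the expected tokenized text.
    return "<s>[SUFFIX]" + sp_segment(suffix) + "[PREFIX]" + sp_segment(prompt)


text = render_fim(prompt="def f(", suffix="return a + b")
print(text)
```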