Releases: mistralai/mistral-common

Patch Release v1.5.1

20 Nov 18:07

What's Changed

Full Changelog: v1.5.0...v1.5.1

1.5.0 - Mistral Tokenizer v7 (new System Prompt + Fn calling)

15 Nov 19:32

Mistral's newest tokenizer has two major improvements:

System prompt

Similar to other tokenization schemes, the system prompt is now treated as a "normal" message encapsulated by [SYSTEM_PROMPT] ... [/SYSTEM_PROMPT].

E.g.

from mistral_common.protocol.instruct.messages import (
    UserMessage,
    SystemMessage,
    AssistantMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load Mistral tokenizer
tokenizer = MistralTokenizer.v7()

# Tokenize a list of messages
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            SystemMessage(content="You are a funny AI assistant. Always make jokes."),
            UserMessage(content="What's the weather like today in Paris"),
        ],
        model="joker",
    )
)
tokens, text = tokenized.tokens, tokenized.text

print(text)
# <s>[SYSTEM_PROMPT]▁You▁are▁a▁funny▁AI▁assistant.▁Always▁make▁jokes.[/SYSTEM_PROMPT][INST]▁What's▁the▁weather▁like▁today▁in▁Paris[/INST]
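
To make the new layout easier to see, here is a minimal sketch that rebuilds the printed text above by hand. It is only an illustration of the string format, not how mistral_common tokenizes: `render_v7_prompt` is a hypothetical helper, and the rule "prefix each message with ▁ and replace spaces with ▁" is an assumption read off the printed output.

```python
# Illustration only: rebuilds the *printed* v7 text for one system message
# followed by one user message. The real tokenizer works on token ids.
SPACE = "\u2581"  # "▁", the whitespace marker that appears in the printed text


def render_v7_prompt(system: str, user: str) -> str:
    """Hypothetical helper (not part of mistral_common)."""

    def mark(text: str) -> str:
        # Assumption taken from the output above: a leading marker, then
        # every space replaced by the marker.
        return SPACE + text.replace(" ", SPACE)

    return (
        "<s>"
        f"[SYSTEM_PROMPT]{mark(system)}[/SYSTEM_PROMPT]"
        f"[INST]{mark(user)}[/INST]"
    )


rendered = render_v7_prompt(
    "You are a funny AI assistant. Always make jokes.",
    "What's the weather like today in Paris",
)
print(rendered)
```

The point to take away is that the system prompt now has its own pair of control tokens and sits in front of the first [INST] block instead of being folded into it.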

Improve function calling

A new [TOOL_CONTENT] control token is added. Models that are trained with it correctly should be more accurate at function calling.

from mistral_common.protocol.instruct.messages import (
    UserMessage,
    SystemMessage,
    AssistantMessage,
    ToolMessage
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import (
    Function,
    Tool,
)
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load Mistral tokenizer
tokenizer = MistralTokenizer.v7()

tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters={
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                )
            )
        ],
        messages=[
            UserMessage(content="What's the weather like today in Paris"),
            AssistantMessage(
                content="",
                tool_calls=[
                    {
                        "id": "bbc5b7ede",
                        "type": "function",
                        "function": {
                            "name": "weather",
                            "arguments": '{"location": "Paris", "format": "celsius"}',
                        },
                    }
                ],
            ),
            ToolMessage(content="24 degrees celsius", tool_call_id="bbc5b7ede"),
        ],
        model="joker",
    )
)
tokens, text = tokenized.tokens, tokenized.text

# Print the tokenized text
print(text)
# <s>[AVAILABLE_TOOLS]▁[{"type":▁"function",▁"function":▁{"name":▁"get_current_weather",▁"description":▁"Get▁the▁current▁weather",▁"parameters":▁{"type":▁"object",▁"properties":▁{"location":▁{"type":▁"string",▁"description":▁"The▁city▁and▁state,▁e.g.▁San▁Francisco,▁CA"},▁"format":▁{"type":▁"string",▁"enum":▁["celsius",▁"fahrenheit"],▁"description":▁"The▁temperature▁unit▁to▁use.▁Infer▁this▁from▁the▁users▁location."}},▁"required":▁["location",▁"format"]}}}][/AVAILABLE_TOOLS][INST]▁What's▁the▁weather▁like▁today▁in▁Paris[/INST][TOOL_CALLS]▁[{"name":▁"weather",▁"arguments":▁{"location":▁"Paris",▁"format":▁"celsius"},▁"id":▁"bbc5b7ede"}]</s>[TOOL_RESULTS]▁bbc5b7ede[TOOL_CONTENT]▁24▁degrees▁celsius[/TOOL_RESULTS]
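
The part that changes in v7 is the tool result segment at the end of that output, where [TOOL_CONTENT] separates the tool call id from the tool's answer. The sketch below rebuilds only that segment as a string. `render_tool_result` is a hypothetical helper, and the ▁ handling is an assumption copied from the printed text rather than the library's implementation.

```python
# Illustration only: rebuilds the printed [TOOL_RESULTS] segment shown above.
SPACE = "\u2581"  # "▁", the whitespace marker that appears in the printed text


def render_tool_result(tool_call_id: str, content: str) -> str:
    """Hypothetical helper (not part of mistral_common)."""

    def mark(text: str) -> str:
        # Assumption from the printed output: leading marker, spaces replaced.
        return SPACE + text.replace(" ", SPACE)

    return (
        f"[TOOL_RESULTS]{mark(tool_call_id)}"
        f"[TOOL_CONTENT]{mark(content)}"
        "[/TOOL_RESULTS]"
    )


segment = render_tool_result("bbc5b7ede", "24 degrees celsius")
print(segment)
```

Note that the id passed here has to be the same one used in the assistant's tool call ("bbc5b7ede" in the example), since that is what ties the result back to the call.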

Patch release - v1.4.4

29 Sep 13:02
21ee9f6

Make sure that broken cv2 installations in user environments (which sadly happen more often than not) don't prevent users from using text-only models.
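
One common way to get this behaviour is a guarded import that treats any failure while importing cv2 as "cv2 is not available". The following is a sketch of that general pattern under that assumption, not the code used in mistral_common. `try_import_cv2` and `require_cv2` are hypothetical names.

```python
from types import ModuleType
from typing import Optional


def try_import_cv2() -> Optional[ModuleType]:
    """Return the cv2 module, or None if it is missing or fails to import."""
    try:
        import cv2  # a broken install can fail here with more than ImportError
    except Exception:
        # Deliberately broad: the aim is that a bad cv2 install never blocks
        # code paths that only need text.
        return None
    return cv2


def require_cv2() -> ModuleType:
    """Call this only from image code paths, so text-only use never needs cv2."""
    module = try_import_cv2()
    if module is None:
        raise ImportError("cv2 is required for image inputs but could not be imported")
    return module


cv2_module = try_import_cv2()
print("cv2 available:", cv2_module is not None)
```

With this layout the import error surfaces only when an image is actually processed, rather than at package import time.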

What's Changed

Full Changelog: v1.4.3...v1.4.4

Patch release v1.4.3 - Make cv2 install optional

22 Sep 16:01
ce9ce79

As discussed in vllm-project/vllm#8650, cv2 is now an optional dependency.

What's Changed

Full Changelog: v1.4.2...v1.4.3

Patch release v1.4.2

18 Sep 12:46
992f4a0

A user agent is now sent when downloading images, since some hosts require one. E.g.:

from mistral_common.protocol.instruct.messages import (
    UserMessage,
    TextChunk,
    ImageURLChunk,
    ImageChunk,
)
from PIL import Image
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.from_model("pixtral")

url_dog = "https://picsum.photos/id/237/200/300"
url_mountain = "https://picsum.photos/seed/picsum/200/300"
url1 = "https://upload.wikimedia.org/wikipedia/commons/d/da/2015_Kaczka_krzy%C5%BCowka_w_wodzie_%28samiec%29.jpg"
url2 = "https://upload.wikimedia.org/wikipedia/commons/7/77/002_The_lion_king_Snyggve_in_the_Serengeti_National_Park_Photo_by_Giles_Laurent.jpg"


# tokenize image urls and text
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            UserMessage(
                content=[
                    TextChunk(text="Can this animal"),
                    ImageURLChunk(image_url=url1),
                    TextChunk(text="live here?"),
                    ImageURLChunk(image_url=url2),
                ]
            )
        ],
        model="pixtral",
    )
)
tokens, text, images = tokenized.tokens, tokenized.text, tokenized.images

# Count the number of tokens
print("# tokens", len(tokens))
print("# images", len(images))
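
For readers who download images themselves, the same idea can be sketched with the standard library: attach a User-Agent header to the request before fetching. This is an illustration under assumptions, not the library's download code. `build_image_request`, `download_image_bytes` and the user agent string "example-client/0.1" are all hypothetical.

```python
import urllib.request

USER_AGENT = "example-client/0.1"  # hypothetical value; use one that identifies your app


def build_image_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Create a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_image_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Fetch the raw image bytes (performs network I/O when called)."""
    with urllib.request.urlopen(build_image_request(url), timeout=timeout) as response:
        return response.read()


# Building the request does not touch the network.
request = build_image_request("https://picsum.photos/id/237/200/300")
print(request.get_header("User-agent"))
```

urllib stores header names in capitalized form, which is why the lookup uses "User-agent".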

What's Changed

Full Changelog: v1.4.1...v1.4.2

Patch release v1.4.1 - Use cv2 resize instead of PIL

17 Sep 08:44
bae45b2

cv2 resize gives significantly better results than PIL when running Pixtral in inference, hence this patch release resizes images using cv2, as shown here: bae45b2

v1.4.0 - Mistral common goes 🖼️

10 Sep 22:44
7b88116

Pixtral is out!

Mistral common has image support! You can now pass images and URLs alongside text into the user message.

pip install --upgrade mistral_common

Images

You can encode images as follows:

from mistral_common.protocol.instruct.messages import (
    UserMessage,
    TextChunk,
    ImageURLChunk,
    ImageChunk,
)
from PIL import Image
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.from_model("pixtral")

image = Image.new('RGB', (64, 64))

# tokenize images and text
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            UserMessage(
                content=[
                    TextChunk(text="Describe this image"),
                    ImageChunk(image=image),
                ]
            )
        ],
        model="pixtral",
    )
)
tokens, text, images = tokenized.tokens, tokenized.text, tokenized.images

# Count the number of tokens
print("# tokens", len(tokens))
print("# images", len(images))

Image URLs

You can pass image URLs, which will be downloaded automatically:

url_dog = "https://picsum.photos/id/237/200/300"
url_mountain = "https://picsum.photos/seed/picsum/200/300"

# tokenize image urls and text
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            UserMessage(
                content=[
                    TextChunk(text="Can this animal"),
                    ImageURLChunk(image_url=url_dog),
                    TextChunk(text="live here?"),
                    ImageURLChunk(image_url=url_mountain),
                ]
            )
        ],
        model="pixtral",
    )
)
tokens, text, images = tokenized.tokens, tokenized.text, tokenized.images

# Count the number of tokens
print("# tokens", len(tokens))
print("# images", len(images))

ImageData

You can also pass an image encoded as base64:

tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            UserMessage(
                content=[
                    TextChunk(text="What is this?"),
                    ImageURLChunk(image_url="data:image/jpeg;base64,/9j/4QDeRXhpZgAASUkqAAgAAAAGABIBAwABAAAAAQAAABoBBQABAAAAVgAAABsBBQABAAAAXgAAACgBAwABAAAAAgAAABMCAwABAAAAAQAAAGmHBAABAAAAZgAAAAAAAABIAAAAAQAAAEgAAAABAAAABwAAkAcABAAAADAyMTABkQcABAAAAAECAwCGkgcAFgAAAMAAAAAAoAcABAAAADAxMDABoAMAAQAAAP//AAACoAQAAQAAAMgAAAADoAQAAQAAACwBAAAAAAAAQVNDSUkAAABQaWNzdW0gSUQ6IDIzN//bAEMACAYGBwYFCAcHBwkJCAoMFA0MCwsMGRITDxQdGh8eHRocHCAkLicgIiwjHBwoNyksMDE0NDQfJzk9ODI8LjM0Mv/bAEMBCQkJDAsMGA0NGDIhHCEyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMv/CABEIASwAyAMBIgACEQEDEQH/xAAbAAACAwEBAQAAAAAAAAAAAAADBAACBQEGB//EABgBAQEBAQEAAAAAAAAAAAAAAAABAgME/9oADAMBAAIQAxAAAAHRMQ3DqCpzAk9FQU51SWMK6IelhFws0BAdGL9M4iHNAAkwWq3VhAEcgRf5/n9MfRgfPZZ76eDLXt1fHQ9aXxtz37fzUmX0S/nPT4329+S2BagNdDx+8+mycXU3ne3FuctszLlviecnbjOdhXs6c5bhLVgWvIV2cbkfUSfN5jfu/LYlNZtXh9Q3rUtLl0PS9saVjUr5zyTvxkuQDL9KcK0IFfWXq7lUTh6gJzpaluHTM2FSLVNXQ8zeX2k8XMaGWs6YvBWohISAVCY0cs9aJXty6bqkBt24DtoVZX4MBlC/eVJOQLeHpUvSkVeACcJQQ4woaZanVUTo0Xq6Ezy3MJB0lYWnenZSxSEgS0vVXEiB7Z7A1laMFqsKBNDKcGjJIGitwoOAMFROrBwMDBd7UJOQMTnaGcNgQzMC2ti6QulekG2chsbyta6+e0kGEqQZqCNlWPSYLYBMd6HZINGBeuDIE7oo6ItS3BGEHEfTqevUhJrOQNa5jAeUNWwoYGLpWcuXjEzQXF3caWMMj2ecGVawRQoYOO9TaNjPlhk7SYXVhas7A5ah1sG9mqzUmN+XqWnXnDrnqneWDJNigYrcIdcpVgNTTaXEvDpAscHKgwnFB/See9Rz1yEmN+R4O/o5UtaE72oQgbgKMQW43WBUNw1M3WUWldUqYVX844Ow0sYWxNIzemNeX59GwtPLmZHrLSTTVmTRxQJSdLr2hTTzXYZOt1T5h00qRYxwBBl9IHrcaxZqTOvTKPGzUTnTPKZnrPG9cHAqTealr0Gs8pAu16aLGP0dCCF7BsU5rvZ0n6es56amdJrd5Y8kKn0v5P1C2ng1D378kS9GX4OQUdey3G5dM+3eVY4um5qZPp+PWRwObSNwX4zcowKWXIquee8r9M8b0xlcZX6ZFS1YhRFNB2mtz6YWV7PMufPv7G7GPpE7jd1GbLydkSzUpPp+omyRAYwNdSvLCBfvxFW3V521I9PvYnq+PRdm981IGguqTNyigdAICFhQPGNSpRdBkHUPAFTwo38ftzMO46tcJ49Z67ye7x6FvniNIakU5c/g9VSiOxKKtCuQnNHohXSMZNzwzU9m1eMQ+gs6z839F69SXP62LNoDVGZvGimPbXEKA9CEw5rw/8QAKRAAAgIBAwMEAgMBAQAAAAAAAQIAAxEEEiEQEzEFFCJBFTIgIzAzQv/aAAgBAQABBQL+wRQcdoYGBMNLCUPc3G2zgOWFe/PM25NiCLWQWXAGAcnIPy3zeIOShmebGw0dSz44AOcKs7mIw+RqLF/iE4inEZd0VNkOIrAMRunbwe05i1Yhr47MKgQz7+MG3Acy
3UIs9/pwv5GjH5KqN6pVj8sgD+poT+RqMX1OpRV6pVZC6vPiIHQTumLc0N8OoIhulmp2B/V8Sz1K130mra1iwaDCy7W3WkknrmZm6bpmA9Eusqml9SVogVgcYHAIMwRNR6jXVL73ueaTSHUFKu0m0y5+f9dJrm05qtW9Hfar+pUVjVepWaiZ6Uad72op7S8gEhoa+4P5Y/wp1FtMe97IeqJuNFlVI37h5AGJu2n/ABFZMNY2YnHUQ9Mw5Kq877rPf27h6iM06hLT0xNvUKTFonZwGsIiNlNuS1LCbdn8agst8eIeqsVMAhM3TGYQAvcxNxZiSEbk1jYM8ixsOdxhHXJE7hIJ4z1MEx02mVjJtdeieXaVjl27riuYAG2beuOuemOuJiEYiylgob5Ole5mTC/bNulNY2tmY5I5Ccuvxm3hl/gD1BgnmADsBIwHcHxncGTwg/as/HAn0U6cEbeYRHXpjp5hgE89K/8AluxGQNLP0Hl8bF+Ko2IrjG7hR8XMzxvmYzTcZkY6/WckCeYpIh8rZFYRavlt32OeFmIQUHcbcH3TGQeJXLfM7bQgjqIJ9Y58Q8zxEMB43/GJ5KlV7Tut1ZRpWeHEqlnmoZt1Fdtsetqi3npyOhMyMffbDz9Tn+r7lRwzFtuk0L6skKYylYnC4yV4lo4X4x7rG0oXKE5PQCHw0MEqHF4BlfNZ61W8adNQk9syWX7So/VeSQIx6KxWM7P1RC5E3w9VP9Vh5q4usGHEEHmnNYfU3CMGtPbgGI7CMf4440yFnBHQj4mfVXNbH5f+tSP7B56aaz4vyft92KyY3nP8UX46etk6A87o0+q25sGHWPk9PPSuzbN5MEPhRHSY/gg3HsuqVbkPQQ8gdHXevgk9BB48FXxKWzCdoZhlHXDpMAwjpR/1yJ3MkjqpyPsxDw6c9Vh6acYDWb3boHn3DNN/2qRVDLvIhXonk8HPQnIZcdCIIelH6eXSosGrmzEPEH7nyPO2yLXqD0yRMxf2dcHM+s8/eOduZgQwI00+CFpzaAmbLKAj3gxrN3VP3UqYvbNZDA5mZXje6hxsIh8Zn0OJnnMB5oxtX+t7FDSrTe5R9NbSxbMpdK5YxYxYmIKuGqQi/QUmNorRF016mo4baI6wwTwIZtlDGCfVh4O5ugWHzNIm+86eoBEZ22YHtsxKAoVVYepabs2LaDDyCnGwwARxibuMwMRFcNPMKw4EyNzN10aXIwtndjC5iEshrcwrqAbk1NiW07G7pWd2C2fFiwyCmOmJyJvabzN03GBd0q0m8Lo9hBtVXuUT3VaRSyT+yIxjNmNia4EWFN0asr0zNxg5mQOmM/xpODXqiItjsgU797byQYF2n4Gbk3TaZZp0emwGm3uBgeo461iPUYR0Zt0UDOnWolSk4g2o2Vhs+AI21sAGZQFvxGIaepaXkecTiHqBK0zNomo0+B0roLShOxEtGWsGSy4SzM/9fEBWEsckZIHcYx+U1FGxyIQP4LKkXG2hZtSWaVHmn9OXPtq1j1VALp0adhFK10ztKG7ZI7YnELBQLGyXrm+th6o2UD5DHqBmDzpRldmQtQwKgI6c9skLT25yA+XnY2uK1M2xg8w8NeZ2gFtoKhVeaulrNMPJ6BZ4n3o/Cq+3jJ3T54IYQpvOxgvzAZSxKNgXsFNpZ8cbczacgWsTvnbdzcnZ1UbwJiVAGzSjsWsPiNsNgxv4LLMfJWcx13QZUFnwL9GB7zRz3mknvtIJ7/ST8hpIPUNHPyOjnqDUWW5mcqYTxSEZ6LdJVPyGkw+t0YP5DSmDXaWe90kOu0k99pBPfaKe80YnvNKZ7fS49tpRPa6cqdLpQBoNPj2mmz7PS59poVnt9JlvT6rJbobK52rBEoseUaGnZ7XR4Gl0UbQ6Yz2elydPoodNogo0ukM9lpZ7HS5bSaVCNJpCUbFrtwkaIfk37vxAczdEc4sxEwQUUTChc4hHxrHwIw2xYEUx61E2gztqY9STtLs//8QA
HREAAgICAwEAAAAAAAAAAAAAAAEREiAwAhAhUP/aAAgBAwEBPwHbYsWZZlmWwklsWmw30lukt86NK1JbERs47UQVI1cUR21oqxYPQsuSxgXHN4LLwlEonCevDwk8xgqVxjr/xAAdEQADAAIDAQEAAAAAAAAAAAAAARECEhAgMCFg/9oACAECAQE/AfXQ0RojRGiHgScrGkSGTu0aCxnGTftqjT8C36N+uXqyizNl5ZM25xfhsh/Sc4vwy7YPo2LIeXddH2jIyMjFwxpkZGRkZGUpSlNx5UpSlKU//8QAMRAAAgECBAQFBAIBBQAAAAAAAAERAiEQEiAxIjIzQQMwUWGBE3GRoSNSQARCYrHw/9oACAEBAAY/Ap2wZkLLRGHoS6i25Jc30X0IsL0LG+FiWiUoWHFo30WNsLlsOY3OxPY6lKL1lqjmO7OQ5S9LORyRU8pwtNF5JUk5TlIjG7gspE9kXpsQQc0eyLvyuGpoyeNZ+pNLlaLwRTSqqjNVh7IhbGakXnQ70mem6LuDiuyKeGnGKURsbkXTPfz3ke5xVs3x9EJUkojDby51Wxl2wtUS2LhHD17F3Bm3IRBHfDi0yRpt5ear4J7+RfysplppxsSz2WxLJt/gN9hvCC2Edicf/XEPzNxx/Y+whsY3qgicI8rufOCLYIbw98L4TjfXfGO2i3cqnlpEsPckmdezZda99DZV7vGKYOGWXUaqV7lS8Cl/S8Pmr9xOVUnezLafY7aLYyZs32ReqPux/wCnfirxP6Ve/oX0z3KPCj+JX+SdqFvovqkqWjJVsP6X8lDW6f8A2ZvFoyJbKo4ozf2XfVKN8YWEaJER6j0ZqW0S6r9jNVfyraqlgmv8BjqeqPUeF9crCdMGyFKtrzeTcsXJ0IW5GXRHl5iNMYImURmXnuBkvZdyzkujbGx3LZvIgvjJY2I9iG4PpqrhTFDmruPhwl4I9T/kXT0SvJq9TNTse7Kkq8niq0dqjiQx1Omauxxb4xW4HdnElV8H8cplrk/TcDpqwsteX1Hl+cPRnFfC+KRMotVY2/JNz2MsH1KOVnacLIsiHpXaMLs3w2xz0o4qDL4apOGtfgvWvwdRfgfEmVUVKmB0sjGdW5c2WO1Rbw5+4o8H8HF4HiJ/YfC6fgcOSZLtYbmb/a9V2ba7saKbbk+hxbFxNsbNixsVJ/sdL8jsTbHlSLshoii0exfFU1JscSREmxys2M9Pk3M9KtjJmaOSTlRLn4O+FyOwspvcu0Q0ba7iinMzhTOFQz+Sr4IkWVZjla+SZcYbk5rfciXJfMb2LJ/IlB3PDa9dewuA5TYZfYvmJEosX2LykK432OZfJepDWYVaJoT9yq199eSll3hylyRXZYuScpKgvU19jmZMlpOJM4Vc4mV0++lJ7FKpd2zc3LF2RmZmk50Xf7OFYdZM6lJ1UT9ZE/W/R1WdVnW/R9Twq5nfTx15V6lP86fuzron6tJznUR1EdQ5zqHVOsdGmS/hI6FJ0KTpUkPwaTpUnF4SOkh5eBlmqvsXof4LUn8t39y/go6aJ+ijpSdKlHS/Z03+Tl/ZDo/ZtjsjftgjbBSMasbCWVD4UcqNljYnuKxsKUKw7En/xAAmEAEAAgICAwEBAAIDAQEAAAABABEhMUFREGFxgZEgobHB8eHw/9oACAEBAAE/IV4EPV8wznMb4WQbE64n5DMWqj43c2zCCVLvdkVEL6lAtChMPJ3DMLLxMhGXGql7sMI6rUXJoi8J6NzLDPOUBfacMYWkM6IVXZqZjz1iFShUhaKq4Tw7lCmKs19hFKY8Nsd3XyblX+SzeBK95Q7LQ8Sl3WcCmXUaasNXP9S2wwptR7S1MD3LNtYgL/dwFu0sqgEAphTJg6UVZOMe/tzYK6YXZYRtC0NYRVQVWQzC0y4vmDeX1AdTYOhxLMR2hejMSwRerPEMoi/fFwjEi3/BGOzESBoggMVQaI+mIbFPcRZAiXfHh+3W6V5lNxAuutxDIYz4xHyP
+Ay1I+N+HZAi+rqA1H0zgY4I1+HHPtjbM3ZzLY3BXJwihEXFDf8AhjxR5V4GPnMsNolnSzGfD5n2RDnJlgjXDCrEI5pucH9S/wDDMqan5Klc1hg6GXr1GntlnUVmD6lHMWwtxBqQ1FumDgUDO4eiIm3A2zuU5fI2YjcDOWJMaQy6kTWwnCEu+N3KItoLdYq45v4Jt8HipTPDLa6lKF5gfCWS3NPBdkG8ErVQpw1+Sx8weRDPrmVjMWWJlg4dxd7exMQuI6t3AxKA8bgnCkOTQXMrM2xqY+QYIDbGKnqgD+mCH9kvMxs3L8WmGtHbF6sQitfrW5cizF8S1kC9xG/Xg+MiamlhHuXCnDUMNQFqci6HEQ5lnVjQD3IBvHwYHEVn1HbX/wAgFji+Iqu+vCEMGmbgKOoo1cTy5i8RM1/JzPpUFmq5iCzaUjZgwCoBxDOGy6ZboQwRge9EvSWYX7g+t9xBA59yzTiUD8czI/KflKsikzXf5FvEqsS0SGHyG6ZR3G1KzmMsOLZgU27lg5hVnEhWkI72CSuRiEzL4RHaVYK9XKV2kcg3FQeAlBY41M13HiZjvxcu1PSZ4mFRiqaY7lnuOpsNxQl4qUn/AMIhSwy0OiekspVwls36jsOIIL7g1dy9pkxMbnvnyN1T6qOfJdGZnCpkaxMBsvqZqqplRb9QD0o0Oa5l0hzASezFxCanJh6qDUzzuENGoe9Q1HsIQuiXRf1KhSLXEIX0fBPQQLcxrrXaZBS9wFtglANNblOeVvC5eDucS3sFaDmKB2Z0fs57On/kYpQqPP3ifxS5gISKtXFxLUL7IOfaXjycna9S4fBCsi2RKdqxtbqK9ylNQkBSYjSdzebJUv592bnSEb1PAl3wNGv/AAjZZZ9PvNfrCf8AcaN/JkDxzCjTzFXDGM4cf4Sl1UsFMSyXgjVw7qNcSwHMsa1FW9zdgww6uoz26OfGRo6ru+5gZr+Q9G71APtlzmMuceCyjK1IblBxmC4lwUlL3mGdo8rrM78yqZuUfiKLqO4FCo8S43LIQvj/AJjbsXqOsv8AUo8R9eQl1huOg9EV1KBC28vU5YqF4cSjrwlOqsxYq88RNfiNImLmLW4YkFtufsZaj8IQK0MdxzcwfD4pTtlfBBTacwb4ipITTmbViCjdwgLnmXC08Km5RXgQNbnALhYG4AYnyJrm+5S1pIArnxOIbj7ofcQZp7ZguXOfAzheIOB1LKTZNf4PiGXLxGuoSaAyi7qouZUVxLNIubQZmhf9mgPnMqwH7GanOSmOvvEs09IWXxNF1KgnMCUSw3NMy42/YhZKyxfg3QJhvapc2i+5o07jKPE31L+yUmD+poP9Soci4nVQWA3cfLvwy5Qt/oimOkoqskMhXEKj+iH69Ri5YMy5G2AwNe2YmNq+GFnZjNwK2PqPgEpMVepdtyuRqI5oEDgdtkVUvpMZrGh6nKDuKaIasuYWqXtHbGoDXqWLvmOHMyIDyXqEDedRFzg2StDBLRNX65GVMpiCteJfsll8WvEuLJ+Qmirj3K0cxaxjboIB+1EUc8zI3qV9ENPFR1jubDcqizniIU+SyYhlBgQZVKNOo89Er6PUu2lPKzlIGHJOI8m8zfgxXkfNTGqkE1WGCldD1GAlruOVUincbH3MQ0m+B/sEtklmxnWGWX5uGQlooN6iv6G...
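
Rather than pasting a long base64 string by hand, the data URL can be built from raw image bytes. The sketch below uses only the standard library. `to_data_url` and `from_data_url` are hypothetical helper names, and the default "image/jpeg" MIME type is an assumption that matches the example above.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def from_data_url(data_url: str) -> bytes:
    """Decode a base64 data URL back into raw bytes."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a 'data:<mime>;base64,<payload>' URL")
    return base64.b64decode(payload)


# The first three bytes of a JPEG file encode to "/9j/", which is why JPEG
# data URLs such as the one above start with "data:image/jpeg;base64,/9j/".
jpeg_magic = b"\xff\xd8\xff"
url = to_data_url(jpeg_magic)
print(url)
```

In practice the bytes would come from reading an image file in binary mode, and the resulting string is what gets passed as image_url.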

Patch release 1.3.4 - Loosen pydantic requirement

15 Aug 10:11

In this patch release, the pydantic requirement is loosened to <= 3.0.0, as requested in multiple issues.

Tekkenizer

18 Jul 14:01

Tekkenizer

The new Tekkenizer class is based on OpenAI's tiktoken and supports the new Mistral-Nemo model.

Tekkenizer always uses tokenizer version 3 or higher.

Examples:

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.v3(is_tekken=True)
tokenizer = MistralTokenizer.from_model("...")

Function calling (just like before)

# Import needed packages:
from mistral_common.protocol.instruct.messages import (
    UserMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import (
    Function,
    Tool,
)
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load Mistral tokenizer

model_name = "..."

tokenizer = MistralTokenizer.from_model(model_name)

# Tokenize a list of messages
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters={
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                )
            )
        ],
        messages=[
            UserMessage(content="What's the weather like today in Paris"),
        ],
        model=model_name,
    )
)
tokens, text = tokenized.tokens, tokenized.text

# Count the number of tokens
print(len(tokens))
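
When the model later answers with a tool call, its arguments arrive as a JSON string and it can be useful to check them against the "required" list of the schema declared above. The following is a small sketch of such a check using only the standard library. `missing_required_arguments` is a hypothetical helper and is not provided by mistral_common.

```python
import json


def missing_required_arguments(parameters: dict, arguments_json: str) -> list:
    """Return the required parameter names that are absent from the call arguments."""
    arguments = json.loads(arguments_json)
    return [name for name in parameters.get("required", []) if name not in arguments]


# Same schema as the get_current_weather tool declared above.
weather_parameters = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location", "format"],
}

missing = missing_required_arguments(weather_parameters, '{"location": "Paris"}')
print(missing)
```

This only checks for presence. Validating types or enum values would need a full JSON Schema validator.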

What's Changed

Full Changelog: v1.3.0...v1.3.1

Patch release: Fix FIM tokenizer

30 May 10:37

As noticed here: https://huggingface.co/mistralai/Codestral-22B-v0.1/discussions/10

The wrong tokenizer was used for FIM. This patch release fixes that so that the following works correctly:

from mistral_common.tokens.tokenizers.base import FIMRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.v3()
tokenized = tokenizer.encode_fim(FIMRequest(prompt="def f(", suffix="return a + b"))
assert tokenized.text == "<s>[SUFFIX]return▁a▁+▁b[PREFIX]▁def▁f("
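
The asserted text shows the FIM layout: the suffix comes first, then the prefix, so that the model continues from the end of the prefix. Here is a sketch that rebuilds that string by hand. `render_fim_text` is a hypothetical helper, and the detail that only the prefix gets a leading ▁ is copied from the assertion above, not derived from the tokenizer.

```python
# Illustration only: rebuilds the FIM text asserted above as a plain string.
SPACE = "\u2581"  # "▁", the whitespace marker that appears in the printed text


def render_fim_text(prompt: str, suffix: str) -> str:
    """Hypothetical helper (not part of mistral_common)."""
    # Assumption from the assertion above: spaces become the marker, and only
    # the prefix part carries a leading marker.
    suffix_part = suffix.replace(" ", SPACE)
    prefix_part = SPACE + prompt.replace(" ", SPACE)
    return f"<s>[SUFFIX]{suffix_part}[PREFIX]{prefix_part}"


rendered = render_fim_text(prompt="def f(", suffix="return a + b")
print(rendered)
```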