diff --git a/docs/capabilities/embeddings.mdx b/docs/capabilities/embeddings.mdx
index a6b6b55..1ffbc18 100644
--- a/docs/capabilities/embeddings.mdx
+++ b/docs/capabilities/embeddings.mdx
@@ -13,7 +13,7 @@ Embeddings are vectorial representations of text that capture the semantic meani
 Open In Colab
 
-## Mistral Embeddings API
+## Mistral Embed API
 
 To generate text embeddings using Mistral AI's embeddings API, we can make a request to the API endpoint and specify the embedding model `mistral-embed`, along with providing a list of input texts. The API will then return the corresponding embeddings as numerical vectors, which can be used for further analysis or processing in NLP applications.
 
 ```python
diff --git a/docs/capabilities/function-calling.mdx b/docs/capabilities/function-calling.mdx
index 3fa7944..679e033 100644
--- a/docs/capabilities/function-calling.mdx
+++ b/docs/capabilities/function-calling.mdx
@@ -17,7 +17,7 @@ Currently, function calling is available for the following models:
 - Mistral Small
 - Mistral Large
 - Mixtral 8x22B
-- Mistral NeMo
+- Mistral Nemo
 
 ### Four steps
 
diff --git a/docs/getting-started/Open-weight-models.mdx b/docs/getting-started/Open-weight-models.mdx
index e7e7eec..e34f72c 100644
--- a/docs/getting-started/Open-weight-models.mdx
+++ b/docs/getting-started/Open-weight-models.mdx
@@ -1,13 +1,13 @@
 ---
 id: open_weight_models
-title: Open-weight models
+title: Apache 2.0 models
 sidebar_position: 1.4
 ---
 
 We open-source both pre-trained models and instruction-tuned models. These models are not tuned for safety as we want to empower users to test and refine moderation based on their use cases. For safer models, follow our [guardrailing tutorial](/capabilities/guardrailing).
 
 ## License
-- Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, Codestral Mamba, Mathstral, and Mistral NeMo are under [Apache 2 License](https://choosealicense.com/licenses/apache-2.0/), which permits their use without any constraints.
+- Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, Codestral Mamba, Mathstral, and Mistral Nemo are under [Apache 2 License](https://choosealicense.com/licenses/apache-2.0/), which permits their use without any constraints.
 - Codestral is under [Mistral AI Non-Production (MNPL) License](https://mistral.ai/licences/MNPL-0.1.md).
 - Mistral Large is under [Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md).
 
@@ -29,8 +29,8 @@ We open-source both pre-trained models and instruction-tuned models. These model
 | Codestral-22B-v0.1 | [Hugging Face](https://huggingface.co/mistralai/Codestral-22B-v0.1)<br/>[raw_weights](https://models.mistralcdn.com/codestral-22b-v0-1/codestral-22B-v0.1.tar) (md5sum: `1ea95d474a1d374b1d1b20a8e0159de3`) | - 32768 vocabulary size<br/>- Supports v3 Tokenizer |
 | Codestral-Mamba-7B-v0.1 | [Hugging Face](https://huggingface.co/mistralai/mamba-codestral-7B-v0.1)<br/>[raw_weights](https://models.mistralcdn.com/codestral-mamba-7b-v0-1/codestral-mamba-7B-v0.1.tar) (md5sum: `d3993e4024d1395910c55db0d11db163`) | - 32768 vocabulary size<br/>- Supports v3 Tokenizer |
 | Mathstral-7B-v0.1 | [Hugging Face](https://huggingface.co/mistralai/mathstral-7B-v0.1)<br/>[raw_weights](https://models.mistralcdn.com/mathstral-7b-v0-1/mathstral-7B-v0.1.tar) (md5sum: `5f05443e94489c261462794b1016f10b`) | - 32768 vocabulary size<br/>- Supports v3 Tokenizer |
-| Mistral-NeMo-Base-2407 | [Hugging Face](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407)<br/>[raw_weights](https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-base-2407.tar) (md5sum: `c5d079ac4b55fc1ae35f51f0a3c0eb83`) | - 131k vocabulary size<br/>- Supports tekken.json tokenizer |
-| Mistral-NeMo-Instruct-2407 | [Hugging Face](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)<br/>[raw_weights](https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-instruct-2407.tar) (md5sum: `296fbdf911cb88e6f0be74cd04827fe7`) | - 131k vocabulary size<br/>- Supports tekken.json tokenizer<br/>- Supports function calling |
+| Mistral-Nemo-Base-2407 | [Hugging Face](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407)<br/>[raw_weights](https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-base-2407.tar) (md5sum: `c5d079ac4b55fc1ae35f51f0a3c0eb83`) | - 131k vocabulary size<br/>- Supports tekken.json tokenizer |
+| Mistral-Nemo-Instruct-2407 | [Hugging Face](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)<br/>[raw_weights](https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-instruct-2407.tar) (md5sum: `296fbdf911cb88e6f0be74cd04827fe7`) | - 131k vocabulary size<br/>- Supports tekken.json tokenizer<br/>- Supports function calling |
 | Mistral-Large-Instruct-2407 | [Hugging Face](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407)<br/>[raw_weights](https://models.mistralcdn.com/mistral-large-2407/mistral-large-instruct-2407.tar) (md5sum: `fc602155f9e39151fba81fcaab2fa7c4`)| - 32768 vocabulary size<br/>- Supports v3 Tokenizer<br/>- Supports function calling |
 
@@ -44,7 +44,7 @@ We open-source both pre-trained models and instruction-tuned models. These model
 | Codestral-22B-v0.1 | 22.2B | 22.2B | 60 |
 | Codestral-Mamba-7B-v0.1 | 7.3B | 7.3B | 16 |
 | Mathstral-7B-v0.1 | 7.3B | 7.3B | 16 |
-| Mistral-NeMo-Instruct-2407 | 12B | 12B | 28 - bf16<br/>16 - fp8 |
+| Mistral-Nemo-Instruct-2407 | 12B | 12B | 28 - bf16<br/>16 - fp8 |
 | Mistral-Large-Instruct-2407 | 123B | 123B | 228 |
 
 ## How to run?
diff --git a/docs/getting-started/changelog.mdx b/docs/getting-started/changelog.mdx
index e59edb0..5a5db9d 100644
--- a/docs/getting-started/changelog.mdx
+++ b/docs/getting-started/changelog.mdx
@@ -11,7 +11,7 @@ July 24, 2024
 - We added fine-tuning support for Codestral, Mistral Nemo and Mistral Large. Now the model choices for fine-tuning are `open-mistral-7b` (v0.3), `mistral-small-latest` (`mistral-small-2402`), `codestral-latest` (`codestral-2405`), `open-mistral-nemo` and , `mistral-large-latest` (`mistral-large-2407`)
 
 July 18, 2024
-- We released Mistral NeMo (`open-mistral-nemo`).
+- We released Mistral Nemo (`open-mistral-nemo`).
 
 July 16, 2024
 - We released Codestral Mamba (`open-codestral-mamba`) and Mathstral.
diff --git a/docs/getting-started/introduction.mdx b/docs/getting-started/introduction.mdx
index cef065b..8c7c25b 100644
--- a/docs/getting-started/introduction.mdx
+++ b/docs/getting-started/introduction.mdx
@@ -19,7 +19,7 @@ We release state-of-the-art generalist models, specialized models, and research
 
 ### Specialized models
 - Codestral, our cutting-edge language model for coding released [May 2024](https://mistral.ai/news/codestral/)
-- Mistral Embeddings, our state-of-the-art semantic for extracting representation of text extracts
+- Mistral Embed, our state-of-the-art semantic model for extracting representations of text
 
 ### Research models
 - Mistral 7b, our first dense model released [September 2023](https://mistral.ai/news/announcing-mistral-7b/)
diff --git a/docs/getting-started/models.mdx b/docs/getting-started/models.mdx
index d11897f..f3f34ba 100644
--- a/docs/getting-started/models.mdx
+++ b/docs/getting-started/models.mdx
@@ -10,27 +10,27 @@ Mistral provides three types of models: state-of-the-art generalist models, spec
 
 - **State-of-the-art generalist models**
 
-| Model | Available Open-weight|Available via API| Description | Max Tokens| API Endpoints|
-|--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
-| Mistral Large |:heavy_check_mark:<br/>[Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)| :heavy_check_mark: |Our flagship model with state-of-the-art reasoning, knowledge, and coding capabilities. It's ideal for complex tasks that require large reasoning capabilities or are highly specialized (Synthetic Text Generation, Code Generation, RAG, or Agents). Learn more on our [blog post](https://mistral.ai/news/mistral-large-2407/)| 128k | `mistral-large-latest`|
-| Mistral NeMo | :heavy_check_mark:<br/>Apache2 | :heavy_check_mark: | A 12B model built with the partnership with Nvidia. It is easy to use and a drop-in replacement in any system using Mistral 7B that it supersedes. Learn more on our [blog post](https://mistral.ai/news/mistral-nemo/) | 128k | `open-mistral-nemo`|
+| Model | Weight availability|Available via API| Description | Max Tokens| API Endpoints|Version|
+|--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
+| Mistral Large |:heavy_check_mark:<br/>[Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md)| :heavy_check_mark: |Our flagship model with state-of-the-art reasoning, knowledge, and coding capabilities. It's ideal for complex tasks that require large reasoning capabilities or are highly specialized (Synthetic Text Generation, Code Generation, RAG, or Agents). Learn more on our [blog post](https://mistral.ai/news/mistral-large-2407/)| 128k | `mistral-large-latest`| 24.07|
+| Mistral Nemo | :heavy_check_mark:<br/>Apache2 | :heavy_check_mark: | A 12B model built in partnership with Nvidia. It is easy to use and a drop-in replacement in any system using Mistral 7B that it supersedes. Learn more on our [blog post](https://mistral.ai/news/mistral-nemo/) | 128k | `open-mistral-nemo`| 24.07|
 
 - **Specialized models**
 
-| Model | Available Open-weight|Available via API| Description | Max Tokens| API Endpoints|
-|--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
-| Codestral |:heavy_check_mark:<br/>[Mistral AI Non-Production License](https://mistral.ai/licenses/MNPL-0.1.md)|:heavy_check_mark: | A cutting-edge generative model that has been specifically designed and optimized for code generation tasks, including fill-in-the-middle and code completion. Learn more on our [blog post](https://mistral.ai/news/codestral/) | 32k | `codestral-latest`|
-| Mistral Embeddings ||:heavy_check_mark: | A model that converts text into numerical vectors of embeddings in 1024 dimensions. Embedding models enable retrieval and retrieval-augmented generation applications. It achieves a retrieval score of 55.26 on MTEB | 8k | `mistral-embed`|
+| Model | Weight availability|Available via API| Description | Max Tokens| API Endpoints|Version|
+|--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
+| Codestral |:heavy_check_mark:<br/>[Mistral AI Non-Production License](https://mistral.ai/licenses/MNPL-0.1.md)|:heavy_check_mark: | A cutting-edge generative model that has been specifically designed and optimized for code generation tasks, including fill-in-the-middle and code completion. Learn more on our [blog post](https://mistral.ai/news/codestral/) | 32k | `codestral-latest`| 24.05|
+| Mistral Embed ||:heavy_check_mark: | A model that converts text into 1024-dimensional numerical vectors (embeddings). Embedding models enable retrieval and retrieval-augmented generation applications. It achieves a retrieval score of 55.26 on MTEB | 8k | `mistral-embed`| 23.12|
 
 - **Research models**
 
-| Model | Available Open-weight|Available via API| Description | Max Tokens| API Endpoints|
-|--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
-| Mistral 7B | :heavy_check_mark:<br/>Apache2 |:heavy_check_mark: |The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. Learn more on our [blog post](https://mistral.ai/news/announcing-mistral-7b/)| 32k | `open-mistral-7b`|
-| Mixtral 8x7B |:heavy_check_mark:<br/>Apache2 | :heavy_check_mark: |A sparse mixture of experts model. As such, it leverages up to 45B parameters but only uses about 12B during inference, leading to better inference throughput at the cost of more vRAM. Learn more on the dedicated [blog post](https://mistral.ai/news/mixtral-of-experts/)| 32k | `open-mixtral-8x7b`|
-| Mixtral 8x22B |:heavy_check_mark:<br/>Apache2 | :heavy_check_mark: |A bigger sparse mixture of experts model. As such, it leverages up to 141B parameters but only uses about 39B during inference, leading to better inference throughput at the cost of more vRAM. Learn more on the dedicated [blog post](https://mistral.ai/news/mixtral-8x22b/)| 64k | `open-mixtral-8x22b`|
-| Mathstral | :heavy_check_mark:<br/>Apache2 | | A math-specific 7B model designed for math reasoning and scientific tasks. Learn more on our [blog post](https://mistral.ai/news/mathstral/) | 32k | NA|
-| Codestral Mamba | :heavy_check_mark:<br/>Apache2 | :heavy_check_mark: | A Mamba 2 language model specialized in code generation. Learn more on our [blog post](https://mistral.ai/news/codestral-mamba/) | 256k | `open-codestral-mamba`|
+| Model | Weight availability|Available via API| Description | Max Tokens| API Endpoints|Version|
+|--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
+| Mistral 7B | :heavy_check_mark:<br/>Apache2 |:heavy_check_mark: |The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. Learn more on our [blog post](https://mistral.ai/news/announcing-mistral-7b/)| 32k | `open-mistral-7b`| v0.3|
+| Mixtral 8x7B |:heavy_check_mark:<br/>Apache2 | :heavy_check_mark: |A sparse mixture of experts model. As such, it leverages up to 45B parameters but only uses about 12B during inference, leading to better inference throughput at the cost of more vRAM. Learn more on the dedicated [blog post](https://mistral.ai/news/mixtral-of-experts/)| 32k | `open-mixtral-8x7b`| v0.1|
+| Mixtral 8x22B |:heavy_check_mark:<br/>Apache2 | :heavy_check_mark: |A bigger sparse mixture of experts model. As such, it leverages up to 141B parameters but only uses about 39B during inference, leading to better inference throughput at the cost of more vRAM. Learn more on the dedicated [blog post](https://mistral.ai/news/mixtral-8x22b/)| 64k | `open-mixtral-8x22b`| v0.1|
+| Mathstral | :heavy_check_mark:<br/>Apache2 | | A math-specific 7B model designed for math reasoning and scientific tasks. Learn more on our [blog post](https://mistral.ai/news/mathstral/) | 32k | NA| v0.1|
+| Codestral Mamba | :heavy_check_mark:<br/>Apache2 | :heavy_check_mark: | A Mamba 2 language model specialized in code generation. Learn more on our [blog post](https://mistral.ai/news/codestral-mamba/) | 256k | `open-codestral-mamba`| v0.1|
 
 ## Pricing
 
@@ -67,7 +67,7 @@ It can be used for complex multilingual reasoning tasks, including text understa
 - [Codestral](https://mistral.ai/news/codestral/): as a 22B model, Codestral sets a new standard on the performance/latency space for code generation compared to previous models used for coding.
 - [Codestral-Mamba](https://mistral.ai/news/codestral-mamba/): we have trained this model with advanced code and reasoning capabilities, enabling the model to have a strong performance on par with SOTA transformer-based models.
 - [Mathstral](https://mistral.ai/news/mathstral/): Mathstral stands on the shoulders of Mistral 7B and specialises in STEM subjects. It achieves state-of-the-art reasoning capacities in its size category across various industry-standard benchmarks.
-- [Mistral NeMo](https://mistral.ai/news/mistral-nemo/): Mistral NeMo's reasoning, world knowledge, and coding performance are state-of-the-art in its size category. As it relies on standard architecture, Mistral NeMo is easy to use and a drop-in replacement in any system using Mistral 7B that it supersedes.
+- [Mistral Nemo](https://mistral.ai/news/mistral-nemo/): Mistral Nemo's reasoning, world knowledge, and coding performance are state-of-the-art in its size category. As it relies on a standard architecture, Mistral Nemo is easy to use and a drop-in replacement in any system using Mistral 7B that it supersedes.
 
 ## Picking a model
 
@@ -86,7 +86,7 @@ Today, Mistral models are behind many LLM applications at scale. Here is a brief
 When selecting a model, it is essential to evaluate the performance, and cost trade-offs. Depending on what’s most important for your application, your choice may differ significantly. Note that the models will be updated over time, the information we share below only reflect the current state of the models.
 
 In general, the larger the model, the better the performance. For instance, when looking at the popular benchmark MMLU (Massive Multitask Language Understanding), the performance ranking of Mistral’s models is as follows:
-- Mistral Large (84.0%) > Mistral 8x22B (77.8%) > Mistral Small (72.2%) > Mixtral 8x7B (70.6%) > Mistral NeMo (68%) > Mistral 7B (62.5%).
+- Mistral Large (84.0%) > Mixtral 8x22B (77.8%) > Mistral Small (72.2%) > Mixtral 8x7B (70.6%) > Mistral Nemo (68%) > Mistral 7B (62.5%).
 
 Notably, Mistral Large is currently outperforming all other four models across almost all benchmarks.
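
For quick reference, a minimal sketch of calling the renamed `mistral-embed` model follows; it is illustrative only. It assumes the REST endpoint `https://api.mistral.ai/v1/embeddings`, the third-party `requests` library, and a `MISTRAL_API_KEY` environment variable; exact request and response field names should be checked against the current API reference.

```python
import os

import requests

# Minimal sketch: request embeddings for a batch of texts from the
# `mistral-embed` model. Assumes MISTRAL_API_KEY is set in the environment
# and that the endpoint and field names below match the current API reference.
url = "https://api.mistral.ai/v1/embeddings"
headers = {
    "Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}",
    "Content-Type": "application/json",
}
payload = {
    "model": "mistral-embed",
    "input": ["Embed this sentence.", "As well as this one."],
}

response = requests.post(url, headers=headers, json=payload, timeout=30)
response.raise_for_status()

# Assumed response shape: {"data": [{"index": 0, "embedding": [...]}, ...]}
for item in response.json()["data"]:
    print(item["index"], len(item["embedding"]))
```

Each returned embedding should be a 1024-dimensional vector, matching the dimensionality listed for Mistral Embed in the models table above.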