Commit
Merge pull request #102 from mistralai/doc/v0.0.55
Update docs to v0.0.55
pandora-s-git authored Jul 18, 2024
2 parents 2b9c940 + 02c3f50 commit 6a710ed
Showing 7 changed files with 172 additions and 37 deletions.
8 changes: 4 additions & 4 deletions docs/capabilities/code-generation.mdx
@@ -236,8 +236,8 @@ curl --location "https://api.mistral.ai/v1/chat/completions" \
</TabItem>
</Tabs>

## Codestral-Mamba
We have also released Codestral-Mamba 7B, a Mamba 2 language model specialized in code generation, available via the instruct endpoint.
## Codestral Mamba
We have also released Codestral Mamba 7B, a Mamba 2 language model specialized in code generation, available via the instruct endpoint.
<Tabs>
<TabItem value="python" label="python" default>
```python
@@ -278,9 +278,9 @@ curl --location "https://api.mistral.ai/v1/chat/completions" \
</TabItem>
</Tabs>
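For quick reference, here is a minimal plain-HTTP sketch of the same call, using Python `requests` against the chat completions endpoint shown in the curl examples above. The `codestral-mamba-latest` alias and the `MISTRAL_API_KEY` environment variable are assumptions; adapt them to your setup.

```python
import os
import requests

# Minimal sketch: POST to the chat completions endpoint used in the curl
# examples above. Assumes MISTRAL_API_KEY is set and that the
# `codestral-mamba-latest` alias is available on your account.
API_URL = "https://api.mistral.ai/v1/chat/completions"
API_KEY = os.environ["MISTRAL_API_KEY"]

payload = {
    "model": "codestral-mamba-latest",
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a linked list."}
    ],
}

response = requests.post(
    API_URL,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```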

## Open-weight Codestral and Codestral-Mamba
## Open-weight Codestral and Codestral Mamba
Codestral is available open-weight under the [Mistral AI Non-Production (MNPL) License](https://mistral.ai/licences/MNPL-0.1.md) and
Codestral-Mamba is available open-weight under the Apache 2.0 license.
Codestral Mamba is available open-weight under the Apache 2.0 license.

Check out the README of [mistral-inference](https://github.com/mistralai/mistral-inference) to learn how to use `mistral-inference` to run Codestral.

18 changes: 11 additions & 7 deletions docs/getting-started/Open-weight-models.mdx
@@ -6,17 +6,18 @@ sidebar_position: 1.4

We open-source both pre-trained models and fine-tuned models. These models are not tuned for safety as we want to empower users to test and refine moderation based on their use cases. For safer models, follow our [guardrailing tutorial](/capabilities/guardrailing).

| Model |Open-weight|API| Description | Max Tokens| Endpoint|
| Model | Available Open-weight|Available via API| Description | Max Tokens| API Endpoints|
|--------------------|:--------------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------:|
| Mistral 7B | :heavy_check_mark: <br/> Apache2 |:heavy_check_mark: |The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. Learn more on our [blog post](https://mistral.ai/news/announcing-mistral-7b/)| 32k | `open-mistral-7b`<br/>(aka `mistral-tiny-2312`)|
| Mixtral 8x7B |:heavy_check_mark: <br/> Apache2 | :heavy_check_mark: |A sparse mixture of experts model. As such, it leverages up to 45B parameters but only uses about 12B during inference, leading to better inference throughput at the cost of more vRAM. Learn more on the dedicated [blog post](https://mistral.ai/news/mixtral-of-experts/)| 32k | `open-mixtral-8x7b`<br/>(aka `mistral-small-2312`) |
| Mixtral 8x22B |:heavy_check_mark: <br/> Apache2 | :heavy_check_mark: |A bigger sparse mixture of experts model with larger context window. As such, it leverages up to 141B parameters but only uses about 39B during inference, leading to better inference throughput at the cost of more vRAM. Learn more on the dedicated [blog post](https://mistral.ai/news/mixtral-8x22b/)| 64k | `open-mixtral-8x22b`|
| Mistral 7B | :heavy_check_mark: <br/> Apache2 |:heavy_check_mark: |The first dense model released by Mistral AI, perfect for experimentation, customization, and quick iteration. At the time of the release, it matched the capabilities of models up to 30B parameters. Learn more on our [blog post](https://mistral.ai/news/announcing-mistral-7b/)| 32k | `open-mistral-7b`|
| Mixtral 8x7B |:heavy_check_mark: <br/> Apache2 | :heavy_check_mark: |A sparse mixture of experts model. As such, it leverages up to 45B parameters but only uses about 12B during inference, leading to better inference throughput at the cost of more vRAM. Learn more on the dedicated [blog post](https://mistral.ai/news/mixtral-of-experts/)| 32k | `open-mixtral-8x7b`|
| Mixtral 8x22B |:heavy_check_mark: <br/> Apache2 | :heavy_check_mark: |A bigger sparse mixture of experts model. As such, it leverages up to 141B parameters but only uses about 39B during inference, leading to better inference throughput at the cost of more vRAM. Learn more on the dedicated [blog post](https://mistral.ai/news/mixtral-8x22b/)| 64k | `open-mixtral-8x22b`|
| Codestral |:heavy_check_mark: <br/> MNPL|:heavy_check_mark: | A cutting-edge generative model that has been specifically designed and optimized for code generation tasks, including fill-in-the-middle and code completion | 32k | `codestral-latest`|
| Codestral-Mamba | :heavy_check_mark: | :heavy_check_mark: | A Mamba 2 language model specialized in code generation. Learn more on our [blog post](https://mistral.ai/news/codestral-mamba/) | 256k | `codestral-mamba-latest`|
| Mathstral | :heavy_check_mark: | :heavy_check_mark: | A math-specific 7B model designed for math reasoning and scientific tasks. Learn more on our [blog post](https://mistral.ai/news/mathstral/) | 32k | NA|
| Codestral Mamba | :heavy_check_mark: <br/> Apache2 | :heavy_check_mark: | A Mamba 2 language model specialized in code generation. Learn more on our [blog post](https://mistral.ai/news/codestral-mamba/) | 256k | `open-codestral-mamba`|
| Mathstral | :heavy_check_mark: <br/> Apache2 | | A math-specific 7B model designed for math reasoning and scientific tasks. Learn more on our [blog post](https://mistral.ai/news/mathstral/) | 32k | NA|
| Mistral NeMo | :heavy_check_mark: <br/> Apache2 | :heavy_check_mark: | A 12B model built in partnership with NVIDIA. It is easy to use and a drop-in replacement for any system using Mistral 7B, which it supersedes. Learn more on our [blog post](https://mistral.ai/news/mistral-nemo/) | 128k | `open-mistral-nemo`|

## License
- Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, Codestral-Mamba, and Mathstral are under [Apache 2 License](https://choosealicense.com/licenses/apache-2.0/), which permits their use without any constraints.
- Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, Codestral Mamba, Mathstral, and Mistral NeMo are under [Apache 2 License](https://choosealicense.com/licenses/apache-2.0/), which permits their use without any constraints.
- Codestral is under [Mistral AI Non-Production (MNPL) License](https://mistral.ai/licences/MNPL-0.1.md).


@@ -38,6 +39,8 @@ We open-source both pre-trained models and fine-tuned models. These models are n
| Codestral-22B-v0.1 | [Hugging Face](https://huggingface.co/mistralai/Codestral-22B-v0.1) <br/> [raw_weights](https://models.mistralcdn.com/codestral-22b-v0-1/codestral-22B-v0.1.tar) (md5sum: `1ea95d474a1d374b1d1b20a8e0159de3`) | - 32768 vocabulary size <br/> - Supports v3 Tokenizer |
| Codestral-Mamba-7B-v0.1 | [Hugging Face](https://huggingface.co/mistralai/mamba-codestral-7B-v0.1) <br/> [raw_weights](https://models.mistralcdn.com/codestral-mamba-7b-v0-1/codestral-mamba-7B-v0.1.tar) (md5sum: `d3993e4024d1395910c55db0d11db163`) | - 32768 vocabulary size <br/> - Supports v3 Tokenizer |
| Mathstral-7B-v0.1 | [Hugging Face](https://huggingface.co/mistralai/mathstral-7B-v0.1) <br/> [raw_weights](https://models.mistralcdn.com/mathstral-7b-v0-1/mathstral-7B-v0.1.tar) (md5sum: `5f05443e94489c261462794b1016f10b`) | - 32768 vocabulary size <br/> - Supports v3 Tokenizer |
| Mistral-NeMo-Base-2407 | [Hugging Face](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407) <br/> [raw_weights](https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-base-2407.tar) (md5sum: `c5d079ac4b55fc1ae35f51f0a3c0eb83`) | - 131k vocabulary size <br/> - Supports tekken.json tokenizer |
| Mistral-NeMo-Instruct-2407 | [Hugging Face](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) <br/> [raw_weights](https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-instruct-2407.tar) (md5sum: `296fbdf911cb88e6f0be74cd04827fe7`) | - 131k vocabulary size <br/> - Supports tekken.json tokenizer <br/> - Supports function calling |
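The md5sums in the table above can be used to verify a downloaded raw-weights tarball. Here is a minimal sketch, assuming the tarball has already been downloaded to the placeholder path shown:

```python
import hashlib

# Minimal sketch: verify a downloaded raw-weights tarball against the
# md5sum listed in the table above. The local path is a placeholder.
EXPECTED_MD5 = "296fbdf911cb88e6f0be74cd04827fe7"  # mistral-nemo-instruct-2407.tar
TARBALL_PATH = "mistral-nemo-instruct-2407.tar"

md5 = hashlib.md5()
with open(TARBALL_PATH, "rb") as f:
    # Read in 1 MiB chunks so large tarballs do not need to fit in memory.
    for chunk in iter(lambda: f.read(1 << 20), b""):
        md5.update(chunk)

if md5.hexdigest() != EXPECTED_MD5:
    raise ValueError(f"Checksum mismatch: got {md5.hexdigest()}, expected {EXPECTED_MD5}")
print("Checksum OK")
```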


## Sizes
@@ -50,6 +53,7 @@ We open-source both pre-trained models and fine-tuned models. These models are n
| Codestral-22B-v0.1 | 22.2B | 22.2B | 60 |
| Codestral-Mamba-7B-v0.1 | 7.3B | 7.3B | 16 |
| Mathstral-7B-v0.1 | 7.3B | 7.3B | 16 |
| Mistral-NeMo-12B-v0.1 | 12B | 12B | 28 - bf16 <br/> 16 - fp8 |

## How to run?
Check out [mistral-inference](https://github.com/mistralai/mistral-inference/), a Python package for running our models. You can install `mistral-inference` by
5 changes: 4 additions & 1 deletion docs/getting-started/changelog.mdx
@@ -6,8 +6,11 @@ sidebar_position: 1.8

This is the list of changes to the Mistral API.

July 18, 2024
- We released Mistral NeMo (`open-mistral-nemo`).

July 16, 2024
- We released Codestral-Mamba and Mathstral.
- We released Codestral Mamba (`open-codestral-mamba`) and Mathstral.

Jun 5, 2024
- We released the fine-tuning API. Check out the [capability docs](/capabilities/finetuning/) and [guides](/guides/finetuning/).
3 changes: 3 additions & 0 deletions docs/getting-started/introduction.mdx
@@ -17,6 +17,9 @@ We release both open source and commercial models, driving innovation and conven
- Mistral 7b, our first dense model released [September 2023](https://mistral.ai/news/announcing-mistral-7b/)
- Mixtral 8x7b, our first sparse mixture-of-experts released [December 2023](https://mistral.ai/news/mixtral-of-experts/)
- Mixtral 8x22b, our best open source model to date released [April 2024](https://mistral.ai/news/mixtral-8x22b/)
- Mathstral 7b, our first open source math model released [July 2024](https://mistral.ai/news/mathstral/)
- Codestral Mamba 7b, our first open source Mamba 2 model released [July 2024](https://mistral.ai/news/codestral-mamba/)
- Mistral NeMo 12b, our best open source multilingual model released [July 2024](https://mistral.ai/news/mistral-nemo/)

### Commercial

13 changes: 5 additions & 8 deletions docs/getting-started/models.mdx
@@ -21,8 +21,9 @@ They are ideal for customization, such as fine-tuning, due to their portability,
| Mistral Large || :heavy_check_mark: |Our flagship model that's ideal for complex tasks that require large reasoning capabilities or are highly specialized (Synthetic Text Generation, Code Generation, RAG, or Agents). Learn more on our [blog post](https://mistral.ai/news/mistral-large/)| 32k | `mistral-large-latest`|
| Mistral Embeddings ||:heavy_check_mark: | A model that converts text into 1024-dimensional embedding vectors. Embedding models enable retrieval and retrieval-augmented generation applications. It achieves a retrieval score of 55.26 on MTEB (see the usage sketch after this table) | 8k | `mistral-embed`|
| Codestral |:heavy_check_mark: <br/> MNPL|:heavy_check_mark: | A cutting-edge generative model that has been specifically designed and optimized for code generation tasks, including fill-in-the-middle and code completion | 32k | `codestral-latest`|
| Codestral-Mamba | :heavy_check_mark: | :heavy_check_mark: | A Mamba 2 language model specialized in code generation. Learn more on our [blog post](https://mistral.ai/news/codestral-mamba/) | 256k | `codestral-mamba-latest`|
| Mathstral | :heavy_check_mark: | :heavy_check_mark: | A math-specific 7B model designed for math reasoning and scientific tasks. Learn more on our [blog post](https://mistral.ai/news/mathstral/) | 32k | NA|
| Codestral Mamba | :heavy_check_mark: <br/> Apache2 | :heavy_check_mark: | A Mamba 2 language model specialized in code generation. Learn more on our [blog post](https://mistral.ai/news/codestral-mamba/) | 256k | `open-codestral-mamba`|
| Mathstral | :heavy_check_mark: <br/> Apache2 | | A math-specific 7B model designed for math reasoning and scientific tasks. Learn more on our [blog post](https://mistral.ai/news/mathstral/) | 32k | NA|
| Mistral NeMo | :heavy_check_mark: <br/> Apache2 | :heavy_check_mark: | A 12B model built in partnership with NVIDIA. It is easy to use and a drop-in replacement for any system using Mistral 7B, which it supersedes. Learn more on our [blog post](https://mistral.ai/news/mistral-nemo/) | 128k | `open-mistral-nemo`|
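As a usage sketch for the `mistral-embed` row above, the snippet below posts two inputs to the `v1/embeddings` endpoint and reads back the 1024-dimensional vectors. The response is assumed to follow the standard `data[i].embedding` layout, and `MISTRAL_API_KEY` is assumed to be set in the environment.

```python
import os
import requests

# Minimal sketch: embed two pieces of text with `mistral-embed`.
# Each returned vector has 1024 dimensions.
resp = requests.post(
    "https://api.mistral.ai/v1/embeddings",
    headers={
        "Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "mistral-embed",
        "input": ["Deep learning is", "Mistral models are"],
    },
    timeout=30,
)
resp.raise_for_status()
vectors = [item["embedding"] for item in resp.json()["data"]]
print(len(vectors), len(vectors[0]))  # expected: 2 1024
```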

## Pricing

@@ -36,18 +37,12 @@ it is recommended to use the dated versions of the Mistral AI API.
Additionally, be prepared for the deprecation of certain endpoints in the coming months.

Here are the details of the available versions (a short example follows this list):
- `open-mistral-7b`: currently points to `mistral-tiny-2312`.
It used to be called `mistral-tiny`, which will be deprecated shortly.
- `open-mixtral-8x7b`: currently points to `mistral-small-2312`.
It used to be called `mistral-small`, which will be deprecated shortly.
- `open-mixtral-8x22b` points to `open-mixtral-8x22b-2404`.
- `mistral-small-latest`: currently points to `mistral-small-2402`.
- `mistral-medium-latest`: currently points to `mistral-medium-2312`.
The previous `mistral-medium` has been dated and tagged as `mistral-medium-2312`.
Mistral Medium will be deprecated shortly.
- `mistral-large-latest`: currently points to `mistral-large-2402`.
- `codestral-latest`: currently points to `codestral-2405`.
- `codestral-mamba-latest`: currently points to `codestral-mamba-2407`.
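
To check which dated versions your API key can access before pinning one, here is a minimal sketch; it assumes the standard `v1/models` listing endpoint and an API key in `MISTRAL_API_KEY`.

```python
import os
import requests

# Minimal sketch: list the model IDs available to your API key, so you can
# pin a dated version (e.g. "mistral-large-2402") rather than a "-latest" alias.
resp = requests.get(
    "https://api.mistral.ai/v1/models",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    timeout=30,
)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```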

## Benchmark results
Mistral ranks second among all models generally available through an API.
@@ -64,6 +59,8 @@ It can be used for complex multilingual reasoning tasks, including text understa
- [Codestral](https://mistral.ai/news/codestral/): as a 22B model, Codestral sets a new standard in the performance/latency space for code generation compared to previous models used for coding.
- [Codestral-Mamba](https://mistral.ai/news/codestral-mamba/): we have trained this model with advanced code and reasoning capabilities, enabling it to perform on par with SOTA transformer-based models.
- [Mathstral](https://mistral.ai/news/mathstral/): Mathstral stands on the shoulders of Mistral 7B and specialises in STEM subjects. It achieves state-of-the-art reasoning capacities in its size category across various industry-standard benchmarks.
- [Mistral NeMo](https://mistral.ai/news/mistral-nemo/): Mistral NeMo's reasoning, world knowledge, and coding performance are state-of-the-art in its size category. As it relies on a standard architecture, Mistral NeMo is easy to use and a drop-in replacement for any system using Mistral 7B, which it supersedes.


## Picking a model

