Add documentation
lerela committed Jan 3, 2024
1 parent 6cc0c03 commit 5766c01
Showing 16 changed files with 631 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -128,3 +128,6 @@ dist
.yarn/build-state.yml
.yarn/install-state.gz
.pnp.*

# docusaurus build
build/
1 change: 1 addition & 0 deletions docs/README.md
@@ -0,0 +1 @@
# mistral-docs
60 changes: 60 additions & 0 deletions docs/intro.md
@@ -0,0 +1,60 @@
---
sidebar_position: 1
slug: /
---

# Introduction

Mistral AI currently provides two types of access to Large Language Models:
- An API providing pay-as-you-go access to our latest models,
- Open-source models released under the [Apache 2.0](https://github.com/apache/.github/blob/main/LICENSE) License, available on [Hugging Face](https://huggingface.co/mistralai) or directly from [the documentation](/models).

## Where to start?

### API Access
Our API is currently in beta while we ramp up capacity and ensure a good quality of service. Access the [platform](https://console.mistral.ai/) to join the waitlist. Once your subscription is active, you can immediately use our `chat` endpoint:

```bash
curl --location "https://api.mistral.ai/v1/chat/completions" \
     --header 'Content-Type: application/json' \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer $MISTRAL_API_KEY" \
     --data '{
       "model": "mistral-tiny",
       "messages": [{"role": "user", "content": "Who is the most renowned French painter?"}]
     }'
```

Or our embeddings endpoint:

```bash
curl --location "https://api.mistral.ai/v1/embeddings" \
     --header 'Content-Type: application/json' \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer $MISTRAL_API_KEY" \
     --data '{
       "model": "mistral-embed",
       "input": ["Embed this sentence.", "As well as this one."]
     }'
```

For a full description of the models offered on the API, head over to the **[model docs](./models)**.

For more examples of how to use our platform, head over to our **[platform docs](./platform/01-overview.md)**.

### Raw model weights

Raw model weights can be used in several ways:
- For self-deployment, in the cloud or on-premise, using either [TensorRT-LLM](./self-deployment/trtllm) or [vLLM](./self-deployment/vllm), head over to **[Deployment](./self-deployment/skypilot)**,
- For research, head over to our [reference implementation repository](https://github.com/mistralai/mistral-src),
- For local deployment on consumer-grade hardware, check out the [llama.cpp](https://github.com/ggerganov/llama.cpp) project or [Ollama](https://ollama.ai/).


## Get Help

Join our [Discord community](https://discord.gg/mistralai) to discuss our models and talk to our engineers. Alternatively, reach out to our [business team](https://mistral.ai/contact/) if you have enterprise needs, want more information about our products, or would like us to add missing features.


## Contributing

Mistral AI is committed to open source software development and welcomes external contributions. Please open a PR!
62 changes: 62 additions & 0 deletions docs/models.md
@@ -0,0 +1,62 @@
---
sidebar_position: 3
slug: models
---

# Open-weight models

We open-source both pre-trained models and fine-tuned models. These models are not tuned for safety as we want to empower users to test and refine moderation based on their use cases. For safer models, follow our [guardrailing tutorial](./platform/04-guardrailing.md).

## Mistral 7B

Mistral 7B is the first dense model released by Mistral AI. At the time of the release, it matched the capabilities of models up to 30B parameters. Learn more on our [blog post](https://mistral.ai/news/announcing-mistral-7b/).

## Mixtral 8X7B

Mixtral 8X7B is a sparse mixture-of-experts model. As such, it leverages up to 45B parameters but only uses about 12B during inference, leading to better inference throughput at the cost of more VRAM. Learn more on the dedicated [blog post](https://mistral.ai/news/mixtral-of-experts/).
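
To build intuition for why only a fraction of the parameters is active per token, here is a toy sketch of top-2-of-8 expert routing. This is not Mixtral's actual implementation; all names and shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_model = 8, 16                                 # Mixtral routes among 8 experts
gate = rng.normal(size=(d_model, n_experts))               # router weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    logits = x @ gate                       # one routing score per expert
    top2 = np.argsort(logits)[-2:]          # keep only the 2 best-scoring experts
    w = np.exp(logits[top2] - logits[top2].max())
    w = w / w.sum()                         # softmax over the selected experts
    # Only 2 of the 8 expert weight matrices participate in this token's forward pass.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top2))

print(moe_layer(rng.normal(size=d_model)).shape)  # (16,)
```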

## Downloading

- Mistral-7B-v0.1: [Hugging Face](https://huggingface.co/mistralai/Mistral-7B-v0.1) // [raw_weights](https://files.mistral-7b-v0-1.mistral.ai/mistral-7B-v0.1.tar) (md5sum: `37dab53973db2d56b2da0a033a15307f`).
- Mistral-7B-Instruct-v0.2: [Hugging Face](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) // [raw_weights](https://files.mistral-7b-v0-2.mistral.ai/Mistral-7B-v0.2-Instruct.tar) (md5sum: `fbae55bc038f12f010b4251326e73d39`).
- Mixtral-8x7B-v0.1: [Hugging Face](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1).
- Mixtral-8x7B-Instruct-v0.1: [Hugging Face](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) // [raw_weights](https://files.mixtral-8x7b-v0-1.mistral.ai/Mixtral-8x7B-v0.1-Instruct.tar) (md5sum: `8e2d3930145dc43d3084396f49d38a3f`).
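
A minimal sketch for verifying a downloaded archive against the md5sums above, using only the Python standard library (the file name is a placeholder for wherever you saved the tarball):

```python
import hashlib

def md5sum(path: str, chunk_size: int = 2**20) -> str:
    """Hash the file in chunks so large tarballs don't need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Placeholder path; compare against the checksum listed above.
assert md5sum("mistral-7B-v0.1.tar") == "37dab53973db2d56b2da0a033a15307f"
```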

## Sizes

| Name | Number of parameters | Number of active parameters | Min. GPU RAM for inference (GB) |
|--------------------|:--------------------:|:---------------------------:|:-------------------------------:|
| Mistral-7B-v0.2 | 7.3B | 7.3B | 16 |
| Mixtral-8X7B-v0.1  | 46.7B                | 12.9B                       | 100                             |

## Chat template

The template used to build a prompt for the Instruct model is defined as follows:
```
<s>[INST] Instruction [/INST] Model answer</s>[INST] Follow-up instruction [/INST]
```

Note that `<s>` and `</s>` are special tokens for beginning of string (BOS) and end of string (EOS) while `[INST]` and `[/INST]` are regular strings.

:::note

This format must be strictly respected. Otherwise, the model will generate sub-optimal outputs.

:::
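
As an illustration, here is a minimal, unofficial sketch that assembles a prompt string from (user, assistant) turns according to the template above; the final turn's answer is left as `None` to elicit a completion. Note that in practice `<s>` and `</s>` must be emitted as the tokenizer's special tokens rather than plain text, as the next section explains:

```python
def build_prompt(turns):
    """turns: list of (user_message, assistant_answer); answer may be None."""
    prompt = "<s>"
    for user, answer in turns:
        prompt += f"[INST] {user} [/INST]"
        if answer is not None:
            prompt += f" {answer}</s>"
    return prompt

print(build_prompt([
    ("Hello, who are you?", "I am an assistant made by Mistral AI."),
    ("What is the best French cheese?", None),
]))
# <s>[INST] Hello, who are you? [/INST] I am an assistant made by Mistral AI.</s>[INST] What is the best French cheese? [/INST]
```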

As a reference, here is the format used to tokenize instructions during fine-tuning:

```
[START_SYMBOL_ID] +
tok("[INST]") + tok(USER_MESSAGE_1) + tok("[/INST]") +
tok(BOT_MESSAGE_1) + [END_SYMBOL_ID] +
tok("[INST]") + tok(USER_MESSAGE_N) + tok("[/INST]") +
tok(BOT_MESSAGE_N) + [END_SYMBOL_ID]
```

:::note

The function `tok` should never generate the EOS token. However, FastChat (used in vLLM) sends the full prompt as a string, which might lead to incorrect tokenization of the EOS token and prompt injection. Users are encouraged to send tokens instead, as described above.

:::
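
A hedged sketch of that token-level recipe, assuming a tokenizer object whose `encode` never adds special tokens and with BOS/EOS ids supplied explicitly (all names and ids here are hypothetical):

```python
BOS_ID, EOS_ID = 1, 2  # hypothetical ids for <s> and </s>

def encode_dialog(turns, tokenizer):
    """turns: list of completed (user_message, bot_message) pairs."""
    def tok(text):
        # Must never emit BOS/EOS itself; special tokens are appended by id below.
        return tokenizer.encode(text, add_special_tokens=False)

    ids = [BOS_ID]
    for user, bot in turns:
        ids += tok("[INST]") + tok(user) + tok("[/INST]")
        ids += tok(bot) + [EOS_ID]
    return ids
```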
6 changes: 6 additions & 0 deletions docs/platform/01-overview.md
@@ -0,0 +1,6 @@
# Platform

We provide chat generation endpoints for both our [open-weight models](../models.md) and our optimized models.
Our endpoints can be used with our [client packages](../client) or accessed directly through our [API](../../api).
See our [endpoints page](../endpoints) for a detailed description of endpoint performance. We detail how to moderate
our endpoints in [guardrailing](../guardrailing), and list their [prices](../pricing).
140 changes: 140 additions & 0 deletions docs/platform/02-client.md
@@ -0,0 +1,140 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Client code

We provide client libraries in both Python and JavaScript.

## Installation

Follow installation instructions in the repository for our [Python Client](https://github.com/mistralai/client-python) or [Javascript Client](https://github.com/mistralai/client-js).

## Chat Completion

The chat completion API allows you to chat with a model fine-tuned to follow instructions.

<Tabs>
<TabItem value="python" label="python" default>
```python
import os

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

api_key = os.environ["MISTRAL_API_KEY"]
model = "mistral-tiny"

client = MistralClient(api_key=api_key)

messages = [
    ChatMessage(role="user", content="What is the best French cheese?")
]

# No streaming
chat_response = client.chat(
    model=model,
    messages=messages,
)
print(chat_response.choices[0].message.content)

# With streaming
for chunk in client.chat_stream(model=model, messages=messages):
    print(chunk)
```
</TabItem>
<TabItem value="javascript" label="javascript">
```javascript
import MistralClient from '@mistralai/mistralai';

const apiKey = process.env.MISTRAL_API_KEY;

const client = new MistralClient(apiKey);

const chatResponse = await client.chat({
  model: 'mistral-tiny',
  messages: [{role: 'user', content: 'What is the best French cheese?'}],
});

console.log('Chat:', chatResponse.choices[0].message.content);
```
</TabItem>
<TabItem value="curl" label="curl">
```bash
curl --location "https://api.mistral.ai/v1/chat/completions" \
     --header 'Content-Type: application/json' \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer $MISTRAL_API_KEY" \
     --data '{
       "model": "mistral-tiny",
       "messages": [
         {
           "role": "user",
           "content": "What is the best French cheese?"
         }
       ]
     }'
```
</TabItem>
</Tabs>

We allow users to provide a custom system prompt (see [API reference](../../api)). A convenient `safe_mode` flag allows you to force chat completions to be moderated against sensitive content (see [Guardrailing](../guardrailing)).
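
For instance, a custom system prompt is supplied as the first message in the conversation. Here is a minimal sketch with the Python client; the prompt text itself is illustrative:

```python
import os

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])

chat_response = client.chat(
    model="mistral-tiny",
    messages=[
        ChatMessage(role="system", content="You are a sommelier. Answer in one sentence."),
        ChatMessage(role="user", content="What is the best French cheese?"),
    ],
)
print(chat_response.choices[0].message.content)
```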

## Embeddings

The embeddings API allows you to embed sentences.

<Tabs>
<TabItem value="python" label="python" default>
```python
import os

from mistralai.client import MistralClient

api_key = os.environ["MISTRAL_API_KEY"]
client = MistralClient(api_key=api_key)

embeddings_batch_response = client.embeddings(
    model="mistral-embed",
    input=["Embed this sentence.", "As well as this one."],
)
```
</TabItem>
<TabItem value="javascript" label="javascript">
```javascript
import MistralClient from '@mistralai/mistralai';

const apiKey = process.env.MISTRAL_API_KEY;

const client = new MistralClient(apiKey);

const input = [];
for (let i = 0; i < 10; i++) {
  input.push('What is the best French cheese?');
}

const embeddingsBatchResponse = await client.embeddings({
  model: 'mistral-embed',
  input: input,
});

console.log('Embeddings Batch:', embeddingsBatchResponse.data);
```
</TabItem>
<TabItem value="curl" label="curl">
```bash
curl --location "https://api.mistral.ai/v1/embeddings" \
     --header 'Content-Type: application/json' \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer $MISTRAL_API_KEY" \
     --data '{
       "model": "mistral-embed",
       "input": [
         "Embed this sentence.",
         "As well as this one."
       ]
     }'
```
</TabItem>
</Tabs>

## Third-Party Clients

Here are some clients built by the community for various other languages:

### Go
[Gage-Technologies](https://github.com/Gage-Technologies/mistral-go)
57 changes: 57 additions & 0 deletions docs/platform/03-endpoints.md
@@ -0,0 +1,57 @@
import Benchmark from '@site/static/img/mistral_family.png';

# Endpoints

We provide different endpoints with different price/performance tradeoffs. Our endpoints depend on internal models,
some of which are [open-weight](../../models), allowing users to deploy them on their own, on arbitrary infrastructure.
See [Self-deployment](../../self-deployment/overview) for details.

## Generative endpoints

All our generative endpoints can reason on contexts up to 32k tokens and follow fine-grained instructions.
The following table gathers benchmarks for each endpoint.

<!-- <div style="text-align: center;"> -->
<img src={Benchmark} alt="Benchmark" width="500px" class="center"/>
<!-- </div> -->

We only provide chat access through our API. For endpoints relying on
[open-weight models](../../models), users can access the underlying base models via the published weights.

### Tiny

This generative endpoint is best used for large batch processing tasks where cost is a significant factor
but reasoning capabilities are not crucial.

Currently powered by Mistral-7B-v0.2, an improved fine-tune of the initial Mistral-7B release,
inspired by the fantastic work of the community.


API name: `mistral-tiny`

### Small

This endpoint offers stronger reasoning and broader capabilities than Tiny.

The endpoint supports English, French, German, Italian, and Spanish and can produce and reason about code.

Currently powered by Mixtral-8X7B-v0.1, a sparse mixture of experts model with 12B active parameters.


API name: `mistral-small`

### Medium

This endpoint currently relies on an internal prototype model.

API name: `mistral-medium`

## Embedding models

Embedding models enable retrieval and retrieval-augmented generation applications.

Our endpoint outputs 1024-dimensional vectors. It achieves a retrieval score of 55.26 on MTEB.

API name: `mistral-embed`
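
As a sketch of the retrieval use case: embed a query and candidate passages, then rank the passages by cosine similarity. This assumes the response exposes `.data[i].embedding` as in the API schema; the texts are illustrative:

```python
import os

import numpy as np
from mistralai.client import MistralClient

client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])

query = "Which cheese pairs well with red wine?"
passages = ["Comté is a firm French cheese.", "The Eiffel Tower is in Paris."]

response = client.embeddings(model="mistral-embed", input=[query] + passages)
vectors = np.array([item.embedding for item in response.data])  # 1024-dim each

q, docs = vectors[0], vectors[1:]
scores = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
print(passages[int(scores.argmax())])  # passage most similar to the query
```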


56 changes: 56 additions & 0 deletions docs/platform/04-guardrailing.md
@@ -0,0 +1,56 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Guardrailing

## System prompt to enforce guardrails

The ability to enforce guardrails in chat generations is crucial for front-facing applications. We introduce an optional system prompt to enforce guardrails on top of our models. You can activate this prompt through a `safe_mode` binary flag in API calls as follows:

<Tabs>
<TabItem value="python" label="python" default>
```python
chat_response = client.chat(
    model="mistral-tiny",
    messages=[ChatMessage(role="user", content="What is the best French cheese?")],
    safe_mode=True,
)
```
</TabItem>
<TabItem value="javascript" label="javascript">
```javascript
const chatResponse = await client.chat({
  model: 'mistral-tiny',
  messages: [{role: 'user', content: 'What is the best French cheese?'}],
  safe_mode: true,
});
```
</TabItem>
<TabItem value="curl" label="curl">
```bash
curl --location "https://api.mistral.ai/v1/chat/completions" \
     --header 'Content-Type: application/json' \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer $MISTRAL_API_KEY" \
     --data '{
       "model": "mistral-tiny",
       "messages": [
         {
           "role": "user",
           "content": "What is the best French cheese?"
         }
       ],
       "safe_mode": true
     }'
```
</TabItem>
</Tabs>

Toggling `safe_mode` will prepend your messages with the following system prompt:
```
Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
```
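
Equivalently, you could prepend that prompt yourself as a system message instead of toggling the flag. A sketch with the Python client, reusing `client` and `ChatMessage` from the examples above; whether the server-side behavior is byte-for-byte identical to `safe_mode` is an assumption:

```python
SAFETY_PROMPT = (
    "Always assist with care, respect, and truth. Respond with utmost utility yet "
    "securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure "
    "replies promote fairness and positivity."
)

chat_response = client.chat(
    model="mistral-tiny",
    messages=[
        ChatMessage(role="system", content=SAFETY_PROMPT),
        ChatMessage(role="user", content="What is the best French cheese?"),
    ],
)
```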
<!--
## Safety and utility trade-off
TODO Safety and utility benchmarks with and without safe mode -->
