Add documentation
lerela committed Jan 3, 2024
1 parent 6cc0c03 commit 5766c01
Showing 16 changed files with 631 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -128,3 +128,6 @@ dist
.yarn/build-state.yml
.yarn/install-state.gz
.pnp.*

# docusaurus build
build/
1 change: 1 addition & 0 deletions docs/README.md
@@ -0,0 +1 @@
# mistral-docs
60 changes: 60 additions & 0 deletions docs/intro.md
@@ -0,0 +1,60 @@
---
sidebar_position: 1
slug: /
---

# Introduction

Mistral AI currently provides two types of access to Large Language Models:
- An API providing pay-as-you-go access to our latest models,
- Open-source models released under the [Apache 2.0](https://github.com/apache/.github/blob/main/LICENSE) License, available on [Hugging Face](https://huggingface.co/mistralai) or directly from [the documentation](/models).

## Where to start?

### API Access
Our API is currently in beta while we ramp up capacity and ensure a good quality of service. Access the [platform](https://console.mistral.ai/) to join the waitlist. Once your subscription is active, you can immediately use our `chat` endpoint:

```bash
curl --location "https://api.mistral.ai/v1/chat/completions" \
     --header 'Content-Type: application/json' \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer $MISTRAL_API_KEY" \
     --data '{
       "model": "mistral-tiny",
       "messages": [{"role": "user", "content": "Who is the most renowned French painter?"}]
     }'
```

Or our embeddings endpoint:

```bash
curl --location "https://api.mistral.ai/v1/embeddings" \
     --header 'Content-Type: application/json' \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer $MISTRAL_API_KEY" \
     --data '{
       "model": "mistral-embed",
       "input": ["Embed this sentence.", "As well as this one."]
     }'
```

For a full description of the models offered on the API, head over to the **[model docs](./models)**.

For more examples of how to use our platform, head over to our **[platform docs](./platform/01-overview.md)**.

### Raw model weights

Raw model weights can be used in several ways:
- For self-deployment, in the cloud or on-premise, using either [TensorRT-LLM](./self-deployment/trtllm) or [vLLM](./self-deployment/vllm), head over to **[Deployment](./self-deployment/skypilot)**,
- For research, head over to our [reference implementation repository](https://github.com/mistralai/mistral-src),
- For local deployment on consumer-grade hardware, check out the [llama.cpp](https://github.com/ggerganov/llama.cpp) project or [Ollama](https://ollama.ai/).


## Get Help

Join our [Discord community](https://discord.gg/mistralai) to discuss our models and talk to our engineers. Alternatively, reach out to our [business team](https://mistral.ai/contact/) if you have enterprise needs, want more information about our products, or would like us to add missing features.


## Contributing

Mistral AI is committed to open source software development and welcomes external contributions. Please open a PR!
62 changes: 62 additions & 0 deletions docs/models.md
@@ -0,0 +1,62 @@
---
sidebar_position: 3
slug: models
---

# Open-weight models

We open-source both pre-trained models and fine-tuned models. These models are not tuned for safety as we want to empower users to test and refine moderation based on their use cases. For safer models, follow our [guardrailing tutorial](./platform/04-guardrailing.md).

## Mistral 7B

Mistral 7B is the first dense model released by Mistral AI. At the time of the release, it matched the capabilities of models up to 30B parameters. Learn more on our [blog post](https://mistral.ai/news/announcing-mistral-7b/).

## Mixtral 8X7B

Mixtral 8X7B is a sparse mixture-of-experts model. As such, it leverages up to 45B parameters but only uses about 12B during inference, leading to better inference throughput at the cost of more VRAM. Learn more on the dedicated [blog post](https://mistral.ai/news/mixtral-of-experts/).
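
To build intuition for why only a fraction of the parameters is active per token, here is a toy sketch of top-2-of-8 expert routing. This is not Mixtral's actual implementation; all names and shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_model = 8, 16                                 # Mixtral routes among 8 experts
gate = rng.normal(size=(d_model, n_experts))               # router weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    logits = x @ gate                       # one routing score per expert
    top2 = np.argsort(logits)[-2:]          # keep only the 2 best-scoring experts
    w = np.exp(logits[top2] - logits[top2].max())
    w = w / w.sum()                         # softmax over the selected experts
    # Only 2 of the 8 expert weight matrices participate in this token's forward pass.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top2))

print(moe_layer(rng.normal(size=d_model)).shape)  # (16,)
```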

## Downloading

- Mistral-7B-v0.1: [Hugging Face](https://huggingface.co/mistralai/Mistral-7B-v0.1) // [raw_weights](https://files.mistral-7b-v0-1.mistral.ai/mistral-7B-v0.1.tar) (md5sum: `37dab53973db2d56b2da0a033a15307f`).
- Mistral-7B-Instruct-v0.2: [Hugging Face](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) // [raw_weights](https://files.mistral-7b-v0-2.mistral.ai/Mistral-7B-v0.2-Instruct.tar) (md5sum: `fbae55bc038f12f010b4251326e73d39`).
- Mixtral-8x7B-v0.1: [Hugging Face](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1).
- Mixtral-8x7B-Instruct-v0.1: [Hugging Face](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) // [raw_weights](https://files.mixtral-8x7b-v0-1.mistral.ai/Mixtral-8x7B-v0.1-Instruct.tar) (md5sum: `8e2d3930145dc43d3084396f49d38a3f`).
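
A minimal sketch for verifying a downloaded archive against the md5sums above, using only the Python standard library (the file name is a placeholder for wherever you saved the tarball):

```python
import hashlib

def md5sum(path: str, chunk_size: int = 2**20) -> str:
    """Hash the file in chunks so large tarballs don't need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Placeholder path; compare against the checksum listed above.
assert md5sum("mistral-7B-v0.1.tar") == "37dab53973db2d56b2da0a033a15307f"
```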

## Sizes

| Name | Number of parameters | Number of active parameters | Min. GPU RAM for inference (GB) |
|--------------------|:--------------------:|:---------------------------:|:-------------------------------:|
| Mistral-7B-v0.2 | 7.3B | 7.3B | 16 |
| Mixtral-8X7B-v0.1  | 46.7B                | 12.9B                       | 100                             |

## Chat template

The template used to build a prompt for the Instruct model is defined as follows:
```
<s>[INST] Instruction [/INST] Model answer</s>[INST] Follow-up instruction [/INST]
```

Note that `<s>` and `</s>` are special tokens for beginning of string (BOS) and end of string (EOS) while `[INST]` and `[/INST]` are regular strings.

:::note

This format must be strictly respected. Otherwise, the model will generate sub-optimal outputs.

:::
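
As an illustration, here is a minimal, unofficial sketch that assembles a prompt string from (user, assistant) turns according to the template above; the final turn's answer is left as `None` to elicit a completion. Note that in practice `<s>` and `</s>` must be emitted as the tokenizer's special tokens rather than plain text, as the next section explains:

```python
def build_prompt(turns):
    """turns: list of (user_message, assistant_answer); answer may be None."""
    prompt = "<s>"
    for user, answer in turns:
        prompt += f"[INST] {user} [/INST]"
        if answer is not None:
            prompt += f" {answer}</s>"
    return prompt

print(build_prompt([
    ("Hello, who are you?", "I am an assistant made by Mistral AI."),
    ("What is the best French cheese?", None),
]))
# <s>[INST] Hello, who are you? [/INST] I am an assistant made by Mistral AI.</s>[INST] What is the best French cheese? [/INST]
```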

As a reference, here is the format used to tokenize instructions during fine-tuning:

```
[START_SYMBOL_ID] +
tok("[INST]") + tok(USER_MESSAGE_1) + tok("[/INST]") +
tok(BOT_MESSAGE_1) + [END_SYMBOL_ID] +
tok("[INST]") + tok(USER_MESSAGE_N) + tok("[/INST]") +
tok(BOT_MESSAGE_N) + [END_SYMBOL_ID]
```

:::note

The function `tok` should never generate the EOS token. However, FastChat (used in vLLM) sends the full prompt as a string, which might lead to incorrect tokenization of the EOS token and prompt injection. Users are encouraged to send tokens instead, as described above.

:::
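
A hedged sketch of that token-level recipe, assuming a tokenizer object whose `encode` never adds special tokens and with BOS/EOS ids supplied explicitly (all names and ids here are hypothetical):

```python
BOS_ID, EOS_ID = 1, 2  # hypothetical ids for <s> and </s>

def encode_dialog(turns, tokenizer):
    """turns: list of completed (user_message, bot_message) pairs."""
    def tok(text):
        # Must never emit BOS/EOS itself; special tokens are appended by id below.
        return tokenizer.encode(text, add_special_tokens=False)

    ids = [BOS_ID]
    for user, bot in turns:
        ids += tok("[INST]") + tok(user) + tok("[/INST]")
        ids += tok(bot) + [EOS_ID]
    return ids
```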
6 changes: 6 additions & 0 deletions docs/platform/01-overview.md
@@ -0,0 +1,6 @@
# Platform

We provide chat generation endpoints for both our [open-weight models](../models.md) and our optimized models.
Our endpoints can be used with our [client packages](../client) or accessed directly through our [API](../../api).
See our [endpoints page](../endpoints) for a detailed description of endpoint performance. We detail how to moderate
our endpoints in [guardrailing](../guardrailing), and list their [prices](../pricing).
140 changes: 140 additions & 0 deletions docs/platform/02-client.md
@@ -0,0 +1,140 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Client code

We provide client libraries in both Python and JavaScript.

## Installation

Follow installation instructions in the repository for our [Python Client](https://github.com/mistralai/client-python) or [Javascript Client](https://github.com/mistralai/client-js).

## Chat Completion

The chat completion API allows you to chat with a model fine-tuned to follow instructions.

<Tabs>
<TabItem value="python" label="python" default>
```python
import os

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

api_key = os.environ["MISTRAL_API_KEY"]
model = "mistral-tiny"

client = MistralClient(api_key=api_key)

messages = [
    ChatMessage(role="user", content="What is the best French cheese?")
]

# No streaming
chat_response = client.chat(
    model=model,
    messages=messages,
)
print(chat_response.choices[0].message.content)

# With streaming
for chunk in client.chat_stream(model=model, messages=messages):
    print(chunk)
```
</TabItem>
<TabItem value="javascript" label="javascript">
```javascript
import MistralClient from '@mistralai/mistralai';

const apiKey = process.env.MISTRAL_API_KEY;

const client = new MistralClient(apiKey);

const chatResponse = await client.chat({
  model: 'mistral-tiny',
  messages: [{role: 'user', content: 'What is the best French cheese?'}],
});

console.log('Chat:', chatResponse.choices[0].message.content);
```
</TabItem>
<TabItem value="curl" label="curl">
```bash
curl --location "https://api.mistral.ai/v1/chat/completions" \
     --header 'Content-Type: application/json' \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer $MISTRAL_API_KEY" \
     --data '{
       "model": "mistral-tiny",
       "messages": [
         {
           "role": "user",
           "content": "What is the best French cheese?"
         }
       ]
     }'
```
</TabItem>
</Tabs>

We allow users to provide a custom system prompt (see [API reference](../../api)). A convenient `safe_mode` flag allows you to force chat completions to be moderated against sensitive content (see [Guardrailing](../guardrailing)).
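
For instance, a custom system prompt is supplied as the first message in the conversation. Here is a minimal sketch with the Python client; the prompt text itself is illustrative:

```python
import os

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])

chat_response = client.chat(
    model="mistral-tiny",
    messages=[
        ChatMessage(role="system", content="You are a sommelier. Answer in one sentence."),
        ChatMessage(role="user", content="What is the best French cheese?"),
    ],
)
print(chat_response.choices[0].message.content)
```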

## Embeddings

The embeddings API allows you to embed sentences.

<Tabs>
<TabItem value="python" label="python" default>
```python
import os

from mistralai.client import MistralClient

api_key = os.environ["MISTRAL_API_KEY"]
client = MistralClient(api_key=api_key)

embeddings_batch_response = client.embeddings(
    model="mistral-embed",
    input=["Embed this sentence.", "As well as this one."],
)
```
</TabItem>
<TabItem value="javascript" label="javascript">
```javascript
import MistralClient from '@mistralai/mistralai';

const apiKey = process.env.MISTRAL_API_KEY;

const client = new MistralClient(apiKey);

const input = [];
for (let i = 0; i < 10; i++) {
  input.push('What is the best French cheese?');
}

const embeddingsBatchResponse = await client.embeddings({
  model: 'mistral-embed',
  input: input,
});

console.log('Embeddings Batch:', embeddingsBatchResponse.data);
```
</TabItem>
<TabItem value="curl" label="curl">
```bash
curl --location "https://api.mistral.ai/v1/embeddings" \
     --header 'Content-Type: application/json' \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer $MISTRAL_API_KEY" \
     --data '{
       "model": "mistral-embed",
       "input": [
         "Embed this sentence.",
         "As well as this one."
       ]
     }'
```
</TabItem>
</Tabs>

## Third-Party Clients

Here are some clients built by the community for various other languages:

### Go
[Gage-Technologies](https://github.com/Gage-Technologies/mistral-go)
57 changes: 57 additions & 0 deletions docs/platform/03-endpoints.md
@@ -0,0 +1,57 @@
import Benchmark from '@site/static/img/mistral_family.png';

# Endpoints

We provide different endpoints with different price/performance tradeoffs. Our endpoints depend on internal models,
some of which are [open-weight](../../models), allowing users to deploy them on their own, on arbitrary infrastructure.
See [Self-deployment](../../self-deployment/overview) for details.

## Generative endpoints

All our generative endpoints can reason on contexts up to 32k tokens and follow fine-grained instructions.
The following table gathers benchmarks for each endpoint.

<!-- <div style="text-align: center;"> -->
<img src={Benchmark} alt="Benchmark" width="500px" class="center"/>
<!-- </div> -->

We only provide chat access through our API. For endpoints relying on
[open-weight models](../../models), users can access the underlying base models via the published weights.

### Tiny

This generative endpoint is best used for large batch processing tasks where cost is a significant factor
but reasoning capabilities are not crucial.

Currently powered by Mistral-7B-v0.2, an improved fine-tune of the initial Mistral-7B release,
inspired by the fantastic work of the community.


API name: `mistral-tiny`

### Small

This endpoint offers stronger reasoning and broader capabilities than Tiny.

The endpoint supports English, French, German, Italian, and Spanish and can produce and reason about code.

Currently powered by Mixtral-8X7B-v0.1, a sparse mixture of experts model with 12B active parameters.


API name: `mistral-small`

### Medium

This endpoint currently relies on an internal prototype model.

API name: `mistral-medium`

## Embedding models

Embedding models enable retrieval and retrieval-augmented generation applications.

Our endpoint outputs 1024-dimensional vectors. It achieves a retrieval score of 55.26 on MTEB.

API name: `mistral-embed`
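
As a sketch of the retrieval use case: embed a query and candidate passages, then rank the passages by cosine similarity. This assumes the response exposes `.data[i].embedding` as in the API schema; the texts are illustrative:

```python
import os

import numpy as np
from mistralai.client import MistralClient

client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])

query = "Which cheese pairs well with red wine?"
passages = ["Comté is a firm French cheese.", "The Eiffel Tower is in Paris."]

response = client.embeddings(model="mistral-embed", input=[query] + passages)
vectors = np.array([item.embedding for item in response.data])  # 1024-dim each

q, docs = vectors[0], vectors[1:]
scores = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
print(passages[int(scores.argmax())])  # passage most similar to the query
```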


56 changes: 56 additions & 0 deletions docs/platform/04-guardrailing.md
@@ -0,0 +1,56 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Guardrailing

## System prompt to enforce guardrails

The ability to enforce guardrails in chat generations is crucial for front-facing applications. We introduce an optional system prompt to enforce guardrails on top of our models. You can activate this prompt through a `safe_mode` binary flag in API calls as follows:

<Tabs>
<TabItem value="python" label="python" default>
```python
chat_response = client.chat(
    model="mistral-tiny",
    messages=[ChatMessage(role="user", content="What is the best French cheese?")],
    safe_mode=True,
)
```
</TabItem>
<TabItem value="javascript" label="javascript">
```javascript
const chatResponse = await client.chat({
  model: 'mistral-tiny',
  messages: [{role: 'user', content: 'What is the best French cheese?'}],
  safe_mode: true,
});
```
</TabItem>
<TabItem value="curl" label="curl">
```bash
curl --location "https://api.mistral.ai/v1/chat/completions" \
     --header 'Content-Type: application/json' \
     --header 'Accept: application/json' \
     --header "Authorization: Bearer $MISTRAL_API_KEY" \
     --data '{
       "model": "mistral-tiny",
       "messages": [
         {
           "role": "user",
           "content": "What is the best French cheese?"
         }
       ],
       "safe_mode": true
     }'
```
</TabItem>
</Tabs>

Toggling `safe_mode` will prepend your messages with the following system prompt:
```
Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
```
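
Equivalently, you could prepend that prompt yourself as a system message instead of toggling the flag. A sketch with the Python client, reusing `client` and `ChatMessage` from the examples above; whether the server-side behavior is byte-for-byte identical to `safe_mode` is an assumption:

```python
SAFETY_PROMPT = (
    "Always assist with care, respect, and truth. Respond with utmost utility yet "
    "securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure "
    "replies promote fairness and positivity."
)

chat_response = client.chat(
    model="mistral-tiny",
    messages=[
        ChatMessage(role="system", content=SAFETY_PROMPT),
        ChatMessage(role="user", content="What is the best French cheese?"),
    ],
)
```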
<!--
## Safety and utility trade-off
TODO Safety and utility benchmarks with and without safe mode -->
