API

OpenAI ChatCompletion
Ollama ChatCompletion
OpenAI Assistant

OpenAI ChatCompletion

POST /v1/chat/completions

Generate responses based on the selected model.

Parameters

messages: An array of messages holding the full conversation history. Each message comes from either the user or the model (assistant) and includes:
- role: Either user or assistant, indicating who created the message.
- content: The text of the message.
model: The name of the selected model.
stream: Either true or false. Indicates whether to use a streaming response. If true, model inference results are returned as an HTTP event stream.
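The parameters above can be assembled into a request body as in the following minimal sketch. The helper name build_chat_request is hypothetical, and the role check simply mirrors the two roles listed above; the server itself may accept or reject other values.

```python
import json

def build_chat_request(model, messages, stream=True):
    """Return the JSON body for POST /v1/chat/completions as a string.

    Hypothetical helper: the validation only reflects the parameter
    descriptions in this document, not the server's actual checks.
    """
    for m in messages:
        if m.get("role") not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {m.get('role')!r}")
        if not isinstance(m.get("content"), str):
            raise ValueError("content must be a string")
    return json.dumps({"messages": messages, "model": model, "stream": stream})

body = build_chat_request(
    "Meta-Llama-3-8B-Instruct",
    [{"role": "user", "content": "tell a joke"}],
)
print(body)
```

Since non-streaming responses are not supported yet, the sketch defaults stream to true.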

Response

Streaming response: An event stream in which each event contains a chat.completion.chunk. chunk.choices[0].delta.content holds the incremental output the model returns with that event.
Non-streaming response: Not yet supported.
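A client can rebuild the full reply by concatenating delta.content across events. The sketch below works on an iterable of text lines so it runs without a server. The function name iter_delta_content is hypothetical, and the sample lines are trimmed versions of the events shown in this document's example.

```python
import json

def iter_delta_content(lines):
    """Yield incremental text from the lines of a chat.completion.chunk event stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and "event:" lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        content = chunk["choices"][0]["delta"].get("content")
        if content:  # ignore empty or missing deltas
            yield content

# Trimmed sample events; a real stream carries more fields per chunk.
sample = [
    'data:{"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Why ","role":"assistant"},"finish_reason":null}]}',
    '',
    'data:{"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"","role":"assistant"},"finish_reason":null}]}',
    '',
    'data:{"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"couldn\'t ","role":"assistant"},"finish_reason":null}]}',
    '',
    'event: done',
    'data: [DONE]',
]
text = "".join(iter_delta_content(sample))
print(text)
```

Against a live server, the same function could be fed the decoded lines of the HTTP response body instead of the sample list.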

Example

curl -X 'POST' \
  'http://localhost:9112/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "messages": [
    {
      "content": "tell a joke",
      "role": "user"
    }
  ],
  "model": "Meta-Llama-3-8B-Instruct",
  "stream": true
}'

data:{"id":"c30445e8-1061-4149-a101-39b8222e79e1","object":"chat.completion.chunk","created":1720511671,"model":"not implmented","system_fingerprint":"not implmented","usage":null,"choices":[{"index":0,"delta":{"content":"Why ","role":"assistant","name":null},"logprobs":null,"finish_reason":null}]}

data:{"id":"c30445e8-1061-4149-a101-39b8222e79e1","object":"chat.completion.chunk","created":1720511671,"model":"not implmented","system_fingerprint":"not implmented","usage":null,"choices":[{"index":0,"delta":{"content":"","role":"assistant","name":null},"logprobs":null,"finish_reason":null}]}

data:{"id":"c30445e8-1061-4149-a101-39b8222e79e1","object":"chat.completion.chunk","created":1720511671,"model":"not implmented","system_fingerprint":"not implmented","usage":null,"choices":[{"index":0,"delta":{"content":"couldn't ","role":"assistant","name":null},"logprobs":null,"finish_reason":null}]}

...

data:{"id":"c30445e8-1061-4149-a101-39b8222e79e1","object":"chat.completion.chunk","created":1720511671,"model":"not implmented","system_fingerprint":"not implmented","usage":null,"choices":[{"index":0,"delta":{"content":"two-tired!","role":"assistant","name":null},"logprobs":null,"finish_reason":null}]}

event: done
data: [DONE]

Ollama ChatCompletion

POST /api/generate

Generate responses using the selected model.

Parameters

prompt: A string representing the input prompt.
model: The name of the selected model.
stream: Either true or false. Indicates whether to use a streaming response. If true, model inference results are returned as an HTTP event stream.
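As a rough illustration of these parameters, the sketch below builds a request object with Python's standard urllib without sending it. The helper name build_generate_request is hypothetical, and the default base URL is taken from the curl example in this document; adjust it to wherever the server actually listens.

```python
import json
import urllib.request

def build_generate_request(prompt, model, base_url="http://localhost:9112"):
    """Build (but do not send) a streaming POST /api/generate request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True})
    return urllib.request.Request(
        base_url + "/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "accept": "application/json"},
        method="POST",
    )

req = build_generate_request("tell me a joke", "Meta-Llama-3-8B-Instruct")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would send it; the response object can then be iterated line by line.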

Response

Streaming response: A stream of JSON objects, one per line. Each object includes:
- response: The incremental output of the model completion.
- done: Whether inference has finished.
Non-streaming response: Not yet supported.
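Because each line is a standalone JSON object, a client can parse the stream line by line and stop once done is true. The following is a minimal sketch over an iterable of lines. The function name collect_generate_stream is hypothetical, and the sample lines are copied from this document's example output.

```python
import json

def collect_generate_stream(lines):
    """Concatenate "response" fragments until a line reports done=true."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between objects
        obj = json.loads(line)
        parts.append(obj.get("response", ""))
        if obj.get("done"):
            break
    return "".join(parts)

sample = [
    '{"model":"Meta-Llama-3-8B-Instruct","created_at":"2024-07-09 08:13:11.686513","response":"I\'ll ","done":false}',
    '{"model":"Meta-Llama-3-8B-Instruct","created_at":"2024-07-09 08:13:11.729214","response":"give ","done":false}',
    '{"model":"Meta-Llama-3-8B-Instruct","created_at":"2024-07-09 08:13:33.956795","response":"","done":true}',
]
result = collect_generate_stream(sample)
print(result)
```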

Example

curl -X 'POST' \
  'http://localhost:9112/api/generate' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "Meta-Llama-3-8B-Instruct",
  "prompt": "tell me a joke",
  "stream": true
}'

{"model":"Meta-Llama-3-8B-Instruct","created_at":"2024-07-09 08:13:11.686513","response":"I'll ","done":false}
{"model":"Meta-Llama-3-8B-Instruct","created_at":"2024-07-09 08:13:11.729214","response":"give ","done":false}

...

{"model":"Meta-Llama-3-8B-Instruct","created_at":"2024-07-09 08:13:33.955475","response":"for","done":false}
{"model":"Meta-Llama-3-8B-Instruct","created_at":"2024-07-09 08:13:33.956795","response":"","done":true}
