API

OpenAI ChatCompletion

POST /v1/chat/completions

Generate responses based on the selected model.

Parameters

  • messages: An array of messages representing the conversation history. Each message comes from either the user or the model (assistant) and includes:

    • role: Either user or assistant, indicating who created the message.
    • content: The text of the message.
  • model: The name of the selected model.

  • stream: Either true or false, indicating whether to use a streaming response. If true, model inference results are returned as an HTTP event stream.

Response

  • Streaming response: An event stream in which each event carries a chat.completion.chunk object; chunk.choices[0].delta.content is the incremental text returned by the model in each chunk (a client sketch follows this list).
  • Non-streaming response: Not supported yet.
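
Because the endpoint mirrors the OpenAI chat-completions schema, an OpenAI-compatible client may be able to consume it directly. Below is a minimal Python sketch using the official openai package; it assumes the server from the example below (localhost:9112) accepts the standard SDK request and ignores the API key, neither of which is guaranteed by this document.

from openai import OpenAI

# Assumption: the server is compatible with the official OpenAI SDK and does
# not validate the API key; both are unverified here.
client = OpenAI(base_url="http://localhost:9112/v1", api_key="unused")

stream = client.chat.completions.create(
    model="Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "tell a joke"}],
    stream=True,
)

for chunk in stream:
    # chunk.choices[0].delta.content holds the incremental output
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)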

Example

curl -X 'POST' \
  'http://localhost:9112/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "messages": [
    {
      "content": "tell a joke",
      "role": "user"
    }
  ],
  "model": "Meta-Llama-3-8B-Instruct",
  "stream": true
}'
data:{"id":"c30445e8-1061-4149-a101-39b8222e79e1","object":"chat.completion.chunk","created":1720511671,"model":"not implmented","system_fingerprint":"not implmented","usage":null,"choices":[{"index":0,"delta":{"content":"Why ","role":"assistant","name":null},"logprobs":null,"finish_reason":null}]}

data:{"id":"c30445e8-1061-4149-a101-39b8222e79e1","object":"chat.completion.chunk","created":1720511671,"model":"not implmented","system_fingerprint":"not implmented","usage":null,"choices":[{"index":0,"delta":{"content":"","role":"assistant","name":null},"logprobs":null,"finish_reason":null}]}

data:{"id":"c30445e8-1061-4149-a101-39b8222e79e1","object":"chat.completion.chunk","created":1720511671,"model":"not implmented","system_fingerprint":"not implmented","usage":null,"choices":[{"index":0,"delta":{"content":"couldn't ","role":"assistant","name":null},"logprobs":null,"finish_reason":null}]}

...

data:{"id":"c30445e8-1061-4149-a101-39b8222e79e1","object":"chat.completion.chunk","created":1720511671,"model":"not implmented","system_fingerprint":"not implmented","usage":null,"choices":[{"index":0,"delta":{"content":"two-tired!","role":"assistant","name":null},"logprobs":null,"finish_reason":null}]}

event: done
data: [DONE]
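
For clients without an SDK, the event stream can also be parsed by hand. The following is a minimal Python sketch using the requests library; it assumes each event's payload arrives on a single data: line, as in the output above, and stops at the [DONE] sentinel.

import json
import requests

payload = {
    "messages": [{"role": "user", "content": "tell a joke"}],
    "model": "Meta-Llama-3-8B-Instruct",
    "stream": True,
}

with requests.post(
    "http://localhost:9112/v1/chat/completions", json=payload, stream=True
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # Skip blank keep-alive lines and "event:" lines.
        if not line or not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            print(delta, end="", flush=True)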

Ollama ChatCompletion

POST /api/generate

Generate responses using the selected model.

Parameters

  • prompt: A string representing the input prompt.
  • model: The name of the selected model.
  • stream: Either true or false, indicating whether to use a streaming response. If true, the model's output is streamed as newline-delimited JSON objects.

Response

  • Streaming response: A stream of newline-delimited JSON objects, one per line (a parsing sketch follows the example below).
    • response: The incremental text produced by the model.
    • done: Whether inference has finished.
  • Non-streaming response: Not supported yet.

Example

curl -X 'POST' \
  'http://localhost:9112/api/generate' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "Meta-Llama-3-8B-Instruct",
  "prompt": "tell me a joke",
  "stream": true
}'
{"model":"Meta-Llama-3-8B-Instruct","created_at":"2024-07-09 08:13:11.686513","response":"I'll ","done":false}
{"model":"Meta-Llama-3-8B-Instruct","created_at":"2024-07-09 08:13:11.729214","response":"give ","done":false}

...

{"model":"Meta-Llama-3-8B-Instruct","created_at":"2024-07-09 08:13:33.955475","response":"for","done":false}
{"model":"Meta-Llama-3-8B-Instruct","created_at":"2024-07-09 08:13:33.956795","response":"","done":true}