server: add OpenAI compatible response format for legacy /completions with b… #10645
This is based on a previous PR
However, @ngxson seems to be working on refactoring server.cpp to avoid the use of JSON, as stated here, so I don't expect this to be merged easily. Still, it might be of use to someone else.
This adds (almost) full support for the OpenAI API response format on the legacy completion-related endpoints, including when logprobs is specified.
When oai_compat is set to True in the request (as suggested by @ngxson), the old response format is used (check tests).

HELM benchmarks from CRFM support an OpenAI-compatible API server that uses this endpoint, so this change enables testing differently quantized models for degradation against this benchmark. I tested it on a QwQ Preview 32B GGUF Q4_K_M to evaluate the model against other frontier models. I've described that here
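For anyone who wants to try the oai_compat flag mentioned above, here is a minimal sketch of how a client might build and send such a request. Only the oai_compat field name comes from this PR. The other field names (prompt, max_tokens, logprobs) follow OpenAI's legacy completions API and are assumptions as far as this server is concerned, as are the /completions path, the base URL you pass in, and the helper names build_completion_request and post_completion, which are hypothetical.

```python
import json
import urllib.request


def build_completion_request(prompt, oai_compat=True, logprobs=None, max_tokens=16):
    """Build the JSON body for a legacy completion request.

    Only `oai_compat` is taken from this PR's description. The remaining
    field names mirror OpenAI's legacy completions API and are assumed
    to be accepted by the server.
    """
    body = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "oai_compat": oai_compat,  # flag described in this PR
    }
    if logprobs is not None:
        # Number of top log-probabilities requested per token (assumed semantics).
        body["logprobs"] = logprobs
    return body


def post_completion(base_url, body):
    """POST the body to <base_url>/completions and return the parsed JSON.

    The endpoint path is an assumption; adjust it to match your server.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running locally, something like post_completion("http://localhost:8080", build_completion_request("Hello", logprobs=5)) would send the request; the host and port here are placeholders. Comparing the responses with oai_compat toggled on and off is a quick way to see which format the server returns in each case.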