
Conversation

ngxson (Collaborator) commented Nov 11, 2025

This PR adds a generator-based API for receiving task results. It aims to reduce the use of callback functions, making the code more "linear" and easier to follow.

This also allows returning the correct HTTP error code in the streaming case, ref: #16486 (comment)

Example:

server_response_generator gen(ctx_server);
{
    std::vector<server_task> tasks;
    // ... populate tasks ...
    gen.post_tasks(std::move(tasks));
}

// wait for the results
auto all_results = gen.wait_for_all(req.is_connection_closed);

// collect results
if (all_results.is_terminated) {
    return; // connection is closed
} else if (all_results.error) {
    res_error(res, all_results.error->to_json());
    return;
} else {
    for (auto & result : all_results.results) {
        GGML_ASSERT(dynamic_cast<server_task_result_embd*>(result.get()) != nullptr);
        responses.push_back(result->to_json());
    }
    }
}
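For the streaming case, the first result can be checked before any bytes are written out, which is what makes returning a correct HTTP error code possible. Below is a minimal sketch of that path; gen.next(...) and first_result->is_error() are hypothetical names for illustration, not necessarily the exact API in this PR:

// post tasks as above, then pull just the first result before streaming starts
auto first_result = gen.next(req.is_connection_closed);
if (first_result == nullptr) {
    return; // connection closed while waiting
}
if (first_result->is_error()) {
    // nothing has been written to the socket yet,
    // so a proper HTTP status code can still be sent
    res_error(res, first_result->to_json());
    return;
}
// from here on, the remaining results are streamed as chunks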

ngxson (Collaborator, Author) commented Nov 11, 2025

Trying to address https://github.com/ggml-org/llama.cpp/pull/16486/files#r2419474810 in the meantime

Edit: resolved in 31b8b70


Review thread on this diff hunk:

// next responses are streamed
json first_result_json = first_result->to_json();
const auto chunked_content_provider = [first_result_json, gen, oaicompat](size_t, httplib::DataSink & sink) mutable -> bool {
ngxson (Collaborator, Author) commented Nov 11, 2025

note: in the future, when we separate the HTTP implementation from the current code base, this chunked_content_provider callback pattern will disappear.

the goal is to make each server endpoint handler itself become a generator, which generates a JSON response each time the next() function is called
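A rough sketch of what such an endpoint-level generator could look like (illustrative only; the type and method names below are hypothetical, not code from this PR):

// hypothetical interface: each next() call produces one JSON response chunk
struct server_endpoint_generator {
    virtual ~server_endpoint_generator() = default;
    // fills `out` with the next chunk; returns false once the stream is finished
    virtual bool next(nlohmann::json & out) = 0;
};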

ngxson (Collaborator, Author):

on second thought, since this chunked_content_provider lambda function is already a generator itself, we can just keep it and only change the return type.

the ultimate goal is to expose an API that allows writing code like this:

const auto handle_chat_completions = [&](const Request & req, Response & res) {
    auto body = json::parse(req.body);
    // ... do parsing stuff with body
    auto response = handle_completions_impl(...);
    if (response.stream) {
        // response is now a generator, call next() until returns false
        res.set_stream(true);
        json chunk;
        while (response.next(chunk)) {
            res.write(chunk.dump());
        }
        res.end();
    } else {
        // non-stream, response is simple object
        res.set_content(response.data);
    }
};
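To make the sketch above concrete, the value returned by handle_completions_impl(...) could be shaped roughly like this (hypothetical; completions_response and its fields are names invented to match the usage above, not an actual type in this PR):

// hypothetical return type matching the handler sketch above
struct completions_response {
    bool stream = false;               // true -> drain chunks via next()
    json data;                         // full response body for the non-stream case
    std::function<bool(json &)> next;  // fills the next chunk; returns false at end of stream
};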

ngxson (Collaborator, Author) commented Nov 12, 2025

I renamed "generator" to "reader", as the term "generator" is better used to describe the interface between server_context and the HTTP layer.

In a follow-up PR, I'll separate all http-related code into its own API. The idea is that server_context returns a response_generator to the HTTP layer, and the HTTP layer simply calls next() until there is no data left.
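As a sketch of that split, the HTTP layer's side could reduce to a plain drain loop (hypothetical code; it assumes a next(json &) method that returns false when exhausted, and uses cpp-httplib's DataSink for output):

// the HTTP layer knows nothing about tasks; it only drains the generator
static void stream_response(server_response_generator & gen, httplib::DataSink & sink) {
    json chunk;
    while (gen.next(chunk)) {
        // SSE-style framing, as used for streamed completions
        const std::string payload = "data: " + chunk.dump() + "\n\n";
        sink.write(payload.data(), payload.size());
    }
    sink.done();
}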

For now, this PR should be ready for review. No rush, but CC @ggerganov for visibility.
