URL Flock function #14
Online example mixing DuckDB httpserver and ClickHouse API requests using JSONEachRow/NDJSON:
--- Create a Backends table
CREATE TABLE IF NOT EXISTS backends (who VARCHAR, url VARCHAR);
--- Insert some Backends
INSERT INTO backends VALUES
('httpserver', 'https://quackpy.fly.dev'),
('httpserver2', 'https://cowsdb.fly.dev');
--- Run queries across all backends
SET variable __backends = (SELECT ARRAY_AGG(url) AS urls_array FROM backends);
SELECT * FROM url_flock('SELECT ''hello'', version()', getvariable('__backends') );
@carlopi we're almost there! Do you perhaps know of a smarter way to achieve the following without passing via a SET?
--- Run queries across all backends
SET variable __backends = (SELECT ARRAY_AGG(url) AS urls_array FROM backends);
SELECT * FROM url_flock('SELECT ''hello'', version()', getvariable('__backends') );
TableFunction f(
    "url_flock",
    {LogicalType::VARCHAR, LogicalType::LIST(LogicalType::VARCHAR)},
    DuckFlockImplementation,
    DuckFlockBind,
    nullptr,
    nullptr
);
If we try to pass a
I will take a look at the code; it's either hard (due to read_json not being table-in table-out) or simple. One thing that I think could make sense, possibly optionally via a parameter, is adding a column
The other question I have is which extension this needs to reside in. I am not completely sure; possibly http_server could also make sense.
Thanks in advance! We originally tried making this with a regular MACRO but condition 1 above was preventing it....
That would be amazing but I cannot imagine how :) I also thought about passing a DB/schema as a setting to have the function do the host lookups directly, but that brings a different set of challenges as well. Curious what you think.
This function was originally added to httpserver, but I felt it would bloat an already complex role with an even more complex dependency, so we decided to stash it in the chsql extension as a superpowered
Once again, thanks in advance for helping us square this out and for the original inspiration :)
I was thinking something like:
CREATE TABLE IF NOT EXISTS backends (who VARCHAR, url VARCHAR);
--- Insert some Backends
INSERT INTO backends VALUES
('httpserver', 'https://quackpy.fly.dev'),
('httpserver2', 'https://cowsdb.fly.dev');
SELECT * FROM backends, LATERAL (FROM url_flock('SELECT version()', backends.url));
This could possibly work, making url_flock a table-in, table-out function that takes a single backend, BUT it's table-in, table-out. I will try to make some progress later; I thought this was still interesting to share.
Purely to provide further context, the following ClickHouse functions are in scope of this function emulation:
Essentially querying a remote node or cluster of nodes, as ours does, with parameters inverted. The above variations can be implemented as simple macro aliases on top of the final command we'll come up with. Also in context, the 'cluster_name' approach seems a good one to follow, allowing users to group servers by cluster_name when selecting from a backends table.
It's only tangentially related and I don't think the ergonomics in DuckDB afford this yet, but Arrow IPC over HTTP could be really interesting in conjunction
@nicosuave I think this might be possible using
I'm super curious about table-in-out as I do not believe I understand it yet and how to use it. Thanks in advance!
Ultimately you were right. We're going to have a flock function in http_server alongside the discovery API to fill it up!
url_flock
The url_flock function was inspired by an idea of @carlopi from DuckDB Labs 👋
When used in combination with the httpserver extension (or compatible endpoints such as ClickHouse HTTP using format JSONEachRow), the url_flock helper builds a multi-backend query with UNION ALL aggregation of results.
Limitations
Currently the function expects all backends to return the same columns (or no data).
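The same-columns requirement follows from how UNION ALL works: every branch must yield a compatible column list. The sketch below shows one way such a multi-backend query string could be assembled. It is not the extension's actual code. `UrlEncode`, `SqlQuote` and `BuildFlockQuery` are hypothetical helper names, and the `/?default_format=JSONEachRow&query=` endpoint shape is borrowed from the ClickHouse HTTP interface on the assumption that the target backends accept it.

```cpp
#include <string>
#include <vector>

// Percent-encode everything except RFC 3986 unreserved characters,
// so the SQL text can travel safely inside a URL query string.
std::string UrlEncode(const std::string &text) {
    static const char *hex = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : text) {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') || c == '-' || c == '_' ||
                          c == '.' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Double single quotes so a value can sit inside a SQL string literal.
std::string SqlQuote(const std::string &value) {
    std::string out;
    for (char c : value) {
        out += c;
        if (c == '\'') {
            out += '\'';
        }
    }
    return out;
}

// Hypothetical helper: one read_json_auto() branch per backend, joined
// with UNION ALL. The endpoint shape is an assumption (ClickHouse-style).
std::string BuildFlockQuery(const std::string &query,
                            const std::vector<std::string> &urls) {
    std::string sql;
    for (size_t i = 0; i < urls.size(); i++) {
        if (i > 0) {
            sql += " UNION ALL ";
        }
        sql += "SELECT * FROM read_json_auto('" + SqlQuote(urls[i]) +
               "/?default_format=JSONEachRow&query=" + UrlEncode(query) + "')";
    }
    return sql;
}
```

Because each backend becomes its own `SELECT` branch, a backend returning different columns would make the combined statement fail, which matches the limitation stated above.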
Examples
The construction supports Token based authentication to the httpserver extension:
Known Issues
HTTP Auth
HEAD requests fail when Basic Auth (user:password@host) is included in the URL. Works with X-Tokens.
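Until that is resolved, a caller could check backend URLs for embedded credentials before handing them to the function and fall back to token authentication when any are found. The following is a minimal sketch of such a check. `HasUserInfo` is a hypothetical helper, not part of the extension.

```cpp
#include <string>

// Returns true when the URL authority carries userinfo (user:password@host).
// Only an '@' that appears before the first '/', '?' or '#' after the scheme
// counts, so an '@' inside the path or query string is ignored.
bool HasUserInfo(const std::string &url) {
    size_t start = url.find("://");
    start = (start == std::string::npos) ? 0 : start + 3;
    size_t authority_end = url.find_first_of("/?#", start);
    size_t at = url.find('@', start);
    if (at == std::string::npos) {
        return false;
    }
    return authority_end == std::string::npos || at < authority_end;
}
```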