The client in version 2 allows inserting data from an input byte stream. It also supports inserting data in a compressed format (the decompress option). I would like to be able to insert data from an already compressed data stream. This change would allow creating compressed in-memory batches and then sending them directly.
I have already looked at the code and found one major incompatibility with the current API: the insert statement is part of the request body and is prepended to each request. This is a problem because the whole body has to be either compressed or not. So I think the options are to send the statement as an HTTP query parameter or to allow user-defined insert statements.
Code example
Here are two examples of how I would like to use the client. The second example is my actual use case: creating compressed batches in memory and then sending them. Creating these compressed batches should be more memory efficient, which helps in memory-heavy applications.
DISCLAIMER: The following examples do not work with the current implementation! They assume that the insert query will be sent as an HTTP query parameter.
Compressed stream
void compressedStream() {
    final String table = "test_table";
    final ClickHouseFormat format = ClickHouseFormat.CSV;
    try (Client client = new Client.Builder()
            .compressClientRequest(false)
            .compressServerResponse(false)
            .addEndpoint(Protocol.HTTP, "localhost", clickhouse.getMappedPort(8123), false)
            .setUsername(clickhouse.getUsername())
            .setPassword(clickhouse.getPassword())
            .useAsyncRequests(true)
            .build()) {
        final var pipedOutputStream = ClickHouseDataStreamFactory.getInstance().createPipedOutputStream(new ClickHouseConfig());
        final OutputStream compressedOutputStream = new ClickHouseLZ4OutputStream(pipedOutputStream, LZ4Factory.fastestInstance().fastCompressor(), 8192);
        final var futureResponse = client.insert(table, pipedOutputStream.getInputStream(), format, new InsertSettings().serverSetting("decompress", "1"));

        // write data to insert to compressedOutputStream
        final int numberOfRows = 2;
        compressedOutputStream.write("1,foo\n".getBytes());
        compressedOutputStream.write("2,bar\n".getBytes());
        compressedOutputStream.close();
        pipedOutputStream.close();

        // insert setting tells ClickHouse that data are compressed
        // but the insert doesn't work, because the `insert` method prepends the insert statement (and it isn't compressed)
        try (var response = futureResponse.join()) {
            final var writtenRows = response.getWrittenRows();
            System.out.println("Written rows to ClickHouse: " + writtenRows);
            if (writtenRows != numberOfRows) {
                System.err.println("Written only " + writtenRows + " from " + numberOfRows + " expected.");
            }
        }
    }
}
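Independent of the client library, the query-parameter idea can be sketched with plain Java: compress the batch into memory, then POST it with the INSERT statement in the URL so the body contains only (compressed) data. The endpoint, table name, and choice of gzip below are illustrative assumptions, not the client's API; gzip is used here only because java.util.zip ships with the JDK, while the real use case uses LZ4.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedBatch {
    // Build a compressed in-memory batch from CSV rows.
    static byte[] buildBatch(String... rows) throws Exception {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            for (String row : rows) {
                gzip.write(row.getBytes(StandardCharsets.UTF_8));
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] batch = buildBatch("1,foo\n", "2,bar\n");

        // The INSERT statement travels in the URL, not in the compressed body.
        String insert = "INSERT INTO test_table FORMAT CSV";
        String url = "http://localhost:8123/?query="
                + URLEncoder.encode(insert, StandardCharsets.UTF_8);
        System.out.println("POST " + url);
        System.out.println("Content-Encoding: gzip, body " + batch.length + " bytes");

        // Sanity check: the batch decompresses back to the original rows.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(batch))) {
            in.transferTo(out);
        }
        System.out.println(out.toString(StandardCharsets.UTF_8));
    }
}
```

The key point is that nothing uncompressed is ever prepended to the body, so the server can decompress it as a single stream.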
I think the option of sending the SQL as a query parameter is good. I will experiment with this. So in case you are reading compressed data from an external source, it would be great to forward it to the server.
What compression algorithm do you use?
Do you plan to insert by small batches?
chernser changed the title from "Inserting compressed data (using compressed input stream)" to "[client-v2] Inserting compressed data (using compressed input stream)" on Dec 10, 2024.
It was supported in the old JDBC extended API.
ClickHouse's HTTP API supports 'auto', 'none', 'gzip', 'deflate', 'br', 'xz', 'zstd', 'lz4', 'bz2', and 'snappy'.
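Of the codecs listed, gzip and deflate are available in the JDK without extra dependencies. Assuming the statement travels in the URL, a minimal sketch of declaring the codec via the Content-Encoding request header (the endpoint and table are placeholders, and the request is only built here, not sent):

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class EncodingHeader {
    // Build a POST request whose body is already compressed with the given codec.
    static HttpRequest buildInsert(String codec, byte[] compressedBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8123/?query=INSERT%20INTO%20test_table%20FORMAT%20CSV"))
                .timeout(Duration.ofSeconds(30))
                // Tells the server which algorithm was used for the body.
                .header("Content-Encoding", codec)
                .POST(HttpRequest.BodyPublishers.ofByteArray(compressedBody))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildInsert("gzip", new byte[0]);
        System.out.println(request.method() + " with Content-Encoding: "
                + request.headers().firstValue("Content-Encoding").orElse("none"));
    }
}
```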
I think the option of sending the SQL as a query parameter is good. I will experiment with this. So in case you are reading compressed data from an external source, it would be great to forward it to the server.
I have already modified the client, with a four-line change, to suit my needs (SQL in a query parameter). Everything works fine, but the usage is not convenient as there are more configuration settings for compression, etc.
What compression algorithm do you use?
Do you plan to insert by small batches?
As the example shows, I read uncompressed data from a source, perform some transformations, and compress it with LZ4 (ClickHouseLZ4OutputStream). The batches are quite large (~1 GB of uncompressed data), so I would like to compress the data before sending. Streaming the data without buffering does not work for me, because I need to retry in case of error and the source does not provide any way to read the same data twice.
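The buffering requirement can be sketched as a retry loop over the already-compressed in-memory batch: the source is read exactly once, and only the buffered bytes are re-sent on failure. BatchSender and insertWithRetry below are hypothetical names for illustration, not part of the client's API.

```java
import java.io.IOException;

public class RetryingInsert {
    /** Hypothetical sender: posts one compressed batch, throws IOException on failure. */
    interface BatchSender {
        void send(byte[] compressedBatch) throws IOException;
    }

    // Re-send the same in-memory batch up to maxAttempts times.
    // The source is never re-read; only the buffered (compressed) bytes are.
    static void insertWithRetry(BatchSender sender, byte[] batch, int maxAttempts)
            throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                sender.send(batch);
                return;
            } catch (IOException e) {
                last = e;
                System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        // Simulated sender that fails twice before succeeding.
        int[] calls = {0};
        insertWithRetry(batch -> {
            if (++calls[0] < 3) {
                throw new IOException("transient network error");
            }
            System.out.println("sent " + batch.length + " bytes on attempt " + calls[0]);
        }, new byte[]{1, 2, 3}, 5);
    }
}
```

Holding the batch in compressed form keeps the retry buffer roughly an order of magnitude smaller than buffering the raw rows, which is the point of compressing before sending.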