HighLoad tweaks: vector->vector_aggregator->clickhouse #21448
ADovgalyuk asked this question in Q&A
Hello team,
I'm using the following setup:
30 Vector instances on each VM collect data through source:file and forward it through sink:vector to one global Vector instance acting as an aggregator (plus one duplicate aggregator). The buffer type is disk. There are two types of VMs, each producing a different type of file.
On the Vector hub (the aggregator) we have source:vector and sink:clickhouse with the following configuration:
And
The same configuration runs on another server acting as a hot backup.
However, as soon as traffic increased, the sink started lagging for the file type called SSU_L3. As you can see, the green line looks the same as before, with no real delay. For SSU_L3, however, Vector seems to collect data for some time and only then upload it to the sink in larger batches.
Grafana:
vector top shows the same pattern: Events In keeps increasing for both the source and the sink for almost 30 seconds, and only then does Events Out jump as large batches are sent to the sink:
The other type, RSU_cdr, is fine: its data is uploaded in near real time.
The ClickHouse insert log shows the same thing: constant small inserts for one type and periodic large chunks for the other:
Question:
Which parameters should I tune to make the sink write to ClickHouse more often, every second or two rather than every 20-30 seconds?
I expected batch.timeout_secs to do this, but it didn't. Judging by the number of bytes/rows per insert, which can reach millions of rows, the database itself is not the problem: it handles large batches without trouble.
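For reference, this is the size-or-timeout behaviour I expected from batch.timeout_secs, written as a small Python model. It is a hypothetical sketch, not Vector's implementation: simulate_batching, Batch, max_events, max_bytes and timeout_ms are made-up names that only loosely mirror the batch.max_events, batch.max_bytes and batch.timeout_secs options. A batch is flushed as soon as it hits a size limit, or once it is timeout_ms old.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical model of size-or-timeout batching; not Vector's actual code.
# timeout_ms plays the role of batch.timeout_secs (converted to milliseconds).


@dataclass
class Batch:
    opened_at: int                      # arrival time (ms) of the first event
    sizes: List[int] = field(default_factory=list)
    total_bytes: int = 0


def simulate_batching(
    events: List[Tuple[int, int]],      # (arrival_ms, size_bytes), sorted by time
    max_events: int,
    max_bytes: int,
    timeout_ms: int,
) -> List[Tuple[int, int, str]]:
    """Return one (flush_time_ms, event_count, reason) tuple per flushed batch."""
    flushes: List[Tuple[int, int, str]] = []
    batch: Optional[Batch] = None
    for arrival_ms, size in events:
        # Age limit: if the open batch expired before this event arrived,
        # record its flush at the deadline and start a fresh batch.
        if batch is not None and arrival_ms >= batch.opened_at + timeout_ms:
            flushes.append((batch.opened_at + timeout_ms, len(batch.sizes), "timeout"))
            batch = None
        if batch is None:
            batch = Batch(opened_at=arrival_ms)
        batch.sizes.append(size)
        batch.total_bytes += size
        # Size limits: flush immediately once either limit is reached.
        if len(batch.sizes) >= max_events or batch.total_bytes >= max_bytes:
            flushes.append((arrival_ms, len(batch.sizes), "size"))
            batch = None
    # Whatever is left is flushed when its timeout expires.
    if batch is not None:
        flushes.append((batch.opened_at + timeout_ms, len(batch.sizes), "timeout"))
    return flushes


if __name__ == "__main__":
    # Steady input: one 100-byte event every 100 ms for 5 s.
    steady = [(ms, 100) for ms in range(0, 5000, 100)]
    print(simulate_batching(steady, max_events=1000, max_bytes=10_000_000, timeout_ms=1000))
    # -> five flushes of 10 events each, one per second
    # Burst: 2500 events reaching the batcher at once after a 30 s gap.
    burst = [(30_000, 100)] * 2500
    print(simulate_batching(burst, max_events=1000, max_bytes=10_000_000, timeout_ms=1000))
    # -> two size-limited flushes of 1000 events, then 500 on timeout
```

In this model, steady input produces a flush roughly every timeout, which is what RSU_cdr looks like. A burst that reaches the batcher all at once still produces large, size-limited batches, which resembles the SSU_L3 pattern; the model does not say where the events were held before that.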