I ran into a problem after adding summary metrics to the fluentd configuration file: the scrape duration kept increasing until Prometheus eventually timed out trying to get metrics from td-agent.
Scrape duration graph
td-agent plugins and configuration
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-filter_typecast' version '0.0.3'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-grok-parser' version '2.6.1'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-kafka' version '0.14.1'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-prometheus' version '1.8.3'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-prometheus_pushgateway' version '0.0.2'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-record-modifier' version '2.1.0'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.3.0'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-s3' version '1.4.0'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-systemd' version '1.0.2'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-td' version '1.1.0'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-webhdfs' version '1.2.5'
2020-08-28 14:27:27 +0300 [info]: gem 'fluentd' version '1.11.2'
2020-08-28 14:27:27 +0300 [info]: starting fluentd-1.11.2 as dry run mode ruby="2.7.1"
2020-08-28 14:27:27 +0300 [info]: using configuration file: <ROOT>
<source>
@type prometheus
</source>
<source>
@type forward
bind "0.0.0.0"
port 24224
</source>
<filter syslog_haproxy>
@type typecast
types "time_duration:integer,time_be_response:integer"
</filter>
<filter syslog_haproxy>
@type prometheus
<metric>
name http_request_duration_msecs_summary
type summary
desc HTTP request duration summary (in milliseconds)
key time_duration
<labels>
be ${be_name}
</labels>
</metric>
<metric>
name http_backend_response_msecs_summary
type summary
desc HTTP backend response time summary (in milliseconds)
key time_be_response
<labels>
be ${be_name}
</labels>
</metric>
</filter>
<match *>
@type null
</match>
</ROOT>
2020-08-28 14:27:27 +0300 [info]: finished dry run mode
I need your assistance. Periodically restarting the td-agent service is not an option.
The issue does not occur with other metric types, only with summary.
Regards.
From personal production experience: drop summaries altogether and replace them with histograms. This keeps fluentd from getting slower and slower. The cost of computing quantiles moves to the Prometheus side, where it is paid at query time from the histogram metrics.
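As a rough sketch of that replacement, the first summary metric from the configuration above could be declared as a histogram along these lines. The metric name and the bucket boundaries (in milliseconds) are assumptions chosen for illustration, not values from the original setup; they should be tuned to the latencies actually observed.

```
<filter syslog_haproxy>
  @type prometheus
  <metric>
    # Hypothetical name; the summary-specific suffix is dropped
    name http_request_duration_msecs
    type histogram
    desc HTTP request duration histogram (in milliseconds)
    key time_duration
    # Assumed bucket upper bounds in milliseconds; adjust to your traffic
    buckets 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000
    <labels>
      be ${be_name}
    </labels>
  </metric>
</filter>
```

With a histogram in place, quantiles can be estimated on the Prometheus side, for example with a query such as `histogram_quantile(0.95, sum(rate(http_request_duration_msecs_bucket[5m])) by (le, be))`. The 5m window and the 0.95 quantile are likewise only illustrative. The second metric (`time_be_response`) would follow the same pattern.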