Scraping of summary metrics starts taking longer and longer #172

alexd88 opened this issue Aug 28, 2020 · 1 comment

alexd88 commented Aug 28, 2020

Hi!

I ran into a problem after adding summary metrics to the fluentd configuration file: the scrape duration kept increasing, and eventually Prometheus timed out trying to get metrics from td-agent.

Scrape duration graph (image: Screenshot 2020-08-28 at 14 40 29)

td-agent plugins and configuration

2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-filter_typecast' version '0.0.3'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-grok-parser' version '2.6.1'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-kafka' version '0.14.1'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-prometheus' version '1.8.3'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-prometheus_pushgateway' version '0.0.2'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-record-modifier' version '2.1.0'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.3.0'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-s3' version '1.4.0'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-systemd' version '1.0.2'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-td' version '1.1.0'
2020-08-28 14:27:27 +0300 [info]: gem 'fluent-plugin-webhdfs' version '1.2.5'
2020-08-28 14:27:27 +0300 [info]: gem 'fluentd' version '1.11.2'
2020-08-28 14:27:27 +0300 [info]: starting fluentd-1.11.2 as dry run mode ruby="2.7.1"
2020-08-28 14:27:27 +0300 [info]: using configuration file: <ROOT>
  <source>
    @type prometheus
  </source>
  <source>
    @type forward
    bind "0.0.0.0"
    port 24224
  </source>
  <filter syslog_haproxy>
    @type typecast
    types "time_duration:integer,time_be_response:integer"
  </filter>
  <filter syslog_haproxy>
    @type prometheus
    <metric>
      name http_request_duration_msecs_summary
      type summary
      desc HTTP request duration summary (im milliseconds)
      key time_duration
      <labels>
        be ${be_name}
      </labels>
    </metric>
    <metric>
      name http_backend_response_msecs_summary
      type summary
      desc HTTP backend response time summary (im milliseconds)
      key time_be_response
      <labels>
        be ${be_name}
      </labels>
    </metric>
  </filter>
  <match *>
    @type null
  </match>
</ROOT>
2020-08-28 14:27:27 +0300 [info]: finished dry run mode

I need your assistance. Periodically restarting the td-agent service is not an option))

The issue occurs only with summary metrics; other metric types are not affected.

Regards.

@AntoineC44
Contributor

Hello Mister,

From personal production experience: drop summaries altogether and replace them with histograms. This will prevent fluentd from becoming slower and slower. The computational load moves to the Prometheus side, which computes quantiles from the histogram buckets at query time.

See https://github.com/fluent/fluent-plugin-prometheus#histogram-type and https://prometheus.io/docs/practices/histograms/
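For illustration, the filter from the original configuration could be rewritten with histograms roughly as follows. This is only a sketch: the metric names without the `_summary` suffix and the bucket boundaries are assumptions made for the example (they are not taken from the issue) and should be tuned to the latencies actually observed. The keys and labels are copied from the configuration above.

```
<filter syslog_haproxy>
  @type prometheus
  <metric>
    # Hypothetical name: the original summary name without the "_summary" suffix
    name http_request_duration_msecs
    type histogram
    desc HTTP request duration histogram (in milliseconds)
    key time_duration
    # Assumed bucket upper bounds in milliseconds; adjust to your traffic
    buckets 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000
    <labels>
      be ${be_name}
    </labels>
  </metric>
  <metric>
    # Hypothetical name, same renaming as above
    name http_backend_response_msecs
    type histogram
    desc HTTP backend response time histogram (in milliseconds)
    key time_be_response
    # Same assumed bucket layout as the request duration metric
    buckets 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000
    <labels>
      be ${be_name}
    </labels>
  </metric>
</filter>
```

With metrics like these, a quantile can then be estimated on the Prometheus side with a query along the lines of `histogram_quantile(0.95, sum by (le, be) (rate(http_request_duration_msecs_bucket[5m])))`, where the 0.95 quantile and the 5m window are again only example values.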

Btw, in recent versions of the Prometheus Ruby client, summary metrics no longer expose quantiles. I have requested an update to the latest client version in #180.
