From 284632ff98da24f5cf97b4308197280d61fc6a76 Mon Sep 17 00:00:00 2001 From: Yilia Lin <114121331+Yilialinn@users.noreply.github.com> Date: Fri, 3 Jan 2025 08:58:11 +0800 Subject: [PATCH] docs: improve prometheus plugin docs (#11874) --- docs/en/latest/plugins/prometheus.md | 645 ++++++++++++++------------- docs/zh/latest/plugins/prometheus.md | 618 +++++++++++++------------ 2 files changed, 657 insertions(+), 606 deletions(-) diff --git a/docs/en/latest/plugins/prometheus.md b/docs/en/latest/plugins/prometheus.md index 3200fb48a09c..1c5c5300b994 100644 --- a/docs/en/latest/plugins/prometheus.md +++ b/docs/en/latest/plugins/prometheus.md @@ -5,7 +5,7 @@ keywords: - API Gateway - Plugin - Prometheus -description: This document contains information about the Apache APISIX prometheus Plugin. +description: The prometheus Plugin provides the capability to integrate APISIX with Prometheus for metric collection and continuous monitoring. --- -## Description - -The `prometheus` Plugin exports metrics in [Prometheus exposition format](https://prometheus.io/docs/instrumenting/exposition_formats/#exposition-formats). + + + -## Attributes +## Description -| Name | Type | Required | Default | Description | -| ----------- | ------- | -------- | ------- | --------------------------------------------------------------------------------- | -| prefer_name | boolean | False | false | When set to `true`, prints Route/Service name instead of ID in Prometheus metric. | +The `prometheus` Plugin provides the capability to integrate APISIX with [Prometheus](https://prometheus.io). -### Specifying `export_uri` +After enabling the Plugin, APISIX will start collecting relevant metrics, such as API requests and latencies, and exporting them in a [text-based exposition format](https://prometheus.io/docs/instrumenting/exposition_formats/#exposition-formats) to Prometheus. You can then create event monitoring and alerting in Prometheus to monitor the health of your API gateway and APIs. 
-You can change the default export URI by configuring the `export_uri` attribute under `plugin_attr` in your configuration file (`conf/config.yaml`). +## Static Configurations -| Name | Type | Default | Description | -| ---------- | ------ | ---------------------------- | ------------------------------------- | -| export_uri | string | "/apisix/prometheus/metrics" | URI to export the Prometheus metrics. | +By default, `prometheus` configurations are pre-configured in the [default configuration](https://github.com/apache/apisix/blob/master/apisix/cli/config.lua). -Here is a configuration example: +To customize these values, add the corresponding configurations to `config.yaml`. For example: -```yaml title="conf/config.yaml" +```yaml plugin_attr: - prometheus: - export_uri: /apisix/metrics + prometheus: # Plugin: prometheus attributes + export_uri: /apisix/prometheus/metrics # Set the URI for the Prometheus metrics endpoint. + metric_prefix: apisix_ # Set the prefix for Prometheus metrics generated by APISIX. + enable_export_server: true # Enable the Prometheus export server. + export_addr: # Set the address for the Prometheus export server. + ip: 127.0.0.1 # Set the IP. + port: 9091 # Set the port. + # metrics: # Create extra labels for metrics. + # http_status: # These metrics will be prefixed with `apisix_`. + # extra_labels: # Set the extra labels for http_status metrics. + # - upstream_addr: $upstream_addr + # - status: $upstream_status + # expire: 0 # The expiration time of metrics in seconds. + # 0 means the metrics will not expire. + # http_latency: + # extra_labels: # Set the extra labels for http_latency metrics. + # - upstream_addr: $upstream_addr + # expire: 0 # The expiration time of metrics in seconds. + # 0 means the metrics will not expire. + # bandwidth: + # extra_labels: # Set the extra labels for bandwidth metrics. + # - upstream_addr: $upstream_addr + # expire: 0 # The expiration time of metrics in seconds. + # 0 means the metrics will not expire. 
+ # default_buckets: # Set the default buckets for the `http_latency` metrics histogram. + # - 10 + # - 50 + # - 100 + # - 200 + # - 500 + # - 1000 + # - 2000 + # - 5000 + # - 10000 + # - 30000 + # - 60000 + # - 500 ``` -### Specifying `metrics` +You can use the [Nginx variable](https://nginx.org/en/docs/http/ngx_http_core_module.html) to create `extra_labels`. See [add extra labels](#add-extra-labels-for-metrics). -For http request related metrics, you could specify extra labels, which match the APISIX variables. +Reload APISIX for changes to take effect. -If you specify label for nonexist APISIX variable, the label value would be "". +## Attribute -Currently, only below metrics are supported: +| Name | Type | Required | Default | Valid values | Description | +| ------------- | ------- | -------- | ------- | ------------ | ------------------------------------------ | +| `prefer_name` | boolean | | False | | If true, export Route/Service name instead of their ID in Prometheus metrics. | -* http_status -* http_latency -* bandwidth +## Metrics -Here is a configuration example: +There are different types of metrics in Prometheus. To understand their differences, see [metrics types](https://prometheus.io/docs/concepts/metric_types/). -```yaml title="conf/config.yaml" -plugin_attr: - prometheus: - metrics: - http_status: - extra_labels: - - upstream_addr: $upstream_addr - - upstream_status: $upstream_status -``` - -### Specifying `default_buckets` - -`DEFAULT_BUCKETS` is the default value for bucket array in `http_latency` metrics. - -You can change the `DEFAULT_BUCKETS` by configuring `default_buckets` attribute in you configuration file. - -Here is a configuration example: - -```yaml title="conf/config.yaml" -plugin_attr: - prometheus: - default_buckets: - - 15 - - 55 - - 105 - - 205 - - 505 -``` +The following metrics are exported by the `prometheus` Plugin by default. See [get APISIX metrics](#get-apisix-metrics) for an example. 
Note that some metrics, such as `apisix_batch_process_entries`, are not readily visible if there are no data. -### Specifying `expire` +| Name | Type | Description | +| ------------------------------ | --------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| apisix_bandwidth | counter | Total amount of traffic flowing through APISIX in bytes. | +| apisix_etcd_modify_indexes | gauge | Number of changes to etcd by APISIX keys. | +| apisix_batch_process_entries | gauge | Number of remaining entries in a batch when sending data in batches, such as with `http logger`, and other logging Plugins. | +| apisix_etcd_reachable | gauge | Whether APISIX can reach etcd. A value of `1` represents reachable and `0` represents unreachable. | +| apisix_http_status | counter | HTTP status codes returned from upstream Services. | +| apisix_http_requests_total | gauge | Number of HTTP requests from clients. | +| apisix_nginx_http_current_connections | gauge | Number of current connections with clients. | +| apisix_nginx_metric_errors_total | counter | Total number of `nginx-lua-prometheus` errors. | +| apisix_http_latency | histogram | HTTP request latency in milliseconds. | +| apisix_node_info | gauge | Information of the APISIX node, such as host name. | +| apisix_shared_dict_capacity_bytes | gauge | The total capacity of an [NGINX shared dictionary](https://github.com/openresty/lua-nginx-module#ngxshareddict). | +| apisix_shared_dict_free_space_bytes | gauge | The remaining space in an [NGINX shared dictionary](https://github.com/openresty/lua-nginx-module#ngxshareddict). | +| apisix_upstream_status | gauge | Health check status of upstream nodes, available if health checks are configured on the upstream. A value of `1` represents healthy and `0` represents unhealthy. 
| +| apisix_stream_connection_total | counter | Total number of connections handled per Stream Route. | -`expire` sets the expiration time of `apisix_http_status`, `apisix_bandwidth`, and `apisix_http_latency` metrics in seconds. When set to 0, metrics will not expire. +## Labels -Here is a configuration example: +[Labels](https://prometheus.io/docs/practices/naming/#labels) are attributes of metrics that are used to differentiate metrics. -```yaml title="conf/config.yaml" -plugin_attr: - prometheus: - expire: 86400 -``` +For example, the `apisix_http_status` metric can be labeled with `route` information to identify which Route the HTTP status originates from. -## Metrics endpoint +The following are labels for a non-exhaustive list of APISIX metrics and their descriptions. -This Plugin will add the metrics endpoint `/apisix/prometheus/metrics` or your custom export URI for exposing the metrics. +### Labels for `apisix_http_status` -These metrics are exposed by a separate Prometheus server address. By default, the address is `127.0.0.1:9091`. You can change it in your configuration file (`conf/config.yaml`): +The following labels are used to differentiate `apisix_http_status` metrics. -```yaml title="conf/config.yaml" -plugin_attr: - prometheus: - export_addr: - ip: ${{INTRANET_IP}} - port: 9092 -``` +| Name | Description | +| ------------ | ----------------------------------------------------------------------------------------------------------------------------- | +| code | HTTP response code returned by the upstream node. | +| route | ID of the Route that the HTTP status originates from when `prefer_name` is `false` (default), and name of the Route when `prefer_name` is `true`. Defaults to an empty string if a request does not match any Route. | +| matched_uri | URI of the Route that matches the request. Defaults to an empty string if a request does not match any Route. | +| matched_host | Host of the Route that matches the request. 
Defaults to an empty string if a request does not match any Route, or host is not configured on the Route. | +| service | ID of the Service that the HTTP status originates from when `prefer_name` is `false` (default), and name of the Service when `prefer_name` is `true`. Defaults to the configured value of host on the Route if the matched Route does not belong to any Service. | +| consumer | Name of the Consumer associated with a request. Defaults to an empty string if no Consumer is associated with the request. | +| node | IP address of the upstream node. | -Now, if the environment variable `INTRANET_IP` is `172.1.1.1`, APISIX will export the metrics via `172.1.1.1:9092`. +### Labels for `apisix_bandwidth` -If you still want to expose the metrics via the data plane port (default: `9080`), you can configure it as shown below: +The following labels are used to differentiate `apisix_bandwidth` metrics. -```yaml title="conf/config.yaml" -plugin_attr: - prometheus: - enable_export_server: false -``` +| Name | Description | +| ---------- | ----------------------------------------------------------------------------------------------------------------------------- | +| type | Type of traffic, `egress` or `ingress`. | +| route | ID of the Route that bandwidth corresponds to when `prefer_name` is `false` (default), and name of the Route when `prefer_name` is `true`. Defaults to an empty string if a request does not match any Route. | +| service | ID of the Service that bandwidth corresponds to when `prefer_name` is `false` (default), and name of the Service when `prefer_name` is `true`. Defaults to the configured value of host on the Route if the matched Route does not belong to any Service. | +| consumer | Name of the Consumer associated with a request. Defaults to an empty string if no Consumer is associated with the request. | +| node | IP address of the upstream node. | -You can then expose it by using the [public-api](public-api.md) Plugin. 
+### Labels for `apisix_http_latency` -:::info IMPORTANT +The following labels are used to differentiate `apisix_http_latency` metrics. -If the Prometheus plugin collects too many metrics, it will take CPU resources to calculate the metric data when getting the metrics via URI, which may affect APISIX to process normal requests. To solve this problem, APISIX exposes the URI and calculates the metrics in the [privileged agent](https://github.com/openresty/lua-resty-core/blob/master/lib/ngx/process.md#enable_privileged_agent). -If the URI is exposed using the public-api plugin, then APISIX will calculate the metric data in a normal worker process, which may still affect APISIX processing of normal requests. +| Name | Description | +| ---------- | ----------------------------------------------------------------------------------------------------------------------------------- | +| type | Type of latencies. See [latency types](#latency-types) for details. | +| route | ID of the Route that latencies correspond to when `prefer_name` is `false` (default), and name of the Route when `prefer_name` is `true`. Defaults to an empty string if a request does not match any Route. | +| service | ID of the Service that latencies correspond to when `prefer_name` is `false` (default), and name of the Service when `prefer_name` is `true`. Defaults to the configured value of host on the Route if the matched Route does not belong to any Service. | +| consumer | Name of the Consumer associated with latencies. Defaults to an empty string if no Consumer is associated with the request. | +| node | IP address of the upstream node associated with latencies. | -This feature requires APISIX to run on [APISIX-Runtime](../FAQ.md#how-do-i-build-the-apisix-runtime-environment). 
+#### Latency Types -::: +`apisix_http_latency` can be labeled with one of three types: -## Enable Plugin +* `request` represents the time elapsed between reading the first byte from the client and writing the log after the last byte has been sent to the client. -The `prometheus` Plugin can be enabled with an empty table. +* `upstream` represents the time elapsed waiting on responses from the upstream Service. -The example below shows how you can configure the Plugin on a specific Route: +* `apisix` represents the difference between the `request` latency and `upstream` latency. -:::note -You can fetch the `admin_key` from `config.yaml` and save to an environment variable with the following command: +In other words, the APISIX latency is not attributable to Lua processing alone. It should be understood as follows: -```bash -admin_key=$(yq '.deployment.admin.admin_key[0].key' conf/config.yaml | sed 's/"//g') +```text +APISIX latency + = downstream request time - upstream response time + = downstream traffic latency + NGINX latency ``` -::: +### Labels for `apisix_upstream_status` -```shell -curl http://127.0.0.1:9180/apisix/admin/routes/1 -H "X-API-KEY: $admin_key" -X PUT -d ' -{ - "uri": "/hello", - "plugins": { - "prometheus":{} - }, - "upstream": { - "type": "roundrobin", - "nodes": { - "127.0.0.1:80": 1 - } - } -}' -``` - -:::note +The following labels are used to differentiate `apisix_upstream_status` metrics. -When `prefer_name` is set to `true` make sure to not duplicate names for multiple Routes/Services or it could be misleading. +| Name | Description | +| ---------- | --------------------------------------------------------------------------------------------------- | +| name | Resource ID corresponding to the upstream configured with health checks, such as `/apisix/routes/1` and `/apisix/upstreams/1`. | +| ip | IP address of the upstream node. | +| port | Port number of the node. 
| -::: +## Examples - +If you deploy APISIX in a containerized environment and would like to access the Prometheus metrics endpoint externally, update the configuration file as follows and reload APISIX: -## Fetching metrics +```yaml title="conf/config.yaml" +plugin_attr: + prometheus: + export_addr: + ip: 0.0.0.0 +``` -You can fetch the metrics from the specified export URI (default: `/apisix/prometheus/metrics`): +Send a request to the APISIX Prometheus metrics endpoint: ```shell -curl -i http://127.0.0.1:9091/apisix/prometheus/metrics +curl "http://127.0.0.1:9091/apisix/prometheus/metrics" ``` -You can add this address to Prometheus to fetch the data: +You should see an output similar to the following: -```yaml -scrape_configs: - - job_name: "apisix" - scrape_interval: 15s # This value will be related to the time range of the rate function in Prometheus QL. The time range in the rate function should be at least twice this value. - metrics_path: "/apisix/prometheus/metrics" - static_configs: - - targets: ["127.0.0.1:9091"] +```text +# HELP apisix_bandwidth Total bandwidth in bytes consumed per Service in Apisix +# TYPE apisix_bandwidth counter +apisix_bandwidth{type="egress",route="",service="",consumer="",node=""} 8417 +apisix_bandwidth{type="egress",route="1",service="",consumer="",node="127.0.0.1"} 1420 +apisix_bandwidth{type="egress",route="2",service="",consumer="",node="127.0.0.1"} 1420 +apisix_bandwidth{type="ingress",route="",service="",consumer="",node=""} 189 +apisix_bandwidth{type="ingress",route="1",service="",consumer="",node="127.0.0.1"} 332 +apisix_bandwidth{type="ingress",route="2",service="",consumer="",node="127.0.0.1"} 332 +# HELP apisix_etcd_modify_indexes Etcd modify index for APISIX keys +# TYPE apisix_etcd_modify_indexes gauge +apisix_etcd_modify_indexes{key="consumers"} 0 +apisix_etcd_modify_indexes{key="global_rules"} 0 +... 
``` -Now, you will be able to check the status in your Prometheus console: +### Expose APISIX Metrics on Public API Endpoint -![checking status on prometheus dashboard](../../../assets/images/plugin/prometheus01.png) +The following example demonstrates how you can disable the Prometheus export server that, by default, exposes an endpoint on port `9091`, and expose APISIX Prometheus metrics on a new public API endpoint on port `9080`, which APISIX uses to listen to other client requests. -![prometheus apisix in-depth metric view](../../../assets/images/plugin/prometheus02.png) +Disable the Prometheus export server in the configuration file and reload APISIX for changes to take effect: -## Using Grafana to graph the metrics +```yaml title="conf/config.yaml" +plugin_attr: + prometheus: + enable_export_server: false +``` -Metrics exported by the `prometheus` Plugin can be graphed in Grafana using a drop in dashboard. +Next, create a Route with [`public-api`](../../../en/latest/plugins/public-api.md) Plugin and expose a public API endpoint for APISIX metrics: -To set it up, download [Grafana dashboard meta](https://github.com/apache/apisix/blob/master/docs/assets/other/json/apisix-grafana-dashboard.json) and import it in Grafana. Or, you can go to [Grafana official](https://grafana.com/grafana/dashboards/11719) for Grafana metadata. 
+```shell +curl "http://127.0.0.1:9180/apisix/admin/routes/prometheus-metrics" -X PUT \ + -H "X-API-KEY: ${admin_key}" \ + -d '{ + "uri": "/apisix/prometheus/metrics", + "plugins": { + "public-api": {} + } + }' +``` -![Grafana chart-1](../../../assets/images/plugin/grafana-1.png) +Send a request to the new metrics endpoint to verify: -![Grafana chart-2](../../../assets/images/plugin/grafana-2.png) +```shell +curl "http://127.0.0.1:9080/apisix/prometheus/metrics" +``` -![Grafana chart-3](../../../assets/images/plugin/grafana-3.png) +You should see an output similar to the following: -![Grafana chart-4](../../../assets/images/plugin/grafana-4.png) +```text +# HELP apisix_http_requests_total The total number of client requests since APISIX started +# TYPE apisix_http_requests_total gauge +apisix_http_requests_total 1 +# HELP apisix_nginx_http_current_connections Number of HTTP connections +# TYPE apisix_nginx_http_current_connections gauge +apisix_nginx_http_current_connections{state="accepted"} 1 +apisix_nginx_http_current_connections{state="active"} 1 +apisix_nginx_http_current_connections{state="handled"} 1 +apisix_nginx_http_current_connections{state="reading"} 0 +apisix_nginx_http_current_connections{state="waiting"} 0 +apisix_nginx_http_current_connections{state="writing"} 1 +... +``` -## Available HTTP metrics +### Monitor Upstream Health Statuses -The following metrics are exported by the `prometheus` Plugin: +The following example demonstrates how to monitor the health status of upstream nodes. -- Status code: HTTP status code returned from Upstream services. They are available for a single service and across all services. 
+Create a Route with the `prometheus` Plugin and configure upstream active health checks: - The available attributes are: +```shell +curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \ + -H "X-API-KEY: ${admin_key}" \ + -d '{ + "id": "prometheus-route", + "uri": "/get", + "plugins": { + "prometheus": {} + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "httpbin.org:80": 1, + "127.0.0.1:20001": 1 + }, + "checks": { + "active": { + "timeout": 5, + "http_path": "/status", + "healthy": { + "interval": 2, + "successes": 1 + }, + "unhealthy": { + "interval": 1, + "http_failures": 2 + } + }, + "passive": { + "healthy": { + "http_statuses": [200, 201], + "successes": 3 + }, + "unhealthy": { + "http_statuses": [500], + "http_failures": 3, + "tcp_failures": 3 + } + } + } + } + }' +``` - | Name | Description | - |--------------|-------------------------------------------------------------------------------------------------------------------------------| - | code | HTTP status code returned by the upstream service. | - | route | `route_id` of the matched Route with request. Defaults to an empty string if the Routes don't match. | - | matched_uri | `uri` of the Route matching the request. Defaults to an empty string if the Routes don't match. | - | matched_host | `host` of the Route matching the request. Defaults to an empty string if the Routes don't match. | - | service | `service_id` of the Route matching the request. If the Route does not have a `service_id` configured, it defaults to `$host`. | - | consumer | `consumer_name` of the Consumer matching the request. Defaults to an empty string if it does not match. | - | node | IP address of the Upstream node. | +Send a request to the APISIX Prometheus metrics endpoint: -- Bandwidth: Total amount of traffic (ingress and egress) flowing through APISIX. Total bandwidth of a service can also be obtained. 
+```shell +curl "http://127.0.0.1:9091/apisix/prometheus/metrics" +``` - The available attributes are: +You should see an output similar to the following: + +```text +# HELP apisix_upstream_status upstream status from health check +# TYPE apisix_upstream_status gauge +apisix_upstream_status{name="/apisix/routes/1",ip="54.237.103.220",port="80"} 1 +apisix_upstream_status{name="/apisix/routes/1",ip="127.0.0.1",port="20001"} 0 +``` - | Name | Description | - |----------|-------------------------------------------------------------------------------------------------------------------------------| - | type | Type of traffic (egress/ingress). | - | route | `route_id` of the matched Route with request. Defaults to an empty string if the Routes don't match. | - | service | `service_id` of the Route matching the request. If the Route does not have a `service_id` configured, it defaults to `$host`. | - | consumer | `consumer_name` of the Consumer matching the request. Defaults to an empty string if it does not match. | - | node | IP address of the Upstream node. | +This shows that the upstream node `httpbin.org:80` is healthy and the upstream node `127.0.0.1:20001` is unhealthy. -- etcd reachability: A gauge type representing whether etcd can be reached by APISIX. A value of `1` represents reachable and `0` represents unreachable. -- Connections: Nginx connection metrics like active, reading, writing, and number of accepted connections. -- Batch process entries: A gauge type useful when Plugins like [syslog](./syslog.md), [http-logger](./http-logger.md), [tcp-logger](./tcp-logger.md), [udp-logger](./udp-logger.md), and [zipkin](./zipkin.md) use batch process to send data. Entries that hasn't been sent in batch process will be counted in the metrics. -- Latency: Histogram of the request time per service in different dimensions. 
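The health-check output from the upstream health example can also be checked programmatically. The following is a minimal Python sketch, not part of APISIX: the helper name `unhealthy_nodes` is hypothetical, it assumes the default `apisix_` metric prefix, and it assumes label values contain no escaped quotes. It scans the text exposition output and lists the nodes whose `apisix_upstream_status` value is `0`.

```python
import re

# Matches one `apisix_upstream_status` sample line in the text exposition format.
# Assumption: the default `apisix_` metric prefix is in use.
SAMPLE_RE = re.compile(r'^apisix_upstream_status\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
# Assumption: label values contain no escaped double quotes.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def unhealthy_nodes(metrics_text):
    """Return (name, ip, port) tuples for upstream nodes whose status is 0."""
    result = []
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match is None:
            continue  # skip HELP/TYPE comments and other metrics
        if float(match.group("value")) != 0:
            continue  # a value of 1 means healthy
        labels = dict(LABEL_RE.findall(match.group("labels")))
        result.append((labels.get("name"), labels.get("ip"), labels.get("port")))
    return result


# Sample copied from the health-check example output.
SAMPLE_OUTPUT = """\
# HELP apisix_upstream_status upstream status from health check
# TYPE apisix_upstream_status gauge
apisix_upstream_status{name="/apisix/routes/1",ip="54.237.103.220",port="80"} 1
apisix_upstream_status{name="/apisix/routes/1",ip="127.0.0.1",port="20001"} 0
"""

if __name__ == "__main__":
    for name, ip, port in unhealthy_nodes(SAMPLE_OUTPUT):
        print(f"unhealthy: {ip}:{port} ({name})")
```

In practice the text would come from the metrics endpoint (for example, the body returned by the `curl` command against port `9091`) rather than from a hard-coded sample.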
+### Add Extra Labels for Metrics - The available attributes are: +The following example demonstrates how to add additional labels to metrics and use the [Nginx variable](https://nginx.org/en/docs/http/ngx_http_core_module.html) in label values. - | Name | Description | - |----------|-------------------------------------------------------------------------------------------------------------------------------------| - | type | Value can be one of `apisix`, `upstream`, or `request`. This translates to latency caused by APISIX, Upstream, or both (their sum). | - | route | `route_id` of the matched Route with request. Defaults to an empty string if the Routes don't match. | - | service | `service_id` of the Route matching the request. If the Route does not have a `service_id` configured, it defaults to `$host`. | - | consumer | `consumer_name` of the Consumer matching the request. Defaults to an empty string if it does not match. | - | node | IP address of the Upstream node. | +Currently, only the following metrics support extra labels: -- Info: Information about the APISIX node. -- Shared dict: The capacity and free space of all nginx.shared.DICT in APISIX. +* apisix_http_status +* apisix_http_latency +* apisix_bandwidth -- `apisix_upstream_status`: Health check result status of upstream nodes. A value of `1` represents healthy and `0` represents unhealthy. +Include the following configurations in the configuration file to add labels for metrics and reload APISIX for changes to take effect: - The available attributes are: +```yaml title="conf/config.yaml" +plugin_attr: + prometheus: # Plugin: prometheus + metrics: # Create extra labels from the NGINX variables. + http_status: + extra_labels: # Set the extra labels for http_status metrics. + - upstream_addr: $upstream_addr # Add an extra upstream_addr label with value being the NGINX variable $upstream_addr. + - route_name: $route_name # Add an extra route_name label with value being the APISIX variable $route_name. 
+``` - | Name | Description | - |--------------|-------------------------------------------------------------------------------------------------------------------------------| - | name | resource id where the upstream node is attached to, e.g. `/apisix/routes/1`, `/apisix/upstreams/1`. | - | ip | ip address of the node. | - | port | port number of the node. | +Note that if you define a variable in the label value but it does not correspond to any existing [APISIX variable](https://apisix.apache.org/docs/apisix/apisix-variable/) or [NGINX variable](https://nginx.org/en/docs/http/ngx_http_core_module.html), the label value will default to an empty string. -Here are the original metrics from APISIX: +Create a Route with the `prometheus` Plugin: ```shell -curl http://127.0.0.1:9091/apisix/prometheus/metrics +curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \ + -H "X-API-KEY: ${admin_key}" \ + -d '{ + "id": "prometheus-route", + "uri": "/get", + "name": "extra-label", + "plugins": { + "prometheus": {} + }, + "upstream": { + "nodes": { + "httpbin.org:80": 1 + } + } + }' ``` +Send a request to the Route to verify: + ```shell -# HELP apisix_bandwidth Total bandwidth in bytes consumed per service in Apisix -# TYPE apisix_bandwidth counter -apisix_bandwidth{type="egress",route="",service="",consumer="",node=""} 8417 -apisix_bandwidth{type="egress",route="1",service="",consumer="",node="127.0.0.1"} 1420 -apisix_bandwidth{type="egress",route="2",service="",consumer="",node="127.0.0.1"} 1420 -apisix_bandwidth{type="ingress",route="",service="",consumer="",node=""} 189 -apisix_bandwidth{type="ingress",route="1",service="",consumer="",node="127.0.0.1"} 332 -apisix_bandwidth{type="ingress",route="2",service="",consumer="",node="127.0.0.1"} 332 -# HELP apisix_etcd_modify_indexes Etcd modify index for APISIX keys -# TYPE apisix_etcd_modify_indexes gauge 
-apisix_etcd_modify_indexes{key="consumers"} 0 -apisix_etcd_modify_indexes{key="global_rules"} 0 -apisix_etcd_modify_indexes{key="max_modify_index"} 222 -apisix_etcd_modify_indexes{key="prev_index"} 35 -apisix_etcd_modify_indexes{key="protos"} 0 -apisix_etcd_modify_indexes{key="routes"} 222 -apisix_etcd_modify_indexes{key="services"} 0 -apisix_etcd_modify_indexes{key="ssls"} 0 -apisix_etcd_modify_indexes{key="stream_routes"} 0 -apisix_etcd_modify_indexes{key="upstreams"} 0 -apisix_etcd_modify_indexes{key="x_etcd_index"} 223 -# HELP apisix_batch_process_entries batch process remaining entries -# TYPE apisix_batch_process_entries gauge -apisix_batch_process_entries{name="http-logger",route_id="9",server_addr="127.0.0.1"} 1 -apisix_batch_process_entries{name="sls-logger",route_id="9",server_addr="127.0.0.1"} 1 -apisix_batch_process_entries{name="tcp-logger",route_id="9",server_addr="127.0.0.1"} 1 -apisix_batch_process_entries{name="udp-logger",route_id="9",server_addr="127.0.0.1"} 1 -apisix_batch_process_entries{name="sys-logger",route_id="9",server_addr="127.0.0.1"} 1 -apisix_batch_process_entries{name="zipkin_report",route_id="9",server_addr="127.0.0.1"} 1 -# HELP apisix_etcd_reachable Config server etcd reachable from Apisix, 0 is unreachable -# TYPE apisix_etcd_reachable gauge -apisix_etcd_reachable 1 -# HELP apisix_http_status HTTP status codes per service in Apisix -# TYPE apisix_http_status counter -apisix_http_status{code="200",route="1",matched_uri="/hello",matched_host="",service="",consumer="",node="127.0.0.1"} 4 -apisix_http_status{code="200",route="2",matched_uri="/world",matched_host="",service="",consumer="",node="127.0.0.1"} 4 -apisix_http_status{code="404",route="",matched_uri="",matched_host="",service="",consumer="",node=""} 1 -# HELP apisix_http_requests_total The total number of client requests -# TYPE apisix_http_requests_total gauge -apisix_http_requests_total 1191780 -# HELP apisix_nginx_http_current_connections Number of HTTP connections -# 
TYPE apisix_nginx_http_current_connections gauge -apisix_nginx_http_current_connections{state="accepted"} 11994 -apisix_nginx_http_current_connections{state="active"} 2 -apisix_nginx_http_current_connections{state="handled"} 11994 -apisix_nginx_http_current_connections{state="reading"} 0 -apisix_nginx_http_current_connections{state="waiting"} 1 -apisix_nginx_http_current_connections{state="writing"} 1 -# HELP apisix_nginx_metric_errors_total Number of nginx-lua-prometheus errors -# TYPE apisix_nginx_metric_errors_total counter -apisix_nginx_metric_errors_total 0 -# HELP apisix_http_latency HTTP request latency in milliseconds per service in APISIX -# TYPE apisix_http_latency histogram -apisix_http_latency_bucket{type="apisix",route="1",service="",consumer="",node="127.0.0.1",le="1"} 1 -apisix_http_latency_bucket{type="apisix",route="1",service="",consumer="",node="127.0.0.1",le="2"} 1 -apisix_http_latency_bucket{type="request",route="1",service="",consumer="",node="127.0.0.1",le="1"} 1 -apisix_http_latency_bucket{type="request",route="1",service="",consumer="",node="127.0.0.1",le="2"} 1 -apisix_http_latency_bucket{type="upstream",route="1",service="",consumer="",node="127.0.0.1",le="1"} 1 -apisix_http_latency_bucket{type="upstream",route="1",service="",consumer="",node="127.0.0.1",le="2"} 1 -... 
-# HELP apisix_node_info Info of APISIX node -# TYPE apisix_node_info gauge -apisix_node_info{hostname="desktop-2022q8f-wsl"} 1 -# HELP apisix_shared_dict_capacity_bytes The capacity of each nginx shared DICT since APISIX start -# TYPE apisix_shared_dict_capacity_bytes gauge -apisix_shared_dict_capacity_bytes{name="access-tokens"} 1048576 -apisix_shared_dict_capacity_bytes{name="balancer-ewma"} 10485760 -apisix_shared_dict_capacity_bytes{name="balancer-ewma-last-touched-at"} 10485760 -apisix_shared_dict_capacity_bytes{name="balancer-ewma-locks"} 10485760 -apisix_shared_dict_capacity_bytes{name="discovery"} 1048576 -apisix_shared_dict_capacity_bytes{name="etcd-cluster-health-check"} 10485760 -... -# HELP apisix_shared_dict_free_space_bytes The free space of each nginx shared DICT since APISIX start -# TYPE apisix_shared_dict_free_space_bytes gauge -apisix_shared_dict_free_space_bytes{name="access-tokens"} 1032192 -apisix_shared_dict_free_space_bytes{name="balancer-ewma"} 10412032 -apisix_shared_dict_free_space_bytes{name="balancer-ewma-last-touched-at"} 10412032 -apisix_shared_dict_free_space_bytes{name="balancer-ewma-locks"} 10412032 -apisix_shared_dict_free_space_bytes{name="discovery"} 1032192 -apisix_shared_dict_free_space_bytes{name="etcd-cluster-health-check"} 10412032 -... -# HELP apisix_upstream_status Upstream status from health check -# TYPE apisix_upstream_status gauge -apisix_upstream_status{name="/apisix/routes/1",ip="100.24.156.8",port="80"} 0 -apisix_upstream_status{name="/apisix/routes/1",ip="52.86.68.46",port="80"} 1 +curl -i "http://127.0.0.1:9080/get" ``` -## Delete Plugin +You should see an `HTTP/1.1 200 OK` response. -To remove the `prometheus` Plugin, you can delete the corresponding JSON configuration from the Plugin configuration. APISIX will automatically reload and you do not have to restart for this to take effect. 
+Send a request to the APISIX Prometheus metrics endpoint: ```shell -curl http://127.0.0.1:9180/apisix/admin/routes/1 -H "X-API-KEY: $admin_key" -X PUT -d ' -{ - "uri": "/hello", - "plugins": {}, - "upstream": { - "type": "roundrobin", - "nodes": { - "127.0.0.1:80": 1 - } - } -}' +curl "http://127.0.0.1:9091/apisix/prometheus/metrics" ``` -## How to enable it for TCP/UDP +You should see an output similar to the following: -:::info IMPORTANT - -This feature requires APISIX to run on [APISIX-Runtime](../FAQ.md#how-do-i-build-the-apisix-runtime-environment?). +```text +# HELP apisix_http_status HTTP status codes per Service in APISIX +# TYPE apisix_http_status counter +apisix_http_status{code="200",route="1",matched_uri="/get",matched_host="",service="",consumer="",node="54.237.103.220",upstream_addr="54.237.103.220:80",route_name="extra-label"} 1 +``` -::: +### Monitor TCP/UDP Traffic with Prometheus -We can also enable `prometheus` to collect metrics for TCP/UDP. +The following example demonstrates how to collect TCP/UDP traffic metrics in APISIX. -First of all, ensure `prometheus` plugin is in your configuration file (`conf/config.yaml`): +Include the following configurations in `config.yaml` to enable stream proxy and `prometheus` Plugin for stream proxy. Reload APISIX for changes to take effect: ```yaml title="conf/config.yaml" +apisix: + proxy_mode: http&stream # Enable both L4 & L7 proxies + stream_proxy: # Configure L4 proxy + tcp: + - 9100 # Set TCP proxy listening port + udp: + - 9200 # Set UDP proxy listening port + stream_plugins: - - ... 
- - prometheus + - prometheus # Enable prometheus for stream proxy ``` -Then you need to configure the `prometheus` plugin on the stream route: +Create a Stream Route with the `prometheus` Plugin: ```shell -curl http://127.0.0.1:9180/apisix/admin/stream_routes/1 -H "X-API-KEY: $admin_key" -X PUT -d ' -{ +curl "http://127.0.0.1:9180/apisix/admin/stream_routes" -X PUT \ + -H "X-API-KEY: ${admin_key}" \ + -d '{ "plugins": { - "prometheus":{} + "prometheus":{} }, "upstream": { - "type": "roundrobin", - "nodes": { - "127.0.0.1:80": 1 - } + "type": "roundrobin", + "nodes": { + "httpbin.org:80": 1 + } } -}' + }' ``` -## Available TCP/UDP metrics - -The following metrics are available when using APISIX as an L4 proxy. +Send a request to the Stream Route to verify: -* `Stream Connections`: The number of processed connections at the route level. - - Attributes: +```shell +curl -i "http://127.0.0.1:9100" +``` - | Name | Description | - | ------------- | -------------------- | - | route | matched stream route ID | -* `Connections`: Various Nginx connection metrics like active, reading, writing, and number of accepted connections. -* `Info`: Information about the current APISIX node. +You should see an `HTTP/1.1 200 OK` response. -Here are examples of APISIX metrics: +Send a request to the APISIX Prometheus metrics endpoint: ```shell -$ curl http://127.0.0.1:9091/apisix/prometheus/metrics +curl "http://127.0.0.1:9091/apisix/prometheus/metrics" ``` -``` -... 
-# HELP apisix_node_info Info of APISIX node -# TYPE apisix_node_info gauge -apisix_node_info{hostname="desktop-2022q8f-wsl"} 1 -# HELP apisix_stream_connection_total Total number of connections handled per stream route in APISIX +You should see an output similar to the following: + +```text +# HELP apisix_stream_connection_total Total number of connections handled per Stream Route in APISIX # TYPE apisix_stream_connection_total counter apisix_stream_connection_total{route="1"} 1 ``` diff --git a/docs/zh/latest/plugins/prometheus.md b/docs/zh/latest/plugins/prometheus.md index 39720704d289..81f1df224c15 100644 --- a/docs/zh/latest/plugins/prometheus.md +++ b/docs/zh/latest/plugins/prometheus.md @@ -5,7 +5,7 @@ keywords: - API 网关 - Plugin - Prometheus -description: 本文将介绍 API 网关 Apache APISIX 如何通过 prometheus 插件将 metrics 上报到开源的监控软件 Prometheus。 +description: 本文将介绍 prometheus 插件,以及将 APISIX 与 Prometheus 集成以进行指标收集和持续监控。 --- + + + + ## 描述 -`prometheus` 插件以 [Prometheus 文档](https://prometheus.io/docs/instrumenting/exposition_formats/#exposition-formats)规定的格式上报指标到 Prometheus 中。 +`prometheus` 插件提供将 APISIX 与 Prometheus 集成的能力。 + +启用该插件后,APISIX 将开始收集相关指标,例如 API 请求和延迟,并以[基于文本的展示格式](https://prometheus.io/docs/instrumenting/exposition_formats/#exposition-formats)导出到 Prometheus。然后,您可以在 Prometheus 中创建事件监控和警报,以监控 API 网关和 API 的健康状况。 + +## 静态配置 + +默认情况下,已在默认配置文件 [`config.lua`](https://github.com/apache/apisix/blob/master/apisix/cli/config.lua) 中对 `prometheus` 进行预配置。 + +要自定义这些值,请将相应的配置添加到 config.yaml 中。例如: + +```yaml +plugin_attr: + prometheus: # 插件:prometheus 属性 + export_uri: /apisix/prometheus/metrics # 设置 Prometheus 指标端点的 URI。 + metric_prefix: apisix_ # 设置 APISIX 生成的 Prometheus 指标的前缀。 + enable_export_server: true # 启用 Prometheus 导出服务器。 + export_addr: # 设置 Prometheus 导出服务器的地址。 + ip: 127.0.0.1 # 设置 IP。 + port: 9091 # 设置端口。 + # metrics: # 为指标创建额外的标签。 + # http_status: # 这些指标将以 `apisix_` 为前缀。 + # extra_labels: # 设置 http_status 指标的额外标签。 + # - upstream_addr: $upstream_addr + # - status: 
$upstream_status + # expire: 0 # 指标的过期时间(秒)。 + # 0 表示指标不会过期。 + # http_latency: + # extra_labels: # 设置 http_latency 指标的额外标签。 + # - upstream_addr: $upstream_addr + # expire: 0 # 指标的过期时间(秒)。 + # 0 表示指标不会过期。 + # bandwidth: + # extra_labels: # 设置 bandwidth 指标的额外标签。 + # - upstream_addr: $upstream_addr + # expire: 0 # 指标的过期时间(秒)。 + # 0 表示指标不会过期。 + # default_buckets: # 设置 `http_latency` 指标直方图的默认桶。 + # - 10 + # - 50 + # - 100 + # - 200 + # - 500 + # - 1000 + # - 2000 + # - 5000 + # - 10000 + # - 30000 + # - 60000 + # - 500 +``` + +您可以使用 [Nginx 变量](https://nginx.org/en/docs/http/ngx_http_core_module.html)创建 `extra_labels`。请参见[为指标添加额外标签](#为指标添加额外标签)。 + +重新加载 APISIX 以使更改生效。 ## 属性 | 名称 | 类型 | 必选项 | 默认值 | 描述 | | ------------ | --------| ------ | ------ | ----------------------------------------------------- | -| prefer_name | boolean | 否 | false | 当设置为 `true` 时,将使用路由或服务的 `name` 标识请求所命中的路由或服务,否则使用其 `id`。 | +|`prefer_name` | boolean | 否 | False | 当设置为 `true` 时,则在`prometheus` 指标中导出路由/服务名称而非它们的 `id`。 | -:::note +## 指标 -多个路由或服务可以设置为相同的名称,所以当设置 `prefer_name` 为 `true` 时,请规范路由和服务的命名,否则容易引起误解。 +Prometheus 中有不同类型的指标。要了解它们之间的区别,请参见[指标类型](https://prometheus.io/docs/concepts/metric_types/)。 -::: +以下是 `prometheus` 插件默认导出的指标。有关示例,请参见[获取 APISIX 指标](#获取 APISIX 指标)。请注意,一些指标,例如 `apisix_batch_process_entries`,如果没有数据,将不可见。 -### 如何修改暴露指标的 `export_uri` +| 名称 | 类型 | 描述 | +| ----------------------- | --------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| apisix_bandwidth | counter | APISIX 中每个服务消耗的总流量(字节)。 | +| apisix_etcd_modify_indexes | gauge | APISIX 键的 etcd 修改次数。 | +| apisix_batch_process_entries | gauge | 发送数据时批处理中的剩余条目数,例如使用 `http logger` 和其他日志插件。 | +| apisix_etcd_reachable | gauge | APISIX 是否可以访问 etcd。值为 `1` 表示可达,`0` 表示不可达。 | +| apisix_http_status | counter | 从上游服务返回的 HTTP 状态代码。 | +| apisix_http_requests_total | gauge | 来自客户端的 HTTP 请求数量。 | +| 
apisix_nginx_http_current_connections | gauge | 当前与客户端的连接数量。 | +| apisix_nginx_metric_errors_total | counter | `nginx-lua-prometheus` 错误的总数。 | +| apisix_http_latency | histogram | HTTP 请求延迟(毫秒)。 | +| apisix_node_info | gauge | APISIX 节点的信息,例如主机名。 | +| apisix_shared_dict_capacity_bytes | gauge | [NGINX 共享字典](https://github.com/openresty/lua-nginx-module#ngxshareddict) 的总容量。 | +| apisix_shared_dict_free_space_bytes | gauge | [NGINX 共享字典](https://github.com/openresty/lua-nginx-module#ngxshareddict) 中剩余的空间。 | +| apisix_upstream_status | gauge | 上游节点的健康检查状态,如果在上游配置了健康检查,则可用。值为 `1` 表示健康,`0` 表示不健康。 | +| apisix_stream_connection_total | counter | 每个 Stream Route 处理的总连接数。 | -你可以在配置文件 `./conf/config.yaml` 的 `plugin_attr` 列表下修改默认的 URI。 +## 标签 -| 名称 | 类型 | 默认值 | 描述 | -| ---------- | ------ | ---------------------------- | --------------------------- | -| export_uri | string | "/apisix/prometheus/metrics" | 暴露 Prometheus 指标的 URI。 | +[标签](https://prometheus.io/docs/practices/naming/#labels) 是指标的属性,用于区分指标。 -配置示例如下: +例如,`apisix_http_status` 指标可以使用 `route` 信息进行标记,以识别 HTTP 状态的来源路由。 -```yaml title="./conf/config.yaml" -plugin_attr: - prometheus: - export_uri: /apisix/metrics -``` +以下是 APISIX 指标的非详尽标签及其描述。 -### 如何修改延迟指标中的 `default_buckets` +### `apisix_http_status` 的标签 -`DEFAULT_BUCKETS` 是 `http_latency` 指标中 bucket 数组的默认值。 +以下标签用于区分 `apisix_http_status` 指标。 -你可以通过修改配置文件中的 `default_buckets` 来重新指定 `DEFAULT_BUCKETS` +| 名称 | 描述 | +| ------ | ---------------------------------------------------------------------------------------------------------------------- | +| code | 上游节点返回的 HTTP 响应代码。 | +| route | HTTP 状态来源的路由 ID,当 `prefer_name` 为 `false`(默认)时,使用路由 ID,当 `prefer_name` 为 `true` 时,使用路由名称。如果请求不匹配任何路由,则默认为空字符串。 | +| matched_uri | 匹配请求的路由 URI。如果请求不匹配任何路由,则默认为空字符串。 | +| matched_host | 匹配请求的路由主机。如果请求不匹配任何路由,或路由未配置主机,则默认为空字符串。 | +| service | HTTP 状态来源的服务 ID,当 `prefer_name` 为 `false`(默认)时,使用服务 ID,当 `prefer_name` 为 `true` 时,使用服务名称。如果匹配的路由不属于任何服务,则默认为路由上配置的主机值。 | +| consumer | 
与请求关联的消费者名称。如果请求没有与之关联的消费者,则默认为空字符串。 | +| node | 上游节点的 IP 地址。 | -配置示例如下: +### `apisix_bandwidth` 的标签 -```yaml title="conf/config.yaml" -plugin_attr: - prometheus: - default_buckets: - - 15 - - 55 - - 105 - - 205 - - 505 -``` +以下标签用于区分 `apisix_bandwidth` 指标。 -### 如何修改指标的 `expire` +| 名称 | 描述 | +| ------ | ---------------------------------------------------------------------------------------------------------------------- | +| type | 流量类型,`egress` 或 `ingress`。 | +| route | 带宽对应的路由 ID,当 `prefer_name` 为 `false`(默认)时,使用路由 ID,当 `prefer_name` 为 `true` 时,使用路由名称。如果请求不匹配任何路由,则默认为空字符串。 | +| service | 带宽对应的服务 ID,当 `prefer_name` 为 `false`(默认)时,使用服务 ID,当 `prefer_name` 为 `true` 时,使用服务名称。如果匹配的路由不属于任何服务,则默认为路由上配置的主机值。 | +| consumer | 与请求关联的消费者名称。如果请求没有与之关联的消费者,则默认为空字符串。 | +| node | 上游节点的 IP 地址。 | -`expire` 用于设置 `apisix_http_status`、`apisix_bandwidth` 和 `apisix_http_latency` 指标的过期时间(以秒为单位)。当设置为 0 时,指标不会过期。 +### `apisix_http_latency` 的标签 -配置示例如下: +以下标签用于区分 `apisix_http_latency` 指标。 -```yaml title="conf/config.yaml" -plugin_attr: - prometheus: - expire: 86400 -``` +| 名称 | 描述 | +| ------ | ---------------------------------------------------------------------------------------------------------------------- | +| type | 延迟类型。有关详细信息,请参见 [延迟类型](#延迟类型)。 | +| route | 延迟对应的路由 ID,当 `prefer_name` 为 `false`(默认)时,使用路由 ID,当 `prefer_name` 为 `true` 时,使用路由名称。如果请求不匹配任何路由,则默认为空字符串。 | +| service | 延迟对应的服务 ID,当 `prefer_name` 为 `false`(默认)时,使用服务 ID,当 `prefer_name` 为 `true` 时,使用服务名称。如果匹配的路由不属于任何服务,则默认为路由上配置的主机值。 | +| consumer | 与延迟关联的消费者名称。如果请求没有与之关联的消费者,则默认为空字符串。 | +| node | 与延迟关联的上游节点的 IP 地址。 | -## API +#### 延迟类型 -`prometheus` 插件会增加 `/apisix/prometheus/metrics` 接口或者你自定义的 URI 来暴露其指标信息。 +`apisix_http_latency` 可以标记为以下三种类型之一: -这些指标由独立的 Prometheus 服务器地址公开。默认情况下,地址为 `127.0.0.1:9091`。你可以在配置文件(`./conf/config.yaml`)中修改,示例如下: +* `request` 表示从客户端读取第一个字节到最后一个字节发送到客户端之间的时间。 -```yaml title="./conf/config.yaml" -plugin_attr: - prometheus: - export_addr: - ip: ${{INTRANET_IP}} - port: 9092 -``` +* `upstream` 
表示等待上游服务响应的时间。 -假设环境变量 `INTRANET_IP` 是 `172.1.1.1`,那么 APISIX 将会在 `172.1.1.1:9092` 上暴露指标。 +* `apisix` 表示 `request` 延迟与 `upstream` 延迟之间的差异。 -如果你仍然想要让指标暴露在数据面的端口(默认:`9080`)上,可参考如下配置: +换句话说,APISIX 延迟不仅归因于 Lua 处理。应理解为: -```yaml title="./conf/config.yaml" -plugin_attr: - prometheus: - enable_export_server: false +```text +APISIX 延迟 + = 下游请求时间 - 上游响应时间 + = 下游流量延迟 + NGINX 延迟 ``` -你可以使用 [public-api](../../../en/latest/plugins/public-api.md) 插件来暴露该 URI。 +### `apisix_upstream_status` 的标签 -:::info IMPORTANT +以下标签用于区分 `apisix_upstream_status` 指标。 -如果 Prometheus 插件收集的指标数量过多,在通过 URI 获取指标时,会占用 CPU 资源来计算指标数据,可能会影响 APISIX 处理正常请求。为解决此问题,APISIX 在 [privileged agent](https://github.com/openresty/lua-resty-core/blob/master/lib/ngx/process.md#enable_privileged_agent) 中暴露 URI 并且计算指标。 -如果使用 public-api 插件暴露该 URI,那么 APISIX 将在普通的 worker 进程中计算指标数据,这仍可能会影响 APISIX 处理正常请求。 +| 名称 | 描述 | +| ------ | ---------------------------------------------------------------------------------------------------------------------- | +| name | 与健康检查配置的上游对应的资源 ID,例如 `/apisix/routes/1` 和 `/apisix/upstreams/1`。 | +| ip | 上游节点的 IP 地址。 | +| port | 节点的端口号。 | -该特性要求 APISIX 运行在 [APISIX-Runtime](../FAQ.md#如何构建-apisix-runtime-环境) 上。 +## 示例 -::: +以下示例演示如何在不同场景中使用 `prometheus` 插件。 -## 启用插件 +### 获取 APISIX 指标 -`prometheus` 插件可以使用空表 `{}` 开启。 +以下示例演示如何从 APISIX 获取指标。 -你可以通过如下命令在指定路由上启用 `prometheus` 插件: +默认的 Prometheus 指标端点和其他与 Prometheus 相关的配置可以在 [静态配置](#静态配置) 中找到。如果您希望自定义这些配置,更新 `config.yaml` 并重新加载 APISIX。 -:::note +如果您在容器化环境中部署 APISIX,并希望外部访问 Prometheus 指标端点,请按如下方式更新配置文件并重新加载 APISIX: -您可以这样从 `config.yaml` 中获取 `admin_key` 并存入环境变量: - -```bash -admin_key=$(yq '.deployment.admin.admin_key[0].key' conf/config.yaml | sed 's/"//g') +```yaml title="conf/config.yaml" +plugin_attr: + prometheus: + export_addr: + ip: 0.0.0.0 ``` -::: +向 APISIX Prometheus 指标端点发送请求: ```shell -curl http://127.0.0.1:9180/apisix/admin/routes/1 \ --H "X-API-KEY: $admin_key" -X PUT -d ' -{ - "uri": "/hello", - "plugins": { - "prometheus":{} - }, - "upstream": { - 
"type": "roundrobin", - "nodes": { - "127.0.0.1:1980": 1 - } - } -}' +curl "http://127.0.0.1:9091/apisix/prometheus/metrics" ``` - +### 在公共 API 端点上公开 APISIX 指标 -## 提取指标 +以下示例演示如何禁用默认情况下在端口 `9091` 上公开的 Prometheus 导出服务器,并在 APISIX 用于监听其他客户端请求的公共 API 端点上公开 APISIX Prometheus 指标。 -你可以从指定的 URL(默认:`/apisix/prometheus/metrics`)中提取指标数据: +在配置文件中禁用 Prometheus 导出服务器,并重新加载 APISIX 以使更改生效: -``` -curl -i http://127.0.0.1:9091/apisix/prometheus/metrics +```yaml title="conf/config.yaml" +plugin_attr: + prometheus: + enable_export_server: false ``` -你可以将该 URI 地址添加到 Prometheus 中来提取指标数据,配置示例如下: +接下来,使用 [`public-api`](../../../zh/latest/plugins/public-api.md) 插件创建一个路由,并为 APISIX 指标公开一个公共 API 端点: -```yaml title="./prometheus.yml" -scrape_configs: - - job_name: "apisix" - scrape_interval: 15s # 该值会跟 Prometheus QL 中 rate 函数的时间范围有关系,rate 函数中的时间范围应该至少两倍于该值。 - metrics_path: "/apisix/prometheus/metrics" - static_configs: - - targets: ["127.0.0.1:9091"] +```shell +curl "http://127.0.0.1:9180/apisix/admin/routes/prometheus-metrics" -X PUT \ + -H "X-API-KEY: ${admin_key}" \ + -d '{ + "uri": "/apisix/prometheus/metrics", + "plugins": { + "public-api": {} + } + }' ``` -现在你可以在 Prometheus 控制台中检查状态: - -![checking status on prometheus dashboard](../../../assets/images/plugin/prometheus01.png) +向新指标端点发送请求以进行验证: -![prometheus apisix in-depth metric view](../../../assets/images/plugin/prometheus02.png) +```shell +curl "http://127.0.0.1:9080/apisix/prometheus/metrics" +``` -## 使用 Grafana 绘制指标 +您应该看到类似以下的输出: -`prometheus` 插件导出的指标可以在 Grafana 进行图形化绘制显示。 +```text +# HELP apisix_http_requests_total 自 APISIX 启动以来客户端请求的总数。 +# TYPE apisix_http_requests_total gauge +apisix_http_requests_total 1 +# HELP apisix_nginx_http_current_connections 当前 HTTP 连接数量。 +# TYPE apisix_nginx_http_current_connections gauge +apisix_nginx_http_current_connections{state="accepted"} 1 +apisix_nginx_http_current_connections{state="active"} 1 +apisix_nginx_http_current_connections{state="handled"} 1 
+apisix_nginx_http_current_connections{state="reading"} 0 +apisix_nginx_http_current_connections{state="waiting"} 0 +apisix_nginx_http_current_connections{state="writing"} 1 +... +``` -如果需要进行设置,请下载 [APISIX's Grafana dashboard 元数据](https://github.com/apache/apisix/blob/master/docs/assets/other/json/apisix-grafana-dashboard.json) 并导入到 Grafana 中。 +### 监控上游健康状态 -你可以到 [Grafana 官方](https://grafana.com/grafana/dashboards/11719) 下载 `Grafana` 元数据。 +以下示例演示如何监控上游节点的健康状态。 -![Grafana chart-1](../../../assets/images/plugin/grafana-1.png) +使用 `prometheus` 插件创建一个路由,并配置上游的主动健康检查: -![Grafana chart-2](../../../assets/images/plugin/grafana-2.png) +```shell +curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \ + -H "X-API-KEY: ${admin_key}" \ + -d '{ + "id": "prometheus-route", + "uri": "/get", + "plugins": { + "prometheus": {} + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "httpbin.org:80": 1, + "127.0.0.1:20001": 1 + }, + "checks": { + "active": { + "timeout": 5, + "http_path": "/status", + "healthy": { + "interval": 2, + "successes": 1 + }, + "unhealthy": { + "interval": 1, + "http_failures": 2 + } + }, + "passive": { + "healthy": { + "http_statuses": [200, 201], + "successes": 3 + }, + "unhealthy": { + "http_statuses": [500], + "http_failures": 3, + "tcp_failures": 3 + } + } + } + } + }' +``` -![Grafana chart-3](../../../assets/images/plugin/grafana-3.png) +向 APISIX Prometheus 指标端点发送请求: -![Grafana chart-4](../../../assets/images/plugin/grafana-4.png) +```shell +curl "http://127.0.0.1:9091/apisix/prometheus/metrics" +``` -## 可用的 HTTP 指标 +您应该看到类似以下的输出: -`prometheus` 插件可以导出以下指标: +```text +# HELP apisix_upstream_status 上游健康检查的状态 +# TYPE apisix_upstream_status gauge +apisix_upstream_status{name="/apisix/routes/1",ip="54.237.103.220",port="80"} 1 +apisix_upstream_status{name="/apisix/routes/1",ip="127.0.0.1",port="20001"} 0 +``` -- Status codes: 上游服务返回的 HTTP 状态码,可以统计到每个服务或所有服务的响应状态码的次数总和。属性如下所示: +这显示上游节点 `httpbin.org:80` 是健康的,而上游节点 `127.0.0.1:20001` 是不健康的。 - | 名称 | 描述 | 
- | -------------| ----------------------------------------------------------------------------- | - | code | 上游服务返回的 HTTP 状态码。 | - | route | 与请求匹配的路由的 `route_id`,如果未匹配,则默认为空字符串。 | - | matched_uri | 与请求匹配的路由的 `uri`,如果未匹配,则默认为空字符串。 | - | matched_host | 与请求匹配的路由的 `host`,如果未匹配,则默认为空字符串。 | - | service | 与请求匹配的路由的 `service_id`。当路由缺少 `service_id` 时,则默认为 `$host`。 | - | consumer | 与请求匹配的消费者的 `consumer_name`。如果未匹配,则默认为空字符串。 | - | node | 上游节点 IP 地址。 | +### 为指标添加额外标签 -- Bandwidth: 经过 APISIX 的总带宽(出口带宽和入口带宽),可以统计到每个服务的带宽总和。属性如下所示: +以下示例演示如何为指标添加额外标签,并在标签值中使用 [Nginx 变量](https://nginx.org/en/docs/http/ngx_http_core_module.html)。 - | 名称 | 描述 | - | -------------| ------------- | - | type | 带宽的类型 (`ingress` 或 `egress`)。 | - | route | 与请求匹配的路由的 `route_id`,如果未匹配,则默认为空字符串。 | - | service | 与请求匹配的路由的 `service_id`。当路由缺少 `service_id` 时,则默认为 `$host`。 | - | consumer | 与请求匹配的消费者的 `consumer_name`。如果未匹配,则默认为空字符串。 | - | node | 消费者节点 IP 地址。 | +目前,仅以下指标支持额外标签: -- etcd reachability: APISIX 连接 etcd 的可用性,用 0 和 1 来表示,`1` 表示可用,`0` 表示不可用。 -- Connections: 各种的 NGINX 连接指标,如 `active`(正处理的活动连接数),`reading`(NGINX 读取到客户端的 Header 信息数),writing(NGINX 返回给客户端的 Header 信息数),已建立的连接数。 -- Batch process entries: 批处理未发送数据计数器,当你使用了批处理发送插件,比如:[syslog](./syslog.md), [http-logger](./http-logger.md), [tcp-logger](./tcp-logger.md), [udp-logger](./udp-logger.md), and [zipkin](./zipkin.md),那么你将会在此指标中看到批处理当前尚未发送的数据的数量。 -- Latency: 每个服务的请求用时和 APISIX 处理耗时的直方图。属性如下所示: +* apisix_http_status +* apisix_http_latency +* apisix_bandwidth - | 名称 | 描述 | - | -------------| --------------------------------------------------------------------------------------- | - | type | 该值可以是 `apisix`、`upstream` 和 `request`,分别表示耗时的来源是 APISIX、上游以及两者总和。 | - | route | 与请求匹配的路由的 `route_id`,如果未匹配,则默认为空字符串。 | - | service | 与请求匹配的路由 的 `service_id`。当路由缺少 `service_id` 时,则默认为 `$host`。 | - | consumer | 与请求匹配的消费者的 `consumer_name`。未匹配,则默认为空字符串。 | - | node | 上游节点的 IP 地址。 | +在配置文件中包含以下配置以为指标添加标签,并重新加载 APISIX 以使更改生效: -- Info: 当前 APISIX 节点信息。 -- Shared dict: APISIX 
中所有共享内存的容量以及剩余可用空间。 -- `apisix_upstream_status`: 上游健康检查的节点状态,`1` 表示健康,`0` 表示不健康。属性如下所示: +```yaml title="conf/config.yaml" +plugin_attr: + prometheus: # 插件:prometheus + metrics: # 根据 NGINX 变量创建额外标签。 + http_status: + extra_labels: # 设置 `http_status` 指标的额外标签。 + - upstream_addr: $upstream_addr # 添加一个额外的 `upstream_addr` 标签,其值为 NGINX 变量 $upstream_addr。 + - route_name: $route_name # 添加一个额外的 `route_name` 标签,其值为 APISIX 变量 $route_name。 +``` - | 名称 | 描述 | - |--------------|-------------------------------------------------------------------------------------------------------------------------------| - | name | 上游所依附的资源 ID,例如 `/apisix/routes/1`, `/apisix/upstreams/1`. | - | ip | 上游节点的 IP 地址。 | - | port | 上游节点的端口号。 | +请注意,如果您在标签值中定义了一个变量,但它与任何现有的 [APISIX 变量](https://apisix.apache.org/zh/docs/apisix/apisix-variable/) 和 [Nginx 变量](https://nginx.org/en/docs/http/ngx_http_core_module.html) 不对应,则标签值将默认为空字符串。 -以下是 APISIX 的原始的指标数据集: +使用 `prometheus` 插件创建一个路由: ```shell -curl http://127.0.0.1:9091/apisix/prometheus/metrics +curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \ + -H "X-API-KEY: ${admin_key}" \ + -d '{ + "id": "prometheus-route", + "uri": "/get", + "name": "extra-label", + "plugins": { + "prometheus": {} + }, + "upstream": { + "nodes": { + "httpbin.org:80": 1 + } + } + }' ``` +向路由发送请求以进行验证: + ```shell -# HELP apisix_bandwidth Total bandwidth in bytes consumed per service in Apisix -# TYPE apisix_bandwidth counter -apisix_bandwidth{type="egress",route="",service="",consumer="",node=""} 8417 -apisix_bandwidth{type="egress",route="1",service="",consumer="",node="127.0.0.1"} 1420 -apisix_bandwidth{type="egress",route="2",service="",consumer="",node="127.0.0.1"} 1420 -apisix_bandwidth{type="ingress",route="",service="",consumer="",node=""} 189 -apisix_bandwidth{type="ingress",route="1",service="",consumer="",node="127.0.0.1"} 332 -apisix_bandwidth{type="ingress",route="2",service="",consumer="",node="127.0.0.1"} 332 -# HELP apisix_etcd_modify_indexes Etcd modify index for APISIX keys -# 
TYPE apisix_etcd_modify_indexes gauge -apisix_etcd_modify_indexes{key="consumers"} 0 -apisix_etcd_modify_indexes{key="global_rules"} 0 -apisix_etcd_modify_indexes{key="max_modify_index"} 222 -apisix_etcd_modify_indexes{key="prev_index"} 35 -apisix_etcd_modify_indexes{key="protos"} 0 -apisix_etcd_modify_indexes{key="routes"} 222 -apisix_etcd_modify_indexes{key="services"} 0 -apisix_etcd_modify_indexes{key="ssls"} 0 -apisix_etcd_modify_indexes{key="stream_routes"} 0 -apisix_etcd_modify_indexes{key="upstreams"} 0 -apisix_etcd_modify_indexes{key="x_etcd_index"} 223 -# HELP apisix_batch_process_entries batch process remaining entries -# TYPE apisix_batch_process_entries gauge -apisix_batch_process_entries{name="http-logger",route_id="9",server_addr="127.0.0.1"} 1 -apisix_batch_process_entries{name="sls-logger",route_id="9",server_addr="127.0.0.1"} 1 -apisix_batch_process_entries{name="tcp-logger",route_id="9",server_addr="127.0.0.1"} 1 -apisix_batch_process_entries{name="udp-logger",route_id="9",server_addr="127.0.0.1"} 1 -apisix_batch_process_entries{name="sys-logger",route_id="9",server_addr="127.0.0.1"} 1 -apisix_batch_process_entries{name="zipkin_report",route_id="9",server_addr="127.0.0.1"} 1 -# HELP apisix_etcd_reachable Config server etcd reachable from Apisix, 0 is unreachable -# TYPE apisix_etcd_reachable gauge -apisix_etcd_reachable 1 -# HELP apisix_http_status HTTP status codes per service in Apisix -# TYPE apisix_http_status counter -apisix_http_status{code="200",route="1",matched_uri="/hello",matched_host="",service="",consumer="",node="127.0.0.1"} 4 -apisix_http_status{code="200",route="2",matched_uri="/world",matched_host="",service="",consumer="",node="127.0.0.1"} 4 -apisix_http_status{code="404",route="",matched_uri="",matched_host="",service="",consumer="",node=""} 1 -# HELP apisix_http_requests_total The total number of client requests -# TYPE apisix_http_requests_total gauge -apisix_http_requests_total 1191780 -# HELP 
apisix_nginx_http_current_connections Number of HTTP connections -# TYPE apisix_nginx_http_current_connections gauge -apisix_nginx_http_current_connections{state="accepted"} 11994 -apisix_nginx_http_current_connections{state="active"} 2 -apisix_nginx_http_current_connections{state="handled"} 11994 -apisix_nginx_http_current_connections{state="reading"} 0 -apisix_nginx_http_current_connections{state="waiting"} 1 -apisix_nginx_http_current_connections{state="writing"} 1 -# HELP apisix_nginx_metric_errors_total Number of nginx-lua-prometheus errors -# TYPE apisix_nginx_metric_errors_total counter -apisix_nginx_metric_errors_total 0 -# HELP apisix_http_latency HTTP request latency in milliseconds per service in APISIX -# TYPE apisix_http_latency histogram -apisix_http_latency_bucket{type="apisix",route="1",service="",consumer="",node="127.0.0.1",le="1"} 1 -apisix_http_latency_bucket{type="apisix",route="1",service="",consumer="",node="127.0.0.1",le="2"} 1 -apisix_http_latency_bucket{type="request",route="1",service="",consumer="",node="127.0.0.1",le="1"} 1 -apisix_http_latency_bucket{type="request",route="1",service="",consumer="",node="127.0.0.1",le="2"} 1 -apisix_http_latency_bucket{type="upstream",route="1",service="",consumer="",node="127.0.0.1",le="1"} 1 -apisix_http_latency_bucket{type="upstream",route="1",service="",consumer="",node="127.0.0.1",le="2"} 1 -... 
-# HELP apisix_node_info Info of APISIX node -# TYPE apisix_node_info gauge -apisix_node_info{hostname="APISIX"} 1 -# HELP apisix_shared_dict_capacity_bytes The capacity of each nginx shared DICT since APISIX start -# TYPE apisix_shared_dict_capacity_bytes gauge -apisix_shared_dict_capacity_bytes{name="access-tokens"} 1048576 -apisix_shared_dict_capacity_bytes{name="balancer-ewma"} 10485760 -apisix_shared_dict_capacity_bytes{name="balancer-ewma-last-touched-at"} 10485760 -apisix_shared_dict_capacity_bytes{name="balancer-ewma-locks"} 10485760 -apisix_shared_dict_capacity_bytes{name="discovery"} 1048576 -apisix_shared_dict_capacity_bytes{name="etcd-cluster-health-check"} 10485760 -... -# HELP apisix_shared_dict_free_space_bytes The free space of each nginx shared DICT since APISIX start -# TYPE apisix_shared_dict_free_space_bytes gauge -apisix_shared_dict_free_space_bytes{name="access-tokens"} 1032192 -apisix_shared_dict_free_space_bytes{name="balancer-ewma"} 10412032 -apisix_shared_dict_free_space_bytes{name="balancer-ewma-last-touched-at"} 10412032 -apisix_shared_dict_free_space_bytes{name="balancer-ewma-locks"} 10412032 -apisix_shared_dict_free_space_bytes{name="discovery"} 1032192 -apisix_shared_dict_free_space_bytes{name="etcd-cluster-health-check"} 10412032 -... 
-# HELP apisix_upstream_status Upstream status from health check -# TYPE apisix_upstream_status gauge -apisix_upstream_status{name="/apisix/routes/1",ip="100.24.156.8",port="80"} 0 -apisix_upstream_status{name="/apisix/routes/1",ip="52.86.68.46",port="80"} 1 +curl -i "http://127.0.0.1:9080/get" ``` -## 删除插件 +您应该看到 `HTTP/1.1 200 OK` 的响应。 -当你需要禁用 `prometheus` 插件时,可以通过以下命令删除相应的 JSON 配置,APISIX 将会自动重新加载相关配置,无需重启服务: +向 APISIX Prometheus 指标端点发送请求: ```shell -curl http://127.0.0.1:9180/apisix/admin/routes/1 -H "X-API-KEY: $admin_key" -X PUT -d ' -{ - "uri": "/hello", - "plugins": {}, - "upstream": { - "type": "roundrobin", - "nodes": { - "127.0.0.1:80": 1 - } - } -}' +curl "http://127.0.0.1:9091/apisix/prometheus/metrics" ``` -## 如何启用 TCP/UDP 指标 +您应该看到类似以下的输出: -:::info IMPORTANT - -该功能要求 APISIX 运行在 [APISIX-Runtime](../FAQ.md#如何构建-APISIX-Runtime-环境?) 上。 +```text +# HELP apisix_http_status APISIX 中每个服务的 HTTP 状态代码 +# TYPE apisix_http_status counter +apisix_http_status{code="200",route="1",matched_uri="/get",matched_host="",service="",consumer="",node="54.237.103.220",upstream_addr="54.237.103.220:80",route_name="extra-label"} 1 +``` -::: +### 使用 Prometheus 监控 TCP/UDP 流量 -我们也可以通过 `prometheus` 插件采集 TCP/UDP 指标。 +以下示例演示如何在 APISIX 中收集 TCP/UDP 流量指标。 -首先,确保 `prometheus` 插件已经在你的配置文件(`./conf/config.yaml`)中启用: +在 `config.yaml` 中包含以下配置以启用 Stream proxy 和 `prometheus` 插件。重新加载 APISIX 以使更改生效: ```yaml title="conf/config.yaml" +apisix: + proxy_mode: http&stream # 启用 L4 和 L7 代理 + stream_proxy: # 配置 L4 代理 + tcp: + - 9100 # 设置 TCP 代理监听端口 + udp: + - 9200 # 设置 UDP 代理监听端口 + stream_plugins: - - ... 
- - prometheus + - prometheus # 为 stream proxy 启用 prometheus ``` -接着你需要在 stream 路由中配置该插件: +使用 `prometheus` 插件创建一个 Stream Route: ```shell -curl http://127.0.0.1:9180/apisix/admin/stream_routes/1 -H "X-API-KEY: $admin_key" -X PUT -d ' -{ +curl "http://127.0.0.1:9180/apisix/admin/stream_routes" -X PUT \ + -H "X-API-KEY: ${admin_key}" \ + -d '{ "plugins": { - "prometheus":{} + "prometheus": {} }, "upstream": { - "type": "roundrobin", - "nodes": { - "127.0.0.1:80": 1 - } + "type": "roundrobin", + "nodes": { + "httpbin.org:80": 1 + } } -}' + }' ``` -## 可用的 TCP/UDP 指标 - -以下是将 APISIX 作为 L4 代理时可用的指标: +向该 Stream Route 发送请求以进行验证: -* Stream Connections: 路由级别的已处理连接数。具有的维度: +```shell +curl -i "http://127.0.0.1:9100" +``` - | 名称 | 描述 | - | ------------- | ---------------------- | - | route | 匹配的 stream 路由 ID。 | -* Connections: 各种的 NGINX 连接指标,如 `active`,`reading`,`writing` 等已建立的连接数。 -* Info: 当前 APISIX 节点信息。 +您应该看到 `HTTP/1.1 200 OK` 的响应。 -以下是 APISIX 指标的示例: +向 APISIX Prometheus 指标端点发送请求: ```shell -curl http://127.0.0.1:9091/apisix/prometheus/metrics +curl "http://127.0.0.1:9091/apisix/prometheus/metrics" ``` -``` -... -# HELP apisix_node_info Info of APISIX node -# TYPE apisix_node_info gauge -apisix_node_info{hostname="desktop-2022q8f-wsl"} 1 -# HELP apisix_stream_connection_total Total number of connections handled per stream route in APISIX +您应该看到类似以下的输出: + +```text +# HELP apisix_stream_connection_total APISIX 中每个 Stream Route 处理的总连接数 # TYPE apisix_stream_connection_total counter apisix_stream_connection_total{route="1"} 1 ```
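Reviewer note: the verification steps in these docs end at reading the raw metrics output by eye. The sketch below shows one way the `apisix_stream_connection_total` check could be scripted. It is not part of the patch. It assumes the exposition lines carry no trailing timestamp, so the last field is the sample value. The `metrics` and `total` variables are illustrative names, and the sample text is copied from the example output in the docs rather than fetched from a running APISIX.

```shell
# Sample exposition text copied from the example output in the docs.
# Against a live instance, this would instead come from:
#   curl -s "http://127.0.0.1:9091/apisix/prometheus/metrics"
metrics='# HELP apisix_stream_connection_total Total number of connections handled per Stream Route in APISIX
# TYPE apisix_stream_connection_total counter
apisix_stream_connection_total{route="1"} 1'

# Sum the counter across all stream routes.
total=$(printf '%s\n' "$metrics" | awk '
  /^#/ { next }  # skip HELP and TYPE comment lines
  index($0, "apisix_stream_connection_total{") == 1 { sum += $NF }  # last field is the sample value
  END { print sum + 0 }  # print 0 when no sample matched
')

echo "stream connections: ${total}"
```

With the sample text above, this prints `stream connections: 1`. Matching on the metric name followed by `{` keeps the filter from picking up other metrics that merely share the prefix.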