docs(ds): Add a chapter about metrics
ieQu1 committed May 16, 2024
1 parent c3df5fd commit ba75ef1
Showing 4 changed files with 51 additions and 2 deletions.
1 change: 1 addition & 0 deletions dir.yaml
@@ -586,6 +586,7 @@
path: durability/durability_introduction
children:
- durability/managing-replication
- durability/metrics
- title_en: Multi-Protocol Gateway
title_cn: 多协议网关
path: gateway/gateway
7 changes: 5 additions & 2 deletions en_US/durability/durability_introduction.md
@@ -60,7 +60,7 @@ State of durable sessions, as well as messages routed to the durable sessions ar
This session implementation is disabled by default. It can be enabled by setting `session_persistence.enable` configuration parameter to `true`.

The session persistence feature ensures robust durability and high availability by consistently replicating session metadata and MQTT messages sent to the durable sessions across multiple nodes within an EMQX cluster.
-The configurable *replication factor* determines the number of replicas for each message or session, enabling users to customize the balance between durability and performance to meet their specific requirements.
+The configurable [replication factor](./managing-replication.md#replication-factor) determines the number of replicas for each message or session, enabling users to customize the balance between durability and performance to meet their specific requirements.

This implementation has the following advantages:
- Durable sessions can be resumed after EMQX nodes are restarted or stopped.
@@ -90,7 +90,7 @@ Storage encapsulates all data of a certain type, such as MQTT messages or MQTT s

#### Shard

-At this level, messages are segregated by client, and stored in distinct shards based on the publisher's client ID. The number of shards is determined by `durable_storage.messages.n_shards` configuration parameter during the initial startup of EMQX.
+At this level, messages are segregated by client and stored in distinct shards based on the publisher's client ID. The number of shards is determined by the [n_shards](./managing-replication.md#number-of-shards) configuration parameter during the initial startup of EMQX.

A shard is also a unit of replication, and EMQX ensures that each shard is consistently replicated `durable_storage.messages.replication_factor` times across different nodes in the cluster so that each shard replica contains the same set of messages in the same order.
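The following Python sketch illustrates the two ideas in this subsection: all messages from a given publisher land in the same shard, and each shard is stored on `replication_factor` distinct nodes. The hash function (`zlib.crc32`), the round-robin replica placement, and the helper names `shard_for` and `replica_nodes` are assumptions for illustration only; they are not EMQX's actual algorithm.

```python
import zlib


def shard_for(client_id: str, n_shards: int) -> int:
    """Map a publisher's client ID to a shard index (hypothetical hash)."""
    # crc32 is used only because it is stable across runs;
    # the real EMQX hash function is not specified in this document.
    return zlib.crc32(client_id.encode("utf-8")) % n_shards


def replica_nodes(shard: int, nodes: list[str], replication_factor: int) -> list[str]:
    """Pick `replication_factor` distinct nodes for a shard (round-robin sketch)."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds cluster size")
    # Consecutive nodes starting at an offset derived from the shard index.
    return [nodes[(shard + i) % len(nodes)] for i in range(replication_factor)]


nodes = ["emqx@n1", "emqx@n2", "emqx@n3"]
shard = shard_for("sensor-42", n_shards=12)
print(shard, replica_nodes(shard, nodes, replication_factor=2))
```

Because the mapping depends only on the client ID, every replica of a shard receives the same publishers' messages, which is what allows the replicas to hold the same set of messages in the same order.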

@@ -100,6 +100,9 @@ Messages within a shard are further segmented into generations corresponding to

Different generations can organize the data differently, according to the *storage layout* specification. Currently, only one layout is supported, optimized for managing the high throughput of wildcard subscriptions spanning a large number of topics and single-topic subscriptions. Future updates will introduce additional layouts to optimize for the different types of workloads, such as prioritizing low latency over high throughput for certain applications.

The storage layout used for new generations is configured by the `durable_storage.messages.layout` parameter.
Each layout engine can define its own set of configuration parameters, depending on its type.

#### Stream

Messages in each shard and generation are split into streams. Streams serve as units of message serialization in EMQX. Streams can contain messages from multiple topics. Various storage layouts can employ different strategies for mapping topics into streams.
44 changes: 44 additions & 0 deletions en_US/durability/metrics.md
@@ -0,0 +1,44 @@
# Metrics

This document describes the Prometheus metrics relevant to durable sessions.

## `emqx_ds_egress_batches`

This counter is incremented every time a batch of messages is successfully written to the durable storage.

## `emqx_ds_egress_messages`

This metric counts messages successfully written to the durable storage.
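Since `emqx_ds_egress_batches` and `emqx_ds_egress_messages` are both counters, dividing their increase over the same interval gives the average number of messages per written batch. A minimal Python sketch, assuming two scrapes have already been parsed into dicts keyed by metric name; the helper names and the sample values are hypothetical:

```python
def counter_delta(prev: float, curr: float) -> float:
    """Increase of a Prometheus counter between two scrapes.

    A counter that went down was reset (e.g. a node restart); in that case
    the current value is the best available estimate of the increase.
    """
    return curr - prev if curr >= prev else curr


def avg_batch_size(prev: dict, curr: dict) -> float:
    """Average number of messages per successfully written batch."""
    batches = counter_delta(prev["emqx_ds_egress_batches"], curr["emqx_ds_egress_batches"])
    messages = counter_delta(prev["emqx_ds_egress_messages"], curr["emqx_ds_egress_messages"])
    return messages / batches if batches else 0.0


# Made-up sample values for two consecutive scrapes.
prev = {"emqx_ds_egress_batches": 1000, "emqx_ds_egress_messages": 50000}
curr = {"emqx_ds_egress_batches": 1200, "emqx_ds_egress_messages": 62000}
print(avg_batch_size(prev, curr))  # 60.0
```

Treating a decrease as a reset mirrors how Prometheus' own `rate()` function handles counter resets.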

## `emqx_ds_egress_bytes`

This metric counts the total volume of payload data successfully written to the durable storage.
Note: this counter only takes message payloads into account, so the actual volume of data written to the durable storage may be larger.
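Combined with `emqx_ds_egress_messages`, this counter yields the average payload size per written message. Because only payloads are counted, the result is a lower bound on the storage cost per message. A minimal Python sketch, assuming two already-parsed scrapes and ignoring counter resets for brevity; the function name and sample values are hypothetical:

```python
def avg_payload_bytes(prev: dict, curr: dict) -> float:
    """Average payload size (bytes) per message written between two scrapes.

    emqx_ds_egress_bytes counts payloads only, so this is a lower bound on
    the real per-message storage cost. Counter resets are ignored here.
    """
    msgs = curr["emqx_ds_egress_messages"] - prev["emqx_ds_egress_messages"]
    nbytes = curr["emqx_ds_egress_bytes"] - prev["emqx_ds_egress_bytes"]
    return nbytes / msgs if msgs > 0 else 0.0


# Made-up sample values for two consecutive scrapes.
prev = {"emqx_ds_egress_messages": 50_000, "emqx_ds_egress_bytes": 6_400_000}
curr = {"emqx_ds_egress_messages": 62_000, "emqx_ds_egress_bytes": 7_936_000}
print(avg_payload_bytes(prev, curr))  # 128.0
```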

## `emqx_ds_egress_batches_failed`

This counter is incremented every time writing data to the durable storage fails, for any reason.
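Comparing this counter with `emqx_ds_egress_batches` gives a write failure ratio. A minimal Python sketch, assuming each failure increments `emqx_ds_egress_batches_failed` once per attempted batch (the text above only says it is incremented "every time" writing fails) and ignoring counter resets; the function name and sample values are hypothetical:

```python
def write_failure_ratio(prev: dict, curr: dict) -> float:
    """Fraction of batch write attempts that failed between two scrapes.

    Assumes one failure increment per attempted batch; resets are ignored.
    """
    ok = curr["emqx_ds_egress_batches"] - prev["emqx_ds_egress_batches"]
    failed = curr["emqx_ds_egress_batches_failed"] - prev["emqx_ds_egress_batches_failed"]
    attempts = ok + failed
    return failed / attempts if attempts else 0.0


# Made-up sample values for two consecutive scrapes.
prev = {"emqx_ds_egress_batches": 1000, "emqx_ds_egress_batches_failed": 5}
curr = {"emqx_ds_egress_batches": 1198, "emqx_ds_egress_batches_failed": 7}
print(write_failure_ratio(prev, curr))  # 0.01
```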

## `emqx_ds_egress_flush_time`

This is a rolling average of the time spent writing batches to the durable storage.
It is a key indicator of replication speed.

## `emqx_ds_store_batch_time`

This is a rolling average of the time spent writing batches to the local RocksDB storage.
Unlike `emqx_ds_egress_flush_time`, it does not include network replication costs, so it is the key indicator of disk I/O efficiency.
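Because `emqx_ds_egress_flush_time` includes replication and `emqx_ds_store_batch_time` does not, comparing the two gives a rough idea of how much of the write path is spent outside the local disk. A minimal Python sketch; it assumes both gauges are read at the same moment and use the same time unit, and since both are rolling averages over possibly different sets of writes, the result is only an approximation. The function name and sample values are hypothetical:

```python
def replication_overhead(flush_time: float, store_batch_time: float) -> tuple[float, float]:
    """Estimate time spent outside local RocksDB writes.

    Returns (overhead, share): the absolute difference between the two
    rolling averages (clamped at zero, since averages over different
    samples can cross) and that difference as a fraction of flush_time.
    """
    overhead = max(flush_time - store_batch_time, 0.0)
    share = overhead / flush_time if flush_time > 0 else 0.0
    return overhead, share


# Made-up readings of emqx_ds_egress_flush_time and emqx_ds_store_batch_time.
print(replication_overhead(12.0, 3.0))  # (9.0, 0.75)
```

If the share stays high while `emqx_ds_store_batch_time` remains low, the bottleneck is more likely network replication than disk I/O.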

## `emqx_ds_builtin_next_time`

This is a rolling average of the time spent consuming a batch of messages from the durable storage.

## `emqx_ds_storage_bitfield_lts_counter_seek` and `emqx_ds_storage_bitfield_lts_counter_next`

These counters are specific to the "wildcard optimized" storage layout.
They measure the efficiency of consuming data from the local storage.

The wildcard optimized layout uses two primitives for looking up data in RocksDB: `seek`, which searches for a key, and `next`, which simply advances to the next key.
The `seek` primitive is generally slower, so ideally the `emqx_ds_storage_bitfield_lts_counter_next` counter should grow much faster than the `emqx_ds_storage_bitfield_lts_counter_seek` counter.
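The following toy Python model illustrates the two primitives: `seek` performs a binary search over sorted keys, while `next` moves one position forward. Scanning a contiguous key range costs one `seek` plus one `next` per key, which is the access pattern that keeps the `next` counter growing faster than the `seek` counter. `CountingIterator` and `scan_range` are hypothetical names, and the model does not reflect how RocksDB or the wildcard optimized layout is actually implemented.

```python
import bisect


class CountingIterator:
    """Toy sorted-key iterator that counts seek and next calls (not RocksDB)."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.pos = len(self.keys)  # unpositioned: past the end
        self.seeks = 0
        self.nexts = 0

    def seek(self, key):
        """Position at the first key >= `key` using binary search."""
        self.seeks += 1
        self.pos = bisect.bisect_left(self.keys, key)
        return self.current()

    def next(self):
        """Advance to the following key in constant time."""
        self.nexts += 1
        self.pos += 1
        return self.current()

    def current(self):
        return self.keys[self.pos] if self.pos < len(self.keys) else None


def scan_range(it, start, end):
    """Collect keys in [start, end) with one seek followed by nexts."""
    out = []
    key = it.seek(start)
    while key is not None and key < end:
        out.append(key)
        key = it.next()
    return out


it = CountingIterator([f"t/{i:03d}" for i in range(100)])
found = scan_range(it, "t/010", "t/020")
print(len(found), it.seeks, it.nexts)  # 10 1 10
```

A workload that instead had to `seek` for every key would show the two counters growing at roughly the same rate.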

Increasing the `durable_storage.messages.layout.epoch_bits` parameter can help raise the ratio of `next` to `seek` operations.
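Because both are counters, the `next`/`seek` ratio is best computed from their increase over an interval rather than from raw totals. A minimal Python sketch, assuming two scrapes have already been parsed into dicts keyed by metric name and ignoring counter resets; the function name and sample numbers are hypothetical:

```python
NEXT = "emqx_ds_storage_bitfield_lts_counter_next"
SEEK = "emqx_ds_storage_bitfield_lts_counter_seek"


def next_to_seek_ratio(prev: dict, curr: dict) -> float:
    """Ratio of the growth of the `next` counter to the growth of `seek`.

    Per the guidance above, this value should ideally be much greater than 1.
    """
    nexts = curr[NEXT] - prev[NEXT]
    seeks = curr[SEEK] - prev[SEEK]
    if seeks == 0:
        # No seeks in the interval: unbounded if there were reads, else idle.
        return float("inf") if nexts > 0 else 0.0
    return nexts / seeks


# Made-up sample values for two consecutive scrapes.
prev = {NEXT: 100_000, SEEK: 4_000}
curr = {NEXT: 145_000, SEEK: 5_500}
print(next_to_seek_ratio(prev, curr))  # 30.0
```

Tracking this value before and after changing `epoch_bits` is one way to check whether the change helped.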
1 change: 1 addition & 0 deletions zh_CN/durability/metrics.md
@@ -0,0 +1 @@
# TODO
