diff --git a/source/configure/database-configuration-settings.rst b/source/configure/database-configuration-settings.rst index c1e54aac881..459940fe114 100644 --- a/source/configure/database-configuration-settings.rst +++ b/source/configure/database-configuration-settings.rst @@ -376,7 +376,7 @@ Replica lag settings | String array input consists of: | | | | | | - ``DataSource``: The database credentials to connect | | -| to the replica instance. | | +| to the database instance. | | | - ``QueryAbsoluteLag``: A plain SQL query that must | | | return a single row. The first column must be the | | | node value of the Prometheus metric, and the second | | @@ -388,27 +388,111 @@ Replica lag settings | column must be the value of the lag used to measure | | | the time lag. | | +--------------------------------------------------------+----------------------------------------------------------------------------------+ -| Examples: | +| **Notes**: | | | -| For AWS Aurora instances, ``QueryAbsoluteLag`` can be: | -| | -| .. code-block:: sql | -| | -| select server_id, highest_lsn_rcvd-durable_lsn as bindiff from aurora_global_db_instance_status() where server_id=<> | -| | -| And for AWS Aurora instances, ``QueryTimeLag`` can be: | -| | -| .. code-block:: sql | -| | -| select server_id, visibility_lag_in_msec from aurora_global_db_instance_status() where server_id=<> | -| | -| For MySQL Group Replication, the absolute lag can be measured from the number of pending transactions in the applier queue: | -| | -| .. code-block:: sql | -| | -| select member_id, count_transactions_remote_in_applier_queue FROM performance_schema.replication_group_member_stats where member_id=<> | +| - The ``QueryAbsoluteLag`` and ``QueryTimeLag`` queries must return a single row. | +| - To properly monitor this you must setup `performance monitoring `__ for Mattermost. | +--------------------------------------------------------+----------------------------------------------------------------------------------+ +1. Configure the replica lag metric based on your database type. See the following tabs for details on configuring this for each database type. + + .. tabs:: + + .. tab:: AWS Aurora + + Add the configuration highlighted below to your ``SqlSettings.ReplicaLagSettings`` array. You only need to add this once because replication statistics for AWS Aurora nodes are visible across all server instances that are members of the cluster. Be sure to change the ``DataSource`` to point to a single node in the group. + + For more information on Aurora replication stats, see the `AWS Aurora documentaion `__. + + **Example:** + + .. code-block:: json + :emphasize-lines: 4,5,6,7,8 + + { + "SqlSettings": { + "ReplicaLagSettings": [ + { + "DataSource": "replica-1", + "QueryAbsoluteLag": "select server_id, highest_lsn_rcvd-durable_lsn as bindiff from aurora_global_db_instance_status() where server_id=<>", + "QueryTimeLag": "select server_id, visibility_lag_in_msec from aurora_global_db_instance_status() where server_id=<>" + } + ] + } + } + + .. tab:: MySQL Group Replication + + Add the configuration highlighted below to your ``SqlSettings.ReplicaLagSettings`` array. You only need to add this once because replication statistics for all nodes are shared across all server instances that are members of the MySQL replication group. Be sure to change the ``DataSource`` to point to a single node in the group. + + For more information on group replication stats, see the `MySQL documentation `__. + + **Example:** + + .. code-block:: json + :emphasize-lines: 4,5,6,7,8 + + { + "SqlSettings": { + "ReplicaLagSettings": [ + { + "DataSource": "replica-1", + "QueryAbsoluteLag": "select member_id, count_transactions_remote_in_applier_queue FROM performance_schema.replication_group_member_stats where member_id=<>", + "QueryTimeLag": "" + } + ] + } + } + + .. tab:: PostgreSQL replication slots + + 1. Add the configuration highlighted below to your ``SqlSettings.ReplicaLagSettings`` array. This query should run against the **primary** node in your cluster, to do this change the ``DataSource`` to match the `SqlSettings.DataSource <#data-source>`__ setting you have configured. + + For more information on pg_stat_replication, see the `PostgreSQL documentation `__. + + **Example:** + + .. code-block:: json + :emphasize-lines: 4,5,6,7,8 + + { + "SqlSettings": { + "ReplicaLagSettings": [ + { + "DataSource": "postgres://mmuser:password@localhost:5432/mattermost_test?sslmode=disable&connect_timeout=10.", + "QueryAbsoluteLag": "select usename, pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn) as metric from pg_stat_replication;", + "QueryTimeLag": "" + } + ] + } + } + + 2. Grant permissions to the database user for ``pg_monitor``. This user should be the same user configured above in the ``DataSource`` string. + + For more information on roles, see the `PostgreSQL documentation `__. + + .. code-block:: bash + + sudo -u postgres psql + postgres=# GRANT pg_monitor TO mmuser; + + +2. Save the config and restart all Mattermost nodes. +3. Navigate to your Grafana instance monitoring Mattermost and open the `Mattermost Performance Monitoring v2 `__ dashboard. +4. The ``QueryTimeLag`` chart is already setup for you utilizing the existing ``Replica Lag`` chart. If using ``QueryAbsoluteLag`` metric clone the ``Replica Lag`` chart and edit the query to use the below absolute lag metrics and modify the title to be ``Replica Lag Absolute``. + + .. code-block:: none + + mattermost_db_replica_lag_abs{instance=~"$server"} + + .. image:: ../../source/images/database-configuration-settings-replica-lag-grafana-1.jpg + :alt: A screenshot showing how to clone a chart within Grafana + + + .. image:: ../../source/images/database-configuration-settings-replica-lag-grafana-2.jpg + :alt: A screenshot showing the specific edits to make to the cloned grafana chart. + + .. config:setting:: database-replicamonitorintervalseconds :displayname: Replica monitor interval (Database) :systemconsole: N/A diff --git a/source/images/database-configuration-settings-replica-lag-grafana-1.jpg b/source/images/database-configuration-settings-replica-lag-grafana-1.jpg new file mode 100644 index 00000000000..7d67d125f0b Binary files /dev/null and b/source/images/database-configuration-settings-replica-lag-grafana-1.jpg differ diff --git a/source/images/database-configuration-settings-replica-lag-grafana-2.jpg b/source/images/database-configuration-settings-replica-lag-grafana-2.jpg new file mode 100644 index 00000000000..129c85002b1 Binary files /dev/null and b/source/images/database-configuration-settings-replica-lag-grafana-2.jpg differ