[MM-58584] Update Calls metrics (#7219)
* Update Calls metrics

* Update source/configure/calls-deployment.rst

* Add more clarity to metrics endpoints

---------

Co-authored-by: Carrie Warner (Mattermost) <[email protected]>
streamer45 and cwarnermm authored Jun 21, 2024
1 parent 131a5c3 commit 719e5ca
Showing 1 changed file with 62 additions and 11 deletions.
73 changes: 62 additions & 11 deletions source/configure/calls-deployment.rst
@@ -181,7 +181,10 @@ Both the plugin and the external ``rtcd`` service expose some Prometheus metrics
Calls plugin metrics
^^^^^^^^^^^^^^^^^^^^

Metrics for the Calls plugin are exposed through the ``/plugins/com.mattermost.calls/metrics`` subpath under the existing Mattermost server metrics endpoint. The listen address of that endpoint is controlled by the :ref:`Listen address for performance <configure/performance-monitoring-configuration-settings:listen address for performance>` configuration setting, which defaults to port ``8067``.

.. note::

  On Mattermost versions prior to v9.5, plugin metrics were exposed through the public ``/plugins/com.mattermost.calls/metrics`` API endpoint controlled by the :ref:`Web server listen address <configure/environment-configuration-settings:web server listen address>` configuration setting. This defaults to port ``8065``.
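
As a rough illustration of the version split described above, the following Python sketch picks the default metrics URL for a given server version. The function name, the ``http`` scheme, and the host name are assumptions made for the example; the ports are only the documented defaults and will differ if either listen address has been changed.

```python
# Hypothetical helper: not part of Mattermost or the Calls plugin.
CALLS_METRICS_PATH = "/plugins/com.mattermost.calls/metrics"


def calls_metrics_url(host, server_version, metrics_port=8067, web_port=8065):
    """Return the default Calls plugin metrics URL for a server version.

    server_version is a (major, minor) tuple, e.g. (9, 5).
    v9.5 and later: the path lives under the performance metrics listener.
    Earlier versions: the path lives under the public web server listener.
    The "http" scheme is an assumption; adjust it for TLS deployments.
    """
    port = metrics_port if tuple(server_version) >= (9, 5) else web_port
    return f"http://{host}:{port}{CALLS_METRICS_PATH}"


if __name__ == "__main__":
    # "app-0" is a placeholder host name.
    print(calls_metrics_url("app-0", (9, 5)))
    print(calls_metrics_url("app-0", (9, 4)))
```

The resulting URL can be opened with any HTTP client to confirm that the endpoint responds before pointing Prometheus at it.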

**Process**

@@ -209,19 +212,52 @@ Metrics for the calls plugin are exposed through the public ``/plugins/com.matte

- ``mattermost_plugin_calls_rtc_sessions_total``: Total number of active RTC sessions.

**Application**

- ``mattermost_plugin_calls_app_handlers_time_bucket``: Time taken to execute app handlers.

- ``mattermost_plugin_calls_app_handlers_time_sum``

- ``mattermost_plugin_calls_app_handlers_time_count``

**Database**

- ``mattermost_plugin_calls_store_ops_total``: Total number of db store operations.
- ``mattermost_plugin_calls_store_methods_time_bucket``: Time taken to execute store methods.

- ``mattermost_plugin_calls_store_methods_time_sum``

- ``mattermost_plugin_calls_store_methods_time_count``
- ``mattermost_plugin_calls_cluster_mutex_grab_time_bucket``: Time taken to grab global mutexes.

- ``mattermost_plugin_calls_cluster_mutex_grab_time_sum``

- ``mattermost_plugin_calls_cluster_mutex_grab_time_count``
- ``mattermost_plugin_calls_cluster_mutex_locked_time_bucket``: Time spent locked in global mutexes.

- ``mattermost_plugin_calls_cluster_mutex_locked_time_sum``

- ``mattermost_plugin_calls_cluster_mutex_locked_time_count``

**WebSocket**

- ``mattermost_plugin_calls_websocket_connections_total``: Total number of active WebSocket connections.
- ``mattermost_plugin_calls_websocket_events_total``: Total number of WebSocket events.

**Jobs**

- ``mattermost_plugin_calls_jobs_live_captions_new_audio_len_ms_bucket``: Duration (in ms) of new audio transcribed for live captions.

- ``mattermost_plugin_calls_jobs_live_captions_new_audio_len_ms_sum``

- ``mattermost_plugin_calls_jobs_live_captions_new_audio_len_ms_count``
- ``mattermost_plugin_calls_jobs_live_captions_pktPayloadCh_buf_full``: Total packets of audio data dropped due to full channel.
- ``mattermost_plugin_calls_jobs_live_captions_window_dropped``: Total windows of audio data dropped due to pressure on the transcriber.
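
Several of the metrics above are histograms, which is why they appear as ``_bucket``, ``_sum``, and ``_count`` series. A common way to read them is to divide ``_sum`` by ``_count`` to get the average observed value. The sketch below does this over a snippet of Prometheus text output. The sample values and the ``method="GetCall"`` label are made up for illustration, and the parser is deliberately simplified (it assumes label values contain no ``}`` character).

```python
import re

# Illustrative sample only: the label name, label value, and numbers are invented.
SAMPLE = """\
# TYPE mattermost_plugin_calls_store_methods_time histogram
mattermost_plugin_calls_store_methods_time_bucket{method="GetCall",le="0.005"} 3
mattermost_plugin_calls_store_methods_time_bucket{method="GetCall",le="+Inf"} 4
mattermost_plugin_calls_store_methods_time_sum{method="GetCall"} 0.02
mattermost_plugin_calls_store_methods_time_count{method="GetCall"} 4
"""

# Metric name, optional {labels}, then the sample value.
LINE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def histogram_means(text):
    """Map (histogram name, label string) to sum / count."""
    sums, counts = {}, {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match is None:
            continue
        name, labels, value = match.group(1), match.group(2) or "", float(match.group(3))
        if name.endswith("_sum"):
            sums[(name[: -len("_sum")], labels)] = value
        elif name.endswith("_count"):
            counts[(name[: -len("_count")], labels)] = value
    # Only report series that have at least one observation.
    return {key: sums[key] / counts[key] for key in sums if counts.get(key)}


if __name__ == "__main__":
    for (name, labels), mean in histogram_means(SAMPLE).items():
        print(name, labels, mean)
```

The average is expressed in whatever unit the histogram records. In a running Prometheus setup the same ratio is usually computed over a time window in a query rather than from raw totals.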

WebRTC service metrics
^^^^^^^^^^^^^^^^^^^^^^

Metrics for the ``rtcd`` service are exposed through the ``/metrics`` API endpoint under the ``rtcd`` API listener, which is controlled by the ``api.http.listen_address`` configuration setting and defaults to port ``8045``.

**Process**

@@ -236,24 +272,39 @@ Metrics for the ``rtcd`` service are exposed through the ``/metrics`` API endpoi
- ``rtcd_rtc_conn_states_total``: Total number of RTC connection state changes.
- ``rtcd_rtc_errors_total``: Total number of RTC errors.
- ``rtcd_rtc_rtp_bytes_total``: Total size of sent/received RTP packets, in bytes.

  - Note: removed as of v0.10.0

- ``rtcd_rtc_rtp_packets_total``: Total number of sent/received RTP packets.

  - Note: removed as of v0.10.0

- ``rtcd_rtc_rtp_tracks_total``: Total number of incoming/outgoing RTP tracks.
- ``rtcd_rtc_sessions_total``: Total number of active RTC sessions.
- ``rtcd_rtc_rtp_tracks_writes_time_bucket``: Time taken to write to outgoing RTP tracks.

  - Note: added as of v0.10.0

- ``rtcd_rtc_rtp_tracks_writes_time_sum``
- ``rtcd_rtc_rtp_tracks_writes_time_count``

**WebSocket**

- ``rtcd_ws_connections_total``: Total number of active WebSocket connections.
- ``rtcd_ws_messages_total``: Total number of received/sent WebSocket messages.

Configuration
^^^^^^^^^^^^^

A sample Prometheus configuration to scrape both plugin and ``rtcd`` metrics could look like this:

.. code::

  scrape_configs:
    - job_name: node
      static_configs:
        - targets: ['rtcd-0:9100','rtcd-1:9100', 'calls-offloader-1:9100', 'calls-offloader-2:9100']
    - job_name: calls
      metrics_path: /plugins/com.mattermost.calls/metrics
      static_configs:
        - targets: ['app-0:8067','app-1:8067','app-2:8067']
    - job_name: rtcd
      static_configs:
        - targets: ['rtcd-0:8045', 'rtcd-1:8045']

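
To make explicit which URLs the sample configuration ends up scraping, here is a small Python sketch that expands each job into its target URLs. It relies on Prometheus defaulting ``metrics_path`` to ``/metrics`` and ``scheme`` to ``http`` when a job does not set them. The ``SCRAPE_CONFIGS`` structure is a flattened, hand-written mirror of the sample above (it omits the ``static_configs`` nesting), and the function name is hypothetical.

```python
# Flattened mirror of the sample scrape configuration (illustrative only).
SCRAPE_CONFIGS = [
    {
        "job_name": "node",
        "targets": ["rtcd-0:9100", "rtcd-1:9100",
                    "calls-offloader-1:9100", "calls-offloader-2:9100"],
    },
    {
        "job_name": "calls",
        "metrics_path": "/plugins/com.mattermost.calls/metrics",
        "targets": ["app-0:8067", "app-1:8067", "app-2:8067"],
    },
    {
        "job_name": "rtcd",
        "targets": ["rtcd-0:8045", "rtcd-1:8045"],
    },
]


def scrape_urls(configs):
    """Return {job_name: [url, ...]} using Prometheus' default path and scheme."""
    urls = {}
    for job in configs:
        path = job.get("metrics_path", "/metrics")  # Prometheus default path
        scheme = job.get("scheme", "http")          # Prometheus default scheme
        urls[job["job_name"]] = [f"{scheme}://{target}{path}" for target in job["targets"]]
    return urls


if __name__ == "__main__":
    for job_name, job_urls in scrape_urls(SCRAPE_CONFIGS).items():
        for url in job_urls:
            print(job_name, url)
```

Only the ``calls`` job needs an explicit ``metrics_path``, because the plugin metrics live under a subpath rather than at ``/metrics``.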
System tunings
~~~~~~~~~~~~~~

@@ -472,4 +523,4 @@ On the server side, run the following:
sudo tcpdump -n port 8443
This command will output information (such as source and destination addresses) for all the network packets being sent or received through port ``8443``. This is a good way to check whether data is getting in and out of the instance and can be used to quickly identify network configuration issues.
