Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DF-45522: Docs for metrics events configuration #41

Merged
merged 3 commits into from
Oct 2, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
71 changes: 71 additions & 0 deletions docs/user-guide/app-metrics.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
===================
Application metrics
===================

Applications can use Sasquatch infrastructure to publish metrics events to `InfluxDB`_ via `Kafka`_.
Setting certain Sasquatch values in Phalanx will create Kafka user and topic, and configure a Telegraf consumer to put messages from that topic into the ``telegraf-kafka-app-metrics-consumer`` database in the Sasquatch InfluxDB instance.

The messages are expected to be in :ref:`Avro <avro>` format, and schemas are expected to be in the `Schema Registry`_ for any messages that are encoded with a schema ID.

.. _Schema Registry: https://docs.confluent.io/platform/current/schema-registry/
.. _Safir: https://safir.lsst.io


Configuration
=============

Apps that want to publish metrics events need to:

* Set ``app-metrics.enabled`` to ``true`` in every Sasquatch **environment** values files where app metrics should be enabled
* Add the app name to the ``app-metrics.apps`` list in the Sasquatch **environment** values file
* Add an entry to`globalAppConfig` dict in the **app-metrics** ``values.yaml`` file in Phalanx.
fajpunk marked this conversation as resolved.
Show resolved Hide resolved

This entry should be structured like this:

.. code-block:: yaml
:caption: applications/sasquatch/charts/app-metrics/values.yaml

globalAppConfig:
# App name
some-app: # App name
# An array of events keys that will be tags (vs. fields) in InfluxDB
influxTags: [ "foo", "bar" ]
some-other-app:
influxTags: [ "foo", "bar", "baz" ]

This will:

* Provision a Kafka topic to which the app can publish events
* Provision a Kafka user with access to publish messages to that topic
* Update the ``sasquatch-telegraf-app-metrics`` `Telegraf`_ instance to:

* Start consuming from the ``lsst.square.metrics.events.<your app's name>`` Kafka topic.
* Push metrics to InfluxDB with the all of the ``influxTags`` keys in all events ending up as `tags`_, and all other keys ending up as `fields`_.

Then in your app, you can :ref:`connect to kafka<directconnection>` and publish events manually, or if you have a `Safir`_ app, you can use the Safir metrics helpers to streamline this integration.

.. _Telegraf: https://www.influxdata.com/time-series-platform/telegraf/
.. _InfluxDB: https://docs.influxdata.com/enterprise_influxdb/v1/
.. _Kafka: https://strimzi.io/
fajpunk marked this conversation as resolved.
Show resolved Hide resolved
.. _tags: https://docs.influxdata.com/influxdb/v1/concepts/glossary/#tag
.. _fields: https://docs.influxdata.com/influxdb/v1/concepts/glossary/#field

InfluxDB tags vs. fields
========================

.. hint::

If the value is likely to be used in a "WHERE" clause in queries, and if it has fewer than 10,000 possible values, it should be a tag.

Any value in an event that is not in the ``influxTags`` list be be field in InfluxDB.
Tags are indexed, which means you can use them as filters efficiently in InfluxDB queries.

It can be difficult to decide what should be a tag and what should be a field, but here are some guidelines:

* If it's a value that will be aggregated and graphed over time, like the duration of a query, then it should be a field, because you'll never be filtering on it.
* If it's metadata like which app generated the event, then it should be a tag.

One thing to keep in mind is that tags shouldn't be "high-cardinality", meaning you shouldn't specify something with a lot of different values as a tag, becuase it could greatly increase the memory usage of the InfluxDB instance, and having a lot of high-cardinality tags could decrease query performance across the board.
fajpunk marked this conversation as resolved.
Show resolved Hide resolved

How many values for a key makes it high-cardinality?
There's not a lot of concrete advice on that, and it depends a lot on the composition of the entire dataset, and the cardinality of other tags, so let's say **10,000** for now. This means if you have a username on an event, it can be a tag.
2 changes: 2 additions & 0 deletions docs/user-guide/directconnection.rst
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
.. _directconnection:

############################
Connecting directly to Kafka
############################
Expand Down
1 change: 1 addition & 0 deletions docs/user-guide/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -39,3 +39,4 @@ User guide
Kafka REST Proxy <restproxy>
Connect directly to Kafka <directconnection>
Kafdrop <kafdrop>
Application Metrics <app-metrics>
fajpunk marked this conversation as resolved.
Show resolved Hide resolved