Merge pull request #32 from lsst-sqre/tickets/DM-45604
[DM-45604] Add procedures and troubleshooting section into Sasquatch documentation
Showing 11 changed files with 266 additions and 179 deletions.
@@ -0,0 +1,52 @@

.. _architecture:

#####################
Architecture Overview
#####################

.. figure:: /_static/sasquatch_architecture_single.png
   :name: Sasquatch architecture overview

Kafka
-----

In Sasquatch, `Kafka`_ is used as a message queue to InfluxDB and for data replication between Sasquatch :ref:`environments`.

Kafka is managed by `Strimzi`_.
In addition to the Strimzi components, Sasquatch uses the Confluent Schema Registry and the Confluent Kafka REST proxy to connect HTTP-based clients with Kafka.

.. _Kafka: https://kafka.apache.org
.. _Strimzi: https://strimzi.io
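
For example, an HTTP-based client can publish a record to a Kafka topic through the REST proxy v2 API.
This is only a minimal sketch; the service URL and topic name below are hypothetical:

.. code:: bash

   # Minimal sketch: produce one JSON record to a Kafka topic through the
   # Confluent REST proxy (v2 API). The URL and topic name are hypothetical.
   curl -X POST \
     -H "Content-Type: application/vnd.kafka.json.v2+json" \
     --data '{"records": [{"value": {"example": 42}}]}' \
     http://sasquatch-rest-proxy:8082/topics/lsst.example.skyFluxMetric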

Kafka Connect
-------------

In Sasquatch, Kafka connectors are managed by the `kafka-connect-manager`_ tool.

The InfluxDB Sink connector consumes Kafka topics, converts the records to the InfluxDB line protocol, and writes them to an InfluxDB database.
Sasquatch :ref:`namespaces` map to InfluxDB databases.
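
For illustration, a single hypothetical point in line protocol can be written directly through the InfluxDB v1 HTTP API.
The hostname, database, and values below are made up; in Sasquatch the connector performs this step for you:

.. code:: bash

   # Minimal sketch of the InfluxDB line protocol, written through the
   # InfluxDB v1 HTTP API. Hostname, database, and values are hypothetical.
   curl -X POST "http://sasquatch-influxdb:8086/write?db=lsst.example&precision=ms" \
     --data-binary 'lsst.example.skyFluxMetric,band=y,instrument=LSSTCam-imSim meanSky=-213.75,stdevSky=2328.9 1681248783000'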

The MirrorMaker 2 source connector is used for data replication.

InfluxDB Enterprise
-------------------

InfluxDB is a `time series database`_ optimized for efficient storage and analysis of time series data.

InfluxDB organizes data into measurements, fields, and tags.
In Sasquatch, Kafka topics (telemetry topics and metrics) map to InfluxDB measurements.

InfluxDB provides an SQL-like query language called `InfluxQL`_ and a more powerful data scripting language called `Flux`_.
Both languages can be used in Chronograf for data exploration and visualization.
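
For example, a simple InfluxQL query can be run with the InfluxDB v1 ``influx`` command-line client (the database and measurement names below are only illustrative):

.. code:: bash

   # Minimal sketch: query the last hour of a measurement with the InfluxDB v1 CLI.
   # Database and measurement names are illustrative.
   influx -database 'lsst.example' \
     -execute 'SELECT "meanSky" FROM "lsst.example.skyFluxMetric" WHERE time > now() - 1h'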

Read more about the Sasquatch architecture in `SQR-068`_.

.. _kafka-connect-manager: https://kafka-connect-manager.lsst.io/
.. _time series database: https://www.influxdata.com/time-series-database/
.. _InfluxQL: https://docs.influxdata.com/influxdb/v1.8/query_language/
.. _Flux: https://docs.influxdata.com/influxdb/v1.8/flux/
.. _SQR-068: https://sqr-068.lsst.io
@@ -0,0 +1,76 @@

.. _broker-migration:

#######################################
Kafka broker migration to local storage
#######################################

From time to time, you might need to expand the Kafka storage because your brokers have to handle more data, or you might need to migrate the Kafka brokers to storage that uses a different storage class.

In Strimzi, each ``KafkaNodePool`` has its own storage configuration.
The first step of the broker migration is creating a new ``KafkaNodePool`` with the new storage configuration.
Once that's done, you can use the Cruise Control tool and the Strimzi ``KafkaRebalance`` resource to move the data from the old brokers to the new ones.

The procedure outlined in the `Kafka Node Pools Storage & Scheduling`_ post is adapted here to migrate the Kafka brokers, originally deployed on the cluster's default storage (usually network-attached storage), to local storage.

First, make sure Cruise Control is enabled in your Sasquatch Phalanx environment.
Look in ``sasquatch/values-<environment>.yaml`` for:

.. code:: yaml

   strimzi-kafka:
     cruiseControl:
       enabled: true

Then, specify the storage class for local storage and its size, and set ``migration.enabled: true`` to start the migration:

.. code:: yaml

   localStorage:
     storageClassName: zfs--rubin-efd
     size: 1.5Ti
     enabled: false
   migration:
     enabled: true
     rebalance: false

This will create a new ``KafkaNodePool`` resource for the brokers on local storage.
Sync the new ``KafkaNodePool`` resource in Argo CD.

At this point, the data is still on the old brokers and the new ones are empty.
Now use Cruise Control to move the data by setting ``migration.rebalance: true`` and specifying the IDs of the old brokers, the ones to be removed after the migration:

.. code:: yaml

   localStorage:
     storageClassName: zfs--rubin-efd
     size: 1.5Ti
     enabled: false
   migration:
     enabled: true
     rebalance: true
     brokers:
       - 3
       - 4
       - 5

This will create a new ``KafkaRebalance`` resource that needs to be synced in Argo CD.

Now we have to wait until Cruise Control executes the cluster rebalance.
You can check the state of the rebalance by looking at the ``KafkaRebalance`` resource:

.. code:: bash

   $ kubectl get kafkarebalances.kafka.strimzi.io -n sasquatch
   NAME               CLUSTER     PENDINGPROPOSAL   PROPOSALREADY   REBALANCING   READY   NOTREADY   STOPPED
   broker-migration   sasquatch                                                   True
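
As a convenience, you can also wait for the rebalance to finish with ``kubectl wait``.
This is a hedged sketch, assuming the ``KafkaRebalance`` resource is named ``broker-migration`` as in the output above:

.. code:: bash

   # Block until the KafkaRebalance resource reports the Ready condition
   # (resource name and timeout are illustrative).
   kubectl wait kafkarebalances.kafka.strimzi.io/broker-migration \
     --for=condition=Ready --timeout=60m -n sasquatch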

Finally, once the rebalance state is ready, set ``localStorage.enabled: true``, ``migration.enabled: false``, and ``migration.rebalance: false``.

Note that the PVCs of the old brokers need to be deleted manually, as they are kept as orphan resources in Sasquatch to prevent cascade deletion.
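
A sketch of that cleanup with ``kubectl`` follows; the PVC names below are hypothetical and depend on the cluster and broker IDs, so check with ``kubectl get pvc`` first:

.. code:: bash

   # List the PVCs in the sasquatch namespace and identify the ones that
   # belong to the old brokers (the PVC names below are hypothetical).
   kubectl get pvc -n sasquatch
   kubectl delete pvc data-sasquatch-kafka-3 data-sasquatch-kafka-4 data-sasquatch-kafka-5 -n sasquatch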

Also note that Strimzi will assign new broker IDs to the newly created brokers.
Make sure to update the broker IDs wherever they are used, for example in the Kafka external listener configuration.
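
One way to confirm the IDs assigned by Strimzi is to list the broker pods, whose names end with the node IDs (this assumes the standard ``strimzi.io/cluster`` label and the ``sasquatch`` cluster name used above):

.. code:: bash

   # The broker pod names end with the node IDs assigned by Strimzi,
   # e.g. sasquatch-<pool-name>-6, sasquatch-<pool-name>-7, ...
   kubectl get pods -n sasquatch -l strimzi.io/cluster=sasquatch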

.. _Kafka Node Pools Storage & Scheduling: https://strimzi.io/blog/2023/08/28/kafka-node-pools-storage-and-scheduling/
@@ -0,0 +1,84 @@

.. _connectors:

######################################
Configuring an InfluxDB Sink connector
######################################

An InfluxDB Sink connector consumes data from Kafka and writes it to InfluxDB.
Sasquatch uses the Telegraf `Kafka consumer input`_ and `InfluxDB v1 output`_ plugins for that.

The connector configuration is specified per Sasquatch environment in ``sasquatch/values-<environment>.yaml``.

Here's what the connector configuration for writing data from the ``lsst.example.skyFluxMetric`` Kafka topic to InfluxDB looks like:

.. code:: yaml

   telegraf-kafka-consumer:
     enabled: true
     kafkaConsumers:
       example:
         enabled: true
         topicRegexps: |
           [ "lsst.example" ]
         database: "lsst.example"
         timestamp_field: "timestamp"
         timestamp_format: "unix_ms"
         tags: |
           [ "band", "instrument" ]
         replicaCount: 1

Selecting Kafka topics
======================

``kafkaConsumers.example.topicRegexps`` is a list of regular expressions used to specify the Kafka topics consumed by this connector, and ``kafkaConsumers.example.database`` is the name of the InfluxDB v1 database to write to.
In this example, all Kafka topics prefixed by ``lsst.example`` are recorded in the ``lsst.example`` database in InfluxDB.

.. note::

   If the database doesn't exist in InfluxDB, it is automatically created by Telegraf.
   Telegraf also records internal metrics from its input and output plugins in the same database.

Timestamp
=========

InfluxDB, being a time series database, requires a timestamp to index the data.
The name of the field that contains the timestamp value and the timestamp format are specified by the ``kafkaConsumers.example.timestamp_field`` and ``kafkaConsumers.example.timestamp_format`` keys.

Tags
====

InfluxDB tags provide additional context when querying data.

From the ``lsst.example.skyFluxMetric`` metric example:

.. code:: json

   {
     "timestamp": 1681248783000000,
     "band": "y",
     "instrument": "LSSTCam-imSim",
     "meanSky": -213.75839364883444,
     "stdevSky": 2328.906118708811
   }

``band`` and ``instrument`` are good candidates for tags, while ``meanSky`` and ``stdevSky`` are the fields associated with the ``lsst.example.skyFluxMetric`` metric.
Tags are specified in the ``kafkaConsumers.example.tags`` list, which is the superset of the tags from all the Kafka topics consumed by this connector.

In InfluxDB, tags are indexed, so you can use them to efficiently aggregate and filter data in different ways.
For example, you might query the ``lsst.example.skyFluxMetric`` metric and group the results by ``band``, or you might filter the data to only return values for a specific band or instrument.
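
As a sketch, such a query could look like this with the InfluxDB v1 CLI (the aggregation and time range are only illustrative):

.. code:: bash

   # Minimal sketch: aggregate meanSky per band over the last day,
   # filtering on one instrument. Aggregation and time range are illustrative.
   influx -database 'lsst.example' -execute \
     "SELECT mean(\"meanSky\") FROM \"lsst.example.skyFluxMetric\" WHERE \"instrument\" = 'LSSTCam-imSim' AND time > now() - 1d GROUP BY \"band\""
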
.. note::

   In InfluxDB, tag values are always strings.
   Use an empty string when a tag value is missing.
   Avoid tagging high-cardinality fields such as IDs.

See `InfluxDB schema design and data layout`_ for more insights on how to design tags.

See the `telegraf-kafka-consumer subchart`_ for additional configuration options.

.. _InfluxDB v1 output: https://github.com/influxdata/telegraf/blob/master/plugins/outputs/influxdb/README.md
.. _Kafka consumer input: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/kafka_consumer/README.md
.. _InfluxDB schema design and data layout: https://docs.influxdata.com/influxdb/v1/concepts/schema_and_data_layout
.. _telegraf-kafka-consumer subchart: https://github.com/lsst-sqre/phalanx/tree/main/applications/sasquatch/charts/telegraf-kafka-consumer/README.md
@@ -0,0 +1,19 @@

.. _schema-registry-ssl:

######################################################################
Schema Registry Pod cannot start because of an invalid SSL certificate
######################################################################

**Symptoms:**
The Sasquatch Schema Registry pod cannot start and ends up in the ``CrashLoopBackOff`` state.
The Kafka brokers show an ``org.apache.kafka.common.errors.SslAuthenticationException``.

**Cause:**
The Schema Registry Operator cannot recreate its JKS secret when Strimzi rotates the cluster certificates.

**Solution:**
Use this procedure in Argo CD to force the Schema Registry Operator to recreate the JKS secret:

- Delete the ``strimzischemaregistry`` resource called ``sasquatch-schema-registry``.
- Restart the deployment resource called ``strimzi-registry-operator``.
- Re-sync the ``strimzischemaregistry`` resource called ``sasquatch-schema-registry``.
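
If you prefer the command line over the Argo CD UI, an equivalent sketch with ``kubectl`` follows (assuming the ``sasquatch`` namespace; the final re-sync still happens in Argo CD):

.. code:: bash

   # Sketch of the same procedure with kubectl, assuming the sasquatch namespace.
   kubectl delete strimzischemaregistry sasquatch-schema-registry -n sasquatch
   kubectl rollout restart deployment strimzi-registry-operator -n sasquatch
   # Re-sync the sasquatch application in Argo CD so the
   # strimzischemaregistry resource is recreated.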