[DM-43658] Improve procedure for Kafka version upgrades to avoid outages #34

Merged 3 commits on Aug 14, 2024
45 changes: 23 additions & 22 deletions docs/developer-guide/broker-migration.rst
.. _broker-migration:

######################
Kafka broker migration
######################

From time to time, you may need to expand the size of your Kafka storage because your brokers need to handle more data, or you may need to migrate your Kafka brokers to storage backed by a new storage class.

In Strimzi, each ``KafkaNodePool`` has its own storage configuration.
The first step in the broker migration process is to create a new ``KafkaNodePool`` with the updated storage configuration.
Once you’ve done this, you can use the Cruise Control tool along with the Strimzi ``KafkaRebalance`` resource to transfer data from the old brokers to the new ones.

This guide documents the procedure for migrating Kafka brokers that were originally deployed using the cluster's default storage class to a new storage class.
This procedure is adapted from the `Kafka Node Pools Storage & Scheduling`_ Strimzi blog post.

Before you begin the broker migration, ensure that Cruise Control is enabled in your Sasquatch Phalanx environment.
Check your ``sasquatch/values-<environment>.yaml`` file for the following:

.. code:: yaml

   strimzi-kafka:
     cruiseControl:
       enabled: true

To migrate your Kafka brokers to a new storage class, you need to specify the storage class name and the size, then set ``brokerStorage.migration.enabled: true`` to initiate the migration.

.. code:: yaml

   brokerStorage:
     storageClassName: zfs--rubin-efd
     size: 1.5Ti
     enabled: false
     migration:
       enabled: true
       rebalance: false


This configuration creates a new ``KafkaNodePool`` resource for the brokers using the new storage class.
Sync the new ``KafkaNodePool`` resource in Argo CD.
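
After the sync, you can verify that the new node pool exists and that its broker pods are running before moving any data (a sketch; the ``sasquatch`` namespace and cluster name are assumptions based on this environment):

.. code:: bash

   # List the node pools managed by Strimzi
   kubectl get kafkanodepools.kafka.strimzi.io -n sasquatch

   # Confirm that the new broker pods are Running
   kubectl get pods -n sasquatch -l strimzi.io/cluster=sasquatch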

At this point, your data will still reside on the old brokers, and the new ones will be empty.
To move the data, use Cruise Control by setting ``brokerStorage.migration.rebalance: true`` and specifying the IDs of the old brokers, the ones you plan to remove after the migration.

.. code:: yaml

   brokerStorage:
     storageClassName: zfs--rubin-efd
     size: 1.5Ti
     enabled: false
     migration:
       enabled: true
       rebalance: true
       brokers:
         - 4
         - 5

This action will create a new ``KafkaRebalance`` resource, which you’ll need to sync in Argo CD.

Next, wait for Cruise Control to execute the cluster rebalance.
You can check the state of the rebalance by inspecting the ``KafkaRebalance`` resource:

.. code:: bash

   $ kubectl get kafkarebalances.kafka.strimzi.io -n sasquatch
   NAME               CLUSTER     PENDINGPROPOSAL   PROPOSALREADY   REBALANCING   READY   NOTREADY   STOPPED
   broker-migration   sasquatch                                                   True

Finally, once the rebalance state is ``Ready``, set ``brokerStorage.enabled: true``, ``brokerStorage.migration.enabled: false``, and ``brokerStorage.migration.rebalance: false``.
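
The resulting configuration keeps the same storage values and only flips the flags:

.. code:: yaml

   brokerStorage:
     storageClassName: zfs--rubin-efd
     size: 1.5Ti
     enabled: true
     migration:
       enabled: false
       rebalance: false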

Note that the PVCs of the old brokers need to be deleted manually, as they are orphan resources in Sasquatch.
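
A sketch of that cleanup, assuming Strimzi's default PVC naming (``data-<cluster>-kafka-<id>``) and that brokers 4 and 5 were the ones migrated away; verify the actual names in your environment first:

.. code:: bash

   # List the PVCs created by Strimzi for this cluster
   kubectl get pvc -n sasquatch -l strimzi.io/cluster=sasquatch

   # Delete the PVCs that belonged to the removed brokers (hypothetical names)
   kubectl delete pvc data-sasquatch-kafka-4 data-sasquatch-kafka-5 -n sasquatch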

Also, keep in mind that Strimzi will assign new broker IDs to the newly created brokers.
Ensure that you update the broker IDs wherever they are used, such as in the Kafka external listener configuration.
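
For example, if the external listener pins per-broker settings, each entry must reference the new IDs (a hypothetical sketch; the IDs and hostnames are assumptions, not values from this environment):

.. code:: yaml

   listeners:
     - name: external
       port: 9094
       type: loadbalancer
       tls: true
       configuration:
         brokers:
           - broker: 6
             advertisedHost: sasquatch-kafka-6.example.com
           - broker: 7
             advertisedHost: sasquatch-kafka-7.example.com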


.. _Kafka Node Pools Storage & Scheduling: https://strimzi.io/blog/2023/08/28/kafka-node-pools-storage-and-scheduling/
1 change: 1 addition & 0 deletions docs/developer-guide/index.rst
.. toctree::
:caption: Procedures

strimzi-updates
broker-migration
connectors

45 changes: 45 additions & 0 deletions docs/developer-guide/strimzi-updates.rst
.. _strimzi-updates:


################
Strimzi upgrades
################

It is recommended that you perform incremental upgrades of the Strimzi operator as soon as new versions become available.
In Phalanx, Dependabot will detect a new version of Strimzi.
Once you merge the Dependabot PR into the ``main`` branch, you can sync the Strimzi app in Argo CD.

This operation will upgrade the operator to the latest version and will trigger a Kafka rollout in the namespaces watched by Strimzi.
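
Once the sync completes, you can confirm the running operator version from its deployment image (a sketch; the deployment name follows Strimzi's default, and the ``strimzi`` namespace is an assumption):

.. code:: bash

   kubectl get deployment strimzi-cluster-operator -n strimzi \
     -o jsonpath='{.spec.template.spec.containers[0].image}'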

.. note::

   Before upgrading Strimzi, ensure that the `latest version of the operator`_ is compatible with the Kubernetes version running in your cluster.

If the currently deployed Kafka version is not supported by the latest operator, the operator will fail to initiate a Kafka rollout and will display an error.
See :ref:`kafka-upgrades` for instructions on upgrading Kafka.
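
If that happens, the failure is visible in the ``Kafka`` resource status (a sketch; the cluster is named ``sasquatch`` here, matching this environment):

.. code:: bash

   # Inspect the status conditions reported by the operator
   kubectl get kafka sasquatch -n sasquatch -o jsonpath='{.status.conditions}'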

.. _kafka-upgrades:

Kafka upgrades
==============

Each Strimzi release supports a range of Kafka versions.
It is recommended that you always use the latest version of Kafka that is supported by the operator.

Sasquatch deploys Kafka in KRaft mode.

Upgrading the Kafka brokers and client applications in Sasquatch (Kafka Connect and Mirror Maker 2) involves updating the Kafka ``version`` in ``sasquatch/charts/strimzi-kafka/values.yaml``.

Note that you do not explicitly set the Kafka ``metadataVersion`` in Sasquatch; instead, Strimzi automatically updates it to the current default after you update the Kafka version.
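
A minimal sketch of the relevant fragment of ``sasquatch/charts/strimzi-kafka/values.yaml`` (the surrounding structure and the version number are illustrative assumptions):

.. code:: yaml

   kafka:
     # Use the latest Kafka version supported by the installed operator.
     version: 3.8.0
     # metadataVersion is deliberately omitted; Strimzi updates it to the
     # new default automatically after the version change.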

.. note::

   When upgrading Kafka from an unsupported version to a supported version, an outage will occur during the upgrade of the second broker.
   This happens because, while the first broker will be running the new version supported by the operator, the third broker will still be on an unsupported version.
   Since Sasquatch requires a minimum of two in-sync replicas for each Kafka topic, this mismatch causes the outage.
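
During a rollout, you can watch which version each broker pod is running by inspecting its image (a sketch; the ``strimzi.io/cluster`` label selector and cluster name are assumptions based on Strimzi's defaults):

.. code:: bash

   kubectl get pods -n sasquatch -l strimzi.io/cluster=sasquatch \
     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'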

Refer to the `Strimzi documentation`_ for more details.

.. _latest version of the operator: https://strimzi.io/downloads/

.. _Strimzi documentation: https://strimzi.io/docs/operators/in-development/deploying#proc-upgrade-kafka-kraft-str