Merge pull request #47 from lsst-sqre/u/afausti/doc-fixes
Doc fixes
afausti authored Nov 21, 2024
2 parents 8127f6f + e4f1aef commit 23fdf30
Showing 1 changed file with 25 additions and 22 deletions.
47 changes: 25 additions & 22 deletions docs/developer-guide/managing-shards.rst
Managing Shards in an InfluxDB Cluster
######################################

The InfluxDB storage engine organizes data into shards, which are logical partitions of the database that contain a subset of the data.
Sasquatch uses the InfluxDB default configuration in which the shard duration is 7 days, starting at 00:00:00.000 UTC on Sunday and ending at 23:59:59.999 UTC on the following Saturday.

An InfluxDB cluster enables horizontal scaling and high availability by distributing shards across multiple data nodes, while the cluster metadata is managed by the meta nodes.

This guide outlines the use of the `influxd-ctl`_ and `influx_inspect`_ tools to manage shards in an InfluxDB cluster.

Listing Shards
==============

The ``influxd-ctl`` tool is available on the InfluxDB meta nodes.
To view all shards and their metadata, use the ``influxd-ctl show-shards`` command.

.. code-block:: bash

   kubectl -n sasquatch exec -it sasquatch-influxdb-enterprise-meta-0 -- influxd-ctl show-shards

This command displays information about all shards in the cluster, including the shard ID, Database, Retention Policy, Replication status, Start timestamp, End timestamp, and Owners.

For example, the following command lists the shard IDs for the EFD database and the corresponding Start and End timestamps:

.. code-block:: bash
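
   # A sketch (assumed filtering; adjust as needed): filter the show-shards output for the efd database
   kubectl -n sasquatch exec -it sasquatch-influxdb-enterprise-meta-0 -- influxd-ctl show-shards | grep efd
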
In Sasquatch deployments, the InfluxDB cluster has replication factor two, meaning that each shard is replicated on two data nodes.
The data nodes where each shard is stored are reported under the Owners column.

Detailed information about the shards, such as State (hot or cold), Last Modified Time, and Size, can be obtained by running the ``influxd-ctl show-shards -v`` command.

The `filesystem layout`_ in a data node is as follows:

.. code-block:: bash
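
   # Typical layout (a sketch): shard data lives under the database and retention policy directories
   /var/lib/influxdb/data/<database>/<retention policy>/<shard ID>
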
The ``influx_inspect verify`` command checks the integrity of the shard data.
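
A minimal ``influx_inspect verify`` invocation, assuming the default storage directory on a data node, might look like:

.. code-block:: bash

   influx_inspect verify -dir /var/lib/influxdb
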
Read more about shard compaction operations in the InfluxDB `storage engine`_ documentation.

The `Anti-Entropy service`_ in InfluxDB Enterprise ensures that the data is consistent across the cluster by comparing the data in the shards on different data nodes.
However, this tool consumes too many resources, and InfluxData recommended turning it off in Sasquatch.

Shard movement
==============
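
A shard can be moved between data nodes by copying it to the destination node and then removing it from the source node.
A minimal sketch, assuming hypothetical source and destination TCP addresses and shard 786:

.. code-block:: bash

   # Copy the shard to the destination data node
   influxd-ctl copy-shard <source TCP address>:8088 <destination TCP address>:8088 786
   # Remove the shard from the source data node
   influxd-ctl remove-shard <source TCP address>:8088 786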

Backup and restore
==================

The ``influxd-ctl`` tool provides commands to back up and restore shards.

A meta node doesn't have enough space to keep the backup files.
To perform backup and restore operations, download the ``influxd-ctl`` tool and bind it to a meta node:

Download the ``influxd-ctl`` tool from the InfluxData website:

.. code-block:: bash

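   # Download URL is an assumption; adjust to the actual InfluxData release and version
   wget https://dl.influxdata.com/enterprise/releases/influxdb-meta-1.11.3_c1.11.3-1.x86_64.rpm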
   rpm2cpio influxdb-meta-1.11.3_c1.11.3-1.x86_64.rpm | cpio -idmv

To back up a shard, use the ``influxd-ctl backup`` command:

.. code-block:: bash
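
   # A sketch mirroring the restore example below; exact flags and the backup directory are assumptions
   influxd-ctl -bind <meta pod IP address>:8091 backup -db efd -shard <shard ID> /backup-dir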

To restore a shard, use the ``influxd-ctl restore`` command:

.. code-block:: bash

   influxd-ctl -bind <meta pod IP address>:8091 restore -db efd -shard <shard ID> -newshard <new shard ID> -newrf 2 /backup-dir

Where ``<shard ID>`` identifies the shard to be restored from the backup and ``<new shard ID>`` identifies the shard in the destination database to restore to.
The ``-newrf 2`` option specifies the replication factor for the restored shard, ensuring that it is restored to two data nodes.

.. note::

   If you are restoring a shard from the same database, ``<new shard ID>`` is the same as ``<shard ID>``.

   If you are restoring a shard from a different database (e.g. restoring data from the Summit EFD database to the USDF EFD database), **shard IDs do not align**, and so ``<new shard ID>`` should reflect the shard ID in the destination database which has **the same start time** as in the source database.

.. note::

   Hot shards can be truncated using the ``influxd-ctl truncate-shards`` command before backup and restore operations.
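
   For example, following the ``kubectl exec`` pattern used above (a sketch; ``truncate-shards`` truncates all hot shards in the cluster):

   .. code-block:: bash

      kubectl -n sasquatch exec -it sasquatch-influxdb-enterprise-meta-0 -- influxd-ctl truncate-shards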

For cold shards, it is also possible to manually copy the shard TSM files to one of the destination data nodes under the appropriate directory, and then use the ``influxd-ctl copy-shard`` command to copy the shard to the other data node.

This procedure was applied to restore shard 786 in the USDF EFD database, after InfluxData ran an offline compaction of that shard to fix a slow query issue.
In this case, the shard restore is as follows:

.. code-block:: bash

   # List owners of shard 786
   kubectl exec -it sasquatch-influxdb-enterprise-meta-0 -n sasquatch -- influxd-ctl show-shards | grep 786
   # Manually remove the TSM and index files from shard 786 in data-0:
   kubectl exec -it sasquatch-influxdb-enterprise-data-0 -n sasquatch -- /bin/bash
   cd /var/lib/influxdb/data/efd/autogen/
   rm -r 786
   # Manually copy the fully compacted TSM and index files for shard 786 to data-0
   kubectl -n sasquatch cp efd/autogen/786/ sasquatch-influxdb-enterprise-data-0:/var/lib/influxdb/data/efd/autogen/
   # Remove shard 786 data and metadata from data-1 using the influxd-ctl remove-shard command
   kubectl exec -it sasquatch-influxdb-enterprise-meta-0 -n sasquatch -- influxd-ctl remove-shard sasquatch-influxdb-enterprise-data-1.sasquatch-influxdb-enterprise-data.sasquatch.svc.cluster.local:8088 786
   # Copy shard 786 from data-0 to data-1
   kubectl exec -it sasquatch-influxdb-enterprise-meta-0 -n sasquatch -- influxd-ctl copy-shard sasquatch-influxdb-enterprise-data-0.sasquatch-influxdb-enterprise-data.sasquatch.svc.cluster.local:8088 sasquatch-influxdb-enterprise-data-1.sasquatch-influxdb-enterprise-data.sasquatch.svc.cluster.local:8088 786
   # Finally, restart the InfluxDB data statefulset to reload the shard data and rebuild the TSM in-memory indexes.
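   # A sketch of the restart (assumed statefulset name, matching the data pod names):
   kubectl -n sasquatch rollout restart statefulset sasquatch-influxdb-enterprise-data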
 
.. note::

   Note the difference between removing the shard files manually and using the ``influxd-ctl remove-shard`` command.
   The ``remove-shard`` command removes the shard from the meta node and the data node, while manually removing the shard TSM and index files only removes the shard from the data node (the data node is still listed as an owner of that shard).


.. _influxd-ctl: https://docs.influxdata.com/enterprise_influxdb/v1/tools/influxd-ctl/
