Skip to content

Commit

Permalink
Blog post for metadata rework
Browse files Browse the repository at this point in the history
  • Loading branch information
jbaublitz committed Aug 5, 2024
1 parent 7edfc0a commit fb8127c
Showing 1 changed file with 79 additions and 0 deletions.
79 changes: 79 additions & 0 deletions content/metadata-rework.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
+++
title = "On-disk metadata format and storage stack modifications"
date = 2024-07-25
weight = 39
template = "page.html"
render = true
+++

*John Baublitz, Stratis Team*

Until now, Stratis has had a metadata format that, with a few exceptions, works
remarkably well for our current feature set. However, as we plan for more features
like RAID, integrity, and online reencryption, we've decided that we require some
reworks to the order of our devicemapper devices in the storage stack and the addition
of space reservation ahead of time for the metadata of new devicemapper layers. The
metadata rework has been a large effort to pave the way for Stratis moving forward and
will provide the ability for additional flexibility in enabling and disabling certain layers
after pool creation time once the features are added.

### Support for V1

V1 will continued to be supported as-is. Features that no longer work on V1 of the metadata
will be considered bugs, and we intend to continue releasing bug fixes for functionality that
breaks, even if it is only in V1. However, creating new V1 pools is disabled, and users will
only be able to create V2 pools in the next version of stratisd.

### Changes to encryption

The modifications to how we handle encryption are the single largest difference
between V1 and V2 of the metadata. The crypt device now sits above the Stratis metadata
and where RAID and integrity will be in the stack.

Let's dive into the practical applications of this.

#### Switch from full disk encryption

V1 of the metadata provided full disk encryption, but V2 will not. Data, cache, and
thinpool/filesystem metadata will all still be encrypted, but integrity metadata, RAID
metadata, and the Stratis metadata, which is generally just a record of the devicemapper
tables for the Stratis pool, will not be encrypted.

We discussed this at length with the cryptsetup maintainers and the security team and the
general consensus is that leaving this data unencrypted will not result in leaking any
information about the encrypted data stored above it in the stack.

#### Higher position in the stack

The switch of encryption's position in the stack also has a surprising number of benefits.

Firstly, the operation time on encrypted pools no longer scales linearly according to the
number of devices. Because Stratis can add additional devices to a single pool, FDE required
encrypting each new device with its own crypt device which resulted in having to pass
through PBKDF once for each device. This increased execution times for operations like unlock
based on how many devices were in the pool. For V2 of the metadata, operations that require PBKDF
execute in constant time.

Secondly, the encryption layer will be on top of the caching layer. This simplifies the cache handling
and requires no special handling for encrypting cached data.

Thirdly, this will avoid the problem we would have run into with RAID of encryption amplification.
Because previously each leg would have its own crypt device, the CPU load of encryption would
have been multiplied by the number of legs in the RAID array. This would result in reduced throughput
due to too much CPU load encrypting each leg separately.

### Addition of metadata space reservation

In V2 of the metadata layout, we've begun reserving space for the crypt header, integrity metadata and md-raid
metadata. The end result of this is that in future versions, we will increasingly be able to toggle certain features
on and off after pool creation time. Work has already begun on supporting this for encryption, and we intend to
do this at least for integrity as well. We are still discussing the best way to support this for RAID given
certain restrictions in the case of converting from a multi-disk pool to a RAID array.

### Backports

Moving forward, we intend to port all features that are compatible with V1 of the metadata layout back to V1, but
for features that cannot be backported like turning encryption on and off due to the differences outlined here,
we recommend that users migrate data to a V2 pool to take advantage of these upcoming features.

Please let us know what you think of the new metadata layout in a Github issue!

0 comments on commit fb8127c

Please sign in to comment.