Propose WAL format versioning and change strategy. #40

Open
wants to merge 3 commits into
base: main

bwplotka
Member

@bwplotka bwplotka commented Nov 26, 2024

Fixes prometheus/prometheus#15200

Also join #prometheus-wal-dev on Slack for the sync discussion!

We recommend the [Two-Fold Migration Strategy](#two-fold-migration-strategy) with one variation: a configuration flag that tells Prometheus to write a Y+1 version if the user desires.


In other words, we propose to add a TSDB **integer** flag `--storage.tsdb.write-wal-format` that tells Prometheus to use a particular WAL format for both WAL and WBL. This kind of flag will change its default to a new version ONLY when the previous Prometheus version can read that version.
Member

I think we should be more specific about the version here. Prometheus defines LTS versions, so this should read

Suggested change
In other words, we propose to add a TSDB **integer** flag `--storage.tsdb.write-wal-format` that tells Prometheus to use a particular WAL format for both WAL and WBL. This kind of flag will change its default to a new version ONLY when the previous Prometheus version can read that version.
In other words, we propose to add a TSDB **integer** flag `--storage.tsdb.write-wal-format` that tells Prometheus to use a particular WAL format for both WAL and WBL. This kind of flag will change its default to a new version ONLY when the previous LTS Prometheus version can read that version.

This would mean a two-year lifetime for a WAL version (though of course it may mean a jump of more than one version). So let's suppose we implement this flag and the default is 1 in version 3.1 LTS (or something). To set the default to 2 (or more) we need an LTS release a year later that can read version 2 but defaults to 1. Another year later we could set the default to 2.

WDYT?

Member

oh, I later read the alternatives - I think it makes sense to instill a sense of stability

Member Author

@bwplotka bwplotka Nov 29, 2024

Ah, sorry. Copy-pasta. I really wanted to NOT do LTS for now, but initially I thought it was a good idea. This should be in alternatives. WDYT? EDIT: Actually no, you just proposed an idea that's discussed in the alternatives.

> This would mean a two-year lifetime for a WAL version (though of course it may mean a jump of more than one version). So let's suppose we implement this flag and the default is 1 in version 3.1 LTS (or something). To set the default to 2 (or more) we need an LTS release a year later that can read version 2 but defaults to 1. Another year later we could set the default to 2.

I'm not sure it makes sense to bind the default switch to a yearly LTS (is it once a year?) or to any fixed time window. If we have a new WAL version that is extensively tested and better, is it sensible to wait a year to switch defaults? That's a fairly heavy process. However, we can consider it.

I added more details on why LTS is not an ideal option in the alternatives, can you check?

Member

We could backport the ability to read the new WAL to the current LTS.

bwplotka and others added 2 commits December 2, 2024 10:31

We propose the addition of a `meta.json` file in the wal directory, similar to [block meta.json](https://github.com/prometheus/prometheus/blob/5e124cf4f2b9467e4ae1c679840005e727efd599/tsdb/block.go#L171), with the Version field set to `1` for the current format and `2` for new changes, e.g. when we start to write [new records](https://github.com/prometheus/prometheus/pull/15467/files). A missing `meta.json` is equivalent to a `meta.json` file containing `{"version":1}`.

We propose to also store the new WALs in separate directories, e.g. `wal.v2`. Thanks to that, the rewrite from one version to another is **eventual** and can be done segment by segment without any forcible rewrite. The WAL will be rewritten within the next 2h of normal operation. An additional advantage of this way of versioning is that it's clear when your WAL has fully migrated to a certain version.
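
As an illustration of the `meta.json` rule above, here is a minimal Go sketch of how a reader might resolve the WAL version. The names `walMeta`, `parseWALMeta` and `readWALVersion` are hypothetical and do not exist in Prometheus; the JSON key `version` is taken from the `{"version":1}` example, and treating an empty file like a missing one is an additional assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// walMeta is a hypothetical shape for the proposed WAL meta.json.
// Only the version field is described in the proposal.
type walMeta struct {
	Version int `json:"version"`
}

// parseWALMeta decodes the contents of meta.json.
// Empty input stands for a missing file and maps to version 1.
func parseWALMeta(data []byte) (int, error) {
	if len(data) == 0 {
		return 1, nil
	}
	var m walMeta
	if err := json.Unmarshal(data, &m); err != nil {
		return 0, fmt.Errorf("parse WAL meta.json: %w", err)
	}
	if m.Version < 1 {
		return 0, fmt.Errorf("invalid WAL version %d", m.Version)
	}
	return m.Version, nil
}

// readWALVersion reads <dir>/meta.json and tolerates its absence.
func readWALVersion(dir string) (int, error) {
	data, err := os.ReadFile(filepath.Join(dir, "meta.json"))
	if err != nil && !errors.Is(err, fs.ErrNotExist) {
		return 0, err
	}
	return parseWALMeta(data)
}

func main() {
	// A directory without meta.json is treated as the current (v1) format.
	v, err := readWALVersion(filepath.Join(os.TempDir(), "no-such-wal-dir"))
	fmt.Println(v, err)
}
```

Keeping the parsing separate from the file access makes the "missing file means version 1" default easy to test without touching the disk.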
Member

Should mention checkpoints here.

* We already suffer from replay problems, so I propose an eventual rewrite.
* A rewrite is riskier than a read-only WAL (of the previous version).

### Don't version WAL, don't introduce a flag
Member

Suggested change
### Don't version WAL, don't introduce a flag
### Alternative: Don't version WAL, don't introduce a flag

I suggest putting "Alternative" in each of these sub-headings, because it isn't clear once you've scrolled down a bit.

Member Author

E_TOO_MANY_ALTERNATIVES


## Alternatives

### Require LTS support
Member

This is more of an extension than an alternative.

### Record-based or segment-based WAL versioning

Given we usually change the WAL by changing its records, the WAL version could simply be the [max number of types](https://github.com/prometheus/prometheus/blob/5e124cf4f2b9467e4ae1c679840005e727efd599/tsdb/record/record.go#L54) we write to the WAL.
Alternatively, it could be per segment e.g. introduce a special version record type that is only in the front of the segment file.
Member

Mention checkpoints here.

* changes that merge records
* sharding?

### Maintain two WALs (well four, with WBL)
Member

why do we have separate WAL and WBL ?

We recommend the [Two-Fold Migration Strategy](#two-fold-migration-strategy) with two details:

* A new flag that tells Prometheus what WAL version to write.
* There can be multiple "forward compatible" versions, but the official minimum is one (see the rejected [LTS idea](#require-lts-support))
Member

Who rejected it?

Member Author

I propose to reject it in this proposal. If we decide the opposite, I will switch. Sorry for the confusion; I assume I can't use the past tense here?

Cons:
* Extremely heavy process that will make us afraid/refuse to make improvements to WAL, because it's too much work. It might fails our goal of `Balancing development velocity with user data stability risks`
* One mitigation would be an LTS retroactive strategy, e.g. LTS 3.1 only supports WALv1, 3.3 adds WALv2, we do 3.1.1 with WALv2 too, 3.4 can now switch to WALv2, 4.0 can remove WALv1 support. It gives us more flexibility, but it's not very realistic to do a patch release of an LTS with a risky feature like a new WAL.
* We literally have no formal process for LTS versions and we don't do them regularly.
Member

Suggested change
* We literally have no formal process for LTS versions and we don't do them regularly.
* We have no formal process for LTS versions and we don't do them regularly.

Comment on lines +160 to +164
1. LTS 3.1 only supports WAL v1.
2. 3.3 adds WAL v2.
3. We wait until the next LTS, e.g. 3.24.
4. 3.25 can now switch to WAL v2.
5. 4.0 can remove WAL v1 support.
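
The release sequence in this list can be sanity-checked with a small Go sketch that asks whether a WAL written by one release is readable by the release you roll back to. The `release` type, the `timeline` table and `canRollBack` are hypothetical names for illustration only. The read/write sets are one reading of the list (for example, that 3.24 still writes v1 by default), and the check ignores WALs that contain a mix of old and new segments.

```go
package main

import "fmt"

// release describes which WAL versions a release can read
// and which version it writes by default (hypothetical model).
type release struct {
	name   string
	reads  []int
	writes int
}

// timeline mirrors the example list above; the exact read/write
// sets are an interpretation of that list, not confirmed behavior.
var timeline = []release{
	{"3.1 (LTS)", []int{1}, 1},
	{"3.3", []int{1, 2}, 1},
	{"3.24 (LTS)", []int{1, 2}, 1},
	{"3.25", []int{1, 2}, 2},
	{"4.0", []int{2}, 2},
}

// canRollBack reports whether a WAL written by `from`
// can be read after downgrading to `to`.
func canRollBack(from, to release) bool {
	for _, v := range to.reads {
		if v == from.writes {
			return true
		}
	}
	return false
}

func main() {
	// Check a rollback from each release to its direct predecessor.
	for i := 1; i < len(timeline); i++ {
		from, to := timeline[i], timeline[i-1]
		fmt.Printf("%s -> %s: rollback safe = %v\n", from.name, to.name, canRollBack(from, to))
	}
}
```

Under these assumptions every single-step rollback is safe, while a rollback from 3.25 straight to LTS 3.1 is not, which is the gap the LTS requirement is meant to close.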
Member

I think you should write down what the upgrade path looks like if this approach is not chosen, in the main section of this document.

Member Author

It's written in the Two-Fold strategy section, but I can use the same versions here.


We propose to add a string `--storage.tsdb.stateful.write-wal-version` flag, defaulting to `v1`, that has a "stateful" consequence: once a new version is used, users will be able to revert only to certain Prometheus versions. The flag's help text will explain clearly what's possible and which Prometheus versions you will be able to revert to.

In other words, we propose to add a TSDB flag `--storage.tsdb.stateful.write-wal-version=<version>` that tells Prometheus to use a particular WAL format for both WAL and WBL. This kind of flag will change its default to a new version ONLY when (at least) one previous Prometheus version can read that version (while writing the old one). The initial version would be `v1`.
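
A minimal sketch of how the value of such a flag might be validated, assuming the `v<N>` string form described above. `parseWALVersionFlag` and `maxWritableWALVersion` are hypothetical names, not existing Prometheus code, and the sketch deliberately leaves out how the flag is registered with the command-line parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxWritableWALVersion is a hypothetical constant: the newest
// WAL format this binary knows how to write.
const maxWritableWALVersion = 2

// parseWALVersionFlag converts a flag value such as "v1" into its
// numeric version and rejects unknown or malformed values.
func parseWALVersionFlag(s string) (int, error) {
	if !strings.HasPrefix(s, "v") {
		return 0, fmt.Errorf("WAL version %q must look like v1, v2, ...", s)
	}
	// ParseUint rejects signs and non-digits.
	n, err := strconv.ParseUint(strings.TrimPrefix(s, "v"), 10, 16)
	if err != nil || n == 0 {
		return 0, fmt.Errorf("invalid WAL version %q", s)
	}
	if n > maxWritableWALVersion {
		return 0, fmt.Errorf("WAL version %q is not supported (max v%d)", s, maxWritableWALVersion)
	}
	return int(n), nil
}

func main() {
	for _, s := range []string{"v1", "v2", "v3", "2"} {
		v, err := parseWALVersionFlag(s)
		fmt.Println(s, v, err)
	}
}
```

Rejecting unknown versions at startup, rather than silently falling back to `v1`, keeps the "stateful" consequence explicit to the operator.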
Member

I don't think one version (6 weeks) is enough to say "v2 is good enough, turn it on for everyone".

Member Author

I think I would love to keep the gate open if the change will be trivial (like literally we have this case now with nhcb IMO, but we can argue) (:

Should we have some sort of distinction in the policy for when the flag's default value can be changed for cases where the WAL change only impacts an experimental feature (such as NHCB)?

![twofold.png](../assets/2024-11-25_changing_wal_format/twofold.png)

1. We release Prometheus X+1 version that supports both Y and Y+1 data but still writes Y.
2. We release Prometheus X+2 version that supports both Y and Y+1 data, but now it writes new data as Y.

> 2. We release Prometheus X+2 version that supports both Y and Y+1 data, but now it writes new data as Y.

For clarity: does this mean that the new record type (Y+1) will be re-named to Y?

* Gives a bit more stability to users and fewer surprises.

Cons:
* Extremely heavy process that will make us afraid/refuse to make improvements to WAL, because it's too much work. It might fails our goal of `Balancing development velocity with user data stability risks`

Suggested change
* Extremely heavy process that will make us afraid/refuse to make improvements to WAL, because it's too much work. It might fails our goal of `Balancing development velocity with user data stability risks`
* Extremely heavy process that will make us afraid/refuse to make improvements to WAL, because it's too much work. It might fail our goal of `Balancing development velocity with user data stability risks`

Development

Successfully merging this pull request may close these issues.

WAL/WBL: Make iterating on format schema easier; consider versioning & forward compatibility
4 participants