docs: how to resync backups (#390)
alexgarel authored Sep 4, 2024
1 parent 9c75150 commit 6513fb2
Showing 2 changed files with 57 additions and 0 deletions.
43 changes: 43 additions & 0 deletions docs/how-to-resync-zfs-replication.md
@@ -0,0 +1,43 @@
# How to resync ZFS replication

It sometimes happens that syncoid fails to replicate a dataset for too long,
and the dataset becomes out of sync.

In this case, we normally get an error email thanks to the `sanoid_check` script.

## Checking state

On the backup host and on the source host,
use `zfs list -t snap /path/to/dataset|tail` to check the status.
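
For example, with hypothetical host and dataset names:

```bash
# On the backup host: the most recent snapshots that made it to the backup
zfs list -t snap backup-pool/subvol-101-disk-0 | tail

# On the source host: the most recent snapshots
zfs list -t snap rpool/data/subvol-101-disk-0 | tail
```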

Check whether the latest snapshot on the backup side is still available on the source host (run this on the source, with the source-side dataset name):

```bash
zfs list <dataset-name-on-source>@<snapshot-name-on-backup-side>
```

If it is not, you have to search for the nearest common snapshot. It will probably be a `_daily` or a `_monthly` one.
On the backup host, `zfs list -t snap /path/to/dataset|grep "_daily"|tail` can help you; then check the availability of each candidate on the source side (same for `_monthly`).
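
For example, still with hypothetical names, the search could look like this:

```bash
# On the backup host: the most recent _daily snapshots present on the backup
zfs list -t snap backup-pool/subvol-101-disk-0 | grep "_daily" | tail

# On the source host: test the candidates from newest to oldest until one exists
zfs list rpool/data/subvol-101-disk-0@autosnap_2024-09-01_00:00:04_daily
```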

When you have found the nearest common snapshot, you can resync.

## Resyncing

On the backup host, rewind to the common snapshot:
```bash
zfs rollback <dataset-name-on-backup>@<common-snapshot-name>
```
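
Note that `zfs rollback` only accepts the most recent snapshot as target; if newer snapshots exist on the backup side, you can pass `-r` to destroy them as part of the rollback. A sketch with hypothetical names:

```bash
# Destroys all backup-side snapshots newer than the target, then rolls back to it
zfs rollback -r backup-pool/subvol-101-disk-0@autosnap_2024-09-01_00:00:04_daily
```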

If there is an existing vzdump snapshot (check with `zfs list -t snap <dataset-name-on-backup>|grep @vzdump`): it is a snapshot used by Proxmox during a backup, but it might not be the same as the one on the source side (it is constantly recreated), and it will get in your way.
So you have to remove it on the backup side: `zfs destroy <dataset-name-on-backup>@vzdump`.
See also [Dealing with vzdump snapshots](./sanoid.md#dealing-with-vzdump-snapshots).

You can then either wait for the next sync to catch up, or launch the sync manually
using syncoid (in this case, you have to craft the command by looking at syncoid-args.conf; beware of not using the --recursive option, and use the dataset names on the source side and target side).
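
A minimal sketch of such a manual command, assuming hypothetical host and dataset names (mirror the options found in syncoid-args.conf, minus `--recursive`):

```bash
# Run on the backup host: pull the single dataset from the source.
# --no-sync-snap reuses existing snapshots instead of creating a new one.
syncoid --no-sync-snap root@source-host:rpool/data/subvol-101-disk-0 backup-pool/subvol-101-disk-0
```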

## Some other resolutions

The source dataset might have been removed (because a container / VM was removed).
In this case, decide whether you want to keep the data.
If not, you can destroy the dataset on the backup side.
Otherwise, consider moving the dataset to another location, to avoid messing with current / future datasets on the host.
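
Moving the dataset aside can be done with `zfs rename`, for example (hypothetical names):

```bash
# Create a parent for archived datasets, then move the orphaned dataset there
zfs create -p backup-pool/old
zfs rename backup-pool/subvol-101-disk-0 backup-pool/old/subvol-101-disk-0
```
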
14 changes: 14 additions & 0 deletions docs/sanoid.md
@@ -21,6 +21,18 @@ There are generally two kinds of templates:

We then have different retention strategies based on the type of data.

### Dealing with vzdump snapshots

vzdump snapshots are created by Proxmox during backups.
A vzdump snapshot always has the same name, but it is created and destroyed on each backup run.

This can get in the way of syncoid:
if the ZFS dataset is synchronized while a vzdump snapshot is present,
the next sync may fail, because the vzdump snapshot will by then be a different snapshot on the source, blocking the sync and requiring human intervention (see [How to resync ZFS replication](./how-to-resync-zfs-replication)).

To prevent this, we have a script (`sanoid_post_remove_vzdump.sh`) that removes vzdump snapshots on the destination (backup side) after sanoid runs. It is configured as a post_snapshot_script in the "synced" templates in sanoid.conf,
with `post_snapshot_script = /opt/openfoodfacts-infrastructure/scripts/zfs/sanoid_post_remove_vzdump.sh`.
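
For illustration, such a template could look like this in sanoid.conf (the template name and retention values here are made up; only the `post_snapshot_script` line is the part that matters):

```ini
[template_synced]
autoprune = yes
frequently = 0
hourly = 24
daily = 30
# remove vzdump snapshots on the backup side after each sanoid run
post_snapshot_script = /opt/openfoodfacts-infrastructure/scripts/zfs/sanoid_post_remove_vzdump.sh
```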

## sanoid checks

We have a timer/service sanoid_check that checks that we have recent snapshots for datasets.
@@ -36,6 +48,8 @@ For example:
# no_sanoid_checks:rpool/obf-old:rpool/opf-old:
```

In case of problems, see [How to resync ZFS replication](./how-to-resync-zfs-replication).


## syncoid service and configuration

