An Ansible role which will download, build and install SnapRAID on Debian 12 (other releases are untested).
This role supports defining multiple SnapRAID arrays, each with its own configuration file, and the syncing/scrubbing process can be automated via the included snapraid_sync script. cron is used to trigger the script on a configurable schedule, and it will send you notification emails when syncs are successful, when threshold levels for deleted/updated files are exceeded, or when something else goes wrong.
This repository makes use of a submodule, which is just a pointer to another repository, and it needs to be initialized and downloaded before this role will work. Fortunately it is possible to do this in a single command, so move into your roles/ folder and run the following:
git clone --recursive git@github.com:JonasAlfredsson/ansible-role-snapraid.git snapraid
If you would like to download any updates for this role in the future, you may use the following command from within the previously cloned folder:
git pull --recurse-submodules
When the configuration is complete you may then just include this role in your main playbook like this:
- hosts: all
  name: Install SnapRAID and push out configuration files
  roles:
    - snapraid
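If this play is saved in, for example, a snapraid.yml playbook (the playbook and inventory file names here are just placeholders), it is then run like any other playbook:

ansible-playbook -i inventory snapraid.yml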
Since the SnapRAID arrays are often unique to each individual host, I usually prefer to define these individual configurations in their respective host_vars/{{ ansible_hostname }} path. However, if you have multiple identical machines there should not be any problem with defining all of this in one of the group_vars/ files.
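As an illustration of the host_vars approach, a minimal file could look something like this (the host name my-nas and all values are of course just examples):

# host_vars/my-nas.yml
snapraid_version: "11.3"
snapraid_arrays:
  - name: "array1"
    # ...the rest of the array configuration described below.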
An important thing to remember is that Ansible will overwrite, and not merge, these kinds of hashes/dictionaries if there are two with the same name. You can therefore not have one part of this defined in group_vars/ and other parts in host_vars/. If you do not like this behavior you may look into setting hash_behaviour = merge, but be aware that this is not a very good solution. Instead you should probably look into the combine filter or the merge_vars action plugin.
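As a small sketch of the combine approach (all variable names here are made up for the example), shared defaults could be merged with host specific overrides before they are handed to the role:

# group_vars/all.yml
snapraid_common_settings:
  exclude_hidden_items: true
  autosave: 500

# host_vars/my-nas.yml
snapraid_host_settings:
  conf_dir: "/etc/snapraid"

# In a task, before the role is applied:
- name: Merge host specific settings on top of the shared defaults
  ansible.builtin.set_fact:
    my_merged_settings: "{{ snapraid_common_settings | combine(snapraid_host_settings) }}"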
There are two parts to this configuration: first SnapRAID itself and its arrays, and then the variables related to the snapraid_sync script. The second part is optional, but it will automate the syncing/scrubbing if configured.
The following examples have all the available variables included, and all the
default values written out. So any field which is not marked with # Required
may be left out of your configuration if you are fine with the defaults.
A list of available SnapRAID versions/tags may be found here. Example:
snapraid_version: "11.3"
snapraid_version: # Required
snapraid_tmp_dir: "/tmp"

snapraid_arrays:
  - name: # Required and may only be [a-zA-Z_-] (limit from cron file naming).
    conf_dir: # Required
    exclude_hidden_items: false
    exclude_items:
      - "*.unrecoverable"
      - "/tmp/"
      - "/lost+found/"
    blocksize: 256
    hashsize: 16
    autosave: 500
    parity_drives:
      - mount: # Required
        content_file: false
    data_drives:
      - mount: # Required
        name: # Required and must be unique (space not allowed).
        content_file: true
    snapraid_sync: [] # See next section for more details
Jump directly to the snapraid_sync section.
As can be seen, the snapraid_arrays variable is a list, so it is possible to expand it to as many arrays as you want. Just make sure that they have unique names.
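For example, two separate arrays could be defined like this (all names, paths and mount points here are purely illustrative):

snapraid_arrays:
  - name: "media"
    conf_dir: "/etc/snapraid/media"
    parity_drives:
      - mount: "/mnt/media-parity1"
    data_drives:
      - mount: "/mnt/media-data1"
        name: "M1"
  - name: "backup"
    conf_dir: "/etc/snapraid/backup"
    parity_drives:
      - mount: "/mnt/backup-parity1"
    data_drives:
      - mount: "/mnt/backup-data1"
        name: "B1"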
The parity_drives variable is also a list, and you will need at least one parity drive defined for this role to function (with a maximum of 6 supported by SnapRAID). The parity mounts must NOT be located on a data disk.
Example:
parity_drives:
  - mount: "/mnt/parity1"
  - mount: "/mnt/parity2"
There are no limits (that I know of) on how many data disks you may have. These are also defined as a list, and it is important that they all have unique names since these are used as identifiers by SnapRAID. The name and mount point association of the data disks is relevant for the parity, so do not change them afterwards.
Example:
data_drives:
  - mount: "/mnt/data1"
    name: "D1"
  - mount: "/mnt/data2"
    name: "D2"
You must also have at least one content file for each parity file, plus one. These content files can be placed on the disks used for data, parity or boot, but each file must be on a different disk. The first and primary content file is created inside the conf_dir along with the .conf file for this array.
By default it is also configured so that each data disk includes a content file located at {{ mount }}/.snapraid_{{ name }}.content. This has the added benefit of making the total amount of available space on the data disk slightly less than the full disk size. This is recommended because the parity file will be slightly larger than the amount of synced data, and since this content file is excluded from the sync it creates a natural buffer that prevents the parity disk from being overfilled.
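To make this more concrete, the configuration file generated for a small array could end up looking roughly like the following; the exact file and content names are decided by the role's templates, so treat this purely as an illustration of how the variables map to SnapRAID directives:

# Illustrative snapraid .conf for an array named "array1"
blocksize 256
hashsize 16
autosave 500
parity /mnt/parity1/snapraid.parity
content /etc/snapraid/snapraid_array1.content
content /mnt/data1/.snapraid_D1.content
data D1 /mnt/data1
exclude *.unrecoverable
exclude /tmp/
exclude /lost+found/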
Then there are the remaining variables and their short explanations:
exclude_items
- List of files and directories to exclude.
- Remember that all the paths are relative to the mount points.

exclude_hidden_items
- Hidden items will be ignored during 'syncs'.
- In Unix systems this is usually files beginning with a period.

autosave
- Number of gigabytes to process before saving the state.
- This option is useful to avoid having to restart from scratch if a long 'sync' is interrupted (0 to disable).

blocksize
- The block size in kibibytes (1024 bytes).
- WARNING: Changing this value is for experts only!

hashsize
- The hash size in bytes.
- WARNING: Changing this value is for experts only!
This is a script used for automating the syncing and scrubbing process, so manual intervention will only be necessary when the number of deleted/updated files exceed your defined thresholds. A detailed explanation of this "manual intervention" can be found in its repository, along with more information about the inner workings of this script, but there is also some extra info at the bottom of this guide.
Anyway, the automatic syncing is defined on a per-array basis (first mentioned
in the previous section), and the *_schedule
variables are then
normal cron
expressions.
snapraid_arrays:
  - name: # Defined in previous section.
    config: # Defined in previous section.
    ...
    snapraid_sync:
      - sync_schedule: "05 9,22 * * 2-7" # Example -> sync at 09:05 and 22:05 every day except monday.
        scrub_schedule: "00 13 * * mon" # Example -> scrub at 13:00 on mondays.
        delete_threshold: 0
        update_threshold: -1
        scrub_percent: 8
        scrub_age: 10
        attach_log: "false"
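For reference, the five fields in these cron expressions are minute, hour, day of month, month and day of week, so the two example schedules above read as follows:

# field order: minute  hour  day-of-month  month  day-of-week
"05 9,22 * * 2-7"  -> minute 05 of hours 09 and 22, Tuesday (2) through Sunday (7)
"00 13 * * mon"    -> 13:00 on Mondays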
If you want to be notified by email, on successful syncs or errors, you should define the snapraid_sync_email_address variable. However, in order for these emails to be delivered over the open internet you will need an account at a trusted provider, and to configure the mutt email client to use that account (details here). As of now there is only support for automatically configuring Gmail accounts in the muttrc file, but if you have such an account the following variables are available:
snapraid_sync_email_address: ""
snapraid_sync_email_subject_prefix: "SnapRAID on $(hostname) - "
snapraid_mutt:
  realname: "User Name"
  email: "[email protected]"
  password: "supersecret"
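For reference, sending through a Gmail account with mutt requires a muttrc along these lines; the file this role generates may look slightly different, so this is only meant to illustrate what the variables above are used for:

# Illustrative /root/.muttrc for a Gmail account
set realname = "{{ snapraid_mutt.realname }}"
set from = "{{ snapraid_mutt.email }}"
set smtp_url = "smtps://{{ snapraid_mutt.email }}@smtp.gmail.com:465/"
set smtp_pass = "{{ snapraid_mutt.password }}"
set ssl_force_tls = yes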
It is also necessary to handle all the log output it creates. To not have it fill a single file with a million lines after a while, we use logrotate to only keep a limited number of old log files. There are only three options you will need to be aware of, and these are their default values:
snapraid_sync_log_dir: "/var/log/snapraid_sync"
snapraid_sync_logrotate_interval: "daily"
snapraid_sync_logrotate_count: 7
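Translated into logrotate terms, these defaults correspond to a rule along the lines of the following (the actual file created by the role may differ in the details):

# Illustrative /etc/logrotate.d/snapraid_sync
/var/log/snapraid_sync/*.log {
    daily
    rotate 7
    missingok
    notifempty
    compress
}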
Below are a couple of other variables related to this role, and their default values. You probably don't need to edit these.
snapraid_sync_script_dir: "/root/snapraid_sync"
snapraid_muttrc_path: "/root/.muttrc"
snapraid_sync_mail_bin: "/usr/bin/mutt"
In the snapraid_sync
repository there are more details regarding the
thoughts behind "manual intervention", but if you have multiple arrays it might
be annoying to always define all the environment variables every time. This role
will therefore create "entrypoints" for each array that you define.
These "entrypoints" are nothing more than small bash scripts, with all your
array specific variables set, which then call upon the original
snapraid_sync.sh
script. With this it should therefore be possible for you
to run an array specific "force sync" like this:
sudo {{ snapraid_sync_script_dir }}/snapraid_sync_entrypoint-{{ snapraid_sync.name }}.sh force
e.g.
sudo /root/snapraid_sync/snapraid_sync_entrypoint-array1.sh force