
About this rclone_jobber backup script tutorial

This tutorial shows you how to set up and test the rclone_jobber backup script. The final product will perform local and remote backups automatically.

The tutorial is written for Linux, and most of the content is also useful for macOS and Windows 10 WSL.

Both rclone and rclone_jobber.sh are command line tools. This tutorial assumes that the reader has basic command-line skills.

The example job scripts make backups for a home PC. You can adapt rclone_jobber.sh and the example job scripts to suit your own backup needs.

Backup trivia: the verb form to “back up” is two words, whereas the noun is “backup”.

Disclaimer: Some of the scripts used in this tutorial delete or overwrite data. Unless you know exactly what you are doing, please work this tutorial on a spare computer that doesn’t contain data you don’t want to lose. This tutorial and associated scripts are distributed without any warranty. I am not responsible for any lost data.

Install rclone

Install the latest rclone from https://rclone.org/downloads/

Install rclone_jobber

Download or clone the rclone_jobber repository.

The “examples” directory contains all the scripts used by this tutorial.

Set up environment path variables

Most tutorials use <value> notation to indicate substitution. This tutorial uses environment path variables to automate the substitutions. Thus, all the examples in this tutorial can run on your system without you having to edit the scripts.

The example scripts use the following environment path variables:

  • HOME
  • rclone_jobber
  • usb
  • remote (explained in “Configure a remote” section)

The above “rclone_jobber” path is the location of the rclone_jobber repository you downloaded.

Tutorial examples use the “usb” path variable to demonstrate backup to local mass storage. The “usb” path doesn’t actually have to be a USB drive.

If you’re on Linux, add these lines to your ~/.profile (or ~/.bash_profile) file, but with your system paths:

export rclone_jobber="/home/wolfv/rclone_jobber"
export usb="/media/wolfv/USB_device_name"

Reload your .profile:

$ source ~/.profile

Now the Linux shell should substitute $rclone_jobber and $usb.
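
Before running the examples, it can help to confirm that each path variable is set and points at a real directory. The following is a minimal sketch; check_path_var is a hypothetical helper written for this illustration and is not part of rclone_jobber.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not part of rclone_jobber): report whether the
# variable named by $1 is set and points at an existing directory.
check_path_var() {
    name="$1"
    # Indirect lookup: copy the value of the variable called $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    elif [ ! -d "$value" ]; then
        echo "$name=$value is not a directory"
        return 1
    fi
    echo "$name=$value"
}

# Check the two variables defined in ~/.profile.
for name in rclone_jobber usb; do
    check_path_var "$name" || echo "fix $name in ~/.profile"
done
```

If a variable is reported as not set, open a new terminal or reload ~/.profile and try again.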

Set up a test_data directory

This tutorial’s example scripts back up a small test directory.

The setup_test_data_directory.sh script sets up the small source directory. It recursively deletes the ~/test_rclone_data directory and rebuilds a fresh copy. To set up a test directory from the command line:

$ $rclone_jobber/examples/setup_test_data_directory.sh

Take rclone_jobber.sh for a test drive

Once you have the path variables and test_data directory set up, you can take rclone_jobber for a test drive.

Here is a minimal backup-job script for rclone_jobber:

#!/usr/bin/env sh

source="$HOME/test_rclone_data"
dest="$usb/test_rclone_backup"

$rclone_jobber/rclone_jobber.sh "$source" "$dest"

That last line calls rclone_jobber.sh with arguments source and dest.

Open the examples/job_backup_to_USB_minimal.sh in your favorite text editor. Set options to --dry-run:

options="--dry-run"

Run the backup job:

$ $rclone_jobber/examples/job_backup_to_USB_minimal.sh

The backup, $dest, was not created because of the --dry-run option.

Important: A bad backup job can cause data loss. First test with the --dry-run flag to see exactly what would be copied and deleted.

Here are some more things you can try with rclone_jobber:

  1. Open rclone_jobber.log (rclone_jobber.log is in the same directory as rclone_jobber.sh).
  2. Run the backup job again, this time without --dry-run.
  3. Inspect changes in the destination files.
  4. Change some files in source:
    • delete a file
    • edit a file
    • add a file
    • move a file

    And run the backup job again.

Backup job and rclone_jobber.sh parameters

Backup job script

Each backup job contains arguments and a call to rclone_jobber.sh. Here is an example backup job with all the rclone_jobber.sh arguments defined:

#!/usr/bin/env sh

source="$HOME/test_rclone_data"
dest="$usb/test_rclone_backup"
move_old_files_to="dated_files"
options="--filter-from=$rclone_jobber/examples/filter_rules --checksum --dry-run"
monitoring_URL="https://monitor.io/12345678-1234-1234-1234-1234567890ab"

$rclone_jobber/rclone_jobber.sh "$source" "$dest" "$move_old_files_to" "$options" "$(basename $0)" "$monitoring_URL"

The last line calls rclone_jobber.sh. source and dest are required; the remaining arguments can be an empty string "" or undefined.

The next sections describe rclone_jobber.sh parameters:

  1. source
  2. dest
  3. move_old_files_to
  4. options
  5. job_name
  6. monitoring_URL

1) source

source is the directory to back up.

Example source argument:

source="/home/wolfv"

2) dest

dest is the directory to back up to. Data is backed up to destination=$dest/last_snapshot.

Example dest argument for local file system data storage:

dest="/media/wolfv/USB/wolfv_backup"

Example dest for remote data storage:

dest="onedrive_wolfv_backup_crypt:"

If source or dest paths contain a space

If your path contains a space, then you must use extra quotes.

For Linux / OSX source argument:

source="'/home/wolf v'"

For Windows source argument:

source='"/home/wolf v"'

Details at https://rclone.org/docs/#quoting-and-the-shell.
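
The effect of the extra quotes can be seen without rclone. The sketch below assumes that the argument is parsed a second time by a shell before it reaches rclone (the behavior the rclone quoting page describes); count_words is a hypothetical helper that simulates that second parse with eval.

```shell
#!/usr/bin/env sh
# Hypothetical helper: re-parse a string the way a shell would and
# print how many separate arguments it turns into.
count_words() {
    eval "set -- $1"
    echo "$#"
}

count_words "'/home/wolf v'"   # prints 1: the inner quotes keep the path whole
count_words "/home/wolf v"     # prints 2: the path is split at the space
```

With only one layer of quotes, the path arrives as two arguments, which is why the inner quotes are needed.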

3) move_old_files_to

When a file is changed or deleted, the old version already in backup is either moved or removed. The move_old_files_to parameter specifies what happens to the old files.

move_old_files_to="dated_directory"

Argument to move deleted or changed files to a dated directory:

move_old_files_to="dated_directory" 

Old files are moved to the dated directory in their original hierarchy. This makes it easy to restore a deleted sub-directory. It is also convenient for manually deleting a directory from a previous year.

backup
├── archive                       <<<<<<<< archive contains dated directories
│   └── 2018
│       ├── 2018-02-22_14:00:14
│       │   └── direc1
│       │       └── f1
│       └── 2018-02-22_15:00:14   <<<<<<<< old files were moved here on dated_directory's date
│           └── direc1
│               └── f1            <<<<<<<< old version of file f1
└── last_snapshot                 <<<<<<<< last_snapshot directory contains the most recent backup
    └── direc1
        └── f1

move_old_files_to="dated_files"

Argument to move old files to old_files directory, and append move date to file names:

move_old_files_to="dated_files"

Old files are moved to the old_files directory in their original hierarchy. This makes it easy to browse a file’s history, and restore a particular version of a file.

backup
├── last_snapshot                   <<<<<<<< last_snapshot directory contains the most recent backup
│   └── direc1
│       └── f1
└── old_files                       <<<<<<<< old_files directory contains old dated_files
    └── direc1
        ├── f1_2018-02-22_14:00:14
        └── f1_2018-02-22_15:00:14  <<<<<<<<< old version of file f1 moved here on appended date

move_old_files_to=""

Argument to remove old files from backup:

move_old_files_to=""

Only the most recent version of each file remains in the backup. This can save a little storage space. Useful for making an extra backup before OS upgrade or OS clean install.

backup
└── last_snapshot         <<<<<<<< last_snapshot directory contains the most recent backup
    └── direc1
        └── f1            <<<<<<<< old versions of file f1 were overwritten or removed
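
The three directory layouts above can be related to rclone's --backup-dir and --suffix options. The following is only a sketch of the mapping implied by the trees; old_files_flags is a hypothetical function, and the actual code in rclone_jobber.sh may differ.

```shell
#!/usr/bin/env sh
# Hypothetical sketch: print the rclone flags that would produce each
# move_old_files_to layout. Arguments: dest, mode, timestamp.
old_files_flags() {
    dest="$1"
    mode="$2"
    timestamp="$3"
    case "$mode" in
        dated_directory)
            # archive/<year>/<timestamp>/ holds old files in their hierarchy
            echo "--backup-dir=$dest/archive/${timestamp%%-*}/$timestamp" ;;
        dated_files)
            # old_files/ holds old files with the move date appended
            echo "--backup-dir=$dest/old_files --suffix=_$timestamp" ;;
        "")
            # no flags: old versions are overwritten or removed
            : ;;
        *)
            echo "unknown move_old_files_to: $mode" >&2
            return 1 ;;
    esac
}

timestamp="$(date +%Y-%m-%d_%H:%M:%S)"
old_files_flags "/media/wolfv/USB/wolfv_backup" "dated_files" "$timestamp"
```

The timestamp format matches the one shown in the trees, for example 2018-02-22_14:00:14.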

4) options

The options argument can contain any number of rclone options, except for these three:

--backup-dir
--suffix
--log-file

Those options are set in rclone_jobber.sh.

Example options argument containing four rclone options:

options="--filter-from=filter_rules --checksum --log-level=INFO --dry-run"

Rclone options used in this tutorial are:

5) job_name

The job_name argument specifies the job’s file name:

job_name="$(basename $0)"

The shell command “$(basename $0)” will fill in the job’s file name for you.

rclone_jobber.sh guards against job_name running again before the previous run is finished.

rclone_jobber.sh prints job_name in warnings and log entries.
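
One common way to guard against overlapping runs is a lock directory named after the job. The sketch below illustrates that general technique only; run_exclusive and the lock location are assumptions made for this example, and rclone_jobber.sh may use a different mechanism.

```shell
#!/usr/bin/env sh
# Hypothetical sketch: run a command only if no other run of the same
# job holds the lock. mkdir is atomic, so only one caller can succeed.
run_exclusive() {
    lock_dir="${TMPDIR:-/tmp}/$1.lock"
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "previous run still in progress" >&2
        return 1
    fi
    status=0
    "$@" || status=$?
    rmdir "$lock_dir"      # release the lock when the command finishes
    return "$status"
}

run_exclusive "job_backup_to_USB.sh" echo "backup would run here"
```

A lock directory left behind by a crashed run has to be removed by hand in this simple version.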

6) monitoring_URL

The monitoring_URL argument specifies a ping URL for a cron-monitoring service.

monitoring_URL is optional. This is redundant if the remote data-storage provider offers an integrated monitoring service.

Example monitoring_URL:

monitoring_URL="https://monitor.io/12345678-1234-1234-1234-1234567890ab"

Every time rclone_jobber.sh completes a job without error, it pings the monitoring_URL. If the cron monitoring service hasn’t been pinged within a set amount of time, then it sends you an email alert. Many cron monitoring services offer free plans.

No two jobs should share the same monitoring_URL.
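
The ping itself is just an HTTP request made after a successful run. A minimal sketch, assuming curl is installed; ping_monitor and the PING_CMD override are hypothetical names used for illustration and are not taken from rclone_jobber.sh.

```shell
#!/usr/bin/env sh
# Hypothetical sketch: ping the monitoring URL only when the job's
# exit code is 0 and a URL was supplied.
ping_monitor() {
    exit_code="$1"
    url="$2"
    # Skip the ping when no URL was given or when the job failed.
    [ -n "$url" ] || return 0
    [ "$exit_code" -eq 0 ] || return 0
    # PING_CMD is a hypothetical override for testing; curl is the default.
    ${PING_CMD:-curl -fsS --retry 3 -o /dev/null} "$url"
}

# Example use after a backup command (not executed here):
#   ping_monitor "$?" "$monitoring_URL"
```

Because a failed job sends no ping, the monitoring service notices the silence and raises the alert.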

Filter rules

Rclone has a sophisticated set of filter rules. Filter rules tell rclone which files to include or exclude.

Open the examples/filter_rules_excld file. Each rule starts with a “+ ” or “- ”, followed by a pattern.

  • a leading “+” means include if the pattern matches
  • a leading “-” means exclude if the pattern matches

For each file in source, filter rules are processed in the order that they are defined. If the matcher fails to find a match after testing all the filter rules, then the path is included. Read the examples/filter_rules_excld file to see how this works.

Lines starting with ‘#’ are comments. Comments at the end of a rule are not supported because file names can contain a ‘#’.

The rclone_jobber options argument specifies the filter_rules_excld file like this:

options="--filter-from filter_rules_excld"

To see the example filter_rules_excld file in action, run:

$ $rclone_jobber/examples/clear_USB_test_backup.sh
$ $rclone_jobber/examples/job_backup_to_USB_excld.sh
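
The first-match-wins evaluation can be illustrated with a small simulation. This is a simplified sketch: filter_decision is a hypothetical function, the rules are made up for the demo, and shell glob patterns stand in for rclone's own pattern syntax (which treats “*” and “/” differently).

```shell
#!/usr/bin/env sh
# Hypothetical simulation: apply "+ pattern" / "- pattern" rules in
# order; the first matching rule decides, and no match means include.
filter_decision() {
    path="$1"
    rules="$2"
    while IFS= read -r rule; do
        case "$rule" in ''|'#'*) continue ;; esac   # skip blanks and comments
        sign="${rule%% *}"       # "+" or "-"
        pattern="${rule#* }"     # text after the first space
        case "$path" in
            $pattern)
                [ "$sign" = "+" ] && echo include || echo exclude
                return 0 ;;
        esac
    done <<EOF
$rules
EOF
    echo include
}

rules='# demo rules (hypothetical)
- *.tmp
+ docs/*
- secret*'

filter_decision "notes.tmp" "$rules"       # prints exclude
filter_decision "docs/plan.txt" "$rules"   # prints include
filter_decision "secret.key" "$rules"      # prints exclude
filter_decision "photo.jpg" "$rules"       # prints include (no rule matched)
```

Use rclone with --dry-run to see how the real filter file treats your own paths.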

Select a cloud storage provider

Rclone uses cloud storage providers to back up data to an off-site storage system. Off-site storage systems are safe from local disaster.

All rclone cloud-storage providers are listed on https://rclone.org/. Some of the cloud-storage-providers’ features are listed in two tables on https://rclone.org/overview/. Most cloud-storage providers offer small storage capacities for free. Pick one. You can always try other cloud-storage providers after you finish this tutorial.

Configure a remote

Once you have an account with your chosen cloud-storage provider, the next step is to configure its remote.

There is one page of configuration instructions for each cloud-storage provider. Links to the configuration instructions are at https://rclone.org/docs/#configure and https://rclone.org/. Follow the instructions to configure your remote now.

$ rclone config

Rclone stores all the configuration information you entered. The default location is ~/.config/rclone/rclone.conf. The remote’s password is stored in the rclone.conf file, so be careful about giving people access to it.

To list all your rclone remotes:

$ rclone listremotes

Here is how to run the tutorial’s example remote backup job on Linux (for tutorial scripts only, don’t do this for production). Add this line to your ~/.profile file, but with your remote path:

export remote="onedrive_test_rclone_backup"

and reload .profile:

$ source ~/.profile

To use a tutorial example script as a template for production backups, edit the tutorial scripts: replace occurrences of “${remote}” with your remote path.

To test your remote, run:

$ $rclone_jobber/examples/job_backup_to_remote.sh
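
If the job fails, a quick check is whether the name in $remote matches a configured remote. A minimal sketch built on rclone listremotes, which prints one remote name per line with a trailing colon; remote_is_configured is a hypothetical helper.

```shell
#!/usr/bin/env sh
# Hypothetical helper: succeed if the remote named in $1 appears in
# the output of "rclone listremotes". Accepts "name", "name:" or
# "name:path" and compares only the name part.
remote_is_configured() {
    name="${1%%:*}"
    rclone listremotes | grep -Fxq "$name:"
}

# Only run the check when rclone is installed and $remote is set.
if command -v rclone >/dev/null 2>&1 && [ -n "${remote:-}" ]; then
    if remote_is_configured "$remote"; then
        echo "found remote $remote"
    else
        echo "remote $remote is not configured"
    fi
fi
```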

Configure a crypt

“crypt” is a kind of remote that:

  • encrypts and decrypts the data stream for an underlying remote
  • performs encryption and decryption on the client side
  • uses the same command interface as other kinds of remotes

Instructions for configuring a crypt remote are at https://rclone.org/crypt/ and https://rclone.org/docs/#configuration-encryption.

When configuring a crypt remote, rclone will ask you to give it a name. Put some thought into naming your remotes.

name> myremote_myfolder_crypt

And then rclone will ask for the underlying remote. This example will encrypt myfolder in myremote:

remote> myremote:myfolder

You can always rename a remote later via rclone config.

To list all your rclone remotes:

$ rclone listremotes

Most remote cloud-storage providers allow you to view your directory names and file names in a web browser. But that’s not very useful if rclone encrypted the directory and file names. Use rclone to browse encrypted directory and file names.

To list directories in remote:

$ rclone lsd remote:
$ rclone lsd remote:path

To list objects and directories of path (requires rclone-v1.40 or later):

$ rclone lsf remote:path

To list top-level files in path:

$ rclone ls remote:path --max-depth 1 

To list all files in path recursively:

$ rclone ls remote:path

/examples/job_backup_to_remote.sh uses a remote, which could be of type crypt.

To test your crypt remote, set the path variable as described in the “Configure a remote” section, and then run:

$ $rclone_jobber/examples/job_backup_to_remote.sh

pathIsTooLong error

Most cloud storage providers have a 254-character path-length limit. Crypt limits encrypted paths to 151 characters with some cloud storage providers (this is a known crypt issue). If the path is too long, rclone returns this error:

Failed to copy: invalidRequest: pathIsTooLong: Path exceeds maximum length

There are 3 workarounds:

  • turn off “encrypt directory names” in rclone config (file content can still be encrypted)
  • shorten your paths
  • Long Path Tool (I have not tried this)
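
To find candidates for shortening, you can list the paths in source that exceed a chosen length. This is a rough screen only: long_paths is a hypothetical helper, it measures the unencrypted relative path, and encrypted names will generally be longer than what it reports.

```shell
#!/usr/bin/env sh
# Hypothetical helper: print paths under directory $1, relative to
# that directory, that are longer than $2 characters.
long_paths() {
    dir="$1"
    max="$2"
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$dir" || exit 1
        find . ! -path . | sed 's|^\./||' | awk -v max="$max" 'length($0) > max'
    )
}

# Screen the tutorial's test directory if it exists.
if [ -d "$HOME/test_rclone_data" ]; then
    long_paths "$HOME/test_rclone_data" 151
fi
```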

Backblaze b2 lifecycle

rclone crypt file-name and directory-name encryption does not work with Backblaze b2 lifecycle. This is because:

  • b2 lifecycle appends date to end of file names
  • b2 doesn’t strip off the appended date before passing the file name back to rclone

So then rclone can’t decrypt the file names.

There are 3 workarounds:

  • turn off “encrypt file names” and “encrypt directory names” in rclone config (file content can still be encrypted)
  • turn off b2 lifecycle and
    • set move_old_files_to="dated_directory" in the backup job
    • manually delete old files at end of life
  • use a different remote data-storage provider

Schedule backup jobs to run automatically

After you schedule backup jobs, you will have an automated backup system with this workflow:

  1. a job scheduler calls a backup job script
  2. the job script calls rclone_jobber.sh
  3. rclone_jobber.sh calls rclone
  4. rclone consults your filter rules, connects to a backup storage, and uploads modified files

Schedule your backup jobs in your favorite job scheduler.

The following example schedules jobs on cron (cron is a popular job scheduler installed on Linux). The first line runs a local job every hour on the hour. The second line runs a remote job every hour, 30 minutes past the hour. The third line runs at 3:18 and 15:18 every day.

$ crontab -e
00 * * * * /home/wolfv/rclone_jobber/job_backup_to_USB.sh
30 * * * * /home/wolfv/rclone_jobber/job_backup_to_remote.sh
18 3,15 * * * /home/wolfv/rclone_jobber/job_backup_recovery_plan_to_remote.sh

The initial backup will take a long time (subsequent backups are much shorter). If your computer goes to sleep while a backup is in progress, the backup will not finish. Consider disabling sleep on your computer for the initial backup. On Linux Gnome desktop:

right click > Settings > Power > Automatic suspend: Off

Logging options

rclone_jobber.sh default behavior places rclone_jobber.log in the same directory as rclone_jobber.sh. Read this section if you want the log in a different location.

Logging options are set in rclone_jobber.sh, headed by “# set log” comments. To change logging behavior, search for “# set log” and change the default values.

Logging options are described in the next 5 sections.

# set logging to verbose

To send more information to the log, use the send_to_log function in rclone_jobber.sh:

# set logging to verbose
send_to_log "$timestamp $job_name"
send_to_log "$cmd"

Additionally, you can set --log-level in the job’s “options” parameter.

# set log_file path

In rclone_jobber.sh, variable log_file contains the log file’s path. The default behavior places rclone_jobber.log in the same directory as rclone_jobber.sh:

# set log_file path
path="$(realpath "$0")"         #path of this script
log_file="${path%.*}.log"       #replace path extension with "log"

You can change log_file to any path you like.
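
The parameter expansion in the default setting strips the shortest suffix that starts with a dot and then appends “.log”. A small worked example, using a fixed illustrative path in place of realpath "$0" (script_path is a name chosen for this example):

```shell
#!/usr/bin/env sh
# Illustrative path; in rclone_jobber.sh the value comes from realpath "$0".
script_path="/home/wolfv/rclone_jobber/rclone_jobber.sh"

# ${var%.*} removes the shortest trailing match of ".*", here ".sh".
log_file="${script_path%.*}.log"

echo "$log_file"   # prints /home/wolfv/rclone_jobber/rclone_jobber.log
```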

# set log_file path to /var/log/rclone_jobber.log (Linux)

To set the rclone_jobber log location to /var/log, create the log file and give it the user’s ownership and read-write permission. In this example, rclone_jobber.log ownership is given to wolfv:

$ sudo touch       /var/log/rclone_jobber.log
$ sudo chown wolfv /var/log/rclone_jobber.log
$ sudo chmod 0666  /var/log/rclone_jobber.log
$ sudo ls -l       /var/log/rclone_jobber.log
-rw-rw-rw-. 1 wolfv root 19 Mar 21 13:58 /var/log/rclone_jobber.log

In rclone_jobber.sh, set the new log_file path:

# set log_file path
log_file="/var/log/rclone_jobber.log"

Logrotate /var/log/rclone_jobber.log (Linux)

Over time a log file can grow to unwieldy size. The logrotate utility can automatically archive the current log, start a fresh log, and delete older logs.

To set up logrotate, set the log_file path to /var/log/rclone_jobber.log (described in the previous section). Then create a logrotate configuration file in /etc/logrotate.d:

$ sudo vi /etc/logrotate.d/rclone_jobber

And paste this text into the logrotate configuration file:

/var/log/rclone_jobber.log {
monthly
rotate 2
size 1M
compress
delaycompress
}

More options are listed in man:

$ man logrotate

Execute a dry-run to see what logrotate would do:

$ logrotate -d /etc/logrotate.d/rclone_jobber

Send /var/log/rclone_jobber.log to systemd journal (Linux)

Linux systems that run systemd can send all log output to the systemd journal. To do so, make these two changes to the rclone_jobber.sh script:

  1. change log_option to --syslog
    # set log_option for rclone
    log_option="--syslog"
  2. send msg to systemd journal (sending msg to log_file is optional, and is commented out in this example)
    # set log - send msg to log
    #echo "$msg" >> "$log_file"                                   #send msg to log_file
    printf '%s\n' "$msg" | systemd-cat -t RCLONE_JOBBER -p info   #send msg to systemd journal

Back up and restore

Example backup jobs

The following system uses two backup jobs with complementary attributes (this is how I back up my home PC). The latest snapshot can be easily restored from either backup.

examples/job_backup_to_USB.sh has attributes that make it convenient to browse file history:

  • local storage (for fast navigation)
  • move_old_files_to="dated_files" (will group old versions of a file together)
  • not encrypted (easy to browse files in a file manager) (unencrypted local storage is OK if storage is safe from theft, and useful if the remote storage password is lost)
  • schedule hourly, on the hour (this assumes the USB drive is always plugged in and mounted)

/examples/job_backup_to_remote.sh has attributes that make it secure, and easy to restore a deleted sub-directory:

  • remote storage (off-site is safe from on-site disaster)
  • move_old_files_to="dated_directory" (easy to restore a deleted sub-directory e.g. Documents)
  • encrypted (please keep your password in a safe place)
  • schedule hourly, 30 min past the hour (for a backup every 30 minutes when combined with job_backup_to_USB.sh)

In addition, job_backup_recovery_plan_to_remote.sh stores recovery-plan files off-site unencrypted. Recovery-plan files are listed in the “Data recovery plan” section.

Back up to both local and remote locations in case disaster destroys one. If the Internet connection fails, the local backup is still made.

Example restore data

To restore data, copy files from backup to destination.

You can use cp (shell command) to restore data from local unencrypted backup.
To copy a single file from local backup:

   $ cp -p local_backup_path dest_path

To copy a last_snapshot directory from local backup:

   $ cp -a local_backup/last_snapshot dest_path

Use rclone to restore data from remote or encrypted backup.
To copy a single file from remote backup:

   $ rclone copy remote:source_path dest:dest_path

To copy a single file from remote backup, use one of these scripts:

Test backup jobs and test restore-data jobs

The following commands test the example backup and restore jobs. They test your entire data recovery system end to end, testing both the data backup and data recovery together. Don’t worry, the tutorial’s environment is set up to make testing painless.

Clear and setup test directories in preparation for a new test run:

$ $rclone_jobber/examples/clear_USB_test_backup.sh
$ $rclone_jobber/examples/clear_remote_test_backup.sh
$ $rclone_jobber/examples/setup_test_data_directory.sh

Back up data:

$ $rclone_jobber/examples/job_backup_to_USB.sh
$ $rclone_jobber/examples/job_backup_to_remote.sh

In job_restore_last_snapshot.sh, uncomment the source variable for the backup you want to restore from. Then restore data:

$ $rclone_jobber/examples/job_restore_last_snapshot.sh

Verify that the files were faithfully restored:

$ diff -r $HOME/test_rclone_data/direc0 $HOME/last_snapshot/direc0

Notice that rclone does not back up empty directories.

Follow a similar test procedure when practicing your recovery plan, but with real data.

Monitor your backups

Monitor your backups to make sure that data is actually being backed up. Do not rely solely on warning messages or rclone_jobber.log. They do not prove that data was saved to the destination.

Manually run a checklist script once per month, similar to this monitor_backups.sh:

#!/bin/bash

echo ""
echo ">>>> Check recently changed file time in local backup:"
ls -l /run/media/wolfv/big_stick/wolfv_backup/last_snapshot/DATA/Documents/tasks/tasks.org

echo ""
echo ">>>> Check recently changed file time in remote backup:"
rclone lsl onedrive_wolfv_backup_crypt:last_snapshot/DATA/Documents/tasks --max-depth 1

echo ""
echo ">>>> Check last log time:"
tail -5 /var/log/rclone_jobber.log

echo ""
echo ">>>> Check my Monthly Report emailed from my monitoring service."

echo ""
echo ">>>> Check space usage and available space."
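
Part of the checklist can be automated by testing how recently a file was modified. A minimal sketch, assuming a find that supports -mmin (GNU and BSD find do; it is not in POSIX); is_fresh is a hypothetical helper, and the 120-minute threshold is only an example.

```shell
#!/usr/bin/env sh
# Hypothetical helper: succeed if file $1 was modified less than
# $2 minutes ago.
is_fresh() {
    file="$1"
    minutes="$2"
    [ -n "$(find "$file" -mmin "-$minutes" 2>/dev/null)" ]
}

# Warn if the log has not been written to in the last two hours.
log="/var/log/rclone_jobber.log"
if [ -e "$log" ]; then
    if is_fresh "$log" 120; then
        echo "log updated within the last 2 hours"
    else
        echo "WARNING: log is stale"
    fi
fi
```

A fresh log still does not prove that data reached the destination, so keep the manual checks of the backed-up files.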

Data recovery plan

Data recovery plan documents

A data recovery plan is a documented process to recover and protect data in the event of a disaster. The data recovery plan presented here also includes re-installing the operating system.

Example data recovery plan:

  1. Retrieve recovery-plan files from an on-site or off-site location
    • notes for installing OS
    • recovery plan (this file)
    • job_restore_last_snapshot.sh
    • ~/.config/rclone/rclone.conf
  2. Install operating system
  3. Install rclone
  4. Restore ~/.config/rclone/rclone.conf
  5. Edit source variable in job_restore_last_snapshot.sh, and then run job_restore_last_snapshot.sh

The rclone.conf configuration file contains the encryption key for backup. Keep it in a secure location. I keep my backup rclone.conf in a password manager (LastPass). The other recovery-plan files (listed in item 1.) are not encrypted so that they can be accessed before rclone is installed. With this setup, all I need to bootstrap the recovery process is a web browser and my LastPass master password.

Schedule the backup of your backup recovery plan. This ensures that your backup recovery-plan files are always up-to-date. Do not encrypt the recovery-plan files so that they can be accessed before installing rclone. For each backup location, place the recovery-plan files in a directory to be backed up.

  • If a backup is not encrypted, then the recovery-plan files will be accessible in the backup.
  • If a backup is encrypted, create an unencrypted backup job to the same underlying remote. Like this example:

Practice the recovery plan. Start from scratch with a blank environment (or use a different location on the current machine). You’ll run into snags, and that is the point. Work out the snags BEFORE data is lost. If you have enough disk space, restore all your data to a different directory, and then use diff to verify the accuracy of the restored data.

Data-recovery-plan monitoring checklist

Example annual recovery-plan checklist:

  1. review your recovery plan
  2. make sure the recovery-plan files are still accessible and up-to-date (listed in previous section)
    • on-site copy
    • off-site copy
  3. practice restore-data on small test directory, from $rclone_jobber/examples:
    1. setup_test_data_directory.sh
    2. job_backup_to_USB.sh
    3. job_backup_to_remote.sh
    4. delete the ~/test_rclone_data directory
    5. job_restore_last_snapshot.sh

License

rclone_jobber_tutorial.org by Wolfram Volpi is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Based on a work at https://github.com/wolfv6/rclone_jobber. Permissions beyond the scope of this license may be available at https://github.com/wolfv6/rclone_jobber/issues.

rclone_jobber is not affiliated with rclone.