Jamie Furness edited this page Aug 19, 2014 · 5 revisions

A backup job starts on the client side with a bash script, usually run by cron or manually. The script uses the HTTP API to signal the start of a backup job to the server; it then dumps the data it wants to back up from the service where it resides (a database, etc.) and uploads that data to the backup service.
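As a rough illustration of that flow, here is a minimal sketch of a client script. The endpoint paths, service name, and dump command are assumptions for illustration, not the real API:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: endpoint paths and the service name are invented.
set -euo pipefail

API=${BACKUP_API:-https://backups.example.com/api}
SERVICE=mydb

signal_start() {   # tell the server a backup job is beginning
  curl -fsS -X POST "$API/backups/$SERVICE/start"
}

upload() {         # upload one dumped file to the backup service
  curl -fsS -T "$1" "$API/backups/$SERVICE/files/$(basename "$1")"
}

main() {
  signal_start
  # Extract a consistent dump from the service being backed up.
  mysqldump --single-transaction "$SERVICE" > "/tmp/$SERVICE.sql"
  upload "/tmp/$SERVICE.sql"
  # With set -e, any failed step above aborts with a non-zero exit status.
}

if [ "${1:-}" = --run ]; then main; fi
```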

Usually the client script won't deal with things such as compression and encryption, only with the specifics of extracting a consistent view of the data from the service it is backing up. The service has a basic file-extension-based algorithm to detect already-compressed files and avoid re-compressing them. This way you can save network bandwidth by compressing before upload, spending the extra CPU cycles on the host where the backup runs.
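The extension check could look something like the sketch below; the actual list of extensions the service recognises is an assumption here:

```shell
# Hypothetical extension-based detection of already-compressed files.
is_compressed() {
  case "${1##*.}" in
    gz|bz2|xz|zip|tgz|zst) return 0 ;;
    *) return 1 ;;
  esac
}

# Compress a file before upload only if it is not already compressed,
# printing the name of the file that should be uploaded.
maybe_compress() {
  if is_compressed "$1"; then
    printf '%s\n' "$1"
  else
    gzip -f "$1"
    printf '%s\n' "$1.gz"
  fi
}
```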

After uploading the file(s) to the backup service, the script must exit with status 0 if everything went well, or with any other exit status if it detected an error, for example a timeout while trying to connect to the service it wants to dump data from.
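One way to make that failure mode explicit is to wrap the dump command in a timeout so a hung connection turns into a non-zero exit status. This helper (and its `DUMP_TIMEOUT` variable) is an invented sketch, not part of the service:

```shell
# Hypothetical helper: run the dump command with a timeout so that a hang
# while connecting to the dumped service becomes a detectable failure.
run_dump() {
  if ! timeout "${DUMP_TIMEOUT:-300}" "$@"; then
    echo "backup dump failed or timed out" >&2
    return 1
  fi
}
```

For example, `run_dump mysqldump mydb > /tmp/mydb.sql || exit 1` propagates the failure as the script's own exit status.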

The backup script makes use of a shared library which includes all the functions you need to perform these steps. This library also takes care of common concerns: preventing two instances of your script from running simultaneously, retrying on failure (the number of retries is configurable), automatically signalling the result of the backup script to the service (by trapping the exit status), and uploading to the service any output the script produces on stdout or stderr. The library will also send your script SIGUSR1, which you can trap to do cleanup such as removing temporary files.
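The function and variable names below are invented for illustration; this is only a sketch of the kind of machinery the shared library provides (locking, retries, and the SIGUSR1 cleanup hook), not its actual interface:

```shell
# Hypothetical sketches of shared-library behaviour.

acquire_lock() {
  # Prevent two instances of the script from running simultaneously.
  exec 9>"/tmp/${LOCK_NAME:-backup-demo}.lock"
  flock -n 9 || { echo "another instance is already running" >&2; return 1; }
}

with_retries() {
  # Retry a command a configurable number of times (RETRIES, default 3).
  local tries=${RETRIES:-3} i
  for ((i = 1; i <= tries; i++)); do
    "$@" && return 0
    echo "attempt $i/$tries failed" >&2
  done
  return 1
}

cleanup() {
  # The library sends SIGUSR1 so the script can clean up temp files.
  rm -f "/tmp/${SERVICE:-mydb}.sql"
}
trap cleanup USR1
```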

Once the backup is done, the backup service will alert you if the scheduling of your script is not fulfilling the required backup policy (for example, that there must be a backup at least every 24 hours), or if your script crashed badly and the service has heard nothing from it since it started a backup.

That was only half of the story. Since we want to be 100% sure that the data we are backing up will let us restore the service from scratch, you also need to write a verification script. This script is likewise usually run by cron or manually, and its job is to download the latest available backup from the backup service and somehow validate that the service can be restored from that data. Think of it as TDB (Test-Driven Backups). This way we make sure we are not missing files, configuration, etc. that need to be backed up in the first place. The backup service will also alert if it doesn't receive periodic verification results, if the verifications fail, or if you are doing backups without verifications.
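A verification script for a database might look roughly like this; the endpoint path, scratch database, and sanity query are all assumptions for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical verification sketch: download the latest backup, restore it
# into a scratch database, and run a sanity check.

API=${BACKUP_API:-https://backups.example.com/api}

fetch_latest() {
  # Download the latest available backup for the service.
  curl -fsS "$API/backups/mydb/latest" -o /tmp/restore.sql
}

verify() {
  # Restore into a scratch database and check it has data; a failure here
  # means the backup would not let us rebuild the service from scratch.
  mysql scratch < /tmp/restore.sql
  test "$(mysql -N -e 'SELECT COUNT(*) FROM scratch.users')" -gt 0
}

if [ "${1:-}" = --run ]; then fetch_latest && verify; fi
```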

Once there has been at least one backup for a given service, the backup service will expect recurrent backups for that service in compliance with the backup policy. If for some reason a service should never be backed up again (the data moved to a different service, etc.), the only way to make the backup service forget about it is to manually delete all existing backups for that service. This is a deliberate design decision.
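Retiring a service therefore means deleting its stored backups one by one. The endpoint paths and `forget_service` helper below are invented for illustration, since the real deletion API is not documented here:

```shell
# Hypothetical sketch of deleting all backups so the service is forgotten.
API=${BACKUP_API:-https://backups.example.com/api}

forget_service() {
  local svc=$1 id
  # List every stored backup id for the service, then delete each one.
  for id in $(curl -fsS "$API/backups/$svc/ids"); do
    curl -fsS -X DELETE "$API/backups/$svc/$id" >/dev/null
  done
}
```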
