
So I decided to roll my own backup, for Linux and, hopefully in the near future, for Mac. Here's what I want to achieve.

  • Simple to implement and maintain, reliable, and able to let me know when it's not working
  • Simple to restore from, or to look up what's on the backup. Simple to restore, that is, for me, an old Unix geek
  • Back up without user intervention, or even the user knowing that it's happening
  • Back up to my own file server for now, but in such a way that anyone else who provides me with generic storage accessible securely (e.g. ssh or WebDAV) could be a backup target
  • Provide real security. No one can see anything that they wouldn't be able to see if there were no backup
  • Work for both desktop/laptops and servers
  • Require minimal amount of space on the backup target
  • Most client computers wander from network to network these days. Don't attempt backups when the computer is off or the backup target isn't reachable, but run them whenever the target can be reached, without the user thinking about it. Originally this was a stretch goal, but it's really core to what's different about Simbur (see the reachability sketch after this list)
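
To make that "only when reachable" behaviour concrete, here's a minimal sketch of a pre-flight check the backup script could run first. The host name and timeout are assumptions, not anything Simbur defines:

```sh
#!/bin/sh
# Probe the backup target before attempting a backup.
# BatchMode stops ssh from hanging on a password prompt;
# ConnectTimeout keeps the check fast on unreachable networks.
TARGET=backup.example.com    # hypothetical target host
if ssh -o BatchMode=yes -o ConnectTimeout=5 "$TARGET" true 2>/dev/null; then
    echo "target reachable, running backup"
    # ... invoke the rsync backup here ...
else
    echo "target unreachable, skipping this run"
fi
```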

Some stretch goals:

  • Turning the computer off in the middle of a backup, or taking it off-line, should not cause all sorts of downstream problems. Obviously I won't have a backup, but the next backup should run well, and I should not lose any previous backups
  • Allow the rest of my family to browse their backups and restore files from them
  • Allow some ability to look at incremental changes. Sometimes you don't realize that something's wrong for a few days. It's nice to be able to go back and look at changes, at least for the past week
  • Users can see their backups, but not change them. Users can restore their own files, but not those of others

One popular solution is to use rsync to copy one or more file systems on a machine to another file system accessible to that machine. Googling turned up lots of rsync-based backup solutions, like this one, which is Arch-based but still easily generalizable. (Note that the rsnapshot how-to documentation indicates that rsync-based solutions may not work exactly the same on BSD, so I want to check the Mac carefully when I go there.)

It wasn't obvious to me how to get incremental backups with rsync. Fortunately, Mike Rubel did the hard work for me.
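
Rubel's trick is to keep each backup as its own directory of hard links into the previous one. A minimal sketch using rsync's --link-dest option, which packages the same idea in a single rsync run; all paths here are assumptions:

```sh
#!/bin/sh
# One snapshot directory per run; unchanged files become hard links
# into the previous snapshot, so each snapshot browses like a full
# backup but only changed files consume space on the target.
SRC=/home/                          # hypothetical source tree
DEST=/mnt/backup                    # hypothetical backup filesystem
SNAP="$DEST/$(date +%Y-%m-%dT%H%M)" # timestamped snapshot directory

rsync -a --delete --link-dest="$DEST/latest" "$SRC" "$SNAP"
ln -nsf "$SNAP" "$DEST/latest"      # point 'latest' at the new snapshot
```

Because every snapshot is an ordinary directory tree, the "look at incremental changes" stretch goal above comes almost for free: `diff -r` between two snapshot directories shows what changed.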

It's also not obvious how to provide robustness on failure: rsync makes sure individual files don't get corrupted, but you won't have a known point-in-time snapshot. This can probably be solved with the same strategies that make incrementals work.
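
One way to get a known point-in-time snapshot, sketched under the same assumed layout as above: rsync into an in-progress directory and only rename it into place when rsync exits successfully, so an interrupted backup is never mistaken for a good one.

```sh
#!/bin/sh
# A backup interrupted by shutdown leaves only a stale in-progress
# directory, which the next run simply resumes over; completed
# snapshots are never touched.
DEST=/mnt/backup                    # hypothetical backup filesystem
WORK="$DEST/in-progress"

if rsync -a --delete --link-dest="$DEST/latest" /home/ "$WORK"; then
    SNAP="$DEST/$(date +%Y-%m-%dT%H%M)"
    mv "$WORK" "$SNAP"              # rename is atomic within a filesystem
    ln -nsf "$SNAP" "$DEST/latest"
fi
```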

Setting up the target is simple, but not trivial, due to the security requirements. I'll have to set up password-less login with ssh.
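
That part is standard ssh public-key setup. A sketch, with the key file and host name as assumptions:

```sh
# Generate a key pair with an empty passphrase (acceptable here only
# because the key is dedicated to the backup job), then install the
# public key on the target.
ssh-keygen -t rsa -f ~/.ssh/simbur_backup -N ""
ssh-copy-id -i ~/.ssh/simbur_backup.pub backup.example.com

# Confirm that login now works without a password prompt.
ssh -i ~/.ssh/simbur_backup backup.example.com true
```

On the target, the authorized_keys entry for that key can also be restricted with a command= option, so a stolen key can only run the backup command.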

The stretch goal of having users view and restore by themselves gets interesting in the general case. The backup target has to either share authentication with all the clients, or the problem becomes bigger. On my simple home network I just have to make sure that UIDs are the same everywhere.

Scheduling/Running

Real computers shut down and start up according to their own schedule, so backups should run when the computer is available. The users of the computer probably have a typical use schedule, which drives a preferred backup time. So I want something that runs when the user has the computer on, but isn't using it or planning to use it.

Or do I? We can nice the backup job so it doesn't affect the user much. So really all we need is to say when the backup should run, and then run as soon as possible after that time (a sketch of that logic follows below). So the parameters are: how often to back up, when to do a full backup (though why would I do a full?), and when to start pruning. Pruning is interesting, because we could keep backups further into the past, but with less granularity.
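
A minimal sketch of that "as soon as possible after" behaviour, meant to be run from cron every few minutes; the interval and state file are assumptions:

```sh
#!/bin/sh
# Perform a backup only if the last successful one is older than
# INTERVAL seconds, so a machine that was off or asleep at the
# scheduled time simply backs up at its next opportunity.
INTERVAL=$((24 * 60 * 60))          # hypothetical: daily
STAMP=/var/lib/simbur/last-backup   # hypothetical state file

last=$(cat "$STAMP" 2>/dev/null || echo 0)
now=$(date +%s)
if [ $((now - last)) -ge "$INTERVAL" ]; then
    # nice/ionice keep the backup from hurting interactive use
    nice -n 19 ionice -c 3 rsync -a --delete /home/ /mnt/backup/current/ \
        && echo "$now" > "$STAMP"
fi
```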

Or, just run every hour, like Time Machine. I wonder if Time Machine is actually hooked into the OS to copy files when they change. That would make more sense.
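
(For what it's worth, Time Machine reportedly uses the FSEvents change log to find what to copy at the next hourly run, rather than copying on every change.) On Linux, the analogous hook would be inotify. A sketch using inotifywait from the inotify-tools package to record that something changed, so a scheduled run could be skipped when nothing did; the paths are assumptions:

```sh
#!/bin/sh
# Watch a tree with inotify and touch a flag file when anything
# changes; the scheduled backup job checks (and clears) the flag
# and skips runs where nothing changed at all.
WATCH=/home/me                      # hypothetical tree to watch
FLAG=/var/lib/simbur/dirty          # hypothetical flag file

inotifywait -m -r -e modify,create,delete,move "$WATCH" |
while read -r _; do
    touch "$FLAG"
done
```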
