Backing up

Backups are boring, and usually only become interesting at the point where they are needed. This necessary evil has been preoccupying me at work and at home recently.

My backup strategy is roughly:

1. Use rsync to back up all my machines to a central file server
2. Use rsync to back that central file store up to one of two sets of disks
3. Rotate the two sets of disks with one being off-site at all times

I used to use level 0 dumps rather than rsync for stage 2, which was fine when I had 500GB or so to back up, but now that I'm coming up on 3TB it simply takes too long at the roughly 50-60MB/s that the disks I can afford can manage. Using a more elaborate dump scheme seemed overly complex, especially since I was dumping to disk rather than to tape. This, coupled with some issues from the OpenBSD to FreeBSD migration project, led me to look for a quicker solution.

I decided to switch from the previous method of sequentially dumping over 4 disks (essentially treating them as block devices like tape) to using a software RAID 0 volume over the 4 disks. This instantly gives a big performance boost for sequential and random IO with relatively little overhead. The software RAID framework in FreeBSD is also dramatically more efficient than that of OpenBSD, as well as being much easier to use thanks to GEOM.

I also decided to switch from dump to rsync, which reduces the number of files that need to be copied on each run of the backup. This is the major speedup in the process. Given the static nature of most of the files being backed up, this works very well and has slashed the backup times from days to hours or minutes.

GEOM and gstripe are very easy to use: you simply load the gstripe module and then initialise a new striped device using gstripe create or gstripe label. The metadata for the array is stored in the last block of each device (only with the label command; create doesn't set up a persistent array). Running gstripe load will bring up any persistent arrays connected to the machine. From some experimenting I found that 512k blocks gave the best sequential IO, but I decided to use 64k blocks since the sequential hit was negligible and the smaller block size should provide better random IO, which in turn should boost rsync performance on smaller files.
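
For reference, setting up such an array looks roughly like this (the device names and the "backup" label are just examples, and 65536 is the 64k stripe size in bytes):

# label a persistent 4-disk stripe and put a filesystem on it
gstripe label -v -s 65536 backup /dev/ada1 /dev/ada2 /dev/ada3 /dev/ada4
newfs /dev/stripe/backup

# after a reboot, load the module to bring labelled arrays back up
gstripe load
mount /dev/stripe/backup /backup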

Performance in the real world is roughly 40MB/s for rsync and upwards of 100MB/s for straight copying (dd'ing /dev/zero to the raw device goes a bit quicker than that, of course). 40MB/s doesn't sound so great, but given the much smaller amount of data that needs to be transferred thanks to rsync, the backups finish a lot quicker. The process is pretty much CPU bound at this point, as can be seen from rsync maxing out the processor, so an upgrade there might be in order in future.
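
A quick way to sanity-check the sequential write speed is a dd run against the raw striped device, something like the line below (the label is an example, and this will happily overwrite whatever is on the array, so only do it before the filesystem goes on):

dd if=/dev/zero of=/dev/stripe/backup bs=1m count=10240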

I run rsync with the options:

-aAXhi --delete --size-only --stats

Which provides the following:

# -a (archive) expands to -rlptgoD: recurse, symlinks, permissions,
# mod times, group, owner, special devices (it does not imply -H, -A or -X)
# -A preserves ACLs, -X preserves xattrs
# -h human-readable numbers, -i itemizes changes
# --delete removes destination files that no longer exist on the source
# --size-only skips files that already match in size, --stats prints verbose statistics

This is wrapped in a simple script that handles loading the GEOM, re-mounting the filesystems and so on. It is called from a cron job every night, so to run a backup I just leave the server on overnight and the magic happens automatically.
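
A minimal sketch of the idea looks something like this (the label, mount point and source path here are placeholders, not the real ones):

#!/bin/sh
# bring up the striped array and mount it
gstripe load
mount /dev/stripe/backup /backup

# mirror the central file store onto the backup disks
rsync -aAXhi --delete --size-only --stats /store/ /backup/

# unmount and stop the array so the disks can be pulled for rotation
umount /backup
gstripe stop backup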

Similar rsync scripts run on all my other boxes, pushing data down to this central file store nightly (again, assuming it is on…).
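
On each client that boils down to little more than a cron entry along these lines (the hostname and paths are invented for illustration):

# push /home to the file server at 02:00 each night
0 2 * * * rsync -aAXhi --delete --size-only --stats /home/ fileserver:/store/laptop/home/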

One improvement I still need to work on is getting Wake-on-LAN set up for the file server, to let backups proceed on a schedule without direct intervention; our router can easily handle the scheduling then. That will require a new NIC, however, as the built-in nVidia one has no WoL support in FreeBSD.
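
Once that's in place, waking the box should be a one-liner from any other machine on the LAN; from another FreeBSD box, for instance, wake(8) can send the magic packet (the interface name and MAC address below are placeholders):

wake em0 00:11:22:33:44:55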