Data Protection

My main focus when I act as some combination of tech support, admin and technician is data protection. I don't worry that I'll make a mistake and break a monitor, not because it can't happen but because a monitor is easy to replace. Not so for data. I thought I'd give some examples of how to protect data and to what extent these examples actually do protect data.

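| Method | User oops | Hard drive failure | Admin oops | Lightning strike | Building fire | City-wide conflagration | Meteorite strike |
|---|---|---|---|---|---|---|---|
| No backup | no | no | no | no | no | no | no |
| Sync to external hard drive/RAID | no | yes | no | no | no | no | no |
| Acronis True Image to external hard drive | yes | yes | no | no | no | no | no |
| ZFS snapshots | yes | no | no | no | no | no | no |
| Backup to tape | yes | yes | yes | yes | no | no | no |
| Backup to tape (offsite) | yes | yes | yes | yes | yes | no | no |
| Synchronize backups to distant facility | yes | yes | yes | yes | yes | yes | no |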

The gist of this table is as follows: No backups equals no protection. You mark the wrong folder and press delete? A clever person writing this article can almost certainly help you out if you shut down your computer immediately and hand over the hard drive. But rescue does not equal protection.

Synchronizing data to an external hard drive (rsync or something similar) protects you only against a hard drive failure, which isn't half bad. But synchronization means that mistakes you make on an original file are forced upon the other copy, and you are more likely to have an "oops" moment than a hard drive failure. Sure, a hard drive failure means total data loss, but an "oops" can be quite destructive as well.

A proper backup program like Acronis True Image, or even the built-in Windows Backup, placing the backups on an external hard drive: that's what you want! If you screw up, restore an earlier version of the files and you're right as rain! It doesn't protect you against serious mistakes like an administrator formatting all hard drives (including the external HDD), but that's reaching... For most people, hard drive failure and run-of-the-mill "oops" moments are the main issues.
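The difference between a plain sync and a backup that keeps earlier versions can be sketched in a few lines. This is a toy in-memory model, not how rsync or Acronis True Image work internally; the names `mirror_sync` and `versioned_backup` and the file `report.txt` are made up for illustration.

```python
import copy

def mirror_sync(source, mirror):
    """Make the mirror identical to the source, like a one-way sync."""
    mirror.clear()
    mirror.update(copy.deepcopy(source))

def versioned_backup(source, history):
    """Append an independent copy of the source to the backup history."""
    history.append(copy.deepcopy(source))

source = {"report.txt": "final draft"}   # hypothetical user data
mirror = {}                              # the synced external drive
history = []                             # the backup program's versions

# First scheduled run of both tools.
mirror_sync(source, mirror)
versioned_backup(source, history)

# The "oops": the user overwrites the file with nothing.
source["report.txt"] = ""

# Second scheduled run of both tools.
mirror_sync(source, mirror)
versioned_backup(source, history)

print(repr(mirror["report.txt"]))      # '' (the mistake was copied over)
print(repr(history[0]["report.txt"]))  # 'final draft' (still restorable)
```

The mirror faithfully reproduces the mistake, while the first entry in the history still holds the good copy.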

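ZFS snapshots by themselves only protect data from user mistakes. The snapshots live in the same pool as the live data, so they do nothing for you if the disks die. And while it is hard for an administrator to destroy data with this system in place, it is possible.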

Backup to tape is nice because it disconnects snapshots of data from both the administrator and the electrical grid. An admin can be as reckless as he wants; the tapes lying in a drawer in the basement aren't going to get deleted by any combination of commands issued at the command line. Similarly, an electrical surge from a lightning strike won't jump out through a mains socket, across a room and into a closed wooden drawer.

Placing your backup tapes offsite is even better because any one building burning to the ground has limited impact on data integrity. Of course, if the entire city burns to the ground and your offsite location is in the same city... well, you're out of luck.

Synchronizing backups to distant locations is the best in terms of disaster recovery. Note how we synchronize backups (snapshots) of data and not the filesystems themselves. The only thing that can destroy our data now is a massive meteorite strike that destroys all of civilization. If that happens, you have bigger problems than your customer database disappearing.


Why do I go through such trouble to leverage ZFS snapshots for my Network Attached Storage when Acronis True Image gives better protection? Well, there are a lot more dimensions than the two shown in the matrix above. I'm not too keen on making a full backup of my... 1 300 GB of data once a day. Even using incrementals it's a tall order. With ZFS snapshots I don't use 200%+ space for my data (100% original data, 100% initial backup, plus X% overhead for each incremental), and I can make a snapshot once an hour without stressing any part of the equipment.
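The space argument can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic only: the 1 300 GB figure is from the text, but the 1% daily change rate and the 30 retained days are assumptions picked for illustration, and the model simply assumes that a snapshot costs only the space of the data changed since it was taken.

```python
DATA_GB = 1300          # dataset size mentioned in the text
CHANGE_PERCENT = 1      # assumed: 1% of the data changes per day
RETAINED_DAYS = 30      # assumed: how many days of history are kept

def backup_footprint_gb(data_gb, change_percent, days):
    """Original data + one full backup + one incremental per day."""
    incrementals = data_gb * change_percent * days / 100
    return data_gb + data_gb + incrementals

def snapshot_footprint_gb(data_gb, change_percent, days):
    """Original data + only the blocks that changed while snapshots exist."""
    changed = data_gb * change_percent * days / 100
    return data_gb + changed

print(backup_footprint_gb(DATA_GB, CHANGE_PERCENT, RETAINED_DAYS))    # 2990.0
print(snapshot_footprint_gb(DATA_GB, CHANGE_PERCENT, RETAINED_DAYS))  # 1690.0
```

Under these assumptions the backup approach needs about 230% of the dataset size, the snapshot approach about 130%. The gap is the full second copy, which is also exactly what the snapshots lack when the disks fail.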

Similarly, synchronizing data offsite is a lot better than physically moving tapes to a remote facility. You can sync the backups to another continent every night, but how often can you make a three-hour car ride to a remote facility? Once a week? Once a month? Sure, tapes have the benefit of being hard to erase by mistake, but a sensible method of uploading backups should be quite resilient.


I don't envy the poor people who have to protect banks' datasets. Even if we forgo the whole certification issue and just look at a bank trying to protect its interests, we have problems. Theoretically it's not that hard. Any transaction must be reported "committed to permanent storage" on multiple servers in multiple physical locations before it is actually completed. The problem is that banks have huge datasets and massive throughput. Multiply the communication overhead and added delay of multi-site synchronization by the size of the dataset and the frequency of modifications and you get data protection issues that I for one wouldn't want to address.
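The "committed everywhere before it is completed" rule can be sketched as follows. This is a deliberately naive in-memory model, not how any real bank does it: the `Site` class, the site names and the undo-on-failure step are assumptions, and a real system would use a proper commit protocol and have to cope with network delays and partial failures.

```python
class Site:
    """A hypothetical storage site that may or may not be reachable."""
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.log = []          # stands in for permanent storage

    def persist(self, txn):
        """Store the transaction; return True only if it was stored."""
        if not self.healthy:
            return False
        self.log.append(txn)
        return True

def commit(txn, sites):
    """Report success only if every site has persisted the transaction."""
    acknowledged = []
    for site in sites:
        if site.persist(txn):
            acknowledged.append(site)
        else:
            # One site failed: undo the copies already written elsewhere.
            for done in acknowledged:
                done.log.remove(txn)
            return False
    return True

stockholm = Site("stockholm")
frankfurt = Site("frankfurt")
tokyo = Site("tokyo", healthy=False)

print(commit("transfer #1", [stockholm, frankfurt]))         # True
print(commit("transfer #2", [stockholm, frankfurt, tokyo]))  # False
print(stockholm.log)                                         # ['transfer #1']
```

Every transaction has to wait for the slowest site to answer, which is where the delay described above comes from.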

From the same sphere we have issues with snapshots. Even if datasets are small or throughput is limited, if there is a requirement that we be able to restore a dataset to any state held in the previous 48 hours down to the second, we get a big headache. This isn't feasible for arbitrary datasets, or at least that is what I argue. If we were to create some ZFS-inspired file system that can make snapshots once a second, we would still run into the trouble of applications writing data in prolonged bursts that stretch across snapshots. Yes, ZFS is transactional, but there is nothing to prevent a file server from writing three changes to a file as three file system transactions. Thus we can have change 1 written in snapshot 16421 with changes 2 and 3 written into snapshot 16422. From a file system point of view this is perfectly logical, but if the user's three changes only make sense if applied jointly then the data is not coherent if split up like I just described.
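A small simulation makes the problem concrete. It is a toy model rather than ZFS: the `SnapshottingStore` class and the two account balances are invented for illustration, with a money transfer standing in for a set of changes that only make sense together.

```python
import copy

class SnapshottingStore:
    """Toy store that can freeze a copy of its current contents."""
    def __init__(self):
        self.files = {}
        self.snapshots = []

    def write(self, name, value):
        self.files[name] = value

    def snapshot(self):
        self.snapshots.append(copy.deepcopy(self.files))

store = SnapshottingStore()
store.write("balance_a", 100)
store.write("balance_b", 0)
store.snapshot()                 # snapshot 0: consistent starting point

# One logical change (move 40 from a to b) arrives as two separate writes.
store.write("balance_a", 60)
store.snapshot()                 # snapshot 1 lands between the two writes
store.write("balance_b", 40)
store.snapshot()                 # snapshot 2: consistent again

totals = [s["balance_a"] + s["balance_b"] for s in store.snapshots]
print(totals)                    # [100, 60, 100]
```

Each snapshot is a perfectly valid picture of the store at that instant, yet restoring snapshot 1 would make 40 units of money vanish.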

As far as I understand it, workloads that require these second-accurate snapshots are handled in hardcore databases like Oracle without a file system interfering. The database knows what constitutes a logical transaction and so can store sequences of transactions in a way that ensures that snapshots are always consistent. I have considered taking the notion of the Intent Log in ZFS and extrapolating it so that the ZIL stretches over several days, but that is a pipe dream as I'm not really in the file system business.
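To show what "the database knows what constitutes a logical transaction" means in practice, here is a minimal sketch using SQLite from the Python standard library as a stand-in (the text talks about Oracle; SQLite is only used because it runs anywhere). The table, the account names and the simulated crash are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def transfer(conn, amount, fail_midway=False):
    """Move money from a to b as one logical transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'a'",
                (amount,),
            )
            if fail_midway:
                raise RuntimeError("simulated crash between the two updates")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'b'",
                (amount,),
            )
    except RuntimeError:
        pass  # the half-finished transfer has already been rolled back

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

transfer(conn, 40, fail_midway=True)
print(balances(conn))   # {'a': 100, 'b': 0}

transfer(conn, 40)
print(balances(conn))   # {'a': 60, 'b': 40}
```

Because the two updates are declared as one transaction, there is no point in time at which the database holds only the first half of the change.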

Thank you to raemi and lpenz.