As part of the procedure for making sure a large set of work-related
data remains intact and recoverable, I keep backups at home.
In the early days this was compressed and then burned to CD-ROMs
and DVD-ROMs, but the data sets got larger, so I've been keeping them
on the file server (which is on RAID-6 and backed up nightly).
Force of habit meant that, while I did each monthly backup by rsync, I
still compressed the results separately as if I were going to burn
them. So while an individual month's backup might have shrunk from 22G
to 11G or so, the next month took up another 11G even though most of
the contents were unchanged. This added up to about 2.3 tebibytes
(out of about 48 on the server), so while it wasn't a major burden it
was starting to make itself felt.
So I've decompressed them all, and then hard-linked identical files
together; and the result is a mere 110G, less than 5% of the
compressed size.
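(For anyone wanting to do the same, here's one way to do the linking step -- a sketch only, with invented paths and assuming GNU coreutils; dedicated tools such as rdfind or jdupes do the same job more robustly:)
    # Hash every file under the backup trees, sort so identical hashes are
    # adjacent, then replace each duplicate with a hard link to the first
    # copy seen. Sketch only: assumes no newlines in filenames, and that
    # collapsing ownership/permissions across duplicates doesn't matter.
    find /srv/backups/monthly -type f -print0 | xargs -0 sha256sum | sort |
    while read -r hash path; do
        if [ "$hash" = "$prev_hash" ]; then
            ln -f "$prev_path" "$path"    # same content: link it
        else
            prev_hash=$hash
            prev_path=$path
        fi
    done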
Compression might still offer some space saving, but only if it were
done at the individual file level, and it's useful having the files
immediately ready for access. This is why I do my backups with rsync
into a filesystem: almost always, what I want to restore is not the
full machine image but a single file or directory, and having to
rootle through some non-filesystem interface to get out what I want
produces more faff than having smaller backups would save.
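(The run itself is nothing exotic -- something along these lines, with invented paths; and rsync's --link-dest can do the hard-linking against the previous month as it goes, rather than as a separate pass afterwards:)
    # One month's backup: plain rsync into a dated directory on the server.
    rsync -a --delete /work/data/ /srv/backups/monthly/2019-02/
    # Or let rsync hard-link unchanged files against the previous month's
    # tree as it copies, instead of linking identical files later:
    rsync -a --delete --link-dest=/srv/backups/monthly/2019-01/ \
          /work/data/ /srv/backups/monthly/2019-02/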
- Posted by John Dallman at 09:33am on 17 February 2019
So a transparently compressing filesystem, which also had hardlinks, would be ideal? Like you, I do backups into filesystems and for the same reason; recovering one or two files is much commoner than needing to rebuild a machine.
- Posted by RogerBW at 09:45am on 17 February 2019
Well, with zfs (which is what I'm using) it's actually standard practice to turn on filesystem compression anyway - even if the data are relatively incompressible, like video files, the cost in CPU time is less than the saving in disc transfer time. But any saving from that doesn't show up in the disc usage stats. I can see how much space a whole filesystem is taking up, but this particular one contains both the backups and other things.
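(For reference it's a one-line setting, and the saving is visible through the compression properties even though du won't show it; dataset names invented:)
    # Turn on lz4 compression for a dataset; only newly written data is affected.
    zfs set compression=lz4 tank/backups
    # How well it's doing: ratio, physical size, and uncompressed (logical) size.
    zfs get compressratio,used,logicalused tank/backups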
The next stage would be to turn on block-level deduplication in the filesystem - at which point the hardlinks would become irrelevant. But this is moderately expensive in both CPU and memory on the file server.
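(Again a one-liner to turn on, per dataset; names invented:)
    # Enable block-level dedup on the backups dataset (new writes only):
    zfs set dedup=on tank/backups
    # The pool-wide dedup ratio then shows up in the DEDUP column here:
    zpool list tank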
- Posted by Peter C at 12:26pm on 18 February 2019
The popular advice to avoid dedupe is based on systems from the 2000s running the original Sun ZFS.
The usual warning is about the size of the DDT in RAM. Essentially, the DDT should not grow larger than RAM. People tend to assume that this is because it is accessed randomly and needs to be cached to not crater performance on hard disks, but a more important reason is that if the system crashes, it may run out of kernel memory trying to replay the journal on reboot. ZoL is more forgiving than FreeBSD on this front, and is how I got my data back when a FreeNAS box went castors-up.
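(It's worth checking what the DDT actually costs on your pool before being scared off; pool name invented:)
    # Per-pool DDT summary: number of entries plus on-disk and in-core size.
    zpool status -D tank
    # More detail, including a histogram of how often blocks are referenced:
    zdb -DD tank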
The size of the DDT scales with the number of records. (It's a B-tree, so slightly greater than linear scaling, but close enough for our purposes.) The advice of 2-5 GB of RAM per TB of disk is based on recordsize=128k, which was the historical maximum. Contemporary OpenZFS lets you set recordsize=1M -- which I recommend you use by default unless you can justify some other value -- and, with an "I know what I'm doing" sysctl, recordsize=16M. This reduces the DDT size by a factor of 8 or 128 on large files, which is nice; the downsides are that it makes no difference if you mainly have small files, and that it will do less deduplication except in the cases where the files could have just been hardlinked together anyway.
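(Concretely that's one property per dataset, plus a tunable to go past 1M; dataset name invented, and the tunable spellings are from memory, so check your platform:)
    # Fewer, larger records per file means a proportionally smaller DDT
    # (1M/128k = 8x fewer records, 16M/128k = 128x -- hence the factors above).
    # Only affects newly written data.
    zfs set recordsize=1M tank/backups
    # recordsize=16M first needs the "I know what I'm doing" tunable, e.g.:
    #   FreeBSD: sysctl vfs.zfs.max_recordsize=16777216
    #   Linux:   echo 16777216 > /sys/module/zfs/parameters/zfs_max_recordsize
    zfs set recordsize=16M tank/backups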
On a sample test box with 4.3TB of "Linux ISOs", with recordsize=1M, compression=lz4 and dedup=on, the average record is 1022kB, which compares favourably with the record size of 1049kB/1MiB. There are 4.2M DDT entries, "size 895B on disk, 144B in core", i.e. about 3GB of diskspace and 600MB of memory (or 150MB/TB). Both are negligible on a modern system. Dedupe and compression save about 100GB each, so it seems to have been worthwhile to turn both on, despite these files supposedly being already compressed and so not having much intra- or inter-file redundancy.
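(Showing the working on the memory figure:)
    # 4.2M DDT entries at 144 bytes each in core, over 4.3 TB of data:
    echo '4200000 * 144 / 1000000'       | bc     # ~600 MB of RAM in core
    echo '4200000 * 144 / 1000000 / 4.3' | bc -l  # ~140 MB per TB, i.e. the 150MB/TB above give or take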
The larger recordsize may already reduce the I/O hit of dedupe to acceptable levels, but it can be mitigated further by adding a small L2ARC such as a reasonable-quality USB key and setting "secondarycache=metadata" to ensure that the L2ARC only accumulates DDTs, directories, inode tables etc, and is not filled up with large files which are cheap to get from disk and would wear out the flash. This is not a bad idea even without dedup. A €20 128GB key serves my needs here; there is never more than about 8GB written to it, but my old 8GB keys are too slow and knackered for this purpose.
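(The setup is two commands; pool name and device path invented:)
    # Add the USB key to the pool as an L2ARC (cache) device:
    zpool add tank cache /dev/disk/by-id/usb-Example_Key_123456-0:0
    # Only cache metadata (DDT, directories, inode tables and so on) in it;
    # set on the top-level dataset so everything inherits it:
    zfs set secondarycache=metadata tank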
(All numbers given here are proper power-of-ten SI units, unless I'm quoting somebody else's misuse of power-of-two non-SI units. Some figures obviously have enough slop in them that it doesn't really matter anyway.)