Building a File Server 1: planning
18 September 2017

Many people these days want to store more data than can be conveniently accommodated on one hard disc. You can buy boxes to store files, or build your own. I've built and upgraded several, and in these posts I'm going to talk about how I did it.

"NAS" (network attached storage) is just the current trendy term for a file server. The objective of all this is not just to construct a reliable file store, but to allow access to it from multiple machines on the local network; most people don't seem to run multiple PCs any more, but they do have phones and tablets and things.

Why not cloud storage? Because it's at the other end of a relatively thin pipe (compared with gigabit ethernet, at least); because you're likely to have to pay bandwidth charges to get stuff out of it; because the cloud is just someone else's computer, which introduces a business relationship and a whole new layer of unreliability.

Personally I can't see a good reason not to build one's own file server, but I'm a reasonably experienced Unix/Linux sysadmin and PC builder. There are two general categories of pre-built box: the "home" sort (I hear good things about Synology), which are jolly expensive, and the "business" sort, which are vastly out of my price range (and, by current report, are really not significantly better except that there's someone you can sue when they lose all your data). But I enjoy doing this myself, and I know just what's gone into the system and how to fix it when it breaks.

The first consideration when designing your own fileserver is the amount of data you'll want to put on it. Blu-ray discs hold 25 or 50 gigabytes each. A DVD holds up to about 4.7 gigabytes of data, or 8.5 for dual-layer. A CD is up to 367 megabytes (assuming FLAC or, if you must, ALAC compression, which gives you back a perfect copy of the original). You may be happy with lossy compression, in which case this usage can drop by 90%. Consider the size of your current data collection (once you've ripped everything) and think about how much you need to store. Then double it, because once you can store stuff more easily you probably will. There are also good reasons to keep the storage array less than about 80% full, so take that into account too.
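
To put numbers on that, here's a minimal sketch in Python of the arithmetic. The per-item sizes are the rough figures above (with nominal DVD capacities). The decimal units, the function name and the dictionary keys are my own choices for illustration, and treating the 80% rule as a simple division is an assumption.

```python
# Decimal units (1 GB = 10**9 bytes) are an assumption of this sketch.
MB = 10**6
GB = 10**9
TB = 10**12

# Rough per-item sizes in bytes; the keys are illustrative names.
SIZE_BYTES = {
    "cd_flac": 367 * MB,       # lossless CD rip, upper figure
    "dvd": 4700 * MB,          # nominal single-layer capacity
    "dvd_dual": 8500 * MB,     # nominal dual-layer capacity
    "bluray": 25 * GB,
    "bluray_dual": 50 * GB,
}


def estimate_target_bytes(counts, growth=2.0, max_fill=0.8):
    """Return (current, target) sizes in bytes for a media collection.

    counts maps a key of SIZE_BYTES to a number of items.
    growth is the "then double it" factor; max_fill is the fraction
    of the array you are prepared to fill.
    """
    current = sum(SIZE_BYTES[kind] * n for kind, n in counts.items())
    target = current * growth / max_fill
    return current, target


current, target = estimate_target_bytes({"cd_flac": 8000})
print(f"current: {current / TB:.2f} TB, target: {target / TB:.2f} TB")
```

Set `max_fill=1.0` if you'd rather just double the figure and worry about headroom separately.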

If you've bought data with DRM, you're probably naffed.

Remember that making copies of your own media for your own use is currently illegal again in the UK.

The second thing to consider is how much redundancy you want to build into your system. Discs fail, and you want your data to survive when they do. There are two good options here at the moment: full mirroring (RAID1), in which you store each thing on two independent discs, or RAID6, in which via clever tricks with parity you can build a cluster of identical discs that has the capacity of all but two of them - but which can survive the loss of any two. The full mirror (two completely separate copies) is obviously better, but may not be affordable or practicable at large scales.

When a disc goes bad, you need to be able to replace it and get data copied back onto it before more discs fail. It's generally considered good practice not to have more than about eight devices in a single RAID6 array; if you want more capacity than this, you can chain multiple arrays together into a single virtual device.

Using software RAID, you can combine discs, mirrors and RAID arrays in arbitrary ways. For example, say you have sixteen terabytes of data to store, and you want this mirrored. The data won't fit on one drive (at time of writing), but you can join two 10TB drives together (RAID0; for historical reasons this is generally called "striping"), make another striped pair out of two more 10TB drives, and mirror the stripes. That way you have two separate copies of any individual item.

There's more on the RAID numbers, with diagrams, at Wikipedia. One important thing to remember is that with modern drives RAID5, which is like RAID6 but can only survive the loss of one drive, isn't really worth doing any more – there's just too much risk of another drive failure while you're waiting for the replacement and then waiting (perhaps 12+ hours) for the array to rebuild itself. Just go straight to RAID6.
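
For a rough feel for that rebuild window, here's a back-of-envelope sketch. It assumes the rebuild is one uninterrupted sequential pass over the replacement disc, which is a best case; the 150 MB/s sustained rate is an assumed figure, not a measurement, and the function name is made up for illustration.

```python
TB = 10**12
MB = 10**6


def min_rebuild_hours(disc_bytes, bytes_per_second):
    """Lower bound on rebuild time: one full sequential pass over the disc."""
    return disc_bytes / bytes_per_second / 3600


# 150 MB/s is an assumed sustained rate; substitute your own drive's figure.
hours = min_rebuild_hours(10 * TB, 150 * MB)
print(f"at least {hours:.1f} hours")
```

A real rebuild on an array that's still serving files will take longer than this, which only strengthens the case for the second parity disc.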

(As a side note, any time I build a non-fileserver PC these days, I tend to put in two hard drives in a mirror arrangement. The cost is a relatively small part of the total, and the lack of hassle when a drive fails is well worth it.)

In short, if you're using mirrored (RAID1) discs, double again the capacity you've estimated; that's the total capacity of discs you'll need to buy. Using RAID6 with eight-disc arrays, add ⅓ to the estimated capacity.
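
Those multipliers fall straight out of how many discs' worth of space goes on redundancy. A minimal sketch, with a function name and scheme labels that are mine rather than any tool's:

```python
from fractions import Fraction


def raw_factor(scheme, discs_per_array=8):
    """Multiplier from usable capacity to raw disc capacity to buy."""
    if scheme == "raid1":
        # Two full copies of everything.
        return Fraction(2)
    if scheme == "raid6":
        if discs_per_array < 4:
            raise ValueError("RAID6 needs at least four discs")
        # Two discs' worth of each array goes on parity.
        return Fraction(discs_per_array, discs_per_array - 2)
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("raid1", "raid6"):
    factor = raw_factor(scheme)
    print(f"{scheme}: buy {float(factor):.2f}x the usable capacity "
          f"({factor - 1} extra)")
```

With smaller RAID6 arrays the overhead climbs: `raw_factor("raid6", 4)` is 2, the same as a mirror.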

Example: you want to store 8,000 CDs in FLAC. That's about 2.9 terabytes. So you'll eventually want six terabytes. Eight one-terabyte drives in RAID6 would fill that nicely; or you could get four three-terabyte drives, stripe them in pairs, and mirror the stripes.
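
As a quick check on those two layouts, here's a sketch that models each level as a function on capacities in terabytes. The helper names are hypothetical, and it assumes identical discs within each array.

```python
def stripe(*members):
    """RAID0 or concatenation: capacities add up, no redundancy."""
    return sum(members)


def mirror(*members):
    """RAID1: usable capacity is that of the smallest member."""
    return min(members)


def raid6(disc, count):
    """RAID6 over `count` identical discs: two discs' worth is parity."""
    if count < 4:
        raise ValueError("RAID6 needs at least four discs")
    return disc * (count - 2)


# Capacities in terabytes.
option_a = raid6(1, 8)                          # eight 1 TB discs
option_b = mirror(stripe(3, 3), stripe(3, 3))   # four 3 TB discs
print(option_a, option_b)
```

Both come out at six terabytes usable. The earlier mirrored-stripes example works the same way: `mirror(stripe(10, 10), stripe(10, 10))` gives 20, comfortably over the sixteen needed.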

OK, so what sort of discs will you buy? I check my vendor of choice and see what's cheap; generally, the capacity one below the biggest available is cheaper per amount stored than the very biggest. It'll often be more reliable, too. Avoid any disc labelled as "green"; they power down quickly, to save electricity, and so use up their lifetime allotment of head load/unload cycles. (If you end up with a disc like this, you can often tweak its behaviour with hdparm, but green discs are basically not designed for constant operation and in my experience tend to fail quickly in a file server.) As far as interface goes, SATA is the current cheap option; SAS may in theory get you better diagnostics and earlier warning of disc failure, but costs significantly more (both for the discs and for the controller).
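
The price comparison is simple enough to do by eye, but if you want to script it, something like this would do. The prices below are placeholders invented for illustration, not real quotes, and the function name is my own.

```python
def cheapest_per_tb(offers):
    """Return offers sorted by price per terabyte, cheapest first.

    offers is a list of (label, capacity_tb, price) tuples.
    """
    return sorted(offers, key=lambda offer: offer[2] / offer[1])


# Placeholder prices for illustration only; substitute your vendor's.
offers = [("4TB", 4, 100.0), ("8TB", 8, 180.0), ("10TB", 10, 300.0)]
for label, capacity_tb, price in cheapest_per_tb(offers):
    print(f"{label}: {price / capacity_tb:.2f} per TB")
```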

If you're a purist, you'll buy discs from different manufacturers and of different models, just in case all the discs from one batch fail after the same amount of use. I'm not quite that much of a purist.

The configurations with which I've had most experience are a single 8-disc RAID6, and a combination of two or three 8-disc RAID6s concatenated into a single storage area, so that's what I'll be using as examples in future posts.

Next: other hardware considerations.

