Many people these days want to store more data than can be
conveniently accommodated on one hard disc. You can buy boxes to store
files, or build your own. I've built and upgraded several, and in
these posts I'm going to talk about how I did it.
"NAS" (network attached storage) is just the current trendy term
for a file server. The objective of all this is not just to construct
a reliable file store, but to allow access to it from multiple
machines on the local network; most people don't seem to run multiple
PCs any more, but they do have phones and tablets and things.
Why not cloud storage? Because it's at the other end of a relatively
thin pipe (compared with gigabit ethernet, at least); because you're
likely to have to pay bandwidth charges to get stuff out of it;
because the cloud is just someone else's computer, which introduces a
business relationship and a whole new layer of unreliability.
Personally I can't see a good reason not to build one's own file
server, but I'm a reasonably experienced Unix/Linux sysadmin and PC
builder. There are two general categories of pre-built box: the "home"
sort (I hear good things about Synology), which are jolly expensive,
and the "business" sort, which are vastly out of my price range (and,
by current report, are really not significantly better except that
there's someone you can sue when they lose all your data). But I enjoy
doing this myself, and I know just what's gone into the system and how
to fix it when it breaks.
The first consideration when designing your own fileserver is the
amount of data you'll want to put on it. Blu-ray discs hold 25 or 50
gigabytes each. A DVD holds up to around five gigabytes of data (4.7,
strictly), or eight and a half for dual-layer. A CD is up to 367
megabytes (assuming FLAC or, if you must, ALAC compression, either of
which gives you back a perfect copy of the original). You may be happy
with lossy compression, in which case
this usage can drop by 90%. Consider the size of your current data
collection (once you've ripped everything) and think about how much
you need to store. Then double it, because once you can store stuff
more easily you probably will. There are also good reasons to keep
the storage array less than about 80% full, so take that into account
too.
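If you like to see the sums written down, here's a back-of-the-envelope
version in shell; the starting figure is just a placeholder, so put in
the size of your own ripped collection.

    # Back-of-the-envelope capacity planning; raw_gb is a placeholder figure.
    raw_gb=3000                            # size of your current collection, in gigabytes
    planned_gb=$(( raw_gb * 2 ))           # double it: once you can store stuff, you will
    array_gb=$(( planned_gb * 10 / 8 ))    # keep the array under about 80% full
    echo "plan for ${planned_gb} GB of data on an array of at least ${array_gb} GB"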
If you've bought data with DRM, you're probably naffed.
Remember that making copies of your own media for your own use is
currently illegal again in the UK.
The second thing to consider is how much redundancy you want to build
in to your system. Discs fail, and you want your data to survive when
they do. There are two good options here at the moment: full mirroring
(RAID1), in which you store each thing on two independent discs, or
RAID6, in which clever tricks with parity let you build an array of
identical discs with the usable capacity of all but two of them, but
which can survive the loss of any two. The full mirror (two
completely separate copies) is obviously better, but may not be
affordable or practicable at large scales.
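To make that concrete, here's a rough sketch of what those two
arrangements might look like with Linux software RAID (mdadm); these
aren't exact recipes, and the device names are placeholders for
whatever your discs appear as.

    # Sketches only; device names are placeholders -- check yours first.
    # Full mirror (RAID1) across two discs:
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
    # Eight-disc RAID6: usable capacity of six discs, survives any two failures:
    mdadm --create /dev/md1 --level=6 --raid-devices=8 \
          /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk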
When a disc goes bad, you need to be able to replace it and get data
copied back onto it before more discs fail. It's generally considered
good practice not to have more than about eight devices in a single
RAID6 array; if you want more capacity than that, you can
chain multiple arrays together into a single virtual device.
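One way of doing that chaining (there are others; LVM would work just
as well) is md's linear mode, along these lines:

    # Sketch: concatenate two existing RAID6 arrays into one big virtual device.
    # Assumes arrays /dev/md1 and /dev/md2 already exist; names are placeholders.
    mdadm --create /dev/md10 --level=linear --raid-devices=2 /dev/md1 /dev/md2
    mkfs.ext4 /dev/md10        # or whichever filesystem you prefer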
Using software RAID, you can combine discs, mirrors and RAID arrays in
arbitrary ways. For example, say you have sixteen terabytes of data to
store, and you want this mirrored. The data won't fit on one drive (at
time of writing), but you can join two 10TB drives together (RAID0;
for historical reasons this is generally called "striping"), make
another striped pair out of two more 10TB drives, and mirror the
stripes. That way you have two separate copies of any individual item.
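In mdadm terms that layering might look something like the sketch
below (placeholder device names again; mdadm also has a native RAID10
level that gets you much the same result in one step):

    # Sketch: two striped pairs, then a mirror across the stripes (RAID 0+1).
    mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdb /dev/sdc
    mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/sdd /dev/sde
    mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/md1 /dev/md2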
There's more on the RAID numbers, with diagrams, at
Wikipedia. One
important thing to remember is that with modern drives RAID5, which is
like RAID6 but can only survive the loss of one drive, isn't really
worth doing any more – there's just too much risk of another drive
failure while you're waiting for the replacement and then waiting
(perhaps 12+ hours) for the array to rebuild itself. Just go straight
to RAID6.
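For what it's worth, swapping out a failed member and watching the
rebuild looks roughly like this (placeholder names throughout):

    # Sketch: replace a failed member of an array and let it rebuild.
    mdadm /dev/md1 --fail /dev/sdf --remove /dev/sdf   # mark failed (if md hasn't already) and remove
    # ...physically swap the disc, then add the replacement:
    mdadm /dev/md1 --add /dev/sdf
    cat /proc/mdstat              # watch the rebuild progress; expect it to take hours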
(As a side note, any time I build a non-fileserver PC these days, I
tend to put in two hard drives in a mirror arrangement. The cost is a
relatively small part of the total, and the lack of hassle when a
drive fails is well worth it.)
In short, if you're using mirrored (RAID1) discs, double again the
capacity you've estimated; that's the total capacity of discs you'll
need to buy. Using RAID6 with eight-disc arrays, add ⅓ to the
estimated capacity.
Example: you want to store 8,000 CDs in FLAC. That's about 2.9
terabytes. So you'll eventually want six terabytes. Eight one-terabyte
drives in RAID6 would fill that nicely; or you could get four
three-terabyte drives, stripe them in pairs, and mirror the stripes.
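If you want to check that arithmetic yourself (same 367 MB-per-CD
figure as above, decimal units, nothing new):

    # Sanity-check the worked example, using the 367 MB-per-CD figure.
    echo "scale=2; 8000 * 367 / 1000 / 1000" | bc   # => 2.93, i.e. roughly 2.9 TB
    # Doubled for growth: about 6 TB.
    # Eight 1 TB discs in RAID6: (8 - 2) * 1 TB = 6 TB usable.
    # Two striped pairs of 3 TB discs, mirrored: 2 * 3 TB = 6 TB usable.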
OK, so what sort of discs will you buy? I check my vendor of choice
and see what's cheap; generally, the capacity one below the biggest
available is cheaper per amount stored than the very biggest. It'll
often be more reliable, too. Avoid any disc labelled as "green"; they
power down quickly, to save electricity, and so use up their lifetime
allotment of head load/unload cycles. (If you end up with a disc like
this, you can often tweak its behaviour with hdparm, but green discs
are basically not designed for constant operation and in my experience
tend to fail quickly in a file server.) As far as interface goes, SATA
is the current cheap option; SAS may in theory get you better
diagnostics and earlier warning of disc failure, but costs
significantly more (both for the discs and for the controller).
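If you do end up with a green disc, the sort of tweaking I mean looks
roughly like this; support varies from drive to drive, and /dev/sdX is
a placeholder:

    # Sketch: discourage a "green" disc from parking its heads constantly.
    # Support varies by drive; /dev/sdX is a placeholder. Check before tweaking.
    smartctl -A /dev/sdX | grep -i load_cycle   # load/unload cycles used so far
    hdparm -B 254 /dev/sdX                      # raise the APM level (255 disables APM where supported)
    hdparm -S 0 /dev/sdX                        # turn off the standby (spin-down) timer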
If you're a purist, you'll buy discs from different manufacturers and
of different models, just in case all the discs from one batch fail
after the same amount of use. I'm not quite that much of a purist.
The configurations with which I've had most experience are a single
8-disc RAID6, and a combination of two or three 8-disc RAID6s
concatenated into a single storage area, so that's what I'll be using
as examples in future posts.
Next: other hardware considerations.