RogerBW's Blog

Building a File Server 1: planning
18 September 2017

Many people these days want to store more data than can be conveniently accommodated on one hard disc. You can buy boxes to store files, or build your own. I've built and upgraded several, and in these posts I'm going to talk about how I did it.

"NAS" (network attached storage) is just the current trendy term for a file server. The objective of all this is not just to construct a reliable file store, but to allow access to it from multiple machines on the local network; most people don't seem to run multiple PCs any more, but they do have phones and tablets and things.

Why not cloud storage? Because it's at the other end of a relatively thin pipe (compared with gigabit ethernet, at least); because you're likely to have to pay bandwidth charges to get stuff out of it; because the cloud is just someone else's computer, which introduces a business relationship and a whole new layer of unreliability.

Personally I can't see a good reason not to build one's own file server, but I'm a reasonably experienced Unix/Linux sysadmin and PC builder. There are two general categories of pre-built box: the "home" sort (I hear good things about Synology), which are jolly expensive, and the "business" sort, which are vastly out of my price range (and, by current report, are really not significantly better except that there's someone you can sue when they lose all your data). But I enjoy doing this myself, and I know just what's gone into the system and how to fix it when it breaks.

The first consideration when designing your own fileserver is the amount of data you'll want to put on it. Blu-ray discs hold 25 or 50 gigabytes each. A DVD holds up to around five gigabytes of data, or nine-ish for dual-layer. A CD is up to 367 megabytes (assuming FLAC or, if you must, ALAC compression, which gives you back a perfect copy of the original). You may be happy with lossy compression, in which case this usage can drop by 90%. Consider the size of your current data collection (once you've ripped everything) and think about how much you need to store. Then double it, because once you can store stuff more easily you probably will. There are also good reasons to keep the storage array less than about 80% full, so take that into account too.
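The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary keys and the way the 80% limit is applied (a simple division) are my assumptions, and the per-item sizes are the ballpark figures quoted above rather than measurements.

```python
# Ballpark sizes in gigabytes, taken from the figures quoted in the post.
SIZE_GB = {
    "bluray_single": 25.0,
    "bluray_dual": 50.0,
    "dvd_single": 5.0,
    "cd_flac": 0.367,
}

def estimate_usable_tb(counts, growth_factor=2.0, max_fill=0.8):
    """Estimate the usable capacity to plan for, in terabytes (1 TB = 1000 GB).

    counts        -- mapping of media type to number of items
    growth_factor -- "then double it"
    max_fill      -- keep the array below this fraction full
    """
    current_gb = sum(SIZE_GB[kind] * n for kind, n in counts.items())
    return current_gb * growth_factor / max_fill / 1000.0

if __name__ == "__main__":
    # Hypothetical collection: 1,000 CDs and 300 single-layer DVDs.
    need = estimate_usable_tb({"cd_flac": 1000, "dvd_single": 300})
    print(f"plan for roughly {need:.1f} TB usable")
```

Passing `max_fill=1.0` gives the plain doubling without the fill margin; this is usable capacity only, before any redundancy overhead.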

If you've bought data with DRM, you're probably naffed.

Remember that making copies of your own media for your own use is currently illegal again in the UK.

The second thing to consider is how much redundancy you want to build in to your system. Discs fail, and you want your data to survive when they do. There are two good options here at the moment: full mirroring (RAID1), in which you store each thing on two independent discs, or RAID6, in which via clever tricks with parity you can build a cluster of identical discs that have the capacity of all but two of them - but which can survive the loss of any two. The full mirror (two completely separate copies) is obviously better, but may not be affordable or practicable at large scales.

When a disc goes bad, you need to be able to replace it and get data copied back onto it before more discs fail. It's generally considered good practice not to have more than about eight devices in a single RAID6 array; if you want more capacity than this, you can chain multiple arrays together into a single virtual device.

Using software RAID, you can combine discs, mirrors and RAID arrays in arbitrary ways. For example, say you have sixteen terabytes of data to store, and you want this mirrored. The data won't fit on one drive (at time of writing), but you can join two 10TB drives together (RAID0; for historical reasons this is generally called "striping"), make another striped pair out of two more 10TB drives, and mirror the stripes. That way you have two separate copies of any individual item.
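One way to check what a nested layout like this actually gives you is to model it as a small tree and add up the usable space. The sketch below is a toy model, not anything the RAID software itself provides: the class names are invented for illustration, and it assumes the members of each stripe are the same size.

```python
from dataclasses import dataclass

@dataclass
class Disc:
    tb: float  # capacity of one physical drive, in terabytes

@dataclass
class Stripe:
    members: list  # RAID0: capacities add together

@dataclass
class Mirror:
    members: list  # RAID1: every member holds a full copy

@dataclass
class Raid6:
    members: list  # two members' worth of space goes to parity

def usable_tb(node):
    """Usable capacity of a (possibly nested) layout, in terabytes."""
    if isinstance(node, Disc):
        return node.tb
    sizes = [usable_tb(m) for m in node.members]
    if isinstance(node, Stripe):
        return sum(sizes)
    if isinstance(node, Mirror):
        return min(sizes)
    if isinstance(node, Raid6):
        if len(sizes) < 4:
            raise ValueError("RAID6 needs at least four members")
        return (len(sizes) - 2) * min(sizes)
    raise TypeError(f"unknown layout element: {node!r}")

if __name__ == "__main__":
    # The example from the text: two striped pairs of 10TB drives, mirrored.
    pair_a = Stripe([Disc(10), Disc(10)])
    pair_b = Stripe([Disc(10), Disc(10)])
    print(usable_tb(Mirror([pair_a, pair_b])), "TB usable from 40 TB of drives")
```

For the mirrored stripes this comes out at 20 TB usable, which leaves the sixteen terabytes of data sitting at 80% full.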

There's more on the RAID numbers, with diagrams, at Wikipedia. One important thing to remember is that with modern drives RAID5, which is like RAID6 but can only survive the loss of one drive, isn't really worth doing any more – there's just too much risk of another drive failure while you're waiting for the replacement and then waiting (perhaps 12+ hours) for the array to rebuild itself. Just go straight to RAID6.

(As a side note, any time I build a non-fileserver PC these days, I tend to put in two hard drives in a mirror arrangement. The cost is a relatively small part of the total, and the lack of hassle when a drive fails is well worth it.)

In short, if you're using mirrored (RAID1) discs, double again the capacity you've estimated; that's the total capacity of discs you'll need to buy. Using RAID6 with eight-disc arrays, add ⅓ to the estimated capacity.
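Those two rules of thumb are the same calculation with a different multiplier: 2 for a mirror, and n/(n-2) for an n-disc RAID6 (which is 8/6, i.e. add a third, for eight discs). A minimal sketch, with a made-up function name and scheme labels:

```python
def raw_tb_to_buy(usable_tb, scheme, discs_per_array=8):
    """Total drive capacity to buy for a given usable capacity.

    scheme is "raid1" (two full copies) or "raid6" (two discs' worth
    of parity per array of discs_per_array discs).
    """
    if scheme == "raid1":
        return usable_tb * 2
    if scheme == "raid6":
        if discs_per_array < 4:
            raise ValueError("RAID6 needs at least four discs")
        return usable_tb * discs_per_array / (discs_per_array - 2)
    raise ValueError(f"unknown scheme: {scheme!r}")

if __name__ == "__main__":
    # Say the estimate came out at 6 TB usable.
    for scheme in ("raid1", "raid6"):
        print(scheme, f"{raw_tb_to_buy(6.0, scheme):.1f} TB of discs")
```

Smaller RAID6 arrays cost proportionally more: with four discs the multiplier is 2, the same as mirroring.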

Example: you want to store 8,000 CDs in FLAC. That's about 2.9 terabytes. So you'll eventually want six terabytes. Eight one-terabyte drives in RAID6 would fill that nicely; or you could get four three-terabyte drives, stripe them in pairs, and mirror the stripes.

OK, so what sort of discs will you buy? I check my vendor of choice and see what's cheap; generally, the capacity one below the biggest available is cheaper per amount stored than the very biggest. It'll often be more reliable, too. Avoid any disc labelled as "green"; they power down quickly, to save electricity, and so use up their lifetime allotment of head load/unload cycles. (If you end up with a disc like this, you can often tweak its behaviour with hdparm, but green discs are basically not designed for constant operation and in my experience tend to fail quickly in a file server.) As far as interface goes, SATA is the current cheap option; SAS may in theory get you better diagnostics and earlier warning of disc failure, but costs significantly more (both for the discs and for the controller).

If you're a purist, you'll buy discs from different manufacturers and of different models, just in case all the discs from one batch fail after the same amount of use. I'm not quite that much of a purist.

The configurations with which I've had most experience are a single 8-disc RAID6, and a combination of two or three 8-disc RAID6s concatenated into a single storage area, so that's what I'll be using as examples in future posts.

Next: other hardware considerations.

Tags: computing

See also:
Building a File Server 2: hardware
Building a File Server 3: software
Building a File Server 4: maintenance


  1. Posted by Owen Smith at 12:15am on 19 September 2017

    You only really talk about ripped Blu Rays, DVDs and CDs as driving the amount of storage required. That doesn't drive my data needs at all. My server size is driven by legally downloaded DVD-Audio images, and for my laptop the Raw part of my digital photos.

    I think it's a mistake to assume people need storage space for the same reasons you or I do. Data is data. Just talk about how to achieve a certain storage space I suggest.

    By the way, how do you build non server PCs with RAID1 mirroring? You may be making Unix assumptions here, I've yet to see a Windows PC with mirrored system discs actually work and I've seen many people try and fail. On top of that, most laptops only have physical space for one hard disc, and that space is getting smaller all the time.

  2. Posted by RogerBW at 08:47am on 19 September 2017

    Thank you for explaining how to write my blog. You clearly know far more about it than I do.

    I must admit I did regard it as obvious to the meanest intellect that if you know how much data you want to store, you know how much data you want to store, and can therefore multiply by disc overheads to work out the total you need.

    Windows apparently supports RAID in its "server" (i.e. even more overpriced) versions. I'm not interested in fighting with Windows to determine what I'm allowed to do on my own computer, so I don't use it. I don't think I'm missing much.

Comments on this post are now closed. If you have particular grounds for adding a late comment, comment on a more recent post quoting the URL of this one.
