In this part of the series on building a file server, I'll talk about
software.
You could just install FreeNAS. I'd rather
have a server that I can patch and fix like all my other servers. So
I'll ignore that option and do it the fun way.
You have to make several choices here. To combine discs together into
mirrors, stripes, and RAID volumes, you can use md (the Linux
multi-disc driver), or ZFS (available on Linux but supposedly more
robust on FreeBSD/OpenIndiana). If you use md, you should probably put
a volume manager on top of it (so that you can extend the array later
without major pain), and you'll need to put a filesystem on top of
that (for example the current Linux standard ext4; I think btrfs is
probably still too flakey for production use); if you use ZFS it acts
as a volume manager and filesystem too. ZFS uses different
terminology: its RAID6 is "raidz2", its RAID1 is "mirror", and it uses
no special term for RAID0. At this point I use ZFS for convenience (it
also incorporates incremental remote backups), though ext4 on md has
served me well in the past.
One caution: ZFS starts slowing down when it gets more than 80% full,
and at 90% is downright sluggish. Plan capacity accordingly.
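If you want the machine to nag you before it gets that far, something like the following can go in a cron job. This is only a sketch: the function name is my own invention, and it assumes the two-column output of zpool list -H -o name,capacity (pool name, then a percentage).

```shell
# Sketch: warn when any pool is at or above a fullness threshold (default 80%).
# Reads "name<TAB>capacity%" lines on stdin, as printed by
#   zpool list -H -o name,capacity
# check_pool_capacity is a hypothetical helper name, not part of ZFS.
check_pool_capacity() {
    threshold="${1:-80}"
    while read -r name cap; do
        pct="${cap%"%"}"                 # strip the trailing % sign
        [ -n "$pct" ] || continue        # skip blank lines
        if [ "$pct" -ge "$threshold" ]; then
            echo "WARNING: pool $name is ${pct}% full"
        fi
    done
}
```

Typical use would be zpool list -H -o name,capacity | check_pool_capacity 80, with the output mailed to you by cron.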
My OS drives use md RAID-1, because boot support for ZFS was not
reliable when I built these machines. I understand it's better now.
Create the pool. Yes, you must use ashift=12 so that sector sizes
match what a modern disc wants.
zpool create -o ashift=12 storage raidz2 /dev/disk/by-path/…
Create filesystems within the pool. This compression mode is so light
on CPU usage that it provides a speed increase (fewer bytes have to be
read off the disc).
zfs create -o compression=lzjb storage/foo
For filesystems that may have significant duplicated data (e.g.
backups of multiple machines), you can add -o dedup=sha256 to save
some space; note that this wants lots of RAM.
Remember that RAID is not a backup system. I'll repeat that, because
it's important: RAID is not a backup system. If you delete or
corrupt a terribly important file, that change will be faithfully
mirrored across all your redundant discs before you have time to say
"oh shit". ZFS offers snapshots as a way of getting round this, but
really you want a full backup too. Which, in practice, probably means
building another machine to do the same job, though maybe with less
redundancy and it doesn't need to be running full-time. My current
setup has a full mirror with the same hardware setup and capacity as
the primary.
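For the snapshot-plus-remote-copy approach, a minimal sketch might look like this. The function name, the snapshot labels, and the host and dataset names in the usage line are all hypothetical; the first full copy (a plain zfs send without -i) has to be done once beforehand, and the receiving side may need zfs receive -F if the target has been touched since the last transfer.

```shell
# Sketch: take a new snapshot, then send only the changes since the
# previous snapshot to another machine over ssh.
# snapshot_and_send is a hypothetical helper name.
#   $1 filesystem, $2 previous snapshot label, $3 new snapshot label,
#   $4 remote host, $5 filesystem to receive into on the remote host
snapshot_and_send() {
    fs="$1"; prev="$2"; new="$3"; remote="$4"; target="$5"
    zfs snapshot "$fs@$new" &&
    zfs send -i "$fs@$prev" "$fs@$new" | ssh "$remote" zfs receive "$target"
}
```

For example: snapshot_and_send storage/foo monday tuesday backuphost backup/foo. Since the backup machine doesn't have to be up all the time, this could be run whenever it is.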
Use the tools of your choice to map the internal device names of your
discs to their actual serial numbers. I tend to use
hdparm -I /dev/disk/by-path/… |grep Serial.Number
Keep the results of this somewhere that isn't only on the fileserver.
When a disc fails, the software will tell you the internal device
name, but it's nice to be able to confirm that with the serial number.
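To do this for every disc in one go, a loop along these lines should work. It's a sketch under a couple of assumptions: the helper names are mine, it relies on hdparm -I printing a "Serial Number:" line, and it treats any by-path entry ending in -partN as a partition to skip. It needs to run as root.

```shell
# Sketch: map each disc's by-path name to its serial number.
# extract_serial and list_disc_serials are hypothetical helper names.

# Reads `hdparm -I` output on stdin and prints just the serial number.
extract_serial() {
    awk -F: '/Serial Number/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }'
}

# Prints "device-path serial" for every whole disc under /dev/disk/by-path.
list_disc_serials() {
    for dev in /dev/disk/by-path/*; do
        case "$dev" in *-part[0-9]*) continue ;; esac   # skip partitions
        printf '%s %s\n' "$dev" "$(hdparm -I "$dev" 2>/dev/null | extract_serial)"
    done
}
```

Run list_disc_serials and redirect the output to a file that you then copy to another machine.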
I like to run relatively little software on my fileservers, because I
have other machines too and I want the fileserver to put all its
efforts into serving files; get_iplayer will run on a different box.
(I do run mpd on the file server, though, for convenience of
access.) If you don't have other machines that run all the time, you
may want to put other software on the server, which is much easier
with a straight Linux or FreeBSD installation than with FreeNAS.
To get existing data on, if you're using a conventional PC chassis,
you may have had room for a DVD drive, in which case you can copy DVDs
and CDs directly (dvdbackup and cdparanoia are recommended); otherwise
just pull data across the network. I was running four CD
drives in parallel (on different machines) when I did my own mass
ripping. This may take days or weeks, but you only have to do it once;
then the physical media can go into the loft to serve as
unusually-bulky licence keys.
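As a rough sketch of what a ripping session looks like with those two tools: the wrapper names, the device path and the output directories below are all my own assumptions, so adjust to taste. cdparanoia -B writes one file per track into the current directory, and dvdbackup -M mirrors the whole disc.

```shell
# Sketch: thin wrappers around cdparanoia and dvdbackup.
# rip_cd and rip_dvd are hypothetical helper names.
#   $1 optical drive device, $2 directory to write into

rip_cd() {
    device="$1"; outdir="$2"
    mkdir -p "$outdir" &&
    ( cd "$outdir" && cdparanoia -d "$device" -B )   # one file per track
}

rip_dvd() {
    device="$1"; outdir="$2"
    mkdir -p "$outdir" &&
    dvdbackup -M -i "$device" -o "$outdir"           # mirror the whole DVD
}
```

For example, rip_cd /dev/sr0 /storage/foo/some-album, where both the device and the directory are placeholders.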
NFS is the traditional way of getting data onto and off a storage
server in the Unix world. Authentication by anything other than IP
address (and remember, this is traditionally UDP, so anyone on the LAN can send a
packet claiming to be from anywhere) is such a nightmare that I've
never got it working, even with Kerberos, so I supply NFS read-only.
For write access I use sshfs, which with modern CPUs is plentifully fast.
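A sketch of the two halves, assuming a hypothetical host called fileserver, a LAN of 192.168.1.0/24 and a made-up helper name: the read-only NFS export is a line in /etc/exports on the server (shown as a comment), and the sshfs mount is run on the client.

```shell
# Read-only NFS export, as a line in /etc/exports on the server
# (path and network are assumptions):
#   /storage/foo  192.168.1.0/24(ro,sync,no_subtree_check)

# Sketch: mount a remote directory read-write over ssh.
# mount_rw is a hypothetical helper name.
#   $1 remote spec (user@host:/path), $2 local mount point
mount_rw() {
    remote="$1"; mountpoint="$2"
    mkdir -p "$mountpoint" &&
    sshfs "$remote" "$mountpoint" -o reconnect
}
```

For example, mount_rw me@fileserver:/storage/foo ~/mnt/foo; unmount again with fusermount -u ~/mnt/foo.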
For Windows machines, Samba is still the way to go. I don't have any
Windows machines any more, hurrah. I think Macs probably talk this
too. iOS and Android can barely do anything by default, but apps can
persuade them to talk sensible protocols.
If you have a smart TV or similar closed-source hardware, you may want
to look into a DLNA server.
And of course boring old HTTP still works.
The final part will deal with maintenance.