Home File Server - SnapRAID

SnapRAID - A quick how-to Software RAID for protecting your data when you don't want to actually use RAID

Josh Wood

Jan 6, 2019

The first question is: what is SnapRAID? SnapRAID is a snapshot parity, RAID-like system or, from the blurb on [SnapRAID's website](https://www.snapraid.it), "A backup program for disk arrays. It stores parity information of your data and recovers from up to six disk failures".

RAID and redundancy are very popular, but what does RAID solve? Why would you want to use it? RAID (Redundant Array of Independent Disks) is meant to prevent the loss of operational data when a disk in your array fails, and to keep the array online and running. It ensures you still have a copy of that data somewhere else in the array, from which the data can be restored when you replace the bad drive. It's great for enterprise-level systems with many disks and proper backup solutions (tape, array clones, off-site, cloud, etc.). However, it generally isn't a great fit for a home file server, or for any file server that doesn't hold a lot of small, frequently changing files.

For my use case, this is overkill, and RAID has quite a few scary edges that make me shy away from it. These are generally related to hardware RAID, where you are reliant on the exact make and model of hardware to support your RAID array, and should any piece of that hardware fail, it's possible to lose all data in the array. Mitigating this is as simple as moving to software-based RAID, but that generally requires more compute resources than I'm willing to invest in a simple file server.

SnapRAID for Snapshot Parity

SnapRAID tries to capture the best of both worlds: a JBOD (Just a Bunch Of Drives) storage array plus some form of data parity. This means I can keep a bunch of hard drives in my home server and drop files anywhere I want on them while, at the same time, that data has some redundancy as protection against one of those drives failing. So why not use something like ZFS, which sounds like it solves just about the same problem? In short, ZFS does not let me easily add new drives to my pool without much ado, whereas configuring SnapRAID to protect a new drive under its parity calculation is as simple as editing a single file and re-running the sync command to generate the parity changes that result from adding the new drive.

There is one major downside to using SnapRAID instead of a real-time RAID parity calculation, though. For SnapRAID to function, you must run the sync command, which reads data from all of your disks and calculates the parity of your data. That parity is what allows you to regenerate the data should one of the disks go missing: the missing bits are whatever values make the parity of the remaining disks match the stored parity. Since this depends on the sync command, which runs at a point in time, it only gives you redundancy for your data as of that point in time. If you sync your array, then add a new file to a disk and that disk crashes, the new file is gone and not recoverable, since it was added after the parity calculation was run. The upside is that if you don't add files very often, you're not likely to lose files between syncs! This sounds like the perfect option for a home file server.
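To make the parity idea concrete, here is a toy sketch using XOR over a single byte per disk. It only illustrates single-parity recovery in general: the byte values and variable names are made up, and SnapRAID's real parity files and multi-parity math are more involved than this.

```shell
#!/usr/bin/env bash
# Toy single-parity example: one byte per "disk" (values are arbitrary).
d1=$((0xA5)); d2=$((0x3C)); d3=$((0x0F))

# "Sync": the parity is the XOR of all data bytes.
parity=$(( d1 ^ d2 ^ d3 ))

# Disk 2 fails: XOR the parity with the surviving disks to rebuild its byte.
recovered=$(( parity ^ d1 ^ d3 ))

printf 'parity=0x%02X recovered=0x%02X original=0x%02X\n' "$parity" "$recovered" "$d2"
```

A byte written to a disk after `parity` was computed is not reflected in it, which is exactly why files added after the last sync cannot be rebuilt.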

Installation and Configuration

To get set up, you first need to download and compile SnapRAID, so let's get a Debian-based Linux OS ready by ensuring we're up to date and have gcc, git, and make installed:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install gcc git make -y

Now let's download, compile and install:

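cd ~/Downloads
wget https://github.com/amadvance/snapraid/releases/download/v11.3/snapraid-11.3.tar.gz
tar xzvf snapraid-11.3.tar.gz
cd snapraid-11.3/
./configure
make
make check
### Installing to system directories needs root
sudo make install
cd ..
### Copy the example config from the extracted source tree
sudo cp snapraid-11.3/snapraid.conf.example /etc/snapraid.conf
### Clean up the source tree and tarball
rm -rf snapraid-11.3 snapraid-11.3.tar.gz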

If you don't have disks ready and you need to partition them, then also install parted and gdisk, then partition your disk(s):

sudo apt-get install parted gdisk

Partition disk b (/dev/sdb) and repeat for all disks that need to be partitioned (warning! This will destroy data on your disks):

sudo parted -a optimal /dev/sdb
GNU Parted 2.3
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel gpt
(parted) mkpart primary 1 -1
(parted) align-check
alignment type(min/opt)  [optimal]/minimal? optimal
Partition number? 1
1 aligned
(parted) quit
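The same partitioning can also be scripted. This is a hedged sketch rather than a tested recipe: `partition_cmd` is a hypothetical helper name, it assumes you want a single GPT partition spanning the whole disk, and it only prints the commands (a dry run) so nothing is destroyed by accident.

```shell
#!/usr/bin/env bash
# Print (not run) the parted command that creates one GPT partition on a disk.
# partition_cmd is a hypothetical helper name, not part of parted.
partition_cmd() {
    local disk="$1"
    echo "parted -s -a optimal $disk mklabel gpt mkpart primary 0% 100%"
}

# Example device names; adjust to the disks you actually want to wipe.
for disk in /dev/sdb /dev/sdc; do
    partition_cmd "$disk"
done
```

Review the printed commands carefully, then run each one with sudo once you are sure the device names are right.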

I find it very helpful to have parity disks and data disks live in separate mount directories on my system. It also makes configuring tools like MergerFS much simpler, since you can reference a whole directory instead of individual mounted drives. Create a place to mount your data and parity drives:

sudo mkdir -p /mnt/data/{disk01,disk02,disk03,disk04}
sudo mkdir -p /mnt/parity/parity01

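Configure your drives to be mounted via /etc/fstab. Note that a filesystem UUID only exists once the filesystem has been created, so if your disks are new, fill in the UUIDs after the mkfs step further down: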

sudo blkid ### Make note of the UUID of each disk
sudo vi /etc/fstab
### Append the following as suited for your disks

# Data Disks
UUID=disk01        /mnt/data/disk01        ext4    defaults    0    2
UUID=disk02        /mnt/data/disk02        ext4    defaults    0    2
UUID=disk03        /mnt/data/disk03        ext4    defaults    0    2
UUID=disk04        /mnt/data/disk04        ext4    defaults    0    2

# Snapraid Disks
UUID=parity01    /mnt/parity/parity01    ext4    defaults    0    0

Note: You can set the last column of the fstab entry (the fsck pass number) to 0 for the data disks as well, to avoid boot-time disk checks.
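Filling in the UUIDs by hand is error-prone, so the entries can also be generated. A sketch, assuming the device names and mount points used in this post: `fstab_line` and `print_entries` are hypothetical helpers, while `blkid -s UUID -o value` prints just the filesystem UUID of a device.

```shell
#!/usr/bin/env bash
# Format one fstab entry from a UUID, a mount point, and an fsck pass number.
fstab_line() {
    printf 'UUID=%s\t%s\text4\tdefaults\t0\t%s\n' "$1" "$2" "$3"
}

# Print entries for the four data disks so they can be reviewed first.
# Needs root to read UUIDs, and only works once the filesystems exist.
print_entries() {
    local n=1 dev uuid
    for dev in /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1; do
        uuid=$(blkid -s UUID -o value "$dev")
        # Skip devices that have no filesystem UUID yet.
        [ -n "$uuid" ] && fstab_line "$uuid" "/mnt/data/disk0$n" 2
        n=$((n + 1))
    done
}
```

Run `print_entries` as root, check the output against `blkid`, and only then append it to /etc/fstab.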

Make a file system on each of the disks (assuming the disks are sdb1 through sde1):

sudo mkfs.ext4 -m 2 -T largefile4 /dev/sdb1
sudo mkfs.ext4 -m 2 -T largefile4 /dev/sdc1
sudo mkfs.ext4 -m 2 -T largefile4 /dev/sdd1
sudo mkfs.ext4 -m 2 -T largefile4 /dev/sde1

The -m 2 option reserves 2% of each data disk, so that a data disk the same size as your parity disk cannot be filled completely; otherwise there would not be enough room on the parity disk for the additional data that the sync operation stores alongside the parity. The parity disk, however, can use a 0% reservation, since there is no need to hold back any space or prevent it from filling completely:

sudo mkfs.ext4 -m 0 -T largefile4 /dev/sdf1

Now that we have a filesystem on each of our drives, they're ready to be used. Mount them all with:

sudo mount -a
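It is worth confirming that every disk really is mounted before going further: if a mount point is just an empty directory, anything written there lands on the root filesystem instead. A minimal sketch, assuming the mount layout used in this post; `unmounted_dirs` is a hypothetical helper built on the `mountpoint` utility from util-linux.

```shell
#!/usr/bin/env bash
# Print each given directory that is NOT currently a mount point.
unmounted_dirs() {
    local dir
    for dir in "$@"; do
        mountpoint -q "$dir" || echo "$dir"
    done
}

# Check the data and parity mount points and warn about any that are missing.
missing=$(unmounted_dirs /mnt/data/disk01 /mnt/data/disk02 /mnt/data/disk03 /mnt/data/disk04 /mnt/parity/parity01)
if [ -n "$missing" ]; then
    echo "Not mounted: $missing" >&2
fi
```

If nothing is printed, all five mount points are live.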

Now configure SnapRAID:

sudo vi /etc/snapraid.conf

This is similar to my configuration:

parity /mnt/parity/parity01/snapraid.parity
 
content /var/snapraid/content
content /mnt/data/disk01/content
content /mnt/data/disk02/content
content /mnt/data/disk03/content
content /mnt/data/disk04/content

disk d1 /mnt/data/disk01/
disk d2 /mnt/data/disk02/
disk d3 /mnt/data/disk03/
disk d4 /mnt/data/disk04/

exclude *.bak
exclude *.unrecoverable
exclude /tmp/
exclude /lost+found/
exclude .AppleDouble
exclude ._AppleDouble
exclude .DS_Store
exclude Thumbs.db
exclude .fseventsd
exclude .Spotlight-V100
exclude .TemporaryItems
exclude .Trashes
exclude .AppleDB
 
block_size 256

autosave 250

Create a directory for the content file on your root drive:

sudo mkdir -p /var/snapraid

Now run the first sync to calculate parity for your drives. This may take a long time, depending on the amount of data you have.

sudo snapraid sync
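Growing the array later follows the same pattern: add the new disk to the configuration and sync again. A sketch under assumptions: `add_data_disk` is a hypothetical helper, the disk name `d5` and mount point `/mnt/data/disk05` are only examples, and the new disk is assumed to be partitioned, formatted, and mounted already.

```shell
#!/usr/bin/env bash
# Append a data disk entry to a SnapRAID config file.
# Usage: add_data_disk CONF NAME MOUNTPOINT
add_data_disk() {
    local conf="$1" name="$2" mnt="$3"
    # Refuse to add a disk name that is already configured.
    if grep -Eq "^(disk|data)[[:space:]]+$name[[:space:]]" "$conf"; then
        echo "disk $name already present in $conf" >&2
        return 1
    fi
    printf 'disk %s %s/\n' "$name" "$mnt" >> "$conf"
}
```

For example, as root: `add_data_disk /etc/snapraid.conf d5 /mnt/data/disk05`, followed by `snapraid sync` to bring the new disk under parity.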

Now that everything is set up and the first sync of the data array is in progress, how should parity be kept in sync? Originally I found a similar article to this one on zackreed.me, where the author had written a very nice script that runs SnapRAID's diff command, checks the number of changed files it reports, and compares that against a threshold. If the threshold is breached, no further action is taken and the user is emailed to let them know there are more changed or deleted files than the threshold allows; otherwise the sync goes ahead. For the past few years, I've used the script to run nightly sync jobs and a weekly partial scrub of my array, to ensure everything is up to date and there is no "bit rot". You can find the [original script here](http://zackreed.me/updated-snapraid-sync-script/), and the [updated script, which supports split parity, here](https://zackreed.me/snapraid-split-parity-sync-script/).
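The core idea of such a script can be sketched in a few lines. This is a simplified illustration, not the script linked above: the function names and the threshold of 50 are hypothetical, it assumes `snapraid diff` ends with a summary line of the form `       3 removed`, and it prints a warning instead of sending an email.

```shell
#!/usr/bin/env bash
# Simplified sketch of a threshold-guarded sync; NOT the original script.
DEL_THRESHOLD=50   # hypothetical limit on deleted files

# Read `snapraid diff` output on stdin and print the number of removed files.
# Assumes a summary line of the form "       3 removed"; prints 0 if absent.
count_removed() {
    awk 'NF == 2 && $2 == "removed" && $1 ~ /^[0-9]+$/ { n = $1 } END { print n + 0 }'
}

# Succeeds when the deletion count is within the threshold.
within_threshold() {
    [ "$1" -le "$DEL_THRESHOLD" ]
}

# Intended to be called from a nightly cron job running as root.
guarded_sync() {
    local removed
    removed=$(snapraid diff | count_removed)
    if within_threshold "$removed"; then
        snapraid sync
    else
        echo "snapraid: $removed files removed (limit $DEL_THRESHOLD), sync skipped" >&2
        return 1
    fi
}
```

Skipping the sync when many files have disappeared matters because a sync would otherwise update parity to match the deletions, removing the chance to restore those files.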

Concluding

This is the base setup I use for my home server's data array, but I actually have more than just 4 data drives, so I employ SnapRAID's recommended 2-parity setup at the moment. This means that for every 4 data drives I have at least 1 parity drive. At the moment this is 6 data drives (ranging from 2 TB to 4 TB) and 2 parity drives at 4 TB each (SnapRAID requires your parity drives to be at least as big as your biggest data drive). A couple of these are quite old and I expect them to fail soon. When they do, I'll be sure to write a quick post on running the recovery to replace the drives and rebuild the data that was lost with the failing drive (I've already had practice with this, so hopefully the next time will go much better, or I can replace the drives before they fail!).
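The parity size requirement is easy to check before adding a drive. A sketch, assuming sizes are given in bytes (for example from `blockdev --getsize64`, which needs root); `parity_is_large_enough` is a hypothetical helper and the sizes in the example are made up.

```shell
#!/usr/bin/env bash
# Succeeds if the parity size is at least as large as every data disk size.
# Usage: parity_is_large_enough PARITY_BYTES DATA_BYTES...
parity_is_large_enough() {
    local parity="$1" size
    shift
    for size in "$@"; do
        [ "$size" -le "$parity" ] || return 1
    done
    return 0
}

# Example with made-up sizes: a 4 TB parity drive, 2 TB and 4 TB data drives.
if parity_is_large_enough 4000000000000 2000000000000 4000000000000; then
    echo "parity drive is large enough"
fi
```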

This post also does not cover the pool feature of SnapRAID, which joins multiple drives together into one big "folder". However, I find this feature lacking in SnapRAID and prefer to use MergerFS as my drive pooling solution (coming in a future blog post).

One final note is that it's possible to use SnapRAID on encrypted volumes as well. You could entirely encrypt data drives and parity drives, automatically mount and decrypt them at boot with a key file securely stored under your root account on your server after successfully entering a passphrase to unlock your root filesystem, and it can all be done remotely through SSH. This will probably be coming in a future blog post as well.

If you have any questions or comments, feel free to reach out through the Disqus comments below.