One or two years ago, I decided to make my own NAS from a USB hard drive and a Raspberry Pi. It has since grown considerably, and this post covers everything about it. Hopefully, it will help you build a similar backup system of your own (that is, if you think mine is a good one).
Backup vs. Redundancy
Before I get into the actual post, let me explain the difference between a backup and redundancy.
Redundancy exists to protect you against hardware failures and to keep your uptime high. Probably the best-known method is RAID (Redundant Array of Independent Disks), which stores data across multiple drives. There are many RAID levels; the most common ones (or at least the ones I've heard about most) are RAID 0, RAID 1, RAID 5, and RAID 6.
The simplest RAID level to explain is RAID 1, which duplicates (mirrors) everything from one disk to another (it actually writes to both at the same time, but for the sake of simplicity…). Should one disk ever fail or get unplugged, everything keeps working off the second disk. When the failed disk is replaced, all data is synced from the surviving disk to the new one. Until that rebuild finishes, the array can't afford another failure; once it completes, the array can again survive the loss of a drive. You lose half of your raw capacity, but you pretty much always have access to your data.
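To make the mirroring idea concrete, here is a minimal toy model in Python. The `Raid1` class and its method names are hypothetical and purely illustrative: each "disk" is just a dictionary mapping block numbers to bytes, and a failed disk is represented by `None`. Real implementations (Linux md, hardware controllers) work very differently under the hood.

```python
class Raid1:
    """Toy RAID 1 mirror: every write goes to every surviving member disk.

    Each disk is a dict mapping block number -> bytes; a failed disk is None.
    """

    def __init__(self, num_disks=2):
        self.disks = [{} for _ in range(num_disks)]

    def _alive(self):
        return [disk for disk in self.disks if disk is not None]

    def write(self, block, data):
        # Mirror the write onto every disk that is still working.
        for disk in self._alive():
            disk[block] = data

    def delete(self, block):
        # Deletions are mirrored too, so a mirror cannot bring a file back.
        for disk in self._alive():
            disk.pop(block, None)

    def read(self, block):
        # Every surviving disk holds a full copy; read from the first one.
        alive = self._alive()
        if not alive:
            raise OSError("all mirror members have failed")
        return alive[0][block]

    def fail(self, index):
        # Simulate a dead or unplugged drive.
        self.disks[index] = None

    def replace_and_rebuild(self, index):
        # Sync a fresh, empty replacement disk from a surviving member.
        alive = self._alive()
        if not alive:
            raise OSError("nothing left to rebuild from")
        self.disks[index] = dict(alive[0])


array = Raid1()
array.write(0, b"photos.zip")
array.fail(0)                  # the first disk dies...
print(array.read(0))           # ...but b'photos.zip' is still readable
array.replace_and_rebuild(0)   # the new disk is synced from the survivor
```

Note that `delete()` is mirrored just like `write()`: a mirror keeps your data available, but it happily propagates mistakes to every disk.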
In case you were wondering, RAID 0 just stripes data across all of the member disks. This level actually increases the chance of data loss, so don't use it for redundancy. Losing any one disk in RAID 0 destroys all of the data, because every file is spread across all of the disks and there is no extra copy to recover the missing pieces from. The advantage of RAID 0 is speed, so it can make sense for data that isn't important or can easily be regenerated (e.g. a website's page cache). It's generally not a good idea for a NAS, though: it provides no redundancy and shouldn't be used unless you understand the risks.
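The following sketch shows why losing one RAID 0 member loses everything. The `stripe` and `unstripe` helpers are hypothetical, and the 4-byte chunk size is an arbitrary choice for readability; real arrays use much larger chunks.

```python
def stripe(data, num_disks, chunk_size=4):
    """Split data into fixed-size chunks and deal them round-robin across disks."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def unstripe(disks):
    """Reassemble striped data, or return None if any member disk is missing.

    Each disk holds only every n-th chunk and RAID 0 keeps no redundant copy,
    so a single missing disk leaves holes that cannot be filled.
    """
    if any(disk is None for disk in disks):
        return None
    chunks = []
    for row in range(max(len(disk) for disk in disks)):
        for disk in disks:
            if row < len(disk):
                chunks.append(disk[row])
    return b"".join(chunks)


disks = stripe(b"website page cache", num_disks=2)
assert unstripe(disks) == b"website page cache"
disks[1] = None               # one member fails...
print(unstripe(disks))        # ...and the whole array is unreadable: None
```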
RAID 5 and RAID 6
RAID 5 stripes data across multiple disks but also stores parity information. Parity allows any one disk to be lost and its contents rebuilt from the data and parity on the remaining disks. RAID 6 is similar but stores two independent sets of parity, so it can survive the loss of up to two disks. However, RAID 5 needs at least 3 disks and RAID 6 at least 4 (RAID 0 and RAID 1 need only 2), so it isn't likely that you'll use them outside of a professional environment.
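Single parity boils down to XOR: the parity block is the XOR of all data blocks in a stripe, so XOR-ing whatever survives reproduces the one block that's missing. The sketch below assumes a simplified layout with parity on a dedicated disk (real RAID 5 rotates parity across all members), and it does not cover RAID 6's second parity, which is computed with Galois-field arithmetic rather than plain XOR. All helper names are hypothetical; `usable_capacity` encodes the usual minimum disk counts (3 for RAID 5, 4 for RAID 6) and assumes identical disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe_blocks):
    """Recover the single missing block (marked None) in a parity stripe.

    parity = d0 ^ d1 ^ ... ^ dn, and x ^ x == 0, so XOR-ing every surviving
    block (data and parity alike) leaves exactly the block that was lost.
    """
    missing = [i for i, block in enumerate(stripe_blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one missing block")
    return xor_blocks([block for block in stripe_blocks if block is not None])


# Minimum member counts and parity overhead, assuming identical disks.
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4}
PARITY_DISKS = {5: 1, 6: 2}


def usable_capacity(level, num_disks, disk_tb):
    """Usable space (in TB) of an array built from identical disks."""
    if num_disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        return num_disks * disk_tb  # no redundancy at all
    if level == 1:
        return disk_tb              # every disk holds the same data
    return (num_disks - PARITY_DISKS[level]) * disk_tb


data = [b"NAS1", b"NAS2", b"NAS3"]          # data blocks on three disks
parity_stripe = data + [xor_blocks(data)]   # parity block on a fourth disk
parity_stripe[1] = None                     # the second disk fails
print(rebuild_missing(parity_stripe))       # b'NAS2'
print(usable_capacity(5, 4, 2))             # 6 (TB usable out of 8 TB raw)
```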
Backups are more general than redundancy in that they protect you from more than drive failures. The biggest additional protection they offer is the ability to recover deleted files. Should a file be deleted for any reason, be it a virus or just you trying to free up disk space at 2:00 in the morning, a backup lets you recover it. Once a file is deleted from a RAID array, on the other hand, it's removed from every disk in the array. While data recovery is always fun, it's best not to rely on it.
The “Fun” Stuff
Congratulations if you made it this far!
First off, let’s talk about my NAS. I have a Raspberry Pi 3 connected to my router via Ethernet. It’s also connected to my VPN server, so I can access my data from any device that is on my VPN. Of course, it’s also plugged into a UPS (uninterruptible power supply) and configured to shut down automatically when the battery gets too low. I usually connect to it via SMB, but I can also use SFTP, SSHFS, and a bunch of other fun stuff. Most importantly, the Raspberry Pi is connected to three 2 TB USB hard drives.
Two of those drives are in a RAID 1 array, so I have redundancy taken care of. If one drive fails, I don’t even notice because everything keeps running off the second drive. Once I do realize one disk is damaged, I can replace it and start the rebuilding process.
My On-Site Backup
The third disk connected to my Raspberry Pi is used as my on-site backup. Every night, a script runs rsync to copy all the data from the RAID array to the third disk. I intentionally left out rsync's --delete option, so files deleted from the array stay on the backup drive. This isn’t the best way to do it, and there’s probably a bunch of free software that does it a thousand times better. But it’s what I use, and it’s good enough for me. If I ever accidentally delete a file, I can simply search for it on the backup and copy it back into the main storage (the RAID array).
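The nightly copy can be approximated in pure Python. This is a minimal sketch, not the actual script used here: `sync_without_delete` mimics running rsync without --delete (new or changed files are copied, nothing is ever removed from the backup), and `files_only_in_backup` lists files that survive on the backup but are gone from the source, which is where to look after an accidental deletion. Both function names are hypothetical, and `shutil.copy2` preserves less metadata than `rsync -a` (file ownership is not copied, for example).

```python
import filecmp
import os
import shutil


def sync_without_delete(source, backup):
    """Copy new or changed files from source into backup, never deleting.

    Files removed from source are left untouched in backup, like rsync
    run without --delete. Returns the backup-relative paths it copied.
    """
    copied = []
    for root, _dirs, files in os.walk(source):
        dest_dir = os.path.join(backup, os.path.relpath(root, source))
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dest_dir, name)
            # Skip unchanged files: filecmp checks size/mtime first, then contents.
            if os.path.exists(dst) and filecmp.cmp(src, dst, shallow=True):
                continue
            shutil.copy2(src, dst)  # copy2 also preserves modification times
            copied.append(os.path.relpath(dst, backup))
    return copied


def files_only_in_backup(source, backup):
    """List files kept in backup that no longer exist in source (deleted ones)."""
    missing = []
    for root, _dirs, files in os.walk(backup):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), backup)
            if not os.path.exists(os.path.join(source, rel)):
                missing.append(rel)
    return sorted(missing)
```

On the NAS, a nightly job would call something like `sync_without_delete("/mnt/raid", "/mnt/backup")`; those mount points are placeholders for wherever your array and backup disk are actually mounted.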
My Off-Site Backups
The final component of my backup system is my off-site backup. I have a smaller 2 TB USB drive that I copy the RAID array onto every now and then. Once I’m done syncing it, it goes back into a safe deposit box at the bank. That way, even if something happens to my house, I still have a backup of all of my data. The only thing I’d lose is the work done between my last off-site backup and when everything goes wrong.
Once I have the budget, I also plan on using something like DigitalOcean Spaces or even AWS Glacier as another backup. That way, I could lose both disks of my RAID array, the on-site backup, and the off-site backup, and still have no data loss 😃.
It’s important to encrypt your off-site backups! If a backup disk is ever stolen, you don’t want the thief to have access to all of your data. Encrypting the disk ensures that only you, and anyone else with the key (generally a password), have access to the data on it. It’s also a good idea to encrypt your NAS, though that’s somewhat less important if your house has good security. And if you ever decide to back up to a cloud service, be sure to encrypt everything before uploading it. If you don’t, the cloud service will be able to see all of your data, and could potentially sell it.
Sources: How-To Geek