A printed photo in an album can easily last 20+ years without deterioration.
Digital files are another story altogether: computers fail, hard drives crash, and CDROMs deteriorate.
Reliably and robustly storing digital files used to require an IT staff and a rack of expensive hardware. Recent innovations have made this affordable and easy to do, even for non-nerds.
TL;DR: Make sure you have multiple copies, on separate hardware, and preferably at least one copy offsite. Newer filesystems available on NAS devices can provide additional data integrity safeguards. And remember, RAID doesn’t count.
The old axiom: “3-2-1 backups” #
The “3-2-1” backup strategy from several decades ago was one of the more common strategies for keeping files safe.
This strategy requires:
- at least three different copies of every file, with
- at least two different storage formats, and
- at least one copy offsite.
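As a rough illustration, the three requirements can be written as a small check. This is only a sketch: the `Copy` class, its field names, and the media labels are hypothetical and chosen for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Copy:
    label: str     # where this copy lives, e.g. "laptop" (hypothetical label)
    media: str     # storage format, e.g. "hdd", "bd-r", "cloud"
    offsite: bool  # True if the copy is kept away from home

def satisfies_3_2_1(copies):
    """Return True if the copies meet all three 3-2-1 requirements."""
    enough_copies = len(copies) >= 3
    enough_formats = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_formats and has_offsite

copies = [
    Copy("laptop", "ssd", offsite=False),
    Copy("external drive", "hdd", offsite=False),
    Copy("cloud bucket", "cloud", offsite=True),
]
print(satisfies_3_2_1(copies))
```

Note that nothing in this check says whether any of the copies is still readable.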
Nowadays, though, you can do better.
Why different storage formats? #
If you store your data on different formats, the hope is that the different formats will have different lifespans. When one copy fails before the other, it gives you time to make a new copy.
There are several problems with this line of reasoning, though.
The first issue is that the storage formats available to consumers are typically limited to hard drives and optical storage media (CD-R, DVD-R, or BD-R). The latest optical storage technology (as of July 2020) is multi-layer BDXL Blu-ray, which can store 100GB per disc. External hard drives are regularly in the 6-12TB range, which is 60-120x larger. Backing up a $150 12TB hard drive to optical media would be a herculean and expensive task: it would take 120 BD-R discs and roughly $1,400 in media, using $12 M-Discs.
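The arithmetic behind that estimate is easy to check. The sketch below uses only the figures quoted above (a 12TB drive, 100GB discs, $12 per disc) and assumes decimal units; the function name is made up for this example.

```python
import math

def optical_backup_cost(drive_tb, disc_gb, price_per_disc):
    """Return (disc count, total media cost) needed to mirror one drive."""
    drive_gb = drive_tb * 1000  # assumes decimal units (1TB = 1000GB)
    discs = math.ceil(drive_gb / disc_gb)
    return discs, discs * price_per_disc

discs, cost = optical_backup_cost(drive_tb=12, disc_gb=100, price_per_disc=12)
print(f"{discs} discs, ${cost:,.0f} in media")
```

That is the media cost alone, before counting the time spent burning and labeling each disc.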
Why 3-2-1 isn’t enough: data integrity #
All commonly available storage formats eventually suffer from data degradation or corruption.
CD-Rs with organic dyes may only last a couple of years.
SD cards, when left unplugged, experience “cell voltage drift,” and lose integrity in 5-7 years.
Hard disk drives, commonly called HDD or “spinning rust”, have substantial failure rates after 4-6 years of continuous service. Even when kept offline, HDD data integrity will degrade after 5-10 years due to demagnetization of the iron oxide substrate, or servo and platter motor failures. (Feel fortunate if your decade-old external drive still spins up!)
Data integrity is the big issue that “3-2-1 backups” doesn’t address: even if you’ve got several copies of your data, how do you know that each copy hasn’t suffered from data degradation?
An example of data degradation #
Your photos and videos are composed of hundreds of thousands, millions, or even tens of millions of bits of data.
If a handful of those bits are “flipped” due to storage defects or media degradation, your file can become unreadable.
Here’s an example of an image with only a couple bits flipped:
Note that the rightmost image’s contents are 99.999% correct, yet the file isn’t viewable.
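To get a feel for how small this kind of damage is, here is a minimal sketch that flips a few bits in a buffer and compares the result with the original. The random bytes are only a stand-in for a real photo file, and the helper names are invented for this example.

```python
import hashlib
import random

def flip_random_bits(data, count, seed=0):
    """Return a copy of data with `count` distinct bits inverted."""
    rng = random.Random(seed)
    corrupted = bytearray(data)
    for position in rng.sample(range(len(data) * 8), count):
        corrupted[position // 8] ^= 1 << (position % 8)
    return bytes(corrupted)

def fraction_of_bits_intact(a, b):
    """Fraction of bit positions that are identical in two equal-length buffers."""
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return 1 - differing / (len(a) * 8)

# 100,000 pseudo-random bytes standing in for a photo file.
source = random.Random(1)
original = bytes(source.randrange(256) for _ in range(100_000))
damaged = flip_random_bits(original, count=3)

print(f"{fraction_of_bits_intact(original, damaged):.5%} of bits intact")
print("checksums match:",
      hashlib.sha256(original).hexdigest() == hashlib.sha256(damaged).hexdigest())
```

Almost every bit survives, but the checksum no longer matches. Whether a particular image decoder copes with the damage depends on which bits were hit.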
Error correction codes #
Hard drives and optical disks use error correction codes to provide some amount of resiliency to bit rot. Typical ECC allows for several bits in a file to flip, be detected, and be automatically repaired.
ECC can only detect and repair a specific rate of bit rot, however. Data degradation in older storage media may exceed this rate.
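A toy example shows both the repair and the limit. The sketch below uses a Hamming(7,4) code, which is much simpler than the codes real drives and discs use, and is here only as an illustration: it repairs any single flipped bit in a 7-bit codeword, but two flips push it past what it can handle.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as the codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(codeword):
    """Return (data bits, 1-based position that was corrected, or 0 if none)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1  # flip the bit the syndrome points at
    return [c[2], c[4], c[5], c[6]], error_position

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)

one_flip = list(codeword)
one_flip[4] ^= 1
print(hamming74_decode(one_flip)[0] == data)   # single flip: repaired

two_flips = list(codeword)
two_flips[4] ^= 1
two_flips[5] ^= 1
print(hamming74_decode(two_flips)[0] == data)  # double flip: silently wrong
```

The second case is the worrying one: the decoder "repairs" the wrong bit and returns bad data without reporting a problem.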
Treating the symptom #
In developing PhotoStructure, we found that many of our photos on older hard drives had succumbed to some amount of bit rot. We taught PhotoStructure how to detect and skip over photos and videos that are corrupt, in the hope that you’ll have several copies of a given photo, and one of them won’t have bit rot.
But this feature just treats the symptom: this doesn’t fix the underlying problem.
Overcoming bit rot #
Several advanced filesystems, including btrfs and ZFS, support data scrubbing, which detects and repairs bit rot automatically.
Unfortunately, neither Windows nor macOS supports these filesystems (and their newer filesystems, like APFS, still don’t detect bit rot). Time Machine and Backup and Restore don’t detect bit rot either.
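The core idea behind scrubbing can be sketched in a few lines: record a checksum for everything while it is known to be good, then periodically re-read it all and compare. This is a simplified, file-level stand-in, not how btrfs or ZFS are implemented, and it can only detect damage, not repair it. The function names and the manifest format are assumptions made for this example.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def scrub(root, manifest):
    """Return relative paths that are missing or no longer match the manifest."""
    root = Path(root)
    damaged = []
    for relative_path, expected in manifest.items():
        candidate = root / relative_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            damaged.append(relative_path)
    return damaged
```

A check like this cannot tell bit rot from a deliberate edit, and it needs a known-good copy elsewhere to recover from what it finds.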
So: how do normal people use these fancy new filesystems?
NAS to the rescue #
Network-attached storage (NAS) devices hold several large hard drives and quietly do their work safely storing your files. You can keep using your favorite OS, but you don’t have to worry about bit rot anymore.
Most NAS also support Docker, which is a way of packaging and running third-party applications, like PhotoStructure.
That sounds great. Which NAS should I get? #
Synology’s proprietary hardware costs more than building a NAS yourself, and Docker support via their UI is a bit limited (which can be overcome by using a tool like Portainer).
Unraid OS runs on hardware you already have, supports XFS and btrfs, and has great Docker support, but requires a commercial license to use. It’s very flexible in adding and removing drives.
TrueNAS CORE (previously “FreeNAS”) is free and fairly easy to install. It uses ZFS, and supports a wide variety of hardware. If building a computer is intimidating and you’d like to try TrueNAS CORE, the TrueNAS Mini comes pre-assembled.
Note that expanding existing volumes by adding additional disks (which is simple on Synology and Unraid) is not fun on FreeNAS, due to limitations in ZFS. FreeNAS natively uses FreeBSD jails, which are technically excellent but decidedly less popular than Docker. More recent versions of FreeNAS have added Docker support.
SnapRAID is different from these other alternatives, in that it’s not an operating system nor a filesystem: it’s an open-source application that runs on Linux, macOS and Windows.
SnapRAID is the only non-realtime data integrity product here: it must be run periodically to build snapshots or check and repair data integrity errors. SnapRAID requires command-line expertise and manual setup.
(PhotoStructure has no commercial affiliation with any of these products. We’d caution against cheaper NAS brands, however, as lax security updates may lead to malware infections and data loss.)
Getting started with your NAS #
Consider getting a NAS that supports 4 or more drives. More slots for more drives give you a lot more flexibility in the future, should your data exceed your original drives. If you’re building a box to run Unraid, Fractal makes some nice cases: the new Define 7 and Meshify 2 support 12+ HDDs (!!)
Buy 50% more storage than you need right now so you have room to grow in the near future.
Enable weekly data scrubbing.
Enable snapshotting, if available.
Enable monthly S.M.A.R.T. self-tests.
Set up your NAS to either apply security patches automatically or notify you to do so.
Consider installing a virus scanner and malware detection package on your NAS. Synology has a “security audit” tool as well.
Make sure your router has recently-updated firmware.
Use secure admin and user passwords. Enable 2FA if available. Using a password manager like Bitwarden makes this easy to do.
Configure your NAS to tell you if it has any errors: you don’t want any disks dying or backups failing without you knowing it.
How to get “at least 3 copies” #
Please understand: RAID is not a backup. Please read that post before continuing.
Satisfy the 2nd or 3rd copy of the “at least 3 copies” rule by copying the entire contents of your NAS to an external hard drive that normally stays powered down and offline. Do this quarterly.
Consider storing this drive near your emergency kit so you can grab it as you leave your house in case of an emergency.
This external drive can also reduce your data loss if your NAS catastrophically fails, or if you get hit by malware like cryptolockers.
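If your NAS doesn’t offer a built-in job for this, the quarterly copy can be scripted. The sketch below copies a folder to the external drive, then re-reads both sides and compares checksums. The function names and the mount points in the usage comment are hypothetical, and the approach is an assumption, not a feature of any particular NAS.

```python
import hashlib
import shutil
from pathlib import Path

def file_checksum(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_and_verify(source, destination):
    """Copy source into destination, then return relative paths that differ."""
    source, destination = Path(source), Path(destination)
    # dirs_exist_ok (Python 3.8+) lets repeat runs reuse the same folder.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    mismatched = []
    for original in sorted(source.rglob("*")):
        if original.is_file():
            relative_path = original.relative_to(source)
            copy = destination / relative_path
            if not copy.is_file() or file_checksum(original) != file_checksum(copy):
                mismatched.append(str(relative_path))
    return mismatched

# Hypothetical usage; adjust both paths to your own mount points:
# problems = backup_and_verify("/mnt/nas/photos", "/mnt/external/photos")
# print(problems or "all files verified")
```

Reading a file straight after writing it may be served from the operating system’s cache, so a second verification pass after remounting the drive is a stronger check.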
How to get “at least 1 copy offsite” #
Satisfy the “at least 1 offsite” rule by setting up your NAS to back up to a cloud service automatically. Backblaze and tarsnap are both well-regarded offsite storage solutions, and both have solutions that work with your new NAS.
If you don’t want to pay for cloud storage, you can set up another NAS somewhere else (like at a friend’s or family member’s house). Both FreeNAS and Synology support NAS-to-NAS replication.
Make sure you configure the replication job to run in the middle of the night, and throttle network bandwidth so you don’t make your friends or family grumpy.
How do I back up files on my phone? #
Resilio Sync (for iOS and Android), Syncthing (for Android), and PhotoSync (for iOS and Android) will automatically back up your phone to your NAS at home.
You install the software on both your phone and your NAS, and then configure your phone to automatically back up to your NAS.
We recommend this approach, rather than uploading to a cloud service, simply to ensure your original files stay intact.
Howdy, hacker news visitors! #
There’s a discussion of this page here: https://news.ycombinator.com/item?id=25902030
And now that your files are safe… #
You’re welcome to try PhotoStructure, a self-hosted photo management solution that runs on your desktop, server, or NAS, using Docker.