
NAS Adventures: Switching to a tiered storage solution

Sat Dec 03 2022 


Clear and Black Cassette Tape on Brown Wooden Surface

I believe everyone who plans on storing files for the long term convinces themselves they'll remember where and how everything is stored, or at least be able to reconstruct it later. Like a new garage or a storage cellar, we tell ourselves - "not this time. I'm going to make sure everything stays neat and tidy."

Instead, what inevitably happens is that we start loading our crud into it. Smaller things whose value is harder to quantify end up scattered across the storage space like lost trinkets, making finding anything specific more like a memory guessing game than an easy chore.

I, like I imagine many others, ended up in this same situation. But instead of being isolated to the physical confines of my storage closet, it bled into my digital life as well. This became a huge problem when I tried to figure out how to do backups effectively. Important files weren't in one place; they were scattered across several different storage areas on my NAS.

Important family photos? Make sure you include the folder nested alongside some disposable DVD rips, or you might lose a year's worth of your daughter's childhood memories. That's the brutal reality of what can happen when backups take place without understanding not just where things are, but why they're there.

If you don't want the fluff, you can skip to "The three tiered NAS backup solution" near the end of the article for what I did to allow for better backups.

How to back up the chaos 

brown and green houses under blue sky during daytime
"This is literally me omfg lmao" - me probably 

When I started making regular backups of my family photos, documents, etcetera, I did what any normal sysadmin would probably do and built a recurring script on my server. It took a list of directories, archived them using a backup solution (which in my case is borg), and then wrapped the whole thing in a systemd service and timer, which ran nightly.
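For illustration, here's a rough sketch of what that kind of setup looks like; the paths, repository location, and archive naming are hypothetical, not my actual configuration:

```python
#!/usr/bin/env python3
"""Nightly backup sketch: feed a static list of directories to borg.

Everything here (paths, repo URL) is hypothetical; the real script
differed, but the shape was the same -- including the hard-coded list
that has to be edited every time data moves.
"""
import subprocess

# The static path list that quietly rots as directories get reorganized.
BACKUP_DIRS = [
    "/srv/nas/photos",
    "/srv/nas/documents",
    "/srv/nas/media/keep",
]

REPO = "ssh://backup-host/~/borg-repo"  # hypothetical remote repository

subprocess.run(
    ["borg", "create", "--stats", "--compression", "zstd",
     f"{REPO}::nightly-{{now:%Y-%m-%d}}", *BACKUP_DIRS],
    check=True,
)
```

A systemd service runs the script, and a timer with something like OnCalendar=daily triggers it each night.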

I'd give a code example, but the point I'm making is that this method was actually horrible and a nightmare to maintain. And no, I'm not talking about having to occasionally spend a few moments updating the script so the next backup picked up the other little bits that might have changed.

I'm talking about deduplication and how I was fighting it for no reason. Sure, having a static list of file paths that could change at any moment and cause the archiving job to possibly fail is problematic. But moving files around in a deduplicating backup can have even more problematic consequences.

Deduplicated backups are smarter than linear, statically loaded backups. In borg's case, it keeps a flattened dictionary of file entries and tracks their content signatures (along with other important metadata like dates and names). When a file gets renamed, modified, or plopped into a different place in the filesystem structure, borg can use an array of checks to see whether it's actually a new file or just a permutation of an existing one.

This saves a LOT of bandwidth when your backups only have incremental differences. Instead of comparing the entirety of two disks to ensure parity, as a sync-based or linear backup would, it just uses that dictionary to check for differences.
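To make the idea concrete, here is a toy sketch of content-based deduplication. It uses fixed-size chunks and a plain in-memory dict; real tools like borg use content-defined chunking, encryption, and a persistent index, none of which this attempts.

```python
import hashlib

# Toy chunk store: maps chunk hash -> chunk bytes. Storing a file means
# storing only the chunks whose hashes we haven't seen before.
chunk_store: dict[str, bytes] = {}

def store(data: bytes, chunk_size: int = 4096) -> list[str]:
    """Split data into fixed-size chunks and store only the new ones."""
    refs = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)  # no-op if already stored
        refs.append(digest)
    return refs

original = b"family photo bytes " * 1000
refs_a = store(original)   # first backup: every chunk is new
refs_b = store(original)   # same content "moved" elsewhere: nothing new stored
assert refs_a == refs_b    # identical references, zero extra storage
```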

Cool, right? So what's the problem, actually? Is there even a problem, or is this all in my head?

Deduplicated backups have a quietly documented disadvantage, and that's the cost of processing. When a directory structure isn't stable, when files are being plopped from one place in the tree to another, or worse, when they're being moved AND having their metadata updated by external software, borg (or really any good deduplicating backup system) still has to run its signature and metadata checks. Depending on the severity, this can turn a speedy backup into a slow one: it has to decrypt the existing bundles, compare the other metadata, decide whether the modification is worth an update, and then finally reconcile all of this with a process called rechunking.

All of my backups use encryption, but not all of my backups are done over directly attached storage. This becomes a real issue when synchronizing data with low-powered hardware, such as a Raspberry Pi 3, where the network and USB (disk) interfaces effectively share the same bandwidth. Even over SSH, the amount of effort needed to push and pull object signatures over a connection like that is miserable, and it was happening frequently as the structures on my NAS changed and files got moved around by certain Docker images. Things were getting out of hand, very fast.

Tiered storage 

I needed to clean house. This deduplication process was supposed to be saving me bandwidth, not costing me extra time. Thankfully, the problems I encountered were good indicators of what a favorable backup situation actually needs.

Now normally, if you asked a tech-savvy person what they thought about the above, more likely than not the answer you would get would be some form of "just practice, get good at naming things properly and make it a habit." If you spend time researching this topic, like I did, you will probably be faced with the same conclusion.

I was not satisfied with this. There had to be a better way...

An oversimplified overview of classical storage tiers in servers 

In high performance storage solutions, there are typically ~two~ three tiers: a hot tier of fast, expensive storage for frequently accessed data; a warm tier of slower, cheaper storage for data touched only occasionally; and a cold tier of the slowest, cheapest storage for archival data that is rarely read.

Taking the above and applying it to backups 

After looking at the above, I basically derived the following conclusions:

And from there, my tiered backup solution was born:

The three tiered NAS backup solution 

Three Piled Books on White Wooden Table
I am the Bookman. My book is delicious. 

On a 2.5 gigabit network, the maximum transfer speed is roughly 312 megabytes per second. A normal SATA-3 SSD (which is what my NAS uses) can theoretically max out at roughly 600 megabytes per second, which means storage speed is not the bottleneck there; the network is.
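The back-of-the-envelope math behind those numbers:

```python
# Link speed vs. SATA-3 ceiling, ignoring protocol overhead above the link layer.
network_mb_s = 2.5 * 1000 / 8    # 2.5 Gbit/s -> 312.5 MB/s
sata3_mb_s = 6 * 1000 / 10       # 6 Gbit/s with 8b/10b encoding -> ~600 MB/s usable
print(f"2.5 GbE: {network_mb_s:.0f} MB/s, SATA-3: {sata3_mb_s:.0f} MB/s")
```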

Knowing this, in terms of hardware I broke my NAS into two primary tiers:

And then, I broke these mediums into their own tiers:

With this structure, my backups have not only become much easier to maintain, but the logical file layout has also helped me track the priority and persistence of my digital content. My highly important documents and photos live on a much more robust medium, and my disposable files live on a much more disposable one.
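As a purely hypothetical illustration of why this helps (the directory names and schedules below are made up, not my actual layout), each logical tier can carry its own backup policy, so one stable top-level directory per tier replaces the fragile hand-maintained path list from before:

```python
# Hypothetical tier layout: each tier maps to its own backup policy.
TIERS = {
    "/srv/nas/tier1-critical": {"schedule": "nightly", "offsite": True},    # photos, documents
    "/srv/nas/tier2-replaceable": {"schedule": "weekly", "offsite": False}, # re-downloadable media
    "/srv/nas/tier3-disposable": {"schedule": None, "offsite": False},      # scratch data, DVD rips
}

for path, policy in TIERS.items():
    if policy["schedule"] is None:
        print(f"skip {path}")
    else:
        print(f"back up {path} ({policy['schedule']}, offsite={policy['offsite']})")
```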

Bonus - LTO tapes 

There's a way to get the cost benefit of hard drives without all of the mechanical liabilities. In the future I plan on experimenting with enterprise LTO tapes for long-term at-rest backups. Instead of relying on an always-on server sync, I can give people I trust some LTO tapes and check on them once in a while. There will be a post once this happens.
