
NAS backup — overcoming innate architectural limitations

Stephen Manley, CTO

“Backup is broken!” raged a large NAS user. They had too many files, too much data, and not enough time. The customer’s business depended on their data, and it was not protected. NAS backup challenges are neither unique nor new. It was 1997 when I met the infuriated NAS customer at NetApp HQ. They couldn’t back up a 40GB server that held 1 million files. Since storage capacities have grown exponentially, the NAS backup problem has too.

Still, there is hope. After two decades, we understand why NAS is so difficult to protect, the architecture you need to succeed, and how to solve your NAS backup pain.

NAS backup is a big problem

There are three challenges with protecting NAS servers:

  1. The data is really important
  2. It is slow and expensive to back up
  3. It is even slower and more expensive to recover

Customers use NAS for everything from home directories to virtual machines to databases to custom applications. The workloads are so varied that most NAS administrators do not fully understand what is running on their systems. They do know, however, that file count, capacity, and business risk trend in only one direction — growth.

Since NAS servers are so important, organizations endure acute pain trying to protect them. NAS backups now take so long that some customers cannot meet a 60-hour backup window (Friday night → Monday morning). Even worse, the backup process often decimates the NAS server’s performance, which is why NAS administrators grumble about backup being their most intensive workload.

While data backup is painful, data restore is agonizing. One customer joked that the best part of glacial NAS recoveries is that “I have enough time to update my resume, interview, and land a new job before anybody realizes that the data is never coming back.” 

Metadata sets NAS apart

While datasets everywhere — objects, VMs, NoSQL databases — are growing, NAS protection remains uniquely painful.

Metadata, the information about your data (e.g. file names and access permissions), sets NAS apart from everything else in your environment. NAS servers store and use more metadata than any other system. To read a file, the server must walk the directory tree to reach your file, load the file’s security information to ensure you are allowed to access it, and only then read the data blocks. That is a lot of metadata overhead! Moreover, files are smaller than VMs, database tables, or objects. More metadata on smaller objects means that the density of metadata on a NAS server is dramatically higher than on any other data source.
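
To make that overhead concrete, here is a minimal sketch in Python (standard library only) of the metadata operations a file server performs before it returns the first data block. The step count and the `read_file_with_metadata_trace` helper are illustrative assumptions, not how any particular NAS implementation works:

```python
import os
import stat

def read_file_with_metadata_trace(path):
    """Count the metadata operations performed before a single data
    block of `path` can be read (POSIX-style paths assumed)."""
    metadata_ops = 0

    # 1. Walk the directory tree: one lookup per path component.
    current = os.sep
    for part in [p for p in path.split(os.sep) if p]:
        current = os.path.join(current, part)
        os.lstat(current)              # directory entry / inode lookup
        metadata_ops += 1

    # 2. Load the file's security information (permissions, ACLs).
    st = os.stat(path)
    readable = bool(st.st_mode & stat.S_IRUSR)
    metadata_ops += 1

    # 3. Only now read the actual data blocks.
    data = b""
    if readable:
        with open(path, "rb") as f:
            data = f.read(4096)        # first data block

    return metadata_ops, len(data)

ops, nbytes = read_file_with_metadata_trace(os.path.abspath(__file__))
print(f"{ops} metadata operations before {nbytes} bytes of data were read")
```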

Metadata is the root cause of NAS backup pain

NAS’s metadata is the root cause of backup performance issues because it puts so much pressure on the I/O layer. Storage devices can process a limited number of data reads and writes per second (Input/Output Operations Per Second, or IOPS). While performance varies by CPU, storage media, and file system, there is always a limit to the number of IOPS. Since almost every system stores metadata and data together, every metadata read consumes IOPS that could have been used for data reads and writes.

The metadata challenge stresses every layer of the NAS backup environment: the NAS backup process, the backup infrastructure, and the NAS restore.

Each NAS backup generates a barrage of reads that can swamp the NAS server. For each file that it backs up, there are accompanying metadata reads from traversing directory trees and extracting security information (e.g. Access Control Lists). Just by walking the file system, the backup consumes so many IOPS that there are not enough left to back up the data or serve the active workloads.
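
The sketch below gives a rough feel for that load: it counts the metadata reads a naive full-backup walk would issue before copying any data. The per-file operation count and the IOPS budget in the closing comment are assumed, illustrative numbers, not measurements from a real NAS server:

```python
import os

def estimate_backup_metadata_load(root, metadata_ops_per_file=3):
    """Rough, illustrative estimate of the metadata reads a full backup
    issues just by walking a tree. `metadata_ops_per_file` is an
    assumption (directory entry lookup, attributes, ACL); real servers vary."""
    dirs = files = 0
    for _, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return dirs, files, dirs + files * metadata_ops_per_file

# Point this at a representative subtree on your own system.
d, f, meta = estimate_backup_metadata_load(os.path.dirname(os.path.abspath(__file__)))
print(f"{d} directories, {f} files -> ~{meta} metadata reads before any data moves")

# Back-of-the-envelope (hypothetical numbers): 500 million files at
# 3 metadata ops each is 1.5 billion reads; at a 50,000 IOPS budget,
# that alone consumes roughly 30,000 seconds (~8.3 hours) of I/O.
```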

Even worse, many backup products cannot optimize NAS backup performance because the metadata pushes them past their breaking point. While NAS servers struggle to manage one copy of the metadata, users expect their backup system to manage dozens of copies of the NAS server’s metadata (i.e. every backup). Since most (> 98%) of the files on a NAS server never change, backing up only changed data (incremental-forever) dramatically reduces both the backup time and NAS server load. To enable recovery, however, the backup software must create a full copy of the NAS metadata from that point in time. Unfortunately, most backup products cannot process the metadata at that scale, so they can’t support incremental-forever NAS backups. Instead, they force the backup team to run expensive weekly or monthly full NAS backups (the kind that can take more than 60 hours).
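
For illustration, here is a minimal sketch of the bookkeeping behind incremental-forever backups with synthesized fulls, assuming a simple in-memory catalog keyed by path. The `FileMeta` structure and change-set format are hypothetical; a real backup product must persist this metadata at a vastly larger scale, which is exactly where most products break:

```python
from dataclasses import dataclass

@dataclass
class FileMeta:
    path: str
    size: int
    mtime: float      # modification time
    chunk_ids: tuple  # references into the deduplicated data store

def synthesize_full(previous_catalog, change_set):
    """Build a complete point-in-time catalog from the previous full
    catalog plus one incremental change set.

    previous_catalog: dict {path: FileMeta} for backup N-1
    change_set: {"upserts": [FileMeta, ...], "deletes": [path, ...]}
    """
    # Unchanged entries (typically > 98% of files) carry over untouched.
    # At hundreds of millions of entries, even this copy is the hard part,
    # which is why a dedicated metadata store matters.
    catalog = dict(previous_catalog)
    for meta in change_set["upserts"]:    # new or modified files
        catalog[meta.path] = meta
    for path in change_set["deletes"]:    # files removed since last backup
        catalog.pop(path, None)
    return catalog

# Example: two files existed; one changed, one was deleted, one was added.
full_0 = {
    "/home/a.txt": FileMeta("/home/a.txt", 10, 1.0, ("c1",)),
    "/home/b.txt": FileMeta("/home/b.txt", 20, 1.0, ("c2",)),
}
changes = {
    "upserts": [FileMeta("/home/a.txt", 12, 2.0, ("c3",)),
                FileMeta("/home/c.txt", 5, 2.0, ("c4",))],
    "deletes": ["/home/b.txt"],
}
full_1 = synthesize_full(full_0, changes)
print(sorted(full_1))   # ['/home/a.txt', '/home/c.txt']
```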

NAS metadata even slows recovery. The NAS server bottlenecks large restores, regardless of the data source (e.g. cloud, disk, or tape), because it cannot create the directories and files fast enough. Since the NAS server was not built to write the metadata for hundreds of millions of files at once, a backup product’s “hero numbers” do not apply to NAS. Your restore will run only as fast as the NAS system allows.

A metadata-optimized backup architecture

Since metadata constrains NAS backup performance at every point in the process, the backup architecture must take an innovative approach to metadata.

The foundation of the next-generation NAS backup architecture is a high-performance backup metadata store. The traditional approach of storing metadata with the data on fixed resources does not work. Therefore, a modern architecture splits data from metadata and enables dynamic resource scaling. By separating out the metadata, the architecture can independently optimize both metadata and data operations. With dynamic scaling, the system can support the most intense workloads, without permanently overprovisioning resources.
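
As a toy illustration of that split, the sketch below keeps file metadata in a small indexed database while file contents go into a content-addressed object store. SQLite and an in-memory dict stand in for what would really be a scalable database and cloud object storage; none of this reflects any specific product’s internals:

```python
import hashlib
import sqlite3

# Metadata store: small, random, indexed reads and writes.
metadata_db = sqlite3.connect(":memory:")
metadata_db.execute(
    "CREATE TABLE files (path TEXT PRIMARY KEY, size INTEGER, object_id TEXT)"
)

# Data store: large, content-addressed objects (a dict as a stand-in).
object_store = {}

def backup_file(path, content: bytes):
    """Write data and metadata to independently scalable stores."""
    object_id = hashlib.sha256(content).hexdigest()
    object_store.setdefault(object_id, content)           # data path
    metadata_db.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?, ?)",  # metadata path
        (path, len(content), object_id),
    )

backup_file("/home/report.txt", b"quarterly numbers")
row = metadata_db.execute(
    "SELECT object_id FROM files WHERE path = ?", ("/home/report.txt",)
).fetchone()
print(object_store[row[0]])   # b'quarterly numbers'
```

Because the two stores are separate, metadata queries never compete with bulk data transfers for the same IOPS, and each side can be scaled up or down on its own.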

The next-generation architecture builds optimized backups on top of the metadata-optimized backup store. The first optimization is to run incremental-forever NAS backups and synthesize them into full backups. The backup metadata store can process any size backup set, as often as the customer wants. The second optimization is to parallelize the backup. When the NAS server has resources available, the backup should process files as quickly as possible, so the backup process should run across multiple directories at the same time. The back-end can then synthesize the separate metadata streams into one tree. A metadata-optimized backup store enables these front-end optimizations to reduce the impact and duration of backups.
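
A minimal sketch of that idea, assuming the per-directory manifests can simply be merged at the end: each top-level directory is walked by its own worker, and the partial results are combined into one catalog. The worker count, manifest format, and merge step are all illustrative assumptions:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def scan_directory(subtree):
    """Back up one subtree independently and return its partial manifest."""
    manifest = {}
    for dirpath, _, filenames in os.walk(subtree):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                manifest[path] = os.stat(path).st_size
            except OSError:
                pass   # file vanished or unreadable; skipped in this sketch
    return manifest

def parallel_backup(root, workers=8):
    """Walk top-level directories concurrently, then merge the partial
    manifests into one catalog (the synthesized tree)."""
    subtrees = [os.path.join(root, d) for d in os.listdir(root)
                if os.path.isdir(os.path.join(root, d))]
    catalog = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(scan_directory, subtrees):
            catalog.update(partial)   # back-end merges the partial metadata
    return catalog

# Example root; point this at your own share or test tree.
catalog = parallel_backup(os.path.expanduser("~"))
print(f"{len(catalog)} files cataloged")
```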

The metadata-optimized architecture also improves recovery performance. Customers can now easily search for the files they need to restore and prioritize recovering those. They can even restore different data sets to different servers to maximize the parallelism of the recovery. By optimizing for metadata, the next-generation backup architecture transforms how customers protect and recover their NAS servers. It also lays the groundwork for subsequent value: content-based search, cross-platform recovery, and even NAS disaster recovery in the cloud.
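
For example, with a metadata catalog like the ones sketched above, a restore plan can select just the files that match a search and spread them across several destination servers. The `plan_restore` helper, the catalog format, and the target names below are hypothetical:

```python
import fnmatch

def plan_restore(catalog, pattern, targets):
    """Select only the files matching `pattern` from the backup catalog
    and spread them round-robin across restore `targets` so the recovery
    can run in parallel. Catalog format: {path: size}. Illustrative only."""
    selected = sorted(p for p in catalog if fnmatch.fnmatch(p, pattern))
    plan = {t: [] for t in targets}
    for i, path in enumerate(selected):
        plan[targets[i % len(targets)]].append(path)
    return plan

catalog = {
    "/finance/q3/report.xlsx": 120_000,
    "/finance/q3/notes.txt": 2_000,
    "/eng/build.log": 900_000,
}
plan = plan_restore(catalog, "/finance/*", ["nas-a", "nas-b"])
print(plan)   # only the finance files, split across two restore targets
```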

Druva — a metadata-optimized NAS backup system

Druva has a metadata-optimized NAS backup architecture.

Druva began with a cloud-native, high-performance backup store. It stores the metadata in a database that is tuned for small, random I/O accesses. Meanwhile, the system stores the data in objects, where the IOPS are used only for processing data. The system then uses AWS’s dynamic resource allocation to scale metadata and data resources up and down. The system does not hit scaling limits and allocates only the resources it needs.

Druva NAS backup enables customers to run incremental-forever backups that move only changed data. Druva’s customers also run multiple concurrent NAS server backups, so they can protect even the largest systems within their backup windows.

When it comes time to restore data, Druva pulls back only the data the customers need. This enables the customers to respond to their users, application owners, or legal team quickly and efficiently.

Conclusion

It is time to end a quarter-century of NAS backup frustration.

With the right architecture, the discussion becomes simple. Instead of constantly tuning to work around product limitations, the solution just works. It is simple to understand, deploy, and manage.

With the right architecture, NAS metadata can even shift from a challenge to an asset. A metadata-centric backup architecture in the cloud can solve the backup challenges and put your company in a position to extract even more value from your NAS data. It’s time for a new NAS architecture because as one NAS admin exulted after using Druva, “NAS Backup is no longer broken!”

Leave behind complex and costly data protection solutions that aren’t built for the cloud. Learn more about the Druva Cloud Platform and how you can start saving time and money with your data protection.