Tech/Engineering

NAS data protection best practices

Stephen Manley, CTO

NAS data protection continues to be one of the most challenging topics for every organization, large or small. Even as organizations move to cloud, Kubernetes, and SaaS applications, the volume and importance of data on their NAS systems leave them vulnerable to outages, attacks, and audits. Too often, we see decades-old backup architectures struggling to protect some of the most important and vulnerable data. You can do better.

Since each customer uses their NAS systems differently, there is no “one size fits all” answer to NAS backup. There is, however, a path we guide our customers through to help them meet the best practices for protecting their NAS environment. 

In this post, we walk through the steps to protect your NAS systems: the data to gather, how to define requirements, the right solution bundle for each workload, and best practices for NAS backup configuration.

Why NAS, why now?

You are spending too much time, money, and energy on your NAS data protection because everybody is spending too much on NAS data protection. NAS systems will continue to expand over the next decade because even as VMs, databases, and business applications move to the cloud, core NAS workloads — homegrown applications, analytics, and business projects — will grow on-premises. 

The most common issues in NAS data protection are:

  • Protecting data that does not need to be protected.
  • Applying a “one size fits all” policy to all datasets on a NAS system, rather than applying a “per dataset/workload” policy.
  • Using overlapping protection tools, creating redundant copies while failing to provide full protection.
  • Not configuring backups properly.

Each of these issues occurs because the organization has not followed the best practices for NAS data protection. Now is the time to invest in creating the right architecture, so your organization will be well-positioned for the growth to come.

NAS data assessment

Begin by understanding what data is on your NAS system. 

First, you need to understand the types of workloads running on your NAS system. Our customers run everything from home directories to business applications, VMs, and databases on their NAS systems, and they protect each of those workloads differently. Furthermore, when we run a data assessment, we always find some surprises. For example, many database administrators store TBs of database dumps on expensive NAS systems. If the organization protects the NAS system with a common policy, those database backups are stored, snapshotted, replicated, and backed up with the production NAS workloads, making them the most expensive database dumps in the world. 

We recommend running a scanner to identify the types of files you are storing, so you can capture the types of workloads your NAS system supports. You can then select the appropriate protection policy for each workload. 
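In practice, a scan can start as simply as a script that walks a mounted share and tallies file counts and bytes by extension. Below is a minimal sketch in Python, assuming the share is mounted read-only at a hypothetical path; a dedicated scanner or an assessment with Druva will go much deeper, but even this view can surface surprises like stray database dumps.

```python
import os
from collections import Counter

# Minimal workload-discovery sketch: walk a mounted NAS share and tally
# file counts and bytes per extension. MOUNT_POINT is a hypothetical path.
MOUNT_POINT = "/mnt/nas/share1"

counts, sizes = Counter(), Counter()
for root, dirs, files in os.walk(MOUNT_POINT):
    for name in files:
        ext = os.path.splitext(name)[1].lower() or "<none>"
        try:
            size = os.path.getsize(os.path.join(root, name))
        except OSError:
            continue  # file vanished or is unreadable; skip it
        counts[ext] += 1
        sizes[ext] += size

# Largest consumers first; lots of .dmp or .bak here would suggest
# database dumps parked on expensive NAS storage.
for ext, total in sizes.most_common(20):
    print(f"{ext:10s} {counts[ext]:>12,d} files {total / 1e9:>10.1f} GB")
```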

Second, you need to understand the characteristics of the files on the system. This includes the number of files, distribution of file sizes, and the rate of change for the files and data. Protection tools are optimized for different file sizes and change rates, so this information is critical for selecting the right protection solution.

The scanner should be able to capture the file distribution. Most modern NAS systems can calculate differences between snapshots (e.g. NetApp with ‘snap delta’ and Isilon with ‘changelist’). You should run these commands to get hourly, daily, and weekly change rates. Since the commands evolve over time, we encourage you to work with Druva or your NAS vendor to get the exact commands necessary. 
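If you cannot run the vendor commands yet, you can approximate a change rate from a mounted share by checking file modification times, as in the rough, vendor-neutral sketch below (the mount path and window are hypothetical). Note that it counts whole files, so it overstates the change rate for large files that receive small in-place updates, and it misses deletes.

```python
import os
import time

# Rough change-rate fallback: count files modified within a window.
# MOUNT_POINT is a hypothetical path; adjust WINDOW_SECONDS for hourly,
# daily, or weekly rates.
MOUNT_POINT = "/mnt/nas/share1"
WINDOW_SECONDS = 24 * 3600  # files modified in the last day

cutoff = time.time() - WINDOW_SECONDS
changed_files, changed_bytes = 0, 0
for root, dirs, files in os.walk(MOUNT_POINT):
    for name in files:
        try:
            st = os.stat(os.path.join(root, name))
        except OSError:
            continue
        if st.st_mtime >= cutoff:
            changed_files += 1
            changed_bytes += st.st_size  # whole-file size: an upper bound

print(f"~{changed_files:,d} files / ~{changed_bytes / 1e9:.1f} GB changed per day")
```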

Third, it is important to identify the system’s customizations, including protocols (e.g. SMB, NFS, both, or mixed), stubbing data to the cloud, and unusual file usage (e.g. millions of symbolic links, unique ACLs per file, etc.).

The best way to gather this information is to talk to the NAS administrator. Some protection tools cannot support some customizations, so it is critical to know what constraints you face.

Define (and refine) your requirements

You need to get business, legal, and IT organizations to agree on requirements. As you set requirements, do not try to apply a single policy across an entire box. The goal is to apply the appropriate policy to the different types of workloads.

To define the requirements for each workload, ask the following key questions:

  • Recovery time objective (RTO): How quickly do we need to recover from user/application failures, system failures, and site failures?
  • Recovery point objective (RPO): How up-to-date does the recovered copy need to be when we recover from the different types of failures?
  • Retention/deletion: How long do we need to retain different types of data, and when do we absolutely need to delete data (e.g. for privacy regulations)?

NOTE: We used to ask about ransomware protection, but now we just assert that it must be part of the solution.
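One way to keep the answers honest is to record them per workload as data rather than prose, so nobody can quietly fall back to a box-wide policy. The snippet below is purely illustrative; the workload names and numbers are hypothetical placeholders, not recommendations.

```python
# Illustrative per-workload requirements (hypothetical values). Each
# dataset gets its own RTO, RPO, retention, and deletion deadline.
requirements = {
    "home_directories": {
        "rto_hours": 4,             # how fast users must be restored
        "rpo_hours": 1,             # acceptable data loss window
        "retention_days": 365,
        "delete_after_days": None,  # no mandated deletion
    },
    "db_dumps": {
        "rto_hours": 24,            # scratch copies can wait
        "rpo_hours": 24,
        "retention_days": 14,
        "delete_after_days": 30,    # e.g. a privacy-driven deadline
    },
}
```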

To balance the requirements against the costs, the data change rates and file sizes from your assessment become critical. You will want to:

  • Calculate the cost of storing backups for different periods of time.
  • Calculate the network cost of transferring backup data. 
  • Calculate the NAS system I/O requirements for finding and reading backup data.

In the assessment phase, we collected hourly, daily, and weekly change rate information to more accurately assess both the cost of the backup process and long-term retention. The network and I/O costs are calculated from the hourly and daily change rates, and long-term retention storage costs are calculated from the weekly change rates. Customers who “protect everything for seven years” are invariably shocked when they see the bill for long-term retention of high-churn log files and databases that are not subject to regulation.
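A back-of-the-envelope calculation shows why. The sketch below assumes incremental-forever backups sized purely by the weekly change rate, and ignores deduplication and compression; all figures are hypothetical.

```python
# Retention sizing sketch (hypothetical figures, no dedupe/compression).
RETENTION_WEEKS = 52 * 7     # "protect everything for seven years"

def retained_tb(full_tb: float, weekly_change: float) -> float:
    # One baseline copy plus one weekly incremental for the whole period.
    return full_tb + full_tb * weekly_change * RETENTION_WEEKS

print(f"100 TB at 2% weekly churn:   ~{retained_tb(100.0, 0.02):,.0f} TB")  # ~828 TB
print(f" 10 TB log share, 30% churn: ~{retained_tb(10.0, 0.30):,.0f} TB")   # ~1,102 TB
```

Even at a modest 2% weekly change rate, seven years of retention stores roughly eight times the primary capacity, and a small high-churn log share can dwarf its own primary footprint.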

Choose the right solution bundle

To meet data protection requirements, most organizations bundle tools. The challenge is to select the bundle that minimizes cost and overlap, while still meeting the requirements. Remember, you do not need to use the same bundle for the entire NAS system. Apply the appropriate bundle for each workload you are running.

The most commonly used NAS protection tools are:

  • Snapshots: Best RPO/RTO for user errors (98%+ of recoveries). Does not protect against system failures, site disasters, or cyber attacks. Not suited for long-term retention. 
  • Replication: Best RPO/RTO for system and site failures. Does not protect against cyber attacks. Duplicates infrastructure costs. Not suited for long-term retention. 
  • Disk backup: Average RPO/RTO for user errors and system failures. Does not protect against site failures or cyber attacks. Increases infrastructure costs, since it is another box that makes another disk copy. Not suited for long-term retention. 
  • Cloud backup: Variable RPO/RTO (depending on storage tier and network bandwidth) for all failure types. Inexpensive long-term retention with reliable recovery because the data stays online.

NAS protection tools are optimized for different types of failures and retention periods. Some of the most common bundles include:

  • Snapshots + cloud backup: This bundle is optimized for workloads that do not need elite RPO/RTO for disaster recovery. Customers usually choose a higher tier of cloud backup for higher performance full recoveries.
  • Snapshots + replication + cloud backup: This bundle is optimized for workloads that need elite RPO/RTO across all failures. Customers usually choose a lower tier of cloud backup to optimize cost for long-term retention, instead of rapid recovery.
  • Snapshots + disk backup + cloud backup: This bundle is optimized for customers who want to protect against NAS system failure, lack a second site or the budget for replication, but have excess backup appliance capacity. Customers usually leverage the cloud tier in the disk backup solution to make offsite copies and address long-term retention.

The bundles always include snapshots and cloud backup, as they are uniquely powerful for short-term and long-term/offsite protection, respectively. 

Therefore, the real question you face is: “How can I best fill the gap between hot snapshots and cold cloud backups: warm cloud backups, NAS replication, or disk backups?”
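As a thought experiment, that choice can be reduced to a few questions. The helper below is a hypothetical sketch that mirrors the three bundles above; the inputs and logic are illustrative, and a real decision would weigh actual cost figures rather than booleans.

```python
# Hypothetical bundle picker mirroring the three bundles described above.
def choose_bundle(needs_elite_dr: bool, has_second_site: bool,
                  has_spare_appliance: bool) -> str:
    if needs_elite_dr and has_second_site:
        return "snapshots + replication + cloud backup (cold tier)"
    if has_spare_appliance:
        return "snapshots + disk backup + its cloud tier for offsite/retention"
    return "snapshots + cloud backup (warmer tier for faster full recoveries)"

print(choose_bundle(needs_elite_dr=False, has_second_site=False,
                    has_spare_appliance=False))
```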

Configuring the backups

Now that you have chosen a data protection bundle, how can you best configure your backups?

First, you need to choose a backup protocol. Some customers use the Network Data Management Protocol (NDMP), a protocol designed for NAS tape backup, but it does not natively support incremental-forever backups. While some vendors have hacked the format, we do not recommend that you build your last line of defense on reverse-engineered backups. 

We recommend you back up via either NFS or SMB. NFS tends to be 2-3 times faster than SMB. However, if your dataset is accessed via SMB, including mixed-mode access, we recommend you choose SMB to ensure you protect all permissions and attributes of your data. 

Second, you need to select the backup source. If you are replicating, our recommendation is usually to protect the replicated copy. It minimizes the load on the production NAS system and usually extends the available backup window. Before making the decision, however, assess the network paths. You will need sufficient bandwidth from your disaster recovery site to the cloud and, in case of recovery, from the cloud back to the production site. 

Third, you need to configure your backup environment. The factors to consider:

  • Network bandwidth: The data change rate analysis will help identify the network bandwidth requirements to meet the backup window.
  • NAS system load: The data change rate and file analyses will help calculate the NAS I/O performance requirements. The change rate identifies the data to read. Note that small file backups (< 64KB) put roughly twice the load on a NAS system for the same amount of data. Also factor in about 5GB of I/O for every 1 million files in the data set, because the backup needs to run a tree walk to find the changed files (see the sizing sketch after this list). 
  • Unusual datasets: Traversing deep directory trees (100+ levels) or wide directories (1 million+ files in a single directory) will be slower than normal. Your backup will not put as much load on the NAS system or network, but it will take longer. 
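Putting these rules of thumb together, a quick sizing pass might look like the sketch below. All inputs are hypothetical, and the 2x small-file factor and 5GB-per-million-files tree-walk cost are the rough estimates cited above.

```python
# Backup sizing sketch using the rules of thumb above (hypothetical inputs).
changed_gb = 500.0          # daily change rate from the assessment
small_file_fraction = 0.4   # share of changed data in files < 64KB
total_files_millions = 20   # dataset size in files, for the tree walk

read_gb = (changed_gb * (1 - small_file_fraction)       # large files: 1x
           + changed_gb * small_file_fraction * 2       # small files: ~2x
           + 5.0 * total_files_millions)                # tree walk: ~5GB/M files

backup_window_hours = 8
gb_per_hour = read_gb / backup_window_hours
print(f"NAS read load: ~{read_gb:.0f} GB per backup; "
      f"need ~{gb_per_hour:.0f} GB/h (~{gb_per_hour / 3.6:.0f} MB/s sustained)")
```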

With the network and I/O requirements, you can now set up your backup environment. Remember to factor in the protocol (SMB vs. NFS) and unusual datasets when you consider how long the backups will take. 

Getting started

We encourage you to run the NAS data assessment on your own, with a partner, or with Druva. Most customers quickly realize they are either protecting too much data or protecting much of it at the wrong service levels. Even a quick assessment can reveal enough inefficiency to justify looking into NAS data protection more deeply.

Key takeaways

NAS systems will be part of your environment for years to come, so it is time to assess how you protect the data on those systems. Instead of patching decades-old architecture, build a solution that will last for the next decade. By following the steps outlined in this blog, you can achieve best practices for your NAS data protection. 

With insight, clear requirements, modern technology, and a well-configured backup, even NAS data protection can be secure, simple, and low cost. It is time to end the pain of NAS backups and move into the future. We invite you to explore Druva’s cloud platform for NAS backup — visit the NAS backup page of the Druva site to learn more, or download the datasheet.