
Target dedupe’s time has come and gone

September 20, 2019 W. Curtis Preston, Chief Technologist

Target deduplication appliances like Data Domain solved a number of difficulties with backups yet also introduced new challenges. This blog outlines the challenges Data Domain and its competitors were trying to solve and whether target dedupe appliances still fulfill a standalone need in companies today.

This is the third in a series of blog posts on the challenges presented by various data protection systems. My first two posts focused on traditional systems like NetBackup, Commvault, TSM, and NetWorker. The challenges of those systems include:

  • Sizing your initial deployment
  • The capital purchasing process
  • OS and backup software maintenance
  • Managing multiple vendors
  • The impossibility of tape performance tuning
  • Tapes getting lost by you or your vaulting vendor
  • Scaling the system to handle additional load
  • Unexpected costs throughout the year

If you haven’t read them, check out part one and part two of this series, which explain these challenges in greater detail.

As mentioned in previous posts, tape drives must be supplied with a constant stream of data at a particular speed in order to perform properly – a nearly impossible task. In contrast, a disk array can accept multiple simultaneous backups, each running at its own speed, without impacting restore speed the way tape multiplexing does. But since disk is much more expensive than tape, it was mainly used only for staging, which helped backups but didn’t help restores. Customers wanted to store all of their backups on disk, but couldn’t afford to. It was this challenge that Data Domain and its competitors decided to address.

Two Approaches to Deduplication: Revolutionary Replacement or Evolutionary Augmentation

Avamar (then called Undoo) was the first deduplication system I looked at. I actually helped develop their TCO model, which is how I know that Avamar technology and pricing only made sense if you did what most call a forklift upgrade. It required abandoning your favorite backup product, your tape library, your disk arrays that you might be using for disk staging – even your backup server itself. Throw out all that and replace it with Avamar.

The story was strong: an entire disk-based backup system that could also be sent off-site via replication. Sounds amazing! How much does it cost? The answer was a lot, and you were still left with a system whose OS, application, and disk system you had to maintain. (Interestingly enough, this is very similar to the story told by Rubrik, Cohesity, and Actifio, which I will cover in a future post.)

Data Domain (and others) had a different idea: what if you put deduplication technology inside an NFS/SMB server, let customers store whatever backups they want, and then deduplicate those backups in the appliance? You could then copy the backups to tape or replicate the deduplicated backups to another appliance. Where Avamar was revolutionary, Data Domain was evolutionary. You could evolve your backup system without the forklift upgrade Avamar required. You could even keep using tape where it made sense (primarily off-site backups). This evolutionary approach took hold, and seemingly overnight almost everyone had a target deduplication appliance of one kind or another. The tape backup market would never be the same.
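To make the idea of deduplicating inside the appliance concrete, here is a minimal sketch in Python. It is not how Data Domain or any other product is implemented: the DedupeStore class, the fixed 4 KB chunk size, and the use of SHA-256 are all assumptions chosen for illustration (real appliances generally use more sophisticated, often variable-size, chunking). It only shows the basic principle that each unique chunk is stored once and each backup becomes a list of chunk fingerprints.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen only for illustration


class DedupeStore:
    """Toy stand-in for a target dedupe appliance (illustrative, not a real product API)."""

    def __init__(self):
        self.chunks = {}   # fingerprint -> chunk bytes, each unique chunk stored once
        self.recipes = {}  # backup name -> ordered list of fingerprints

    def write(self, name: str, data: bytes) -> None:
        """Accept a full backup stream and keep only chunks not seen before."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        self.recipes[name] = recipe

    def read(self, name: str) -> bytes:
        """Rebuild a backup from its list of fingerprints."""
        return b"".join(self.chunks[fp] for fp in self.recipes[name])

    def physical_bytes(self) -> int:
        """Bytes actually held on disk after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


# Two "full backups" that differ in a single chunk.
monday = b"".join(bytes([i]) * CHUNK_SIZE for i in range(50))
tuesday = monday[:-CHUNK_SIZE] + bytes([99]) * CHUNK_SIZE

store = DedupeStore()
store.write("monday", monday)
store.write("tuesday", tuesday)

logical = len(monday) + len(tuesday)
print(f"logical bytes written: {logical}")
print(f"physical bytes stored: {store.physical_bytes()}")
```

In this toy case the two backups share 49 of their 50 chunks, so the store holds 51 chunks instead of 100, and either backup can still be restored in full with read().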

Data Domain, Exagrid, Falconstor, IBM, Quantum, Sepaton, NEC, and others all had a piece of this market. Data Domain didn’t have the biggest or fastest appliances, but they worked, and they had a great marketing and sales engine behind them. A few years later, both Avamar and Data Domain were acquired by EMC, and as of this writing, Data Domain is still the most popular target deduplication system.

I have great respect for what Data Domain accomplished in such an entrenched marketplace. I’ll first explain what they did right, then discuss some challenges they introduced, and finally explain why I wonder if the time of target dedupe has come and gone.

Disk or tape, backups got better

Data Domain and their competitors made backups much better for many customers. Their dedupe appliances accepted backups regardless of speed and could copy those backups to tape much faster since they were cached on local disk. Customers still using tape for offsite had faster backups and copies. Some customers went all in and completely got rid of tape by replicating deduplicated backups to an off-site appliance. The main barrier to that configuration was cost, which is why many customers have yet to go down this route.

A demon in the shadows: easy access for ransomware attacks

At the same time, putting backups on an NFS/SMB mount removed a level of protection that tape backup environments always had. It isn’t possible to delete or corrupt backups stored on an offline tape, but it is very easy to do so with backups stored on disk that is directly accessible from the backup server. So while target dedupe appliances made backups better, they also introduced a new risk to customers’ data.

If a backup server is compromised and a hacker or disgruntled employee gains the appropriate level of access, he or she can easily corrupt or delete all backups stored on disk. If those backups are then replicated to another appliance, even the remote copy of backups can be deleted or corrupted.

Sadly, many ransomware victims are finding their backups are being attacked via this or similar attack vectors. You can see this in a tiny phrase in many of the stories about ransomware attacks: “their backups were also affected.” This is how many companies and cities find themselves paying the ransom.

It is possible to address this risk via advanced configuration designs. Unfortunately, most customers of target deduplication appliances are completely unaware their backups are at risk; therefore, they make no attempt to make them more secure. Others are aware, but it becomes just one of many information security problems they must address.

Are standalone target dedupe appliances still necessary?

Target deduplication appliances like Data Domain made backups better for many customers while allowing them to keep their favorite backup software. But it is incredibly inefficient and costly to create a backup in NetBackup, Commvault, or Veeam and then pass it to an appliance to dedupe. Because the full backup typically crosses the network before duplicates are removed, this approach uses much more bandwidth, and is more computationally complicated, than deduplicating up front as a source dedupe system does.
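The bandwidth difference can be illustrated with a rough Python sketch. The model is deliberately simplified and rests on assumptions that are not from any vendor: fixed 4 KB chunks, SHA-256 fingerprints, a target dedupe path that sends the whole backup stream to the appliance, and a source dedupe path that sends one fingerprint per chunk plus only the chunks the backup service has not seen before. The function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, for illustration only
HASH_SIZE = 32     # size in bytes of a SHA-256 digest


def split_chunks(data: bytes):
    """Yield fixed-size chunks of a backup stream."""
    for offset in range(0, len(data), CHUNK_SIZE):
        yield data[offset:offset + CHUNK_SIZE]


def target_dedupe_bytes_sent(backup: bytes) -> int:
    """Target dedupe (simplified): the whole stream crosses the network,
    and duplicates are removed only after it reaches the appliance."""
    return len(backup)


def source_dedupe_bytes_sent(backup: bytes, known: set) -> int:
    """Source dedupe (simplified): the client fingerprints each chunk, sends
    the fingerprint, and sends the chunk itself only if it is new."""
    sent = 0
    for chunk in split_chunks(backup):
        digest = hashlib.sha256(chunk).digest()
        sent += HASH_SIZE
        if digest not in known:
            known.add(digest)
            sent += len(chunk)
    return sent


# Two nightly full backups of 100 chunks that differ in one chunk.
monday = b"".join(bytes([i]) * CHUNK_SIZE for i in range(100))
tuesday = monday[:-CHUNK_SIZE] + bytes([200]) * CHUNK_SIZE

known_fingerprints = set()
source_monday = source_dedupe_bytes_sent(monday, known_fingerprints)
source_tuesday = source_dedupe_bytes_sent(tuesday, known_fingerprints)

print("target dedupe, Tuesday:", target_dedupe_bytes_sent(tuesday))  # 409600
print("source dedupe, Tuesday:", source_tuesday)                     # 7296
```

In this model the first backup costs about the same either way, since every chunk is new. The gap appears on the second night, when the target path resends all 409,600 bytes and the source path sends 100 fingerprints plus the one changed chunk. Real savings depend on change rate, chunking method, and protocol overhead, none of which this sketch attempts to model.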

Today, every leading backup product has its own dedupe, yet standalone target dedupe products are still popular because of their core competency and focus on dedupe. With some vendors, it’s because their dedupe is simply faster than what backup vendors are offering. In the case of Exagrid, the post-process dedupe they use allows for much better performance during instant recovery and backup testing scenarios.

This specialized core competency is not necessarily advantageous to the customer, however, as the combined cost of a backup system and a separate dedupe system is quite high – even higher if some of it is in the cloud.

My next blog post will talk about another industry juggernaut: Veeam. Like Data Domain, they also came along at just the right time, solved a lot of challenges, but also introduced a new one.
