Tech/Engineering

Hyper-Converged Backup Is Not Ready For The Future

W. Curtis Preston, Chief Technology Evangelist

Traditional data protection systems are rife with challenges, and I spelled out many of them in parts one and two of this blog series. Nothing proves the existence of these challenges more than my career of over 25 years helping companies make sense of dozens of backup software packages and targets, including tape, disk, deduplicated disk, and the cloud. Prior to the advent of the data protection as a service (DPaaS) model Druva uses, I think the hyper-converged data protection vendors such as Rubrik and Cohesity came the closest to solving most of those challenges. But as I will explain later in this blog post, some of the challenges are simply not solvable with their business model.

Benefits of Hyper-Converged Data Protection

No more finger-pointing

The biggest challenge solved by this converged model is the difficulty of a multi-vendor backup system. I can remember a particularly egregious instance of this years ago, where a particular combination of a server, HBA (host bus adapter), switch, and tape library created an intermittent problem. The main thing I remember about that incident was that every vendor blamed every other vendor’s component. (It was totally the HBA.)

The hyper-converged approach stops this problem cold. There are still multiple vendors in that box, but it is the hyper-converged vendor’s responsibility to figure out which one isn’t working. Backup is hard enough without finger-pointing, and these vendors solve this issue.

Easier to scale

A new customer still has to do a significant amount of design work before ordering their first cluster, but scaling a cluster to meet increased demand is relatively easy: you simply add nodes, once the ordering and shipping process is done, of course.

(Almost) No more tape

The HC backup architecture also allowed many of their customers to go tapeless, ridding themselves of the difficulties of designing a tape-based backup system, making sure those backups get off-site, and worrying about those tapes getting lost. HC backup vendors did this by incorporating deduplication into their architecture, which allows them to use as little disk as possible to store backups. It also allows their customers to replicate backups between clusters, providing on-site and off-site backups without having to use tape.
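To make the deduplication idea concrete, here is a minimal sketch in Python. It is purely illustrative and assumes the simplest possible scheme (fixed-size chunks and a SHA-256 index); real appliances use variable-length chunking and persistent, distributed indexes:

```python
import hashlib

CHUNK_SIZE = 4096   # fixed-size chunks; real products use variable-length chunking
chunk_store = {}    # hash -> chunk bytes; stands in for the appliance's disk

def backup(data: bytes) -> list:
    """Store only never-before-seen chunks; return the 'recipe' of hashes."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_store:   # new data costs disk space...
            chunk_store[digest] = chunk
        recipe.append(digest)           # ...duplicates cost only a reference
    return recipe

def restore(recipe: list) -> bytes:
    """Reassemble a backup from its chunk hashes."""
    return b"".join(chunk_store[d] for d in recipe)

# Two mostly-identical backups share almost all of their chunks:
monday  = b"M" * CHUNK_SIZE + b"unchanged" * 2000
tuesday = b"T" * CHUNK_SIZE + b"unchanged" * 2000
r1, r2 = backup(monday), backup(tuesday)
assert restore(r1) == monday and restore(r2) == tuesday
print(len(chunk_store), "unique chunks for", len(r1) + len(r2), "chunk references")
```

Replication between clusters follows the same logic: each cluster only needs to ship the chunks the other side doesn’t already have.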

The reason I said “almost” above is that the cost per gigabyte of these systems can often be quite high – especially if you attempt to run them in VMs in the cloud – so many of their customers pressured them to support archiving older data to tape. Interestingly enough, no one has ever asked me whether or not Druva supports tape out. Perhaps it’s because of our native support for Amazon’s long-term storage options, some of which are price-competitive with tape.

Reusing backup

The first vendor I remember promoting the idea of reusing backups for multiple purposes was Actifio, and Rubrik and Cohesity have also done a lot to get this idea into the minds of average IT practitioners. It is now considered a relatively mainstream idea and something that all backup vendors should be striving for. It also appears that Cohesity has gone farther down this path than anyone else, with their marketplace of apps that can reuse backup data.

APIs everywhere

I’m an old-school UNIX guy who predates REST APIs; requiring a REST API today is the modern equivalent of my insistence, back in the day, that every backup product have a command-line interface. So I can certainly understand the importance of these APIs, and these products get major kudos for pushing this idea forward. It’s definitely an area where Druva can and will improve, as evidenced by our API projects that have already borne a lot of fruit – so watch this space.
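For a taste of what an API-everywhere approach enables, here is a hypothetical monitoring snippet in Python. The endpoint, authentication scheme, and response fields are all invented for illustration; consult your product’s actual API documentation rather than copying this:

```python
import requests   # third-party HTTP library: pip install requests

BASE_URL = "https://backup.example.com/api/v1"  # placeholder host
TOKEN = "REPLACE_ME"                            # placeholder API token

def failed_jobs(window="24h"):
    """Ask the (hypothetical) backup API for recent jobs; return failures."""
    resp = requests.get(
        f"{BASE_URL}/jobs",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"window": window},
        timeout=30,
    )
    resp.raise_for_status()
    return [j for j in resp.json() if j.get("status") == "FAILED"]

for job in failed_jobs():
    print(job.get("name"), job.get("error"))   # wire this to your alerting
```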

Outstanding Challenges of Hyper-Converged Data Protection

Although hyper-converged backup vendors solved a lot of the challenges of traditional backup, they didn’t solve all of them. Here are a few examples.

You still have to design and buy

Although these vendors have simplified the process, someone still has to run the numbers and figure out how big a system you need to buy, lease, or rent. How many nodes will your cluster need, and how many terabytes of storage? You really have no idea until you run your first and second backups, because everything is variable – especially in the deduplication world.

You also still have to design for peak demand, even though the system will sit virtually unused most of the time. This is the equivalent of owning a car that you only drive for a few minutes a day. Just like with the car, this probably doesn’t sound strange or wrong; it’s only wrong if you have the option not to do it and to save money at the same time. But even for these well-designed, scalable systems, the only way to design them properly is to over-provision them, because neither you nor the vendor wants the system to be underpowered. You buy enough compute and storage capacity to meet peak demand, and the rest of the time that capacity sits idle. What a waste.
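Here is a back-of-the-envelope version of that sizing exercise in Python. Every input is an assumption you have to make before the system ever runs a backup, which is exactly the problem:

```python
# All inputs below are guesses made before the system ever runs a backup.
front_end_tb      = 100    # data to protect (assumed)
daily_change_rate = 0.05   # 5% daily change (assumed)
dedup_ratio       = 10     # 10:1 dedup -- pure guesswork until real backups run
retention_days    = 30
headroom          = 1.5    # 50% over-provisioning for peaks and growth

full_tb        = front_end_tb / dedup_ratio
incremental_tb = front_end_tb * daily_change_rate * retention_days / dedup_ratio
required_tb    = (full_tb + incremental_tb) * headroom

print(f"Buy roughly {required_tb:.0f} TB usable")   # ~38 TB with these guesses
# If the real dedup ratio turns out to be 5:1 instead of 10:1, you need
# twice the capacity -- and you find that out after the purchase order.
```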

Maintenance Responsibilities

Whether you are purchasing a set of boxes directly from the vendor, leasing them, or renting a bunch of VMs in the cloud, you are still responsible for maintaining the operating system and application. You are responsible for securing the systems and making sure that they are not vulnerable to rogue admins or malware. And when something like the Spectre hardware vulnerability rears its ugly head, you are the one who has to figure out how much extra compute capacity you now need, because you just lost a chunk of compute power to a bug in hardware you didn’t even know you had. Just like the previous challenge, this one probably sounds totally normal: of course you’re responsible for the maintenance of your servers and applications. The point is that this is a challenge of traditional backup that the hyper-converged vendors did not solve.

Target deduplication

All of these products implemented deduplication in their storage systems, which is one of the reasons they can address the entire scope of backups with one converged solution. It is important to understand, however, that they chose to implement target deduplication, not source deduplication. This means these products require an appliance with storage everywhere they wish to back up anything. It also means that the scope of any particular deduplication set is limited to a single cluster: products using target deduplication cannot perform global deduplication across multiple sites, which increases the amount of storage needed to hold your backups. Products that use source deduplication (like Druva) do not have these limitations.
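Here is a toy Python sketch of the difference. In both schemes only unique chunks land on disk, but where the duplicate check happens determines how much data crosses the network, and why target dedup needs an appliance at every site:

```python
import hashlib

def target_dedup_send(chunks, index):
    """Target dedup: every chunk crosses the wire; the appliance
    discards duplicates on arrival. Returns bytes transmitted."""
    sent = sum(len(c) for c in chunks)              # everything travels
    index.update(hashlib.sha256(c).digest() for c in chunks)
    return sent

def source_dedup_send(chunks, index):
    """Source dedup: the client checks hashes first and ships only
    chunks the target lacks. Returns bytes transmitted."""
    sent = 0
    for c in chunks:
        digest = hashlib.sha256(c).digest()
        if digest not in index:                     # only new chunks travel
            index.add(digest)
            sent += len(c)
    return sent

data = [b"x" * 1024] * 100 + [b"y" * 1024]          # 100 duplicates + 1 unique
print(target_dedup_send(data, set()))               # 103424 bytes sent
print(source_dedup_send(data, set()))               # 2048 bytes sent
```

The global deduplication limitation follows the same logic: when each cluster keeps its own index, identical chunks at two sites get stored twice.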

Backup and copy process efficiency

As mentioned in the previous blog post, it is still up to you to make sure that backups and copies actually happen. While this also sounds normal, it is one more challenge these products did not address.
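Here is a hypothetical sketch of the babysitting that remains on your plate. The data structure is invented; in practice you would populate it from your backup product’s reporting or API:

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=24)    # your backup SLA (assumed)
now = datetime.now(timezone.utc)

last_success = {                 # client -> last successful backup (invented data)
    "db01":  now - timedelta(hours=3),
    "web01": now - timedelta(hours=40),   # this one missed its window
}

stale = [host for host, ts in last_success.items() if now - ts > MAX_AGE]
if stale:
    print("ALERT: no recent backup for:", ", ".join(stale))  # wire to paging
```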

Unpredictable costs

Although HC backup vendors suffer from this issue less than other products, there are still huge swings in cost due to a variety of factors. Ask one of these vendors what happens to your cost when you cross certain boundaries; there tend to be giant cost “cliffs” once you hold more than certain amounts of data. Cost variability becomes even worse if you dare to run these products in VMs in the cloud, since your costs will differ each month based on how many backups and restores you actually perform, how much data you migrate to and from S3, and other factors. Druva’s customers do not experience this issue.
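To illustrate what a cost “cliff” looks like, here is a deliberately simplified Python example. The tier boundaries and prices are invented, not taken from any vendor’s price list:

```python
def annual_cost(tb):
    """Hypothetical tiered pricing: crossing a boundary reprices everything."""
    if tb <= 50:
        return tb * 1000    # $1,000/TB in the first tier (invented number)
    return tb * 1400        # larger cluster class kicks in (invented number)

print(annual_cost(50))      # 50000
print(annual_cost(51))      # 71400 -- one more terabyte, $21,400 more per year
```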

Hardware-centric approach in the cloud era

When Druva adopted a direct-to-cloud SaaS backup model several years ago, some of us felt like Kevin Costner’s character in the movie “Field of Dreams”1. No one understood why we would design our infrastructure directly for the AWS architecture; AWS was relatively unproven at the time. Our bet paid off, and we are now the only SaaS backup product that can back up servers, laptops, SaaS apps, and native cloud services with a single service.

While these hyper-converged products are much easier to use and much better designed than the backup systems of the past, it’s hard to understand why they took such a hardware-centric approach in an era that is clearly moving away from on-premises hardware. Their representatives will, of course, say that they also work in the cloud. My next blog post will explain why that really isn’t the case. To make another movie reference, “You keep using that word. I do not think it means what you think it means.”2

In the meantime, check out this short list of related blogs and whitepapers as you consider a future-proof data protection strategy.

1 “Field of Dreams,” Phil Alden Robinson (1989)
2 Inigo Montoya (Mandy Patinkin), in “The Princess Bride,” Rob Reiner (1987)