Cloud-Out vs Cloud-Native Data Protection – What’s The Difference?

W. Curtis Preston, Chief Technology Evangelist

The possibility of storing backups in the cloud has led to a mandate in many environments to adopt data protection in the cloud as soon as possible. This mandate was passed on to backup software and hardware vendors, and they reacted to it in various ways. The first way backup software products used the cloud was to archive some backups to object storage, and that approach is the subject of this post.

This blog is the latest in a series of posts on the evolution of the data protection industry over the last two decades. Like each previous advancement in data protection, copying backups to cloud storage solved some problems and created a few new ones.

Cloud-out to object storage

Backup environments typically had two challenges: getting backups off-site, and storing some backups for longer periods of time. Historically, off-site backups were accomplished via tape or a similar removable medium. In recent years, some environments began using deduplicated disk systems that could replicate backups off-site. While effective, this model can be very expensive, which forces some environments to continue using tape for off-site copies. Long-term storage of backups on disk, even deduplicated disk, can also be expensive, so many people still use tape for that as well.

The idea behind what many products call cloud-out is simple: solve both problems by copying some backups to the cloud. Backups in the cloud are off-site, and older backups can then be deleted from the on-site disk systems.

Some companies implemented this idea by copying backups to object storage such as AWS S3. This allowed some environments to go tapeless, keeping backups both on-site and off-site without having to touch a tape and without having to manage another storage array. The question is whether this was a good idea.

Increased usage (and cost) of cloud accounts

Almost all systems that copy backups into cloud storage do so by copying them into the customer’s account. The customer creates a dedicated account for backups and enters the appropriate authentication credentials into the backup system. They then specify which backups get copied into the cloud (e.g. all backups or only weekly full backups) and kick off the process. The backup system automates the process of copying backups into the cloud and deleting them from the source (if the customer chose that option).

The challenge with this methodology is that the customer often ends up with a very large cloud bill. They are billed when data is copied into S3, deleted from S3, or restored from S3, and (of course) charged each month for the number of gigabytes stored in S3. Many customers find that while they are able to meet off-site and long-term retention requirements, they do so at their pocketbook’s peril. This bill shock has several causes.
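To see how these charges add up, here is a minimal sketch of the arithmetic. Every rate in it is a made-up placeholder chosen to keep the numbers round, not real pricing from AWS or any other provider, and the names Rates and monthly_bill are hypothetical. Any deletion-related fees are left out for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder prices for illustration only. NOT real provider pricing."""
    storage_per_gb_month: float = 0.02
    egress_per_gb: float = 0.10
    per_1000_write_requests: float = 0.01


def monthly_bill(stored_gb, restored_gb, write_requests, rates=Rates()):
    """Break one month's bill into storage, restore (egress), and request costs."""
    storage = stored_gb * rates.storage_per_gb_month
    egress = restored_gb * rates.egress_per_gb
    requests = write_requests / 1000 * rates.per_1000_write_requests
    return {
        "storage": storage,
        "egress": egress,
        "requests": requests,
        "total": storage + egress + requests,
    }


# Hypothetical month: 10 TB stored, a 1 TB restore, 100,000 objects written.
bill = monthly_bill(stored_gb=10_000, restored_gb=1_000, write_requests=100_000)
for item, cost in bill.items():
    print(f"{item:>8}: ${cost:,.2f}")
```

With placeholder rates like these, the storage line grows in direct proportion to the gigabytes stored, which is why the size of the cloud copy matters so much in the sections that follow.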

A second, full copy

One of the reasons for the large bill is that the cloud copy is rarely treated as an extension of the on-premises copy: it is another copy entirely. If data is to be sent to the cloud, the data in the cloud needs to be a standalone copy, meaning it cannot rely on backups that are stored on-premises. This makes perfect sense if you consider that one of the purposes of sending data to the cloud is to act as a “just in case” copy, an idea that does not work if you still rely on the original.

What might come as a surprise, though, is that deduplication is often not used when storing data in the cloud: backups are rehydrated when they are copied there. Deduplication allows backups to save space by sharing common blocks of data. Rehydration reverses that process and ensures that every block needed for each backup is stored with that backup. The effect is that a copy of data in cloud storage can require 10 to 100 times the amount of storage that the same copy would need if it were on deduplicated storage in the data center.
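The difference between the two layouts can be illustrated with a toy example. This is only a sketch, assuming fixed-size chunks identified by a SHA-256 hash; real backup products use their own chunking and indexing schemes, and the function names here are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small so the example is easy to follow


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicated_size(backups):
    """Bytes stored when identical chunks are shared across all backups."""
    store = {}
    for backup in backups:
        for chunk in chunks(backup):
            # Identical chunks hash to the same key and are stored only once.
            store[hashlib.sha256(chunk).hexdigest()] = chunk
    return sum(len(c) for c in store.values())


def rehydrated_size(backups):
    """Bytes stored when every backup carries all of its own chunks."""
    return sum(len(b) for b in backups)


# Three hypothetical nightly backups that differ only in their last chunk.
backups = [b"AAAABBBBCCCC" + tail for tail in (b"DDDD", b"EEEE", b"FFFF")]
dedup_bytes = deduplicated_size(backups)
full_bytes = rehydrated_size(backups)
print(f"deduplicated: {dedup_bytes} bytes, rehydrated: {full_bytes} bytes")
```

Even with only three backups the rehydrated layout stores twice as many bytes here. The gap widens as more backups share the same unchanged blocks.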

One large object

In addition to rehydrating backups that are stored in the cloud, most backup products supporting cloud-out store each backup as one large object. For example, the latest backup of the image of a particular VM is stored as one object in the cloud. This is why these backups are typically rehydrated: it allows the backup system to restore a VM or file by simply grabbing the latest object for that VM or file from the cloud.

Unfortunately, this comes with another downside: performance. If you can only restore a VM or large file by retrieving a single object, the restore speed is dictated by how quickly that single object can be retrieved from the cloud. In contrast, if backups are stored at a more granular level, you gain two benefits: deduplicated backups that take up less space, and quicker restores, because hundreds or thousands of objects can be retrieved simultaneously.
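The restore-speed argument can be sketched with a small simulation. Nothing here talks to a real object store: fetch is a hypothetical stand-in that sleeps to imitate per-request latency, and the sequential path is only a rough model of pulling everything through a single stream.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(chunk_id, latency=0.02):
    """Hypothetical stand-in for one object-store GET; sleeps to imitate latency."""
    time.sleep(latency)
    return bytes([chunk_id % 256]) * 4


def restore_serial(chunk_ids):
    """Retrieve chunks one after another, as a single stream would."""
    return b"".join(fetch(c) for c in chunk_ids)


def restore_parallel(chunk_ids, workers=8):
    """Retrieve many chunks at once; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(fetch, chunk_ids))


chunk_ids = list(range(16))

start = time.perf_counter()
serial = restore_serial(chunk_ids)
serial_s = time.perf_counter() - start

start = time.perf_counter()
parallel = restore_parallel(chunk_ids)
parallel_s = time.perf_counter() - start

print(f"serial: {serial_s:.2f}s, parallel: {parallel_s:.2f}s")
```

Both paths return the same bytes. The parallel one simply overlaps the waiting, which is the advantage a granular layout makes possible.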

Two-step restore

Most products that use a cloud-out design cannot restore directly from object storage. They must first pull the entire image back from the cloud and import it into the primary backup system before initiating a restore. This two-step process adds even more delay at the moment when you have the least time to spare.

Secondary copy

A cloud-out copy is a second-class citizen. It is stored just in case, and you really hope you don’t need to use it. Restores are not going to be very fast, and egress charges mean you’re probably not going to do a lot of test restores to make sure that they meet your recovery requirements. It’s basically an online version of tapes stored in a box at Iron Mountain. It’s certainly not something you’re going to build your disaster recovery plan around; it just allows you to say that you’re using the cloud for some of your backups.

Forget cloud-out, go cloud-native

If you’re interested in using the cloud for data protection, consider using a product or service that doesn’t think of the cloud as a second-class place to store your data. A backup system properly designed for the cloud costs less and restores faster than anything that’s just using the cloud as a replacement for Iron Mountain.
