For almost two decades, the data protection industry has defined efficiency as deduplication, but it is time to change how we think of efficiency. In the debate about “deduplication block sizes” and “source vs. target deduplication,” we stopped measuring what matters: reducing the cost of compute, storage, and network. Today, even organizations with deduplication appliances find themselves with underutilized systems, three or more full data copies, and overloaded networks.
Traditional data protection architectures are struggling to keep pace with customers’ move to the cloud. With new workloads in more locations, evolving regulations, and relentless cyber threats, organizations are working harder just to keep up.
Data Protection-as-a-Service (DPaaS) has one goal — do the best job of protecting your data. This requires a modern approach to efficiency, performance, reliability, simplicity, and breadth of coverage.
In this post, we will discuss the efficiency of protection, including why efficiency has changed, the inefficiencies of legacy architectures in the cloud, and the modern requirements for best-in-class efficiency.
Why efficiency has changed
The cloud has changed how we measure efficiency because neither the production workloads nor the protection solution is confined to the data center anymore.
Organizations need network efficiency to protect data residing at the edge, in SaaS applications, on the public cloud, and in the data center. With data in locations that cannot support a backup appliance, the only viable approach is a centralized protection solution. In the data center, network bandwidth was practically unlimited. With edge sites, remote offices, SaaS applications, and a distributed workforce, network efficiency has become a critical requirement.
Meanwhile, even data center protection is moving to the cloud. Traditionally, IT over-provisions backup appliances to handle the load spike during the backup window, while systems idle throughout the day. Even worse, to expand hyper-converged backup appliances, customers must buy more compute and storage, even if they only need one type of resource. The market accepted this inefficiency because on-premises hardware appliances cannot scale on-demand — but the cloud can. The cloud is the ideal platform for modern data protection because it scales on-demand, scales compute and storage independently, and offers the lowest-cost storage options.
Because of the cloud, efficiency is no longer measured at an appliance level — it’s measured by the true core resources: network, compute, and storage.
Legacy data protection in the cloud is inefficient
Every vendor recognizes the value of the cloud, which is why they all have cloud offerings. They are, however, using the cloud inefficiently by lifting and shifting their legacy architectures onto the cloud. Some inefficient approaches include:
- Cloud retention tiers duplicating data — The “cloud copy” for long-term retention is a separate copy that does not deduplicate with the active backups. Customers now have a second copy of their data — the opposite of deduplication.
- Air-gapped ransomware copies duplicating data and incurring risk — Creating an air-gapped ransomware copy duplicates the data. Since it is separate from the retention copy, it is the third backup copy. Even worse, since the data is still in the customer’s account, it remains vulnerable to ransomware attacks.
- Virtual appliances wasting CPU and using expensive storage — In-cloud “virtual appliances” consume expensive compute instances and block storage 24 hours a day. Like on-premises appliances, virtual appliances cannot dynamically scale up or down, so you still need to provision for maximum usage. Unlike on-premises appliances, however, you are now paying a premium for cloud flexibility that you are not utilizing.
The cloud is the future of data protection, but if it is misused, it can be even more inefficient than legacy appliances. An efficient data protection solution needs to be architected for the cloud.
Cloud efficiency requirements
An efficient data protection solution in the cloud must optimize network, compute, and storage resource consumption to minimize the cost and complexity for the customers.
- Global source-based deduplication (storage and network) — Traditional deduplication is limited to a single appliance or cluster, but in the cloud, it should scale globally to achieve maximum space savings. Global source-based deduplication achieves maximum bandwidth efficiency — which is critical for remote offices, endpoints, and edge applications.
- Deduplication across data tiers (storage) — Deduplication should span across storage tiers, so the protection solution does not need to create a full data copy just to use cold storage (e.g. Amazon S3 Glacier). Customers should instantly save money by leveraging cold storage, not pay more to store another copy.
- Built-in ransomware protection (storage) — Cloud protection solutions should automatically store backups offsite in a separate account (the new 3-2-1 rule for backups), eliminating the need for another copy of the backup.
- On-demand scalability (compute) — Cloud protection solutions must scale up and down, so there is no need for overprovisioning. By automatically allocating and freeing resources to meet the customers’ performance needs, the solution delivers both performance and efficiency.
- Independent scalability (compute and storage) — Cloud protection solutions should scale compute and storage independently, so customers can get the performance and data retention they need, without extra overhead.
- Flexible restore (network) — Cloud protection solutions should offer flexible recovery paths:
  - Support high-performance network options to the cloud — e.g. direct connect with partners like Equinix.
  - Enable restores from the cloud to on-premises via tools like AWS Snowball Edge.
  - Allow customers to restore data to either the cloud or on-premises, for maximum bandwidth efficiency.
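To make the first two requirements concrete, here is a minimal sketch of source-based deduplication: the client hashes each chunk locally and uploads only chunks the shared chunk index has not seen, so duplicate data never crosses the network. This is an illustrative toy, not Druva's implementation; real systems use variable-size chunking, a distributed index, and encryption, and the `backup` function and `CHUNK_SIZE` here are hypothetical names for this example.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunking for simplicity; production systems use variable-size chunks


def backup(data: bytes, store: dict) -> int:
    """Source-side dedup sketch: hash each chunk at the client and
    store only chunks the global index has not seen.
    Returns the number of bytes that had to be sent over the network."""
    sent = 0
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:  # only previously unseen content is transferred
            store[key] = chunk
            sent += len(chunk)
    return sent


store = {}  # stands in for a global, cross-client chunk index
first = backup(b"A" * 8192 + b"B" * 4096, store)   # 12,288 bytes of source data
second = backup(b"A" * 8192 + b"C" * 4096, store)  # the "A" chunks are already stored
print(first, second)  # → 8192 4096
```

Because the index is shared across all clients and all tiers, the second backup transfers only its genuinely new chunk; the same principle is what lets a global index deduplicate active backups against long-term retention copies instead of storing data twice.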
A cloud-native data protection solution has a fundamentally different architecture than a data protection appliance, and the difference in efficiency is one of the easiest litmus tests to tell which you are evaluating.
Druva’s cloud efficiency
Druva was born in the cloud and architected for cloud efficiency. We run over eight million backups a day around the world. Many of those backups run over low-bandwidth, unreliable network connections, enabled by our global source-based deduplication. Many customers use our long-term data retention option, which immediately cuts the cost of their backups. All backups are stored in Druva’s account, providing built-in ransomware protection with no extra copy or cost. Druva automatically scales to meet customers’ backup and recovery needs — even at data center scale — at no extra cost. Customers can always recover their data how and where they want, including cloud disaster recovery for their on-premises environments.
After a generation of debating the deduplication efficiency of backup appliances, the market has changed. Backup appliances are not efficient for distributed data environments, long-term retention, ransomware protection, dynamic environments, or rapid recovery.
The cloud provides a new architectural paradigm that is changing how we evaluate protection efficiency, focusing on the core storage, compute, and network resources. Unfortunately, legacy architectures are unable to take advantage of the cloud’s flexibility, scalability, network connectivity, and low-cost storage options.
It is time to expect more efficiency from your cloud-based data protection solution. Demand global source-based deduplication, deduplication-aware tiering, native ransomware protection, on-demand scalability, independent scalability, and flexible recovery options.
If you want true cloud efficiency, Druva is the standard against which all other solutions compare themselves. We encourage you to compare for yourselves; explore all that Druva has to offer and experience our award-winning cloud data protection solution in action.