Tech/Engineering

Long-term retention of backups — a new architecture

April 09, 2020 Stephen Manley, CTO

You’ll never go in the water again.

In space, no one can hear you scream.

Be afraid. Be very afraid.

We need to get data from 10-year-old tapes.

The last tagline may not be from Jaws, Alien, or The Fly, but long-term retention of backups is scarier because it’s real. Since the dawn of backup, organizations have expected the backup team to retain and recover data for years or even decades. Despite a constantly evolving environment, business leaders expect backup administrators to keep old backups accessible at a low cost. Meeting long-term retention (LTR) requirements has been almost impossible because backup systems were not architected to handle years of metadata for data retention. Now, however, next-generation backup solutions can manage the metadata and take the fear out of long-term retention.

Why have long-term retention?

Backup administrators do not do long-term retention because they want to; they do it because they have to. In some industries, regulations demand that backups be kept for 3, 7, 10, or 30+ years. In some organizations, the business team wants to retain all data for an extended time period, regardless of cost (e.g., to support IP lawsuits, reference old projects, or recall old data for further analysis). Perhaps the most prevalent reason, however, is that “We’ve always done it this way, and nobody wants to be responsible for changing it.”

The challenges of long-term retention

There are only three challenges with long-term retention of backups. Unfortunately, they are the big challenges of cost, recovery, and vendor lock-in.

Regardless of the media, long-term retention of backups is expensive.

Tape: LTR increases media consumption (tapes cannot be reused and must be cloned) and off-site administration costs (storing the tapes). Additionally, LTR is the main reason why companies still maintain their costly tape infrastructure.

Disk: Deduplicated disk tries to support LTR, but it doubles the storage footprint (data is duplicated across the active and archive tiers) and increases the need for expensive deduplication metadata storage (to track the additional retained backups).

Cloud: Traditional backup architectures have bolted on object storage support, so not only do the disk challenges (more data, more metadata) remain, but customers also pay additional costs for data ingress, egress, and request fees.

As difficult as it is to store LTR backups, retrieving data is near-impossible. Imagine trying to restore a dataset from a 7-year-old backup. Today, it lives in a VM on an ESX server, but it may have been virtualized and migrated multiple times over 7 years. Since backup software tracks datasets by server, you will need to trace the migration history of the application to find the right historical backup(s). If you don’t know the specific folder to recover, you have to restore everything because there is no easy way to search across historical backups. Even if you find the dataset, now you need to find a tape device, an old version of backup software, and a server that can support the old backup software. Restoring from LTR backups is a wish built on a dream topped off by a miracle.

Business leaders become most frustrated when they find out that they cannot eliminate legacy backup infrastructure because of LTR backups. As their grand plans dissolve, they go through the 5 stages of grief:

Denial — “We don’t need those old backups.” (Yes, we do.)
Anger — “How can backup vendors hold us hostage?!” (Our data is in their format.)
Bargaining — “Can’t we migrate the old backups?” (Then they see the cost.)
Depression — “Am I stuck with this legacy infrastructure forever?” (It’s called “long-term” retention for a reason.)
Acceptance — “We’ll minimize the investment in the old environment and let it age out over seven years. Seven. Long. Years.” (Welcome to the backup life.)

Metadata is the root cause of LTR backup pain

Long-term retention puts a unique strain on a backup system’s metadata management. It stresses the data protection infrastructure while locking customers into their backup software.

On traditional deduplicated storage systems, long-term retention requires costly metadata storage. Regardless of a scale-up or scale-out architecture, the system needs reliable, high-performance access to deduplication metadata. That’s why so many deduplicated systems now store their metadata on expensive solid-state drives (SSDs). Even when it archives backups to the cloud, the system still needs to retain the metadata so that it can retrieve those backups. Therefore, even as it moves data blocks to less expensive storage, long-term retention consumes even more high-cost metadata storage.

Long-term retention doubles the storage consumption because deduplicated systems can’t share metadata between the active and retention tiers. Deduplication shares common blocks across all the backups… until long-term retention. Customers need to retain a complete image of the most recent backups on fast storage, so the system cannot tier shared blocks to slower storage. Meanwhile, since the cloud tier is a remote bolt-on, it also needs a copy of all the blocks for resiliency. Therefore, long-term retention in the cloud re-duplicates shared blocks and creates a separate deduplication domain. The result: more data and even more expensive metadata.
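The re-duplication effect can be illustrated with a toy model. This is a minimal sketch, not any vendor's implementation: the chunk contents, the `block_ids` helper, and the two-backup scenario are all hypothetical, and real systems fingerprint variable-size chunks at far larger scale.

```python
import hashlib


def block_ids(chunks):
    """Fingerprint each chunk; identical content yields the same ID."""
    return {hashlib.sha256(chunk).hexdigest() for chunk in chunks}


# Hypothetical backups: the recent backup and the retained one share two blocks.
recent = block_ids([b"os-image", b"app-binaries", b"db-2020"])
retained = block_ids([b"os-image", b"app-binaries", b"db-2013"])

# Separate deduplication domains: each tier stores every block it references,
# so blocks shared between the tiers are stored twice.
separate_domains = len(recent) + len(retained)

# One global domain: each unique block is stored exactly once.
global_domain = len(recent | retained)

# Blocks common to both backups are the ones that get re-duplicated.
shared = len(recent & retained)

print(separate_domains, global_domain, shared)
```

In this toy case the two-domain layout stores six blocks where a single domain stores four; the difference is exactly the number of shared blocks.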

Finally, backup software always stores your data in a proprietary format. The backup catalog tracks your backup metadata and without it, you’ve got unusable bits on tape/disk/cloud. Furthermore, the metadata in the backup stream (permissions, extended attributes, etc.) can only be interpreted by the backup agents. When backups were written to tape or VTL, proprietary formats were inevitable, and so was vendor lock-in.

A metadata-optimized backup architecture

Since metadata constrains long-term retention, a modern backup architecture must take an innovative approach to metadata. The traditional approach of storing deduplicated metadata with the data on fixed resources does not work. Neither does creating a proprietary backup catalog that is separate from the deduplication metadata. Therefore, the foundation of the next-generation long-term retention architecture is a centralized backup metadata store.

A modern architecture unifies the metadata management, splits it from the data, and scales dynamically. By unifying the metadata, the architecture can make informed decisions about data placement. By separating metadata and data, it can independently optimize the storage of both. Dynamic scaling enables the system to meet the customers’ needs without incurring overprovisioning costs.
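One way to picture the split between metadata and data is a store that records which blocks each backup references and where each block currently lives. The sketch below is purely illustrative: the `MetadataStore` class, its method names, and the "warm"/"cold" tier labels are assumptions made for this example, not a description of a real product's schema.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Hypothetical unified store for backup and deduplication metadata."""

    backups: dict = field(default_factory=dict)     # backup_id -> [fingerprint, ...]
    block_tier: dict = field(default_factory=dict)  # fingerprint -> tier name

    def record_backup(self, backup_id, fingerprints):
        """Register a backup; blocks seen for the first time start on the warm tier."""
        self.backups[backup_id] = list(fingerprints)
        for fp in fingerprints:
            self.block_tier.setdefault(fp, "warm")

    def move_block(self, fingerprint, tier):
        """Relocating data only changes a pointer; the backups are untouched."""
        self.block_tier[fingerprint] = tier

    def restore_plan(self, backup_id):
        """List each block of a backup together with the tier to fetch it from."""
        return [(fp, self.block_tier[fp]) for fp in self.backups[backup_id]]


store = MetadataStore()
store.record_backup("2013-01", ["a", "b", "c"])
store.record_backup("2020-04", ["a", "b", "d"])
store.move_block("c", "cold")
plan = store.restore_plan("2013-01")
print(plan)
```

Because block placement lives in the metadata, moving block "c" to cheaper storage changes nothing about how either backup is described, which is what lets data and metadata be optimized independently.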

The next-generation architecture builds optimized long-term retention backups on top of a metadata-optimized backup store.

Archive only cold data blocks: With all of the metadata, the system can identify the blocks unique to the LTR backups and move only those to the lowest cost storage. Customers can now have true global deduplication across their backups.

Minimize metadata storage: With a global deduplication namespace, there is no need to double the metadata storage.

Enable rapid search: With dynamic scaling, the system can support high-priority metadata search and recovery across all of a customer’s backups, without permanently overprovisioning resources.

Better privacy support: Rapid search means that customers can find and redact files that are not supposed to be stored (e.g., GDPR’s right to be forgotten), so they do not get accidentally restored or accessed in the future.

Reduce vendor lock-in: Efficient search enables customers to find and extract the data they need from long-term retention backups, so they do not need to retain the full historical versions... or the backup vendor that made them.
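Two of the capabilities listed above, archiving only cold blocks and searching across backups, can be sketched with plain dictionaries. Everything here is hypothetical: the `cold_blocks` and `search` functions, the backup IDs, the single-letter fingerprints, and the file paths exist only for illustration.

```python
def cold_blocks(backups, ltr_ids):
    """Return the blocks referenced only by long-term retention backups."""
    active, ltr = set(), set()
    for backup_id, fingerprints in backups.items():
        target = ltr if backup_id in ltr_ids else active
        target.update(fingerprints)
    # Blocks still needed by an active backup must stay on fast storage.
    return ltr - active


def search(catalog, fragment):
    """Return (backup_id, path) pairs whose path contains the fragment."""
    return sorted(
        (backup_id, path)
        for backup_id, paths in catalog.items()
        for path in paths
        if fragment in path
    )


# Hypothetical metadata: backup_id -> block fingerprints, and backup_id -> file paths.
backups = {
    "2013-01": ["a", "b", "c"],
    "2020-03": ["a", "b", "d"],
    "2020-04": ["a", "d", "e"],
}
catalog = {
    "2013-01": ["/projects/apollo/design.doc", "/hr/payroll.xls"],
    "2020-04": ["/projects/zeus/plan.doc"],
}

to_archive = cold_blocks(backups, ltr_ids={"2013-01"})
hits = search(catalog, "apollo")
print(to_archive, hits)
```

Only block "c" is unique to the retained backup, so it is the only candidate for the lowest-cost tier, and the search locates the old file without restoring the whole backup.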

By optimizing for metadata, the next-generation backup architecture transforms how customers store and retrieve data from long-term retention backups. It also lays the framework for better managing privacy and freeing customers from backup vendor lock-in.

Druva — a metadata-optimized backup system

Druva built a metadata-optimized backup architecture in the cloud. Druva stores backup and deduplication metadata together in a high-performance metadata store. The backup data is stored separately, as objects that can seamlessly move between Amazon object storage tiers. Metadata and data can scale up and down, on-demand, so Druva can always meet the customers’ needs — without them even knowing what’s happening.

Druva’s architecture simplifies long-term retention of backups. To activate long-term retention on a dataset, customers simply click a button. Druva automatically identifies the data to tier, pushes the cold data from Amazon S3 storage to Amazon S3 Glacier Deep Archive, and, when needed, retrieves the data for the customers. To make long-term retention even simpler, Druva offers a standard cost reduction for LTR backups as soon as customers enable the long-term retention option. Customers don’t need to worry about tracking how many blocks are migrating to the lower cost storage, optimizing cloud infrastructure, or paying unexpected cloud fees.

Druva’s metadata-optimized backup system makes long-term backup retention simple, effective, and inexpensive.

Conclusion

Long-term retention is one of a backup team’s most difficult challenges. With the right architecture, however, long-term retention becomes a natural extension of the solution, rather than an almost impossible challenge to surmount. A metadata-optimized backup architecture in the cloud can solve your long-term retention challenges and put your company in a position to extract even more value from your backups. With a modern architecture, long-term retention of backups gives you nothing to fear.

Leave behind complex and costly data protection solutions that aren’t built for the cloud. Learn more about the Druva Cloud Platform and how you can start saving time and money with your data protection.
