What is Data Deduplication: Meaning of Moving Data to the Cloud

March 24, 2015 W. Curtis Preston, Chief Technology Evangelist

What is data deduplication?

If you work in IT and are responsible for backing up or transferring large amounts of data, you’ve probably heard the term data deduplication. This post gives a clear definition of what “data deduplication” means and explains why it is a fundamental requirement for migrating your organization’s data to the cloud.

First, the basics

At its simplest, data deduplication is a technique for eliminating redundant data in a data set. During deduplication, extra copies of the same data are removed, leaving only one copy to be stored. The data is analyzed to identify duplicate byte patterns and to confirm that each retained chunk really is the only stored instance. Each duplicate is then replaced with a reference that points to the stored chunk.
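The idea can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's implementation: the fixed 4-byte chunk size, the use of SHA-256 as the fingerprint, and the function names `deduplicate` and `restore` are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed chunk size, chosen only to keep the example readable


def deduplicate(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Split data into fixed-size chunks, keep one copy of each unique chunk
    in `store`, and return the list of references that describes the data."""
    refs = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only the first copy is stored
        refs.append(digest)              # every occurrence becomes a reference
    return refs


def restore(refs: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the original data by following each reference."""
    return b"".join(store[ref] for ref in refs)


store: dict[str, bytes] = {}
data = b"ABCDABCDABCDWXYZ"
refs = deduplicate(data, store)
print(len(refs), "references,", len(store), "stored chunks")
```

Here the 16-byte input is described by four references but only two distinct chunks are stored, and `restore(refs, store)` returns the original bytes.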

Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (think about how often you make only small changes to a PowerPoint file), the amount of duplicate data can be significant. In some companies, 80% of corporate data is duplicated across the organization. Reducing the amount of data that must be transmitted across the network and stored can significantly cut storage costs and backup times, in some cases by up to 50 percent.

A real-world example

Consider an email server that contains 100 instances of the same 1 MB file attachment, for example a sales presentation with graphics sent to everyone on the global sales staff. Without data deduplication, if everyone backs up their email inbox, all 100 instances of the presentation are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is referenced back to the one saved copy, reducing storage and bandwidth demand to only 1 MB.
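The arithmetic in this example can be checked with a short sketch. It assumes a simple single-instance store keyed by a SHA-256 content hash; the mailbox names and the all-zero stand-in attachment are hypothetical.

```python
import hashlib

MB = 1024 * 1024
attachment = bytes(MB)  # stand-in for the 1 MB sales presentation

store = {}      # content hash -> the single stored copy
mailboxes = {}  # mailbox name -> reference (hash) to the stored copy

for n in range(100):
    digest = hashlib.sha256(attachment).hexdigest()
    store.setdefault(digest, attachment)  # stored once, on first sight
    mailboxes[f"user{n}"] = digest        # every mailbox just holds a reference

naive_bytes = 100 * len(attachment)
dedup_bytes = sum(len(blob) for blob in store.values())
print(naive_bytes // MB, "MB without deduplication")
print(dedup_bytes // MB, "MB with deduplication")
```

The references themselves take a little space (one hash per mailbox), so real savings are slightly below the ideal 100:1, but the stored payload drops from 100 MB to 1 MB.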

Data deduplication solutions evolve to meet the need for speed

While data deduplication is a common concept, not all deduplication techniques are the same. Early breakthroughs in data deduplication were designed for the challenge of the time: reducing storage capacity and bringing more reliable data backup to servers and tape. One example is Quantum’s use of file-based or fixed-block-based storage, which focused on reducing storage costs. Appliance vendors like Data Domain further improved on storage savings by using target-based and variable-block-based techniques that only required backing up changed data segments rather than all segments. This provided yet another layer of efficiency to maximize storage savings.
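To see why variable-block techniques find more duplicates than fixed-block ones, consider what happens when a single byte is inserted at the front of a file. The sketch below is a simplified illustration and not the algorithm used by any of the vendors named above: it places chunk boundaries wherever a hash of the preceding 16 bytes is divisible by 64, and it recomputes that hash at every position for clarity, where production systems typically use a rolling hash. The window size, divisor, and function names are assumptions.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Cut the data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes, window: int = 16, divisor: int = 64) -> list[bytes]:
    """Cut the data wherever the hash of the last `window` bytes hits a
    chosen pattern, so boundaries depend on content rather than position."""
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        digest = hashlib.sha256(data[end - window:end]).digest()
        if int.from_bytes(digest[:4], "big") % divisor == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk without a boundary
    return chunks


def shared(a: list[bytes], b: list[bytes]) -> int:
    """Number of distinct chunks that appear in both lists."""
    return len(set(a) & set(b))


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
modified = b"X" + original  # one byte inserted at the front

shared_fixed = shared(fixed_chunks(original), fixed_chunks(modified))
shared_variable = shared(variable_chunks(original), variable_chunks(modified))
print("fixed-block chunks still shared:", shared_fixed)
print("variable-block chunks still shared:", shared_variable)
```

With fixed blocks, the insertion shifts every later boundary, so no chunk lines up with its earlier copy. With content-defined boundaries, only the first chunk changes and the rest are recognized as duplicates.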

As data deduplication efficiency improved, new challenges arose. How do you back up more and more data across the network without impacting overall network performance? Avamar addressed this challenge with variable block deduplication and source-based deduplication, compressing data before it ever left the server and reducing network traffic, the amount of data stored on disk, and the time to back up. With this step forward, deduplication became about more than storage savings; it addressed overall performance across networks, ensuring that even in environments with limited bandwidth, data had a chance to be backed up in a reasonable time.
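A rough sketch of the source-based idea: the client fingerprints its chunks locally, asks the backup target which fingerprints it has never seen, and sends only those chunks. The `BackupServer` class, its `missing` and `upload` methods, and the 8-byte chunk size are hypothetical stand-ins for illustration, and compression is left out to keep the example short.

```python
import hashlib


class BackupServer:
    """Hypothetical in-memory stand-in for a backup target."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def missing(self, digests: list[str]) -> set[str]:
        """Return the fingerprints the server does not hold yet."""
        return {d for d in digests if d not in self.chunks}

    def upload(self, digest: str, chunk: bytes) -> None:
        self.chunks[digest] = chunk


def backup(data: bytes, server: BackupServer, chunk_size: int = 8) -> int:
    """Back up `data` and return the number of chunk bytes actually sent."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    digests = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = server.missing(digests)  # only fingerprints cross the network here
    sent = 0
    for digest, chunk in zip(digests, chunks):
        if digest in needed:
            server.upload(digest, chunk)
            needed.discard(digest)  # do not resend a repeat within this backup
            sent += len(chunk)
    return sent


server = BackupServer()
first = backup(b"AAAAAAAABBBBBBBBCCCCCCCC", server)
second = backup(b"AAAAAAAABBBBBBBBDDDDDDDD", server)
print(first, second)  # prints "24 8"
```

The first backup sends all 24 bytes. The second shares two of its three chunks with the first, so only the 8 new bytes leave the client.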

Another step-function improvement in data deduplication was achieved by Druva when it addressed data redundancies at the object level (versus the file level) and solved deduplication across distributed users at a global scale.

Advances in global data deduplication to manage massive volumes of data

By the early 2000s, business data was becoming global, real-time, and mobile. IT teams were challenged to back up and protect massive volumes of corporate data across a range of endpoints and locations with increased efficiency and scale. To address this challenge, Druva pioneered the revolutionary concept of “app-aware” deduplication, which analyzes data at the file object level to identify file duplicates in attachments, emails, or even down to their origin folder. The approach added significant gains in accuracy and performance for data backups, lowering the barrier for companies to efficiently manage and protect large volumes of data.

Data deduplication offers a new foundation for data governance

Today, as cloud adoption reaches a tipping point and companies are increasingly moving their data storage to virtual cloud environments, data deduplication plays a more strategic role than simply saving on storage costs. In combination with cloud-based object storage architecture, efficient data deduplication opens up new opportunities to do more with stored data.

A key example is data governance. With global data deduplication techniques, massive volumes of data can be backed up and stored in the cloud, and made available to IT (and the C-Suite) to address compliance, data regulation, and real-time business insights. This is done by creating a time-indexed file system that uses metadata to store only the unique data required. The time-indexed view of data means that you now have historical context for information, and data is always indexed and ready for forensics teams. This is a radical departure from the traditional “backup to the graveyard” approach, in which data is written as a serial stream of incremental or full backups. Additionally, being able to understand and analyze data in common among a set of users helps IT understand data usage patterns and further optimize data redundancies across users in distributed environments.
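One way to picture a time-indexed view over deduplicated data is a set of timestamped manifests that point into a shared pool of unique content. The sketch below is an assumption-laden illustration, not Druva's design: the `TimeIndexedStore` class and its methods are hypothetical, deduplication is done per file rather than per block, and snapshots are assumed to arrive in increasing time order.

```python
import bisect
import hashlib


class TimeIndexedStore:
    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}               # content hash -> unique data
        self.timestamps: list[int] = []                 # snapshot times, ascending
        self.snapshots: dict[int, dict[str, str]] = {}  # time -> {path: content hash}

    def snapshot(self, timestamp: int, files: dict[str, bytes]) -> None:
        """Record the state of `files` at `timestamp` (must be increasing)."""
        manifest = {}
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            self.blobs.setdefault(digest, content)  # unchanged data is not stored again
            manifest[path] = digest
        self.timestamps.append(timestamp)
        self.snapshots[timestamp] = manifest

    def view_at(self, timestamp: int) -> dict[str, bytes]:
        """Return the files as of the latest snapshot at or before `timestamp`."""
        pos = bisect.bisect_right(self.timestamps, timestamp)
        if pos == 0:
            return {}  # nothing had been backed up yet
        manifest = self.snapshots[self.timestamps[pos - 1]]
        return {path: self.blobs[digest] for path, digest in manifest.items()}


store = TimeIndexedStore()
store.snapshot(100, {"plan.txt": b"v1", "notes.txt": b"same"})
store.snapshot(200, {"plan.txt": b"v2", "notes.txt": b"same"})
print(store.view_at(150)["plan.txt"])  # prints b'v1'
print(len(store.blobs))                # prints 3
```

Two snapshots of two files are held as three unique blobs, and any point in time can be queried directly instead of replaying a chain of full and incremental backups.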

Advanced data deduplication is helping address two competing forces that threaten to impede fast-growing enterprise businesses today: managing the massive increase in corporate data created outside the traditional firewall, and meeting the growing need to govern data across its lifecycle by time zone, user, device, and file type.

Why Druva leads in its approach to global data deduplication

Druva’s patented global data deduplication approach has four unique attributes:

- It is performed on the client (versus the server), thereby reducing the amount of data that must be shipped over the network
- The analysis is done at the sub-file or block level to find duplicate data within a file
- It is aware of the applications from which data is generated, looking inside files to find duplicate data
- It scales beyond a single user to find duplicate data (say, an email sent to an entire organization) across multiple users and devices

Ready to get started with data deduplication for your organization? Discover Druva’s global data deduplication and get a demo today!