Tech/Engineering, Innovation Series

Druva Innovation: Big Data Analytics In Data Protection

December 18, 2019 Preethi Srinivasan, Director of Innovation

Druva’s data protection solution has visibility into the lifeblood of your business: your data. Druva’s innovation team therefore focuses on helping customers with more than backup and recovery. For example, we enable capabilities such as metadata search within backed-up data.

Before we could enable capabilities such as search, we needed a foundation that supports search across the massive amount of data we protect for our customers. Druva performs more than 4 million backups every day, so search over backup event data must work at very large scale. How did we build the foundation for searching the big data generated by the backup events we handle for customers?

Big data scalability for innovation

Conventional data management and querying techniques do not scale to billions of backup events. While our competitors are confined to appliances and physical space, we live and thrive in the cloud. To turn backup event data into an asset, we built highly scalable, high-performance big data analytics pipelines in the cloud. These pipelines ingest unstructured data about billions of backup events and the utilization of the corresponding infrastructure, then transform it so that it can feed new capabilities in Druva products. One example is generating insights for compliance and eDiscovery.

Druva internal data analytics platform

As we designed our internal data analytics platform, we had several critical design criteria to consider:

Rapidly reduce storage costs
Serve up to 25K events/sec elastically
Fast deployment and iteration (of data pipelines)
Sub-second query response time for high interaction use cases

We built the platform on a suite of AWS services coupled with our own custom solutions for faster, more cost-effective query processing. For instance, we built on-demand scaling to manage load across the pipeline stages: ingestion, data partitioning, and query processing. Raw data is streamed in real time and ingested via Amazon Kinesis. Spark jobs running on Amazon EMR, under custom ETL (Extract, Transform, and Load) management, then transform and partition the raw data. For fast, ad hoc queries, we use Presto, a distributed SQL query engine designed for large datasets.
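
To make the partitioning step more concrete, here is a minimal Python sketch of how events might be grouped into time-based partitions so that a query engine only has to scan the relevant ones. It is an illustration, not Druva's implementation: the event fields (`event_type`, `timestamp`), the Hive-style `key=value` path layout, and the function names are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(event: dict) -> str:
    """Build a Hive-style partition path from an event.

    Assumes (hypothetically) that each event carries an 'event_type'
    string and a 'timestamp' in epoch seconds.
    """
    ts = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
    return f"event_type={event['event_type']}/dt={ts:%Y-%m-%d}/hour={ts:%H}"


def partition_events(events):
    """Group events by partition key, as an ETL job might before writing files."""
    buckets = defaultdict(list)
    for event in events:
        buckets[partition_key(event)].append(event)
    return dict(buckets)


def prune(partitions: dict, event_type: str, date: str) -> dict:
    """Keep only the partitions a query for one event type and day must read."""
    prefix = f"event_type={event_type}/dt={date}/"
    return {key: rows for key, rows in partitions.items() if key.startswith(prefix)}
```

With a layout like this, a query restricted to one event type and one day touches only the matching partitions instead of the whole dataset, which is the general reason partitioning helps ad hoc query latency.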

The platform achieved the following results:

Processing 100 million events/hour
Handling 400 thousand queries/hour
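
As a quick sanity check, the hourly figures above can be converted to average per-second rates and compared with the 25K events/sec design target. The sketch below is plain arithmetic on the numbers quoted in this post. It assumes load is spread evenly over the hour, which real traffic rarely is, so the results are averages rather than peak rates.

```python
SECONDS_PER_HOUR = 3600


def per_second(per_hour: float) -> float:
    """Convert an hourly count into an average per-second rate."""
    return per_hour / SECONDS_PER_HOUR


# Figures quoted in the post.
events_per_sec = per_second(100_000_000)   # about 27,778 events/sec on average
queries_per_sec = per_second(400_000)      # about 111 queries/sec on average

# Design target quoted in the post.
DESIGN_TARGET_EVENTS_PER_SEC = 25_000
meets_target = events_per_sec >= DESIGN_TARGET_EVENTS_PER_SEC
```

On these averages, the reported ingest rate sits slightly above the 25K events/sec design target.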

The internal big data analytics platform drives new Druva data protection capabilities.

Our elastic, cost-effective, high-performance internal data analytics platform runs in production at scale, and its capabilities extend throughout Druva products. The pipelines run under the hood to power advanced capabilities beyond backup and recovery. For example, the platform helps unlock the value of the data stored through your Druva data protection platform and powers data-driven solutions for customers’ enterprise data, such as the metadata search (MDS) capability.

To learn how Druva’s innovation in big data analytics is applied to capabilities in your data protection platform today, read about the Druva product feature “Federated Search for Backed up Data” and the related blog post.
