Tech/Engineering

From Backup to Intelligence: Building a Security Data Lakehouse with Apache Iceberg

May 04, 2026 | Anand Apte, Distinguished Engineer, and Shubham Deshmukh, Sr. Principal Engineer

Every day, Druva processes tens of millions of backups: hundreds of petabytes of data. Backup has traditionally been treated as a dormant insurance policy, something you hope you never have to use. We see it differently.

When a security incident unfolds, this "insurance policy" becomes the most valuable asset in the room. It effectively acts as a forensic black box: a continuous, versioned record of every file and every change across every device leading up to the breach.

The challenge we faced was that traditional backup systems were never designed to be read this way.

They were built to answer Recovery questions: “Give me this file from yesterday.” They stumble when asked Security questions: “Show me every device where this specific malicious hash has appeared in the last 90 days.” To bridge this gap, we had to stop thinking about backups as snapshots and start thinking about them as intelligence.

Why is traditional backup snapshot mounting too slow for incident response?

In the world of traditional backup vendors, performing a deep security search across historical data is a heavy lift. Because their data is stored in proprietary, static snapshots, they often require you to mount a snapshot to see what’s inside.

Mounting is infrastructure-heavy, painfully slow, and increasingly difficult to scale as data volumes grow. During a live attack, you don't have hours to wait for a snapshot to mount; you need answers in seconds.

The Shift: A Metadata-First Architecture

We realized that to find threats at scale, we shouldn't have to "open" the backup at all. We needed a way to query the metadata, the DNA of the files, directly.

We turned to Apache Iceberg on Amazon S3 Tables. By building a Security Data Lakehouse, we decoupled the intelligence from the storage. Instead of a series of disconnected snapshots, we created a continuously evolving, versioned timeline of file state. Together, this stack provided:

ACID Transactions: Ensuring data consistency as millions of backups land daily.
Native Time Travel: Allowing us to "rewind" the environment to any specific second to see exactly when a file arrived.
Columnar Performance: Making metadata queries lightning-fast without ever needing to mount or re-hydrate the actual backup data.
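
To make the time-travel idea concrete, here is a minimal in-memory sketch of a snapshot log. It is not the Iceberg API: `ToyTable`, `Snapshot`, `commit`, and `as_of` are hypothetical names, and the rows are invented sample data. It assumes each commit publishes a complete new table state in one step, and that a read "as of" a timestamp resolves to the latest snapshot committed at or before that time.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    committed_at: int   # commit time, seconds since epoch
    files: frozenset    # all (device, path, sha256) rows visible in this snapshot


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a versioned table (not the Iceberg API)."""
    snapshots: list = field(default_factory=list)

    def commit(self, committed_at, added=(), removed=()):
        # Build the new state from the previous one, then publish it with a
        # single append so readers never observe a half-applied change.
        if self.snapshots and committed_at <= self.snapshots[-1].committed_at:
            raise ValueError("commits must move forward in time")
        current = self.snapshots[-1].files if self.snapshots else frozenset()
        new_files = (current - frozenset(removed)) | frozenset(added)
        self.snapshots.append(Snapshot(committed_at, new_files))

    def as_of(self, timestamp):
        # Time travel: pick the latest snapshot committed at or before timestamp.
        times = [s.committed_at for s in self.snapshots]
        idx = bisect_right(times, timestamp)
        return self.snapshots[idx - 1].files if idx else frozenset()


table = ToyTable()
row_a = ("laptop-1", "/docs/report.docx", "aaa111")
row_b = ("laptop-1", "/tmp/dropper.exe", "bad999")
table.commit(100, added=[row_a])
table.commit(200, added=[row_b])

print(row_b in table.as_of(150))  # False: the file had not arrived yet
print(row_b in table.as_of(200))  # True: visible from the second commit on
```

Note that in this sketch the resolution of a "rewind" is bounded by how often commits land: a query for any timestamp returns the state as of the most recent commit before it.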

The Foundation: A Versioned View of Reality

The key architectural shift was moving from independent snapshots to a continuously evolving timeline. Each file is modeled as a time-bounded lifecycle: when it appears, how it changes, and when it is no longer present.

What previously required stitching together multiple snapshots manually becomes a set of bounded queries:

What did this system look like before the incident?
When did this file first appear?
Is it still present?

These are no longer recovery operations; they are analytical queries over versioned data. This shift from snapshots to timelines is what makes large-scale forensic analysis practical.
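
The three questions above can be sketched as plain filters over time-bounded records. This is an illustrative model only, not Druva's schema: the `FileVersion` fields (`valid_from`, `valid_to`) and the helper names are assumptions, and timestamps are simple integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileVersion:
    device: str
    path: str
    sha256: str
    valid_from: int           # first backup time at which this version was seen
    valid_to: Optional[int]   # time it was replaced or deleted; None = still present


def state_as_of(versions, device, timestamp):
    """What did this system look like at `timestamp`?"""
    return [
        v for v in versions
        if v.device == device
        and v.valid_from <= timestamp
        and (v.valid_to is None or timestamp < v.valid_to)
    ]


def first_appearance(versions, sha256):
    """When did this file content first appear, on any device?"""
    times = [v.valid_from for v in versions if v.sha256 == sha256]
    return min(times) if times else None


def still_present(versions, sha256):
    """Is any version with this hash still live?"""
    return any(v.sha256 == sha256 and v.valid_to is None for v in versions)


versions = [
    FileVersion("laptop-1", "/docs/plan.txt", "aaa111", 100, 300),
    FileVersion("laptop-1", "/docs/plan.txt", "ccc333", 300, None),
    FileVersion("laptop-1", "/tmp/dropper.exe", "bad999", 250, 400),
]

before_incident = state_as_of(versions, "laptop-1", 200)
print([v.sha256 for v in before_incident])   # ['aaa111']
print(first_appearance(versions, "bad999"))  # 250
print(still_present(versions, "bad999"))     # False
```

Each question becomes a bounded predicate on `valid_from` and `valid_to`, which is the property that lets a query engine answer it without reading file contents.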

Ingestion: Continuous Change Processing

Rather than relying on periodic batch pipelines, we process data incrementally. Across multiple sources, including file activity streams and threat intelligence feeds, we continuously ingest and apply changes to maintain an up-to-date view of the system. This approach allows us to:

Avoid expensive full refreshes.
Keep ingestion latency low.
Maintain consistency between historical and current state.
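
As a rough sketch of incremental processing, the following applies change events one at a time to a file timeline instead of rebuilding it. The event shape (`op`, `device`, `path`, `sha256`, `ts`) and the `Timeline` class are hypothetical and chosen for illustration; the actual pipeline is not described in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    device: str
    path: str
    sha256: str
    valid_from: int
    valid_to: Optional[int] = None   # None means the version is still live


class Timeline:
    def __init__(self):
        self.history = []   # every version ever seen, closed or open
        self.open = {}      # (device, path) -> currently live Version

    def apply(self, event):
        """Apply a single change event without touching unrelated files."""
        key = (event["device"], event["path"])
        live = self.open.get(key)
        if event["op"] == "upsert":
            if live is not None and live.sha256 == event["sha256"]:
                return  # unchanged content: nothing to write
            if live is not None:
                live.valid_to = event["ts"]  # close the previous version
            new = Version(event["device"], event["path"], event["sha256"], event["ts"])
            self.history.append(new)
            self.open[key] = new
        elif event["op"] == "delete":
            if live is not None:
                live.valid_to = event["ts"]
                del self.open[key]
        else:
            raise ValueError(f"unknown op: {event['op']}")


timeline = Timeline()
for event in [
    {"op": "upsert", "device": "laptop-1", "path": "/a.txt", "sha256": "aaa111", "ts": 100},
    {"op": "upsert", "device": "laptop-1", "path": "/a.txt", "sha256": "aaa111", "ts": 200},
    {"op": "upsert", "device": "laptop-1", "path": "/a.txt", "sha256": "bbb222", "ts": 300},
    {"op": "delete", "device": "laptop-1", "path": "/a.txt", "ts": 400},
]:
    timeline.apply(event)

print([(v.sha256, v.valid_from, v.valid_to) for v in timeline.history])
# [('aaa111', 100, 300), ('bbb222', 300, 400)]
```

The unchanged backup at `ts=200` writes nothing, which is where the savings over a full refresh come from: work is proportional to what changed, not to the size of the dataset.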

How does a Security Data Lakehouse accelerate threat hunting?

By treating our metadata as a high-performance analytical dataset, we transformed the backup process into a dual-purpose security engine.

1. Threat Watch (Proactive Detection)

Threat Watch is our "always-on" sentinel. As data is backed up, our Iceberg-based engine continuously and incrementally scans it, in near real time, against a live library of global threat signatures, or indicators of compromise (IOCs).

Impact: If a match is found, the system can Auto-Quarantine the affected snapshot, ensuring you don't accidentally invite the intruder back into your environment during a restore.
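
A simplified sketch of this flow, under stated assumptions: only the records added by the latest backup are checked, the IOC library is modeled as a set of hashes, and quarantine is modeled as a set of snapshot IDs. The function name, record fields, and sample values are all hypothetical.

```python
def scan_increment(new_records, ioc_hashes, quarantined):
    """Check only newly ingested records against the IOC set.

    new_records: iterable of dicts with 'snapshot_id', 'device', 'path', 'sha256'
    ioc_hashes:  set of known-bad hashes (stand-in for a threat feed)
    quarantined: set of snapshot IDs, updated in place
    """
    matches = [r for r in new_records if r["sha256"] in ioc_hashes]
    for record in matches:
        quarantined.add(record["snapshot_id"])
    return matches


ioc_hashes = {"bad999"}
quarantined = set()

latest_backup = [
    {"snapshot_id": "snap-42", "device": "laptop-1", "path": "/docs/plan.txt", "sha256": "aaa111"},
    {"snapshot_id": "snap-42", "device": "laptop-1", "path": "/tmp/dropper.exe", "sha256": "bad999"},
    {"snapshot_id": "snap-43", "device": "laptop-2", "path": "/docs/notes.txt", "sha256": "ccc333"},
]

hits = scan_increment(latest_backup, ioc_hashes, quarantined)
print([h["path"] for h in hits])  # ['/tmp/dropper.exe']
print(quarantined)                # {'snap-42'}
```

A set lookup per record keeps the cost of each scan proportional to the size of the increment. Signatures that arrive after a file was backed up are not caught by this path; that case calls for a search over history.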

2. Threat Hunt (The Forensic Investigator)

When your SOC identifies a new "bad hash," Threat Hunt allows you to go on the offensive. Because our metadata is indexed and queryable, you can run a Global Search to find that signature across your entire history.

Impact: You can instantly map the Blast Radius, finding where a file landed first and how far it spread, turning days of manual forensic labor into a single analytical query.
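
One way to picture the blast-radius query is as a group-by over file metadata: for a given hash, find the earliest sighting on each device, then order devices by that time. The record fields and the `blast_radius` helper below are assumptions made for illustration, not the product's query interface.

```python
def blast_radius(records, sha256):
    """Return (device, first_seen) pairs for a hash, earliest device first."""
    first_seen = {}
    for r in records:
        if r["sha256"] != sha256:
            continue
        device, ts = r["device"], r["first_seen"]
        # Keep only the earliest sighting per device.
        if device not in first_seen or ts < first_seen[device]:
            first_seen[device] = ts
    return sorted(first_seen.items(), key=lambda item: item[1])


records = [
    {"device": "laptop-2", "path": "/tmp/x.exe", "sha256": "bad999", "first_seen": 250},
    {"device": "laptop-1", "path": "/tmp/x.exe", "sha256": "bad999", "first_seen": 120},
    {"device": "laptop-1", "path": "/home/y.exe", "sha256": "bad999", "first_seen": 180},
    {"device": "server-9", "path": "/srv/ok.bin", "sha256": "aaa111", "first_seen": 90},
]

spread = blast_radius(records, "bad999")
print(spread)     # [('laptop-1', 120), ('laptop-2', 250)]
print(spread[0])  # ('laptop-1', 120): where the file landed first
```

The first entry identifies where the file landed first, and the rest of the list shows the order in which it reached other devices.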

From Insurance to Intelligence

This architecture represents a fundamental shift in the value of a backup. Backup has moved from a passive recovery mechanism to an active source of security truth.

Every backup contributes to a growing, searchable intelligence pool.
No infrastructure overhead: Search across time and thousands of workloads without mounting a single volume.
Clean Recovery: Transition from "I hope this works" to "I know this is clean."

At Druva’s scale, this means turning hundreds of petabytes of “insurance” data into actionable insights. Backup is no longer just the last line of defense; it is the most powerful tool in your security stack for detecting and responding to threats.

How Resilient is Your Backup?

Take our 5-minute Cyber Resilience Assessment to identify gaps in your current forensic and recovery workflows.


FAQs: Security Data Lakehouse Architecture

What is a Security Data Lakehouse in backup?

It is a centralized architecture that decouples metadata from storage, allowing you to query backup data like a database without mounting individual snapshots.

Why use Apache Iceberg for cyber resilience?

Iceberg provides ACID transactions and native time travel, enabling forensic teams to "rewind" to any second and see exactly when a file was first compromised.

How does metadata-first search speed up threat hunting?

By indexing the "DNA" of files, you can search for malicious hashes across petabytes of history in seconds, rather than spending hours mounting traditional backups.
