Tech/Engineering

Data Restore: How Druva Ensures Security and Safety

June 21, 2023 Anand Apte and Neeraj Thakur, Principal Engineer

Introduction

Modern data protection solutions provide simple, self-service workflows to restore data. However, as cybersecurity threats grow, it is increasingly vital to ensure that data about to be restored is free of malware. Data restore and data download workflows therefore need an extra layer of protection.

Hence, we at Druva set out to build a system that pre-scans data for malware before it is downloaded or restored. This blog describes how we approached the problem and explains the cost-effective solution we built to ensure that data is scanned for malware.

Existing Data Restore/Download Workflow

Druva supports backup and recovery of varied workloads such as file servers, network attached storage (NAS) shares, and endpoints (laptops/desktops) to name a few. The restore workflow for these workloads follows a similar pattern.

Before this change, users restored or downloaded data from backed-up copies as follows:

Users browse the folders for backed-up copies to identify the files they want to restore.
Users choose the restore target location and start the restore operation.
Appropriate Druva services take over and transfer the selected files to the target location.

This workflow needed to change to provide protection against malware.

The Malware Scan Solution Design

We wanted to introduce a scan stage where data selected for restoration is scanned for the presence of malware. The system needed to label infected files appropriately and create a list of all such files. Clean data should be marked as safe and should be restored to the specified location. This functionality was named Restore with Confidence.

Implementing the Pre-Restore Scan Feature

Data restore requests are sparse, and their timing and frequency are hard to predict. This makes it impractical to right-size a fixed pool of infrastructure for the pre-restore malware scan operations we were interested in.

So we devised an on-demand infrastructure approach that provides a predictable RTO (Recovery Time Objective) while paying for infrastructure only when it is actually in use. Here is the sequence of steps that the scan job follows.

The malware scan job is triggered when the user initiates a restore operation.
The job reads the content of each file marked for restoration from the Druva backed-up data storage.
The file contents pass through the malware scanner.
Based on the scan outcome, each file is tagged as clean/infected.
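
The steps above can be sketched as a small loop. This is a minimal illustration, not Druva's implementation: `FileRef`, `run_scan_job`, and the `read_file` and `is_malicious` callables are hypothetical names standing in for the storage access and the scan engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

CLEAN = "clean"
INFECTED = "infected"


@dataclass(frozen=True)
class FileRef:
    """Hypothetical identifier for one file marked for restoration."""
    path: str
    snapshot_id: str


def run_scan_job(
    files: Iterable[FileRef],
    read_file: Callable[[FileRef], bytes],
    is_malicious: Callable[[bytes], bool],
) -> Dict[FileRef, str]:
    """Read each selected file, pass it through the scanner, and tag it."""
    labels: Dict[FileRef, str] = {}
    for ref in files:
        content = read_file(ref)  # read from backed-up data storage
        labels[ref] = INFECTED if is_malicious(content) else CLEAN
    return labels
```

Passing the reader and the scanner in as callables keeps the loop independent of any particular storage service or scan engine, which also makes it easy to exercise with in-memory fakes.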

Depending on the size of the data set, the scan job duration may vary from a few minutes to multiple hours. The scan job is scheduled by a homegrown load balancer that is purpose-built for long-running jobs.

The load balancer uses a fleet of AWS Spot Instances that are allocated based on need, spawning instances and making capacity available whenever it is required. As a result, whether it is a regular day with hundreds of restores or a high-traffic scenario with thousands of restores in less than an hour, scaling requests are handled in a predictable manner.
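
To make the need-based allocation concrete, here is a rough sketch of how a scheduler might decide how many extra instances to request. The function name, the fixed jobs-per-instance packing, and the fleet cap are all assumptions made for illustration; the post does not describe the load balancer's actual sizing logic.

```python
import math


def instances_to_launch(
    active_jobs: int,
    running_instances: int,
    jobs_per_instance: int,
    max_instances: int,
) -> int:
    """Return how many additional instances to request (never negative).

    active_jobs counts queued plus running scan jobs. jobs_per_instance and
    max_instances are hypothetical tuning knobs, not values from the post.
    """
    if jobs_per_instance <= 0:
        raise ValueError("jobs_per_instance must be positive")
    needed = math.ceil(active_jobs / jobs_per_instance)
    target = min(needed, max_instances)
    return max(0, target - running_instances)
```

For example, with 100 active jobs, 10 running instances, 4 jobs per instance, and a cap of 50, the function asks for 15 more instances; a burst of 3,000 jobs is bounded by the cap and asks for 40.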

Druva products store backed-up data in a proprietary storage structure (refer to Druva documentation for more details). The malware scan operation accesses this data via APIs hosted by the storage services. These APIs accept file identifier information (file path and point-in-time snapshot for backed-up data) and return the file data. This kind of API access provides an easy and flexible means to consume data and operate on it.
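
The shape of such an API can be illustrated with a small interface. The names below (`BackupReader`, `InMemoryBackupReader`, `read_file`) are hypothetical and do not reflect Druva's actual storage API; the sketch only shows that a file is addressed by its path together with a point-in-time snapshot.

```python
from typing import Dict, Protocol, Tuple


class BackupReader(Protocol):
    """Hypothetical interface: file identifier in, file data out."""

    def read_file(self, path: str, snapshot_id: str) -> bytes: ...


class InMemoryBackupReader:
    """In-memory stand-in for the storage service, for illustration only."""

    def __init__(self, versions: Dict[Tuple[str, str], bytes]) -> None:
        self._versions = dict(versions)

    def read_file(self, path: str, snapshot_id: str) -> bytes:
        try:
            return self._versions[(path, snapshot_id)]
        except KeyError:
            raise FileNotFoundError(
                f"{path} is not present in snapshot {snapshot_id}"
            ) from None
```

Because the snapshot is part of the identifier, the same path can return different content for different points in time, and a consumer such as the scanner only needs the `read_file` call.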

The malware scan is deployed as a scale-out service. It embeds an off-the-shelf scan engine that scans the file content and identifies whether it has traces of malware. The service provides a simple REST API that accepts file content as part of the request and returns the label (clean/infected) in the response.
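
A minimal sketch of such a service, using only the Python standard library, might look like the following. The `/scan` endpoint, the JSON response shape, and the marker-based `scan_bytes` stub are assumptions for illustration; a real deployment would call the embedded scan engine instead of the stub.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical signature used by the stub engine below.
MARKER = b"MALWARE-TEST-MARKER"


def scan_bytes(data: bytes) -> str:
    """Stub for the embedded scan engine: flags content containing MARKER."""
    return "infected" if MARKER in data else "clean"


class ScanHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/scan":
            self.send_error(404, "unknown endpoint")
            return
        # The request body is the raw file content to scan.
        length = int(self.headers.get("Content-Length", "0"))
        body = self.rfile.read(length)
        payload = json.dumps({"label": scan_bytes(body)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format: str, *args: object) -> None:
        pass  # silence per-request logging in this sketch


def make_server(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Create the server; port 0 lets the OS pick a free port."""
    return HTTPServer((host, port), ScanHandler)
```

Calling `make_server().serve_forever()` starts the service; a client then POSTs file bytes to `/scan` and reads the label from the JSON body. Keeping the service stateless in this way is what allows more copies to be added behind a load balancer.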

A simple CPU utilization-based scaling policy ensures that the infrastructure needed for malware scan scales up whenever a scan job starts.
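
One common way to express a CPU-based policy is proportional scaling toward a target utilization. The sketch below is that general technique, not Druva's policy: the 60% target and the instance bounds are illustrative assumptions.

```python
import math


def desired_instance_count(
    current_instances: int,
    avg_cpu_percent: float,
    target_cpu_percent: float = 60.0,  # assumed target, not from the post
    min_instances: int = 1,
    max_instances: int = 20,
) -> int:
    """Scale the fleet so that average CPU moves toward the target."""
    if current_instances < 1:
        raise ValueError("current_instances must be at least 1")
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")
    raw = math.ceil(current_instances * avg_cpu_percent / target_cpu_percent)
    return max(min_instances, min(max_instances, raw))
```

With 4 instances averaging 90% CPU against a 60% target, the function asks for 6 instances; when a scan job finishes and CPU drops to 15%, it scales back to the minimum of 1.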

Finally, a catalog of clean and infected files is prepared for each restore job, based on the scan outcome for individual files. This catalog is stored in Amazon DynamoDB, a NoSQL database. The catalog serves multiple purposes. The data restore operation consults it to determine whether it is safe to fetch a file and present it to the user. The catalog is also used to build a report that gives the user insight into malicious files found in the backed-up data.
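
The catalog's two roles, a per-file safety lookup and a per-job report source, can be sketched with an in-memory stand-in. The key layout (restore job ID plus file path) and all names here are assumptions; the post does not describe the actual DynamoDB table schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    restore_job_id: str  # assumed partition key
    file_path: str       # assumed sort key
    label: str           # "clean" or "infected"


class ScanCatalog:
    """In-memory stand-in for the catalog table, for illustration only."""

    def __init__(self) -> None:
        self._items: Dict[Tuple[str, str], CatalogEntry] = {}

    def put(self, entry: CatalogEntry) -> None:
        self._items[(entry.restore_job_id, entry.file_path)] = entry

    def is_safe(self, restore_job_id: str, file_path: str) -> bool:
        """Only files explicitly labeled clean are safe to restore."""
        entry = self._items.get((restore_job_id, file_path))
        return entry is not None and entry.label == "clean"

    def infected_files(self, restore_job_id: str) -> List[str]:
        """Sorted paths of infected files for one restore job."""
        return sorted(
            e.file_path
            for e in self._items.values()
            if e.restore_job_id == restore_job_id and e.label == "infected"
        )
```

Treating a file with no catalog entry as unsafe is a design choice in this sketch: it fails closed if a scan result is missing.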

After the scan workflow completes for the selected data, the usual data restore operation is initiated. During the restore that follows the malware scan, the service checks the labels in the catalog and drops any file marked as infected. This end-to-end workflow ensures that only clean and safe data is restored. Once the restore job completes, users can download a report highlighting the infected files that were dropped.

The workflow is designed to be simple enough for users to complete on their own. All they need to do is choose whether to run a pre-restore scan when they trigger a regular data restore job.

By adopting services with robust APIs, Druva was able to build the pre-restore scan functionality and integrate it with recovery workflows for multiple workloads such as file servers, network attached storage (NAS) shares, and devices (laptops/desktops).
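
The filtering and reporting step can be sketched as follows. The function names and the CSV report format are illustrative assumptions, and treating a file with no label as "not scanned" (and dropping it) is a choice made for this sketch rather than documented behavior.

```python
import csv
import io
from typing import Dict, List, Sequence, Tuple


def plan_restore(
    selected: Sequence[str], labels: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Split the selected paths into files to restore and files to drop."""
    to_restore = [p for p in selected if labels.get(p) == "clean"]
    dropped = [p for p in selected if labels.get(p) != "clean"]
    return to_restore, dropped


def dropped_files_report(dropped: Sequence[str], labels: Dict[str, str]) -> str:
    """Build a small CSV report of the files that were not restored."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["file_path", "reason"])
    for path in dropped:
        writer.writerow([path, labels.get(path, "not scanned")])
    return buf.getvalue()
```

Given the selection `["/a", "/b"]` and labels `{"/a": "clean", "/b": "infected"}`, only `/a` is restored and the report lists `/b` with the reason `infected`.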

Takeaways from Implementing the Pre-Restore Scan Feature

Having the ability to launch infrastructure on demand to start data scan activity is crucial. It helps keep costs in check, as infrastructure is requested and paid for only when it is in use. At the same time, it ensures that the job starts immediately, providing a good RTO.
Malware scan tools traditionally work on files stored on local storage. This limits the options available for integration and scale. Druva needed to integrate such a tool into cloud-native applications. Deploying the scanner as a RESTful service made it easier to integrate with various types of consumer services. Additionally, it ensured that the service scales on demand and consumes resources only when required.
Services with robust APIs help engineering teams implement functionality that can be leveraged and integrated with multiple workflows.

Next Steps

To learn more about Druva’s technical innovations and how we deliver the best cloud-based backup and restore solution on the market, visit the tech/engineering section of the blog archive.
