Accumulation of Data in Push Stream-based Workloads During Backup

Pratyush Gupta, Staff Software Engineer

Introduction

Data backup refers to the process of creating copies of the data on a device that can be used for recovery in case the original data is lost or corrupted. Druva supports the data backup of various workloads like virtual machines, databases, file systems, and so on.

At Druva, backing up a workload involves an agent running on the device, which uses a data mover to upload data to Druva’s cloud storage. The data mover, as the name suggests, moves the data from the source device to the cloud storage. Depending on the type of workload, the data mover can have various responsibilities, such as traversing files on the device, reading/writing files from/to the device, downloading/uploading data from/to the cloud, etc. Druva’s data mover uses Rust-based storage APIs to communicate with the Druva cloud storage.

What are Push Stream-based Workloads?

Relational database applications like Oracle, SAP HANA, etc. have developed native tools that can back up, restore, and recover database files. These native backup tools follow a push stream model: the tool reads the database data and log volumes and then pushes the data stream to the backup vendor’s implementation of the application’s published APIs. This model differs from other workloads, such as file servers or NAS, where the backup vendor controls the entire backup process, including snapshotting and reading/writing the application objects.

[Figure: db tool]

Responsibilities of the Data Mover for Push Stream-based Workloads

  • Back up files to the cloud storage by accumulating the incoming data blocks.

  • Back up file metadata and attributes to the cloud storage.

  • List all the versions of a file present across multiple snapshots on the cloud storage.

  • Fetch the details of a specific version of a previously backed-up file.

  • Restore a specific version of a file from the cloud storage.

  • Manage concurrency during backups.

  • Manage stats during backups and restores.
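
Taken together, these responsibilities map onto a fairly small interface. The Rust sketch below is purely illustrative: the trait name, types, and signatures are hypothetical and are not Druva’s actual storage or data mover API.

    // Hypothetical interface for the data mover of a push stream-based
    // workload. Names and signatures are illustrative, not Druva's API.

    use std::io;

    pub type FileHandle = u64;

    pub struct FileVersion {
        pub snapshot_id: u64,
        pub size: u64,
        pub backup_time: i64, // Unix timestamp of the snapshot
    }

    pub trait PushStreamDataMover {
        /// Accumulate an incoming data block and upload accumulated data
        /// to cloud storage once an upload-sized block is filled.
        fn write_block(&mut self, handle: FileHandle, offset: u64, data: &[u8]) -> io::Result<()>;

        /// Back up file metadata and attributes alongside the data.
        fn write_metadata(&mut self, handle: FileHandle, attrs: &[(String, String)]) -> io::Result<()>;

        /// List every version of a file across snapshots on cloud storage.
        fn list_versions(&self, file_name: &str) -> io::Result<Vec<FileVersion>>;

        /// Fetch the details of one previously backed-up version.
        fn get_version(&self, file_name: &str, snapshot_id: u64) -> io::Result<FileVersion>;

        /// Restore a specific version of a file from cloud storage.
        fn restore(&self, file_name: &str, snapshot_id: u64, dest: &mut dyn io::Write) -> io::Result<u64>;
    }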

Need for Accumulation of Data During Backup

For push stream-based workloads, the application itself orchestrates the backup using its native backup tool. The Druva agent for such workloads primarily focuses on two tasks during backup: 

  1. Reading the blocks from the stream where the application is pushing the data and uploading those blocks to the storage using the data mover. (The size of the blocks read from the stream is controlled by the agent and can change over the course of the backup.)

  2. Determining the size of the block that should be uploaded to the storage, from the set of block sizes supported by the storage.

The configurable block size helps in achieving high throughput and low network latency during the backup. But it also means that the size of the blocks the data mover receives from the agent and the size of the blocks it needs to upload to the storage may differ, depending on the configuration the agent chooses for these two tasks. To bridge this gap, the data mover accumulates the incoming data blocks from the agent and uploads them to the storage once the pre-decided upload size is reached.
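
As a concrete, made-up example: if the agent reads 1 MiB blocks from the stream while the chosen storage upload size is 4 MiB, the data mover buffers four incoming blocks per upload. The sketch below illustrates that arithmetic; the supported block sizes are invented for the example and do not reflect Druva’s actual configuration.

    // Illustrative only: the block sizes here are invented for the example.

    const MIB: usize = 1024 * 1024;

    /// Pick the upload block size: the largest size supported by the storage
    /// that does not exceed the limit configured for this backup.
    fn choose_upload_size(supported: &[usize], max_upload: usize) -> Option<usize> {
        supported.iter().copied().filter(|&s| s <= max_upload).max()
    }

    fn main() {
        let supported_sizes = [MIB, 4 * MIB, 8 * MIB]; // hypothetical storage block sizes
        let agent_block = MIB;                         // block size pushed by the agent

        let upload_size = choose_upload_size(&supported_sizes, 4 * MIB).unwrap();
        let blocks_per_upload = upload_size / agent_block;

        // With 1 MiB agent blocks and a 4 MiB upload size, the accumulator
        // gathers four incoming blocks before issuing a single upload.
        println!(
            "accumulate {} agent blocks per {} MiB upload",
            blocks_per_upload,
            upload_size / MIB
        );
    }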

How is the Data Accumulated?

The accumulator component in the data mover is responsible for data accumulation. The data mover maintains a thread-safe cache that maps each file handle to its accumulator. The accumulator allocates a buffer to accumulate the data received for the file, and it keeps a virtual offset and an actual offset to track and maintain the block alignment between the local buffer and the cloud storage.
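
A minimal sketch of that bookkeeping is shown below. The names and fields are hypothetical; the real data mover tracks considerably more state, but the essential pieces are the per-file buffer and the two offsets.

    use std::collections::HashMap;
    use std::sync::{Arc, Mutex};

    // Hypothetical per-file accumulator state; illustrative, not Druva's code.
    struct Accumulator {
        buffer: Vec<u8>,      // local buffer for incoming data blocks
        capacity: usize,      // upload block size chosen for this backup
        virtual_offset: u64,  // next byte offset expected from the agent
        actual_offset: u64,   // bytes already uploaded to cloud storage
    }

    impl Accumulator {
        fn new(capacity: usize) -> Self {
            Accumulator {
                buffer: Vec::with_capacity(capacity),
                capacity,
                virtual_offset: 0,
                actual_offset: 0,
            }
        }
    }

    type FileHandle = u64;

    // One accumulator per open file handle, kept behind a lock so that
    // concurrent backup threads can share the cache safely.
    struct DataMover {
        accumulators: Arc<Mutex<HashMap<FileHandle, Accumulator>>>,
    }

    fn main() {
        let mover = DataMover {
            accumulators: Arc::new(Mutex::new(HashMap::new())),
        };
        // Register a new file with a hypothetical 4 MiB upload block size.
        let mut cache = mover.accumulators.lock().unwrap();
        cache.insert(1, Accumulator::new(4 * 1024 * 1024));
    }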

When the accumulator receives a data block, it checks the block’s offset against the virtual offset. The accumulator detects and prevents non-sequential writes, and it pads the buffer whenever necessary to maintain the virtual block alignment. At any point, if the allocated buffer is full, the accumulator uploads the buffer to the storage and allocates a new one. Finally, when the block offset is equal to the virtual offset, the accumulator performs the following operations (sketched in code after the list):

  • Calculate the size that can be copied to the buffer.

  • Depending on the size of the block, either add the entire block to the buffer or split the block into two chunks.

  • Add the first chunk to the buffer, and increment the virtual offset.

  • Upload the buffer to the storage using the storage API, and increment the actual offset.

  • Allocate a new buffer.

  • Repeat this entire process for the other chunk.
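
A simplified sketch of these steps follows, again with hypothetical names. It reuses the Accumulator fields from the earlier sketch; the upload call is stubbed out, and the exact policy for non-sequential offsets (reject writes behind the virtual offset, pad gaps ahead of it with zero bytes) is an assumption made for illustration.

    // Sketch of the accumulation path described above. Error handling,
    // retries, and the actual storage API call are omitted.

    struct Accumulator {
        buffer: Vec<u8>,      // same hypothetical fields as the earlier sketch
        capacity: usize,
        virtual_offset: u64,
        actual_offset: u64,
    }

    impl Accumulator {
        /// Handle one incoming data block from the agent at `offset`.
        fn write(&mut self, offset: u64, data: &[u8]) -> Result<(), String> {
            // Writes behind the virtual offset are non-sequential; reject them.
            // (The exact policy is an assumption for this sketch.)
            if offset < self.virtual_offset {
                return Err(format!("non-sequential write at offset {}", offset));
            }
            // A gap ahead of the virtual offset is padded with zero bytes so
            // the local buffer stays block-aligned with the cloud storage.
            if offset > self.virtual_offset {
                let gap = (offset - self.virtual_offset) as usize;
                self.fill(&vec![0u8; gap]);
            }
            // Block offset now equals the virtual offset: copy the block,
            // splitting it into chunks whenever it does not fit in the space
            // left in the current buffer.
            self.fill(data);
            Ok(())
        }

        /// Copy bytes into the buffer, uploading and reallocating whenever full.
        fn fill(&mut self, mut data: &[u8]) {
            while !data.is_empty() {
                let room = self.capacity - self.buffer.len();
                let take = room.min(data.len());
                let (chunk, rest) = data.split_at(take);
                self.buffer.extend_from_slice(chunk); // add the chunk to the buffer
                self.virtual_offset += take as u64;   // increment the virtual offset
                data = rest;                          // repeat for the other chunk

                if self.buffer.len() == self.capacity {
                    self.upload(); // buffer is full: upload and start a new one
                }
            }
        }

        fn upload(&mut self) {
            // Placeholder for the storage API call that uploads the buffer.
            self.actual_offset += self.buffer.len() as u64;  // increment the actual offset
            self.buffer = Vec::with_capacity(self.capacity); // allocate a new buffer
        }
    }

    fn main() {
        let mut acc = Accumulator {
            buffer: Vec::with_capacity(4),
            capacity: 4, // tiny upload size, just for the demo
            virtual_offset: 0,
            actual_offset: 0,
        };
        // A 6-byte block fills one 4-byte buffer (uploaded) and leaves 2 bytes.
        acc.write(0, b"abcdef").unwrap();
        assert_eq!(acc.actual_offset, 4);
        assert_eq!(acc.virtual_offset, 6);
    }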

[Figure: accumulator]


When the data mover receives a commit call for a file, it instructs that file’s accumulator to upload the remaining buffer to the storage. The accumulator verifies the virtual and actual offsets after the final upload and returns the result to the data mover accordingly. If there are no errors, the data mover commits the file to storage and removes the entry for that file from its cache. If the data mover encounters any errors during the backup, it ensures that the accumulator empties its buffer and releases the allocated memory.
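
A sketch of that commit path, under the same hypothetical types as before (the storage calls are stubbed out and the names are illustrative):

    use std::collections::HashMap;
    use std::sync::{Arc, Mutex};

    // Hypothetical commit path; illustrative only, with storage calls stubbed.

    struct Accumulator {
        buffer: Vec<u8>,
        virtual_offset: u64,
        actual_offset: u64,
    }

    impl Accumulator {
        /// Upload whatever remains in the buffer and verify block alignment.
        fn finish(&mut self) -> Result<(), String> {
            if !self.buffer.is_empty() {
                // Placeholder for the final upload of the remaining buffer.
                self.actual_offset += self.buffer.len() as u64;
                self.buffer.clear(); // release the allocated memory
            }
            // After the final upload the two offsets must agree; a mismatch
            // means data was dropped or written out of order.
            if self.virtual_offset != self.actual_offset {
                return Err(format!(
                    "offset mismatch: virtual {} vs actual {}",
                    self.virtual_offset, self.actual_offset
                ));
            }
            Ok(())
        }
    }

    type FileHandle = u64;

    struct DataMover {
        accumulators: Arc<Mutex<HashMap<FileHandle, Accumulator>>>,
    }

    impl DataMover {
        /// Called when the agent commits a file at the end of its stream.
        fn commit(&self, handle: FileHandle) -> Result<(), String> {
            let mut cache = self.accumulators.lock().unwrap();
            let acc = cache.get_mut(&handle).ok_or("unknown file handle")?;
            acc.finish()?;
            // Placeholder for committing the file to cloud storage.
            cache.remove(&handle); // drop the accumulator entry from the cache
            Ok(())
        }
    }

    fn main() {
        let mover = DataMover {
            accumulators: Arc::new(Mutex::new(HashMap::new())),
        };
        mover.accumulators.lock().unwrap().insert(
            7,
            Accumulator { buffer: vec![0u8; 2], virtual_offset: 2, actual_offset: 0 },
        );
        mover.commit(7).unwrap();
    }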

Next Steps

Learn more about the technical innovations and best practices powering cloud backup and data management in the engineering section of Druva’s blog archive.