Tech/Engineering, Innovation Series

How Druva backs up large data sources in constrained memory environments

April 11, 2022 Kush Shukla, Principal Engineer

In our last blog post, we looked at the problems we faced in porting our core data pipeline to Rust and how we solved them. We also explained the concepts behind the pipeline and our motivation for the project. One key requirement of the data pipeline, also mentioned in that post, is the ability to scale with the device’s resources, particularly memory. In this article, we look deeper into the problem of backing up large data sources from devices with constrained memory, and at our solution to it.

The volume of data that organizations generate every day is increasing rapidly, and organizations use Druva to back up and protect that data. While many sources generate data around the clock, the resources customers can dedicate to backup are limited, both in hardware and in the time window during which that hardware can run backups. Druva products therefore have to use the resources of the customer’s system efficiently, so that as much data as possible is backed up within the available backup window.

Druva agents running in the customer environment read data from the source into a buffer. The source data is usually too large to upload to the Druva Data Resiliency Cloud at once, so it is split into chunks that are sent to the Druva Cloud individually. The core data pipeline takes care of chunking the data, reading it into buffers, and uploading it to the Cloud asynchronously.
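The chunking step can be sketched as follows. This is a minimal, synchronous illustration and not Druva's pipeline code: the `for_each_chunk` helper, its callback, and the tiny chunk size are hypothetical, and the real pipeline uploads chunks asynchronously.

```rust
use std::io::{self, Read};

/// Reads `source` in pieces of at most `chunk_size` bytes and passes each
/// piece to `handle`. Returns the number of chunks produced.
fn for_each_chunk<R: Read>(
    mut source: R,
    chunk_size: usize,
    mut handle: impl FnMut(&[u8]),
) -> io::Result<usize> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut buf = vec![0u8; chunk_size];
    let mut chunks = 0;
    loop {
        // Fill the buffer as far as possible; `read` may return short counts.
        let mut filled = 0;
        while filled < chunk_size {
            let n = source.read(&mut buf[filled..])?;
            if n == 0 {
                break; // end of input
            }
            filled += n;
        }
        if filled == 0 {
            return Ok(chunks);
        }
        handle(&buf[..filled]);
        chunks += 1;
        if filled < chunk_size {
            return Ok(chunks); // that was the last, partial chunk
        }
    }
}

fn main() -> io::Result<()> {
    // Ten bytes of stand-in source data, split into chunks of four bytes.
    let data = vec![7u8; 10];
    let mut sizes = Vec::new();
    let count = for_each_chunk(&data[..], 4, |chunk| sizes.push(chunk.len()))?;
    println!("{} chunks with sizes {:?}", count, sizes);
    Ok(())
}
```

Only one buffer of `chunk_size` bytes is alive at a time in this sketch, so its peak memory does not depend on the size of the source. A pipeline that keeps several chunks in flight at once needs a separate bound on their combined size.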

The amount of data that can be read into buffers is limited by the available memory. On top of that, other processes running on the same device need their own share of memory. We therefore provide a setting in our backup data pipeline that limits its memory usage.

How do we manage memory consumption in the backup pipeline?

We use a semaphore for memory management in our backup pipeline. A semaphore is a synchronization primitive that controls access to a shared resource from multiple threads, which avoids critical-section problems in concurrent systems.

Our backup data pipeline uses the tokio crate from the Rust ecosystem as the runtime for concurrent, asynchronous network I/O tasks. The shared resource here is memory, and we use a Semaphore to control access to it from the concurrent tasks in our pipeline.

The snippet below shows the struct definition of our allocator:

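use std::sync::Arc;
use tokio::sync::Semaphore;

struct AllocatorInner {
    // One permit per byte of the memory budget.
    guard: Arc<Semaphore>,
}

pub struct Allocator {
    inner: Arc<AllocatorInner>,
    limit: usize, // budget in bytes
}

impl Allocator {
    // `limit` is given in megabytes.
    pub fn new(limit: usize) -> Self {
        let limit = limit * 1024 * 1024;
        let inner = AllocatorInner {
            guard: Arc::new(Semaphore::new(limit)),
        };
        Allocator { inner: Arc::new(inner), limit }
    }
}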

The allocator is constructed from the input memory limit, given in megabytes. The limit is converted into bytes and cached in the allocator struct, and the same value initializes the number of available permits in the semaphore, so one permit represents one byte.
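The megabyte-to-byte conversion can be checked for overflow before the semaphore is created. The sketch below is an illustration under assumptions: `limit_in_bytes` is a hypothetical helper, the 2048 MB value is only an example, and `MAX_PERMITS` copies the value tokio documents for `Semaphore::MAX_PERMITS` so that the block runs without the tokio dependency.

```rust
/// Mirrors the value tokio documents for `Semaphore::MAX_PERMITS`.
const MAX_PERMITS: usize = usize::MAX >> 3;

/// Converts a limit in megabytes into a byte count usable as a permit count.
/// Returns `None` if the multiplication overflows or exceeds `MAX_PERMITS`.
fn limit_in_bytes(limit_mb: usize) -> Option<usize> {
    limit_mb
        .checked_mul(1024 * 1024)
        .filter(|&bytes| bytes <= MAX_PERMITS)
}

fn main() {
    // 2048 MB * 1024 * 1024 = 2_147_483_648 bytes (assumes a 64-bit target).
    println!("{:?}", limit_in_bytes(2048));
    // A limit that cannot be represented is rejected instead of wrapping.
    println!("{:?}", limit_in_bytes(usize::MAX));
}
```

Rejecting an oversized limit up front matters because `Semaphore::new` panics when asked for more than `MAX_PERMITS` permits.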

When the agent submits an allocation call, the allocator tries to acquire as many permits as the number of bytes requested. If enough permits are available in the semaphore, the caller can allocate the space; otherwise, the calling task waits until enough permits become available.

The snippet below shows the call that suspends the task until it acquires permits equal to the size requested by the agent:

let permit = self.inner.guard.acquire_many(sz as u32).await.unwrap();
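One detail of this call is worth spelling out: `acquire_many` takes a `u32`, so a single request can cover at most `u32::MAX` permits (just under 4 GiB at one permit per byte), and an `as u32` cast silently wraps for larger sizes. The sketch below shows a checked conversion. The `permits_for` helper is hypothetical, and the example assumes a 64-bit `usize`.

```rust
/// Converts a requested size in bytes into a permit count, or `None` if the
/// size does not fit in the `u32` that `acquire_many` accepts.
fn permits_for(sz: usize) -> Option<u32> {
    if sz <= u32::MAX as usize {
        Some(sz as u32)
    } else {
        None
    }
}

fn main() {
    let small = 4 * 1024 * 1024; // 4 MiB
    let huge = 5 * 1024 * 1024 * 1024; // 5 GiB, assumes a 64-bit usize

    println!("{:?}", permits_for(small));
    println!("{:?}", permits_for(huge));
    // For comparison, a plain cast wraps around instead of failing.
    println!("{}", huge as u32);
}
```

If chunk sizes are known to stay far below 4 GiB, the cast in the snippet above is harmless. The checked version only makes that assumption explicit.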

Once the memory is allocated, the agent fills the buffer with data and submits it to the data pipeline for processing and upload. When the upload is done, the allocated memory is freed and the permits are added back to the semaphore.

The snippet below adds the permits back to the semaphore:

self.inner.guard.add_permits(sz);

After the permits are added back, allocate calls that are waiting on the semaphore receive their permits and proceed with the workflow.
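The whole acquire, fill, upload, release cycle can be sketched end to end. To keep the example runnable without extra dependencies, it swaps tokio's async Semaphore for a std-only counting semaphore built from `Mutex` and `Condvar`, so callers block a thread rather than suspend an async task. The `allocate`, `release`, and `available` methods and the sizes used are hypothetical and are not taken from Druva's code. One tokio-specific caveat: the permit returned by `acquire_many` hands its permits back when it is dropped, so a manual `add_permits` call is normally paired with `permit.forget()`. The snippets above do not show that step, so how the original allocator handles it is not known here.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Std-only stand-in for the semaphore-guarded allocator.
struct Allocator {
    available: Mutex<usize>, // bytes of budget currently free
    freed: Condvar,          // signalled whenever budget is returned
    limit: usize,            // total budget in bytes
}

impl Allocator {
    fn new(limit_mb: usize) -> Self {
        let limit = limit_mb * 1024 * 1024;
        Allocator { available: Mutex::new(limit), freed: Condvar::new(), limit }
    }

    /// Blocks until `sz` bytes of budget are free, then returns a buffer.
    fn allocate(&self, sz: usize) -> Vec<u8> {
        // A request larger than the whole budget would wait forever.
        assert!(sz <= self.limit, "request exceeds the total budget");
        let mut available = self.available.lock().unwrap();
        while *available < sz {
            available = self.freed.wait(available).unwrap();
        }
        *available -= sz;
        vec![0u8; sz]
    }

    /// Frees the buffer, returns its bytes to the budget, and wakes waiters.
    fn release(&self, buf: Vec<u8>) {
        let sz = buf.len();
        drop(buf);
        *self.available.lock().unwrap() += sz;
        self.freed.notify_all();
    }

    fn available(&self) -> usize {
        *self.available.lock().unwrap()
    }
}

fn main() {
    let allocator = Arc::new(Allocator::new(1)); // 1 MB budget
    let workers: Vec<_> = (0..8)
        .map(|_| {
            let allocator = Arc::clone(&allocator);
            thread::spawn(move || {
                // Each task needs 512 KiB, so at most two hold buffers at once.
                let buf = allocator.allocate(512 * 1024);
                // ... fill `buf` and upload it here ...
                allocator.release(buf);
            })
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }
    // Every task returned its share, so the full budget is free again.
    assert_eq!(allocator.available(), 1024 * 1024);
    println!("all tasks done, budget fully returned");
}
```

Unlike tokio's Semaphore, which hands out permits in request order, this stand-in makes no fairness guarantee: a large request can be overtaken repeatedly by smaller ones.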

Key takeaways

The semaphore-based memory allocator presented here is an alternative to conventional allocators. Conventional allocators are complex because they focus on allocating and deallocating memory efficiently. The semaphore allocator is simple because it only guards the memory resource, capping how much memory the pipeline holds at any one time. This simple approach met our business requirement of throttling memory usage. With this solution in place, we were able to back up large volumes of data (on the order of terabytes) from sources on devices with a memory limit of 2 GB.

Next steps

Learn more about the technical innovations and best practices powering cloud backup and data management. Visit the Innovation Series section of Druva’s blog archive.

Join the team!

Looking for a career where you can shape the future of cloud data protection? Druva is the right place for you! Collaborate with talented, motivated, passionate individuals in a friendly, fast-paced environment; visit the careers page to learn more.

About the author

Kush Shukla is a Polyglot Lead Code Wizard and Brogrammer at Druva. He leads development efforts and loves building tools around developer productivity.
