How Druva backs up large data sources in constrained memory environments

In our last blog post, we looked at the problems we faced, and the solutions we adopted, while porting our core data pipeline to the Rust language. We also explained the concepts and the motivation behind the project. One key requirement of the data pipeline, as mentioned in that article, is the ability to scale with the device's resources, particularly memory. In this article, we look deeper into the problem of backing up large data sources from devices with constrained memory, and at our solution to it.

The volume of data that organizations generate every day is increasing rapidly. Organizations implement Druva to back up and protect this data. While many resources generate data around the clock, the resources customers can devote to backup are limited, both in hardware and in the time available for running the backup operation. This makes it imperative for Druva products to use the resources of the customer’s system efficiently, so that as much data as possible is backed up within the available time window.

Druva agents running in the customer environment read data from the source into a buffer. The data at the source is usually too large to be uploaded to the Druva Data Resiliency Cloud at once. Therefore, the data is split into chunks that are sent to the Druva Cloud. The core data pipeline takes care of chunking the data, reading it into the buffer, and uploading it to the Cloud asynchronously.
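The chunking step can be illustrated with a short sketch. It is a simplified, synchronous version that uses only the standard library, not the asynchronous pipeline itself; the function name `for_each_chunk` and the `upload` callback are hypothetical and stand in for whatever the pipeline does with each chunk.

```rust
use std::io::{self, Read};

/// Reads `source` in pieces of at most `chunk_size` bytes and passes each
/// piece to `upload`. Only one chunk is held in memory at a time.
/// Returns the number of chunks handed to `upload`.
fn for_each_chunk<R: Read>(
    mut source: R,
    chunk_size: usize,
    mut upload: impl FnMut(&[u8]),
) -> io::Result<usize> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut buf = vec![0u8; chunk_size];
    let mut chunks = 0;
    loop {
        // A single read may return fewer bytes than requested, so keep
        // reading until the buffer is full or the source is exhausted.
        let mut filled = 0;
        while filled < chunk_size {
            let n = source.read(&mut buf[filled..])?;
            if n == 0 {
                break; // end of source
            }
            filled += n;
        }
        if filled == 0 {
            return Ok(chunks);
        }
        upload(&buf[..filled]);
        chunks += 1;
        if filled < chunk_size {
            return Ok(chunks); // last, partial chunk
        }
    }
}
```

For a 10-byte source and a chunk size of 4, the callback sees chunks of 4, 4, and 2 bytes.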

The amount of data that can be read into the buffer is limited by the available memory. On top of that, other processes running on the same device require their own share of memory. Therefore, our backup data pipeline can be configured to limit its memory usage.

How do we manage memory consumption in the backup pipeline?

We use a semaphore for memory management in our backup pipeline. A semaphore is a synchronization primitive that controls access to a common resource shared by multiple threads, which avoids critical-section problems in concurrent systems.
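To make the idea concrete, here is a minimal sketch of a counting semaphore built only from the standard library's `Mutex` and `Condvar`. It is a blocking, thread-based stand-in written for illustration, not the implementation used in the pipeline; the `CountingSemaphore` type and its method names are hypothetical.

```rust
use std::sync::{Condvar, Mutex};

/// A minimal blocking counting semaphore (illustrative, std-only).
struct CountingSemaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl CountingSemaphore {
    fn new(permits: usize) -> Self {
        Self {
            permits: Mutex::new(permits),
            available: Condvar::new(),
        }
    }

    /// Blocks the calling thread until `n` permits can be taken at once.
    /// Asking for more permits than the semaphore ever holds waits forever.
    fn acquire(&self, n: usize) {
        let mut permits = self.permits.lock().unwrap();
        while *permits < n {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= n;
    }

    /// Takes `n` permits only if they are available right now.
    fn try_acquire(&self, n: usize) -> bool {
        let mut permits = self.permits.lock().unwrap();
        if *permits >= n {
            *permits -= n;
            true
        } else {
            false
        }
    }

    /// Returns `n` permits and wakes any waiting threads.
    fn release(&self, n: usize) {
        *self.permits.lock().unwrap() += n;
        self.available.notify_all();
    }

    /// Number of permits currently available.
    fn available_permits(&self) -> usize {
        *self.permits.lock().unwrap()
    }
}
```

If each permit is taken to represent one unit of a shared resource, the number of units in use can never exceed the count the semaphore was created with.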

Our backup data pipeline uses the tokio crate from the Rust ecosystem as the runtime for executing concurrent asynchronous network I/O tasks. Here the common resource is memory, and we use a Semaphore to control access to it from the various concurrent tasks in our pipeline.

Below is the struct definition of our allocator, along with its constructor:

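use std::sync::Arc;
use tokio::sync::Semaphore;

struct AllocatorInner {
    // One permit represents one byte of buffer memory.
    guard: Arc<Semaphore>,
}

pub struct Allocator {
    inner: Arc<AllocatorInner>,
    // Memory limit in bytes.
    limit: usize,
}

impl Allocator {
    // `limit` is the memory limit in megabytes.
    pub fn new(limit: usize) -> Self {
        // Convert the limit to bytes.
        let limit = limit * 1024 * 1024;
        let inner = AllocatorInner {
            guard: Arc::new(Semaphore::new(limit)),
        };
        Allocator {
            inner: Arc::new(inner),
            limit,
        }
    }
}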

The Allocator is constructed from the input memory limit, given in megabytes. The limit is converted into bytes and cached in the allocator struct, and the same value initializes the number of available permits in the semaphore, so one permit corresponds to one byte.

When the agent submits an allocation call, the allocator tries to acquire as many permits as the number of bytes requested. If the permits are available in the semaphore, the caller can allocate the space; otherwise, the calling task waits until enough permits become available.

The snippet below shows the call, which suspends the task until permits equal to the size requested by the agent have been acquired:

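// `acquire_many` takes a u32; a checked conversion avoids the silent truncation of `sz as u32`.
let permit = self.inner.guard.acquire_many(u32::try_from(sz).unwrap()).await.unwrap();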
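The two size conversions in the steps above (megabytes to bytes at construction, `usize` to `u32` at acquisition) can be made explicit with checked arithmetic. The following is a minimal sketch under the assumption that an out-of-range value should be rejected rather than wrapped or truncated; the helper names `limit_in_bytes` and `permits_for` are hypothetical and not part of the pipeline.

```rust
/// The constructor multiplies the limit by 1024 * 1024.
const BYTES_PER_MB: usize = 1024 * 1024;

/// Converts a limit in megabytes to bytes, or `None` if the
/// multiplication would overflow `usize`.
fn limit_in_bytes(limit_mb: usize) -> Option<usize> {
    limit_mb.checked_mul(BYTES_PER_MB)
}

/// Converts a requested size in bytes to a permit count.
/// Returns `None` if the request exceeds the whole budget (it could never
/// be satisfied) or does not fit in the `u32` that `acquire_many` takes.
fn permits_for(size: usize, limit_bytes: usize) -> Option<u32> {
    if size > limit_bytes {
        return None;
    }
    u32::try_from(size).ok()
}
```

With a 2048 MB limit, `limit_in_bytes` yields 2,147,483,648 bytes, which still fits in a `u32` permit count. A request larger than the limit is rejected up front instead of waiting indefinitely.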

Once the memory is allocated, the agent fills the buffer with data and submits it to the data pipeline for subsequent processing and upload. Once the upload is done, the allocated memory is freed and the permits are added back to the semaphore.

Below is the code snippet that adds the permits back to the semaphore:

self.inner.guard.add_permits(sz);

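After the permits are added back to the semaphore, allocation calls waiting on it receive their permits and proceed with the workflow. Note that a tokio permit returns its permits automatically when it is dropped, so an explicit add_permits call like this one presumably pairs with a permit that was detached with forget() at allocation time; otherwise the same permits would be returned twice.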
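The full cycle described above (acquire, fill, upload, release) can be sketched end to end. This is a blocking, std-only illustration that swaps tokio's asynchronous Semaphore for a `Mutex` and `Condvar`, and it returns the bytes from the buffer's `Drop` implementation instead of an explicit call. The names `Budget`, `BlockingAllocator`, `Buffer`, and `peak_usage` are hypothetical.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Shared byte budget: remaining bytes plus a condition variable for waiters.
struct Budget {
    remaining: Mutex<usize>,
    freed: Condvar,
}

/// Hypothetical blocking counterpart of the article's `Allocator`.
#[derive(Clone)]
struct BlockingAllocator {
    budget: Arc<Budget>,
    limit: usize,
}

/// A buffer that returns its bytes to the budget when dropped.
struct Buffer {
    data: Vec<u8>,
    size: usize,
    budget: Arc<Budget>,
}

impl BlockingAllocator {
    fn new(limit_bytes: usize) -> Self {
        let budget = Budget { remaining: Mutex::new(limit_bytes), freed: Condvar::new() };
        Self { budget: Arc::new(budget), limit: limit_bytes }
    }

    /// Blocks until `size` bytes fit in the budget, then hands out a zeroed buffer.
    fn allocate(&self, size: usize) -> Buffer {
        assert!(size <= self.limit, "a request above the limit would wait forever");
        let mut remaining = self.budget.remaining.lock().unwrap();
        while *remaining < size {
            remaining = self.budget.freed.wait(remaining).unwrap();
        }
        *remaining -= size;
        drop(remaining); // release the lock before allocating
        Buffer { data: vec![0u8; size], size, budget: Arc::clone(&self.budget) }
    }

    /// Bytes currently handed out.
    fn in_use(&self) -> usize {
        self.limit - *self.budget.remaining.lock().unwrap()
    }
}

impl Buffer {
    /// Gives the caller access to the memory so it can be filled with data.
    fn as_mut_slice(&mut self) -> &mut [u8] {
        &mut self.data
    }
}

impl Drop for Buffer {
    fn drop(&mut self) {
        // Return the bytes and wake any allocation calls that are waiting.
        *self.budget.remaining.lock().unwrap() += self.size;
        self.budget.freed.notify_all();
    }
}

/// Runs `workers` threads that each hold one chunk briefly;
/// returns the highest number of bytes observed in use.
fn peak_usage(limit: usize, chunk: usize, workers: usize) -> usize {
    let allocator = BlockingAllocator::new(limit);
    let peak = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let allocator = allocator.clone();
            let peak = Arc::clone(&peak);
            thread::spawn(move || {
                let mut buffer = allocator.allocate(chunk);
                buffer.as_mut_slice().fill(1); // stand-in for reading source data
                let mut p = peak.lock().unwrap();
                *p = (*p).max(allocator.in_use());
                drop(p);
                drop(buffer); // stand-in for "upload finished"
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let result = *peak.lock().unwrap();
    result
}
```

However many workers run, the observed peak stays at or below the configured limit, because a worker that cannot fit its chunk waits until another buffer is dropped. Tying the release to `Drop` is a design choice of this sketch: it removes the risk of forgetting to return the bytes or of returning them twice.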

Key takeaways

The semaphore-based memory allocator presented here is an alternative to conventional memory management techniques. Conventional allocators are complex because they must allocate and deallocate memory efficiently. The semaphore allocator, in contrast, is simple: it manages memory by guarding access to it. This simple approach fulfilled our business requirement of throttling memory usage. With this solution in place, we were able to back up large volumes of data (on the order of terabytes) at the source through devices with a memory limit of 2 GB.

About the author

Kush Shukla is a Polyglot Lead Code Wizard and Brogrammer at Druva. He leads development efforts and loves building tools around developer productivity.