Tech/Engineering

Parmanu: Infrastructure Autoscaler Service for Dynamic Load Processing

Swapnil Kesur, Staff Software Engineer, and Kush Shukla, Principal Engineer

Continuous Integration and Continuous Deployment (CI-CD) practices have gained remarkable traction within product teams as a way to foster rapid product development and deliver superior offerings to market. At Druva, we strive to swiftly, efficiently, and reproducibly transform code within repositories into deployable components through these CI-CD pipelines. CI-CD relies heavily on the foundational infrastructure machines responsible for executing resource-intensive operations.

In this article, we will delve into existing CI-CD challenges and shed light on the need for a service capable of scaling existing infrastructure in real time to accommodate dynamic job queues within CI-CD systems. Let us explore the barriers to optimizing CI-CD and discuss Druva’s path to transformation in this space.

About CI-CD systems

As a result of virtualization technology, CI-CD infrastructure machines consist predominantly of virtual machines (VMs). These VMs may be situated in an on-premise data center, a public cloud, or a private cloud. Containerization technology has revolutionized the process of encapsulating dependencies and the build itself, bringing remarkable portability. Build teams leverage these technological innovations to configure an array of VM infrastructure resources, author scripts, and seamlessly integrate them into CI-CD platforms to automate workflows.

Existing issues with CI-CD systems

The seamless realization of the CI-CD system's full potential faces several obstacles. The following issues present significant hurdles within CI-CD systems.

  1. Variable resource demands: Distinct job loads necessitate varying resource allocations. When configured resources are limited and workloads are heavy, jobs pile up in queues and users face longer waiting periods.

  2. Dilemmas with legacy code bases: Containerizing legacy product components poses considerable difficulties when components exhibit dependencies on the machine environment.

  3. Cumbersome dependency management: The gradual accumulation of dependencies installed on machines over time brings increased challenges. This complicates the segregation and replication of these dependencies during runtime.

  4. Weighty containerized Windows images: Containerized Microsoft Windows images tend to be substantially larger, making them less versatile and less operating system-agnostic than their Linux counterparts.

  5. Arduous infrastructure scaling: Scaling infrastructure resources requires manually cloning machines and configuring settings, both tedious and time-consuming processes.

  6. Integration overhead: The addition of new machines to CI-CD systems necessitates meticulous integration procedures, such as configuring GitLab Runner for GitLab CI-CD and nodes for Jenkins.

Overcoming these challenges requires a service that can dynamically scale the infrastructure and accommodate the ebb and flow of workload demands within the CI-CD systems. 

Our Autoscaler Service Parmanu

We built Parmanu, an autoscaler service inspired by a Hindi superhero comic book. Parmanu serves as a dynamic backend service, empowering Druva to effortlessly scale its infrastructure resources in response to workload demands. The service is driven by a YAML file-based configuration covering the following entities (a hypothetical sketch of such a configuration follows the list):

  • Job emitter platform credentials

  • Infrastructure platform credentials 

  • Job emitter probing frequency

  • Executor machine configurations

  • Minimum/maximum requirement for a job

  • Job-to-node mapping

  • Rules for auto-scaling 
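
Below is a minimal, hypothetical sketch of what such a YAML configuration might look like and how a service could load and sanity-check it. The section names, fields, and the small Python loader are illustrative assumptions, not Parmanu's actual schema or code.

```python
# Hypothetical configuration sketch: field names are illustrative only,
# not Parmanu's actual schema.
import yaml  # PyYAML

EXAMPLE_CONFIG = """
job_emitter:
  platform: gitlab                 # job emitter platform
  api_token_env: GITLAB_TOKEN      # credentials kept outside the file
  probe_interval_seconds: 30       # probing frequency
infrastructure:
  provider: aws
  credentials_profile: ci-scaler   # infrastructure platform credentials
executors:
  - name: linux-builder            # executor machine configuration
    image: ci-linux-base
    instance_type: m5.xlarge
    min_nodes: 1                   # minimum requirement
    max_nodes: 20                  # maximum requirement
job_to_node_mapping:
  build-linux: linux-builder       # which jobs land on which node pool
autoscaling_rules:
  scale_up_queue_threshold: 5      # pending jobs before adding a node
  scale_down_idle_minutes: 15      # idle time before removing a node
"""

def load_config(text: str) -> dict:
    """Parse the YAML configuration and check that required sections exist."""
    config = yaml.safe_load(text)
    for key in ("job_emitter", "infrastructure", "executors", "autoscaling_rules"):
        if key not in config:
            raise ValueError(f"missing required section: {key}")
    return config

if __name__ == "__main__":
    print(load_config(EXAMPLE_CONFIG)["executors"][0]["name"])
```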


Parmanu Architecture

[Figure: Parmanu architecture diagram]

Parmanu is deployed in a distributed fashion with the following stateless microservices.

Probot: This microservice performs periodic probing on the job emitter platform, capturing real-time job load data using REST API calls. Probot employs a multithreaded and asynchronous approach, enabling rapid parallel probing across various job-emitting platforms. Moreover, its plugin-based architecture facilitates easy extension for querying job data via REST APIs.
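
As a rough illustration of this probing pattern, the sketch below defines one hypothetical plugin that counts pending jobs on a GitLab instance and a thread pool that queries all configured plugins in parallel. The class names, endpoint choice, and plugin interface are assumptions for illustration, not Parmanu's actual implementation.

```python
# Hypothetical probing sketch: each job-emitter "plugin" knows how to count
# pending jobs via a REST call, and a thread pool queries all plugins in parallel.
from concurrent.futures import ThreadPoolExecutor
import requests


class GitLabProbe:
    """Example plugin: counts pending CI jobs for one GitLab project."""

    def __init__(self, base_url: str, token: str, project_id: int):
        self.base_url, self.token, self.project_id = base_url, token, project_id

    def pending_jobs(self) -> int:
        # GitLab exposes per-project job listings; here we count 'pending' ones.
        resp = requests.get(
            f"{self.base_url}/api/v4/projects/{self.project_id}/jobs",
            headers={"PRIVATE-TOKEN": self.token},
            params={"scope": "pending"},
            timeout=10,
        )
        resp.raise_for_status()
        return len(resp.json())


def probe_all(plugins: dict) -> dict:
    """Query every configured plugin concurrently and return a load snapshot."""
    with ThreadPoolExecutor(max_workers=max(1, len(plugins))) as pool:
        futures = {name: pool.submit(p.pending_jobs) for name, p in plugins.items()}
        return {name: f.result() for name, f in futures.items()}
```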

Scaler: Working in tandem with the job processing platform, the scaler microservice swiftly adds or removes virtual machines (VMs) per workload demands. Leveraging the platform's CLI/API, the scaler executes scaling operations in a multithreaded fashion that allows simultaneous adjustments to infrastructure resources.
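
The sketch below captures this reconciliation idea under stated assumptions: a hypothetical CloudClient interface stands in for the platform's CLI/API, the desired node count is derived from the pending-job count, and the create/delete calls are issued concurrently. None of these names come from Parmanu itself.

```python
# Hypothetical scaler sketch: compare observed job load against the desired
# node count and apply the difference through a platform client, in parallel.
from concurrent.futures import ThreadPoolExecutor
from typing import List, Protocol


class CloudClient(Protocol):
    """Assumed stand-in for the job processing platform's CLI/API."""
    def list_nodes(self, pool: str) -> List[str]: ...
    def create_node(self, pool: str) -> str: ...
    def delete_node(self, node_id: str) -> None: ...


def reconcile(client: CloudClient, pool: str, pending_jobs: int,
              jobs_per_node: int, min_nodes: int, max_nodes: int) -> None:
    """Scale the node pool toward the demand implied by the pending job count."""
    current = client.list_nodes(pool)
    # Ceiling division, clamped to the configured minimum/maximum.
    desired = max(min_nodes, min(max_nodes, -(-pending_jobs // jobs_per_node)))
    delta = desired - len(current)

    with ThreadPoolExecutor(max_workers=8) as executor:
        if delta > 0:    # scale up: add 'delta' nodes concurrently
            list(executor.map(lambda _: client.create_node(pool), range(delta)))
        elif delta < 0:  # scale down: remove the surplus nodes concurrently
            list(executor.map(client.delete_node, current[:abs(delta)]))
```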

Redis Cache: This cache is populated by the Probot service and read by the Scaler service to make scaling decisions.
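
A minimal sketch of this handoff, assuming the standard redis-py client and an illustrative key name and TTL: Probot writes the latest load snapshot, and the Scaler reads it back before deciding.

```python
# Hypothetical shared-state sketch: Probot publishes the load snapshot with a
# short TTL so stale data expires; the Scaler reads it before each decision.
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def publish_load(snapshot: dict) -> None:
    """Called by Probot: store the per-platform pending-job counts."""
    r.set("parmanu:job_load", json.dumps(snapshot), ex=120)  # expire stale data

def read_load() -> dict:
    """Called by the Scaler: fetch the latest snapshot, or empty if expired."""
    raw = r.get("parmanu:job_load")
    return json.loads(raw) if raw else {}
```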

Integration Prerequisites

  • Read-only access permissions to fetch job details from the job emitter platform.

  • Permission to create and delete VMs on the job processing platform.

  • Access to read resource data managed by the job processing platform.

  • One-time storage of machine images on the job processing platform, for replication during scaling.

Pros

  • Horizontal scaling of infrastructure resources.

  • Highly configurable and policy-driven approach.

  • Fast, flexible, and extensible.

  • Seamless integration into existing systems.

  • Cloud-agnostic solution.

  • Stateless architecture.

  • Fault-tolerant, resilient, and reliable.

Cons

  • Dependency on network connectivity.

  • One-time setup requirements for machines and security in infrastructure platforms.

Prominent Use-Cases

  • Horizontal scaling of build infrastructure across various cloud platforms.

  • Infrastructure scaling based on runtime policy configurations, such as cost, memory, and load.

  • Central horizontal autoscaling solution for different job-emitting and job-processing platforms.

  • Effective management of autoscaling policies across multiple cloud providers from a centralized location.

  • Business Continuity and Disaster Recovery (BCDR) planning and setup.

  • Logging load patterns, understanding resource requirements, and designing customizable scaling policies.

  • Autoscaling of storage nodes based on estimates of scheduled jobs in each region.

Conclusion

Parmanu is a highly configurable autoscaler service that automates infrastructure scaling within CI-CD systems. It efficiently captures job load data, dynamically adds or removes virtual machines, and makes intelligent scaling decisions. With its horizontal scaling capabilities, extensive configuration possibilities, easy integration, and fault tolerance, Parmanu empowers organizations to adapt to changing workloads effectively.

However, it is essential to consider network dependencies and account for the one-time setup effort on infrastructure platforms. Prominent use cases for Parmanu encompass scaling build infrastructure, policy-driven scaling, and serving as a centralized autoscaling solution for job-emitting and job-processing platforms.

Parmanu has now been in production for more than three years and has helped Druva enhance its software delivery processes by scaling VMs up and down on the order of 100 per day.

About the authors

Kush Shukla is a Polyglot Lead Code Wizard and Tech Evangelist at Druva. He leads the development efforts and loves building tools around developer productivity and experience.

Swapnil Kesur is a technology enthusiast and open-source protagonist at Druva. He loves to build developer-centric foolproof solutions around engineering productivity.

Next steps

Looking to learn more about the technical innovations and best practices powering cloud backup and data management? Visit the Innovation Series section of Druva’s blog archive.