Tech/Engineering

How Druva Scales One Python Service to Handle and Monitor Millions of API Calls per Day

Anurag Ranjan, Sr. Staff Software Engineer

As businesses scale their operations to accommodate a growing user base, the ability to handle a large number of API requests reliably and efficiently becomes a key factor in achieving success. For modern web applications, APIs are the backbone of communication between services, and their performance is critical in determining both user experience and business outcomes.

In this blog, we’ll explore how using Gevent with Gunicorn and Falcon allowed us to handle a staggering one million API calls per day for mission-critical microservices. We’ll highlight the architectural decisions that led to this success and how Gevent’s asynchronous capabilities played a pivotal role in improving scalability, reducing latency, and optimizing resource usage.

The Challenge: Handling a Million API Calls Per Day

Handling a million API calls per day is no small feat. This level of traffic demands a robust, scalable, and efficient architecture capable of dealing with high throughput without compromising performance.

Our microservice is responsible for serving critical data that protects various business functionalities, so it’s not just about serving requests — it’s about ensuring minimal latency, high availability, and maximum reliability, even during peak hours when traffic spikes.

We adopted Falcon, a minimalist web framework, to ensure that the API is fast and lightweight. For deploying Falcon, we chose Gunicorn as the WSGI server, using Gevent for concurrency. Let’s dive into how this architecture handles such a massive load while leveraging AWS infrastructure.

The Architecture: Falcon, Gunicorn, and Gevent

Falcon: A High-Performance Web Framework

Falcon is designed for speed, making it an ideal choice for building high-performance APIs. Unlike other frameworks, Falcon minimizes overhead by avoiding unnecessary features, ensuring that each request is processed as quickly as possible. This lean design means that it can handle high throughput with minimal resource consumption.

However, the real key to handling a million requests per day is how these requests are processed, which is where Gunicorn and Gevent come in.

Gunicorn: The WSGI Server

Gunicorn, a Python WSGI HTTP server, is widely used for deploying Python web applications in production environments. It provides a simple interface between web servers and web applications, making it a reliable choice for serving APIs.

Gunicorn supports various worker types, and for high-concurrency applications, we selected the Gevent worker model. Gunicorn creates multiple worker processes to handle incoming requests, and each worker can spawn multiple greenlets, making the system highly concurrent. The ability to scale the number of worker processes allows us to handle a large number of requests across multiple CPU cores.
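As a sketch, a Gunicorn configuration along these lines selects the Gevent worker class; the worker and connection counts below are illustrative starting points, not our production values:

```python
# gunicorn.conf.py -- illustrative settings, not production-tuned values
import multiprocessing

# One async worker per core is a common starting point for gevent workers.
workers = multiprocessing.cpu_count()
worker_class = "gevent"        # cooperative greenlet-based workers
worker_connections = 1000      # max concurrent greenlets per worker
bind = "0.0.0.0:8000"
timeout = 30
```

This would be launched as `gunicorn -c gunicorn.conf.py app:app`, giving roughly `workers × worker_connections` requests in flight per node.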

Gevent: Enabling Concurrency with Cooperative Multitasking

Gevent is an asynchronous networking library for Python that provides concurrency by using greenlets — lightweight, cooperative threads that run in a single process. Greenlets are executed by the Gevent event loop, which efficiently manages task switching. 

Here’s how Gevent helps us manage a million API calls:

1. Non-blocking I/O: A key feature of Gevent is its ability to handle I/O-bound operations (e.g., database queries, external API calls) asynchronously. In a typical synchronous environment, a request would block a worker process until the I/O operation completes. With Gevent, a greenlet can yield control when waiting for I/O, allowing other greenlets to execute in the meantime. This drastically improves throughput by ensuring that workers are not sitting idle while waiting for I/O operations to finish.

2. Lightweight Greenlets: Unlike traditional threads, greenlets are very lightweight. They share the same memory space and have minimal overhead. This allows each worker to manage hundreds or even thousands of greenlets, significantly increasing the number of requests a single worker can handle. As a result, scaling up with Gevent requires far fewer resources compared to a multi-threaded or multi-process setup.

3. Efficient Context Switching: In Gevent, the event loop efficiently manages context switching between greenlets. There’s no need for complex thread synchronization mechanisms like locks, which are common in multi-threaded applications. This reduces latency and resource contention, making Gevent an ideal choice for applications with high concurrency requirements.
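The three points above can be sketched in a few lines; `gevent.sleep` stands in for a blocking database or HTTP call, and the `fetch` helper is hypothetical:

```python
# Monkey patching must happen first so stdlib sockets, DNS lookups, etc.
# become cooperative and yield to the event loop instead of blocking.
from gevent import monkey
monkey.patch_all()

import gevent


def fetch(name, delay):
    """Stand-in for an I/O-bound call; gevent.sleep yields to other greenlets."""
    gevent.sleep(delay)
    return name


# Spawn many lightweight greenlets; while one waits on "I/O", the others run.
jobs = [gevent.spawn(fetch, f"req-{i}", 0.05) for i in range(100)]
gevent.joinall(jobs)
results = [job.value for job in jobs]
```

No locks or explicit callbacks are needed: the event loop switches greenlets at each I/O wait, which is exactly the cooperative multitasking described above.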

Why Gevent Is Perfect for Serving a Million API Calls

When we started serving a million API calls per day, the key considerations were scalability, latency, and resource efficiency. Let’s break down why Gevent is the ideal solution for these challenges.

Global Interpreter Lock

The Global Interpreter Lock (GIL) is a mechanism in Python that ensures only one thread executes Python bytecode at a time. While it simplifies memory management, it can hinder multi-threaded performance, particularly for CPU-bound tasks, by limiting the ability to utilize multiple cores effectively. 

Problem With GIL

The GIL limits multi-threading in Python: CPU-bound tasks cannot execute in parallel, so multi-threaded servers gain little on multi-core systems. Gevent sidesteps this for I/O-bound workloads: all greenlets run within a single thread and yield during I/O waits, so the GIL never becomes the bottleneck.
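A tiny stdlib-only illustration of the GIL’s effect (the helper name and numbers here are ours, not from our service): the threaded version below produces the correct result, but its pure-Python CPU work is effectively serialized onto one core.

```python
import threading


def busy_sum(n, out, idx):
    # Pure-Python CPU-bound loop; the GIL allows only one thread
    # to execute this bytecode at any given moment.
    total = 0
    for i in range(n):
        total += i
    out[idx] = total


results = [0, 0]
threads = [
    threading.Thread(target=busy_sum, args=(100_000, results, i))
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both threads finish correctly, but wall-clock time is roughly the same
# as running them back to back -- the GIL prevents true parallelism here.
combined = sum(results)
```

This is why adding threads does not help CPU-bound Python code, while Gevent’s single-threaded cooperative model remains a good fit for I/O-bound API traffic.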

1. Handling Massive Traffic with Minimal Resources

By using Gevent, we can run many greenlets within a single worker process. This approach is far more memory-efficient than traditional threading models, which require a separate memory stack for each thread. With greenlets, multiple tasks share the same memory space, reducing overhead and allowing each worker to handle more tasks concurrently.

For example, each Gunicorn worker running Gevent can handle thousands of concurrent API requests. Even with a limited number of workers (in our case, just a few), we’re able to process millions of requests without needing to spin up dozens or hundreds of worker processes. This efficient use of resources allows us to manage large volumes of traffic even on modest hardware.

2. Reducing Latency with Asynchronous I/O

The ability to perform asynchronous I/O is crucial when dealing with high-traffic environments. Our microservice frequently interacts with external systems like databases and third-party APIs. These interactions are often I/O-bound, meaning that without Gevent’s non-blocking I/O, each request could be delayed while waiting for external responses.

With Gevent’s asynchronous capabilities, we can initiate I/O operations in parallel, without blocking the worker thread. This means that a single worker can handle multiple requests while waiting for I/O operations to complete, drastically reducing the time spent idling and improving the overall response time.
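As a rough sketch of this latency effect (timings illustrative; `call_external` is a hypothetical stand-in for a database or third-party API round trip), issuing the waits concurrently makes total latency track the slowest call rather than the sum of all calls:

```python
import time

import gevent


def call_external(delay):
    # Stand-in for an I/O-bound round trip to a database or external API.
    gevent.sleep(delay)
    return delay


start = time.monotonic()
# Three 50 ms "calls" issued concurrently complete together in about 50 ms,
# not the 150 ms a sequential, blocking worker would need.
jobs = [gevent.spawn(call_external, 0.05) for _ in range(3)]
gevent.joinall(jobs)
elapsed = time.monotonic() - start
```

The same pattern applies when a single request fans out to several backends: the request’s latency becomes roughly the maximum of the backend latencies, not their total.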

3. Scalability and Cost Efficiency

As we grew to handle a million API calls, scalability became a critical concern. Traditional approaches to scaling, such as increasing the number of worker processes or threads, can quickly become resource-intensive, especially with the overhead of context switching, memory usage, and the complexity of managing inter-process communication.

Gevent allows us to scale efficiently without overloading the system. Because greenlets are so lightweight, we can scale horizontally by adding more Gunicorn worker processes, each capable of handling thousands of concurrent greenlets. This horizontal scaling approach is not only cost-effective but also ensures that our service can scale effortlessly to handle even larger traffic spikes.

4. Handling High Concurrency with Ease

The nature of the traffic our microservice receives is highly concurrent — many small, short-lived requests that need to be processed in parallel. Gevent excels in this environment because it enables our API to handle thousands of concurrent requests within a single worker process. This eliminates the need for expensive thread or process management and improves overall performance.

With synchronous workers, a request that blocks on I/O ties up its worker while other requests queue behind it, leading to bottlenecks and reduced throughput. Gevent’s cooperative multitasking model allows the server to process more requests in less time, making it the ideal choice for high-concurrency applications like ours.

Maximizing API Performance with Python Modules, DB Connection Pooling, and Caching

  1. Pure-Python Modules:
    We avoid ctypes and other C-extension-based modules in hot paths, favoring pure-Python tools. Gevent’s monkey patching only makes pure-Python I/O cooperative, so blocking calls made inside C extensions can stall the entire event loop.

  2. Database Connection Pooling:
    We use connection pooling with Gevent to avoid the overhead of opening and closing database connections for every request. Gevent’s asynchronous model lets multiple queries proceed concurrently over the pooled connections, reducing idle time and improving throughput.

  3. Caching for Frequently Accessed Data:
    For data that doesn’t change frequently, we implement caching to reduce load on the database and downstream services. This allows us to serve cached responses quickly, improving response times and avoiding unnecessary database and API calls.
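To illustrate the pooling idea, here is a stdlib-only sketch (sqlite3 stands in for our real database, and the `ConnectionPool` class is hypothetical). Under Gevent’s monkey patching, the blocking `queue.Queue` below becomes greenlet-cooperative, so a greenlet waiting for a free connection yields to others instead of blocking the worker:

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Minimal pool sketch: connections are created once and reused across requests."""

    def __init__(self, dsn, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets pooled sqlite connections be
            # handed between greenlets; a real driver would not need this.
            self._pool.put(sqlite3.connect(dsn, check_same_thread=False))

    @contextmanager
    def connection(self):
        conn = self._pool.get()  # waits (cooperatively, under gevent) if pool is empty
        try:
            yield conn
        finally:
            self._pool.put(conn)  # always return the connection for reuse


pool = ConnectionPool(":memory:", size=2)
with pool.connection() as conn:
    value = conn.execute("SELECT 1 + 1").fetchone()[0]
```

A bounded pool also acts as natural back-pressure: no matter how many greenlets are in flight, at most `size` database connections are ever open.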

Dynamic Scaling and Advanced Load Balancing for Seamless Traffic Management

To further support this massive scale, we leveraged AWS infrastructure to dynamically scale our resources. As traffic spikes occurred throughout the day, our system was designed to automatically spin up new nodes to handle the increased load. This elasticity in the cloud environment allowed us to maintain high availability and performance without the need for manual intervention. 

To efficiently distribute the incoming traffic, we employed an advanced load balancer that intelligently routes requests across available nodes, ensuring optimal resource utilization and minimizing latency. 

By scaling up instances on-demand and using the load balancer to manage traffic, we ensured that our architecture could easily accommodate fluctuating traffic volumes, allowing us to efficiently manage the million API calls per day while keeping costs in check.


Harnessing APM: Elevate Service Monitoring & Performance

Application Performance Monitoring (APM) is a crucial practice for managing and optimizing the performance of your software applications. It brings together a set of tools and concepts:

  • Observability: Observability enables us to gain insights into a system from an external perspective.

  • Distributed Tracing: Distributed tracing is a technique used in distributed systems to monitor and debug complex interactions between multiple components or services.

  • Spans: A span refers to a specific unit of work or operation within a distributed system.

  • Traces: A distributed trace is a record of the steps taken by a request as it travels through a distributed system.

  • Metrics: Metrics are numerical measurements and performance indicators employed to evaluate different facets of computer systems, software, network operations, and related elements. 
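To make spans and traces concrete, here is a toy, stdlib-only tracer sketch (our production setup uses an APM agent, not this code): a trace is identified by a trace ID, and each timed unit of work is recorded as a span within it.

```python
import time
import uuid
from contextlib import contextmanager


class Tracer:
    """Toy tracer: one trace is a list of spans, each a timed unit of work."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex  # identifies the whole request's journey
        self.spans = []

    @contextmanager
    def span(self, name):
        start = time.monotonic()
        try:
            yield
        finally:
            # Record the span with its duration once the unit of work finishes.
            self.spans.append({"name": name, "duration": time.monotonic() - start})


tracer = Tracer()
with tracer.span("handle_request"):
    with tracer.span("db_query"):
        time.sleep(0.01)  # stand-in for a database call

span_names = [s["name"] for s in tracer.spans]
```

Real APM agents do the same bookkeeping automatically, propagate the trace ID across service boundaries, and ship spans and metrics to a backend for the dashboards shown below.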

We strategically leverage Application Performance Monitoring (APM) to gain deep insights into our services, ensuring optimal performance, real-time issue detection, and seamless user experiences. Some examples are shared below:

The image below shows service-level quick insights:

[Image: service-level quick insights]


The image below depicts interaction between multiple services and provides monitoring insights:

[Image: cross-service interactions and monitoring insights]

Conclusion: Achieving Peak Performance with Gevent while Leveraging AWS Infrastructure 

Serving a million API calls per day requires a combination of efficient architecture, intelligent scaling, and optimized resource management. By leveraging Gevent with Gunicorn and Falcon, we were able to meet these demands while keeping performance high and resource consumption low. 

Gevent’s ability to handle asynchronous I/O, coupled with its lightweight greenlets and cooperative multitasking model, makes it the perfect choice for high-traffic APIs. For anyone managing high-concurrency workloads in Python, Gevent offers a powerful tool to improve scalability, reduce latency, and maximize throughput.

If you're looking to optimize your API performance for massive traffic, consider implementing Gevent in your Gunicorn worker setup. With the right architecture in place, you too can handle millions of API calls with ease and efficiency. The ability to launch new nodes as needed, coupled with advanced load balancing, provides the necessary flexibility and resilience to maintain peak performance under heavy loads.

This post outlined the critical role of Gevent in scaling a microservice to handle a million API calls per day, highlighting key technical advantages like non-blocking I/O, efficient resource use, and the ability to scale horizontally with minimal overhead. We hope it shows how Gevent and a well-designed architecture can unlock new levels of performance for your own applications.