Druva manages a colossal amount of data under its data protection hood. That data flows through a multi-stage structure spanning different systems, each with its own architecture.
One of these systems uses a pipeline architecture that runs in parallel and handles large volumes of data. However, simply running pipelines in parallel is not always efficient. Pipelines can become inefficient for several reasons: resource constraints that limit scaling, wait time introduced by the differing execution times of individual components, or heavy communication through shared-memory objects. To tackle this problem, we started building concurrent programs rather than purely parallel ones.
What is a Pipeline?
A pipeline is a series of independent components, or stages, connected by connectors, each completing a specific task so that computations are performed in a stream-like fashion. The connectors can be anything from pipes to message queues, channels, or even shared memory. Each component obtains data from its inward connector, performs a set of operations, and emits the result on its outward connector, where it is picked up by the next component in line.
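As a minimal sketch of this idea, the Go example below wires two stages together using channels as the connectors. The stage names `gen` and `square` are illustrative, not actual components from our system: `gen` is a source stage that emits values, and `square` reads from its inward channel, transforms each value, and emits it on its outward channel for the next consumer.

```go
package main

import "fmt"

// gen is the source stage: it emits the given numbers on its
// outward connector (a channel) and closes it when done.
func gen(nums ...int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, n := range nums {
			out <- n
		}
	}()
	return out
}

// square obtains data from its inward connector, performs its
// operation (squaring), and emits the result on its outward connector.
func square(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for n := range in {
			out <- n * n
		}
	}()
	return out
}

func main() {
	// Wire the stages: gen -> square -> final consumer.
	// Each stage runs in its own goroutine, so computation
	// proceeds in a stream-like fashion.
	for v := range square(gen(1, 2, 3)) {
		fmt.Println(v)
	}
}
```

Because every stage runs in its own goroutine, a value can be squared while the source is still emitting the next one; the channels provide both the data handoff and the synchronization between stages.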