Tech/Engineering, Innovation Series

How gRPC Helped us Solve Multi-Node and Multi-Language Challenges

Sudhakar Paulzagade, Distinguished Engineer and Santosh Patil, Senior Software Engineer

Multi-language and other challenges

While developing backup solutions in our product for environments like Oracle Standalone, Oracle RAC, and SAP HANA, we had to deal with some unique difficulties. Most of our infrastructure is written in Golang, which caused programming language issues whenever the application we protect is written in C/C++ and expects the backup API specification to be implemented as a C/C++ library. We also faced challenges in clustered environments like Oracle RAC and SAP HANA, where the application streams data in parallel from multiple nodes but parts of our product require us to create server sessions from a single node only.

In this blog, we explain how gRPC helped us simplify the overall architecture and deal with multi-language and multi-node issues.

Why we did not use CGO

Druva’s Oracle DTC (Direct To Cloud) solution helps customers protect their Oracle standalone and clustered (RAC) environments and lets them stream data directly to the cloud without provisioning any local backup storage. Our solution uses the SBT API published by Oracle, which backup vendors implement in the form of a library. The SBT API is defined in C, and the Oracle Database server processes, which are themselves implemented in C/C++, load the SBT library to stream data to and from backup storage, so the library needs to be implemented in C/C++. As most of Druva’s infrastructure is written in Golang, the obvious option was to implement the API in Golang and expose C APIs using CGO. CGO complicated things: marshaling and unmarshaling the SBT API input/output data structures became extremely complicated and, in some cases, impossible to implement.

Using Golang RPC

We decided to explore RPC instead of CGO. RPC (remote procedure call) is a client-server communication method in which the client invokes a function on the server as if it were a local call, with an IDL (Interface Definition Language) serving as the contract that defines those functions. The idea was to implement a pass-through SBT API C layer as a shared library that talks to a Golang RPC server, which implements the actual APIs to stream data to and from cloud backup storage. We could have written our own RPC implementation on top of raw sockets, but that would have been a significant effort. We were looking for ready-made open-source infrastructure that we could build upon quickly. gRPC came to the rescue.
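To make the "function call as contract" idea concrete, here is a minimal sketch that uses Go's standard library net/rpc package rather than gRPC, purely so that it stays self-contained. The DataMover service, its Write method, and the BackupArgs and BackupReply types are hypothetical names invented for this illustration; they are not our actual API. An in-memory pipe stands in for the network connection.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// BackupArgs and BackupReply are hypothetical message types for this sketch.
type BackupArgs struct {
	PieceName string
	Data      []byte
}

type BackupReply struct {
	BytesWritten int
}

// DataMover is a hypothetical service. Its exported methods form the
// contract between client and server.
type DataMover struct{}

// Write pretends to store the data and reports how many bytes it accepted.
func (d *DataMover) Write(args BackupArgs, reply *BackupReply) error {
	reply.BytesWritten = len(args.Data)
	return nil
}

// writeOverRPC serves DataMover on one end of an in-memory pipe and calls it
// from the other end, so the caller sees an ordinary function call.
func writeOverRPC(piece string, data []byte) (int, error) {
	serverConn, clientConn := net.Pipe()

	server := rpc.NewServer()
	if err := server.Register(&DataMover{}); err != nil {
		return 0, err
	}
	go server.ServeConn(serverConn)

	client := rpc.NewClient(clientConn)
	defer client.Close()

	var reply BackupReply
	err := client.Call("DataMover.Write", BackupArgs{PieceName: piece, Data: data}, &reply)
	return reply.BytesWritten, err
}

func main() {
	n, err := writeOverRPC("piece-001", []byte("hello backup"))
	if err != nil {
		panic(err)
	}
	fmt.Println("bytes accepted by server:", n)
}
```

net/rpc only connects Go programs to each other, so it would not have solved our C/C++ requirement. It is used here only to show the shape of the pass-through idea.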

What is gRPC

gRPC (gRPC Remote Procedure Calls) lets you communicate with other applications using function calls rather than hand-written HTTP requests; under the hood it uses HTTP/2 as its transport. This abstracts the network from you, letting you call methods as if they were local code. gRPC uses Protocol Buffers to serialize data between clients and servers. You define the services used in your applications in a .proto file. With the help of a gRPC plugin for protoc, you can generate code that gives you the methods needed to call a given service, all with native typing in your language of choice.


Advantages of gRPC

When gRPC uses Protocol Buffers to send messages over the network, your payloads are serialized in a compact binary format. This saves you bandwidth ($$$) and improves network performance. And because gRPC relies on automated code generators, you are spared the time and overhead of writing custom serializers.
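The bandwidth point can be illustrated with a small comparison. The sketch below encodes the same two integers as JSON text and as base-128 varints, which is the integer encoding Protocol Buffers also uses. It is not a full protobuf message (field tags are left out), and the ChunkHeader type is a hypothetical example rather than one of our messages.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// ChunkHeader is a hypothetical message used only for this comparison.
type ChunkHeader struct {
	Offset uint64 `json:"offset"`
	Length uint32 `json:"length"`
}

// encodeJSON returns the text (JSON) encoding of the header.
func encodeJSON(h ChunkHeader) ([]byte, error) {
	return json.Marshal(h)
}

// encodeVarint writes both fields as base-128 varints. Field tags are
// omitted, so this is not a complete protobuf message.
func encodeVarint(h ChunkHeader) []byte {
	buf := make([]byte, 2*binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, h.Offset)
	n += binary.PutUvarint(buf[n:], uint64(h.Length))
	return buf[:n]
}

func main() {
	h := ChunkHeader{Offset: 1 << 20, Length: 1 << 16}
	js, err := encodeJSON(h)
	if err != nil {
		panic(err)
	}
	bin := encodeVarint(h)
	fmt.Printf("JSON: %d bytes, varint: %d bytes\n", len(js), len(bin))
}
```

For bulk data such as backup streams the payload bytes dominate either way, but text formats would additionally need to escape or base64-encode binary data, which a binary format avoids.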

We chose gRPC for the following reasons:

  • The abstraction is simple (it’s a function call) and very easy to implement
  • It supports many programming languages (our requirement was C/C++ and Golang)
  • It is fast and scalable (performance was a key requirement for us because it sits on the data path)
  • It suits clustered environments, where clients on multiple nodes talk to a single server

How we implemented Golang gRPC

The SBT API is implemented as two components: the SBT Client and the Data Mover process. The SBT Client is a C++ DLL/shared library that is loaded by the Oracle server processes. The Data Mover is a standalone process written in Golang that does the actual data movement to and from cloud backup storage. gRPC and protobuf provide the communication infrastructure between the SBT Client and the Data Mover, as shown in the diagram below.
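The split between a thin client and a Data Mover can be sketched as follows. This is an illustration under stated assumptions, not our implementation: both roles are written in Go, a Go channel stands in for the gRPC stream, a bytes.Buffer stands in for cloud backup storage, and the Chunk type and function names are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Chunk is a hypothetical message carrying one slice of a backup piece.
// Seq records send order; this sketch relies on channel ordering and does
// not check it.
type Chunk struct {
	Seq  int
	Data []byte
}

// sendChunks plays the SBT Client role: it reads src in fixed-size chunks
// and sends them on out, closing out when the source is exhausted.
func sendChunks(src io.Reader, chunkSize int, out chan<- Chunk) error {
	defer close(out)
	for seq := 0; ; seq++ {
		buf := make([]byte, chunkSize)
		n, err := io.ReadFull(src, buf)
		if n > 0 {
			out <- Chunk{Seq: seq, Data: buf[:n]}
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return nil // end of input, possibly after a short final chunk
		}
		if err != nil {
			return err
		}
	}
}

// receiveChunks plays the Data Mover role: it writes each chunk to dst in
// arrival order and returns the total number of bytes written.
func receiveChunks(in <-chan Chunk, dst io.Writer) (int, error) {
	total := 0
	for c := range in {
		n, err := dst.Write(c.Data)
		total += n
		if err != nil {
			return total, err
		}
	}
	return total, nil
}

// moveData wires the two roles together and returns what was "stored".
func moveData(data []byte, chunkSize int) ([]byte, error) {
	ch := make(chan Chunk)
	errc := make(chan error, 1)
	go func() { errc <- sendChunks(bytes.NewReader(data), chunkSize, ch) }()

	var stored bytes.Buffer
	if _, err := receiveChunks(ch, &stored); err != nil {
		for range ch { // drain so the sender goroutine can exit
		}
		return nil, err
	}
	if err := <-errc; err != nil {
		return nil, err
	}
	return stored.Bytes(), nil
}

func main() {
	stored, err := moveData([]byte("backup piece contents"), 8)
	if err != nil {
		panic(err)
	}
	fmt.Printf("stored %d bytes\n", len(stored))
}
```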

[Diagram: SBT Client and Data Mover communicating over gRPC]


The architecture lent itself really well to multi-node applications like Oracle RAC and SAP HANA. Below is the multi-node architecture for Oracle RAC.

[Diagram: multi-node gRPC architecture for Oracle RAC]


The Data Mover process (the gRPC server) is started on one of the nodes of the RAC. The SBT Clients (the gRPC clients) running on multiple nodes stream data in parallel to the Data Mover process, which in turn streams the data to and from the cloud. If one or more nodes in the cluster are unavailable, our intelligent discovery still spawns a Data Mover gRPC server on one of the nodes that remain available. This enhances the availability of our solution and also makes it highly scalable.
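This post does not go into how the discovery works, so the following is only a sketch of the simplest possible policy: pick the first node that reports itself as available. The Node type, the Available flag, and pickServerNode are hypothetical names; a real implementation would need an actual health check and a way to tell the clients where the server landed.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical description of one cluster member.
type Node struct {
	Name      string
	Available bool
}

var errNoNode = errors.New("no available node to host the Data Mover")

// pickServerNode returns the name of the first available node in list order.
// This policy is an assumption made for illustration.
func pickServerNode(nodes []Node) (string, error) {
	for _, n := range nodes {
		if n.Available {
			return n.Name, nil
		}
	}
	return "", errNoNode
}

func main() {
	cluster := []Node{
		{Name: "rac1", Available: false},
		{Name: "rac2", Available: true},
		{Name: "rac3", Available: true},
	}
	host, err := pickServerNode(cluster)
	if err != nil {
		panic(err)
	}
	fmt.Println("Data Mover gRPC server would start on:", host)
}
```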

When we started development for SAP HANA protection, we could simply borrow the architecture we had built for Oracle RAC. SAP HANA is a multi-node distributed database consisting of a system database and multiple tenant databases. A system database is replicated across multiple hosts but does not span across hosts. A tenant database can span across hosts. SAP HANA backup must be performed through BACKINT, an API specification from SAP which third-party backup vendors implement. If a tenant database spans multiple nodes, BACKINT is started on each of those nodes and streams data to backup storage. As with Oracle RAC, the BACKINT processes started on multiple nodes act as gRPC clients and talk to the Data Mover gRPC server in our architecture.

Performance of Golang gRPC

One of the striking features of gRPC that we observed is its performance. We streamed data between the gRPC client and server both on the same system over the loopback network and across nodes over the LAN. Our initial concern was that all the data copying and transferring of buffers over the network would hurt performance. But we were surprised to see a throughput of around 2.5 TB/hour, which was enough to saturate the network between the application servers and our cloud storage. This was raw throughput without any deduplication. With deduplication, we saw even higher throughput.
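To put the 2.5 TB/hour figure in network terms, the small helper below converts it to gigabits per second. It assumes decimal units (1 TB = 10^12 bytes, 1 Gbit = 10^9 bits); the function name is ours for this example. Under that assumption the figure works out to roughly 5.6 Gbit/s of sustained traffic.

```go
package main

import "fmt"

// tbPerHourToGbps converts terabytes per hour to gigabits per second,
// assuming decimal units (1 TB = 1e12 bytes, 1 Gbit = 1e9 bits).
func tbPerHourToGbps(tbPerHour float64) float64 {
	const (
		bytesPerTB     = 1e12
		bitsPerByte    = 8
		secondsPerHour = 3600
		bitsPerGigabit = 1e9
	)
	return tbPerHour * bytesPerTB * bitsPerByte / secondsPerHour / bitsPerGigabit
}

func main() {
	fmt.Printf("2.5 TB/hour is about %.2f Gbit/s\n", tbPerHourToGbps(2.5))
}
```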

Conclusion

Using gRPC with Golang helped us solve both the multi-language and multi-node challenges. 

However, gRPC is a fairly new technology (although it is gaining popularity each year because of its multi-language support). As with any new technology, support resources for gRPC are still limited, and if you get stuck, there is very little help available on the internet. Keep this in mind before you start working with gRPC: solving the problem will come down to you and your team.

At Druva, we are never afraid to explore the unknown and are always looking to capitalize on the first-mover advantage. This is true not only for the product that we are building but also for the technologies that we adopt to develop it. Here’s a good example: A blog post on how to solve Golang’s memory problem

Next steps

Please keep an eye out for the next part of this series. You can also learn more about the technical innovations and best practices powering cloud backup and data management. Visit the Innovation Series section of Druva’s blog archive.