Innovation Series

How to Scale WebSockets at Fractional Footprint in Go

Mayank Gupta, Principal Engineer and Pankaj Pipada, Senior Technical Director

Introduction

Are you looking to enhance communication between clients and servers in real-time, with millions of active connections? If so, you're not alone. Many companies that rely on agents installed in customer environments struggle with scaling server-initiated communication, especially when connections need to be persistent.

At Druva, a data protection solutions company, we face this challenge every day. With millions of clients per customer and thousands of customers, we need long-running, bi-directional connections that support both synchronous and asynchronous communication.

WebSocket is a communication protocol, standardized as RFC 6455, that enables real-time, bi-directional data transfer between clients and servers over a single connection. It has become popular because it reduces latency and per-message overhead compared with polling. In this post, we'll explore how to scale to millions of active, persistent WebSocket connections while keeping the memory footprint small. Say goodbye to communication headaches and hello to seamless, efficient client-server communication!

Possible Approaches

One option is frequent polls, where clients repeatedly ask the server if there are any updates. However, this can quickly become inefficient, especially with a large number of active connections.

Another approach is HTTP long-polling, which keeps each request open until there's new data to return. While this method works for some use cases, it has its limitations. For instance, it is not full duplex: the client must initiate every request, and the server can only answer a request that is already pending. Additionally, per-connection authorization and a higher server footprint can pose challenges. We can also run into runtime issues, such as server disconnects being detected only after a delay.

WebSockets provide an alternative approach to tackling this problem. With WebSockets, the server can initiate communication with clients, and the connection remains open until explicitly closed. This makes it a suitable option for real-time, bi-directional, synchronous, and asynchronous communication with multi-million active connections.

WebSockets

A WebSocket is a protocol that enables bi-directional, real-time communication between a client and a server over a single, long-lived connection. Unlike traditional HTTP requests that require the client to constantly poll the server for updates, a WebSocket connection allows the server to push data to the client whenever new information becomes available.

Setup

The initial setup of a WebSocket connection involves a handshake protocol between the client and the server. Here are the steps involved in this process:

  • The client sends an HTTP request to the server, typically using the GET method, with an "Upgrade" header field set to "WebSocket" and a "Connection" header field set to "Upgrade." The request also includes a unique "Sec-WebSocket-Key" header field, which is a randomly generated key that the server will use to prove that it can speak the WebSocket protocol.

  • If the server supports the WebSocket protocol, it will respond with an HTTP response with a status code of "101 Switching Protocols." The response will include an "Upgrade" header field set to "WebSocket," a "Connection" header field set to "Upgrade," and a "Sec-WebSocket-Accept" header field that is calculated from the client's "Sec-WebSocket-Key" value (the key is concatenated with a fixed GUID defined by the protocol, hashed with SHA-1, and Base64-encoded). The server may also include additional header fields in the response.

  • Once the client receives the server's response, the WebSocket connection is established and both the client and server can begin sending data to each other in real time.

It's important to note that the WebSocket protocol also supports additional options for setting up the initial handshake, including specifying subprotocols and extensions. However, the basic steps outlined above are the core components of the WebSocket handshake protocol.

Code snippet
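As a rough illustration of the handshake steps above, the sketch below computes the "Sec-WebSocket-Accept" value the way RFC 6455 describes it. The function name acceptKey is illustrative, and the sample key is the one used in the RFC's own example; this is a minimal sketch, not the code from the original post.

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
)

// websocketGUID is the fixed GUID that RFC 6455 defines for the handshake.
const websocketGUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// acceptKey derives the Sec-WebSocket-Accept header value from the
// client's Sec-WebSocket-Key: Base64(SHA-1(key + GUID)).
func acceptKey(clientKey string) string {
	sum := sha1.Sum([]byte(clientKey + websocketGUID))
	return base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
	// Sample key taken from the example handshake in RFC 6455.
	fmt.Println(acceptKey("dGhlIHNhbXBsZSBub25jZQ=="))
}
```

A client that receives any other value for its key should treat the handshake as failed, which is how it knows the peer really speaks WebSocket.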

Security Considerations

WebSocket connections are not restricted by the same-origin policy, which means that WebSocket servers need to validate the "Origin" header to prevent cross-site WebSocket hijacking attacks. This is important when sensitive or private data is being transferred over the WebSocket. To authenticate the WebSocket connection, it's best to use tokens or similar protection mechanisms.

An example of a vulnerability in WebSocket security was the Cable Haunt incident in 2020. Cable Haunt is a critical vulnerability found in cable modems from various manufacturers across the world. The vulnerability enables remote attackers to execute arbitrary code on a modem indirectly, through an endpoint on the modem. The vulnerable endpoint is exposed to the local network but can be reached remotely due to improper WebSocket usage. Through malicious communication with this endpoint, a buffer overflow can be exploited to gain control of the modem. Using WebSockets with TLS/SSL and introducing proper authentication through access tokens or similar mechanisms can help avoid such attacks.
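The "Origin" validation mentioned above can be sketched with the standard library alone. The allowlist contents, the function names, and the choice to reject requests that carry no "Origin" header are all assumptions made for illustration; non-browser agents often omit the header, so a real deployment would pair this with token-based authentication.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// allowedOrigins is a hypothetical allowlist; replace it with your own origins.
var allowedOrigins = map[string]bool{
	"https://app.example.com": true,
}

// originAllowed reports whether an Origin header value is on the allowlist.
func originAllowed(origin string) bool {
	u, err := url.Parse(origin)
	if err != nil || u.Scheme == "" || u.Host == "" {
		return false
	}
	return allowedOrigins[u.Scheme+"://"+u.Host]
}

// checkOrigin wraps a handler and rejects requests from unknown origins
// before any WebSocket upgrade takes place.
func checkOrigin(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !originAllowed(r.Header.Get("Origin")) {
			http.Error(w, "origin not allowed", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	_ = checkOrigin // place this in front of the WebSocket upgrade handler
	fmt.Println(originAllowed("https://app.example.com"))
	fmt.Println(originAllowed("https://evil.example.net"))
}
```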

Go (Golang) Ecosystem for WebSockets

Go has several popular WebSocket libraries that make it easy to add real-time communication to your application. Let's take a look at some of the most commonly used libraries and how they compare.

  • net/websocket

    • net/websocket (golang.org/x/net/websocket) is maintained by the Go team in the supplementary x/net repository; it is not part of the standard library itself. It provides a simple API for creating WebSocket servers and clients.

    • One downside to using net/websocket is that it lacks some of the advanced features provided by other libraries, such as message compression and ping/pong handling. However, it is still a solid choice for basic WebSocket applications.

  • gorilla/websocket

    • gorilla/websocket is a popular WebSocket library that provides a rich set of features and a more developer-friendly API than net/websocket. It supports the WebSocket protocol as well as extensions such as per-message deflate compression.

    • gorilla/websocket provides a number of other features, such as message fragmentation and message broadcasting, which make it a popular choice for building real-time applications.

    • It is important to note that, at the time of writing, gorilla/websocket has been moved to a public archive state and is not actively maintained. While the library is still functional and widely used, it may not receive updates or bug fixes in the future. This means that any security vulnerabilities or other issues with the library may not be addressed.

    • If you decide to use gorilla/websocket in your project, it is important to be aware of this risk and take steps to mitigate it. This could include:

      • Conducting thorough testing and auditing of the library before deploying it in production.

      • Keeping an eye on security advisories and updates related to the library and being prepared to switch to an alternative WebSocket library if necessary.

      • Considering an alternative WebSocket library that is actively maintained and receives regular updates and bug fixes, such as `gobwas/ws`.

  • gobwas/ws

    • gobwas/ws is a lightweight WebSocket library that is designed for performance and ease of use. It supports the WebSocket protocol and a limited number of extensions and provides a simple API for creating WebSocket servers and clients.

    • It is also actively maintained and receives regular updates, including security and bug fixes.

    • The library also provides additional features and customization options, such as support for subprotocols, ping/pong messages, and custom WebSocket frame handling.

    • Example code to use it would look like this:

Code snippet
Code snippet

Memory Utilization

Challenge

When using WebSockets, one of the biggest challenges is managing memory utilization. In traditional HTTP connections, the HTTP reader and the HTTP writer each typically use a 4KB buffer. With WebSockets, an additional 4KB of memory is required for the WebSocket writer, and each goroutine can use up to 8KB of memory. That adds up to roughly 20KB per connection, so with a million connections memory utilization can reach 20GB. The diagram below shows the components involved:

WebSocket components
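The baseline figure can be checked with a few lines of arithmetic. The constant names below are illustrative, the per-component sizes are the ones quoted above, and the conversion uses decimal units (1 GB = 1,000,000 KB).

```go
package main

import "fmt"

// Per-connection costs quoted in the text, in kilobytes.
const (
	httpReaderKB = 4
	httpWriterKB = 4
	wsWriterKB   = 4
	goroutineKB  = 8 // upper bound for the goroutine
)

// perConnectionKB sums the buffers and goroutine cost of one connection.
func perConnectionKB() int {
	return httpReaderKB + httpWriterKB + wsWriterKB + goroutineKB
}

// totalGB converts the per-connection cost into gigabytes for n connections.
func totalGB(connections int) float64 {
	return float64(perConnectionKB()*connections) / 1e6
}

func main() {
	fmt.Printf("%d KB per connection, %.0f GB for one million connections\n",
		perConnectionKB(), totalGB(1000000))
}
```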


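Solution 1: Epoll

The first solution is to use Epoll, the Linux I/O event notification API. Epoll is a scalable mechanism that monitors many file descriptors and reports which of them are ready for I/O, so the server can allocate read resources only when a connection actually has data instead of holding them for every idle connection. By implementing Epoll, memory utilization can be reduced by approximately 30%. Here is sample code for using Epoll: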

Code snippet
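The idea can be sketched with Go's syscall package on Linux. In this minimal example, a Unix socket pair stands in for a client connection, and the poller type and its method names are assumptions made for illustration; production code would register the file descriptor of each accepted WebSocket connection instead.

```go
package main

import (
	"fmt"
	"syscall"
)

// poller wraps one epoll instance that can watch many connections (Linux only).
type poller struct{ epfd int }

func newPoller() (*poller, error) {
	epfd, err := syscall.EpollCreate1(0)
	if err != nil {
		return nil, err
	}
	return &poller{epfd: epfd}, nil
}

// add registers fd for read-readiness notifications.
func (p *poller) add(fd int) error {
	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(fd)}
	return syscall.EpollCtl(p.epfd, syscall.EPOLL_CTL_ADD, fd, &ev)
}

// wait blocks for up to timeoutMs and returns the descriptors that are readable.
func (p *poller) wait(timeoutMs int) ([]int, error) {
	events := make([]syscall.EpollEvent, 64)
	for {
		n, err := syscall.EpollWait(p.epfd, events, timeoutMs)
		if err == syscall.EINTR {
			continue // interrupted by a signal; retry
		}
		if err != nil {
			return nil, err
		}
		ready := make([]int, 0, n)
		for i := 0; i < n; i++ {
			ready = append(ready, int(events[i].Fd))
		}
		return ready, nil
	}
}

func (p *poller) close() error { return syscall.Close(p.epfd) }

// demo watches one end of a socket pair and reads only after epoll reports it.
func demo() (string, error) {
	pair, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	defer syscall.Close(pair[0])
	defer syscall.Close(pair[1])

	p, err := newPoller()
	if err != nil {
		return "", err
	}
	defer p.close()
	if err := p.add(pair[0]); err != nil {
		return "", err
	}

	// The "client" end writes a message.
	if _, err := syscall.Write(pair[1], []byte("ping")); err != nil {
		return "", err
	}

	ready, err := p.wait(1000)
	if err != nil {
		return "", err
	}
	if len(ready) != 1 || ready[0] != pair[0] {
		return "", fmt.Errorf("unexpected ready set: %v", ready)
	}
	buf := make([]byte, 16) // allocated on demand, not held per idle connection
	n, err := syscall.Read(pair[0], buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	msg, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
}
```

A single goroutine looping on wait can serve many connections, which is where the savings come from: idle connections no longer each hold a blocked goroutine and a read buffer.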

Solution 2: Optimized Buffer Allocations

  • Another solution to optimize memory utilization is to use low-level APIs for packet handling and buffers to avoid intermediate allocations during I/O. 

  • This technique is used by the `gobwas/ws` library, which also supports zero-copy upgrades. 

  • By using this library, memory utilization can be reduced by up to 60%. Overall, the total reduction is about 97%: a million connections need only about 600 MB instead of 20GB.
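One common way to avoid holding a buffer for every idle connection is to borrow one from a pool only while a read is in progress. The sketch below uses sync.Pool from the standard library to show that general idea; it is an assumption-laden illustration, not a description of how gobwas/ws manages buffers internally, and the names bufPool and handleReadable are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// bufPool hands out 4KB buffers that are shared by all connections.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 4096)
		return &b
	},
}

// handleReadable borrows a buffer for a single read on a connection that
// is known to have data, passes the bytes to handle, and then returns
// the buffer to the pool. The chunk is only valid during the callback.
func handleReadable(conn io.Reader, handle func(chunk []byte)) error {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)

	n, err := conn.Read(*bp)
	if n > 0 {
		handle((*bp)[:n])
	}
	return err
}

// demo uses a strings.Reader as a stand-in for a network connection.
func demo() string {
	var got string
	_ = handleReadable(strings.NewReader("hello"), func(chunk []byte) {
		got = string(chunk) // copy out before the buffer is reused
	})
	return got
}

func main() {
	fmt.Println(demo())
}
```

With this pattern, the number of live buffers tracks the number of connections being read at a given moment rather than the total number of open connections.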

Synchronous Communication

WebSocket is a protocol for two-way communication between a client and a server over a long-lived, bi-directional connection. As such, WebSocket is designed for asynchronous communication where both the server and the client can send messages to each other at any time. However, there may be use cases where synchronous request/response communication is required. For example, a server may send a request to the client and need the client's answer immediately so that it can respond to another consumer server.

One way to achieve synchronous communication with WebSocket is to use unique identifiers, such as UUIDs, for each request message. The server can send a request message containing a unique identifier, along with any other necessary data, to the client. The client can then process the request and send a response message back to the server with the same unique identifier. The server can then use this identifier to match the response message with the original request and process the response accordingly.

In Go, channels can be used to implement synchronous request/response exchanges, even when the server initiates the request. For each request, the server generates a UUID, creates a channel, and stores the channel under that UUID as its key. It then sends the request over the WebSocket connection and blocks on the channel until a response is received. Meanwhile, the client listens for incoming requests, processes each one, and sends the response back over the WebSocket connection with the same UUID as the original request. When the server receives a response with a matching UUID, it delivers the response on the corresponding channel, which unblocks the waiting goroutine so that it can continue processing. The figure below details this flow:

Flow chart
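The flow described above can be sketched as follows. The message type, the pending map, and the method names are hypothetical, a random 128-bit hex string stands in for a UUID library, and the send function is a placeholder for writing to the real WebSocket connection.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
	"time"
)

// message is a hypothetical wire format; ID ties a response to its request.
type message struct {
	ID   string
	Body string
}

// pending tracks in-flight requests by ID.
type pending struct {
	mu      sync.Mutex
	waiters map[string]chan message
}

func newPending() *pending {
	return &pending{waiters: make(map[string]chan message)}
}

// newID returns a random 128-bit hex ID (a stand-in for a UUID).
func newID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// call sends a request and blocks until the matching response or a timeout.
func (p *pending) call(body string, send func(message) error, timeout time.Duration) (message, error) {
	id, err := newID()
	if err != nil {
		return message{}, err
	}
	ch := make(chan message, 1) // buffered so deliver never blocks
	p.mu.Lock()
	p.waiters[id] = ch
	p.mu.Unlock()
	defer func() { // clean up on timeout or send failure
		p.mu.Lock()
		delete(p.waiters, id)
		p.mu.Unlock()
	}()

	if err := send(message{ID: id, Body: body}); err != nil {
		return message{}, err
	}
	select {
	case resp := <-ch:
		return resp, nil
	case <-time.After(timeout):
		return message{}, errors.New("timed out waiting for response " + id)
	}
}

// deliver is called by the connection's read loop for every response.
// It reports whether a caller was waiting for that ID.
func (p *pending) deliver(resp message) bool {
	p.mu.Lock()
	ch, ok := p.waiters[resp.ID]
	delete(p.waiters, resp.ID)
	p.mu.Unlock()
	if ok {
		ch <- resp
	}
	return ok
}

// demo simulates a client that answers asynchronously with the same ID.
func demo() (string, error) {
	p := newPending()
	send := func(req message) error {
		go p.deliver(message{ID: req.ID, Body: "pong:" + req.Body})
		return nil
	}
	resp, err := p.call("ping", send, time.Second)
	return resp.Body, err
}

func main() {
	fmt.Println(demo())
}
```

The timeout matters in practice: without it, a client that disconnects before replying would leave the calling goroutine blocked forever.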

Horizontal Scaling

WebSocket servers can horizontally scale to handle a large number of client connections by adopting an architecture that allows for dynamic server spawning and a server registry. One possible approach is to use AWS Auto Scaling Groups (ASG) to spawn new servers dynamically based on the demand for connections. Each spawned server can then register itself with AWS Route 53 DNS to ensure that clients can connect to it.

To maintain the state of connected clients and the servers they are connected to, a server registry can be implemented on top of a MySQL database. This registry can keep track of which server each client is connected to, and can be used to route incoming messages to the correct server.

When a client initiates a connection request, the request can be sent to a load balancer, which can distribute the request to one of the available servers. The server can then check the server registry to see if the client is already connected to another server, and if so, route the connection to the appropriate server. If the client is not yet connected, the server can add the client's connection to the registry, and start processing incoming messages.

If a server becomes overloaded, the ASG can automatically spawn new servers to handle the additional load, and the server registry can be updated accordingly to reflect the new server that the client should connect to.

Flow chart
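The registry lookup in this flow can be sketched as below. The post describes a registry built on MySQL; here an in-memory map stands in for the database so that the example is self-contained, and the type, method names, and server addresses are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Registry maps a client ID to the server that holds its connection.
// The map is a stand-in for the MySQL-backed registry described above.
type Registry struct {
	mu      sync.RWMutex
	servers map[string]string
}

func NewRegistry() *Registry {
	return &Registry{servers: make(map[string]string)}
}

// Claim records that clientID is connected to serverAddr unless another
// server already owns the client. It returns the owning server.
func (r *Registry) Claim(clientID, serverAddr string) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if owner, ok := r.servers[clientID]; ok {
		return owner
	}
	r.servers[clientID] = serverAddr
	return serverAddr
}

// Lookup returns the server that currently owns clientID, if any.
func (r *Registry) Lookup(clientID string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	owner, ok := r.servers[clientID]
	return owner, ok
}

// Release removes the entry, but only if serverAddr is still the owner.
func (r *Registry) Release(clientID, serverAddr string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.servers[clientID] == serverAddr {
		delete(r.servers, clientID)
	}
}

// route decides whether the server identified by self should handle the
// client locally or hand it over to the server that already owns it.
func route(reg *Registry, clientID, self string) (target string, local bool) {
	owner := reg.Claim(clientID, self)
	return owner, owner == self
}

func main() {
	reg := NewRegistry()
	fmt.Println(route(reg, "client-1", "ws-a:8443"))
	fmt.Println(route(reg, "client-1", "ws-b:8443"))
}
```

With a real database, Claim would need to be a single atomic operation (for example, an insert guarded by a unique key on the client ID) so that two servers cannot claim the same client at once.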


Conclusion

In this post, we've explored the challenges and possible solutions when it comes to scaling WebSockets for large-scale communication involving persistent connections. 

We delved into the Go (Golang) ecosystem for WebSockets, discussing the various libraries available and how they compare. 

By implementing optimized buffer allocations and using Epoll, we've reduced memory utilization by up to 97%. Additionally, we've covered the need for synchronous communication and how it can be achieved using message UUIDs and channels in Go.

Finally, we looked at horizontally scaling the system by dynamically spawning servers using AWS ASG, maintaining a server registry built on top of MySQL, and using AWS Route 53 DNS to enable the dynamic registration of new servers.

Overall, with the right architectural design and implementation, WebSockets can be scaled efficiently, at a fraction of the memory footprint of a straightforward implementation, allowing for real-time communication at multi-million scale.

Next Steps

Looking to learn more about the technical innovations and best practices powering cloud backup and data management? Visit the Innovation Series section of Druva’s blog archive.