Platform
- Data Resiliency Cloud
  Data Resiliency Cloud
  Enterprise Cloud Backup and data management across edge, on-premises and cloud workloads
- Data Protection
  Data Protection
  Modernize data protection to reduce costs and complexity
- Cyber Resiliency
  Cyber Resiliency
  Be ready for cyber attacks with data that is always safe, always ready
  - Accelerated Ransomware Recovery
  - Security Posture & Observability
- Governance & Compliance
  Governance & Compliance
  Secure, protect, and streamline data governance for all your critical data, wherever it lives
  - eDiscovery and Legal Hold
  - Sensitive Data Management
- Take a Tour
Solutions
- Business Drivers
  Business Drivers
  Learn how Druva helps you accelerate key business initiatives
- SaaS Applications
  SaaS Applications
  Druva provides comprehensive data protection that supports multiple SaaS applications from a single platform. Discover the Druva difference today.
- Enterprise Workloads
  - Virtualization
    Virtualization
    Transform data center backup and disaster recovery for virtual environments
    
    VMware
    
    Nutanix
  - Databases
    Databases
    Reduce the cost and complexity of data protection for enterprise databases
    
    Oracle
    
    MS SQL
    
    SAP HANA
  - Files
    Files
    Discover a more cost-efficient way to protect on-premises and cloud NAS
    
    NAS/files
  - Public Cloud
    Public Cloud
    Protect native AWS and Azure deployments with secure backups without the cost and complexity
    
    AWS
    
    Microsoft Azure
- Enterprise Endpoints
  Enterprise Endpoints
  Unify SaaS apps and end-user device protection to reduce data risks. Improve cyber resilience and compliance by protecting enterprise workloads and assets.
- Free Trial
Customers
- Explore All Customer Stories
  We are trusted by the world's leading organizations to protect their data. Explore customer success stories to see how your peers are using Druva.
- Ransomware recovery ready
  Learn why Medallia chose Druva
  
  SaaS data protection across the enterprise
  See why Regeneron partnered with Druva
Resources
- 2023 Gartner® Magic Quadrant™
  See why Druva is recognized as a Visionary
  
  Data Resiliency for Dummies
  Get your guide to data resiliency
Partners
- Strategic Partners
  Strategic Partners
  Learn about Druva's strategic capabilities across platform, OEM, and other partnerships. Find out how Druva accelerates and protects customers' cloud journeys.
  - Dell Technologies
  - AWS
  - VMware
  - Nutanix
- Programs
  Programs
  Learn how you can profit with Druva and a cloud-first SaaS selling motion. Explore partner programs, access resources, and discover the benefits of partnering with Druva.
- Become a Partner
Company
- - Company
  - Leadership
  - Investors
  - Careers
  - Contact Us
  - Newsroom
  - Awards
  - Events
  - Blog
  - Diversity, Equity & Inclusion
- Get in touch with us
  Contact Us
  
  News, product innovations, and more
  Blog
Get Started
Support
Login
Language

Tech/Engineering, Innovation Series

Evaluating distributed databases with the CAP theorem for an ideal cloud computing network

May 20, 2021 Sohom Bhattacharjee, Software Engineer

Each day, multibillion-dollar organizations and the everyday person alike rely on the cloud to consistently and simply exchange data. Cloud computing has made it very easy for us to use applications without having to think about infrastructure. Needless to say, most software applications on the cloud are designed to make use and take advantage of its distributed and decentralized approach to networking.

The database is a critical part of any software system. In a distributed system, data may be stored in multiple computers located in the same physical location, or dispersed over a network of interconnected computers. Were such a system to be exposed to ransomware, malicious file deletion, or similar, the entire network could be at risk. Thus, it is important to design them (the data stores) and the entire application around the data store, in such a way that the user experience is not affected should something go wrong.

There is no silver bullet which can help us find the perfect data store for our needs. This is where the CAP theorem helps us. The CAP theorem (also known as Brewer’s theorem) gives us a framework for thinking about the trade-offs that are involved when choosing a distributed data store for our cloud computing needs. Studying this informs us about the various problems one might face, and how to think about them (and potentially solve them). It helps us write robust software for the cloud era.

What is the CAP theorem?

The CAP theorem states that out of three desirable properties, it is impossible for a distributed data store to guarantee more than two. These three desirable properties are:

Consistency
Availability
Partition tolerance

In this blog, we will explore these properties in the context of the CAP theorem.

Distributed database

In a distributed database, the data is physically stored in different computers. The computers may be present in a single data center or spread across different locations. The different computers are connected via network. These different computers are called nodes, and they can communicate with each other using messages to keep the state of the system in sync.

Distributed transactions

The clients can read and write via any node that is a part of the system. When a node gets a request, it performs operations before responding to the client. A non-error response indicates that the transaction has been completed successfully.

For example, let us assume that we have a two-node cluster. Let us call the nodes `node_1` and `node_2`. This cluster is keeping track of a value `v=0`. This is the initial state of the cluster.

Consistency

Consistency is the property that ensures every transaction changes the state of the database from a valid state to another valid state. This means that the system returns the correct response to every query.

Therefore, any read operation after a write operation should fetch the latest state, regardless of which nodes processed the write and read.

Let us illustrate this with the example of our two-node cluster:

The current state of the cluster is `v=0`
A client queries a node(`node_1`) with a transaction changing the state of the cluster to `v=1`
The node communicates with the other node(`node_2`), and they both reconcile the new state of the cluster. The new state is `v=1`
A new client queries the cluster for the current state — since both the nodes have the updated data, the cluster responds with the updated state

This means that once a client writes `v=1` to the cluster (using any node), the subsequent state of the entire cluster should be `v=1`. This state should be independent of which nodes were used for the transaction.

Availability

Availability means that the entire database should be available to respond to requests even if some nodes are down. For our cluster, all this means is that if one of the nodes is down, the cluster should continue to respond to queries (reads and writes).

For example, if `node_1` is down, then all the requests that will be re-routed to `node_2` should be able to process the queries correctly without errors.

Partition tolerance

A network partition happens when some of the nodes are not able to communicate with the rest of the nodes. This means that the messages the nodes use to reconcile the state of the cluster are either not reaching them or being delayed to a point where they are no longer useful.

A partition-tolerant system will continue to be in service during a network partition.

The CAP theorem — proof

The CAP Theorem states that out of the three properties, we can guarantee only two.

What happens if our system satisfies all properties?

The easiest way to test this claim is to introduce a partition into the system.

How does a write operation look in this case?

Initial state of the cluster `v=0`
Client talks to `node_1` and changes the state to `v=1`
The nodes cannot talk to each other so `node_2` does not get the updated state of the cluster
Now when a client queries `node_2`, the client will not receive the updated state of the cluster. (`v=0`)

We see a trade-off here. In order for the system to be available during a network partition, the state of the database was not consistent. Each node (or partition) had its own state. This can become problematic for clients. On the other hand, if the system were to stop responding to requests during a partition, the system would not be able to guarantee partition tolerance.

Thus, a system can not guarantee all three properties.

Real-world examples

Most distributed database systems are partition tolerant by default, because this is the expectation in the cloud-native landscape. This leaves us with consistency and availability as parameters we can change and adjust as needed. Some database systems even allow the user to select the behavior of the database by using configuration parameters. We should also understand that these aren’t absolutes. There is a spectrum between availability and consistency, and many systems offer users the choice to select a point along that continuum.

Apache Cassandra is a very good example of a distributed database system that has chosen to guarantee availability, partition-tolerance, and trade-off consistency. Cassandra guarantees eventual consistency, which means that after a certain amount of time, all nodes will have the same cluster state. A very popular example of an eventually-consistent system is the domain name system (DNS). That is why any updates to the DNS need time before they propagate across the entire planet.

Cassandra allows a user to choose the consistency level during reads/writes. If you choose to have strong consistency over your writes, then the system will make sure all updates are propagated to all replicas before marking the transaction as a success. Similarly, if you choose to have strong consistency during reads, the system would make sure that all replicas have the same copy of the data before responding to the client. Both these situations trade availability for consistency.

Key takeaways

The CAP theorem is all about trade-offs. Since different systems have different requirements, the trade-offs that they choose will be different. Some databases may be required to absolutely guarantee availability, while relaxing database consistency needs. Others may need to guarantee consistency at the cost of not being highly available.

The goal is to understand the theory so we may be more informed about the trade-offs.

We choose consistency over availability when data consistency is very important. This is a typical approach in financial institutions. In these situations, we want the data to be very consistent and are willing to tolerate downtime. Banks typically will have a predetermined downtime window where they are allowed to perform database maintenance.

On the other hand, we choose availability when the uptime of the system is more important than consistency. This is typically seen in systems which collect large amounts of data, such as a metrics database, which collects telemetry data from smart devices or distributed microservices. In this scenario, reads are not followed closely by the writes, and some inconsistency is okay as long as the system is not refusing any writes. Dropping data would lead to more inaccuracies in understanding the monitored system.

However, these are just generic scenarios, and there are companies that may run Cassandra (which is an eventually-consistent database) for all their data storage needs — including banking data.

Explore innovations and best practices in cloud backup and data management — such as the CAP theorem for distributed databases — in the Tech/Engineering section of Druva’s blog archive.

Evaluating distributed databases with the CAP theorem for an ideal cloud computing network

What is the CAP theorem?

Distributed database

Distributed transactions

Consistency

Availability

Partition tolerance

The CAP theorem — proof

What happens if our system satisfies all properties?

How does a write operation look in this case?

Real-world examples

Key takeaways

Blog

Druva Data Resiliency Cloud

Cloud Backup & Recovery

Data Protection

Governance & Compliance

Cyber Resilience

Business drivers

Workloads

Partners

Customers

Resources

Company