A few months back, we migrated from Cassandra to AWS DynamoDB as the noSQL database powering our inSync Cloud service. The move was a hard decision due to the risk and effort involved, but we had some compelling reasons to do so.
Disclaimer: Our experiences are with Cassandra 0.8.x. Cassandra improved on quite a few areas in 1.0 and some of the following issues may have been addressed in 1.0.
Data model misfit
First and foremost, Cassandra did not fit our data model well. A majority of inSync’s index entries falls in two buckets:
- File offset to data block mapping (type A records)
- Block checksum to data block mapping (type B records)
Each Type A record represents a file with potentially millions of columns. On the other hand, there are billions of Type B records with a small number of columns.
In our case, we found that Cassandra could efficiently handle millions of rows each with thousands of columns. On the other hand, Cassandra did not perform well with millions of columns in a row (Type A records) and neither did it perform well with billions of small sized rows (Type B records). To use Cassandra efficiently, we were therefore forced to break huge rows and also combine smaller rows.
inSync shows a sequential access pattern for Type A records. For Type B records, only a fraction is hot and the rest is very infrequently accessed. Due to these access characteristics, we expect to operate with cache sizes much smaller than the database size.
We found that Cassandra tends to sacrifice read performance in order to improve write performance. Because of the log structured design of Cassandra, a single row is spread across multiple sstables. Reading one row requires reading pieces from multiple sstables. As a result, Type A access does not remain a pure sequential access.
Cassandra also relies on OS page cache for caching the index entries, primarily because JVM cannot handle large heap efficiently. For us, this feature translated to caching a bunch of cold Type B records along with one hot Type B record that happened to sit on the same OS page. In other words, the cache was filled with cold records resulting in a poor cache hit ratio and lot of read load on the disks.
A lot has been written on this topic and we also witnessed that it takes a dedicated team to maintain a Cassandra cluster.
The regular maintenance tasks, particularly repair, is not automated. We had to fire a regular repair operation and then monitor it because if the operations failed for some reason, it would not restart on its own and never resume from the last point. And this process had to be done for each node in the cluster. Imagine running a Cassandra cluster with tens of nodes!
Further, the repair process ends up creating duplicate copies of almost all records and doubling the database size. A repair is invariably followed by a compaction cycle before the database size comes back to normalcy. All this process adds substantial load on the system and if the process did not finish over the weekend, we ended up with much lower performance for the regular load.
Cassandra automatically triggers minor sstable compactions. In our case, the sstable compaction was almost a continuous activity and it competed with the regular database load for system resources. Finding low activity slots and dynamically tuning compaction throughput so that it did not compete with regular load became another maintenance overhead for us.
The “Incident” – When we started looking for more options
As a last straw, we had our worst experience with Cassandra maintenance when we tried to grow a cluster from 3 nodes to 6 nodes.
- One, the process just went on and on.
- Two, there was no way to figure out the progress.
- Three, populating new nodes added more load on the already loaded original set of nodes.
Ultimately, we ended up dumping the old cluster and reloading all the data in a new bigger cluster. And because reload requires static data, we had to keep the services down during the process :( I guess this was the proverbial last straw that led us to look out for other options.
I know for a fact that the Cassandra team is working on many issues mentioned here, although there are some design choices in Cassandra that would make it unfit for certain data models. All the same, the ‘log structured merge trees’ architecture of Cassandra has a lot of potential and I hope Cassandra turns out to be a more mature product in due course of time.