Tech/Engineering

Kubernetes: value and challenges for developing cloud apps

Stephen Manley, CTO

During a college lecture, the professor asked an algorithm question that only one student could answer. The Yoda-like professor mused, “In thirty years, only two other students have answered that question correctly. One won a Turing Award. The other lost his way and became a management consultant. Which path will you take, hmmm?”

Kubernetes and containers face a similar fork in the road. These technologies could transform our industry or become OpenStack 2.0 – collapsing under the weight of fragmentation and complexity. There will be no middle path.

I spent the last two years developing with Kubernetes and containers. One moment, I’d rave about how it handled mundane infrastructure management tasks; the next, it would frustrate me to a comical degree because it was almost impossible to control my own environment.

I dare not predict which path Kubernetes will take, but perhaps my experience can help you choose your path.

Kubernetes helps you build cloud apps faster

Containers decouple applications from the underlying virtual or physical infrastructure, while Kubernetes orchestrates and scales containerized applications in real time. Every blog faithfully recites: “Containers are more lightweight than VMs because they don’t package the OS”. While true, the efficiency gains alone are not worth uprooting your working on-premises environment. Kubernetes and containers are most valuable for creating and managing cloud application environments at scale. Since you can’t manage the cloud provider’s VMs the way you manage your own, containers will become your new unit of management and Kubernetes your new central manager.

As we developed our cloud application, Kubernetes helped:

  • Manage cost and load — Kubernetes can scale a cluster up and down to meet workload needs at the right cost. When applications need to handle a workload spike, Kubernetes auto-provisions nodes. It then kills and restarts containers (remember, they are impermanent) to rebalance the load across the new nodes. When the spike passes, Kubernetes deletes nodes and rebalances the application again. Before Kubernetes, we always over-provisioned for “worst case” loads, wasting resources and money. Kubernetes cut our cloud costs to a third (see the autoscaling sketch after this list).
  • Resiliency — Kubernetes simplifies running resilient, distributed applications. As it scaled a customer’s cluster, Kubernetes deployed our application containers across all the nodes (DaemonSet). Furthermore, when something in our control and management layer failed (our code, a cluster node, etc.), Kubernetes restarted our containers automatically (Deployment/StatefulSet). Thanks to Kubernetes, we didn’t have to build clustering at the application or infrastructure layer, which improved our time to market.
  • Upgrade and testing — Before Kubernetes, upgrading applications was generally risky, complicated, and disruptive. With Kubernetes, upgrades are easy. I could upgrade components 10 times a day without disrupting end users. I uploaded a new version of the container(s) to DockerHub and killed the running containers. When Kubernetes automatically restarted the containers, it pulled the newest version. That’s it. Kubernetes streamlined both internal testing and upgrading customer deployments (see the rollout sketch after this list).
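
To make the scaling and resiliency bullets concrete, here is a minimal sketch of the two objects involved; every name, image, and number is illustrative rather than our actual configuration. A Deployment tells Kubernetes how many replicas of a container to keep alive (restarting any that die), and a HorizontalPodAutoscaler grows and shrinks that replica count with load.

    # deployment.yaml -- "myapp" and the image are placeholders
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp
    spec:
      replicas: 3                  # Kubernetes keeps 3 pods running, restarting any that fail
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
          - name: myapp
            image: stephenmanley/myapp:1.0
            resources:
              requests:            # requests drive both scheduling and the autoscaler's math
                cpu: 250m
                memory: 256Mi
    ---
    # hpa.yaml -- hold ~70% CPU utilization with between 3 and 20 replicas
    # (autoscaling/v2 on current clusters; older clusters use v2beta2)
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: myapp
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: myapp
      minReplicas: 3
      maxReplicas: 20
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70

Pod autoscaling is only half the story: a cluster autoscaler watches for pods that no longer fit and adds or removes nodes to match, which is where the cost savings came from.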
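
The upgrade flow in the last bullet can be driven with stock kubectl commands. A sketch, assuming the Deployment above: Kubernetes replaces pods gradually, keeping the old version serving until its replacement is ready.

    # Push the new build (repository and tag are illustrative).
    docker push stephenmanley/myapp:1.1

    # Point the Deployment at the new image; Kubernetes rolls pods over gradually.
    kubectl set image deployment/myapp myapp=stephenmanley/myapp:1.1

    # Watch the rollout, and back out if it goes wrong.
    kubectl rollout status deployment/myapp
    kubectl rollout undo deployment/myapp   # only if the new version misbehaves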

Kubernetes solved some of the biggest cloud application infrastructure challenges, so we could focus on building value.

Developing in Kubernetes is not all unicorns and rainbows

While Kubernetes was indispensable to our cloud application development, we did face some serious challenges. Another layer of indirection and control makes life easier until something goes wrong — then it gets much, much more difficult.

What are some of the challenges with developing for Kubernetes and containers?

  • Debugging via logs — Since it’s difficult to attach traditional debugging tools to Kubernetes applications, logs were our essential debugging tool. Unfortunately, it can be difficult to retrieve the relevant logs. Verbose logs can rotate out quickly, deleting older log information (otherwise, lots of containers x large logs = out of space). Furthermore, when Kubernetes shuts a node down, the log data on that node is lost. When I debug, I want all the logs I can get, and at times it felt like Kubernetes was working against me (see the log-retrieval sketch after this list).
  • Distributed debugging — With Kubernetes, there are no simple bugs. Kubernetes applications tend to comprise 3–10x more pieces than their VM-based counterparts: containers, Kubernetes services, and cloud functions. To debug, you have to interleave multiple containers’ logs and augment them with Kubernetes, system, and network logs. Then, since you can upgrade containers individually, you’ve got to debug API version and security configuration mismatches between containers. Debugging complex applications is difficult; debugging Kubernetes applications in the cloud is agonizing.
  • Database support — Kubernetes continues to grow its support for the stateful containers that databases rely on, but some databases fit containers better than others. For example, we found very clear instructions on how to run MongoDB in Kubernetes. On the other hand, we found multiple conflicting recommendations on how to run PostgreSQL. Our first release boasted a highly available, protected metadata store built on MongoDB and a fragile statistics repository on a PostgreSQL database that we prayed would never hit an issue (a StatefulSet sketch follows this list).
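
For the logging and interleaving pain, a few stock kubectl commands recover more than you might expect (the pod name and label below are illustrative):

    # Logs from a running container, and from its previous incarnation:
    # invaluable after Kubernetes restarts a crashed container.
    kubectl logs myapp-7d4b9c-x2x1z
    kubectl logs myapp-7d4b9c-x2x1z --previous

    # Interleave logs from every pod behind a label, across all containers,
    # prefixing each line with its source.
    kubectl logs -l app=myapp --all-containers=true --prefix --since=1h

None of this survives a node’s deletion, though. Keeping logs past that point means shipping them off-cluster, e.g. with a Fluentd or Fluent Bit DaemonSet forwarding to durable storage.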
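
For databases, the relevant primitive is the StatefulSet, which gives each replica a stable identity and its own persistent volume. Below is a minimal single-instance PostgreSQL sketch: the names, image tag, storage size, and the referenced Secret are all illustrative, and a production deployment still needs replication, backups, and tuned storage.

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: postgres
    spec:
      serviceName: postgres        # headless Service (defined separately) for stable DNS
      replicas: 1
      selector:
        matchLabels:
          app: postgres
      template:
        metadata:
          labels:
            app: postgres
        spec:
          containers:
          - name: postgres
            image: postgres:13
            env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret   # assumed to exist already
                  key: password
            - name: PGDATA              # keep data in a subdirectory of the mount
              value: /var/lib/postgresql/data/pgdata
            volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumeClaimTemplates:             # each replica gets its own PersistentVolumeClaim
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi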

Kubernetes is still a relatively young technology, so it’s not surprising that debugging and breadth of support remain challenges. Understanding that, however, doesn’t make debugging issues or configuring some databases any less frustrating.

Managing Kubernetes can ruin your night

Managing a production Kubernetes cluster brings its own surprises. The maxim “Nothing is ever deleted on the Internet” applies to Kubernetes. It’s easy to leave remnants behind, and those remnants can haunt you.

My “first Kubernetes cluster” horror story began when I tried to shut down that cluster. I naively used the AWS console to terminate all the EC2 instances in the cluster. When I glanced back a minute later, AWS had spawned new instances to replace the old ones! I shut those down. More new ones came back. I deleted a subset of nodes. They came back. I spent two hours screaming silently, “Why won’t you die?!?!” Then I realized that the nodes kept spawning because that’s what Kubernetes does: it keeps your applications running, even when nodes fail. Finally realizing my mistake, I deleted the AWS Auto Scaling Group and ended my nightmare. NOTE: Today, I use kops or eksctl to create and delete Kubernetes clusters in AWS.
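
A sketch of that cleaner lifecycle, with illustrative cluster names: let the tool that created the cluster (and its Auto Scaling Group) also destroy it.

    # kops: tears down the nodes, the Auto Scaling Group, and associated
    # resources (assumes KOPS_STATE_STORE points at your kops state bucket)
    kops delete cluster --name my-cluster.example.com --yes

    # eksctl: the equivalent for EKS clusters
    eksctl delete cluster --name my-cluster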

After I had slain the immortal cluster, Kubernetes still had a surprise waiting for me. When I next logged into AWS, I was greeted by TBs of unmounted EBS volumes. Deleting the instances hadn’t deleted the storage, since Kubernetes knew that users might want to spin up a new cluster to use that data. I needed to delete the volumes and their snapshots manually — after paying for those volumes for days.
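
If you end up in the same spot, the orphans are at least easy to find: volumes that are attached to nothing report a status of “available”. A sketch with the AWS CLI (the volume ID is a placeholder):

    # List unattached EBS volumes (status "available" means mounted nowhere).
    aws ec2 describe-volumes \
      --filters Name=status,Values=available \
      --query 'Volumes[*].[VolumeId,Size,CreateTime]' \
      --output table

    # After confirming a volume is truly orphaned, delete it.
    aws ec2 delete-volume --volume-id vol-0123456789abcdef0

Deleting the cluster through its own tooling generally avoids the mess, since dynamically provisioned volumes are tagged to the cluster; terminating instances behind Kubernetes’s back bypasses that cleanup entirely.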

When you deploy a resilient platform, it solves a lot of challenges. But you need to learn how to manage it, or you’ll find yourself fighting (and cursing) the tools that are supposed to help you.

Kubernetes is not the only game in town

My old company chose Kubernetes, but there are other application development models in the cloud. Druva, for example, was built on AWS, so in addition to containers, it uses cloud services (e.g. Amazon DynamoDB) and serverless computing. Like Kubernetes, cloud providers offer built-in cost optimization, resiliency, and easy upgrades. They also bring their own challenges with debugging and flexibility.

So, why did we choose Kubernetes over cloud-native services?

  • Database Control — We wanted to control the versions of PostgreSQL and MongoDB, so that we owned security patches, features, performance, etc. We also wanted to use TimescaleDB (this was prior to Amazon Timestream) for analytics.
  • Skill Set/Background — We had a group of traditional application developers. Moving to the cloud was overwhelming (e.g. access/egress fees, on-demand pricing, dynamic deployment). Containers preserved enough familiarity that we could focus on adapting to the cloud model. If we had tried to jump straight to serverless, we’d have had no anchor.
  • Portability — No layer is going to be completely cloud provider agnostic. Developers will always hook into local services and APIs. Still, in 3 weeks, we ported an application built for AWS to Azure. It’s unlikely that we’d have been able to do that if we’d completely hooked into the cloud provider’s environment.

Kubernetes vs. cloud-native isn’t a binary decision. You can choose Kubernetes for some applications and cloud-native for others. You can even mix Kubernetes and cloud-native in one application. Many developers use Kubernetes to run the application, while storing persistent data in Amazon RDS or Amazon S3.
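
The wiring for that mixed model is small: the application runs in Kubernetes, and its connection details for an external store such as Amazon RDS arrive through a Secret. A sketch with placeholder names, credentials, and endpoint:

    apiVersion: v1
    kind: Secret
    metadata:
      name: rds-credentials
    type: Opaque
    stringData:
      # Placeholder endpoint and credentials; real values come from your RDS instance.
      DATABASE_URL: postgres://app:CHANGEME@myapp-db.abc123.us-east-1.rds.amazonaws.com:5432/app
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
          - name: myapp
            image: stephenmanley/myapp:1.1
            envFrom:
            - secretRef:            # the app reads DATABASE_URL from its environment
                name: rds-credentials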

Conclusion

With Kubernetes and containers, you can develop cloud applications better and faster. It helps manage scaling, resiliency, testing, and upgrading, so you can spend your time adding value. It’s not perfect, and it’s not the only platform to help you build applications, but it’s a powerful tool. While none of us knows where Kubernetes will end up, its potential is so exciting that you cannot afford to ignore it.

In fact, what makes Kubernetes so exciting is not just what it can help you do today, but what it is trying to achieve. “Infrastructure as Code” and “Data as Code” are not just clever taglines. They’re the underlying principles that promise to transform how we protect and manage data.

But we’ll talk more about that next time. Until then, explore more about how Druva uses cloud for our Druva Cloud Platform.