What you need to know about Kubernetes applications (and protecting them)

Production applications have been moving from virtual machines to containers and from on-premises data centers to the cloud. As container use grew, so did the need for an orchestration platform such as Kubernetes to manage those containers. Kubernetes is now the most commonly used orchestration tool for managing containerized applications. But what exactly is a Kubernetes application?

Kubernetes environment and applications

Let’s start with your Kubernetes environment. If you are running Kubernetes in your Virtual Private Cloud (VPC), you might have a single cluster or a few dozen clusters. A cluster is a collection of nodes, anywhere from one to several hundred. The control plane manages the worker nodes. In production, you’d run multiple nodes for high availability and fault tolerance. The control plane might run across several machines, and you wouldn’t run your user containers on those machines. Within the control plane, the Kubernetes API server is the endpoint for control plane functions, and etcd is the key-value store that holds the cluster’s state and configuration data, which can include application settings such as database connection details.

Pods run on the worker nodes. A cluster might have anywhere from one to a few hundred pods, rarely thousands, and together those pods can hold anywhere from one to a few thousand containers. The smaller and less cluttered your cluster, the easier it is to manage. A namespace is a mechanism for isolating groups of resources within a cluster. You can learn more about these concepts in the Kubernetes documentation.

An application is the entire entity that you care about and that solves your business case. For instance, your application can consist of your customer portal plus the database that the portal calls to serve your customers. What does this application look like in Kubernetes? Your application runs in a namespace. An application running in a Kubernetes environment consists of native Kubernetes resources (e.g. service accounts, stateful sets, persistent volumes, secrets, etc.) and potentially custom resources that are defined specifically for that application. In stateful applications, the containers in a pod share resources such as storage. A persistent volume (PV) is a piece of storage in your cluster. Your pods might also talk to external data stores such as Amazon RDS. External storage such as Amazon EBS is made available inside the cluster as a PV, and a pod requests that storage through a PersistentVolumeClaim (PVC). Your application might even have dozens of PVCs.
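To make the pod-to-PVC relationship concrete, here is a minimal sketch that collects the PVC names referenced by a set of pod manifests. It assumes the manifests have already been loaded as plain Python dictionaries; the `spec.volumes[].persistentVolumeClaim.claimName` path follows the standard pod schema, while the function name and the sample pod and claim names are made up for illustration.

```python
def claims_used_by(pods):
    """Return the sorted PVC names referenced by the given pod manifests."""
    claims = set()
    for pod in pods:
        for volume in pod.get("spec", {}).get("volumes", []):
            pvc = volume.get("persistentVolumeClaim")
            if pvc:  # volumes such as emptyDir have no claim and are skipped
                claims.add(pvc["claimName"])
    return sorted(claims)


# Hypothetical pods for the customer portal example.
portal_pods = [
    {
        "metadata": {"name": "portal-web"},
        "spec": {
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": "portal-data"}},
                {"name": "scratch", "emptyDir": {}},
            ]
        },
    },
    {
        "metadata": {"name": "portal-db"},
        "spec": {
            "volumes": [
                {"name": "db", "persistentVolumeClaim": {"claimName": "portal-db-data"}},
            ]
        },
    },
]

print(claims_used_by(portal_pods))
```

A protection tool would need a list like this to know which volumes hold the application's data.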

The components of an application include:

  • Kubernetes-native resources (e.g. pods, secrets, configmaps, etc.)
  • Custom resources (defined through CRDs)
  • Persistent volumes (e.g. provisioned through a CSI driver)
  • External data stores (e.g. Amazon RDS, EBS) 

Understand what your organization’s users do with Kubernetes applications

There are three main types of stakeholders in your organization who deal with your Kubernetes environment in the cloud:

  1. The Cloud Administrator: Manages cloud workloads in your organization. Responsible for protecting cloud workloads, policies, roles, and managing backup, restore, and retention.
  2. The Kubernetes Administrator: A cloud admin with Kubernetes expertise who manages the organization’s Kubernetes clusters. Responsible for setting up and monitoring those clusters. In some organizations, the cloud admin and the Kubernetes admin are the same person. In many cases they belong to the same central/cloud administration team.
  3. The Application Administrator: Owners and creators of Kubernetes applications. They know what comprises the application. There are typically multiple app owners in an organization.

Since Kubernetes has no built-in application object, a custom application definition gives the application owner and the central backup team a shared understanding of what to protect. At the same time, you cannot expect the application owner to list every single resource that makes up an application. The data protection solution should do the heavy lifting of identifying the applications.
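As an illustration of how a custom application definition might let the tooling do the discovery, the sketch below selects resources by namespace and label. It is only one possible approach: the `AppDefinition` class, the `discover` function, and the sample resources are hypothetical, and a real solution would likely follow owner references and volume bindings as well.

```python
from dataclasses import dataclass, field


@dataclass
class AppDefinition:
    """Hypothetical definition shared by the app owner and the backup team."""
    name: str
    namespace: str
    match_labels: dict = field(default_factory=dict)


def discover(app, resources):
    """Return (kind, name) for resources in the app's namespace carrying every match label."""
    selected = []
    for res in resources:
        meta = res["metadata"]
        labels = meta.get("labels", {})
        in_namespace = meta.get("namespace") == app.namespace
        has_labels = all(labels.get(k) == v for k, v in app.match_labels.items())
        if in_namespace and has_labels:
            selected.append((res["kind"], meta["name"]))
    return selected


# Made-up cluster contents for illustration.
cluster = [
    {"kind": "Deployment", "metadata": {"name": "portal", "namespace": "shop", "labels": {"app": "portal"}}},
    {"kind": "Secret", "metadata": {"name": "portal-db", "namespace": "shop", "labels": {"app": "portal"}}},
    {"kind": "Deployment", "metadata": {"name": "billing", "namespace": "shop", "labels": {"app": "billing"}}},
    {"kind": "ConfigMap", "metadata": {"name": "portal", "namespace": "dev", "labels": {"app": "portal"}}},
]

portal = AppDefinition(name="portal", namespace="shop", match_labels={"app": "portal"})
print(discover(portal, cluster))
```

The owner only states one label and one namespace; the list of resources is derived from that. Leaving `match_labels` empty selects everything in the namespace.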

What can go wrong and what should your Kubernetes application protection strategy be?

It is no surprise that many things can go wrong in a multi-user environment, including:

  1. Unintended modifications
  2. Accidental deletion
  3. Malicious internal threats
  4. Ransomware attacks or other external threats

Though Kubernetes is known for resiliency, it can only bring back the container infrastructure, not the data, so the application’s state is lost. Moreover, there is no Kubernetes application object, so Kubernetes does not know what your application truly is. Hence, you need an application-centric Kubernetes protection solution. This solution should provide cross-namespace, cross-cluster, and cross-region recovery options, and it should enable your users to do the following:

Application disaster recovery

In the event of a catastrophic application failure, the original application might no longer be running. Users expect to recover their application(s), sometimes even in a different region, including both the application resources and the data.

Application rollback

In the event of an unintended change to an application, including configuration and/or data, users expect to revert their application to the point in time at which a backup was created. Users expect that the revert not only re-creates or modifies application resources, but also eliminates resources that did not exist when the backup was taken. They will also expect an option to not overwrite existing resources.
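The rollback behavior described above amounts to a comparison between the backed-up state and the live state. The sketch below is a simplified illustration, not how any particular product implements it. It assumes both states are dictionaries keyed by `(kind, name)`, and the `overwrite` and `prune` flags are hypothetical names for the options mentioned.

```python
def rollback_plan(backup, live, overwrite=True, prune=True):
    """Plan the actions needed to make `live` match `backup`.

    Both arguments map (kind, name) -> resource spec.
    """
    plan = {"create": [], "update": [], "skip": [], "delete": []}
    for key, spec in backup.items():
        if key not in live:
            plan["create"].append(key)  # lost since the backup
        elif live[key] != spec:
            # Changed since the backup: revert it, or leave it alone if
            # the user asked not to overwrite existing resources.
            plan["update" if overwrite else "skip"].append(key)
    if prune:
        # Resources that did not exist at backup time are removed.
        plan["delete"] = [key for key in live if key not in backup]
    return plan


# Made-up states for illustration.
backup_state = {
    ("ConfigMap", "portal-config"): {"theme": "light"},
    ("Secret", "portal-db"): {"version": 1},
}
live_state = {
    ("ConfigMap", "portal-config"): {"theme": "dark"},
    ("Job", "debug-shell"): {},
}

print(rollback_plan(backup_state, live_state))
print(rollback_plan(backup_state, live_state, overwrite=False))
```

Unchanged resources appear in none of the lists, so an unmodified application produces an empty plan.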

Application migration

Users will want to move applications for multiple reasons, including cost optimization, load balancing, and cluster upgrades. While migration is not strictly a protection use case, many organizations leverage their protection tools for migration. 

Depending on how the migration process is managed, the source application may be running concurrently with the migrated application until a “cutover” process occurs to minimize risk and downtime. 

Application cloning

Users will want to clone applications for multiple reasons, including training, development, and upgrade testing. While cloning is not strictly a protection use case, many users leverage their protection tools for cloning. 

Since the clone will run concurrently with the production instance, the process should ensure that resources do not conflict (e.g. renaming of resources) and that data is copied. Admins may also want to retain provenance information for clones to either track the clone copies or to enable updates to the clones (pushing a new “golden copy”).
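As a rough illustration of renaming and provenance tracking, the sketch below copies a manifest, renames it, moves it to a target namespace, and records where it came from in an annotation. The annotation key `example.com/cloned-from` and the function name are hypothetical; annotations are a standard place for this kind of metadata, but no specific tool is implied.

```python
import copy

PROVENANCE_KEY = "example.com/cloned-from"  # hypothetical annotation key


def clone_manifest(manifest, suffix, target_namespace):
    """Return a renamed copy of `manifest` that records its source."""
    clone = copy.deepcopy(manifest)  # never mutate the production manifest
    meta = clone["metadata"]
    source = f'{meta.get("namespace", "default")}/{meta["name"]}'
    meta["name"] = f'{meta["name"]}-{suffix}'
    meta["namespace"] = target_namespace
    meta.setdefault("annotations", {})[PROVENANCE_KEY] = source
    return clone


# Made-up production resource for illustration.
production = {
    "kind": "ConfigMap",
    "metadata": {"name": "portal-config", "namespace": "prod"},
    "data": {"theme": "light"},
}

training_copy = clone_manifest(production, "training", "sandbox")
print(training_copy["metadata"])
```

This sketch renames only the resource itself. References between resources, such as a pod pointing at a PVC by name, would also need to be rewritten, and the underlying data copied separately.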

Application retrieval

Admins will need to retrieve past versions of applications for reasons including legal cases, project retrieval, and regulatory compliance. Traditionally, retrieval focused on data, but application retrieval is becoming more important. Application retrieval enables users to recreate the application flow and view the data in context.

Application resource recovery

Admins will need to recover a subset of an application for reasons ranging from legal cases to testing a subset of an application, to rolling back only one part of an application. They need a mechanism by which they can specify the resources they want to recover, and the protection solution must validate that dependent resources are in place to ensure a successful recovery.
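The dependency validation mentioned above can be sketched as a simple set check. The example assumes the protection solution already knows which resources each resource depends on. The `depends_on` mapping, the `Kind/name` string format, and the function name are all illustrative assumptions, and only direct dependencies are checked.

```python
def missing_dependencies(selected, depends_on, present):
    """Return dependencies that are neither selected for recovery nor already in the target."""
    available = set(selected) | set(present)
    missing = set()
    for resource in selected:
        for dep in depends_on.get(resource, ()):
            if dep not in available:
                missing.add(dep)
    return sorted(missing)


# Made-up dependency map for the customer portal example.
dependencies = {
    "Deployment/portal": ["Secret/portal-db", "PersistentVolumeClaim/portal-data"],
    "Service/portal": ["Deployment/portal"],
}

# Recover only the deployment into a cluster that already holds the secret.
print(missing_dependencies(
    selected=["Deployment/portal"],
    depends_on=dependencies,
    present=["Secret/portal-db"],
))
```

An empty result means the partial recovery can proceed; otherwise the user can be asked to add the missing resources to the selection.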

Some backup admins might simply choose to protect the namespace, and the data protection solution should still support that. The challenge with namespace protection is that it is limited to simplistic crash-consistent protection mechanisms, the application owner cannot easily specify what to recover, and there is no clear connection between the backup team and the application team.

Protection is not just for backups, but recovery too

Security is not only a concern when backing up data. The application protection solution should have a security posture in place at each layer, as explained in the following:

Installation

The images of the protection solution should be certified so Kubernetes admins can be sure that they are deploying only the intended image in their environment. This keeps malware disguised as the data protection solution out of the cluster.

The permissions that the protection solution needs in the environment should be restrictive. For instance, the data protection operator should not have permission to delete resources in the cluster.
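One way an admin might check this before installing is to scan the operator's role for verbs that allow deletion. The sketch below works on a role manifest loaded as a dictionary. The `rules` and `verbs` fields follow the standard RBAC schema, while the role contents and the function name are invented for the example.

```python
# Verbs that allow deleting resources, including the wildcard.
DELETE_VERBS = {"delete", "deletecollection", "*"}


def rules_granting_delete(role):
    """Return the rules in a Role or ClusterRole manifest that permit deletion."""
    return [
        rule
        for rule in role.get("rules", [])
        if DELETE_VERBS & set(rule.get("verbs", []))
    ]


# Hypothetical role shipped with a backup operator.
operator_role = {
    "kind": "ClusterRole",
    "metadata": {"name": "backup-operator"},
    "rules": [
        {"apiGroups": [""], "resources": ["pods", "secrets"], "verbs": ["get", "list"]},
        {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["get", "delete"]},
    ],
}

for rule in rules_granting_delete(operator_role):
    print("Grants delete on:", rule["resources"])
```

This only inspects a single manifest. Permissions granted through other roles bound to the same service account would need a separate check.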

Backup

The data protection solution should provide options to encrypt the data. Immutability of backups is critical to keeping them recoverable after a ransomware attack. The data protection solution should restrict access to sensitive Kubernetes objects such as secrets. Moreover, it should be able to work with the secret management tools already in use in the Kubernetes environment.

Orchestration

The data protection solution should store metadata outside the cluster such that in case of any cluster disaster or breach, the backup metadata is not corrupted. This centralized metadata should be able to support cross-account, cross-region restores in case of disasters. 

The communication between the data protection operator component in the cluster and the orchestration layer should be secure from end to end. Authorization is managed between the two protection components by the data protection solution.

Restore

The data protection solution should provide the ability to manage users and groups that have access to restore the backups. Recovery of applications should be scoped for the environment and the cloud admin should be able to define the scope in which the restore is valid. This ensures that no bad actors, internal threats, or leaks can re-create the application beyond the defined scope.
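The scoping idea can be pictured as a check that runs before any restore starts. The sketch below is a hypothetical model. The `RestoreScope` class, its fields, and the sample group, cluster, and region names are assumptions made for illustration, not part of any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestoreScope:
    """Hypothetical scope a cloud admin defines for restores of one application."""
    allowed_groups: frozenset
    allowed_clusters: frozenset
    allowed_regions: frozenset


def can_restore(user_groups, cluster, region, scope):
    """Allow a restore only for a permitted group, target cluster, and region."""
    return (
        bool(set(user_groups) & scope.allowed_groups)
        and cluster in scope.allowed_clusters
        and region in scope.allowed_regions
    )


portal_scope = RestoreScope(
    allowed_groups=frozenset({"portal-owners", "central-backup"}),
    allowed_clusters=frozenset({"prod-east", "dr-west"}),
    allowed_regions=frozenset({"us-east-1", "us-west-2"}),
)

print(can_restore(["portal-owners"], "dr-west", "us-west-2", portal_scope))
print(can_restore(["interns"], "dr-west", "us-west-2", portal_scope))
```

Every condition must hold, so a permitted user still cannot restore the application into a cluster or region outside the defined scope.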

Conclusion

Kubernetes is the most commonly used container orchestration tool for running production applications. Though Kubernetes is resilient, it is a complex environment: it cannot bring back data and it has no application object, which makes Kubernetes application protection critical. Multiple users in your organization interact with Kubernetes environments, namely members of the central admin team and the application owners. Users will need the data protection solution to provide disaster recovery, rollback, migration, cloning, retrieval for compliance, and resource recovery for their applications. Moreover, the data protection solution should have a security posture for this modern workload across installation, backup, orchestration, and recovery.

Druva’s Kubernetes protection provides application protection for your Kubernetes workloads running on AWS. It supports a range of application protection and recovery needs for multiple Kubernetes stakeholders across the organization, with a security posture incorporated into each layer of data protection. Learn more about Druva’s Kubernetes protection on the Druva website.