Organizations adopt DevOps and Continuous Integration/Continuous Deployment (CI/CD) because they want to build applications faster. Business leaders envision pipelines filled with revenue-generating code that is written, tested, and deployed within minutes.
To realize the DevOps dream, the team needs to solve four challenges:
- Automate the pipeline
- Scale up and down to meet the workload
- Access realistic data sets
- Protect the environment
Having built (and blogged about) a DevOps pipeline with Jenkins and Kubernetes, I’ve felt that pain firsthand. I’ve also seen how the combination of Kubernetes and the cloud helps customers build reliable, scalable DevOps pipelines.
Automate DevOps with Kubernetes
Developers want to push new code, test it, and put it into production as quickly as possible. They will not wait for IT to provision new resources, and they struggle to share infrastructure with one another. DevOps needs to provide a shared, automated, self-service platform that keeps developers logically isolated. DevOps needs Kubernetes.
Kubernetes provides reliable automated infrastructure for containerized applications. Developers work on a common cluster, but have their own namespaces. While the hardware is shared, Kubernetes controls CPU and memory consumption for each application. Everybody gets a private sandbox in a shared, scalable platform.
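As a sketch of that isolation, a per-developer namespace paired with a ResourceQuota caps how much CPU and memory that sandbox can consume (all names and sizes here are hypothetical):

```yaml
# Hypothetical developer sandbox: a dedicated namespace...
apiVersion: v1
kind: Namespace
metadata:
  name: dev-alice
---
# ...with hard caps on total CPU and memory requested in it
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-alice-quota
  namespace: dev-alice
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```

With a quota like this in place, one developer’s runaway build cannot starve the rest of the shared cluster.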
In our DevOps pipeline, developers built and pushed updates to their own environment multiple times a day, without disrupting one another. Those changes would then be tested on a central test cluster before being published to the rest of the development group. It took one click, nothing more.
Of course, once we could test with one click, the DevOps pipeline became very popular. We hit the physical limits of the data center cluster. Without more hardware, it could not handle the bursts of activity.
Scale DevOps with the cloud
When the DevOps pipeline becomes popular, you have three choices.
- Buy hardware to handle the peak of traffic and accept that the environment will usually be underutilized
- Set up queues for the pipeline and accept unhappy developers
- Move your DevOps pipeline to the cloud and embrace the future
The cloud provides near-infinite scaling, so that it can meet any burst in developer traffic, then scale down after the spike passes. Of course, the major cloud providers offer native Kubernetes solutions, as well.
With Jenkins + Kubernetes + AWS, I ran dozens of builds and tests a day; the environment scaled to meet the highest bursts and wasted no resources when developers were idle.
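One way this plays out in practice: the Jenkins Kubernetes plugin launches a short-lived agent pod per build, and because each pod declares resource requests, the cloud’s cluster autoscaler adds nodes when pending agents cannot be scheduled and removes them when builds finish. A sketch of such an agent pod template (image name and sizes are illustrative, not a prescribed setup):

```yaml
# Hypothetical build-agent pod spec handed to the Jenkins Kubernetes plugin.
# The resource requests are what let the cluster autoscaler size the fleet.
apiVersion: v1
kind: Pod
metadata:
  labels:
    jenkins: agent
spec:
  containers:
  - name: build
    image: maven:3.9-eclipse-temurin-17   # example build image
    command: ["sleep"]
    args: ["infinity"]                    # Jenkins attaches and runs build steps
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        cpu: "2"
        memory: 4Gi
```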
Of course, once we moved to the cloud, we were surprised that developers filled the DevOps pipeline with long-running jobs. We found those jobs spent most of their time creating or loading data.
Manage DevOps data with Kubernetes
Data is the biggest challenge for DevOps pipelines. Applications need data to do interesting work. Unfortunately, developers cannot “spin up” realistic datasets quickly and easily. Current approaches suffer from severe limitations:
- Use a data generation tool: It takes too long to create a significant amount of data, and the result lacks the complexity of real data. Tests miss the bugs that realistic data sets would catch, and those bugs surface in production.
- Use a storage clone of production data: Connecting to external data sources can be complicated (network, security) and risky (data privacy, data leakage). Teams spend more time in meetings and troubleshooting than using the pipeline.
- Use a Copy Data Management product: Cost (another tool), complexity (network, security), and limited support (handful of high-end databases) make it a niche option. This approach does not cover all of the data used by a modern application.
Kubernetes solves the DevOps data challenge. First, the Kubernetes Container Storage Interface (CSI) simplifies storage provisioning: with a few lines in the application definition (i.e., the YAML specification), each container requests the amount and type of storage it needs. As the application runs, developers create and tag a snapshot of their data. Developers can then use a copy of that snapshot for any application because, via CSI, Kubernetes will populate their storage with data from that snapshot. Developers get realistic data sets quickly, easily, and securely through their standard Kubernetes interface. With CSI, Kubernetes can now provision everything the container needs (CPU, memory, storage, and data) without contacting an administrator.
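That snapshot-and-clone flow can be sketched with the standard CSI snapshot APIs; assuming a CSI driver with snapshot support, and with all class and claim names hypothetical:

```yaml
# Snapshot an existing, realistic data volume...
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: testdata-snap
spec:
  volumeSnapshotClassName: csi-snapclass      # assumes the CSI driver supports snapshots
  source:
    persistentVolumeClaimName: testdata-pvc   # the tagged "golden" data set
---
# ...then any developer can request a private copy, pre-populated from it
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: alice-test-copy
spec:
  storageClassName: csi-storageclass
  dataSource:
    name: testdata-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
```

Because the copy is provisioned from a snapshot rather than the live source, each developer works against realistic data without touching production.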
DevOps with easily accessible data transforms the development process. We built multiple datasets, each with a different focus for testing. When a user submitted a change, we created multiple instances of their application, each testing against a different dataset. With a parallelized Kubernetes DevOps pipeline, test times dropped by 10x while confidence increased.
Of course, now that our DevOps pipeline had become business-critical, everybody started to worry about what would happen if something went wrong.
Protect DevOps pipelines with Kubernetes (coming soon)
DevOps teams must protect the environment and data, so they do not incur downtime or data loss. The DevOps teams’ challenges are not new to IT, but they are new to DevOps.
Organizations should protect both the Kubernetes infrastructure and the data. Even in the cloud, disasters can strike — e.g. external attacks, administrator errors, and product defects. Teams must protect the cluster and application configuration, so they can recreate the environment with one-click recovery.
They also must protect the data copies. The data snapshots can be deleted or corrupted — e.g. rogue administrator, ransomware, errors — which would halt the DevOps pipeline. Therefore, mature groups follow traditional backup “3-2-1” best practices — at least 3 copies on at least 2 types of media with at least 1 on a different site/account.
Organizations need a protection solution that works across their entire environment. Most groups building DevOps in the cloud still run on-premises DevOps pipelines and production instances, so they need a standard solution that protects clusters both on-premises and in the cloud. Better still, that solution should support secure, efficient movement of data and applications between on-premises environments and the cloud.
As valuable as a DevOps pipeline is, management expects everybody to streamline costs. The cloud and Kubernetes minimize the compute costs, since they consume resources only when somebody is using the pipeline. Persistent data, however, is always there. Nobody wants to pay peak prices for idle data. Furthermore, cloud providers charge not just for storage, but for access. A cost-efficient solution should tier data with predictable costs that decrease over time.
Today, there are few comprehensive protection solutions for DevOps pipelines and Kubernetes. We stitched together a solution from backup and open source components (e.g., SaaS backup and cloud snapshot management for the data, Velero for the cluster metadata). It was not easy to set up, manage, or recover. As long as data protection remains this complex, most teams will not protect their DevOps pipelines.
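For the cluster-metadata piece of such a stitched-together setup, a Velero Schedule resource can take a recurring backup of the pipeline’s namespaces; the names, namespace list, and retention below are illustrative:

```yaml
# Hypothetical nightly backup of the DevOps pipeline's namespaces via Velero
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly-pipeline-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"        # cron: every night at 02:00
  template:
    includedNamespaces:
    - jenkins                  # example pipeline namespaces
    - dev-teams
    ttl: 720h                  # retain each backup for 30 days
```

Recovery is then a matter of restoring from the latest backup into a fresh cluster, which approximates the one-click recreation described above.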
The Kubernetes community has created a Data Protection Working Group to address these protection challenges in a more systematic manner. We are working with the group to help Kubernetes deliver a reliable, secure DevOps pipeline.
It is time to get started
More organizations are trying to build a DevOps pipeline to enable Continuous Integration and Continuous Deployment. Kubernetes and the cloud led the way with an automated, scalable solution. In the past two years, Kubernetes has integrated storage, data management, and data protection, so companies can now build a scalable pipeline for real applications. Now is the time to join us and ride the DevOps wave by using Kubernetes in the cloud.
The cloud is changing the way you manage and protect your data. Learn why the Druva Cloud Platform is the data protection solution for your data center, cloud, and endpoint workloads.
To learn more, join us at the VMware User Group (VMUG) Kubernetes/DevOps & Cloud virtual conference. Click here to register and either stop by our booth to chat with the experts or swing by the auditorium to attend Stephen’s breakout session, Going to the cloud? Now what? at 9:45 AM PT.