Platform
- Data Security Cloud
  Data Security Cloud
  Fully managed data security across enterprise, cloud, SaaS, and end user.
- Data Protection
  Data Protection
  Modernize data protection to reduce costs and complexity
- Cyber Response & Recovery
  Cyber Response & Recovery
  Bounce back from cyber attacks with data that is always safe and ready.
- eDiscovery & Compliance
  eDiscovery & Compliance
  Secure, protect, and streamline data governance.
- Meet Dru - Your Copilot for Data Security
Solutions
- Use Cases
  Use Cases
  Learn how Druva helps you accelerate key business initiatives
- Key Technologies
  - Public Cloud
    Public Cloud
    Protect native AWS and Azure deployments with secure backups without the cost and complexity
    
    AWS
    
    Azure
  - Hybrid Workloads
    Hybrid Workloads
    Transform data center backup and disaster recovery for virtual environments
    
    VMware
    
    Hyper-V
    
    Nutanix
    
    Oracle
    
    MS SQL
    
    SAP HANA
    
    NAS/files
  - Endpoint and SaaS Apps
    Endpoint and SaaS Apps
    Enterprise Cloud Backup and data management across edge, on-premises and cloud workloads
    
    End User Protection
    
    Microsoft 365
    
    Salesforce
    
    Google Workspace
    
    Microsoft Entra ID
- Free Trial
Customers
- Explore All Customer Stories
  We are trusted by the world's leading organizations to protect their data. Explore customer success stories to see how your peers are using Druva.
- Ransomware recovery ready
  Learn why Medallia chose Druva
  
  SaaS data protection across the enterprise
  See why Regeneron partnered with Druva
Resources
- Druva vs. Veeam TCO Calculator
  Find the hidden costs of legacy backup
  
  Data Resiliency for Dummies
  Get your guide to data resiliency
Partners
- Programs
  Programs
  Learn how you can profit with Druva and a cloud-first SaaS selling motion. Explore partner programs, access resources, and discover the benefits of partnering with Druva.
- Strategic Partners
  Strategic Partners
  Learn about Druva's strategic capabilities across platform, OEM, and other partnerships. Find out how Druva accelerates and protects customers' cloud journeys.
  - Dell Technologies
  - AWS
  - VMware
  - Nutanix
- Become a Partner
Company
- - Company
  - Leadership
  - Investors
  - Careers
  - Contact Us
  - Newsroom
  - Awards
  - Events
  - Diversity, Equity & Inclusion
  - Blog
- Get in touch with us
  Contact Us
  
  News, product innovations, and more
  Blog
Get Started
Support
Login
Language
- English
- Deutsch

Tech/Engineering, Innovation Series

MLOps: Deploying AI/ML Models into the Production Environment

October 05, 2021 Prabal Kumar, Principal ML Engineer

Building a machine learning model to solve a business problem and making it work well on the test data is only half of the battle. The job is not done until the model is really producing business value in the production environment. Making an ML model run in the production environment is very challenging and needs a lot of components and integration. In fact, ML model code is only a small fraction of the real-world ML system as described in this famous paper.

MLOps (machine learning operations) can be defined as a set of tools and practices to deploy and maintain ML products in production reliably and efficiently.

ML Project Lifecycle

At a very high level, the stages of ML project lifecycle are:

Define — Scoping the project
Data — Defining and organizing the data
Modeling — Selecting and training the model
Deployment — Deploying the trained model in production and then monitoring/maintaining it

ML is an iterative and continuous process, so we can see a lot of back and forth in these stages for various reasons. For example, one of the reasons for re-training could be data drift. Data drift is the change in the distribution of the data on which the model was trained and the current production data.

MLOps CI/CD Pipeline

Similar to DevOps, the automation of ML workflows can be managed by using CI/CD best practices. In contrast to DevOps where we only need to track the changes of the code, in MLOps we need to track the changes of the data and model as well.

Data, model, and code are the three axes of change in an ML application, and those need to be versioned properly. Versioning will also help in reproducibility. The MLOps CI/CD pipeline architecture could be designed as:

MLOps CI/CD pipeline architecture design image

At a high level, there are three stages of an MLOps pipeline:

Data pipeline — Transforming raw data into processed data by using data preparation, visualization, and feature engineering scripts.
Model pipeline — Training the models and choosing a model which is performing the best on the test data to be deployed in the production. Model monitoring is not just for the standard infrastructure metrics, but also for the data and model prediction performance.
Production environment — The deployed model is accessible as an API endpoint. The model artifact contains the model and the associated feature engineering scripts.

The stages of the pipeline are iterative and based on performance metrics; model training needs to be re-initiated from time to time. If required, the data pipeline can also be re-initiated.

MLOps Pipeline Characteristics

An MLOps pipeline must also be reproducible apart from being reliable, reusable, maintainable, and flexible. Without reproducibility, it would be impossible to determine if a new model is better than the previous model. The MLOps pipeline automatically bundles all related versions of the data, code, and model, otherwise it would be a nightmare to manage and track all these components manually.

MLOps Pipeline Testing

Automated testing helps identify the issues quickly and in their early stages. The data must also be validated for a sanity check before processing. The MLOps pipeline contains the test scripts in case any code breaks, and all tests need to be passed or else the pipeline will stop. Similarly, if all the models are not meeting minimum performance metrics criteria, none of them will be deployed and the pipeline will end there. Finally, there is integration testing for the end-to-end complete pipeline.

MLOps Platforms

There are many MLOps platforms available in the market to start with, such as Amazon sageMaker, Azure Machine Learning, Google Cloud AI, DataRobot, IBM Watson Machine Learning, and more. A new user can start quickly with open-source MLOps tools like Kubeflow, MLflow, Metaflow, etc. to learn more about MLOps.

MLOps Pipeline Performance

We should consider a cloud-first strategy for the training and deployment of ML models at scale. It will ensure the high availability of the ML model in production. We will also get the flexibility to choose the right instance type based on the model size, computation requirements, and number of requests to be served.

Skills Required for a MLOps Team

To be successful in productionizing ML models, we need the skills of both worlds: ML and software. Data scientists research and create the best possible models, while MLOps teams deploy the models into the production environment and maintain them. This is a very highly collaborative effort that includes data engineers. Data engineers manage the data pipeline, which is useful to transform raw data into processed data to be consumed directly for the training.

Key Takeaways and Next Steps

MLOps is an emerging field that helps to realize the actual business value of data science work by productionizing the models in an efficient, simple, and automated way. ML is experimental in nature which makes it even more important to follow the best practices of MLOps to standardize its processes. There is a learning curve to MLOps in either case — from a data science background or DevOps.

Looking to learn more about the technical innovations and best practices powering cloud backup and data management? Visit the Innovation Series section of Druva’s blog archive.

MLOps: Deploying AI/ML Models into the Production Environment

ML Project Lifecycle

MLOps CI/CD Pipeline

MLOps Pipeline Characteristics

MLOps Pipeline Testing

MLOps Platforms

MLOps Pipeline Performance

Skills Required for a MLOps Team

Key Takeaways and Next Steps

Blog

Druva Data Security Cloud

The Druva Platform

Data Protection

Cyber Response & Recovery

eDiscovery & Compliance

Use Cases

Key Technologies

Customers

Resources

Partners

Company