Deploy a robust pipeline that uses managed automation tools and machine learning (ML) services to simplify ML model development and production - MLOps Workload Orchestrator

November 2020 (last update: May 2022)

The ML lifecycle is an iterative and repetitive process that involves changing models over time and learning from new data. As ML applications gain popularity, organizations are building new and better applications for a wide range of use cases including optimized email campaigns, forecasting tools, recommendation engines, self-driving vehicles, virtual personal assistants, and more. While operational and pipelining processes vary greatly across projects and organizations, the processes contain commonalities across use cases.

This solution helps you streamline and enforce architecture best practices by providing an extendable framework for managing ML pipelines for Amazon Web Services (AWS) ML services and third-party services. The solution’s template allows you to train models, upload trained models, configure the orchestration of the pipeline, initiate the start of the deployment process, move models through different stages of deployment, and monitor the successes and failures of the operations. The solution also provides a pipeline for building and registering Docker images for custom algorithms that can be used for model deployment on an Amazon SageMaker endpoint.

You can configure the pipeline for batch or real-time inference to fit your business context. This solution increases your team's agility and efficiency by allowing them to repeat successful processes at scale.

This solution provides the following key features:

  • A preconfigured pipeline initiated through an API call or a Git repository

  • Model training using Amazon SageMaker built-in algorithms through a training job, hyperparameter tuning job, or Autopilot job

  • A trained model deployed with an inference endpoint

  • Monitoring for deployed machine learning models and detection of any deviation in data quality, model quality, model bias, and/or model explainability

  • Support for running your own integration tests to ensure that the deployed model meets expectations

  • Provisioning multiple environments to support your ML model's lifecycle

  • Multi-account support for bring-your-own-model (BYOM) and model monitor pipelines

  • Building and registering Docker images for custom algorithms that can be used for model deployment on a SageMaker endpoint

  • The option to use SageMaker model registry to deploy versioned models

  • User notification of the pipeline outcome through SMS or email
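A preconfigured pipeline can be provisioned through an API call, as noted above. The following is a minimal sketch of what such a provisioning request body might look like for a BYOM real-time inference pipeline; the field names and the `pipeline_type` value shown here are illustrative assumptions, not the solution's exact API schema, so consult the deployed API's documentation for the real keys.

```python
import json

# Illustrative provisioning request for a BYOM real-time inference pipeline.
# All keys and values below are assumed placeholders, not the solution's
# confirmed API contract.
request_body = {
    "pipeline_type": "byom_realtime_builtin",   # assumed pipeline identifier
    "model_name": "demo-model",                 # hypothetical model name
    "model_artifact_location": "model.tar.gz",  # key inside the solution's assets S3 bucket
    "inference_instance": "ml.m5.large",        # SageMaker endpoint instance type
}

payload = json.dumps(request_body)
# The payload would be POSTed to the solution's API Gateway endpoint, e.g.:
#   POST https://<api-id>.execute-api.<region>.amazonaws.com/prod/provisionpipeline
print(payload)
```

Alternatively, committing an equivalent configuration file to the linked Git repository initiates the same provisioning flow.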

The MLOps Workload Orchestrator solution currently offers 12 pipelines:

  • One pipeline to train ML models using SageMaker built-in algorithms and a SageMaker training job

  • One pipeline to train ML models using SageMaker built-in algorithms and a SageMaker hyperparameter tuning job

  • One pipeline to train ML models using SageMaker built-in algorithms and a SageMaker Autopilot job

  • Two BYOM real-time inference pipelines: one for ML models trained using SageMaker built-in algorithms and one for models trained using custom algorithms

  • Two BYOM batch transform pipelines: one for ML models trained using SageMaker built-in algorithms and one for models trained using custom algorithms

  • One custom algorithm image builder pipeline that can be used to build and register Docker images in Amazon Elastic Container Registry (Amazon ECR) for custom algorithms

  • Four model monitor pipelines to continuously monitor the quality of ML models deployed by the real-time inference pipeline and alert on any deviations in data quality, model quality, model bias, and/or model explainability

To support multiple use cases and business needs, this solution provides two AWS CloudFormation templates for single account and multi-account deployments.

  • Template option 1 – Single account: Use the single-account template to deploy all of the solution’s pipelines in the same AWS account. This option is suitable for experimentation, development, and/or small-scale production workloads.

  • Template option 2 – Multi-account: Use the multi-account template to provision multiple environments (for example, development, staging, and production) across different AWS accounts. This option improves governance and increases security and control of the ML pipeline's deployment, supports safe experimentation and faster innovation, and keeps production data and workloads secure and available to help ensure business continuity.
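Either template is launched as an ordinary CloudFormation stack with a set of parameters. As a rough sketch, the multi-account option needs the target account identifiers passed in at launch; the parameter keys below (the account-ID parameters and `NotificationEmail`) are hypothetical placeholders, not the template's actual parameter names, so check the template you download for the real keys.

```python
import json

# Assumed parameter list for the multi-account template, in the shape that
# boto3's CloudFormation client expects for create_stack(). Keys and account
# IDs are illustrative placeholders only.
multi_account_parameters = [
    {"ParameterKey": "NotificationEmail", "ParameterValue": "mlops-admin@example.com"},
    {"ParameterKey": "DevAccountId", "ParameterValue": "111111111111"},
    {"ParameterKey": "StagingAccountId", "ParameterValue": "222222222222"},
    {"ParameterKey": "ProdAccountId", "ParameterValue": "333333333333"},
]

# With real values, this list would be passed to CloudFormation, e.g.:
#   boto3.client("cloudformation").create_stack(
#       StackName="mlops-workload-orchestrator",
#       TemplateURL="<template URL from the implementation guide>",
#       Parameters=multi_account_parameters,
#       Capabilities=["CAPABILITY_NAMED_IAM"],
#   )
print(json.dumps(multi_account_parameters, indent=2))
```

The single-account template takes a smaller parameter set, since all pipelines deploy into the account that owns the stack.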

This implementation guide describes architectural considerations and configuration steps for deploying MLOps Workload Orchestrator in the AWS Cloud. It includes links to an AWS CloudFormation template that launches and configures the AWS services required to deploy this solution using AWS best practices for security and availability.

The solution is intended for IT infrastructure architects, machine learning engineers, data scientists, developers, DevOps, data analysts, and marketing technology professionals who have practical experience architecting in the AWS Cloud.