MLOps Workload Orchestrator

Deploy a robust pipeline that uses managed automation tools and machine learning (ML) services to simplify ML model development and production

Publication date: November 2020 (last update: June 2024)

The ML lifecycle is an iterative and repetitive process that involves changing models over time and learning from new data. As ML applications gain popularity, organizations are building new and better applications for a wide range of use cases including optimized email campaigns, forecasting tools, recommendation engines, self-driving vehicles, virtual personal assistants, and more. While operational and pipelining processes vary across projects and organizations, the processes contain commonalities across use cases.

The MLOps Workload Orchestrator solution helps you streamline and enforce architecture best practices by providing an extendable framework for managing ML pipelines for Amazon Web Services (AWS) ML services and third-party services. The solution’s template allows you to train models, upload trained models, configure the pipeline orchestration, initiate the deployment process, move models through different stages of deployment, and monitor the successes and failures of the operations. The solution also provides a pipeline for building and registering Docker images for custom algorithms that can be used for model deployment on an Amazon SageMaker endpoint.

You can use batch and real-time data inferences to configure the pipelines for your business context. This solution increases your team’s agility and efficiency by allowing them to repeat successful processes at scale.
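For real-time inference, a deployed pipeline exposes an Amazon SageMaker endpoint that clients call through the SageMaker runtime API. The sketch below builds the arguments for such a call; the endpoint name and the CSV payload schema are assumptions for illustration, since the actual values come from your deployed pipeline.

```python
def build_inference_request(endpoint_name, features):
    """Assemble the arguments for a SageMaker real-time inference call.

    The endpoint name and the text/csv payload format are assumptions;
    the BYOM real-time inference pipeline defines the actual endpoint
    and input schema for your model.
    """
    return {
        "EndpointName": endpoint_name,
        "ContentType": "text/csv",
        # Serialize the feature vector as a single CSV row.
        "Body": ",".join(str(v) for v in features),
    }

# Sending the request requires AWS credentials and a deployed endpoint:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     **build_inference_request("my-endpoint", [1.0, 2.0]))
# print(response["Body"].read())
```

Separating request construction from the `invoke_endpoint` call keeps the payload logic testable without live AWS access.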

The MLOps Workload Orchestrator solution currently offers 12 pipelines:

  • One pipeline to train ML models using Amazon SageMaker built-in algorithms and Amazon SageMaker training job

  • One pipeline to train ML models using Amazon SageMaker built-in algorithms and Amazon SageMaker hyperparameter tuning job

  • One pipeline to train ML models using Amazon SageMaker built-in algorithms and Amazon SageMaker Autopilot job

  • Two BYOM real-time inference pipelines for ML models trained using both Amazon SageMaker built-in algorithms and custom algorithms

  • Two BYOM batch transform pipelines for ML models trained using both Amazon SageMaker built-in algorithms and custom algorithms

  • One custom algorithm image builder pipeline that you can use to build and register Docker images in Amazon Elastic Container Registry (Amazon ECR) for custom algorithms

  • Four model monitor pipelines that continuously monitor the quality of ML models deployed by the real-time inference pipeline and alert on deviations in data quality, model quality, model bias, and/or model explainability
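The training pipelines above ultimately issue a SageMaker `CreateTrainingJob` request against a built-in or custom algorithm image. The following is a minimal sketch of such a request; every name (job name, image URI, role ARN, bucket) is a placeholder, and the orchestrator's pipeline supplies the real values and additional settings such as hyperparameters.

```python
def build_training_job_request(job_name, image_uri, role_arn, bucket):
    """Assemble a minimal SageMaker CreateTrainingJob request.

    All identifiers here are illustrative placeholders; the training
    pipeline in the solution fills in the actual job configuration.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,  # built-in algorithm or custom ECR image
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": f"s3://{bucket}/train/",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/output/"},
        "ResourceConfig": {
            "InstanceType": "ml.m5.large",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

# With credentials configured, the job could be started via:
# import boto3
# boto3.client("sagemaker").create_training_job(
#     **build_training_job_request("demo-job", image_uri, role_arn, "my-bucket"))
```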

To support multiple use cases and business needs, this solution provides two AWS CloudFormation templates for single-account and multi-account deployments:

  • Template option 1 – Single account – Use the single-account template to deploy all of the solution’s pipelines in the same AWS account. This option is suitable for experimentation, development, and/or small-scale production workloads.

  • Template option 2 – Multi-account – Use the multi-account template to provision multiple environments (for example, development, staging, and production) across different AWS accounts, which improves governance and increases security and control of the ML pipeline’s deployment, provides safe experimentation and faster innovation, and keeps production data and workloads secure and available to ensure business continuity.
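Either template option is deployed as an AWS CloudFormation stack. The sketch below builds a `CreateStack` request for the single-account option; the template URL and the `NotificationEmail` parameter key are assumptions for illustration, so check the solution's deployment steps for the actual template location and parameter names.

```python
def build_stack_request(stack_name, template_url, notification_email):
    """Build a CloudFormation CreateStack request for the solution template.

    The template URL and the NotificationEmail parameter key are
    assumptions; consult the solution's deployment instructions for
    the real template location and required parameters.
    """
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {
                "ParameterKey": "NotificationEmail",
                "ParameterValue": notification_email,
            }
        ],
        # The solution creates IAM resources, so these capabilities
        # must be acknowledged explicitly.
        "Capabilities": ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    }

# With credentials configured, the stack could be created via:
# import boto3
# boto3.client("cloudformation").create_stack(
#     **build_stack_request("mlops-orchestrator", template_url, "me@example.com"))
```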

This implementation guide provides an overview of the MLOps Workload Orchestrator solution, its reference architecture and components, considerations for planning the deployment, and configuration steps for deploying the solution to the AWS Cloud.

The intended audience for using this solution’s features and capabilities in their environment includes IT infrastructure architects, ML engineers, data scientists, developers, DevOps, data analysts, and marketing technology professionals who have practical experience architecting in the AWS Cloud.

Use this navigation table to quickly find answers to these questions:

If you want to . . . Read . . .

Know the cost for running this solution. The estimated cost for running this solution in the US East (N. Virginia) Region is USD $374.57 per month for AWS resources. Cost
Understand the security considerations for this solution. Security
Know how to plan for quotas for this solution. Quotas
Know which AWS Regions support this solution. Supported AWS Regions
View or download the AWS CloudFormation templates included in this solution to automatically deploy the infrastructure resources (the “stack”) for this solution. AWS CloudFormation templates
Access the source code and optionally use the AWS Cloud Development Kit (AWS CDK) to deploy the solution. GitHub repository