Deploy a robust pipeline that uses managed automation tools and machine learning (ML) services to simplify ML model development and production
November 2020 (last update: January 2023)
The machine learning (ML) lifecycle is an iterative and repetitive process that involves changing models over time and learning from new data. As ML applications gain popularity, organizations are building new and better applications for a wide range of use cases, including optimized email campaigns, forecasting tools, recommendation engines, self-driving vehicles, virtual personal assistants, and more. While operational and pipelining processes vary greatly across projects and organizations, they share commonalities across use cases.
This solution helps you streamline and enforce architecture best practices by providing an extendable framework for managing ML pipelines for Amazon Web Services (AWS) ML services and third-party services. The solution's template allows you to train models, upload trained models, configure the orchestration of the pipeline, initiate the deployment process, move models through different stages of deployment, and monitor the successes and failures of the operations. The solution also provides a pipeline for building and registering Docker images for custom algorithms that can be used for model deployment on an Amazon SageMaker endpoint.

Note: The multi-account deployment of this solution (template option 2) does not yet support the Amazon SageMaker Model Dashboard.

You can configure the pipeline for batch or real-time inference to suit your business context. This solution increases your team's agility and efficiency by allowing them to repeat successful processes at scale.
This solution provides the following key features:
- A preconfigured pipeline initiated through an API call or a Git repository (see the example following this list)
- Model training using Amazon SageMaker built-in algorithms with a training job, hyperparameter tuning job, or Autopilot job
- A trained model deployed with an inference endpoint
- Monitoring for deployed machine learning models and detection of any deviation in data quality, model quality, model bias, and/or model explainability
- Support for running your own integration tests to ensure that the deployed model meets expectations
- Provisioning of multiple environments to support your ML model's lifecycle
- Multi-account support for bring-your-own-model (BYOM) and model monitor pipelines
- Building and registering Docker images for custom algorithms that can be used for model deployment on a SageMaker endpoint
- The option to use the SageMaker model registry to deploy versioned models
- User notification of the pipeline outcome through SMS or email
- Support for Amazon SageMaker Model Card operations (create, describe, update, delete, list, and export model cards)
- A list of solution-created Amazon SageMaker resources (such as models, endpoints, model cards, and batch transform jobs) in the Amazon SageMaker Model Dashboard
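For example, a pipeline can be provisioned programmatically by sending a signed request to the solution's API Gateway endpoint. The sketch below assumes a BYOM real-time inference pipeline; the host, stage, and payload field names (such as pipeline_type and model_artifact_location) are illustrative assumptions, and the solution's API reference defines the authoritative schema.

```python
# A minimal sketch of provisioning a BYOM real-time inference pipeline by
# calling the solution's API Gateway endpoint. The host, stage, and payload
# field names below are illustrative assumptions; the solution's API reference
# defines the exact schema. The API is IAM-authorized, so the request must be
# SigV4-signed (done here with the aws-requests-auth helper library).
import json

import requests
from aws_requests_auth.boto_utils import BotoAWSRequestsAuth

API_HOST = "example1234.execute-api.us-east-1.amazonaws.com"  # placeholder host

auth = BotoAWSRequestsAuth(
    aws_host=API_HOST,
    aws_region="us-east-1",
    aws_service="execute-api",
)

payload = {
    # Hypothetical field values -- check the API reference for supported types.
    "pipeline_type": "byom_realtime_builtin",
    "model_name": "my-xgboost-model",
    "model_artifact_location": "s3://my-bucket/model.tar.gz",
    "inference_instance": "ml.m5.large",
}

response = requests.post(
    f"https://{API_HOST}/Prod/provisionpipeline",  # stage and path may differ
    auth=auth,
    data=json.dumps(payload),
)
print(response.status_code, response.text)
```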
The MLOps Workload Orchestrator solution currently offers 12 pipelines:
- One pipeline to train ML models using SageMaker built-in algorithms and a SageMaker training job
- One pipeline to train ML models using SageMaker built-in algorithms and a SageMaker hyperparameter tuning job
- One pipeline to train ML models using SageMaker built-in algorithms and a SageMaker Autopilot job
- Two BYOM real-time inference pipelines, one for ML models trained with SageMaker built-in algorithms and one for models trained with custom algorithms
- Two BYOM batch transform pipelines, one for ML models trained with SageMaker built-in algorithms and one for models trained with custom algorithms
- One custom algorithm image builder pipeline that can be used to build and register Docker images in Amazon Elastic Container Registry (Amazon ECR) for custom algorithms
- Four model monitor pipelines that continuously monitor the quality of ML models deployed by the real-time inference pipeline and alert on any deviations in data quality, model quality, model bias, and/or model explainability
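After a real-time inference pipeline finishes deploying, applications call the resulting endpoint through the standard SageMaker Runtime API. A minimal sketch, assuming a hypothetical endpoint name and a model whose inference container accepts CSV input:

```python
# A minimal sketch of invoking a real-time endpoint deployed by one of the
# BYOM pipelines. The endpoint name is a hypothetical example, and the content
# type must match what your model's inference container expects.
import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

response = runtime.invoke_endpoint(
    EndpointName="my-xgboost-model-endpoint",  # hypothetical name
    ContentType="text/csv",
    Body="0.5,1.2,3.4,0.0",  # one CSV record of example feature values
)
print(response["Body"].read().decode("utf-8"))
```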
To support multiple use cases and business needs, this solution provides two AWS CloudFormation templates for single-account and multi-account deployments.

Template option 1 – Single account: Use the single-account template to deploy all of the solution's pipelines in the same AWS account. This option is suitable for experimentation, development, and/or small-scale production workloads.

Template option 2 – Multi-account: Use the multi-account template to provision multiple environments (for example, development, staging, and production) across different AWS accounts. This improves governance and increases security and control of the ML pipeline's deployment, provides safe experimentation and faster innovation, and keeps production data and workloads secure and available to help ensure business continuity.
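Either template can be launched from the AWS Management Console or programmatically through the standard AWS CloudFormation API. The sketch below shows what a single-account launch could look like with boto3; the template URL and parameter key are placeholders, and the launch steps in this guide list the actual template locations and parameters.

```python
# A minimal sketch of launching the single-account template with boto3. The
# TemplateURL and the parameter key are placeholders, not the real values.
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

cfn.create_stack(
    StackName="mlops-workload-orchestrator",
    TemplateURL="https://example-bucket.s3.amazonaws.com/single-account.template",  # placeholder
    Parameters=[
        # Hypothetical parameter; check the template for the real keys.
        {"ParameterKey": "NotificationEmail", "ParameterValue": "ops-team@example.com"},
    ],
    # The solution creates IAM roles, so IAM capabilities are required.
    Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
)
```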
This implementation guide describes architectural considerations and configuration steps for deploying MLOps Workload Orchestrator in the AWS Cloud. It includes links to the AWS CloudFormation templates that launch and configure the AWS services required to deploy this solution.

The solution is intended for IT infrastructure architects, machine learning engineers, data scientists, developers, DevOps engineers, data analysts, and marketing technology professionals who have practical experience architecting in the AWS Cloud.