Domain 3: Deployment and Orchestration of ML Workflows (22% of the exam content)
Topics
Task 3.1: Select deployment infrastructure based on existing architecture and requirements
Knowledge of:
Deployment best practices (for example, versioning, rollback strategies)
Deployment services (for example, SageMaker)
Methods to serve ML models in real time and in batches
How to provision compute resources in production environments and test environments (for example, CPU, GPU)
Model and endpoint requirements for deployment endpoints (for example, serverless endpoints, real-time endpoints, asynchronous endpoints, batch inference)
How to choose appropriate containers (for example, provided or customized)
Methods to optimize models on edge devices (for example, SageMaker Neo)
Skills in:
Evaluating performance, cost, and latency tradeoffs
Choosing the appropriate compute environment for training and inference based on requirements (for example, GPU or CPU specifications, processor family, networking bandwidth)
Selecting the correct deployment orchestrator (for example, Apache Airflow, SageMaker Pipelines)
Selecting multi-model or multi-container deployments
Selecting the correct deployment target (for example, SageMaker endpoints, Kubernetes, Amazon Elastic Container Service [Amazon ECS], Amazon Elastic Kubernetes Service [Amazon EKS], Lambda)
Choosing model deployment strategies (for example, real time, batch)
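To make the endpoint-type tradeoff in Task 3.1 concrete, the sketch below builds boto3-style `create_endpoint_config` request payloads for a real-time endpoint and a serverless endpoint. The model name, instance type, and serverless sizing are hypothetical placeholders, not prescribed values; the actual API calls are omitted.

```python
# Sketch: SageMaker endpoint-config payloads (boto3 request shape).
# "my-model", the instance type, and sizes are hypothetical placeholders.

def realtime_endpoint_config(model_name: str) -> dict:
    """Real-time endpoint: always-on instances, lowest latency, fixed cost."""
    return {
        "EndpointConfigName": f"{model_name}-realtime",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": "ml.m5.large",  # CPU instance; choose GPU for deep models
            "InitialInstanceCount": 1,
        }],
    }

def serverless_endpoint_config(model_name: str) -> dict:
    """Serverless endpoint: scales to zero, pay per invocation, cold starts."""
    return {
        "EndpointConfigName": f"{model_name}-serverless",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": 2048,
                "MaxConcurrency": 5,
            },
        }],
    }

# In a real deployment, either dict would be passed to
# boto3.client("sagemaker").create_endpoint_config(**config).
```

The structural difference is the variant body: a real-time variant pins instances (cost even when idle), while a serverless variant declares only memory and concurrency (cost per invocation, with cold-start latency).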
Task 3.2: Create and script infrastructure based on existing architecture and requirements
Knowledge of:
Difference between on-demand and provisioned resources
How to compare scaling policies
Tradeoffs and use cases of infrastructure as code (IaC) options (for example, CloudFormation, Cloud Development Kit [CDK])
Containerization concepts and container services
How to use SageMaker endpoint auto scaling policies to meet scalability requirements (for example, based on demand, time)
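As a hedged illustration of the IaC and cross-stack points above, the CloudFormation fragment below defines a SageMaker endpoint whose model name is imported from another stack's export. Resource names and the export name are hypothetical.

```yaml
# Sketch: CloudFormation fragment hosting a SageMaker endpoint.
# Resource names and the "ModelStack-ModelName" export are hypothetical.
Resources:
  EndpointConfig:
    Type: AWS::SageMaker::EndpointConfig
    Properties:
      ProductionVariants:
        - VariantName: AllTraffic
          # Cross-stack communication: model name exported by another stack
          ModelName: !ImportValue ModelStack-ModelName
          InstanceType: ml.m5.large
          InitialInstanceCount: 1
  Endpoint:
    Type: AWS::SageMaker::Endpoint
    Properties:
      EndpointConfigName: !GetAtt EndpointConfig.EndpointConfigName
```

`!ImportValue` is one way stacks communicate: the model stack exports its model name, and this hosting stack consumes it without hardcoding.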
Skills in:
Applying best practices to enable maintainable, scalable, and cost-effective ML solutions (for example, automatic scaling on SageMaker endpoints, dynamically adding Spot Instances, using Amazon EC2 instances, using Lambda behind the endpoints)
Automating the provisioning of compute resources, including communication between stacks (for example, by using CloudFormation, CDK)
Building and maintaining containers (for example, Amazon Elastic Container Registry [Amazon ECR], Amazon EKS, Amazon ECS, by using bring your own container [BYOC] with SageMaker)
Configuring SageMaker endpoints within a VPC
Deploying and hosting models by using the SageMaker SDK
Choosing specific metrics for auto scaling (for example, model latency, CPU utilization, invocations per instance)
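The auto scaling skills above can be sketched as Application Auto Scaling request payloads: one registers the endpoint variant's instance count as a scalable target, and one attaches a target-tracking policy on the predefined `SageMakerVariantInvocationsPerInstance` metric. Endpoint and variant names, capacities, and the target value are assumptions for illustration.

```python
# Sketch: Application Auto Scaling payloads for a SageMaker endpoint variant.
# Endpoint/variant names, capacities, and target values are hypothetical.

def scalable_target(endpoint: str, variant: str,
                    min_cap: int = 1, max_cap: int = 4) -> dict:
    """register_scalable_target request for the variant's instance count."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_cap,
        "MaxCapacity": max_cap,
    }

def invocations_policy(endpoint: str, variant: str,
                       target_invocations: float = 70.0) -> dict:
    """put_scaling_policy request tracking invocations per instance."""
    return {
        "PolicyName": f"{endpoint}-invocations-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Add instances when invocations/instance exceed this value
            "TargetValue": target_invocations,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,  # scale in slowly
            "ScaleOutCooldown": 60,  # scale out quickly
        },
    }

# These would be sent via boto3.client("application-autoscaling"):
# register_scalable_target(**scalable_target(...)) then
# put_scaling_policy(**invocations_policy(...)).
```

For custom metrics such as model latency or CPU utilization, the predefined-metric block would be swapped for a customized metric specification.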
Task 3.3: Use automated orchestration tools to set up continuous integration and continuous delivery (CI/CD) pipelines
Knowledge of:
Capabilities and quotas for CodePipeline, CodeBuild, and CodeDeploy
Automation and integration of data ingestion with orchestration services
Version control systems and basic usage (for example, Git)
CI/CD principles and how they fit into ML workflows
Deployment strategies and rollback actions (for example, blue/green, canary, linear)
How code repositories and pipelines work together
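The blue/green, canary, and rollback concepts above map onto SageMaker's `DeploymentConfig` for endpoint updates. The sketch below builds a canary configuration that shifts a slice of traffic first and rolls back automatically on a CloudWatch alarm; the alarm name, percentage, and wait times are hypothetical.

```python
# Sketch: canary deployment config for SageMaker's update_endpoint call.
# Alarm name, canary percentage, and wait times are hypothetical assumptions.

def canary_deployment_config(alarm_name: str, canary_percent: int = 10,
                             bake_seconds: int = 300) -> dict:
    """DeploymentConfig: shift a canary slice, watch an alarm, then roll out."""
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                # Route this share of capacity to the new fleet first
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
                # Bake time before shifting the remaining traffic
                "WaitIntervalInSeconds": bake_seconds,
            },
            # Keep the old (blue) fleet briefly after cutover
            "TerminationWaitInSeconds": 120,
        },
        # Roll back automatically if this CloudWatch alarm fires
        "AutoRollbackConfiguration": {"Alarms": [{"AlarmName": alarm_name}]},
    }

# Passed as DeploymentConfig= to
# boto3.client("sagemaker").update_endpoint(...).
```

Swapping `"Type": "CANARY"` for `"LINEAR"` (with a `LinearStepSize`) or `"ALL_AT_ONCE"` selects the other traffic-shifting strategies named above.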
Skills in:
Configuring and troubleshooting CodeBuild, CodeDeploy, and CodePipeline, including stages
Applying continuous deployment flow structures to invoke pipelines (for example, Gitflow, GitHub Flow)
Using services to automate orchestration (for example, to deploy ML models, automate model building)
Configuring training and inference jobs (for example, by using Amazon EventBridge rules, SageMaker Pipelines, CodePipeline)
Creating automated tests in CI/CD pipelines (for example, integration tests, unit tests, end-to-end tests)
Building and integrating mechanisms to retrain models
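One common retraining mechanism from the skills above is an EventBridge rule that starts a SageMaker pipeline on a schedule. The sketch below builds the `put_rule` and `put_targets` request payloads; the rule name, ARNs, schedule, and the pipeline parameter are hypothetical placeholders.

```python
# Sketch: EventBridge rule that starts a SageMaker pipeline on a schedule.
# Rule name, ARNs, schedule, and the S3 URI parameter are hypothetical.

def retrain_rule(rule_name: str, schedule: str = "rate(7 days)") -> dict:
    """put_rule request: scheduled retraining trigger."""
    return {
        "Name": rule_name,
        "ScheduleExpression": schedule,
        "State": "ENABLED",
    }

def retrain_target(rule_name: str, pipeline_arn: str, role_arn: str) -> dict:
    """put_targets request pointing the rule at a SageMaker pipeline."""
    return {
        "Rule": rule_name,
        "Targets": [{
            "Id": "retrain-pipeline",
            "Arn": pipeline_arn,      # SageMaker pipeline ARN
            "RoleArn": role_arn,      # role EventBridge assumes to start it
            "SageMakerPipelineParameters": {
                "PipelineParameterList": [
                    # Hypothetical pipeline parameter for fresh training data
                    {"Name": "TrainingDataS3Uri", "Value": "s3://my-bucket/train/"},
                ],
            },
        }],
    }

# Sent via boto3.client("events"):
# put_rule(**retrain_rule(...)) then put_targets(**retrain_target(...)).
```

The same target shape works for event-driven retraining, for example by replacing the schedule expression with an event pattern that matches a model-quality alarm or new-data notification.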