Domain 4: Machine Learning Implementation and Operations (20% of the exam content)
Topics
Task 4.1: Build ML solutions for performance, availability, scalability, resiliency, and fault tolerance
Log and monitor environments.
AWS CloudTrail and Amazon CloudWatch
Build error monitoring solutions.
Deploy to multiple Regions and multiple Availability Zones.
Create AMIs and golden images.
Create Docker containers.
Deploy Auto Scaling groups.
Rightsize resources (for example, instances, Provisioned IOPS, volumes).
Perform load balancing.
Follow best practices.
Task 4.2: Recommend and implement the appropriate ML services and features for a given problem
ML on AWS (application services), for example:
Amazon Polly
Amazon Lex
Amazon Transcribe
Amazon Q
Understand service quotas.
Determine when to build custom models and when to use Amazon SageMaker built-in algorithms.
Understand infrastructure (for example, instance types) and cost considerations.
Use Spot Instances to train deep learning models by using AWS Batch.
Task 4.3: Apply basic security practices to ML solutions
AWS Identity and Access Management (IAM)
S3 bucket policies
Security groups
VPCs
Encryption and anonymization
Task 4.4: Deploy and operationalize ML solutions
Expose endpoints and interact with them.
Understand ML models.
Perform A/B testing.
Retrain pipelines.
Debug and troubleshoot ML models.
Detect and mitigate drops in performance.
Monitor performance of the model.