Domain 4: Machine Learning Implementation and Operations (20% of the exam content) - AWS Certification

This domain accounts for 20% of the exam content.

Task 4.1: Build ML solutions for performance, availability, scalability, resiliency, and fault tolerance

  • Log and monitor AWS environments.

    • AWS CloudTrail and Amazon CloudWatch

    • Build error monitoring solutions.

  • Deploy to multiple AWS Regions and multiple Availability Zones.

  • Create AMIs and golden images.

  • Create Docker containers.

  • Deploy Auto Scaling groups.

  • Rightsize resources (for example, instances, Provisioned IOPS, volumes).

  • Perform load balancing.

  • Follow best practices.

Task 4.2: Recommend and implement the appropriate ML services and features for a given problem

  • ML on AWS (application services), for example:

    • Amazon Polly

    • Amazon Lex

    • Amazon Transcribe

    • Amazon Q

  • Understand AWS service quotas.

  • Determine when to build custom models and when to use Amazon SageMaker built-in algorithms.

  • Understand AWS infrastructure (for example, instance types) and cost considerations.

    • Use Spot Instances to train deep learning models by using AWS Batch.

Task 4.3: Apply basic security practices to ML solutions

  • AWS Identity and Access Management (IAM)

  • S3 bucket policies

  • Security groups

  • VPCs

  • Encryption and anonymization

Task 4.4: Deploy and operationalize ML solutions

  • Expose endpoints and interact with them.

  • Understand ML models.

  • Perform A/B testing.

  • Retrain pipelines.

  • Debug and troubleshoot ML models.

    • Detect and mitigate drops in performance.

    • Monitor performance of the model.