Domain 1: Data Engineering (20% of the exam content) - AWS Certification

This domain accounts for 20% of the exam content.

Task 1.1: Create data repositories for ML

  • Identify data sources (for example, content and location, primary sources such as user data).

  • Determine storage mediums (for example, databases, Amazon S3, Amazon Elastic File System [Amazon EFS], Amazon Elastic Block Store [Amazon EBS]).
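When Amazon S3 is the storage medium for an ML data repository, a common convention is to organize objects under Hive-style key=value prefixes (for example, by date). Query engines such as Amazon Athena can then skip irrelevant partitions. The minimal sketch below builds such keys. The prefix "raw/clickstream", the file name, and the helper partitioned_key are hypothetical, and date partitioning is only one possible scheme.

```python
from datetime import datetime, timezone


def partitioned_key(prefix: str, event_time: datetime, filename: str) -> str:
    """Build a Hive-style (key=value) S3 object key partitioned by UTC date.

    Hypothetical helper: the prefix and file name are illustrative placeholders.
    Expects a timezone-aware datetime.
    """
    # Normalize to UTC so a record's partition does not depend on the writer's local time zone.
    t = event_time.astimezone(timezone.utc)
    return (
        f"{prefix.rstrip('/')}/"
        f"year={t.year:04d}/month={t.month:02d}/day={t.day:02d}/"
        f"{filename}"
    )


key = partitioned_key(
    "raw/clickstream",
    datetime(2024, 3, 7, 15, 30, tzinfo=timezone.utc),
    "part-00000.parquet",
)
print(key)  # raw/clickstream/year=2024/month=03/day=07/part-00000.parquet
```

Choose partition keys that match the filters your training and analysis jobs use most often. Partitions that are too fine-grained produce many small objects, which slows down listing and reading.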

Task 1.2: Identify and implement a data ingestion solution

  • Identify data job styles and job types (for example, batch load, streaming).

  • Orchestrate data ingestion pipelines (batch-based ML workloads and streaming-based ML workloads).

    • Amazon Kinesis

    • Amazon Data Firehose

    • Amazon EMR

    • AWS Glue

    • Amazon Managed Service for Apache Flink

  • Schedule jobs.
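The line between batch and streaming ingestion often comes down to buffering. Amazon Data Firehose, for example, buffers incoming records and delivers them to a destination such as Amazon S3 when a buffer size or a buffer interval threshold is reached, whichever comes first. The pure-Python class below is a hypothetical sketch of that size-or-interval idea. It is not Firehose's implementation, and the thresholds are illustrative, not service defaults. Time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SizeOrIntervalBuffer:
    """Hypothetical buffer: flush when total bytes OR buffer age reaches a threshold."""
    max_bytes: int
    max_age_seconds: float
    flushed: List[List[bytes]] = field(default_factory=list, init=False)
    _records: List[bytes] = field(default_factory=list, init=False)
    _size: int = field(default=0, init=False)
    _opened_at: Optional[float] = field(default=None, init=False)

    def add(self, record: bytes, now: float) -> None:
        # The first record in an empty buffer starts the age clock.
        if self._opened_at is None:
            self._opened_at = now
        self._records.append(record)
        self._size += len(record)
        self._maybe_flush(now)

    def tick(self, now: float) -> None:
        # Call periodically so a quiet stream still flushes on the interval.
        self._maybe_flush(now)

    def _maybe_flush(self, now: float) -> None:
        if not self._records:
            return
        too_big = self._size >= self.max_bytes
        too_old = now - self._opened_at >= self.max_age_seconds
        if too_big or too_old:
            self.flushed.append(self._records)
            self._records = []
            self._size = 0
            self._opened_at = None


buf = SizeOrIntervalBuffer(max_bytes=10, max_age_seconds=60)
buf.add(b"aaaa", now=0)    # 4 bytes buffered
buf.add(b"bbbbbb", now=5)  # 10 bytes total: size threshold reached, flush
buf.add(b"cc", now=10)     # a new buffer opens at t=10
buf.tick(now=70)           # 60 s elapsed: interval threshold reached, flush
print(buf.flushed)  # [[b'aaaa', b'bbbbbb'], [b'cc']]
```

Larger thresholds produce fewer, larger objects, which behaves more like batch loading. Smaller thresholds lower end-to-end latency at the cost of many small objects.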

Task 1.3: Identify and implement a data transformation solution

  • Transform data in transit (ETL, AWS Glue, Amazon EMR, AWS Batch).

  • Handle ML-specific data by using MapReduce (for example, Apache Hadoop, Apache Spark, Apache Hive).
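The MapReduce model works in three steps. A map step emits key-value pairs, the framework shuffles (groups) those pairs by key, and a reduce step aggregates each group. The single-process sketch below applies this model to a typical ML preprocessing task: computing per-feature means, which can then be used for imputation or normalization. The function names and the sample rows are hypothetical. Hadoop, Spark, and Hive distribute these same phases across a cluster rather than running them in one process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, Optional[float]]


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[str, Tuple[float, int]]]:
    # Emit (feature_name, (value, 1)) for every non-missing value.
    for row in rows:
        for feature, value in row.items():
            if value is not None:
                yield feature, (value, 1)


def shuffle_phase(pairs: Iterable[Tuple[str, Tuple[float, int]]]) -> Dict[str, List[Tuple[float, int]]]:
    # Group intermediate values by key, as the framework does between map and reduce.
    grouped: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[Tuple[float, int]]]) -> Dict[str, float]:
    # Sum the (value, count) pairs for each key, then divide to get the mean.
    means: Dict[str, float] = {}
    for key, values in grouped.items():
        total = sum(v for v, _ in values)
        count = sum(c for _, c in values)
        means[key] = total / count
    return means


rows = [
    {"age": 30.0, "income": 50000.0},
    {"age": 40.0, "income": None},  # missing value is skipped in the map step
    {"age": 50.0, "income": 70000.0},
]
print(reduce_phase(shuffle_phase(map_phase(rows))))  # {'age': 40.0, 'income': 60000.0}
```

The mapper emits (sum, count) pairs instead of the means themselves, so partial results can be combined in any order. That property is what lets the reduce work be split across many nodes.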