Domain 1: Data Engineering (20% of the exam content)
Topics
Task 1.1: Create data repositories for ML
Identify data sources (for example, content and location, primary sources such as user data).
Determine storage mediums (for example, databases, Amazon S3, Amazon Elastic File System [Amazon EFS], Amazon Elastic Block Store [Amazon EBS]).
Task 1.2: Identify and implement a data ingestion solution
Identify data job styles and job types (for example, batch load, streaming).
Orchestrate data ingestion pipelines (batch-based ML workloads and streaming-based ML workloads).
Amazon Kinesis
Amazon Data Firehose
Amazon EMR
AWS Glue
Amazon Managed Service for Apache Flink
Schedule jobs.
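As an illustration of the batch and streaming job styles named in this task, the following is a minimal sketch in plain Python, not tied to any AWS service. The function names `batch_mean` and `streaming_mean` and the sample readings are hypothetical and chosen only for this example. A batch job sees the whole dataset at once, while a streaming job updates its result one record at a time.

```python
from typing import Iterable, Iterator


def batch_mean(records: list[float]) -> float:
    """Batch style: the full dataset is available before processing starts."""
    return sum(records) / len(records)


def streaming_mean(records: Iterable[float]) -> Iterator[float]:
    """Streaming style: update a running mean as each record arrives."""
    count = 0
    mean = 0.0
    for value in records:
        count += 1
        # Incremental update; no need to store earlier records.
        mean += (value - mean) / count
        yield mean


# Hypothetical sample data for illustration only.
readings = [2.0, 4.0, 6.0, 8.0]

batch_result = batch_mean(readings)
stream_results = list(streaming_mean(readings))

print(batch_result)    # single result after the whole batch is read
print(stream_results)  # one intermediate result per incoming record
```

Both styles arrive at the same final value here. The difference lies in when results become available and how much data must be held at once.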
Task 1.3: Identify and implement a data transformation solution
Transform data in transit (ETL, AWS Glue, Amazon EMR, AWS Batch).
Handle ML-specific data by using MapReduce (for example, Apache Hadoop, Apache Spark, Apache Hive).
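The MapReduce pattern referenced above can be sketched in plain Python as three phases: map, shuffle, and reduce. This is a single-process illustration under stated assumptions, not how Apache Hadoop, Apache Spark, or Apache Hive implement it. The function names, the record layout, and the sample data are hypothetical. The example counts class labels, a common check for class imbalance before training.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit one (key, value) pair per record, here (label, 1)."""
    yield (record["label"], 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def label_counts(records):
    """Run map, shuffle, and reduce in sequence over the records."""
    pairs = (pair for record in records for pair in map_phase(record))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


# Hypothetical labeled records for illustration only.
sample = [{"label": "spam"}, {"label": "ham"}, {"label": "spam"}]
counts = label_counts(sample)
print(counts)
```

In a distributed framework, the map and reduce functions run in parallel across many nodes and the shuffle moves data between them. The sketch keeps only the logical structure.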