Domain 3: Data Operations and Support (22% of the exam content) - AWS Certification

This domain accounts for 22% of the exam content.

Task 3.1: Automate data processing by using AWS services

Knowledge of:

  • How to maintain and troubleshoot data processing for repeatable business outcomes

  • API calls for data processing

  • Which AWS services accept scripting (for example, Amazon EMR, Amazon Redshift, AWS Glue)

Skills in:

  • Orchestrating data pipelines (for example, Amazon MWAA, AWS Step Functions)

  • Troubleshooting Amazon managed workflows

  • Calling SDKs to access Amazon features from code

  • Using the features of AWS services to process data (for example, Amazon EMR, Amazon Redshift, AWS Glue)

  • Consuming and maintaining data APIs

  • Preparing data transformation (for example, AWS Glue DataBrew)

  • Querying data (for example, Amazon Athena)

  • Using AWS Lambda to automate data processing

  • Managing events and schedulers (for example, Amazon EventBridge)
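
As an illustration of the Lambda automation skill listed above, the following is a minimal sketch of a Python Lambda handler that reacts to an Amazon S3 object-created notification. The names `handler` and `process_object`, and the sample bucket and key, are hypothetical; a real pipeline would replace the placeholder helper with its own processing step.

```python
from urllib.parse import unquote_plus


def process_object(bucket: str, key: str) -> dict:
    """Hypothetical placeholder for the real processing step."""
    return {"bucket": bucket, "key": key, "status": "queued"}


def handler(event, context):
    """Collect the bucket and key of every S3 record in the event."""
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        results.append(process_object(bucket, key))
    return {"processed": len(results), "results": results}


if __name__ == "__main__":
    # Hand-written sample event (assumed shape, trimmed to the fields used above).
    sample_event = {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": "example-raw-bucket"},
                    "object": {"key": "incoming/sales+2024.csv"},
                }
            }
        ]
    }
    print(handler(sample_event, None))
```

Keeping the parsing logic free of SDK calls makes the handler easy to exercise locally with a hand-built event before it is deployed.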

Task 3.2: Analyze data by using AWS services

Knowledge of:

  • Tradeoffs between provisioned services and serverless services

  • SQL queries (for example, SELECT statements with multiple qualifiers or JOIN clauses)

  • How to visualize data for analysis

  • When and how to apply cleansing techniques

  • Data aggregation, rolling average, grouping, and pivoting

Skills in:

  • Visualizing data by using AWS services and tools (for example, AWS Glue DataBrew, Amazon QuickSight)

  • Verifying and cleaning data (for example, Lambda, Athena, QuickSight, Jupyter Notebooks, Amazon SageMaker Data Wrangler)

  • Using Athena to query data or to create views

  • Using Athena notebooks that use Apache Spark to explore data
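
The SQL topics in this task (multiple qualifiers, JOIN clauses, grouping, and rolling averages) can be sketched with a small runnable example. This one uses Python's built-in `sqlite3` module as a local stand-in for a query engine such as Athena; the `customers` and `orders` tables and their contents are hypothetical, and dialect details may differ between SQLite and Athena. The window function assumes SQLite 3.25 or later.

```python
import sqlite3


def run_queries():
    """Build a tiny in-memory dataset and run two analytic queries on it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            order_day INTEGER,
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'east'), (2, 'west');
        INSERT INTO orders VALUES
            (1, 1, 1, 10.0), (2, 1, 2, 20.0), (3, 1, 3, 30.0),
            (4, 2, 1, 5.0), (5, 2, 2, 15.0);
        """
    )

    # JOIN with multiple qualifiers in WHERE, then GROUP BY region.
    totals = conn.execute(
        """
        SELECT c.region, SUM(o.amount) AS total
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        WHERE o.amount > 0 AND o.order_day BETWEEN 1 AND 3
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()

    # Rolling average over the current and previous order, per customer.
    rolling = conn.execute(
        """
        SELECT customer_id, order_day,
               AVG(amount) OVER (
                   PARTITION BY customer_id
                   ORDER BY order_day
                   ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
               ) AS rolling_avg
        FROM orders
        ORDER BY customer_id, order_day
        """
    ).fetchall()

    conn.close()
    return totals, rolling


if __name__ == "__main__":
    totals, rolling = run_queries()
    print(totals)
    print(rolling)
```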

Task 3.3: Maintain and monitor data pipelines

Knowledge of:

  • How to log application data

  • Best practices for performance tuning

  • How to log access to AWS services

  • Amazon Macie, AWS CloudTrail, and Amazon CloudWatch

Skills in:

  • Extracting logs for audits

  • Deploying logging and monitoring solutions to facilitate auditing and traceability

  • Using notifications during monitoring to send alerts

  • Troubleshooting performance issues

  • Using CloudTrail to track API calls

  • Troubleshooting and maintaining pipelines (for example, AWS Glue, Amazon EMR)

  • Using Amazon CloudWatch Logs to log application data (with a focus on configuration and automation)

  • Analyzing logs with AWS services (for example, Athena, Amazon EMR, Amazon OpenSearch Service, CloudWatch Logs Insights, big data application logs)
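
One way to approach the application-logging skills above is to emit one JSON object per log line, which tends to be easier to filter and aggregate in log analysis tools than free-form text. The sketch below uses only Python's standard `logging` module; the `JsonFormatter` class, the `build_logger` helper, and the convention of passing pipeline fields under a `context` key are assumptions made for illustration, not part of any AWS API.

```python
import io
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"context": {...}}.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str, stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger("pipeline", sys.stdout)
    log.info("batch loaded", extra={"context": {"job": "daily_load", "rows": 1200}})
```

Writing to standard output keeps the sketch independent of any log-shipping agent; how those lines reach CloudWatch Logs depends on the runtime that hosts the code.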

Task 3.4: Ensure data quality

Knowledge of:

  • Data sampling techniques

  • How to implement mechanisms that handle data skew

  • Data validation (data completeness, consistency, accuracy, and integrity)

  • Data profiling
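
To make the data skew item above concrete, here is a minimal sketch of key salting, a common technique for spreading a dominant ("hot") key across several partitions. The function names `hot_keys` and `salted_key`, the 50% threshold, and the bucket count of 4 are all illustrative assumptions rather than recommended settings.

```python
from collections import Counter


def hot_keys(keys, threshold=0.5):
    """Return the keys whose share of all records exceeds `threshold`."""
    counts = Counter(keys)
    total = sum(counts.values())
    return {key for key, n in counts.items() if n / total > threshold}


def salted_key(key, row_number, hot, buckets=4):
    """Append a salt suffix to hot keys so their rows spread over `buckets` groups."""
    if key in hot:
        return f"{key}#{row_number % buckets}"
    return key


if __name__ == "__main__":
    keys = ["a"] * 8 + ["b", "c"]  # key "a" holds 80% of the rows
    hot = hot_keys(keys)
    salted = [salted_key(k, i, hot) for i, k in enumerate(keys)]
    print(Counter(salted))
```

After salting, any aggregation on the original key needs a second step that strips the suffix and combines the partial results.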

Skills in:

  • Running data quality checks while processing the data (for example, checking for empty fields)

  • Defining data quality rules (for example, AWS Glue DataBrew)

  • Investigating data consistency (for example, AWS Glue DataBrew)
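
The empty-field check mentioned in the skills above can be sketched as a small completeness rule applied while records are processed. This is a plain Python illustration under stated assumptions: the helper names `completeness_report` and `passes`, the sample fields, and the `max_missing_ratio` parameter are hypothetical and do not correspond to a DataBrew rule definition.

```python
def completeness_report(rows, required_fields):
    """Count, per required field, the rows where it is missing, None, or blank."""
    missing = {field: 0 for field in required_fields}
    for row in rows:
        for field in required_fields:
            value = row.get(field)
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1
    return missing


def passes(rows, required_fields, max_missing_ratio=0.0):
    """Return True when every field's missing ratio is within the allowed limit."""
    report = completeness_report(rows, required_fields)
    total = len(rows)
    return all(
        total > 0 and count / total <= max_missing_ratio
        for count in report.values()
    )


if __name__ == "__main__":
    rows = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": "  "},
        {"id": 3},
    ]
    print(completeness_report(rows, ["id", "email"]))
    print(passes(rows, ["id", "email"]))
```

A threshold of zero treats any empty value as a failure; a looser ratio can be passed when some missing values are acceptable for the dataset.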