Domain 3: Data Operations and Support (22% of the exam content)
Topics
Task 3.1: Automate data processing by using services
Knowledge of:
How to maintain and troubleshoot data processing for repeatable business outcomes
API calls for data processing
Which services accept scripting (for example, Amazon EMR, Amazon Redshift, Glue)
Skills in:
Orchestrating data pipelines (for example, Amazon MWAA, Step Functions)
Troubleshooting Amazon managed workflows
Calling SDKs to access Amazon features from code
Using the features of services to process data (for example, Amazon EMR, Amazon Redshift, Glue)
Consuming and maintaining data APIs
Preparing data transformation (for example, Glue DataBrew)
Querying data (for example, Amazon Athena)
Using Lambda to automate data processing
Managing events and schedulers (for example, EventBridge)
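As an illustration of using Lambda to automate data processing, the following minimal sketch parses an Amazon S3 event notification and lists the objects that triggered the invocation. The bucket name, object key, and the helper name extract_s3_objects are hypothetical, and the step that would actually start a job is left as a comment. The sketch assumes the standard S3 notification layout (Records, then s3.bucket.name and s3.object.key).

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record.get("s3")
        if s3_info is None:
            continue  # skip records that did not come from S3
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become "+").
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    """Lambda entry point: report the objects that triggered this invocation."""
    objects = extract_s3_objects(event)
    # A real pipeline would start a transformation or job here (hypothetical step).
    return {
        "processed": len(objects),
        "objects": [f"s3://{bucket}/{key}" for bucket, key in objects],
    }


# Hypothetical sample event with a made-up bucket and key.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "raw/2024/sales+report.csv"},
            }
        }
    ]
}
result = lambda_handler(sample_event, None)
print(result)
```

Keeping the parsing in a separate function makes the handler easy to test locally without deploying it.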
Task 3.2: Analyze data by using services
Knowledge of:
Tradeoffs between provisioned services and serverless services
SQL queries (for example, SELECT statements with multiple qualifiers or JOIN clauses)
How to visualize data for analysis
When and how to apply cleansing techniques
Data aggregation, rolling average, grouping, and pivoting
Skills in:
Visualizing data by using services and tools (for example, Glue DataBrew, Amazon QuickSight)
Verifying and cleaning data (for example, Lambda, Athena, QuickSight, Jupyter Notebooks, Amazon SageMaker Data Wrangler)
Using Athena to query data or to create views
Using Athena notebooks that use Apache Spark to explore data
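The query and aggregation topics in this task can be practiced locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for Athena (an assumption made only so the example runs without an AWS account) to show a SELECT statement with multiple qualifiers, a JOIN clause, and grouping. A small helper then computes a trailing rolling average. The table names, column names, and data are hypothetical.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'east'), (2, 'west'), (3, 'east');
    INSERT INTO orders VALUES
        (10, 1, 20.0), (11, 1, 30.0), (12, 2, 50.0), (13, 3, 10.0), (14, 2, 5.0);
""")

# SELECT with two qualifiers in WHERE, a JOIN, and a GROUP BY aggregation.
query = """
    SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS total_amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE o.amount >= 10 AND c.region IN ('east', 'west')
    GROUP BY c.region
    ORDER BY c.region
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('east', 3, 60.0), ('west', 1, 50.0)]


def rolling_average(values, window):
    """Trailing average over the last `window` values (shorter at the start)."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


daily_totals = [10.0, 20.0, 30.0, 40.0]
rolling = rolling_average(daily_totals, window=3)
print(rolling)  # [10.0, 15.0, 20.0, 30.0]
```

SQL dialects differ between SQLite and Athena, so treat the query as a pattern to adapt rather than as Athena syntax to copy verbatim.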
Task 3.3: Maintain and monitor data pipelines
Knowledge of:
How to log application data
Best practices for performance tuning
How to log access to services
Amazon Macie, CloudTrail, and Amazon CloudWatch
Skills in:
Extracting logs for audits
Deploying logging and monitoring solutions to facilitate auditing and traceability
Using notifications during monitoring to send alerts
Troubleshooting performance issues
Using CloudTrail to track API calls
Troubleshooting and maintaining pipelines (for example, Glue, Amazon EMR)
Using Amazon CloudWatch Logs to log application data (with a focus on configuration and automation)
Analyzing logs, including big data application logs, by using services (for example, Athena, Amazon EMR, Amazon OpenSearch Service, CloudWatch Logs Insights)
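One common way to make application logs easier to audit and query is to emit them as structured JSON, one object per line, since CloudWatch Logs Insights can discover fields in JSON log events. The following minimal sketch uses only the Python standard library. The class name JsonFormatter, the logger name, the job_id and rows fields, and the convention of passing extra={"context": {...}} are all assumptions for illustration. The example writes to an in-memory stream, whereas a real deployment would write to standard output or to an agent that ships logs to CloudWatch Logs.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"context": {...}}.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name, stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
log = build_logger("pipeline.ingest", stream)
log.info("batch loaded", extra={"context": {"job_id": "job-123", "rows": 1500}})

entry = json.loads(stream.getvalue())
print(entry)
```

Consistent field names across pipeline stages (here, the made-up job_id) are what make it practical to trace one run through several log groups.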
Task 3.4: Ensure data quality
Knowledge of:
Data sampling techniques
How to implement mechanisms that detect and handle data skew
Data validation (data completeness, consistency, accuracy, and integrity)
Data profiling
Skills in:
Running data quality checks while processing the data (for example, checking for empty fields)
Defining data quality rules (for example, Glue DataBrew)
Investigating data consistency (for example, Glue DataBrew)
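A data quality check that runs while data is being processed, such as checking for empty fields, can be sketched in a few lines. The example below is a plain Python illustration and does not use the rule syntax of Glue DataBrew or any other service. The function names, field names, sample rows, and the 0.9 threshold are hypothetical.

```python
def is_empty(value):
    """Treat None and blank strings as empty."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def completeness_report(rows, required_fields):
    """Count empty values per field and compute a completeness ratio."""
    total = len(rows)
    report = {}
    for field in required_fields:
        missing = sum(1 for row in rows if is_empty(row.get(field)))
        ratio = (total - missing) / total if total else 0.0
        report[field] = {"missing": missing, "completeness": ratio}
    return report


def failed_fields(report, threshold):
    """Return the fields whose completeness falls below the threshold."""
    return sorted(
        field for field, stats in report.items()
        if stats["completeness"] < threshold
    )


# Made-up sample batch: two of the four rows lack an email value.
rows = [
    {"order_id": "1", "email": "a@example.com"},
    {"order_id": "2", "email": ""},
    {"order_id": "3", "email": None},
    {"order_id": "4", "email": "d@example.com"},
]
report = completeness_report(rows, ["order_id", "email"])
failures = failed_fields(report, threshold=0.9)
print(report)
print(failures)  # ['email']
```

A pipeline could use the list of failing fields to stop the run, quarantine the batch, or send an alert, depending on how strict the rule needs to be.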