Domain 2: Exploratory Data Analysis (24% of the exam content) - AWS Certification

This domain accounts for 24% of the exam content.

Task 2.1: Sanitize and prepare data for modeling

  • Identify and handle missing data, corrupt data, and stop words.

  • Format, normalize, augment, and scale data.

  • Determine whether there is sufficient labeled data.

    • Identify mitigation strategies.

    • Use data labeling tools (for example, Amazon Mechanical Turk).
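
As an illustration of the preparation steps listed under Task 2.1, the following is a minimal sketch using only the Python standard library. The function names (`impute_median`, `min_max_scale`, `remove_stop_words`) and the tiny stop-word list are hypothetical, chosen for this example; they are not part of any AWS service or of the exam guide itself. Median imputation is only one of several possible ways to handle missing values.

```python
import math
import statistics

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is"}


def impute_median(values):
    """Replace None/NaN entries with the median of the observed values."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    median = statistics.median(observed)
    return [median if v is None or math.isnan(v) else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


raw = [1.0, None, 3.0, float("nan"), 5.0]
clean = impute_median(raw)       # missing entries become the median, 3.0
scaled = min_max_scale(clean)    # values now lie in [0, 1]
tokens = remove_stop_words("The quality of the data is key")
```

In practice the imputation and scaling statistics would be computed on the training split only and then reused on validation and test data, so that no information leaks across splits.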

Task 2.2: Perform feature engineering

  • Identify and extract features from datasets, including from data sources such as text, speech, images, and public datasets.

  • Analyze and evaluate feature engineering concepts (for example, binning, tokenization, outliers, synthetic features, one-hot encoding, reducing dimensionality of data).
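
Three of the feature engineering concepts named above (tokenization, binning, and one-hot encoding) can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions: the helper names, the regular expression, and the age bin edges are all hypothetical, and production pipelines would more likely rely on a dedicated library.

```python
import bisect
import re


def tokenize(text):
    """Split text into lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bin_value(value, edges):
    """Return the index of the bin that contains value.

    edges must be sorted ascending. Bin 0 is (-inf, edges[0]),
    bin i is [edges[i-1], edges[i]), and the last bin is [edges[-1], +inf).
    """
    return bisect.bisect_right(edges, value)


def one_hot(values):
    """One-hot encode categorical values; columns follow sorted category order."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = [
        [1 if index[v] == col else 0 for col in range(len(categories))]
        for v in values
    ]
    return categories, rows


tokens = tokenize("Don't stop, EDA!")
age_edges = [18, 35, 65]  # hypothetical bin boundaries
age_bins = [bin_value(age, age_edges) for age in [17, 18, 40, 70]]
categories, encoded = one_hot(["red", "green", "red"])
```

Binning turns a continuous feature into an ordinal one, while one-hot encoding avoids implying an order among categories that have none; which is appropriate depends on the feature and the model.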

Task 2.3: Analyze and visualize data for ML

  • Create graphs (for example, scatter plots, time series, histograms, box plots).

  • Interpret descriptive statistics (for example, correlation, summary statistics, p-value).

  • Perform cluster analysis (for example, hierarchical, diagnosis, elbow plot, cluster size).
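
To make the statistics in Task 2.3 concrete, the sketch below computes summary statistics, a Pearson correlation coefficient, and the within-cluster sum of squares (WCSS), which is the quantity usually plotted against the number of clusters in an elbow plot. It uses only the Python standard library; the function names and the sample data are hypothetical, and the cluster labels are assumed to be given rather than produced by a clustering algorithm.

```python
import math
import statistics


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def wcss(points, labels):
    """Within-cluster sum of squares for 1-D points with given cluster labels."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        centroid = statistics.mean(members)
        total += sum((p - centroid) ** 2 for p in members)
    return total


stats = summarize([1, 2, 3, 4])
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly linear relationship

points = [1.0, 2.0, 9.0, 10.0]
wcss_k1 = wcss(points, [0, 0, 0, 0])      # everything in one cluster
wcss_k2 = wcss(points, [0, 0, 1, 1])      # two well-separated clusters
```

With this data, moving from one cluster to two reduces the WCSS sharply; in an elbow plot, the number of clusters after which further reductions become small is a common heuristic choice.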