Deploying the Baseline Course Infrastructure - Teaching Big Data Skills with Amazon EMR

  1. Choose the Launch Stack button to launch the course infrastructure stack.

    The AWS CloudFormation Create stack page opens in the AWS Management Console, with the Amazon S3 URL field pre-filled with the template URL (Figure 2).

    Figure 2: Create stack page

  2. Choose Next.

  3. On the Stack details page, enter an easily identifiable Stack name, for example, emr-course-infrastructure.

    Figure 3: Stack details page

  4. Review the Parameters and change them as needed. These values are used to create a new VPC, subnets, route tables, a NAT gateway, an internet gateway, S3 buckets, and IAM users, groups, and policies.

    Note

    When specifying an S3 bucket name, make sure the name is globally unique.

    In addition to the infrastructure, the stack creates three student IAM users and one course admin IAM user.
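
Global uniqueness can only be confirmed by Amazon S3 itself when the bucket is created, but a candidate name can be checked locally against the basic naming rules before it is entered as a stack parameter. The following Python sketch shows one way to do that. The function names and the random-suffix convention are illustrative assumptions, not part of the course template, and the check covers only a subset of the S3 naming rules.

```python
import re
import uuid

# Basic S3 naming rules (a subset): 3-63 characters; lowercase letters,
# digits, hyphens, and periods; must begin and end with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
# Bucket names must not be formatted like an IPv4 address.
_IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if the name passes the basic local checks."""
    if not _BUCKET_RE.match(name):
        return False
    if ".." in name or _IP_RE.match(name):
        return False
    return True


def suggest_bucket_name(prefix: str) -> str:
    """Append a random 12-character suffix to reduce the chance of a clash.

    This makes a collision unlikely but does not guarantee uniqueness;
    only the actual bucket creation can confirm that.
    """
    candidate = f"{prefix.lower()}-{uuid.uuid4().hex[:12]}"
    if not is_valid_bucket_name(candidate):
        raise ValueError(f"Cannot build a valid bucket name from {prefix!r}")
    return candidate


# Hypothetical prefix, used only for illustration.
print(suggest_bucket_name("emr-course"))
```

If stack creation later fails because the name is already taken, generate a new candidate and try again.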

  5. Choose Next.

  6. On the Configure stack options page, accept the default values or change them as needed.

  7. Choose Next.

  8. On the Review page, review the selections and scroll to the Capabilities section. Select the check box labeled "I acknowledge that AWS CloudFormation might create IAM resources with custom names."

    Figure 4: Review page - acknowledgement
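
For readers who prefer to script the deployment, the console steps so far correspond roughly to a single CloudFormation CreateStack request. The acknowledgement check box maps to the CAPABILITY_NAMED_IAM capability. The Python sketch below only builds the request dictionary. The template URL and the parameter name are placeholders, because the real values come from the Launch Stack link and the template itself, and the helper name is hypothetical.

```python
import re

# Stack names may contain letters, digits, and hyphens, must start with a
# letter, and can be at most 128 characters long.
_STACK_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9-]{0,127}$")


def build_create_stack_request(stack_name, template_url, parameters):
    """Build keyword arguments for a CloudFormation create_stack call."""
    if not _STACK_NAME_RE.match(stack_name):
        raise ValueError(f"Invalid stack name: {stack_name!r}")
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # CloudFormation expects parameters as a list of key/value records.
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": str(value)}
            for key, value in parameters.items()
        ],
        # Equivalent of the console acknowledgement for IAM resources
        # with custom names.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


request = build_create_stack_request(
    "emr-course-infrastructure",
    "https://example.com/course-template.yaml",  # placeholder URL (assumption)
    {"ExampleParameter": "example-value"},  # hypothetical parameter name
)
print(request["Capabilities"])
```

Assuming boto3 is installed and credentials are configured, the request could then be submitted with `boto3.client("cloudformation").create_stack(**request)`.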

  9. Choose Create stack and wait for the stack to deploy. A CREATE_IN_PROGRESS status message appears (Figure 5).

    Figure 5: Stack creation in progress

    After the baseline infrastructure is created, a CREATE_COMPLETE status message appears (Figure 6).

    Figure 6: Stack creation complete
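
The wait from CREATE_IN_PROGRESS to CREATE_COMPLETE can also be automated by polling the stack status. The Python sketch below assumes `cfn_client` is a CloudFormation client such as the one returned by `boto3.client("cloudformation")`. The function name, polling interval, and attempt limit are arbitrary choices for illustration.

```python
import time

# Statuses indicating that creation did not succeed.
FAILURE_STATUSES = {
    "CREATE_FAILED",
    "ROLLBACK_IN_PROGRESS",
    "ROLLBACK_FAILED",
    "ROLLBACK_COMPLETE",
}


def wait_for_create_complete(cfn_client, stack_name, poll_seconds=15,
                             max_attempts=120, sleep=time.sleep):
    """Poll describe_stacks until the stack reaches CREATE_COMPLETE.

    Raises RuntimeError on a failure status and TimeoutError if the
    stack is still in progress after max_attempts polls.
    """
    for _ in range(max_attempts):
        stack = cfn_client.describe_stacks(StackName=stack_name)["Stacks"][0]
        status = stack["StackStatus"]
        if status == "CREATE_COMPLETE":
            return status
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"Stack {stack_name} ended in status {status}")
        sleep(poll_seconds)  # still CREATE_IN_PROGRESS; check again later
    raise TimeoutError(f"Stack {stack_name} did not complete in time")
```

boto3 also ships a built-in waiter for this case, `cfn_client.get_waiter("stack_create_complete").wait(StackName=...)`. The explicit loop is shown only to make the status transitions visible.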

  10. Select the emr-course-infrastructure stack name, and in the right pane, choose the Outputs tab.

  11. Make a note of the values of the following output keys:

    • PublicSubnet1

    • WebAccessSecurityGroup

    Figure 7: Outputs
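
The two output values can also be read programmatically instead of being copied from the Outputs tab. The Python sketch below again assumes `cfn_client` is a CloudFormation client such as `boto3.client("cloudformation")`. The helper names are hypothetical, and the stack name matches the example used in step 3.

```python
def get_stack_outputs(cfn_client, stack_name):
    """Return the stack outputs as a {key: value} dictionary."""
    stack = cfn_client.describe_stacks(StackName=stack_name)["Stacks"][0]
    return {
        output["OutputKey"]: output["OutputValue"]
        for output in stack.get("Outputs", [])
    }


def get_required_outputs(cfn_client, stack_name,
                         keys=("PublicSubnet1", "WebAccessSecurityGroup")):
    """Return only the outputs noted in step 11, failing if any is missing."""
    outputs = get_stack_outputs(cfn_client, stack_name)
    missing = [key for key in keys if key not in outputs]
    if missing:
        raise KeyError(f"Missing stack outputs: {missing}")
    return {key: outputs[key] for key in keys}
```

Calling `get_required_outputs(cfn_client, "emr-course-infrastructure")` returns a dictionary with the subnet ID and security group ID.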