Deploying the Baseline Course Infrastructure - Teaching Big Data Skills with Amazon EMR

  1. Click to launch the course infrastructure stack.

    The CloudFormation page launches in the AWS Management Console. The Amazon S3 URL is pre-filled with the CloudFormation template URL.

    Figure 2: Create stack page

  2. Choose Next.

  3. On the Stack details page, enter an easily identifiable Stack name, for example, emr-course-infrastructure.

    Figure 3: Stack details page

  4. Review the Parameters and change them as needed. These values are used to create a new VPC, subnets, route tables, NAT gateway, Internet Gateway, S3 buckets, and IAM users, groups, and policies.


    When specifying an S3 bucket name, make sure the name is globally unique.

    Along with the infrastructure setup, this step also creates three student IAM users and one course admin IAM user.

  5. Choose Next.

  6. On the Configure stack options page, accept the default values or change them as needed.

  7. Choose Next.

  8. On the Review page, review the selections and scroll to the Capabilities section. Select the check box I acknowledge that AWS CloudFormation might create IAM resources with custom names.

    Figure 4: Review page - acknowledgement

  9. Choose Create stack and wait for the stack to deploy. A CREATE_IN_PROGRESS status message appears (Figure 5).

    Figure 5: Cluster creation in progress

    Once the baseline infrastructure is created, a CREATE_COMPLETE status message appears (Figure 6).

    Figure 6: Cluster creation complete

  10. Select the emr-course-infrastructure stack name, and in the right pane, choose the Outputs tab.

  11. Make note of the values of the following output keys:

    • PublicSubnet1

    • WebAccessSecurityGroup

    Figure 7: Outputs
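
For readers who prefer to script the procedure, the console steps above can be approximated in Python. This is a minimal sketch, not the course's own tooling. It assumes a CloudFormation client object that exposes create_stack and describe_stacks in the shape of the CloudFormation API (for example, a boto3 CloudFormation client). The function names, the template URL, and the parameter keys passed in are hypothetical; take the real values from the Create stack and Stack details pages. The CAPABILITY_NAMED_IAM capability is the API counterpart of the acknowledgement check box in step 8.

```python
import time

# Output keys that step 11 asks you to note.
REQUIRED_OUTPUTS = ("PublicSubnet1", "WebAccessSecurityGroup")


def build_create_stack_request(stack_name, template_url, parameters):
    """Build keyword arguments for a CloudFormation CreateStack call."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,  # hypothetical: use the pre-filled S3 URL
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted(parameters.items())
        ],
        # Counterpart of the IAM acknowledgement check box on the Review page.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def wait_for_stack(cfn, stack_name, poll_seconds=15, sleep=time.sleep):
    """Poll DescribeStacks until the stack leaves CREATE_IN_PROGRESS."""
    while True:
        stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
        if stack["StackStatus"] != "CREATE_IN_PROGRESS":
            break
        sleep(poll_seconds)
    if stack["StackStatus"] != "CREATE_COMPLETE":
        raise RuntimeError(f"stack ended in status {stack['StackStatus']}")
    return stack


def required_outputs(stack, keys=REQUIRED_OUTPUTS):
    """Return the noted output values as a dict, failing if any is absent."""
    outputs = {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}
    missing = [key for key in keys if key not in outputs]
    if missing:
        raise KeyError(f"missing stack outputs: {missing}")
    return {key: outputs[key] for key in keys}


def deploy(cfn, stack_name, template_url, parameters, sleep=time.sleep):
    """Create the stack, wait for completion, and return the noted outputs."""
    request = build_create_stack_request(stack_name, template_url, parameters)
    cfn.create_stack(**request)
    stack = wait_for_stack(cfn, stack_name, sleep=sleep)
    return required_outputs(stack)
```

With boto3 installed and credentials configured, the client could be created with boto3.client("cloudformation") and passed as cfn. Passing the client in, rather than creating it inside the functions, keeps the sketch usable with a stand-in object for dry runs.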