Cloudera EDH on AWS
Cloudera EDH Quick Start

Architecture

AWS CloudFormation provides an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion.

This Quick Start deploys and configures the following components:

  • A VPC configured with four subnets, two public and two private.

  • A NAT gateway configured in the public subnet to allow outbound internet access for the instances in the private subnet. The gateway is configured with an Elastic IP address.

    Note

    If you choose the option to create a new VPC, the Quick Start creates and configures the VPC, the two private and two public subnets, and the NAT gateway for you. If you choose the option to deploy Cloudera EDH into an existing VPC, that VPC must already be configured as described: two public and two private subnets, and a NAT gateway.

  • A Linux server instance deployed in the public subnet for downloading Cloudera Director and various configuration files and scripts.

  • An IAM instance role with fine-grained permissions for access to AWS services necessary for the deployment process.

  • Security groups for each instance or function to restrict access to only necessary protocols and ports.

  • An optional placement group that provides a logical grouping of instances and enables applications to participate in a low-latency, 10 Gbps network.

  • A fully customizable EDH cluster including worker nodes, edge nodes, and management nodes that you define based on your compute and storage requirements.
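
The network components in the list above can be sketched as a CloudFormation template fragment. The following Python snippet builds such a fragment as a dictionary and prints it as JSON. It is a minimal illustration under stated assumptions, not the template this Quick Start ships: the CIDR ranges, the logical resource names, and the build_network_template helper are hypothetical, and the internet gateway, route tables, and Availability Zone settings are omitted for brevity.

```python
import ipaddress
import itertools
import json


def build_network_template(vpc_cidr="10.0.0.0/16"):
    """Return a CloudFormation-style dict with a VPC, four subnets, and a NAT gateway.

    All logical names and CIDR ranges here are hypothetical.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    # Take the first four /24 blocks of the VPC range: two public, two private.
    blocks = list(itertools.islice(vpc.subnets(new_prefix=24), 4))
    names = ["PublicSubnet1", "PublicSubnet2", "PrivateSubnet1", "PrivateSubnet2"]

    resources = {
        "VPC": {"Type": "AWS::EC2::VPC", "Properties": {"CidrBlock": str(vpc)}},
    }
    for name, block in zip(names, blocks):
        resources[name] = {
            "Type": "AWS::EC2::Subnet",
            "Properties": {
                "VpcId": {"Ref": "VPC"},
                "CidrBlock": str(block),
                # Only instances in public subnets get a public IP on launch.
                "MapPublicIpOnLaunch": name.startswith("Public"),
            },
        }

    # The NAT gateway sits in a public subnet and uses an Elastic IP address.
    resources["NatEip"] = {"Type": "AWS::EC2::EIP", "Properties": {"Domain": "vpc"}}
    resources["NatGateway"] = {
        "Type": "AWS::EC2::NatGateway",
        "Properties": {
            "AllocationId": {"Fn::GetAtt": ["NatEip", "AllocationId"]},
            "SubnetId": {"Ref": "PublicSubnet1"},
        },
    }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


template = build_network_template()
print(json.dumps(template, indent=2))
```

Generating the fragment from one VPC range keeps the four subnets non-overlapping by construction. A real deployment would also spread the subnets across Availability Zones.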

This reference architecture supports two options for deploying Cloudera's Enterprise Data Hub (EDH) within a VPC. The first option launches all the cluster nodes in a public subnet, which gives them direct internet access. The second option deploys all the cluster nodes in a private subnet. The reference deployment builds both public and private subnets, and you use the configuration file to choose which type of subnet the cluster is deployed in.

EDH Cluster in a Public Subnet

This option builds the following environment in the AWS Cloud.



Figure 1: Public subnet topology

A public subnet cluster topology includes an EC2 instance (referred to as the cluster launcher instance), which is launched within the public subnet. An Elastic IP address is assigned to the instance, and a security group that allows SSH access to the instance is created. The cluster launcher instance then builds the EDH cluster by launching all the Hadoop-related EC2 instances within the public subnet. In this topology, all the launched instances have direct access to the internet.
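
The SSH-only security group for the cluster launcher instance might look like the following CloudFormation resource, built here as a Python dictionary. This is a sketch, not the Quick Start's own definition: the launcher_security_group helper, the "VPC" logical name, and the 203.0.113.0/24 source range (an address block reserved for documentation) are assumptions made for the example.

```python
import ipaddress
import json


def launcher_security_group(ssh_source_cidr):
    """Return a CloudFormation-style security group that allows inbound SSH only.

    The helper name and the referenced "VPC" logical name are hypothetical.
    """
    # Validate the CIDR early; ip_network raises ValueError on malformed input.
    ipaddress.ip_network(ssh_source_cidr)
    return {
        "Type": "AWS::EC2::SecurityGroup",
        "Properties": {
            "GroupDescription": "SSH access to the cluster launcher instance",
            "VpcId": {"Ref": "VPC"},
            "SecurityGroupIngress": [
                {
                    "IpProtocol": "tcp",
                    "FromPort": 22,  # SSH
                    "ToPort": 22,
                    "CidrIp": ssh_source_cidr,
                }
            ],
        },
    }


print(json.dumps(launcher_security_group("203.0.113.0/24"), indent=2))
```

Restricting the source range to known administrator addresses, rather than 0.0.0.0/0, limits who can reach the launcher over SSH.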

EDH Cluster in a Private Subnet

This option builds the following environment in the AWS Cloud.



Figure 2: Private subnet topology

In a private subnet cluster topology, the cluster launcher instance is still launched in the public subnet. An Elastic IP address is assigned to the instance, and a security group that allows SSH access to the instance is created. All other Hadoop-related EC2 instances are created within the private subnet. The EC2 instances within the EDH cluster do not have direct access to the internet; instead, they reach it through the NAT gateway. The only publicly accessible component is the cluster launcher in the public subnet.
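
The difference between the two topologies can be summarized with a small model. The Python sketch below is illustrative only: plan_placement, outbound_path, and the string labels are hypothetical names, and the internet gateway hop reflects general VPC behavior for public subnets rather than anything stated in this guide.

```python
def plan_placement(topology):
    """Map each component to the subnet type it is launched in.

    topology is "public" or "private"; the launcher is always public.
    """
    if topology not in ("public", "private"):
        raise ValueError(f"unknown topology: {topology!r}")
    return {"cluster launcher": "public", "EDH cluster nodes": topology}


def outbound_path(subnet_type):
    """Return the hops an instance's outbound internet traffic takes."""
    if subnet_type == "public":
        # Public subnets route directly through the VPC's internet gateway.
        return ["instance", "internet gateway", "internet"]
    # Private subnets send outbound traffic through the NAT gateway first.
    return ["instance", "NAT gateway", "internet gateway", "internet"]


for topology in ("public", "private"):
    print(f"{topology} subnet topology:")
    for component, subnet_type in plan_placement(topology).items():
        hops = " -> ".join(outbound_path(subnet_type))
        print(f"  {component} ({subnet_type} subnet): {hops}")
```

In both cases the launcher sits in the public subnet; only the placement of the cluster nodes, and therefore their outbound path, changes.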