Cloudera EDH on AWS

Cloudera's Enterprise Data Hub (EDH) lets you store your data in one place and run a variety of enterprise workloads against it, including batch processing, interactive SQL, enterprise search, and advanced analytics, while relying on robust security, governance, data protection, and management.

AWS lets you set up the infrastructure to support EDH in a flexible, scalable, and cost-effective manner. This reference deployment helps you build an EDH cluster on AWS by integrating Cloudera Director with an automated deployment initiated by AWS CloudFormation.

This guide primarily covers the deployment of a Cloudera EDH cluster on AWS. For additional administration and support topics related to Cloudera's Enterprise Data Hub, visit Cloudera Support.

Cost and Licenses

This deployment uses Cloudera Director to deploy EDH automatically into a configuration of your choice. You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using the Quick Start. This reference deployment allows you to scale your cluster to any number of nodes. Both the instance type you select to meet your memory and compute requirements and the number of nodes in your cluster affect your cost.

Prices are subject to change. See the pricing pages for each AWS service you will be using for full details.
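
The relationship between instance type, node count, and cost can be illustrated with a small Python sketch. It assumes a simple model in which compute cost is the number of nodes multiplied by an hourly rate and the hours run. The function name, the 730-hour month, and the rates are hypothetical placeholders, not AWS prices, and the model ignores storage, data transfer, and other services.

```python
def estimate_monthly_cost(node_count: int,
                          hourly_rate_usd: float,
                          hours_per_month: float = 730.0) -> float:
    """Estimate monthly compute cost for a cluster of identical nodes.

    Assumed model: cost = nodes * hourly rate * hours run.
    """
    if node_count < 0 or hourly_rate_usd < 0 or hours_per_month < 0:
        raise ValueError("inputs must be non-negative")
    return node_count * hourly_rate_usd * hours_per_month


# Hypothetical hourly rates for illustration only; look up the real
# figures on the AWS pricing pages for the instance types you choose.
HYPOTHETICAL_RATES_USD = {
    "smaller-instance-type": 0.50,
    "larger-instance-type": 2.00,
}

if __name__ == "__main__":
    for type_name, rate in HYPOTHETICAL_RATES_USD.items():
        for nodes in (5, 20):
            cost = estimate_monthly_cost(nodes, rate)
            print(f"{nodes:>3} x {type_name}: ${cost:,.2f} per month")
```

Under this model, doubling either the node count or the hourly rate doubles the estimate, which is why both choices matter when sizing a cluster.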

This deployment activates a 60-day trial of Cloudera Enterprise. To upgrade from the trial license, see Managing Licenses on the Cloudera website.

AWS Services

This Quick Start uses the following core AWS services. (If you are new to AWS, see the Getting Started section of the AWS documentation.)

  • Amazon EC2 – The Amazon Elastic Compute Cloud (Amazon EC2) service enables you to launch virtual machine instances with a variety of operating systems. You can choose from existing Amazon Machine Images (AMIs) or import your own virtual machine images.

  • Amazon VPC – The Amazon Virtual Private Cloud (Amazon VPC) service lets you provision a private, isolated section of the AWS cloud where you can launch AWS services and other resources in a virtual network that you define. You have complete control over your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways.

  • NAT gateways – The Network Address Translation (NAT) Gateway service is a highly available, AWS-managed service that makes it easy to connect instances in a private VPC subnet to the internet. With NAT gateways, you don’t have to manage NAT instances, and the available bandwidth is no longer limited by the NAT instance size.

  • AWS CloudFormation – AWS CloudFormation lets you create and manage a collection of related AWS resources, and provision and update them in an orderly and predictable way. You use a template to describe all the AWS resources (for example, EC2 instances) that you want. You don't have to individually create and configure the resources or figure out dependencies—AWS CloudFormation handles all of that.

  • IAM – AWS Identity and Access Management (IAM) enables you to securely control access to AWS services and resources for your users. With IAM, you can centrally manage users, security credentials such as access keys, and permissions that control which AWS resources users can access.
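
To make the template idea concrete, the following Python sketch assembles a minimal AWS CloudFormation template as a dictionary and prints it as JSON. It is an illustration only and is not the template this Quick Start uses. The logical names (SketchVpc, SketchSubnet, SketchInstance), the 10.0.0.0/16 address range, and the m4.xlarge instance type are assumptions chosen for the example. NAT gateways, IAM resources, and routing are omitted to keep it short.

```python
import ipaddress
import json


def build_template(vpc_cidr: str, subnet_prefix: int, instance_type: str) -> dict:
    """Build a minimal CloudFormation template: one VPC, one subnet, one instance."""
    vpc_network = ipaddress.ip_network(vpc_cidr)
    # Take the first subnet of the requested size inside the VPC range.
    subnet_cidr = next(vpc_network.subnets(new_prefix=subnet_prefix))

    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative sketch only; not the Quick Start template.",
        "Parameters": {
            # The AMI is supplied at launch time rather than hard-coded.
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
        },
        "Resources": {
            "SketchVpc": {
                "Type": "AWS::EC2::VPC",
                "Properties": {"CidrBlock": str(vpc_network)},
            },
            "SketchSubnet": {
                "Type": "AWS::EC2::Subnet",
                "Properties": {
                    # Ref creates the dependency that CloudFormation resolves.
                    "VpcId": {"Ref": "SketchVpc"},
                    "CidrBlock": str(subnet_cidr),
                },
            },
            "SketchInstance": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": instance_type,
                    "SubnetId": {"Ref": "SketchSubnet"},
                },
            },
        },
    }


# Example values are assumptions for illustration.
template = build_template("10.0.0.0/16", 24, "m4.xlarge")

if __name__ == "__main__":
    print(json.dumps(template, indent=2))
```

The Ref entries are what let CloudFormation work out the creation order: the subnet refers to the VPC and the instance refers to the subnet, so the resources are created in that sequence without any ordering being spelled out by hand.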