Use Hue with a Remote Database in Amazon RDS

By default, Hue user information and query histories are stored in a local MySQL database on the master node. However, you can create one or more Hue-enabled clusters using a configuration stored in Amazon S3 and a MySQL database in Amazon RDS. This allows you to persist user information and query history created by Hue without keeping your Amazon EMR cluster running. Because the configuration file contains the database password, we recommend storing it in Amazon S3 with server-side encryption.

First create the remote database for Hue.

To create the external MySQL database

  1. Open the Amazon RDS console at https://console.aws.amazon.com/rds/.

  2. Click Launch a DB Instance.

  3. Choose MySQL and click Select.

  4. Leave the default selection of Multi-AZ Deployment and Provisioned IOPS Storage and click Next.

  5. Leave the Instance Specifications at their defaults, specify Settings, and click Next.

  6. On the Configure Advanced Settings page, choose an appropriate security group and database name. The security group must at least allow inbound TCP access on port 3306 from the master node of your cluster. If you have not created your cluster yet, you can allow all hosts to connect to port 3306 and then tighten the security group after you have launched the cluster. Click Launch DB Instance.

  7. From the RDS Dashboard, click Instances and select the instance you have just created. When your database is available, open a text editor and note the following information: dbname, username, password, and RDS instance hostname. You will use this information when you create and configure your cluster.
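Step 6 depends on the security group allowing inbound TCP traffic on port 3306 from the master node. One way to check this, once both the cluster and the database exist, is a plain TCP connection test run on the master node. The following is a minimal sketch using only the Python standard library; the function name `can_reach` is hypothetical, and it confirms only that the port accepts connections, not that the MySQL credentials are valid.

```python
import socket


def can_reach(host, port=3306, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `can_reach` with the RDS instance hostname you noted in step 7 returns `False` if the security group blocks the connection or the hostname cannot be resolved.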

To specify an external MySQL database for Hue when launching a cluster using the AWS CLI

Use the information you noted when creating your RDS instance to configure hue.ini with a configuration object.

Note

You can create multiple clusters that use the same external database, but those clusters will all share the same query history and user information.

  • Create a cluster with Hue installed, using the external database you created:

    aws emr create-cluster --release-label --applications Name=Hue Name=Spark Name=Hive \
    --instance-type m3.xlarge --instance-count 2 --configurations https://s3.amazonaws.com/mybucket/myfolder/myConfig.json

    Note

    Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace with a caret (^).

    myConfig.json:

    [{
      "Classification": "hue-ini",
      "Properties": {},
      "Configurations": [
        {
          "Classification": "desktop",
          "Properties": {},
          "Configurations": [
            {
              "Classification": "database",
              "Properties": {
                "name": "dbname",
                "user": "username",
                "password": "password",
                "host": "hueinstance.c3b8apyyjyzi.us-east-1.rds.amazonaws.com",
                "port": "3306",
                "engine": "mysql"
              },
              "Configurations": []
            }
          ]
        }
      ]
    }]
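Because the configuration object nests three classifications (hue-ini, desktop, database), it can be easier to generate than to edit by hand. The following is a minimal sketch that builds the same structure from the values you noted for your RDS instance; the function name `build_hue_config` is hypothetical, and the argument values are the placeholders from the example above.

```python
import json


def build_hue_config(name, user, password, host, port="3306", engine="mysql"):
    """Return the nested hue-ini configuration object as a Python list."""
    database = {
        "Classification": "database",
        "Properties": {
            "name": name,
            "user": user,
            "password": password,
            "host": host,
            "port": str(port),  # property values are strings in the example
            "engine": engine,
        },
        "Configurations": [],
    }
    desktop = {
        "Classification": "desktop",
        "Properties": {},
        "Configurations": [database],
    }
    return [{
        "Classification": "hue-ini",
        "Properties": {},
        "Configurations": [desktop],
    }]


# Placeholder values taken from the example configuration.
config = build_hue_config(
    "dbname",
    "username",
    "password",
    "hueinstance.c3b8apyyjyzi.us-east-1.rds.amazonaws.com",
)
text = json.dumps(config, indent=2)
print(text)
```

The printed JSON can be saved as myConfig.json and uploaded to the Amazon S3 location that you pass to --configurations.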

Note

If you have not previously created the default EMR service role and EC2 instance profile, type aws emr create-default-roles to create them before typing the create-cluster subcommand.

For more information on using Amazon EMR commands in the AWS CLI, see http://docs.aws.amazon.com/cli/latest/reference/emr.

Troubleshooting

In the event of Amazon RDS failover

Users may encounter delays when running a query if the Hue database instance is unresponsive or is in the process of failing over. The following facts and guidelines apply to this issue:

  • If you log in to the Amazon RDS console, you can search for failover events. For example, to see whether a failover is in progress or has occurred, look for events such as "Multi-AZ instance failover started" and "Multi-AZ instance failover completed."

  • It takes about 30 seconds for an RDS instance to complete a failover.

  • If you are experiencing longer-than-normal responses for queries in Hue, try to re-execute the query.
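The guidelines above amount to "wait out the failover, then try again." For scripted clients, that can be sketched as a generic retry helper. This is an illustration only: `retry` and `operation` are hypothetical names, `operation` stands in for whatever code resubmits your query, no Hue API is assumed, and the default 30-second wait simply mirrors the approximate failover time mentioned above.

```python
import time


def retry(operation, attempts=3, wait_seconds=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:
            last_error = error
            # Wait before the next try, but not after the final failure.
            if attempt < attempts:
                sleep(wait_seconds)
    # Every attempt failed: surface the most recent error to the caller.
    raise last_error
```

The `sleep` parameter exists so that the wait can be replaced in tests; in normal use the default `time.sleep` applies.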
