Tutorial: Working with Amazon DynamoDB and Apache Hive
In this tutorial, you will launch an Amazon EMR cluster, and then use Apache Hive to process data stored in a DynamoDB table.
Hive is a data warehouse application for Hadoop that allows you to process and analyze data from multiple sources. Hive provides a SQL-like language, HiveQL, that lets you work with data stored locally in the Amazon EMR cluster or in an external data source (such as Amazon DynamoDB).
For more information, see the Hive Tutorial.
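To give a feel for what working with an external data source looks like in HiveQL, here is a minimal sketch. It assumes a DynamoDB table named Orders with attributes OrderId, Customer, and Total. These names, and the Hive table name hive_orders, are hypothetical and are used only for illustration. The storage handler class and the two table properties are the ones that Amazon EMR provides for its DynamoDB connector.

```sql
-- Hypothetical names for illustration only: hive_orders, Orders,
-- OrderId, Customer, Total. Substitute your own table and attributes.

-- Define a Hive table whose rows live in DynamoDB, not in the cluster.
CREATE EXTERNAL TABLE hive_orders (
    order_id  STRING,
    customer  STRING,
    total     DOUBLE
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
    -- Name of the existing DynamoDB table to read from.
    "dynamodb.table.name"     = "Orders",
    -- Maps each Hive column to a DynamoDB attribute (hive_col:DynamoAttr).
    "dynamodb.column.mapping" = "order_id:OrderId,customer:Customer,total:Total"
);

-- Query the DynamoDB data with ordinary SQL-like syntax.
SELECT customer, SUM(total) AS total_spent
FROM hive_orders
GROUP BY customer;
```

Because the table is declared EXTERNAL, dropping it in Hive removes only the Hive definition and leaves the underlying DynamoDB table in place.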
Before You Begin
For this tutorial, you will need the following:
An AWS account. If you do not have one, see Sign Up for AWS.
An SSH (Secure Shell) client. You use the SSH client to connect to the master node of the Amazon EMR cluster and run interactive commands. SSH clients are available by default on most Linux, Unix, and Mac OS X installations. Windows users can download and install the PuTTY client, which supports SSH.