In this tutorial, you launch a long-running Amazon EMR cluster using the console. In addition to the console, Amazon EMR provides a command-line client, a REST-like API, and several SDKs that you can use to launch and manage clusters. For more information about these interfaces, see "What Tools are Available for Amazon EMR?"
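As an illustration of the SDK route, the following is a minimal sketch that builds the keyword arguments for the AWS SDK for Python (boto3) EMR client's `run_job_flow` call, roughly mirroring the cluster launched in this tutorial. It assumes the default roles `EMR_DefaultRole` and `EMR_EC2_DefaultRole` already exist. The cluster name, bucket name, key pair name, and release label are hypothetical placeholders, and instance type availability varies by release and region.

```python
def build_cluster_request(log_bucket: str, key_name: str, release_label: str) -> dict:
    """Return hypothetical keyword arguments for boto3's EMR run_job_flow()."""
    return {
        "Name": "My cluster",                       # hypothetical cluster name
        "LogUri": f"s3://{log_bucket}/logs/",       # cluster logs go to your S3 bucket
        "ReleaseLabel": release_label,              # assumption: choose a release available in your region
        "Applications": [{"Name": "Hive"}, {"Name": "Hue"}],
        "Instances": {
            "MasterInstanceType": "m3.xlarge",
            "SlaveInstanceType": "m3.xlarge",
            "InstanceCount": 3,                     # one master node plus two core nodes
            "Ec2KeyName": key_name,                 # hypothetical EC2 key pair name
            # A long-running cluster stays up after its steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",       # instance profile for the cluster's EC2 instances
        "ServiceRole": "EMR_DefaultRole",           # role that Amazon EMR assumes on your behalf
        "VisibleToAllUsers": True,
    }
```

With credentials configured, you could pass the result to `boto3.client("emr").run_job_flow(**params)`, which returns a response containing the new cluster's `JobFlowId`. Because you must terminate a long-running cluster yourself, remember to clean up afterward.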
After launching the cluster, you run a Hive script to analyze a series of CloudFront web distribution log files. After running the script, you query your data using the Hue web interface.
The AWS service charges incurred by completing this tutorial include the cost of running an Amazon EMR cluster containing three m3.xlarge instances for one hour and the cost of storing log and output data in Amazon S3. The total cost of this tutorial is approximately $1.05, depending on your region. Your actual costs may differ slightly from this estimate.
Service charges vary by region. If you are a new customer within your first year of using AWS, the Amazon S3 storage charges may be waived, provided that you have not already used the capacity allowed in the Free Usage Tier. The Amazon EC2 and Amazon EMR charges resulting from this tutorial are not covered by the Free Usage Tier, but they are minimal.
- Step 1: Create an AWS Account
- Step 2: Create an Amazon S3 Bucket for Your Cluster Logs and Output Data
- Step 3: Launch an Amazon EMR Cluster
- Step 4: Run the Hive Script as a Step
- Step 5: Query Your Data Using Hue
- (Optional) Step 6: Explore Amazon EMR
- (Optional) Step 7: Remove the Resources Used in the Tutorial