Now that you know what Amazon EMR can do, let's walk through a tutorial that uses mapper and
reducer functions to analyze data in a streaming cluster. In this example, you'll use Amazon EMR
to count the frequency of words in a text file. The mapper logic is written as a Python
script, and you'll use the built-in aggregator function provided by Hadoop as
the reducer. Using the Amazon EMR console, you'll launch a cluster of virtual servers
to process the data in a distributed fashion, according to the logic in the Python
script and the built-in aggregator function.
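To make the division of work concrete, here is a minimal sketch of what a word-count mapper for a streaming cluster might look like. It is not necessarily the script this tutorial uses. The function names `map_line` and `run` and the regular expression that defines a "word" are assumptions chosen for illustration. The `LongValueSum:` prefix is the convention that Hadoop's built-in aggregate reducer recognizes: it tells the reducer to sum the integer values emitted for each key.

```python
import io
import re

# Assumed definition of a "word": a run of letters, digits, or apostrophes.
WORD_PATTERN = re.compile(r"[A-Za-z0-9']+")


def map_line(line):
    """Return one tab-separated 'LongValueSum:<word>', '1' record per word."""
    return ["LongValueSum:%s\t1" % word.lower()
            for word in WORD_PATTERN.findall(line)]


def run(stream_in, stream_out):
    """Read text lines from stream_in and write mapper records to stream_out."""
    for line in stream_in:
        for record in map_line(line):
            stream_out.write(record + "\n")


# Local demonstration on in-memory text instead of a real input file.
sample = io.StringIO("The quick fox\nthe lazy dog\n")
result = io.StringIO()
run(sample, result)
print(result.getvalue(), end="")
```

When used as an actual streaming mapper, the script would read from standard input and write to standard output, for example by calling `run(sys.stdin, sys.stdout)` after importing `sys`. The aggregate reducer would then add up the `1` values for each word, so the mapper itself never needs to keep running totals.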
In addition to the console used in this tutorial, Amazon EMR provides a command-line client, a REST-like API set, and several SDKs that you can use to launch and manage clusters. For more information about these interfaces, see What Tools are Available for Amazon EMR?.
For console access, use your IAM user name and password to sign in to the AWS Management Console using the IAM sign-in page. IAM lets you securely control access to AWS services and resources in your AWS account. For more information about creating access keys, see How Do I Get Security Credentials? in the AWS General Reference.
How Much Does it Cost to Run this Tutorial?
The AWS service charges incurred by working through this tutorial are the cost of running an Amazon EMR cluster containing three m1.small instances for one hour. Prices vary by region and by the amount of storage used. If you are a new customer, Amazon S3 storage charges may be waived during your first year of using AWS, provided you have not exceeded the capacity allowed in the Free Usage Tier. Amazon EMR charges are not included in the Free Usage Tier.