Amazon EMR
Developer Guide

How Does Amazon EMR Work?

Amazon EMR is a service you can use to run managed Hadoop clusters on Amazon Web Services. A Hadoop cluster is a set of servers that work together to perform computational tasks by distributing the work and data among the servers. The task might be to analyze data, store data, or move and transform data. By linking several computers together in a cluster, you can run tasks that process or store vast amounts (petabytes) of data.
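The idea of distributing work and data among servers can be illustrated with a small sketch. The following Python example is not Hadoop or Amazon EMR code. It simulates the pattern in a single process, under the assumption that each "server" is just a list of records and the task is a word count. The names split_into_chunks, count_words, and run_on_cluster are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def split_into_chunks(records, num_workers):
    """Deal records round-robin into one chunk per simulated server."""
    return [records[i::num_workers] for i in range(num_workers)]


def count_words(chunk):
    """Work performed independently on one simulated server."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def run_on_cluster(records, num_workers=3):
    """Distribute the records, process each chunk, then merge the partial results."""
    chunks = split_into_chunks(records, num_workers)
    # In a real cluster these calls would run in parallel on separate servers;
    # here they run one after another in the same process.
    partials = [count_words(chunk) for chunk in chunks]
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


records = [
    "hadoop stores data",
    "hadoop moves data",
    "clusters process data",
]
result = run_on_cluster(records)
```

Because each chunk is processed without reference to the others, the merged result does not depend on how many simulated servers share the work. That independence is what lets a real cluster add servers to handle larger data sets.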

When Amazon EMR launches a Hadoop cluster, it runs the cluster on virtual servers provided by Amazon EC2. The version of Hadoop that Amazon EMR installs on the servers has been enhanced to work seamlessly with AWS. This provides several advantages, as described in Amazon EMR Features.

In addition to integrating Hadoop with AWS, Amazon EMR adds some new concepts to distributed processing, such as nodes and steps.