What is Amazon EMR?
This documentation is for versions 4.x and 5.x of Amazon EMR. For information about Amazon EMR AMI versions 2.x and 3.x, see the Amazon EMR Developer Guide (PDF).
Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. By using these frameworks and related open-source projects, such as Apache Hive and Apache Pig, you can process data for analytics purposes and business intelligence workloads. Additionally, you can use Amazon EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
If you are a first-time user of Amazon EMR, we recommend that you begin by reading the following:
What is Amazon EMR? (this section) – This section provides an overview of Amazon EMR functionality and features.
Amazon EMR – This service page provides the Amazon EMR highlights, product details, and pricing information.
Getting Started: Analyzing Big Data with Amazon EMR – This section provides a tutorial on using Amazon EMR to create a sample cluster and run a Hive script as a step.
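To give a rough idea of what "create a sample cluster and run a Hive script as a step" can look like programmatically, the following is a minimal sketch using the AWS SDK for Python (boto3) and its EMR `run_job_flow` call. It is not the tutorial's own procedure. The cluster name, release label, instance types, instance count, and the S3 URIs are illustrative assumptions, the helper names `build_hive_cluster_request` and `launch_cluster` are hypothetical, and the sketch assumes the default roles `EMR_DefaultRole` and `EMR_EC2_DefaultRole` already exist in your account.

```python
def build_hive_cluster_request(script_uri, log_uri, release_label="emr-5.36.0"):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All concrete values here (name, release label, instance types and count)
    are assumptions chosen for illustration, not values from the guide.
    """
    return {
        "Name": "sample-hive-cluster",
        "ReleaseLabel": release_label,
        "LogUri": log_uri,
        "Applications": [{"Name": "Hive"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # Shut the cluster down once the step list is exhausted.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "Run Hive script",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    # command-runner.jar runs the hive-script command on the cluster.
                    "Jar": "command-runner.jar",
                    "Args": [
                        "hive-script",
                        "--run-hive-script",
                        "--args",
                        "-f",
                        script_uri,
                    ],
                },
            }
        ],
        # Assumes the default EMR roles have already been created.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region="us-east-1"):
    """Submit the request to Amazon EMR and return the new cluster ID.

    Requires boto3 and valid AWS credentials; calling this creates
    billable resources.
    """
    import boto3  # imported lazily so the request can be built without the SDK

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]
```

Building the request separately from submitting it lets you inspect or log the configuration before any resources are created. A call such as `launch_cluster(build_hive_cluster_request("s3://my-bucket/scripts/query.q", "s3://my-bucket/logs/"))` would then start the cluster, where both bucket paths are placeholders.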