Amazon EMR
Management Guide

Amazon EBS Volumes

An Amazon EBS volume is a durable, block-level storage device that you can attach to a single EC2 instance. You can use EBS volumes as primary storage for data that requires frequent updates, such as HDFS data. After a volume is attached to an instance, you can use it like any other ephemeral device. The added benefit is that you have more capacity options than the fixed instance storage that comes with an instance type. Amazon EBS provides the following volume types: General Purpose (SSD), Provisioned IOPS (SSD), Throughput Optimized (HDD), Cold (HDD), and Magnetic. These types differ in performance characteristics and price, so you can tailor storage to the analytic and business needs of your applications. For example, some applications need to spill data to disk, while others can work safely in memory or with Amazon S3.
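The AWS CLI and the Amazon EMR API refer to these volume types by short EBS API names, as in `VolumeType=io1` in the CLI examples in this section. The following minimal sketch maps each name used above to its EBS API value. The function name `ebs_api_volume_type` is hypothetical. Whether a particular Amazon EMR release accepts `st1` and `sc1` is an assumption that you should verify for your release.

```shell
#!/bin/sh
# Map the EBS volume type names used in this guide to EBS API VolumeType values.
# Hypothetical helper; unknown names are reported on stderr and return status 1.
ebs_api_volume_type() {
  case "$1" in
    "General Purpose (SSD)")      echo "gp2" ;;
    "Provisioned IOPS (SSD)")     echo "io1" ;;
    "Throughput Optimized (HDD)") echo "st1" ;;
    "Cold (HDD)")                 echo "sc1" ;;
    "Magnetic")                   echo "standard" ;;
    *) echo "unknown volume type: $1" >&2; return 1 ;;
  esac
}

# Print the API value for each volume type named above.
for t in "General Purpose (SSD)" "Provisioned IOPS (SSD)" \
         "Throughput Optimized (HDD)" "Cold (HDD)" "Magnetic"; do
  printf '%-28s -> %s\n' "$t" "$(ebs_api_volume_type "$t")"
done
```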


Amazon EBS storage is available for Amazon EMR releases 4.0 and later.

Amazon EMR allows you to use EBS storage for your clusters; however, Amazon EBS works differently with EMR cluster instances than it does with regular Amazon EC2 instances.

For example, EBS volumes attached to EMR clusters are ephemeral. The volumes are deleted when the cluster or instance terminates (for example, when you shrink an instance group), so do not expect data on them to persist. Although data on these volumes is ephemeral, HDFS data on them may be replicated, depending on the number and specialization of nodes in the cluster. EBS volumes that you add are mounted as additional volumes. They are not part of the boot volume and represent additional storage. YARN is configured to use all of the additional volumes, but you are responsible for allocating the additional volumes as local storage (for example, for local log files).
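To see which directories YARN uses for local storage on a node, you can read the `yarn.nodemanager.local-dirs` property from `yarn-site.xml`. The following is a minimal sketch. It assumes that the file is at `/etc/hadoop/conf/yarn-site.xml` on EMR nodes and that each `<value>` element appears on the same line as its `<name>` element or on the line after it. The helper name and the sample directory values are hypothetical.

```shell
#!/bin/sh
# Print the directories listed in yarn.nodemanager.local-dirs, one per line.
# Assumption: the default path below is where yarn-site.xml lives on EMR nodes,
# and <value> is on the same line as <name> or on the line after it.
yarn_local_dirs() {
  site_xml="${1:-/etc/hadoop/conf/yarn-site.xml}"
  grep -A1 '<name>yarn.nodemanager.local-dirs</name>' "$site_xml" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' \
    | tr ',' '\n'
}

# Usage with a small sample file; the directory values are hypothetical.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/mnt/yarn,/mnt1/yarn</value>
  </property>
</configuration>
EOF
yarn_local_dirs "$sample"
rm -f "$sample"
```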

You can attach EBS volumes to instances only at cluster startup, or when you add a new task instance group, at which point you can add EBS volumes to that group. If an instance in an EMR cluster fails, Amazon EMR replaces both the instance and its attached EBS volumes with new ones. Consequently, if you manually detach an EBS volume, Amazon EMR treats that as a failure and replaces both the instance and the volume.

Other caveats for using Amazon EBS with EMR clusters are:

  • You cannot snapshot an EBS volume used with Amazon EMR.

  • EBS-encrypted volumes are not supported.

  • If you apply tags using the Amazon EMR web service API, the tags are also applied to the EBS volumes.

  • There is a limit of 25 volumes per instance.
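When you plan a configuration, you can check the 25-volume limit before you call the CLI by summing the `VolumesPerInstance` values of an instance group's `EbsBlockDeviceConfigs` entries. The following is a minimal sketch with a hypothetical helper name. It counts only the EBS volumes that you request; this page does not say whether the boot volume counts toward the limit.

```shell
#!/bin/sh
# Sum the VolumesPerInstance values planned for one instance group and fail
# if the total exceeds the limit of 25 volumes per instance.
MAX_EBS_VOLUMES_PER_INSTANCE=25

check_ebs_volume_count() {
  total=0
  for n in "$@"; do
    total=$((total + n))
  done
  if [ "$total" -gt "$MAX_EBS_VOLUMES_PER_INSTANCE" ]; then
    echo "requested $total EBS volumes per instance; the limit is $MAX_EBS_VOLUMES_PER_INSTANCE" >&2
    return 1
  fi
  echo "$total"
}

# One entry with 3 volumes and one with 4 volumes per instance; prints 7.
check_ebs_volume_count 3 4
```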

Adding Amazon EBS Volumes

You can add EBS volumes when you create a cluster or when you add task instance groups to a running cluster.


To modify or get information about EBS volumes attached to EMR instances, your service role must have the following permissions: ec2:DeleteVolume, ec2:DescribeVolumeStatus, ec2:DescribeVolumes, and ec2:DetachVolume. If your service role uses the Amazon EMR managed policy, these permissions are already included and you do not need to do anything.
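If your service role does not use the managed policy, you can grant the four permissions above with an IAM policy document. The following is a minimal sketch. It assumes that `"Resource": "*"` is acceptable in your account. `EMR_DefaultRole` is the service role that `--use-default-roles` uses. The helper name and the policy name `EmrEbsVolumeAccess` are hypothetical.

```shell
#!/bin/sh
# Write an IAM policy document granting the EC2 volume actions listed above.
# Assumption: "Resource": "*" is used for simplicity; narrow it if required.
write_ebs_service_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DeleteVolume",
        "ec2:DescribeVolumeStatus",
        "ec2:DescribeVolumes",
        "ec2:DetachVolume"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

policy_file="$(mktemp)"
write_ebs_service_policy "$policy_file"
cat "$policy_file"

# Attach it as an inline policy (the policy name is hypothetical):
# aws iam put-role-policy --role-name EMR_DefaultRole \
#   --policy-name EmrEbsVolumeAccess \
#   --policy-document "file://$policy_file"
```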

To add EBS volumes using the console

  1. Open the Amazon EMR console at

  2. Choose Create cluster.

  3. Choose Go to advanced options.

  4. Select the applications to install in Software Configuration and choose Next.

  5. In Hardware Configuration, select a node type and choose Add EBS volumes in the Storage per instance column.

  6. Accept the defaults or enter a size, number of volumes per instance, and IOPS rate.


    The Storage per instance value includes the 10 GiB dedicated to the boot volume of the instance. For EBS-only instance types, the default value is 32 GiB.


    (Optional) Select EBS-Optimized instance to make the instances in the instance group EBS-optimized, which provides dedicated capacity for Amazon EBS I/O. For more information, see Amazon EBS–Optimized Instances in the Amazon EC2 User Guide for Linux Instances.

  7. (Optional) Add more EBS volumes to this instance group by choosing Add EBS volumes.
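Because the Storage per instance value in step 6 includes the 10 GiB boot volume, you can estimate it for a single volume specification with simple arithmetic. The following is a minimal sketch. It assumes that the console adds the boot volume to size × volumes per instance and nothing else. The helper name is hypothetical.

```shell
#!/bin/sh
# Estimate the console's "Storage per instance" value for one EBS volume
# specification: the 10 GiB boot volume plus size x volumes per instance.
BOOT_VOLUME_GIB=10

storage_per_instance_gib() {
  size_gib="$1"
  volumes_per_instance="$2"
  echo $((BOOT_VOLUME_GIB + size_gib * volumes_per_instance))
}

# Three 100 GiB volumes per instance: 10 + 3 * 100 = 310
storage_per_instance_gib 100 3
```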


You must embed the EBS arguments to --instance-groups inline as key-value structures, and enclose each instance group's arguments in single quotes, as shown in the examples.
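The quoting matters because the inline structures contain braces, brackets, and commas. Without quotes, a shell such as bash would apply brace expansion to `{VolumeType=io1,SizeInGB=100}` and split the value into several words. The following minimal sketch builds one `--instance-groups` value as a string, using the key names from the examples in this section. The helper `ebs_instance_group` and its positional parameters are hypothetical. The `aws` call is shown only as a comment.

```shell
#!/bin/sh
# Build one inline --instance-groups value with a single EBS volume spec.
# Hypothetical helper; arguments: group type, instance count, instance type,
# volume type, size in GiB, volumes per instance, and optional IOPS.
ebs_instance_group() {
  spec="VolumeType=$4,SizeInGB=$5"
  # Add Iops only when a value is given (the examples pass it for io1 volumes).
  if [ -n "${7:-}" ]; then
    spec="$spec,Iops=$7"
  fi
  printf 'InstanceGroupType=%s,InstanceCount=%s,InstanceType=%s,EbsConfiguration={EbsBlockDeviceConfigs=[{VolumeSpecification={%s},VolumesPerInstance=%s}]}\n' \
    "$1" "$2" "$3" "$spec" "$6"
}

core_group="$(ebs_instance_group CORE 2 d2.xlarge io1 100 3 100)"
echo "$core_group"

# Keep the value quoted when passing it to the CLI, just as the examples
# wrap each instance group in single quotes:
# aws emr create-cluster --release-label emr-4.3.0 --use-default-roles \
#   --ec2-attributes KeyName=myKey \
#   --instance-groups 'InstanceGroupType=MASTER,InstanceCount=1,InstanceType=d2.xlarge' \
#   "$core_group"
```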

To add EBS volumes using the AWS CLI during cluster creation

  • Use the following AWS CLI command to create a cluster with instance groups that have EBS volumes attached:

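    # The CORE instance group's InstanceCount and InstanceType are example values.
    aws emr create-cluster --release-label emr-4.3.0 --use-default-roles \
    --ec2-attributes KeyName=myKey \
    --instance-groups 'InstanceGroupType=MASTER,InstanceCount=1,InstanceType=d2.xlarge' \
    'InstanceGroupType=CORE,InstanceCount=2,InstanceType=d2.xlarge,EbsConfiguration={EbsBlockDeviceConfigs=[{VolumeSpecification={VolumeType=io1,SizeInGB=100,Iops=100},VolumesPerInstance=3}]}'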
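As an alternative to inline quoting, you can write the instance groups as JSON and pass the file with `file://`. This assumes that your AWS CLI version accepts JSON input for `--instance-groups`. The following minimal sketch writes such a file. The io1 volume settings (100 GiB, 100 IOPS, three volumes per instance) come from the example above. The CORE group's count and instance type are illustrative assumptions.

```shell
#!/bin/sh
# Write the instance groups as a JSON array for use with --instance-groups file://...
# The CORE group's InstanceCount and InstanceType are example values.
groups_file="$(mktemp)"
cat > "$groups_file" <<'EOF'
[
  {
    "InstanceGroupType": "MASTER",
    "InstanceCount": 1,
    "InstanceType": "d2.xlarge"
  },
  {
    "InstanceGroupType": "CORE",
    "InstanceCount": 2,
    "InstanceType": "d2.xlarge",
    "EbsConfiguration": {
      "EbsBlockDeviceConfigs": [
        {
          "VolumeSpecification": {"VolumeType": "io1", "SizeInGB": 100, "Iops": 100},
          "VolumesPerInstance": 3
        }
      ]
    }
  }
]
EOF
echo "wrote $groups_file"

# aws emr create-cluster --release-label emr-4.3.0 --use-default-roles \
#   --ec2-attributes KeyName=myKey --instance-groups "file://$groups_file"
```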

To add EBS volumes to a running cluster using the AWS CLI

  • Use the following AWS CLI command to add a task instance group with attached EBS volumes:

    aws emr add-instance-groups --cluster-id j-clusterId --instance-groups 'InstanceGroupType=TASK, InstanceCount=1, \
    InstanceType=m3.xlarge, EbsConfiguration={EbsOptimized=true, EbsBlockDeviceConfigs=[{VolumeSpecification={VolumeType=io1, SizeInGB=10, Iops=100}},\