EFA basics Supported interfaces and libraries Supported instance types Supported operating systems EFA limitations EFA pricing

Elastic Fabric Adapter for HPC and ML workloads on Amazon EC2

An Elastic Fabric Adapter (EFA) is a network device that you can attach to your Amazon EC2 instance to accelerate High Performance Computing (HPC) and machine learning applications. EFA enables you to achieve the application performance of an on-premises HPC cluster, with the scalability, flexibility, and elasticity provided by the AWS Cloud.

EFAs provide lower and more consistent latency and higher throughput than the TCP transport traditionally used in cloud-based HPC systems. It enhances the performance of inter-instance communication that is critical for scaling HPC and machine learning applications. It is optimized to work on the existing AWS network infrastructure and it can scale depending on application requirements.

EFAs integrate with Libfabric 1.7.0 and later and it supports Open MPI 5 and later and Intel MPI 2019 Update 5 and later for HPC applications, and Nvidia Collective Communications Library (NCCL) for machine learning applications.

Note

The OS-bypass capabilities of EFAs are not supported on Windows instances. If you attach an EFA to a Windows instance, the instance functions as an Elastic Network Adapter, without the added EFA capabilities.

EFA basics

An EFA is an Elastic Network Adapter (ENA) with added capabilities. It provides all of the functionality of an ENA, with an additional OS-bypass functionality. OS-bypass is an access model that allows HPC and machine learning applications to communicate directly with the network interface hardware to provide low-latency, reliable transport functionality.

Contrasting a traditional HPC software stack with one that uses an EFA.

Traditionally, HPC applications use the Message Passing Interface (MPI) to interface with the system’s network transport. In the AWS Cloud, this has meant that applications interface with MPI, which then uses the operating system's TCP/IP stack and the ENA device driver to enable network communication between instances.

With an EFA, HPC applications use MPI or NCCL to interface with the Libfabric API. The Libfabric API bypasses the operating system kernel and communicates directly with the EFA device to put packets on the network. This reduces overhead and enables the HPC application to run more efficiently.

Note

Libfabric is a core component of the OpenFabrics Interfaces (OFI) framework, which defines and exports the user-space API of OFI. For more information, see the Libfabric OpenFabrics website.

Differences between EFAs and ENAs

Elastic Network Adapters (ENAs) provide traditional IP networking features that are required to support VPC networking. EFAs provide all of the same traditional IP networking features as ENAs, and they also support OS-bypass capabilities. OS-bypass enables HPC and machine learning applications to bypass the operating system kernel and to communicate directly with the EFA device.

Supported interfaces and libraries

EFAs support the following interfaces and libraries:

Open MPI 5 and later
Open MPI 4.0 or newer is preferred for Graviton
Intel MPI 2019 Update 5 and later
NVIDIA Collective Communications Library (NCCL) 2.4.2 and later

Supported instance types

The following instance types support EFAs:

General purpose: m5dn.24xlarge | m5dn.metal | m5n.24xlarge | m5n.metal | m5zn.12xlarge | m5zn.metal | m6a.48xlarge | m6a.metal | m6i.32xlarge | m6i.metal | m6id.32xlarge | m6id.metal | m6idn.32xlarge | m6idn.metal | m6in.32xlarge | m6in.metal | m7a.48xlarge | m7a.metal-48xl | m7g.16xlarge | m7g.metal | m7gd.16xlarge | m7gd.metal | m7i.48xlarge | m7i.metal-48xl
Compute optimized: c5n.9xlarge | c5n.18xlarge | c5n.metal | c6a.48xlarge | c6a.metal | c6gn.16xlarge | c6i.32xlarge | c6i.metal | c6id.32xlarge | c6id.metal | c6in.32xlarge | c6in.metal | c7a.48xlarge | c7a.metal-48xl | c7g.16xlarge | c7g.metal | c7gd.16xlarge | c7gd.metal | c7gn.16xlarge | c7gn.metal | c7i.48xlarge | c7i.metal-48xl
Memory optimized: r5dn.24xlarge | r5dn.metal | r5n.24xlarge | r5n.metal | r6a.48xlarge | r6a.metal | r6i.32xlarge | r6i.metal | r6idn.32xlarge | r6idn.metal | r6in.32xlarge | r6in.metal | r6id.32xlarge | r6id.metal | r7a.48xlarge | r7a.metal-48xl | r7g.16xlarge | r7g.metal | r7gd.16xlarge | r7gd.metal | r7i.48xlarge | r7i.metal-48xl | r7iz.32xlarge | r7iz.metal-32xl | r8g.24xlarge | r8g.48xlarge | r8g.metal-24xl | r8g.metal-48xl | u7i-12tb.224xlarge | u7in-16tb.224xlarge | u7in-24tb.224xlarge | u7in-32tb.224xlarge | x2idn.32xlarge | x2idn.metal | x2iedn.32xlarge | x2iedn.metal | x2iezn.12xlarge | x2iezn.metal | x8g.24xlarge | x8g.48xlarge | x8g.metal-24xl | x8g.metal-48xl
Storage optimized: i3en.12xlarge | i3en.24xlarge | i3en.metal | i4g.16xlarge | i4i.32xlarge | i4i.metal | im4gn.16xlarge
Accelerated computing: dl1.24xlarge | dl2q.24xlarge | g4dn.8xlarge | g4dn.12xlarge | g4dn.16xlarge | g4dn.metal | g5.8xlarge | g5.12xlarge | g5.16xlarge | g5.24xlarge | g5.48xlarge | g6.8xlarge | g6.12xlarge | g6.16xlarge | g6.24xlarge | g6.48xlarge | g6e.8xlarge | g6e.12xlarge | g6e.16xlarge | g6e.24xlarge | g6e.48xlarge | gr6.8xlarge | inf1.24xlarge | p3dn.24xlarge | p4d.24xlarge | p4de.24xlarge | p5.48xlarge | p5e.48xlarge | trn1.32xlarge | trn1n.32xlarge | vt1.24xlarge
High-performance computing: hpc6a.48xlarge | hpc6id.32xlarge | hpc7a.12xlarge | hpc7a.24xlarge | hpc7a.48xlarge | hpc7a.96xlarge | hpc7g.4xlarge | hpc7g.8xlarge | hpc7g.16xlarge

To see the available instance types that support EFAs in a specific Region

The available instance types vary by Region. To see the available instance types that support EFAs in a Region, use the describe-instance-types command with the --region parameter. Include the --filters parameter to scope the results to the instance types that support EFA and the --query parameter to scope the output to the value of InstanceType.


aws ec2 describe-instance-types  --region us-east-1  --filters Name=network-info.efa-supported,Values=true  --query "InstanceTypes[*].[InstanceType]"  --output text | sort

Supported operating systems

Operating system support differs depending on the processor type. The following table shows the supported operating systems.

Operating system	Intel/AMD (`x86_64`) instance types	AWS Graviton (`arm64`) instance types
Amazon Linux 2023	✓	✓
Amazon Linux 2	✓	✓
RHEL 8 and 9	✓	✓
Debian 10 and 11	✓	✓
Rocky Linux 8 and 9	✓	✓
Ubuntu 20.04, 22.04, and 24.04	✓	✓
SUSE Linux Enterprise 15 SP2 and later	✓	✓
OpenSUSE Leap 15.5 and later	✓

Note

Ubuntu 20.04 supports peer direct support when used with dl1.24xlarge instances.

EFA limitations

EFAs have the following limitations:

All P4d and P5 instance types support NVIDIA GPUDirect Remote Direct Memory Access (RDMA).
EFA traffic between P4d/P4de/DL1 instances and other instance types is currently not supported.
Instance types that support multiple network cards can be configured with one EFA per network card. All other supported instance types support only one EFA per instance.
For c7g.16xlarge, m7g.16xlarge and r7g.16xlarge Dedicated Instances and Dedicated Hosts are not supported when an EFA is attached.
EFA OS-bypass traffic can't cross Availability Zones, VPCs, or AWS accounts. In other words, EFA OS-bypass traffic can't flow from one Availability Zone, VPC (with or without a VPC peering connection), or AWS account to another. This does not apply to normal IP traffic from the EFA.
EFA OS-bypass traffic can't be sent across subnets in a Local Zone.
EFA OS-bypass traffic is not routable. Normal IP traffic from the EFA remains routable.
The EFA must be a member of a security group that allows all inbound and outbound traffic to and from the security group itself.
EFA is not supported on Windows instances.
EFA is not supported on AWS Outposts.

EFA pricing

EFA is available as an optional Amazon EC2 networking feature that you can enable on any supported instance at no additional cost.

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Optimize network performance on Windows

EFA on accelerated instances

Elastic Fabric Adapter for HPC and ML workloads on Amazon EC2

Note

Contents

EFA basics

Note

Differences between EFAs and ENAs

Supported interfaces and libraries

Supported instance types

To see the available instance types that support EFAs in a specific Region

Supported operating systems

Note

EFA limitations

EFA pricing