Troubleshoot the Elastic Network Adapter (ENA) Windows driver - Amazon Elastic Compute Cloud

The Elastic Network Adapter (ENA) is designed to improve operating system health and to reduce unexpected hardware behavior or failures that can disrupt the operation of your Windows instance. The ENA architecture keeps device or driver failures as transparent to the operating system as possible.

This topic provides troubleshooting information for the ENA Windows driver.

Can't connect

If you are unable to connect to your instance, see Capture a screenshot of an unreachable instance.

Note

You can also connect to the instance through AWS Systems Manager Session Manager. However, this requires prior configuration. For more information, see Session Manager in the AWS Systems Manager User Guide.

Collect diagnostic information on the instance

The steps to open Windows operating system (OS) tools vary, depending on what version of the OS is installed on your instance. In the following sections, we use the Run dialog to open the tools, which works the same across all OS versions. However, you can access these tools using any method that you prefer.

Access the Run dialog
  • Using the Windows logo key combination: Windows + R

  • Using the search bar:

    • Enter run in the search bar.

    • Select the Run application from the search results.

Some steps require the context menu to access properties or context-sensitive actions. There are several ways to do this, depending on your OS version and hardware.

Access the context menu
  • Using your mouse: right-click an item to bring up its context menu.

  • Using your keyboard:

    • Depending on your OS version, use Shift + F10, or Ctrl + Shift + F10.

    • If you have the context key on your keyboard (three horizontal lines in a box), select the item you want and then press the context key.

If you can connect to your instance, use the following techniques to gather diagnostic information for troubleshooting.

Check ENA device status

To check the status of your ENA Windows driver using the Windows Device Manager, follow these steps:

  1. Open the Run dialog using one of the methods described in the preceding section.

  2. To open the Windows Device Manager, enter devmgmt.msc in the Run box.

  3. Choose OK. This opens the Device Manager window.

  4. Select the arrow to the left of Network adapters to expand the list.

  5. Choose the name, or open the context menu for the Amazon Elastic Network Adapter, and then choose Properties. This opens the Amazon Elastic Network Adapter Properties dialog.

  6. Verify that the message in the General tab says "This device is working properly."
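The same check can be sketched in PowerShell, which helps when you are connected through a remote shell rather than a desktop session. This is a minimal sketch, not an official procedure. It assumes that the adapter description contains the word "Elastic" (as in "Amazon Elastic Network Adapter") and that the PnpDevice module is available, which is generally the case on Windows Server 2016 and later.

```powershell
# Assumption: the adapter's description contains "Elastic", as in
# "Amazon Elastic Network Adapter".
$enaAdapters = Get-NetAdapter | Where-Object { $_.InterfaceDescription -like "*Elastic*" }

# Link state and installed driver version for each ENA adapter.
$enaAdapters | Format-List Name, InterfaceDescription, Status, LinkSpeed, DriverVersion

# Plug and Play view of the same device. A Status of OK with no Problem
# code generally means that Windows reports no issue with the device.
Get-PnpDevice -Class Net -FriendlyName "*Elastic*" |
    Format-List FriendlyName, Status, Problem, InstanceId
```

If the Status is anything other than OK, the Problem value gives the Device Manager error code to look up.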

Investigate driver event messages

To review ENA Windows driver event logs using the Windows Event Viewer, follow these steps:

  1. Open the Run dialog using one of the methods described in the preceding section.

  2. To open the Windows Event Viewer, enter eventvwr.msc in the Run box.

  3. Choose OK. This opens the Event Viewer window.

  4. Expand the Windows Logs menu, and then choose System.

  5. Under Actions, in the top-right panel, choose Filter Current Log. This displays the filtering dialog.

  6. In the Event sources box, enter ena. This limits results to events that were generated by the ENA Windows driver.

  7. Choose OK. This shows filtered event log results in the detail sections of the window.

  8. To drill down into the details, select an event message from the list.
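As an alternative to the Event Viewer steps, the following sketch queries the System log from PowerShell with Get-WinEvent. It assumes that the provider name is `ena`, matching the value entered in the Event sources box. The seven-day window and the limit of 50 events are arbitrary values chosen for illustration.

```powershell
# Assumption: "ena" is the provider (event source) name, matching the value
# entered in the Event sources box in Event Viewer.
$filter = @{
    LogName      = 'System'
    ProviderName = 'ena'
    StartTime    = (Get-Date).AddDays(-7)   # hypothetical 7-day window
}

# Get-WinEvent raises an error when nothing matches, so errors are silenced
# and an empty result simply means that no ENA events were found.
Get-WinEvent -FilterHashtable $filter -MaxEvents 50 -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, LevelDisplayName, Message |
    Format-Table -AutoSize -Wrap
```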

The following example shows an ENA driver event in the Windows Event Viewer system events list:


Example: ENA driver event shown in the Windows Event Viewer system messages list.

Event message summary

The following table shows event messages that the ENA Windows driver generates.

Event ID | ENA driver event description | Type
5001 | Hardware is out of resources | Error
5002 | Adapter has detected a hardware error | Error
5005 | Adapter has timed out on NDIS operation that did not complete in a timely manner | Error
5032 | Adapter has failed to reset the device | Error
5200 | Adapter has been initialized | Informational
5201 | Adapter has been halted | Informational
5202 | Adapter has been paused | Informational
5203 | Adapter has been restarted | Informational
5204 | Adapter has been shut down | Informational
5205 | Adapter has been reset | Error
5206 | Adapter has been surprise removed | Error
5208 | Adapter initialization routine has failed | Error
5210 | Adapter has encountered and successfully recovered an internal issue | Error

Review performance metrics

The ENA Windows driver publishes network performance metrics from the instances where metrics are enabled. You can view and enable metrics on the instance using the native Performance Monitor application. For more information about the metrics that the ENA Windows driver produces, see Monitor network performance for your EC2 instance.

On instances where ENA metrics are enabled, and the Amazon CloudWatch agent is installed, CloudWatch collects the metrics that are associated with the counters in Windows Performance Monitor, as well as some advanced metrics for ENA. These metrics are collected in addition to the metrics enabled by default on EC2 instances. For more information about the metrics, see Metrics collected by the CloudWatch agent in the Amazon CloudWatch User Guide.

Note

Performance metrics are available for ENA driver versions 2.4.0 and later, and for version 2.2.3. ENA driver version 2.2.4 was rolled back due to potential performance degradation on sixth generation EC2 instances. We recommend that you upgrade to the current version of the driver to ensure that you have the latest updates.

Some of the ways that you can use performance metrics include:

  • Troubleshoot instance performance issues.

  • Choose the right instance size for a workload.

  • Proactively plan scaling activities.

  • Benchmark applications to determine if they maximize the performance available on an instance.

Refresh rate

By default, the driver refreshes metrics using a 1-second interval. However, the application that retrieves the metrics might use a different interval for polling. You can change the refresh interval in Device Manager, using the advanced properties for the driver.

To change the metrics refresh interval for the ENA Windows driver, follow these steps:

  1. Open the Run dialog using one of the methods described in the preceding section.

  2. To open the Windows Device Manager, enter devmgmt.msc in the Run box.

  3. Choose OK. This opens the Device Manager window.

  4. Select the arrow to the left of Network adapters to expand the list.

  5. Choose the name, or open the context menu for the Amazon Elastic Network Adapter, and then choose Properties. This opens the Amazon Elastic Network Adapter Properties dialog.

  6. Open the Advanced tab in the pop-up window.

  7. From the Property list, choose Metrics Refresh Interval to change the value.

  8. When you are done, choose OK.
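The same property can also be inspected from PowerShell. The sketch below assumes that the advanced property's display name is exactly "Metrics Refresh Interval", as shown on the Advanced tab, and that the adapter description contains "Elastic". The `$newValue` variable is a placeholder, not a recommended setting.

```powershell
$ena = Get-NetAdapter | Where-Object { $_.InterfaceDescription -like "*Elastic*" }

# Show the current value and the values that the driver accepts.
# Assumption: the display name matches the "Metrics Refresh Interval" entry
# shown on the Advanced tab.
Get-NetAdapterAdvancedProperty -Name $ena.Name -DisplayName "Metrics Refresh Interval" |
    Format-List DisplayName, DisplayValue, ValidDisplayValues

# To change the interval, pass one of the ValidDisplayValues reported above.
# $newValue is a placeholder. The line is commented out on purpose.
# Set-NetAdapterAdvancedProperty -Name $ena.Name -DisplayName "Metrics Refresh Interval" -DisplayValue $newValue
```

Changing an advanced property typically restarts the adapter, so expect a brief interruption on that interface.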

ENA adapter reset

The reset process starts when the ENA Windows driver detects an error on an adapter and marks the adapter as unhealthy. The driver cannot reset itself, so it depends on the operating system to check the adapter health status and call the reset handler for the ENA Windows driver. The reset process might cause a brief period of traffic loss. However, TCP connections should be able to recover.

The ENA adapter might also indirectly request a device reset procedure, by failing to send a keep-alive notification. For example, if the ENA adapter reaches an unknown state after loading an irrecoverable configuration, it might stop sending keep-alive notifications.

Common causes for ENA adapter reset
  • Keep-alive messages are missing

    The ENA adapter posts keep-alive events at a fixed rate (usually once every second). The ENA Windows driver implements a watchdog mechanism, which periodically checks for the presence of these keep-alive messages. If it detects one or more new messages since the last time it checked, it records a successful outcome. Otherwise, the driver concludes that the device experienced a failure, and initiates a reset sequence.

  • Packets are stuck in transmit queues

    The ENA adapter verifies that packets are flowing through the transmit queues as expected. The ENA Windows driver detects if packets are getting stuck, and initiates a reset sequence if they are.

  • Read timeout for Memory Mapped I/O (MMIO) registers

    To limit memory mapped I/O (MMIO) read operations, the ENA Windows driver accesses MMIO registers only during initialization and reset processes. If the driver detects a timeout, it takes one of the following actions, depending on what process was running:

    • If a timeout is detected during initialization, the driver fails the initialization flow, and Windows Device Manager displays a yellow exclamation mark by the ENA adapter.

    • If a timeout is detected during reset, it fails the flow. The OS then initiates a surprise removal of the ENA adapter, and recovers it by stopping and starting the adapter that was removed. For more information about surprise removal of a network interface card (NIC), see Handling the Surprise Removal of a NIC in the Microsoft Windows Hardware Developer documentation.

Troubleshooting scenarios

The following scenarios can help you troubleshoot issues that you might experience with the ENA Windows driver. We recommend that you start with upgrading your ENA driver, if you don't have the latest version. To find the latest driver for your Windows OS version, see Amazon ENA driver versions.

Description

After you go through the steps to install a specific version of the ENA driver, the Windows Device Manager shows that Windows installed a different version of the ENA driver.

Cause

When you run the install for a driver package, Windows ranks all of the driver packages that are valid for the given device in the local Driver Store before it begins. Then it selects the package with the lowest rank value as the best match. This can be different from the package that you intended to install. For more information about the device driver package selection process, see How Windows selects a driver package for a device on the Microsoft documentation website.

Solution

To ensure that Windows installs your chosen driver package version, use the PnPUtil command line tool to remove the driver packages in the Driver Store that Windows ranks as a better match (those with a lower rank value than your chosen package).
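A minimal sketch of the PnPUtil workflow follows. It uses the `/enum-drivers` and `/delete-driver` options that are available on Windows 10 version 1607, Windows Server 2016, and later. Older releases use a different syntax. The published name `oem42.inf` is hypothetical, and the delete command is commented out so that nothing is removed by accident.

```powershell
# List every third-party driver package in the Driver Store. Look for entries
# whose original name is ena.inf and note their published names (oem<N>.inf)
# and driver versions.
pnputil /enum-drivers

# Remove an unwanted package. oem42.inf is a hypothetical published name;
# substitute the value reported by the previous command.
# /uninstall also removes the package from any device that currently uses it.
# pnputil /delete-driver oem42.inf /uninstall
```

Run these commands from an elevated session, and confirm the driver version of each package before you delete it.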

Follow these steps to update the ENA driver:

  1. Connect to your instance and log in as the local administrator.

  2. Open the Device Manager properties window, as described in the Check ENA device status section. This opens the General tab of the Amazon Elastic Network Adapter Properties window.

  3. Open the Driver tab.

  4. Choose Update Driver. This opens the Update Driver Software – Amazon Elastic Network Adapter dialog box.

    1. On the How do you want to search for driver software? page, choose Browse my computer for driver software.

    2. On the Browse for driver software on your computer page, choose Let me pick from a list of device drivers on my computer, located below the search bar.

    3. On the Select the device driver you want to install for this hardware page, choose Have Disk....

    4. In the Install from Disk window, choose Browse..., next to the file location dropdown list.

    5. Navigate to the location where you downloaded the target ENA driver package. Select the file named ena.inf and choose Open.

    6. To start the install, choose OK, and then choose Next.

  5. If the installer doesn’t automatically reboot your instance, run the Restart-Computer PowerShell cmdlet.

    PS C:\> Restart-Computer

Description

The ENA adapter icon in the Device Manager Network adapters section displays a warning sign (a yellow triangle with an exclamation mark inside).

The following example shows an ENA adapter with the warning icon in Windows Device Manager:


Example: ENA adapter with warning icon shown in the Windows Device Manager.

Cause

This device warning is commonly caused by environment issues, which might require more research, and often require a process of elimination to determine the underlying cause. For a full list of device errors, see Device Manager Error Messages in the Microsoft Windows Hardware Developer documentation.

Solution

The solution for this device warning depends on the root cause. The process of elimination described here includes a few basic steps to help identify and resolve the most common issues that might have a simple solution. Additional root cause analysis is required when these steps do not resolve the issue.

Follow these steps to help identify and resolve common issues:

  1. Stop and start the device

    Open the Device Manager properties window, as described in the Check ENA device status section. This opens the General tab of the Amazon Elastic Network Adapter Properties window, where the Device status displays the error code and a short message.

    1. Open the Driver tab.

    2. Choose Disable Device, and respond Yes to the warning message that displays.

    3. Choose Enable Device.

  2. Stop and start the EC2 instance

    If the adapter still shows the warning icon in Device Manager, the next step is to stop and start the EC2 instance. This relaunches the instance on different hardware in most cases.

  3. Investigate possible instance resource issue

    If you have stopped and started your EC2 instance, and the problem persists, this might indicate a resource issue on your instance, such as insufficient memory.

Description

The Windows Event Viewer shows adapter timeout and reset events occurring in combination for ENA adapters. Messages resemble the following examples:

  • Event ID 5007: Amazon Elastic Network Adapter : Timed out during an operation.

  • Event ID 5205: Amazon Elastic Network Adapter : Adapter reset has been started.

Adapter resets cause minimal traffic disruption. Even when there are multiple resets, it would be unusual for them to cause any severe network disruption.

Cause

This sequence of events indicates that the ENA Windows driver initiated a reset for an ENA adapter that was unresponsive. However, the mechanism that the device driver uses to detect this issue is subject to false positives resulting from CPU 0 starvation.

Solution

If this combination of errors happens frequently, check your resource allocations to see where adjustments might be helpful.

  1. Open the Run dialog using one of the methods described in the preceding section.

  2. To open the Windows Resource Monitor, enter resmon in the Run box.

  3. Choose OK. This opens the Resource Monitor window.

  4. Open the CPU tab. Per-CPU usage graphs are shown along the right side of the Resource Monitor window.

  5. Check the usage levels for CPU 0 to see if they are too high.
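To capture CPU 0 usage as numbers rather than reading it from a graph, you can sample the performance counter with Get-Counter. This sketch assumes English counter names (localized Windows installations use translated counter paths) and an instance with 64 or fewer logical processors. The 10-second sampling window is an arbitrary choice.

```powershell
# Sample CPU 0 utilization once per second for 10 seconds.
# Assumption: English counter names; localized installs use translated paths.
$samples = Get-Counter -Counter '\Processor(0)\% Processor Time' -SampleInterval 1 -MaxSamples 10

# Summarize the samples as average and peak utilization (percent).
$samples.CounterSamples |
    Measure-Object -Property CookedValue -Average -Maximum |
    Select-Object Average, Maximum
```

Sustained values close to 100 suggest that CPU 0 is saturated, which is the condition that can trigger false-positive resets.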

We recommend that you configure RSS to exclude CPU 0 for the ENA adapter on larger instance types (more than 16 vCPUs). For smaller instance types, configuring RSS might improve the experience, but due to the lower number of available cores, testing is necessary to ensure that constraining CPU cores does not negatively impact performance.

Use the Set-NetAdapterRss command to configure RSS for your ENA adapter, as shown in the following example.

Set-NetAdapterRss -Name (Get-NetAdapter | Where-Object {$_.InterfaceDescription -like "*Elastic*"}).Name -BaseProcessorGroup 0 -BaseProcessorNumber 1
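After you change the RSS settings, you might want to confirm that they took effect. The following sketch reads them back with Get-NetAdapterRss, again assuming that the adapter description contains "Elastic".

```powershell
$ena = Get-NetAdapter | Where-Object { $_.InterfaceDescription -like "*Elastic*" }

# BaseProcessorNumber should report 1 if CPU 0 was excluded as shown above.
Get-NetAdapterRss -Name $ena.Name |
    Format-List Name, Enabled, BaseProcessorGroup, BaseProcessorNumber, MaxProcessors, NumberOfReceiveQueues
```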

Description

If you migrate to a sixth generation EC2 instance, you might experience reduced performance or ENA attachment failures if you haven't updated your ENA Windows driver version.

Cause

The sixth generation EC2 instance types require the following minimum version of the ENA Windows driver, based on the instance operating system (OS).

Minimum version

Windows Server version | ENA driver version
Windows Server 2008 R2 | 2.2.3 or 2.4.0
Windows Server 2012 and later | 2.2.3 and later
Windows Workstation | 2.2.3 and later

Solution

Before you upgrade to a sixth generation EC2 instance, make sure that the AMI you launch from has compatible drivers based on the instance OS as shown in the previous table. For more information, see What do I need to do before migrating my EC2 instance to a sixth generation instance to make sure that I get maximum network performance? in the AWS re:Post Knowledge Center.
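One way to check the installed driver before you migrate is to compare the version that Get-NetAdapter reports against the minimum. This is a rough sketch. It assumes that the DriverVersion string parses as a dotted version number, and it only tests for 2.2.3 or later, so it does not capture the Windows Server 2008 R2 restriction to 2.2.3 or 2.4.0, or the rolled-back 2.2.4 release.

```powershell
# Minimum version for sixth generation instances, per the minimum version table.
$minimum = [version]'2.2.3'

Get-NetAdapter | Where-Object { $_.InterfaceDescription -like "*Elastic*" } | ForEach-Object {
    # DriverVersion is a string; -as returns $null if it can't be parsed.
    $installed = $_.DriverVersion -as [version]
    [pscustomobject]@{
        Name          = $_.Name
        DriverVersion = $_.DriverVersion
        MeetsMinimum  = ($null -ne $installed) -and ($installed -ge $minimum)
    }
}
```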

Description

The ENA interface is not performing as expected.

Cause

Root cause analysis for performance issues is a process of elimination. There are too many variables involved to name a common cause.

Solution

The first step in your root cause analysis is to review the diagnostic information for the instance that is not performing as expected, to determine if there are errors that might be causing the issue. For more information, see the Collect diagnostic information on the instance section.

You might need to modify the default operating system configuration to achieve maximum network performance on instances with enhanced networking. Some optimizations, such as turning on checksum offloading and enabling RSS, are configured by default in official Windows AMIs. For other optimizations that you can apply to the ENA adapter, see ENA adapter performance adjustments.

We recommend that you proceed with caution, and limit device property adjustments to those that are listed in this section, or to specific changes that are recommended by the AWS support team.

To change ENA adapter properties, follow these steps:

  1. Open the Run dialog using one of the methods described in the preceding section.

  2. To open the Windows Device Manager, enter devmgmt.msc in the Run box.

  3. Choose OK. This opens the Device Manager window.

  4. Select the arrow to the left of Network adapters to expand the list.

  5. Choose the name, or open the context menu for the Amazon Elastic Network Adapter, and then choose Properties. This opens the Amazon Elastic Network Adapter Properties dialog.

  6. To make your changes, open the Advanced tab.

  7. When you're done, choose OK to save your changes.

The following example shows an ENA adapter property in the Windows Device Manager:


Example: ENA adapter property shown in the Windows Device Manager.

ENA adapter performance adjustments

The following table includes properties that can be adjusted to improve performance for the ENA interface.

Receive Buffers

  Description: Controls the number of entries in the software receive queues.

  Default value: 1024

  Adjustment: Can be increased up to a maximum of 8192.

Receive Side Scaling (RSS)

  Description: Enables the efficient distribution of network receive processing across multiple CPUs in multiprocessor systems.

  Default value: Enabled

  Adjustment: You can spread the load across multiple processors. To learn more, see Operating system optimizations.

Maximum Number of RSS Queues

  Description: Sets the maximum number of RSS queues allowed when RSS is enabled.

  Default value: 32

  Adjustment: The number of RSS queues is determined during driver initialization, and includes the following limitations (among others):

    • RSS queue limit set by this property

    • Instance limits (vCPU count)

    • Hardware generation limits (up to 8 RSS queues in ENAv1, and up to 32 RSS queues in ENAv2)

  You can set the value from 1-32, depending on your instance and hardware generation limits. To learn more, see Operating system optimizations.

Jumbo packet

  Description: Enables the use of jumbo ethernet frames (more than 1500 bytes of payload).

  Default value: Disabled (this limits payload to 1500 bytes or less)

  Adjustment: Value can be set up to 9015, which translates to 9001 bytes of payload. This is the maximum payload for jumbo ethernet frames. See Considerations for using jumbo ethernet frames.
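The adapter properties described here can also be listed and changed from PowerShell. The sketch below assumes that the display names match the property names used in this section (for example, "Receive Buffers"). Check the output of the first command, because the names that your driver version exposes might differ. The value 2048 is only an illustration within the documented maximum of 8192, and the change is commented out.

```powershell
$ena = Get-NetAdapter | Where-Object { $_.InterfaceDescription -like "*Elastic*" }

# List every advanced property with its current and permitted values.
Get-NetAdapterAdvancedProperty -Name $ena.Name |
    Format-Table DisplayName, DisplayValue, ValidDisplayValues -AutoSize -Wrap

# Example change, commented out. "Receive Buffers" and 2048 are assumptions;
# use a display name and value reported by the command above.
# Set-NetAdapterAdvancedProperty -Name $ena.Name -DisplayName "Receive Buffers" -DisplayValue 2048
```

Changing a property typically restarts the adapter, so apply changes during a maintenance window if the interface carries production traffic.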

Considerations for using jumbo ethernet frames

Jumbo frames allow more than 1500 bytes of data by increasing the payload size per packet, which increases the percentage of the packet that is not packet overhead. Fewer packets are needed to send the same amount of usable data. However, traffic is limited to a maximum MTU of 1500 in the following cases:

  • Traffic outside of a given AWS Region for EC2-Classic.

  • Traffic outside of a single VPC.

  • Traffic over an inter-Region VPC peering connection.

  • Traffic over VPN connections.

  • Traffic over an internet gateway.

Note

Packets over 1500 bytes are fragmented. If you have the Don't Fragment flag set in the IP header, these packets are dropped.

Jumbo frames should be used with caution for internet-bound traffic, or any traffic that leaves a VPC. Packets are fragmented by intermediate systems, which slows down this traffic. To use jumbo frames inside of a VPC without impacting outbound traffic that's leaving the VPC, try one of the following options:

  • Configure the MTU size by route.

  • Use multiple network interfaces with different MTU sizes and different routes.
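Before you rely on jumbo frames for a given path, it can help to check the configured MTU and probe the path with fragmentation disallowed. The following is a sketch under stated assumptions: `10.0.0.10` is a hypothetical private address of another instance in the same VPC, and that instance's security group must allow ICMP for the probe to succeed.

```powershell
# Current IPv4 MTU for each interface on this instance.
Get-NetIPInterface -AddressFamily IPv4 | Format-Table InterfaceAlias, NlMtu -AutoSize

# Probe the path with the Don't Fragment flag set (-f).
# 8973 = 9001-byte MTU - 20-byte IPv4 header - 8-byte ICMP header.
# 10.0.0.10 is a hypothetical target address; replace it with your own.
ping -f -l 8973 10.0.0.10
```

If the probe fails at this size but succeeds with `-l 1472` (1500 minus the same 28 bytes of headers), the path is probably limited to a 1500-byte MTU.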

Recommended use cases for jumbo frames

Jumbo frames can be useful for traffic inside of and between VPCs. We recommend using jumbo frames for the following use cases:

  • For instances that are collocated inside of a cluster placement group, jumbo frames help to achieve the maximum network throughput possible. For more information, see Placement groups.

  • You can use jumbo frames for traffic between your VPCs and your on-premises networks over AWS Direct Connect. For more information about using AWS Direct Connect, and verifying jumbo frame capability, see Set network MTU for private virtual interfaces or transit virtual interfaces in the AWS Direct Connect User Guide.

  • For more information about supported MTU sizes for transit gateways, see Quotas for your transit gateways in the Amazon VPC Transit Gateways guide.