Amazon Elastic Compute Cloud
User Guide for Linux Instances

Troubleshooting

The following are common errors and troubleshooting steps.

Issues Launching Accelerators

Ensure that you are launching in a Region where Amazon EI accelerators are available. For more information, see the Region Table.

Resolving Connectivity Issues

If you cannot connect to accelerators, verify that you have completed the following:

  • You have set up a VPC endpoint for Amazon EI for the subnet in which you have launched your instance.

  • You have configured the security groups for the instance and the VPC endpoints with outbound rules that allow HTTPS traffic (port 443), and you have configured the VPC endpoint security group with an inbound rule that allows HTTPS traffic.

  • You have attached an IAM instance role with the "elastic-inference:Connect" permission to the instance from which you connect to the accelerator.

  • You have checked CloudWatch Logs to verify that your accelerator is healthy. The instance details in the Amazon EC2 console include a link to CloudWatch, where you can view the health of the instance's associated accelerator.
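The security group and IAM items in the checklist above can be checked offline once you have the configuration in hand. The following Python sketch is a minimal illustration, assuming you have already fetched the security groups as dictionaries in the shape returned by the EC2 DescribeSecurityGroups API (for example, through boto3's describe_security_groups) and the role's policy as a parsed IAM JSON policy document. The helper names (rule_allows_port, allows_https, grants_connect) are hypothetical, and the policy check is deliberately simplified: it ignores Deny statements, NotAction, conditions, and resource ARNs, so it is not a substitute for the IAM policy simulator.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict

HTTPS_PORT = 443


def rule_allows_port(rule: Dict[str, Any], port: int) -> bool:
    """Return True if one IpPermissions entry covers TCP traffic on `port`."""
    protocol = str(rule.get("IpProtocol"))
    if protocol == "-1":
        # "-1" means all protocols on all ports.
        return True
    if protocol not in ("tcp", "6"):
        return False
    return rule.get("FromPort", 0) <= port <= rule.get("ToPort", -1)


def allows_https(group: Dict[str, Any], direction: str) -> bool:
    """Check a security group dict for an inbound ("ingress") or outbound ("egress") HTTPS rule."""
    key = "IpPermissions" if direction == "ingress" else "IpPermissionsEgress"
    return any(rule_allows_port(rule, HTTPS_PORT) for rule in group.get(key, []))


def grants_connect(policy: Dict[str, Any], action: str = "elastic-inference:Connect") -> bool:
    """Return True if an Allow statement lists an action pattern matching `action`.

    Simplified: Deny statements, NotAction, Condition, and Resource are ignored.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # IAM action names are case-insensitive and may contain wildcards such as *.
        if any(fnmatchcase(action.lower(), pattern.lower()) for pattern in actions):
            return True
    return False


# Example data in the shape of DescribeSecurityGroups output (rule details trimmed).
instance_sg = {"IpPermissionsEgress": [{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443}]}
endpoint_sg = {
    "IpPermissions": [{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443}],
    "IpPermissionsEgress": [{"IpProtocol": "-1"}],
}
role_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": ["elastic-inference:Connect"], "Resource": "*"}],
}

print(allows_https(instance_sg, "egress"))   # True
print(allows_https(endpoint_sg, "egress"))   # True
print(allows_https(endpoint_sg, "ingress"))  # True
print(grants_connect(role_policy))           # True
```

A False result points at the checklist item to revisit; a True result only means the rule or statement exists, not that traffic actually flows, so the VPC endpoint and CloudWatch checks still apply.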

Resolving Unhealthy Status Issues

If the Amazon EI accelerator is in an unhealthy state, you can use the following troubleshooting steps to resolve the issue.

Stop and Start the Instance

If your Amazon EI accelerator is in an unhealthy state, the simplest option is to stop and then start the instance that it is attached to. For more information, see Stopping and Starting Your Instances.

Warning

When you stop an instance, the data on any instance store volumes is erased. If you have any data to preserve on instance store volumes, make sure to back it up to persistent storage.
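The stop-and-start step can also be scripted. The following Python sketch shows one possible sequence, assuming a boto3 EC2 client: it calls stop_instances, blocks on the instance_stopped waiter, then calls start_instances and blocks on the instance_running waiter. The function name restart_instance is hypothetical, and the client is passed in rather than created inside the function so that the sketch can be exercised without AWS credentials. The warning above about instance store volumes applies in full.

```python
from typing import Any


def restart_instance(ec2_client: Any, instance_id: str) -> None:
    """Stop and then start an EC2 instance, blocking until each transition completes.

    `ec2_client` is expected to behave like boto3.client("ec2").
    Stopping erases data on instance store volumes; back it up first.
    """
    ids = [instance_id]
    # Request a stop and wait until the instance reports the "stopped" state.
    ec2_client.stop_instances(InstanceIds=ids)
    ec2_client.get_waiter("instance_stopped").wait(InstanceIds=ids)
    # Start the instance again and wait until it reports the "running" state.
    ec2_client.start_instances(InstanceIds=ids)
    ec2_client.get_waiter("instance_running").wait(InstanceIds=ids)
```

For example, with boto3 installed and credentials configured, you might call restart_instance(boto3.client("ec2"), "i-0123456789abcdef0"), where the instance ID is a placeholder. After the instance is running again, recheck the accelerator's health in CloudWatch.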

Troubleshooting Model Performance

Amazon EI accelerates operations defined by frameworks such as TensorFlow and MXNet. Although Amazon EI accelerates most neural network, math, array manipulation, and control flow operators, many operators are not accelerated. These include training-related operators, input/output operators, and some operators in contrib.

When a model contains operators that Amazon EI does not accelerate, the framework runs them on the instance. The frequency and location of these operators within a model graph can affect the model's inference performance with Amazon EI accelerators. If your model is known to benefit from GPU acceleration but does not perform well on Amazon EI, contact AWS Support or send email to amazon-ei-feedback@amazon.com.
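To get a rough sense of why the location of unaccelerated operators matters and not only their count, the following Python sketch takes a model's operators in execution order and a caller-supplied set of operator types assumed to run on the accelerator. It counts the operators that fall back to the instance and the number of handoffs between accelerator and instance. This is a simplified heuristic under stated assumptions: it treats the graph as a linear sequence (real graphs branch), and it assumes that each handoff implies a data transfer. The operator names and the ACCELERATED set are hypothetical and do not reflect the actual list of operators that Amazon EI supports.

```python
from typing import Iterable, List, Set, Tuple


def placement_summary(ops: Iterable[str], accelerated: Set[str]) -> Tuple[int, int, List[str]]:
    """Return (fallback_count, handoffs, placements) for ops in execution order.

    placements[i] is "accel" or "host". A handoff is any change of placement
    between consecutive ops, which in this simplified model implies a transfer.
    """
    placements = ["accel" if op in accelerated else "host" for op in ops]
    fallback_count = placements.count("host")
    handoffs = sum(1 for a, b in zip(placements, placements[1:]) if a != b)
    return fallback_count, handoffs, placements


# Hypothetical operator types assumed to run on the accelerator.
ACCELERATED = {"Conv2D", "Relu", "MatMul", "Add", "Softmax"}

# The same single unaccelerated op ("CustomOp", hypothetical), placed first vs. mid-graph.
at_start = ["CustomOp", "Conv2D", "Relu", "MatMul", "Add", "Softmax"]
in_middle = ["Conv2D", "Relu", "CustomOp", "MatMul", "Add", "Softmax"]

print(placement_summary(at_start, ACCELERATED)[:2])   # (1, 1)
print(placement_summary(in_middle, ACCELERATED)[:2])  # (1, 2)
```

Both variants contain one unaccelerated operator, but the mid-graph placement splits the accelerated work in two and doubles the handoffs, which is one way the location of such operators can affect inference latency.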

Submitting Feedback

Contact AWS Support or send feedback to: amazon-ei-feedback@amazon.com.