Amazon Elastic Inference
Developer Guide

Troubleshooting

The following are common Amazon Elastic Inference errors and troubleshooting steps.

Issues Launching Accelerators

Ensure that you are launching in a Region where Elastic Inference accelerators are available. For more information, see the Region Table.

Resolving Configuration Issues

If you launched your instance with the Deep Learning AMI (DLAMI), run 'python ~/anaconda3/bin/EISetupValidator.py' to verify that the instance is correctly configured. You can also download the EISetupValidator.py script and run it with 'python EISetupValidator.py'.
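The validator can also be run from Python, for example as part of a startup check on the instance. The following is a minimal sketch: run_validator is a hypothetical helper, the default path is the DLAMI location given above, and the script is launched with the current interpreter (sys.executable), which may differ from the 'python' on your PATH. The sketch makes no assumption about what the validator's exit codes mean; it only reports them.

```python
import os
import subprocess
import sys

# DLAMI location from the instructions above; pass another path if you
# downloaded EISetupValidator.py yourself.
DLAMI_VALIDATOR = os.path.expanduser("~/anaconda3/bin/EISetupValidator.py")


def run_validator(script_path=DLAMI_VALIDATOR):
    """Run EISetupValidator.py with the current Python interpreter.

    Hypothetical helper. Returns the script's exit code, or None if the
    script does not exist at script_path.
    """
    if not os.path.isfile(script_path):
        return None
    # capture_output and text require Python 3.7 or later.
    result = subprocess.run(
        [sys.executable, script_path], capture_output=True, text=True
    )
    # Echo the validator's own output so its findings are visible.
    print(result.stdout, end="")
    print(result.stderr, end="", file=sys.stderr)
    return result.returncode
```

Calling run_validator() with no argument checks the DLAMI path; a None result means the script was not found there and should be downloaded and passed in explicitly.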

Resolving Connectivity Issues

If you cannot connect to accelerators, verify that you have completed the following:

  • Set up a Virtual Private Cloud (VPC) endpoint for Elastic Inference for the subnet in which you have launched your instance.

  • Configure the security groups for the instance and the VPC endpoint with outbound rules that allow HTTPS traffic (port 443). Also configure the VPC endpoint security group with an inbound rule that allows HTTPS traffic.

  • Attach an IAM instance role that grants the elastic-inference:Connect permission to the instance from which you connect to the accelerator.

  • Check CloudWatch Logs to verify that your accelerator is healthy. The instance details in the Amazon EC2 console contain a link to CloudWatch, where you can view the health of the instance's associated accelerator.
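The security group and IAM items in this checklist can be checked offline against data you have already retrieved. The following minimal sketch assumes the rule format returned by the EC2 DescribeSecurityGroups API (IpPermissions for inbound rules, IpPermissionsEgress for outbound rules); permits_https, check_security_groups, and EI_CONNECT_POLICY are hypothetical names. It checks only protocols and ports, not the CIDR ranges or security groups a rule applies to, and the policy shows only the single permission named above, with a wildcard resource as an assumption.

```python
HTTPS_PORT = 443


def permits_https(permissions):
    """Return True if any rule in an EC2 IpPermissions-style list allows TCP 443."""
    for rule in permissions:
        protocol = str(rule.get("IpProtocol"))
        if protocol == "-1":
            # "-1" means all protocols on all ports.
            return True
        if protocol in ("tcp", "6"):  # TCP by name or protocol number
            from_port = rule.get("FromPort", 0)
            to_port = rule.get("ToPort", 65535)
            if from_port <= HTTPS_PORT <= to_port:
                return True
    return False


def check_security_groups(instance_sg, endpoint_sg):
    """Check the port 443 rules from the checklist.

    Both arguments are SecurityGroups entries from DescribeSecurityGroups.
    Returns a list of problems; an empty list means none were found.
    """
    problems = []
    if not permits_https(instance_sg.get("IpPermissionsEgress", [])):
        problems.append("instance security group: no outbound HTTPS (443) rule")
    if not permits_https(endpoint_sg.get("IpPermissionsEgress", [])):
        problems.append("VPC endpoint security group: no outbound HTTPS (443) rule")
    if not permits_https(endpoint_sg.get("IpPermissions", [])):
        problems.append("VPC endpoint security group: no inbound HTTPS (443) rule")
    return problems


# Instance role policy granting only the permission named in the checklist.
EI_CONNECT_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "elastic-inference:Connect",
            "Resource": "*",
        }
    ],
}
```

With boto3, the two groups could be fetched with ec2.describe_security_groups(GroupIds=[instance_sg_id, endpoint_sg_id])["SecurityGroups"] (the ID variables are placeholders) and passed to check_security_groups; EI_CONNECT_POLICY can be serialized with json.dumps when you create the role's policy.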

Stop and Start the Instance

If your Elastic Inference accelerator is in an unhealthy state, the simplest option is to stop the instance and then start it again. For more information, see Stopping and Starting Your Instances.

Warning

When you stop an instance, the data on any instance store volumes is erased. If you have any data to preserve on instance store volumes, make sure to back it up to persistent storage.
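The stop and start can also be scripted. This minimal sketch assumes a boto3 EC2 client and uses its stop_instances and start_instances calls together with the instance_stopped and instance_running waiters; stop_and_start is a hypothetical helper. It does not back up instance store volumes, so address the warning above before calling it.

```python
def stop_and_start(ec2, instance_id):
    """Stop an EC2 instance, wait until it is stopped, then start it again.

    `ec2` is a boto3 EC2 client (or any object with the same methods).
    Data on instance store volumes is lost when the instance stops.
    """
    ids = [instance_id]
    ec2.stop_instances(InstanceIds=ids)
    # Block until EC2 reports the instance as stopped before restarting.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=ids)
    ec2.start_instances(InstanceIds=ids)
    # Block until the instance is running again.
    ec2.get_waiter("instance_running").wait(InstanceIds=ids)
```

For example, stop_and_start(boto3.client("ec2"), "i-0123456789abcdef0"), where the instance ID is a placeholder. Waiting for the stopped state first matters because a start request for an instance that is still stopping fails.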

Troubleshooting Model Performance

Elastic Inference accelerates operations defined by frameworks such as TensorFlow and MXNet. Although Elastic Inference accelerates most neural network, math, array manipulation, and control flow operators, there are many operators that it does not accelerate. These include:

  • training-related operators

  • input/output operators

  • operators in contrib

When a model contains operators that Elastic Inference does not accelerate, the framework runs them on the instance. How often these operators occur, and where they appear in the model graph, can affect the model's inference performance with Elastic Inference accelerators. If your model is known to benefit from GPU acceleration but does not perform well on Elastic Inference, contact AWS Support or email amazon-ei-feedback@amazon.com.
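To see why the placement of unaccelerated operators matters, consider a simplified model in which a graph runs as a linear sequence of operators and every switch between the accelerator and the instance requires moving data between them. The following sketch counts those switches. The operator names and the ACCELERATED set are hypothetical, and real frameworks schedule graphs rather than linear lists, so treat this as an illustration rather than a performance model.

```python
# Hypothetical set of operators the accelerator handles in this example.
ACCELERATED = {"Conv2D", "Relu", "MatMul", "Add"}


def count_switches(ops, accelerated=ACCELERATED):
    """Count how often execution moves between accelerator and instance
    when the operators in `ops` run in order."""
    switches = 0
    previous = None
    for op in ops:
        device = "accelerator" if op in accelerated else "instance"
        if previous is not None and device != previous:
            switches += 1
        previous = device
    return switches


# Same operators, with one unaccelerated operator in different positions.
at_edge = ["DecodeJpeg", "Conv2D", "Relu", "MatMul", "Add"]
in_middle = ["Conv2D", "Relu", "DecodeJpeg", "MatMul", "Add"]
# Two unaccelerated operators interleaved with accelerated ones.
frequent = ["Conv2D", "CustomOp", "Relu", "CustomOp", "MatMul"]

for name, ops in [("at_edge", at_edge), ("in_middle", in_middle), ("frequent", frequent)]:
    print(name, count_switches(ops))
# at_edge 1
# in_middle 2
# frequent 4
```

at_edge and in_middle contain exactly the same operators, but moving the unaccelerated one from the start of the sequence to the middle doubles the number of switches, and interleaving two unaccelerated operators (frequent) raises it to four.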

Submitting Feedback

Contact AWS Support or send feedback to amazon-ei-feedback@amazon.com.