
What Is Amazon Elastic Inference?

Starting April 15, 2023, AWS will not onboard new customers to Amazon Elastic Inference (EI), and will help current customers migrate their workloads to options that offer better price and performance. After April 15, 2023, new customers will not be able to launch instances with Amazon EI accelerators in Amazon SageMaker, Amazon ECS, or Amazon EC2. However, customers who have used Amazon EI at least once during the past 30-day period are considered current customers and will be able to continue using the service.

Machine learning (ML) on AWS helps you innovate faster with the most comprehensive set of ML services and infrastructure, made available in a low-cost, pay-as-you-go usage model. AWS continuously delivers better-performing and lower-cost infrastructure for ML inference workloads. AWS launched Amazon Elastic Inference (EI) in 2018 to enable customers to attach low-cost GPU-powered acceleration to Amazon EC2 instances, Amazon SageMaker instances, or Amazon Elastic Container Service (ECS) tasks, reducing the cost of running deep learning inference by up to 75% compared to standalone GPU-based instances such as Amazon EC2 P4d and Amazon EC2 G5. In 2019, AWS launched AWS Inferentia, Amazon's first custom silicon designed to accelerate deep learning workloads by providing high-performance inference in the cloud. Amazon EC2 Inf1 instances based on AWS Inferentia chips deliver up to 2.3x higher throughput and up to 70% lower cost per inference than comparable current-generation GPU-based Amazon EC2 instances. With the availability of new accelerated compute options such as AWS Inferentia and Amazon EC2 G5 instances, the benefit of attaching a fractional GPU to a CPU host instance using Amazon EI has diminished. For example, customers hosting models on Amazon EI who move to ml.inf1.xlarge instances can realize up to 56% cost savings and a 2x performance improvement.

Customers can use Amazon SageMaker Inference Recommender to help them choose the best alternative instances to Amazon EI for deploying their ML models.

Frequently asked questions

  1. Why is Amazon encouraging customers to move workloads from Amazon Elastic Inference (EI) to newer hardware acceleration options such as AWS Inferentia?

    Newer hardware accelerator options such as AWS Inferentia give customers better performance for their inference workloads at a much lower price than Amazon EI. AWS Inferentia is designed to provide high-performance inference in the cloud, to drive down the total cost of inference, and to make it easy for developers to integrate machine learning into their business applications. To enable customers to benefit from such newer-generation hardware accelerators, we will not onboard new customers to Amazon EI after April 15, 2023.

  2. Which AWS services are impacted by the move to stop onboarding new customers to Amazon Elastic Inference (EI)?

    This announcement will affect Amazon EI accelerators attached to any Amazon EC2 instance, Amazon SageMaker instance, or Amazon Elastic Container Service (ECS) task. In Amazon SageMaker, this applies to both endpoints and notebook kernels using Amazon EI accelerators.

  3. Will I be able to create a new Amazon Elastic Inference (EI) accelerator after April 15, 2023?

    No, if you are a new customer and have not used Amazon EI in the past 30 days, then you will not be able to create a new Amazon EI accelerator in your AWS account after April 15, 2023. However, if you have used an Amazon EI accelerator at least once in the past 30 days, you can attach a new Amazon EI accelerator to your instance.
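
    For current customers, attaching an accelerator at launch is a parameter of the EC2 RunInstances call. The following is a minimal boto3 sketch; the AMI ID, subnet ID, instance type, and accelerator size are placeholder values, and your account still needs the usual Amazon EI setup (an AWS PrivateLink VPC endpoint and IAM permissions):

      import boto3

      ec2 = boto3.client("ec2")

      # Launch a CPU host instance with an EI accelerator attached.
      ec2.run_instances(
          ImageId="ami-0123456789abcdef0",      # placeholder AMI
          InstanceType="m5.large",
          MinCount=1,
          MaxCount=1,
          SubnetId="subnet-0123456789abcdef0",  # placeholder subnet
          ElasticInferenceAccelerators=[
              {"Type": "eia2.medium", "Count": 1},
          ],
      )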

  4. We currently use Amazon Elastic Inference (EI) accelerators. Will we be able to continue using them after April 15, 2023?

    Yes, you will be able to use Amazon EI accelerators. However, we recommend that you migrate your current ML inference workloads running on Amazon EI to other hardware accelerator options at your earliest convenience.

  5. How do I evaluate alternative instance options for my current Amazon SageMaker Inference Endpoints?

    Amazon SageMaker Inference Recommender can help you identify cost-effective deployments to migrate existing workloads from Amazon Elastic Inference (EI) to an appropriate ML instance supported by SageMaker.
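
    As a minimal sketch, you can start a default Inference Recommender job against a registered model package with boto3; the job name, role ARN, and model package ARN below are placeholders:

      import boto3

      sm = boto3.client("sagemaker")

      # Start a default recommendation job for a versioned model package.
      sm.create_inference_recommendations_job(
          JobName="ei-migration-recommendations",
          JobType="Default",
          RoleArn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
          InputConfig={
              "ModelPackageVersionArn": (
                  "arn:aws:sagemaker:us-east-1:111122223333:"
                  "model-package/my-model/1"
              ),
          },
      )

      # Once the job completes, the ranked instance recommendations are in
      # the describe call's InferenceRecommendations field.
      desc = sm.describe_inference_recommendations_job(
          JobName="ei-migration-recommendations"
      )
      print(desc["Status"])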

  6. How do I change the instance type for my existing endpoint in Amazon SageMaker?

    You can change the instance type for your existing endpoint by doing the following (a minimal API sketch of these steps follows the list):

    1. First, create a new EndpointConfig that uses the new instance type. If you have an autoscaling policy, delete the existing autoscaling policy.

    2. Call UpdateEndpoint while specifying your newly created EndpointConfig.

    3. Wait for your endpoint to change status to InService. This will take approximately 10-15 minutes.

    4. Finally, if you need autoscaling for your new endpoint, create a new autoscaling policy for this new endpoint and ProductionVariant.
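
    The same sequence in boto3 might look like the sketch below; the endpoint, variant, config, and model names are placeholders, as are the target instance type and autoscaling capacity bounds:

      import boto3

      sm = boto3.client("sagemaker")
      aas = boto3.client("application-autoscaling")

      endpoint = "my-endpoint"  # placeholder endpoint name
      variant = "AllTraffic"    # placeholder production variant name

      # 1. If the endpoint autoscales, remove the existing scalable target
      #    (and with it the scaling policy) first.
      aas.deregister_scalable_target(
          ServiceNamespace="sagemaker",
          ResourceId=f"endpoint/{endpoint}/variant/{variant}",
          ScalableDimension="sagemaker:variant:DesiredInstanceCount",
      )

      #    Then create a new endpoint config targeting the new instance type.
      sm.create_endpoint_config(
          EndpointConfigName="my-config-inf1",
          ProductionVariants=[{
              "VariantName": variant,
              "ModelName": "my-model",
              "InstanceType": "ml.inf1.xlarge",
              "InitialInstanceCount": 1,
          }],
      )

      # 2. Point the existing endpoint at the new config.
      sm.update_endpoint(
          EndpointName=endpoint,
          EndpointConfigName="my-config-inf1",
      )

      # 3. Block until the endpoint is InService again (roughly 10-15 minutes).
      sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint)

      # 4. Re-register the variant and attach a fresh scaling policy if needed.
      aas.register_scalable_target(
          ServiceNamespace="sagemaker",
          ResourceId=f"endpoint/{endpoint}/variant/{variant}",
          ScalableDimension="sagemaker:variant:DesiredInstanceCount",
          MinCapacity=1,
          MaxCapacity=4,
      )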

  7. How do I change the instance type for my existing Amazon SageMaker Notebook Instance using Amazon Elastic Inference (EI)?

    Choose Notebook instances in the SageMaker console, and then choose the Notebook Instance you want to update. Make sure the Notebook Instance has a Stopped status. Finally, you can choose Edit and change your instance type. Make sure that, when your Notebook Instance starts up, you select the right kernel for your new instance.
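
    You can do the same programmatically. A minimal boto3 sketch, assuming a placeholder notebook instance name and target instance type:

      import boto3

      sm = boto3.client("sagemaker")
      name = "my-notebook"  # placeholder notebook instance name

      # The notebook instance must be stopped before its type can change.
      sm.stop_notebook_instance(NotebookInstanceName=name)
      sm.get_waiter("notebook_instance_stopped").wait(NotebookInstanceName=name)

      # Switch the instance type and detach the EI accelerator.
      sm.update_notebook_instance(
          NotebookInstanceName=name,
          InstanceType="ml.g4dn.xlarge",      # placeholder target type
          DisassociateAcceleratorTypes=True,  # drop the attached accelerator
      )

      # Start it back up; pick the matching kernel once Jupyter is running.
      sm.start_notebook_instance(NotebookInstanceName=name)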

  8. Is there a specific instance type which is a good alternative to Amazon Elastic Inference (EI)?

    Every machine learning workload is unique. We recommend using Amazon SageMaker Inference Recommender to help you identify the right instance type for your ML workload, performance requirements, and budget. That said, AWS Inferentia, specifically inf1.xlarge, is generally the best high-performance, low-cost alternative for Amazon EI customers.

Prerequisites

You need an Amazon Web Services account and should be familiar with launching Amazon EC2 or SageMaker instances, or with using AWS Deep Learning Containers, to run Amazon Elastic Inference successfully. To launch an Amazon EC2 instance, complete the steps in Setting up with Amazon EC2. Amazon S3 resources are required for installing packages via pip. For more information about setting up Amazon S3 resources, see the Amazon Simple Storage Service User Guide.

Pricing for Amazon Elastic Inference

You are charged for each second that an Elastic Inference accelerator is attached to an instance in the running state. You are not charged for an accelerator attached to an instance that is in the pending, stopping, stopped, shutting-down, or terminated state. You are also not charged when an Elastic Inference accelerator is in the unknown or impaired state.
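
Because billing is per second of attached time, the charge is simply the hourly rate divided by 3,600 and multiplied by the attached seconds. A quick sketch, using a hypothetical $0.12-per-hour rate (actual rates vary by accelerator size and Region; see the pricing page):

    # Hypothetical hourly rate; check the Elastic Inference pricing page
    # for actual Region- and accelerator-specific prices.
    HOURLY_RATE_USD = 0.12

    # Accelerator attached to a running instance for 3 hours 45 minutes;
    # time spent in pending, stopped, or terminated states is not billed.
    attached_seconds = 3 * 3600 + 45 * 60

    cost = HOURLY_RATE_USD / 3600 * attached_seconds
    print(f"${cost:.2f}")  # -> $0.45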

You do not incur AWS PrivateLink charges for VPC endpoints to the Elastic Inference service when you have accelerators provisioned in the subnet.

For more information about pricing by Region for Elastic Inference, see Elastic Inference Pricing.

Elastic Inference Uses

You can use Elastic Inference for deep learning inference use cases such as computer vision, natural language processing, and speech recognition.

Next Up

Amazon Elastic Inference Basics