Migrate inference workload from x86 to AWS Graviton

AWS Graviton is a series of ARM-based processors designed by AWS. They are more energy efficient than x86-based processors and offer a compelling price-performance ratio. Amazon SageMaker offers Graviton-based instances so that you can take advantage of these advanced processors for your inference needs.

You can migrate your existing inference workloads from x86-based instances to Graviton-based instances by using either ARM-compatible container images or multi-architecture container images. This guide assumes that you are using either AWS Deep Learning Containers images or your own ARM-compatible container images. For more information on building your own images, check Building your image.

At a high level, migrating an inference workload from x86-based instances to Graviton-based instances is a four-step process:

  1. Push container images to Amazon Elastic Container Registry (Amazon ECR), an AWS managed container registry.

  2. Create a SageMaker Model.

  3. Create an endpoint configuration.

  4. Create an endpoint.

The following sections of this guide provide more detail on each of these steps. Replace the user placeholder text in the code examples with your own information.

Push container images to Amazon ECR

You can push your container images to Amazon ECR with the AWS CLI. When using an ARM-compatible image, verify that it supports the ARM architecture:

docker inspect deep-learning-container-uri

The response "Architecture": "arm64" indicates that the image supports ARM architecture. You can push it to Amazon ECR with the docker push command. For more information, check Pushing a Docker image.
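If you script this check, a small helper can parse the docker inspect output and confirm the architecture. This is a sketch, not part of any SageMaker tooling; the image name is a placeholder and the `is_arm64` helper is hypothetical.

```python
import json
import subprocess


def is_arm64(inspect_json: str) -> bool:
    """Return True if the first image record reports an arm64 architecture."""
    records = json.loads(inspect_json)
    return bool(records) and records[0].get("Architecture") == "arm64"


if __name__ == "__main__":
    # Requires Docker and the image to be available locally.
    out = subprocess.run(
        ["docker", "inspect", "deep-learning-container-uri"],
        check=True, capture_output=True, text=True,
    ).stdout
    print("ARM compatible:", is_arm64(out))
```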

Multi-architecture container images are fundamentally a set of container images supporting different architectures or operating systems that you can refer to by a common manifest name. If you are using multi-architecture container images, then in addition to pushing the images to Amazon ECR, you must also push a manifest list to Amazon ECR. A manifest list allows for the nested inclusion of other image manifests, where each included image is specified by architecture, operating system, and other platform attributes. The following example creates a manifest list and pushes it to Amazon ECR.

  1. Create a manifest list.

    docker manifest create aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository \
        aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository:amd64 \
        aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository:arm64
  2. Annotate the manifest list so that it correctly identifies which image is for which architecture.

    docker manifest annotate --arch arm64 \
        aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository \
        aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository:arm64
  3. Push the manifest.

    docker manifest push aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository

For more information on creating and pushing manifest lists to Amazon ECR, check Introducing multi-architecture container images for Amazon ECR, and Pushing a multi-architecture image.
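After pushing, you can confirm which architectures the manifest list covers by parsing the output of docker manifest inspect. The sketch below assumes the standard manifest list format; the repository URI is a placeholder and the `manifest_architectures` helper is hypothetical.

```python
import json
import subprocess


def manifest_architectures(manifest_json: str) -> list:
    """Return the architectures listed in a manifest list document."""
    doc = json.loads(manifest_json)
    return [m["platform"]["architecture"] for m in doc.get("manifests", [])]


if __name__ == "__main__":
    # Requires Docker and credentials for the repository.
    out = subprocess.run(
        ["docker", "manifest", "inspect",
         "aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository"],
        check=True, capture_output=True, text=True,
    ).stdout
    print(manifest_architectures(out))
```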

Create a SageMaker Model

Create a SageMaker Model by calling the CreateModel API.

import boto3
from sagemaker import get_execution_role

aws_region = "aws-region"
sagemaker_client = boto3.client("sagemaker", region_name=aws_region)
role = get_execution_role()

sagemaker_client.create_model(
    ModelName="model-name",
    PrimaryContainer={
        "Image": "deep-learning-container-uri",
        "ModelDataUrl": "model-s3-location",
        "Environment": {
            "SAGEMAKER_PROGRAM": "inference.py",
            "SAGEMAKER_SUBMIT_DIRECTORY": "inference-script-s3-location",
            "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
            "SAGEMAKER_REGION": aws_region,
        },
    },
    ExecutionRoleArn=role,
)

Create an endpoint configuration

Create an endpoint configuration by calling the CreateEndpointConfig API. For a list of Graviton-based instances, check Compute optimized instances.

sagemaker_client.create_endpoint_config(
    EndpointConfigName="endpoint-config-name",
    ProductionVariants=[
        {
            "VariantName": "variant-name",
            "ModelName": "model-name",
            "InitialInstanceCount": 1,
            "InstanceType": "ml.c7g.xlarge",  # Graviton-based instance
        }
    ],
)

Create an endpoint

Create an endpoint by calling the CreateEndpoint API.

sagemaker_client.create_endpoint(
    EndpointName="endpoint-name",
    EndpointConfigName="endpoint-config-name",
)
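Endpoint creation is asynchronous, so before sending requests you typically wait for the endpoint to reach the InService state. A minimal sketch using the boto3 endpoint_in_service waiter follows; the endpoint name is a placeholder, and the `wait_until_in_service` helper is an assumption, not part of the SageMaker API.

```python
def wait_until_in_service(sagemaker_client, endpoint_name: str) -> str:
    """Block until the endpoint finishes creating, then return its status."""
    waiter = sagemaker_client.get_waiter("endpoint_in_service")
    waiter.wait(EndpointName=endpoint_name)
    response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"]


if __name__ == "__main__":
    # Requires boto3 and AWS credentials with SageMaker permissions.
    import boto3

    sagemaker_client = boto3.client("sagemaker", region_name="aws-region")
    print(wait_until_in_service(sagemaker_client, "endpoint-name"))
```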