Migrate inference workload from x86 to AWS Graviton
You can migrate your existing inference workloads from x86-based instances to Graviton-based instances by using either ARM compatible container images or multi-architecture container images. This guide assumes that you are either using AWS Deep Learning Containers container images, or your own ARM compatible container images.
At a high level, migrating an inference workload from x86-based instances to Graviton-based instances is a four-step process:

1. Push container images to Amazon Elastic Container Registry (Amazon ECR), an AWS managed container registry.
2. Create a SageMaker Model.
3. Create an endpoint configuration.
4. Create an endpoint.

The following sections of this guide provide more details about these steps. Replace the user placeholder text in the code examples with your own information.
Push container images to Amazon ECR
You can push your container images to Amazon ECR with the AWS CLI. When using an ARM compatible image, verify that it supports ARM architecture:

```shell
docker inspect deep-learning-container-uri
```

The response "Architecture": "arm64" indicates that the image supports ARM architecture. You can push it to Amazon ECR with the docker push command. For more information, check Pushing a Docker image.
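As a minimal sketch of that push flow, the commands below authenticate Docker to Amazon ECR, tag the image, and push it. The account ID, Region, repository name, and local image tag are hypothetical placeholders; substitute your own values.

```shell
# Hypothetical placeholders -- replace with your own account ID, Region,
# repository name, and local image tag.
AWS_ACCOUNT_ID=123456789012
AWS_REGION=us-west-2
ECR_URI="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/my-repository"

# Authenticate Docker to your private Amazon ECR registry.
aws ecr get-login-password --region "${AWS_REGION}" \
    | docker login --username AWS --password-stdin \
      "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"

# Tag the ARM compatible image with the ECR repository URI, then push it.
docker tag my-image:arm64 "${ECR_URI}:arm64"
docker push "${ECR_URI}:arm64"
```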
Multi-architecture container images are a set of container images that support different architectures or operating systems and that you can refer to by a common manifest name. If you are using multi-architecture container images, then in addition to pushing the images to Amazon ECR, you must also push a manifest list to Amazon ECR. A manifest list allows for the nested inclusion of other image manifests, where each included image is specified by architecture, operating system, and other platform attributes. The following example creates a manifest list and pushes it to Amazon ECR.
1. Create a manifest list.

```shell
docker manifest create aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository \
aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository:amd64 \
aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository:arm64
```

2. Annotate the manifest list, so that it correctly identifies which image is for which architecture.

```shell
docker manifest annotate --arch arm64 aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository \
aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository:arm64
```

3. Push the manifest list.

```shell
docker manifest push aws-account-id.dkr.ecr.aws-region.amazonaws.com/my-repository
```
For more information on creating and pushing manifest lists to Amazon ECR, check Introducing multi-architecture container images for Amazon ECR.
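For reference, the manifest list that these commands create is itself a small JSON document. An abbreviated sketch of its structure follows; the digests are illustrative, and the actual values come from the images you pushed.

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:…",
      "platform": { "architecture": "amd64", "os": "linux" }
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:…",
      "platform": { "architecture": "arm64", "os": "linux" }
    }
  ]
}
```

You can view the actual document for your repository with docker manifest inspect.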
Create a SageMaker Model
Create a SageMaker Model by calling the CreateModel
API.
```python
import boto3
from sagemaker import get_execution_role

aws_region = "aws-region"
sagemaker_client = boto3.client("sagemaker", region_name=aws_region)
role = get_execution_role()

sagemaker_client.create_model(
    ModelName="model-name",
    PrimaryContainer={
        "Image": "deep-learning-container-uri",
        "ModelDataUrl": "model-s3-location",
        "Environment": {
            "SAGEMAKER_PROGRAM": "inference.py",
            "SAGEMAKER_SUBMIT_DIRECTORY": "inference-script-s3-location",
            "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
            "SAGEMAKER_REGION": aws_region,
        },
    },
    ExecutionRoleArn=role,
)
```
Create an endpoint configuration
Create an endpoint configuration by calling the CreateEndpointConfig
API. For a list of Graviton-based instances, check Compute optimized instances.
```python
sagemaker_client.create_endpoint_config(
    EndpointConfigName="endpoint-config-name",
    ProductionVariants=[
        {
            "VariantName": "variant-name",
            "ModelName": "model-name",
            "InitialInstanceCount": 1,
            "InstanceType": "ml.c7g.xlarge",  # Graviton-based instance
        }
    ],
)
```
Create an endpoint
Create an endpoint by calling the CreateEndpoint
API.
```python
sagemaker_client.create_endpoint(
    EndpointName="endpoint-name",
    EndpointConfigName="endpoint-config-name",
)
```