Inference
This section shows how to run inference on AWS Deep Learning Containers for Amazon Elastic Container Service (Amazon ECS) using PyTorch and TensorFlow.
Important
If your account has already created the Amazon ECS service-linked role, that role is used by default for your service unless you specify a role here. The service-linked role is required if your task definition uses the awsvpc network mode. The role is also required if the service is configured to use service discovery, an external deployment controller, multiple target groups, or Elastic Inference accelerators, in which case you should not specify a role here. For more information, see Using Service-Linked Roles for Amazon ECS in the Amazon ECS Developer Guide.
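If the service-linked role does not yet exist in your account, you can create it ahead of time with the standard IAM command shown below. This is offered as a convenience, not as a required step in this guide.

aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com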
PyTorch inference
Before you can run a task on your Amazon ECS cluster, you must register a task definition. Task definitions are lists of containers grouped together. The following examples use a sample Docker image that adds either CPU or GPU inference scripts to Deep Learning Containers.
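A complete PyTorch task definition is not reproduced here, so the following is a minimal sketch of what a CPU one can look like. The image tag, TorchServe flags, port, and model archive URL are illustrative assumptions rather than values from this guide; check the available Deep Learning Containers images for a current PyTorch inference tag. TorchServe serves inference on port 8080 by default, and the trailing tail -f /dev/null keeps the container running because torchserve --start runs in the background.

{
    "requiresCompatibilities": [
        "EC2"
    ],
    "containerDefinitions": [{
        "command": [
            "mkdir -p /test/model_store && cd /test && torchserve --start --ncs --model-store /test/model_store --models densenet161=https://torchserve.pytorch.org/mar_files/densenet161.mar && tail -f /dev/null"
        ],
        "entryPoint": [
            "sh",
            "-c"
        ],
        "name": "pytorch-inference-container",
        "image": "763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.5.1-cpu-py36-ubuntu16.04",
        "memory": 4096,
        "cpu": 256,
        "essential": true,
        "portMappings": [
            { "hostPort": 8080, "protocol": "tcp", "containerPort": 8080 }
        ]
    }],
    "volumes": [],
    "networkMode": "bridge",
    "placementConstraints": [],
    "family": "pytorch-inference"
}

You would register and run this the same way as the TensorFlow examples that follow, then send a prediction request to port 8080, for example curl -X POST http://<External ip>:8080/predictions/densenet161 -T image.jpg.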
TensorFlow inference
The following examples use a sample Docker image that adds either CPU or GPU inference scripts to Deep Learning Containers. Run the commands in these examples from your host machine's command line.
CPU-based inference
Use the following example to run CPU-based inference.
1. Create a file named ecs-dlc-cpu-inference-taskdef.json with the following contents. You can use this with either TensorFlow or TensorFlow 2. To use it with TensorFlow 2, change the Docker image to a TensorFlow 2 image and clone the r2.0 serving repository branch instead of r1.15.

{
    "requiresCompatibilities": [
        "EC2"
    ],
    "containerDefinitions": [{
        "command": [
            "mkdir -p /test && cd /test && git clone -b r1.15 https://github.com/tensorflow/serving.git && tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=saved_model_half_plus_two --model_base_path=/test/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_cpu"
        ],
        "entryPoint": [
            "sh",
            "-c"
        ],
        "name": "tensorflow-inference-container",
        "image": "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:1.15.0-cpu-py36-ubuntu18.04",
        "memory": 8111,
        "cpu": 256,
        "essential": true,
        "portMappings": [
            { "hostPort": 8500, "protocol": "tcp", "containerPort": 8500 },
            { "hostPort": 8501, "protocol": "tcp", "containerPort": 8501 },
            { "containerPort": 80, "protocol": "tcp" }
        ],
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": "/ecs/tensorflow-inference-cpu",
                "awslogs-region": "us-east-1",
                "awslogs-stream-prefix": "half-plus-two",
                "awslogs-create-group": "true"
            }
        }
    }],
    "volumes": [],
    "networkMode": "bridge",
    "placementConstraints": [],
    "family": "tensorflow-inference"
}
2. Register the task definition. Note the revision number in the output and use it in the next step.

aws ecs register-task-definition --cli-input-json file://ecs-dlc-cpu-inference-taskdef.json
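If you prefer to capture the revision number directly instead of reading through the full JSON response, the AWS CLI's --query option can extract it. This is a convenience sketch, not part of the original procedure.

aws ecs register-task-definition \
    --cli-input-json file://ecs-dlc-cpu-inference-taskdef.json \
    --query 'taskDefinition.revision' --output text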
3. Create an Amazon ECS service. When you specify the task definition, replace revision_id with the revision number of the task definition from the output of the previous step. The task definition family in this example is tensorflow-inference, so that is the name to reference here.

aws ecs create-service --cluster ecs-ec2-training-inference \
    --service-name cli-ec2-inference-cpu \
    --task-definition tensorflow-inference:revision_id \
    --desired-count 1 \
    --launch-type EC2 \
    --scheduling-strategy REPLICA \
    --region us-east-1
4. Verify the service and get the network endpoint by completing the following steps. Alternatively, you can look the endpoint up from the command line; see the sketch that follows these steps.

   a. Open the console at https://console.aws.amazon.com/ecs/v2.
   b. Select the ecs-ec2-training-inference cluster.
   c. On the Cluster page, choose Services, and then choose cli-ec2-inference-cpu.
   d. After your task is in a RUNNING state, choose the task identifier.
   e. Under Logs, choose View logs in CloudWatch. This takes you to the CloudWatch console to view the inference logs.
   f. Under Containers, expand the container details.
   g. Under Name and then Network Bindings, under External Link, note the IP address for port 8501 and use it in the next step.
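The following sketch retrieves the same endpoint from the command line, assuming a single running task on an EC2 container instance with a public IP address, as in this example.

# Find the task, its container instance, and the instance's public IP.
TASK_ARN=$(aws ecs list-tasks --cluster ecs-ec2-training-inference \
    --service-name cli-ec2-inference-cpu --query 'taskArns[0]' --output text)
INSTANCE_ARN=$(aws ecs describe-tasks --cluster ecs-ec2-training-inference \
    --tasks "$TASK_ARN" --query 'tasks[0].containerInstanceArn' --output text)
EC2_ID=$(aws ecs describe-container-instances --cluster ecs-ec2-training-inference \
    --container-instances "$INSTANCE_ARN" \
    --query 'containerInstances[0].ec2InstanceId' --output text)
aws ec2 describe-instances --instance-ids "$EC2_ID" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' --output text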
5. To run inference, use the following command. Replace the external IP address with the external link IP address from the previous step.

curl -d '{"instances": [1.0, 2.0, 5.0]}' -X POST http://<External ip>:8501/v1/models/saved_model_half_plus_two:predict

The following is sample output.

{ "predictions": [2.5, 3.0, 4.5] }

The half_plus_two model computes x/2 + 2 for each input, which is why the inputs 1.0, 2.0, and 5.0 return 2.5, 3.0, and 4.5.
Important
If you are unable to connect to the external IP address, be sure that your corporate firewall is not blocking non-standard ports, like 8501. You can try switching to a guest network to verify.
GPU-based inference
Use the following example to run GPU-based inference.
1. Create a file named ecs-dlc-gpu-inference-taskdef.json with the following contents. You can use this with either TensorFlow or TensorFlow 2. To use it with TensorFlow 2, change the Docker image to a TensorFlow 2 image and clone the r2.0 serving repository branch instead of r1.15. The resourceRequirements entry reserves one GPU, so the task is placed only on a container instance with a GPU available.

{
    "requiresCompatibilities": [
        "EC2"
    ],
    "containerDefinitions": [{
        "command": [
            "mkdir -p /test && cd /test && git clone -b r1.15 https://github.com/tensorflow/serving.git && tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=saved_model_half_plus_two --model_base_path=/test/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu"
        ],
        "entryPoint": [
            "sh",
            "-c"
        ],
        "name": "tensorflow-inference-container",
        "image": "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:1.15.0-gpu-py36-cu100-ubuntu18.04",
        "memory": 8111,
        "cpu": 256,
        "resourceRequirements": [
            { "type": "GPU", "value": "1" }
        ],
        "essential": true,
        "portMappings": [
            { "hostPort": 8500, "protocol": "tcp", "containerPort": 8500 },
            { "hostPort": 8501, "protocol": "tcp", "containerPort": 8501 },
            { "containerPort": 80, "protocol": "tcp" }
        ],
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": "/ecs/TFInference",
                "awslogs-region": "us-east-1",
                "awslogs-stream-prefix": "ecs",
                "awslogs-create-group": "true"
            }
        }
    }],
    "volumes": [],
    "networkMode": "bridge",
    "placementConstraints": [],
    "family": "TensorFlowInference"
}
2. Register the task definition. Note the revision number in the output and use it in the next step.

aws ecs register-task-definition --cli-input-json file://ecs-dlc-gpu-inference-taskdef.json
3. Create an Amazon ECS service. When you specify the task definition, replace revision_id with the revision number of the task definition from the output of the previous step. The task definition family in this example is TensorFlowInference, so that is the name to reference here.

aws ecs create-service --cluster ecs-ec2-training-inference \
    --service-name cli-ec2-inference-gpu \
    --task-definition TensorFlowInference:revision_id \
    --desired-count 1 \
    --launch-type EC2 \
    --scheduling-strategy REPLICA \
    --region us-east-1
4. Verify the service and get the network endpoint by completing the following steps. The command line lookup sketched in the CPU example also works here if you change the service name to cli-ec2-inference-gpu.

   a. Open the console at https://console.aws.amazon.com/ecs/v2.
   b. Select the ecs-ec2-training-inference cluster.
   c. On the Cluster page, choose Services, and then choose cli-ec2-inference-gpu.
   d. After your task is in a RUNNING state, choose the task identifier.
   e. Under Logs, choose View logs in CloudWatch. This takes you to the CloudWatch console to view the inference logs.
   f. Under Containers, expand the container details.
   g. Under Name and then Network Bindings, under External Link, note the IP address for port 8501 and use it in the next step.
5. To run inference, use the following command. Replace the external IP address with the external link IP address from the previous step.

curl -d '{"instances": [1.0, 2.0, 5.0]}' -X POST http://<External ip>:8501/v1/models/saved_model_half_plus_two:predict

The following is sample output.

{ "predictions": [2.5, 3.0, 4.5] }
Important
If you are unable to connect to the external IP address, be sure that your corporate firewall is not blocking non-standard ports, like 8501. You can try switching to a guest network to verify.
Next steps

To learn about using Custom Entrypoints with Deep Learning Containers on Amazon ECS, see Custom entrypoints.