Specifying deep learning in an Amazon ECS task definition
To run Habana Gaudi accelerated deep learning containers on Amazon ECS, your task definition must contain a container definition for one of the pre-built containers provided by AWS Deep Learning Containers, which serve deep learning models for TensorFlow or PyTorch using Habana SynapseAI.
The following container image has TensorFlow 2.7.0 and Ubuntu 20.04. A complete
list of the pre-built Deep Learning Containers that are optimized for Habana Gaudi accelerators is
maintained on GitHub. For more information, see Habana Training Containers.
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training-habana:2.7.0-hpu-py38-synapseai1.2.0-ubuntu20.04
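As a minimal sketch, assuming the AWS CLI and Docker are installed and AWS credentials are configured: the Deep Learning Containers registry (account 763104351884 in us-east-1, as shown in the image URI above) requires ECR authentication before the image can be pulled. The authentication and pull commands are shown commented out because they need live credentials.

```shell
# Compose the image URI from its parts (values taken from the image above).
REGISTRY="763104351884.dkr.ecr.us-east-1.amazonaws.com"
IMAGE="${REGISTRY}/tensorflow-training-habana:2.7.0-hpu-py38-synapseai1.2.0-ubuntu20.04"
echo "${IMAGE}"

# Authenticate to the Deep Learning Containers ECR registry, then pull.
# (Commented out: requires AWS credentials and a running Docker daemon.)
# aws ecr get-login-password --region us-east-1 \
#     | docker login --username AWS --password-stdin "${REGISTRY}"
# docker pull "${IMAGE}"
```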
The following is an example task definition for Linux containers on Amazon EC2,
displaying the syntax to use. This example uses an image containing the Habana Labs
System Management Interface Tool (HL-SMI) found here:
vault.habana.ai/gaudi-docker/1.1.0/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.6.0:1.1.0-614
{
    "family": "dl-test",
    "requiresCompatibilities": ["EC2"],
    "placementConstraints": [
        {
            "type": "memberOf",
            "expression": "attribute:ecs.os-type == linux"
        },
        {
            "type": "memberOf",
            "expression": "attribute:ecs.instance-type == dl1.24xlarge"
        }
    ],
    "networkMode": "host",
    "cpu": "10240",
    "memory": "1024",
    "containerDefinitions": [
        {
            "entryPoint": [
                "sh",
                "-c"
            ],
            "command": ["hl-smi"],
            "cpu": 8192,
            "environment": [
                {
                    "name": "HABANA_VISIBLE_DEVICES",
                    "value": "all"
                }
            ],
            "image": "vault.habana.ai/gaudi-docker/1.1.0/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.6.0:1.1.0-614",
            "essential": true,
            "name": "tensorflow-installer-tf-hpu"
        }
    ]
}
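As a minimal sketch of how the example could be registered programmatically, the same task definition can be built as a Python dict, written to a file for use with `aws ecs register-task-definition --cli-input-json`, or passed to boto3 (the boto3 call is commented out because it assumes configured AWS credentials and a default region):

```python
import json

# The same task definition as the JSON example, expressed as a Python dict.
task_definition = {
    "family": "dl-test",
    "requiresCompatibilities": ["EC2"],
    "placementConstraints": [
        {"type": "memberOf", "expression": "attribute:ecs.os-type == linux"},
        {"type": "memberOf", "expression": "attribute:ecs.instance-type == dl1.24xlarge"},
    ],
    "networkMode": "host",
    "cpu": "10240",
    "memory": "1024",
    "containerDefinitions": [
        {
            "entryPoint": ["sh", "-c"],
            "command": ["hl-smi"],
            "cpu": 8192,
            "environment": [{"name": "HABANA_VISIBLE_DEVICES", "value": "all"}],
            "image": "vault.habana.ai/gaudi-docker/1.1.0/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.6.0:1.1.0-614",
            "essential": True,
            "name": "tensorflow-installer-tf-hpu",
        }
    ],
}

# Write the JSON so it can be registered with the AWS CLI:
#   aws ecs register-task-definition --cli-input-json file://dl-test.json
with open("dl-test.json", "w") as f:
    json.dump(task_definition, f, indent=4)

# Or register directly with boto3 (uncomment once credentials are in place):
# import boto3
# ecs = boto3.client("ecs")
# ecs.register_task_definition(**task_definition)
```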