Specifying deep learning in an Amazon ECS task definition - Amazon Elastic Container Service

Specifying deep learning in an Amazon ECS task definition

To run Habana Gaudi accelerated deep learning containers on Amazon ECS, your task definition must contain the container definition for a pre-built container that serves the deep learning model for TensorFlow or PyTorch using Habana SynapseAI that's provided by AWS Deep Learning Containers.

The following container image has TensorFlow 2.7.0 and Ubuntu 20.04. A complete list of pre-built Deep Learning Containers that's optimized for the Habana Gaudi accelerators is maintained on GitHub. For more information, see Habana Training Containers.

763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training-habana:2.7.0-hpu-py38-synapseai1.2.0-ubuntu20.04

The following is an example task definition for Linux containers on Amazon EC2, displaying the syntax to use. This example uses an image containing the Habana Labs System Management Interface Tool (HL-SMI) found here: vault.habana.ai/gaudi-docker/1.1.0/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.6.0:1.1.0-614

{ "family": "dl-test", "requiresCompatibilities": ["EC2"], "placementConstraints": [ { "type": "memberOf", "expression": "attribute:ecs.os-type == linux" }, { "type": "memberOf", "expression": "attribute:ecs.instance-type == dl1.24xlarge" } ], "networkMode": "host", "cpu": "10240", "memory": "1024", "containerDefinitions": [ { "entryPoint": [ "sh", "-c" ], "command": ["hl-smi"], "cpu": 8192, "environment": [ { "name": "HABANA_VISIBLE_DEVICES", "value": "all" } ], "image": "vault.habana.ai/gaudi-docker/1.1.0/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.6.0:1.1.0-614", "essential": true, "name": "tensorflow-installer-tf-hpu" } ] }