TensorFlow with Horovod
This tutorial shows how to activate TensorFlow with Horovod on an AWS Deep Learning AMI (DLAMI) with Conda. Horovod is pre-installed in the Conda environments for TensorFlow. The Python 3 environment is recommended.
Note
Only P3.*, P2.*, and G3.* instance types are supported.
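As a rough illustration of the note above, the following sketch checks an instance type name against the three listed patterns. The function name and the lowercase pattern tuple are hypothetical and not part of the DLAMI. The patterns are read literally, which is an assumption about how the note is meant to be interpreted.

```python
from fnmatch import fnmatchcase

# Patterns copied from the note above, lowercased to match EC2 naming.
# Reading them literally is an assumption of this sketch.
SUPPORTED_PATTERNS = ("p3.*", "p2.*", "g3.*")


def is_supported_instance_type(instance_type):
    """Return True if instance_type matches one of the listed patterns."""
    name = instance_type.strip().lower()
    return any(fnmatchcase(name, pattern) for pattern in SUPPORTED_PATTERNS)


if __name__ == "__main__":
    for candidate in ("p3.2xlarge", "g3.4xlarge", "t2.micro"):
        print(candidate, is_supported_instance_type(candidate))
```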
To activate TensorFlow and test Horovod on the DLAMI with Conda
- Open an Amazon Elastic Compute Cloud (Amazon EC2) instance of the DLAMI with Conda. For help getting started with a DLAMI, see How to Get Started with the DLAMI.
- (Recommended) For TensorFlow 1.15 with Horovod on Python 3 with CUDA 11, run the following command:
  $ source activate tensorflow_p37
- Start the IPython terminal:
  (tensorflow_p37)$ ipython
- Test importing TensorFlow with Horovod to verify that it's working properly:
  import horovod.tensorflow as hvd
  hvd.init()
Output similar to the following may appear on your screen. You can ignore any warning messages.
--------------------------------------------------------------------------
[[55425,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: ip-172-31-72-4

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
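The interactive check in the steps above can also be scripted. The following is a minimal sketch, not an official DLAMI utility: the function name check_horovod and the report layout are hypothetical. It assumes that Conda exposes the active environment name in the CONDA_DEFAULT_ENV variable. It reports an import failure instead of raising, so it can also be run outside the tensorflow_p37 environment to see the difference.

```python
import importlib
import os


def check_horovod():
    """Return a small report on whether Horovod's TensorFlow bindings load."""
    # Name of the active Conda environment, or None if none is active.
    report = {"conda_env": os.environ.get("CONDA_DEFAULT_ENV")}
    try:
        hvd = importlib.import_module("horovod.tensorflow")
    except ImportError as exc:
        # Horovod (or TensorFlow) is not importable in this environment.
        report.update({"available": False, "error": str(exc)})
        return report

    hvd.init()  # must be called before querying rank or size
    report.update({
        "available": True,
        "rank": hvd.rank(),              # global index of this process
        "local_rank": hvd.local_rank(),  # index of this process on its host
        "size": hvd.size(),              # total number of Horovod processes
    })
    return report


if __name__ == "__main__":
    print(check_horovod())
```

Run as a single process, the script should report a size of 1 when Horovod loads. If the horovodrun launcher is available in the environment, saving the script under a name of your choice (for example, check_horovod.py) and starting it with horovodrun -np 2 python check_horovod.py should show one report per process.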
More Info
For tutorials, see the examples/horovod folder in the home directory of the DLAMI. For even more tutorials and examples, see the Horovod GitHub project.