Training

With mixed-precision training you can deploy larger networks in the same amount of memory, or reduce memory usage compared to a single- or double-precision network, and you will typically see compute performance increase. Data transfers also become smaller and faster, an important factor in multi-node distributed training. To take advantage of mixed-precision training, you need to adjust data casting and loss scaling. The following guides describe how to do this for the frameworks that support mixed precision.

NVIDIA Deep Learning SDK - docs on the NVIDIA website describing mixed-precision implementation for MXNet, PyTorch, and TensorFlow.
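To see why data casting and loss scaling go together, the following framework-independent sketch may help. It uses only the Python standard library to round values to half precision (fp16). The helper `to_fp16`, the constant `LOSS_SCALE`, and the gradient value are hypothetical names and numbers chosen for illustration; each framework provides its own mixed-precision utilities, which the guides above describe.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The struct format "e" is a 16-bit half-precision float.
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Hypothetical values, chosen only to illustrate the effect.
LOSS_SCALE = 1024.0   # assumed scale factor applied to the loss
tiny_grad = 1e-8      # a gradient smaller than fp16 can represent

# Cast directly to fp16: the value underflows to 0.0 and the update is lost.
unscaled = to_fp16(tiny_grad)

# Scale before the cast: the value now falls within the fp16 range.
scaled = to_fp16(tiny_grad * LOSS_SCALE)

# Unscale in higher precision before applying the weight update.
recovered = scaled / LOSS_SCALE

print("fp16 without scaling:", unscaled)
print("fp16 with scaling:   ", scaled)
print("recovered gradient:  ", recovered)
```

Scaling the loss scales every gradient by the same factor, so small gradients survive the cast to fp16. Dividing by the same factor in higher precision restores the original magnitude, apart from a small rounding error.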

Tip

Be sure to check the website for your framework of choice, and search for "mixed precision" or "fp16" for the latest optimization techniques. Here are some mixed-precision guides you might find helpful:

Mixed-precision training with TensorFlow (video) - on the NVIDIA blog site.
Mixed-precision training using float16 with MXNet - an FAQ article on the MXNet website.
NVIDIA Apex: a tool for easy mixed-precision training with PyTorch - a blog article on the NVIDIA website.

You might be interested in these other topics on GPU monitoring and optimization:

Monitoring
- Monitor GPUs with CloudWatch
Optimization
- Preprocessing
- Training
