FSISUS15: What is your testing process for workloads that require floating point precision? - Financial Services Industry Lens

FSISUS15-BP01 Minimize the bit count while maintaining precision

Prescriptive guidance

Floating-point formats represent real numbers in a finite, fixed-width binary field. Single-precision (32-bit) formats reduce memory bandwidth and storage requirements compared to double-precision (64-bit) arithmetic. Although double precision can sometimes produce more accurate results, single-precision calculations can be faster and thus reduce overall energy consumption for particular workloads. Determine which of your workloads can use reduced floating-point precision by weighing accuracy, performance, and efficiency. Consider testing with a cluster of instances to see how well the workload performs at scale.
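One way to test whether a workload tolerates single precision is to run the same calculation at both widths and compare the results against a known reference. The following is a minimal sketch, assuming a hypothetical workload that accumulates 100,000 payments of 0.01. The helper names (`to_float32`, `accumulate`, `relative_error`) and the tolerance value are illustrative assumptions, not part of any AWS API. It emulates single precision in standard-library Python by rounding to 32 bits after every addition.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(amount: float, count: int, single: bool) -> float:
    """Add `amount` to a running total `count` times.

    With single=True, the step and the running total are rounded to
    32 bits after every addition to emulate single-precision hardware.
    """
    step = to_float32(amount) if single else amount
    total = 0.0
    for _ in range(count):
        total += step
        if single:
            total = to_float32(total)
    return total


def relative_error(value: float, reference: float) -> float:
    return abs(value - reference) / abs(reference)


# Hypothetical workload: 100,000 payments of 0.01, exact total 1000.
PAYMENT = 0.01
COUNT = 100_000
EXACT_TOTAL = 1000.0
TOLERANCE = 1e-6  # assumed acceptance threshold for this workload

err64 = relative_error(accumulate(PAYMENT, COUNT, single=False), EXACT_TOTAL)
err32 = relative_error(accumulate(PAYMENT, COUNT, single=True), EXACT_TOTAL)

for name, err in (("double", err64), ("single", err32)):
    verdict = "within" if err <= TOLERANCE else "outside"
    print(f"{name} precision: relative error {err:.3e} ({verdict} tolerance)")
```

In this sketch, naive accumulation is sensitive to precision, so a workload like it would either stay in double precision or change its summation method. Other calculations may pass the same check comfortably, which is why the comparison is worth running per workload.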

Implementation guidance:

  • For intensive financial simulations and calculations, test how many bits are required to achieve your target floating-point precision, and consider reducing the number of bits by selecting a different floating-point format, such as bfloat16, which is supported by AWS Graviton.

  • Using quantization, you can represent numbers using lower bit-count integers or floating-point numbers without incurring a significant loss in accuracy. Specifically, you can reduce resource usage by replacing the usual single-precision floating-point (32-bit) parameters in your workload with (1) half-precision (16-bit) values, (2) bfloat16 values (16-bit, but with the same dynamic range as 32-bit), or (3) 8-bit integers.

  • Service recommendations: Use the following services to achieve your goal.

  • Test generative AI models with reduced precision (quantization) to confirm that accuracy is maintained while resource consumption is reduced.

  • Validate generative AI model performance with different floating-point precisions.

  • Use mixed-precision training for generative AI models to optimize resource usage.
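The reduced-precision formats named in the guidance above can be compared on a sample of parameters before changing a workload. The following is a minimal sketch in standard-library Python, assuming a small hypothetical list of model weights. The helper names (`to_float16`, `to_bfloat16`, `fits_float16`, `quantize_int8`, `dequantize`) are illustrative, the bfloat16 conversion is a software emulation that ignores NaN and infinity, and the 8-bit scheme is one simple symmetric per-list scaling among several possible choices.

```python
import struct


def to_float16(x: float) -> float:
    """Round to IEEE half precision (raises OverflowError if too large)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16: keep the top 16 bits of the float32 encoding,
    rounding to nearest even. NaN and infinity are not handled."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fits_float16(x: float) -> bool:
    """Return True if x is within the finite range of half precision."""
    try:
        to_float16(x)
        return True
    except (OverflowError, struct.error):
        return False


def quantize_int8(values):
    """Symmetric quantization: map values to integers in [-127, 127]."""
    scale = max(abs(v) for v in values) / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    return [q * scale for q in quantized]


def max_relative_error(original, approximated):
    return max(abs(a - o) / abs(o) for o, a in zip(original, approximated))


# Hypothetical parameters; replace with a sample from your own workload.
weights = [0.42, -1.37, 0.058, 0.91, -0.66]

fp16_error = max_relative_error(weights, [to_float16(w) for w in weights])
bf16_error = max_relative_error(weights, [to_bfloat16(w) for w in weights])

int8_values, int8_scale = quantize_int8(weights)
restored = dequantize(int8_values, int8_scale)
int8_abs_error = max(abs(r - w) for r, w in zip(restored, weights))

print(f"float16  max relative error: {fp16_error:.3e}")
print(f"bfloat16 max relative error: {bf16_error:.3e}")
print(f"int8     max absolute error: {int8_abs_error:.3e}")

# Dynamic range: a large value overflows float16 but not bfloat16.
LARGE = 1.0e6
print(f"{LARGE} fits float16: {fits_float16(LARGE)}")
print(f"{LARGE} as bfloat16: {to_bfloat16(LARGE)}")
```

The comparison illustrates the trade-off described in the guidance: half precision keeps more significant bits, bfloat16 keeps the 32-bit dynamic range, and 8-bit integers depend on how well a single scale fits the values. Acceptable error thresholds are workload-specific and would need to be set from your own accuracy requirements.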