Temperature scaling - AWS Prescriptive Guidance

We averaged five temperature values, each obtained with a different training seed, which gave T = 2.62. The following charts show calibration before and after temperature scaling. As the first chart shows, the unscaled softmax values are poorly calibrated. For example, predictions in the 70-80% confidence bucket are less than 50% accurate. After scaling, calibration improves substantially: the 70-80% bucket corresponds to 72% accuracy. We therefore used the temperature-scaled values in subsequent experiments.

Calibration before and after temperature scaling
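
As a minimal sketch of the two steps described above, the following Python divides logits by a temperature before the softmax and then groups predictions into confidence buckets to compare confidence with accuracy. The function names `softmax` and `bucket_accuracy`, the example logits, and the ten equal-width buckets are illustrative assumptions, not the code used for these experiments. The value T = 2.62 is taken as given; how it was fitted is not shown here.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def bucket_accuracy(confidences, correct, n_buckets=10):
    """Group predictions into equal-width confidence buckets.

    Returns {bucket_index: (count, accuracy)} for non-empty buckets only.
    With n_buckets=10, bucket 7 covers confidences in [0.7, 0.8).
    """
    counts = [0] * n_buckets
    hits = [0] * n_buckets
    for conf, ok in zip(confidences, correct):
        b = min(int(conf * n_buckets), n_buckets - 1)  # conf == 1.0 goes in the last bucket
        counts[b] += 1
        hits[b] += int(ok)
    return {b: (counts[b], hits[b] / counts[b]) for b in range(n_buckets) if counts[b]}


# Hypothetical logits for one example with three classes.
logits = [2.0, 1.0, 0.1]
unscaled = softmax(logits)                   # T = 1, ordinary softmax
scaled = softmax(logits, temperature=2.62)   # T from the text above

# Hypothetical confidences and correctness flags for four predictions.
report = bucket_accuracy([0.75, 0.72, 0.78, 0.95], [True, False, True, True])
```

Dividing by T > 1 flattens the distribution, so the top-class confidence drops while the predicted class stays the same. A model is well calibrated when the accuracy in each bucket is close to that bucket's confidence range.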