Conclusion - AWS Prescriptive Guidance


This guide provided a conceptual overview of uncertainty in deep learning systems. It described experiments that extend the existing literature to cover the transfer learning scenario for natural language processing (NLP) in both in-distribution and out-of-distribution settings. Finally, it provided a case study that serves as a roadmap for how data scientists can apply these concepts in their work in a highly regulated industry.

When quantifying uncertainty in deep learning networks, our general recommendation is to use temperature scaling with deep ensembles. Temperature scaling provides interpretable uncertainty estimates when incoming data is in distribution. It addresses total uncertainty by rescaling the softmax probabilities so that they are not overconfident. The temperature parameter should be fit on the validation dataset, after the model has been trained on the training dataset.
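As a minimal sketch of the idea, the following NumPy code fits a single temperature parameter on held-out validation logits by minimizing the negative log-likelihood. The helper names (`fit_temperature`, `nll`) and the grid-search approach are illustrative assumptions, not part of this guide's experiments; in practice you would use your framework's optimizer on your model's validation logits.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nll(logits, labels, T):
    # Negative log-likelihood of the labels under temperature-scaled softmax
    p = softmax(logits / T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 91)):
    # Grid-search the scalar temperature that minimizes validation NLL.
    # An overconfident model yields T > 1, which softens its probabilities.
    losses = [nll(val_logits, val_labels, T) for T in grid]
    return grid[int(np.argmin(losses))]
```

Because a single scalar is fit after training, temperature scaling changes only the confidence of the predictions, never their ranking, which is why the accuracy of the underlying model is unaffected.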

Deep ensembles currently provide state-of-the-art estimates of uncertainty when data is out of distribution. They produce higher epistemic uncertainty estimates when presented with data that's different from the training data, because of the diversity of the underlying models that make up the ensemble. We suggest that five models will suffice in most situations.
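The mechanism can be sketched as follows: average the softmax outputs of the ensemble members, then measure the entropy of the averaged distribution. When the members disagree (as they tend to on out-of-distribution inputs), the average is flatter and the entropy is higher. The function name and the `(M, N, C)` layout are assumptions for this sketch.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(per_model_logits):
    """per_model_logits: shape (M, N, C) -- M models, N examples, C classes.

    Returns the ensemble's mean class probabilities and the predictive
    entropy of that mean, which rises when the M models disagree.
    """
    probs = softmax(per_model_logits)      # each model's class probabilities
    mean_probs = probs.mean(axis=0)        # average over the M models
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)
    return mean_probs, entropy
```

In this framing, five confident models that all pick the same class yield low entropy, while five confident models that each pick a different class yield entropy close to its maximum, which is the behavior that makes deep ensembles useful for out-of-distribution detection.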

We recommend that you consider MC dropout as an alternative to deep ensembles in two scenarios: when hosting multiple models places too much load on your infrastructure, and in transfer learning (that is, when you use pretrained weights). If you use MC dropout in place of deep ensembles, be prepared to accept additional inference latency, because the method requires multiple stochastic passes through the data; we recommend 30-100 iterations as an appropriate range. In transfer learning, there is less diversification among the ensembled base learners, because the underlying model weights are more similar to one another. This is why total predictive uncertainty can be low in transfer learning, especially in settings with out-of-distribution data. For that reason, in the transfer learning situation, consider supplementing or replacing deep ensembles with MC dropout.
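The MC dropout procedure described above can be sketched with a toy two-layer network in NumPy: keep dropout active at inference time, run many stochastic forward passes, and summarize the resulting distribution of predictions. The tiny architecture, parameter names, and inverted-dropout scaling here are illustrative assumptions; with a real framework you would instead leave the dropout layers in training mode during prediction.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(x, W1, b1, W2, b2, n_iter=50, p_drop=0.1, seed=0):
    """Run n_iter stochastic forward passes with dropout active at inference.

    Returns the mean class probabilities and their per-class standard
    deviation across passes; the spread serves as an uncertainty signal.
    """
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_iter):
        h = np.maximum(x @ W1 + b1, 0.0)            # ReLU hidden layer
        mask = rng.random(h.shape) >= p_drop        # fresh dropout mask each pass
        h = h * mask / (1.0 - p_drop)               # inverted-dropout scaling
        preds.append(softmax(h @ W2 + b2))
    preds = np.stack(preds)                         # (n_iter, N, C)
    return preds.mean(axis=0), preds.std(axis=0)
```

Each iteration samples a different dropout mask, so the n_iter passes behave like an implicit ensemble drawn from one set of weights, which is the reason 30-100 iterations trade latency for uncertainty estimates without hosting multiple models.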