Quantifying uncertainty in deep learning systems - AWS Prescriptive Guidance

Josiah Davis, Jason Zhu, PhD, Jeremy Oldfather – AWS Professional Services

Samual MacDonald, Maciej Trzaskowski, PhD – Max Kelsen

August 2020

Delivering machine learning (ML) solutions to production is difficult. It’s not easy to know where to start, which tools and techniques to use, and whether you’re doing it right. ML professionals use different techniques based on their individual experiences, or they use prescribed tools that were developed within their company. In either case, deciding what to do, implementing the solution, and maintaining it require significant investments in time and resources. Although existing ML techniques help speed up parts of the process, integrating these techniques to deliver robust solutions requires months of work.

This guide is the first part of a content series that focuses on machine learning and provides examples of how you can get started quickly. The goal of the series is to help you standardize your ML approach, make design decisions, and deliver your ML solutions efficiently. We will be publishing additional ML guides in the coming months, so please check the AWS Prescriptive Guidance website for updates.

This guide explores current techniques for quantifying and managing uncertainty in deep learning systems, to improve predictive modeling in ML solutions. This content is for data scientists, data engineers, software engineers, and data science leaders who are looking to deliver high-quality, production-ready ML solutions efficiently and at scale. The information is relevant for data scientists regardless of their cloud environment or the Amazon Web Services (AWS) services they are using or are planning to use.

This guide assumes familiarity with introductory concepts in probability and deep learning. For suggestions on building machine learning competency at your organization, see Deep Learning Specialization on the Coursera website, or the resources on the Machine Learning: Data Scientist page on the AWS Training and Certification website.


If success in data science is defined by the predictive performance of our models, deep learning is certainly a strong performer. This is especially true for solutions that use non-linear, high-dimensional patterns from very large datasets. However, if success is also defined by the ability to reason with uncertainty and detect failures in production, the efficacy of deep learning becomes questionable. How do we best quantify uncertainty? How do we use these uncertainties to manage risks? What are the pathologies of uncertainty that challenge the reliability, and therefore the safety, of our products? And how can we overcome such challenges?

This guide:

  • Introduces the motivation for quantifying uncertainty in deep learning systems

  • Explains important concepts in probability that relate to deep learning

  • Demonstrates current state-of-the-art techniques for quantifying uncertainty in deep learning systems, highlighting their associated benefits and limitations

  • Explores these techniques within the transfer learning setting of natural language processing (NLP)

  • Provides a case study inspired by projects performed in a similar setting

As discussed in this guide, when quantifying uncertainty in deep learning, a good rule of thumb is to use temperature scaling with deep ensembles.

  • Temperature scaling is an ideal tool for interpreting uncertainty estimates when data can be considered in distribution (Guo et al. 2017).

  • Deep ensembles provide state-of-the-art estimates of uncertainty when data is out of distribution (Ovadia et al. 2019).
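To make the rule of thumb concrete, here is a minimal Python sketch of how the two techniques can be combined. It is an illustration under stated assumptions, not the implementation used in this guide: the logits, labels, and candidate temperatures are hypothetical, the temperature is chosen by a simple grid search over held-out negative log-likelihood, and a single shared temperature is applied to each ensemble member before averaging (one of several possible orderings).

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logits_batch, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(l, temperature)[y])
              for l, y in zip(logits_batch, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logits_batch, labels, candidates):
    """Pick the candidate temperature with the lowest held-out NLL."""
    return min(candidates, key=lambda t: nll(logits_batch, labels, t))

def ensemble_predict(member_logits, temperature=1.0):
    """Average the temperature-scaled probabilities of all ensemble members."""
    member_probs = [softmax(l, temperature) for l in member_logits]
    n_classes = len(member_probs[0])
    return [sum(p[k] for p in member_probs) / len(member_probs)
            for k in range(n_classes)]

def entropy(probs):
    """Predictive entropy: higher values mean more uncertainty."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical held-out logits and labels. The third example is confidently
# wrong, which is the overconfidence that temperature scaling corrects.
val_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [5.0, 0.0, 0.0], [0.0, 0.0, 5.0]]
val_labels = [0, 1, 1, 2]
T = fit_temperature(val_logits, val_labels, [0.5, 1.0, 1.5, 2.0, 3.0, 4.0])

# Hypothetical logits from three independently trained models for one input.
member_logits = [[4.0, 1.0, 0.5], [3.5, 1.5, 0.2], [0.5, 3.0, 1.0]]
raw = ensemble_predict(member_logits, temperature=1.0)
calibrated = ensemble_predict(member_logits, temperature=T)

print("fitted temperature:", T)
print("raw entropy:", entropy(raw))
print("calibrated entropy:", entropy(calibrated))
```

In this sketch, disagreement among the ensemble members spreads probability mass across classes, and a fitted temperature above 1 softens each member's confidence further. In practice the temperature is usually optimized with a gradient-based method rather than a grid.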

If the memory footprint of hosting models is a concern, you can use Monte Carlo (MC) dropout in place of deep ensembles. In the case of transfer learning, consider using either MC dropout or deep ensembles with MC dropout.
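The following Python sketch illustrates why MC dropout is lighter on memory: it keeps a single set of weights and obtains multiple predictions by leaving dropout active at prediction time. It is a simplified illustration, not the guide's implementation. The tiny two-layer network, its weights, the dropout rate, and the number of samples are all hypothetical values chosen for the example.

```python
import math
import random

def softmax(logits):
    """Convert logits to probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stochastic_forward(x, w_hidden, w_out, drop_rate, rng):
    """One forward pass with dropout left on, as MC dropout requires."""
    # Hidden layer with ReLU activation.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    # Inverted dropout: zero each unit with probability drop_rate and
    # rescale the surviving units so the expected activation is unchanged.
    keep = 1.0 - drop_rate
    hidden = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return softmax(logits)

def mc_dropout_predict(x, w_hidden, w_out, drop_rate=0.5, n_samples=100, seed=0):
    """Return the mean and standard deviation of class probabilities
    over n_samples stochastic forward passes."""
    rng = random.Random(seed)
    samples = [stochastic_forward(x, w_hidden, w_out, drop_rate, rng)
               for _ in range(n_samples)]
    n_classes = len(samples[0])
    mean = [sum(s[k] for s in samples) / n_samples for k in range(n_classes)]
    std = [math.sqrt(sum((s[k] - mean[k]) ** 2 for s in samples) / n_samples)
           for k in range(n_classes)]
    return mean, std

# Hypothetical weights for a network with 3 inputs, 4 hidden units, 2 classes.
w_hidden = [[0.5, 0.3, -0.2], [-0.4, 0.8, 0.1], [0.2, -0.5, 0.7], [0.9, 0.1, 0.4]]
w_out = [[1.0, -0.5, 0.3, 0.8], [-0.7, 0.9, 0.2, -0.4]]
x = [1.0, 2.0, -1.0]

mean_probs, std_probs = mc_dropout_predict(x, w_hidden, w_out)
print("mean probabilities:", mean_probs)
print("standard deviations:", std_probs)
```

The spread of the sampled probabilities serves as the uncertainty estimate. Because every sample reuses the same weights, hosting cost stays close to that of a single model, whereas a deep ensemble stores one full set of weights per member.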