Monte Carlo dropout

One of the most popular ways to estimate uncertainty is by inferring predictive distributions with Bayesian neural networks. To denote a predictive distribution, use:

$$p(y^* \mid x^*, D) = \int p(y^* \mid x^*, \omega)\, p(\omega \mid D)\, d\omega$$

with target $y^*$, input $x^*$, and the training examples $D = \{(x_1, y_1), \ldots, (x_N, y_N)\}$. When you obtain a predictive distribution, you can inspect the variance and uncover uncertainty. One way to learn a predictive distribution requires learning a distribution over functions or, equivalently, a distribution over the parameters $\omega$ (that is, the parametric posterior distribution $p(\omega \mid D)$).

The Monte Carlo (MC) dropout technique (Gal and Ghahramani 2016) provides a scalable way to learn a predictive distribution. MC dropout works by randomly switching off neurons in a neural network, which regularizes the network. Each dropout configuration corresponds to a different sample from the approximate parametric posterior distribution $q(\omega) \approx p(\omega \mid D)$:

$$\hat{\omega}_t \sim q(\omega)$$

where $\hat{\omega}_t$ corresponds to a dropout configuration or, equivalently, a simulation $\hat{\omega}_t \sim q(\omega)$, sampled from the approximate parametric posterior $q(\omega)$, as shown in the following figure. Sampling from the approximate posterior $q(\omega)$ enables Monte Carlo integration of the model's likelihood, which uncovers the predictive distribution, as follows:


$$p(y^* \mid x^*, D) \approx \frac{1}{T} \sum_{t=1}^{T} p(y^* \mid x^*, \hat{\omega}_t)$$

where $T$ is the number of sampled dropout configurations (that is, forward passes).
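The following is a minimal sketch of this Monte Carlo integration in PyTorch. The network `net`, its dropout rate, and the input `x` are illustrative assumptions rather than part of this guidance; the essential detail is that dropout remains active during the prediction-time forward passes:

```python
import torch
import torch.nn as nn

# Hypothetical regression network; any architecture with dropout layers works.
net = nn.Sequential(
    nn.Linear(16, 64),
    nn.ReLU(),
    nn.Dropout(p=0.1),  # neurons switched off at random on every forward pass
    nn.Linear(64, 1),
)

def mc_dropout_predict(model, x, num_samples=50):
    """Approximate the predictive distribution with num_samples stochastic
    forward passes, one per sampled dropout configuration."""
    model.train()  # keep dropout switched on at prediction time
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(num_samples)])
    # Monte Carlo estimates of the predictive mean and variance
    return samples.mean(dim=0), samples.var(dim=0)

x = torch.randn(8, 16)  # batch of eight hypothetical inputs
mean, variance = mc_dropout_predict(net, x)
```

Calling `model.train()` here only serves to keep the dropout layers stochastic; in a model that also contains batch normalization, you would instead switch the individual `nn.Dropout` modules to training mode.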

For simplicity, the likelihood may be assumed to be Gaussian distributed:

$$p(y^* \mid x^*, \hat{\omega}_t) = \mathcal{N}\!\big(y^*;\, \mu(x^*, \hat{\omega}_t),\, \sigma^2(x^*, \hat{\omega}_t)\big)$$

with the Gaussian function specified by the mean $\mu$ and variance $\sigma^2$ parameters, which are output by simulations from the Monte Carlo dropout BNN:

$$\big[\mu(x^*),\, \sigma^2(x^*)\big] = f_{\hat{\omega}_t}(x^*)$$
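One way to produce these two outputs is a network with separate mean and variance heads on top of a shared dropout body. The following sketch assumes this two-head design (the class name `GaussianMCDropoutNet` is hypothetical), with a softplus to keep the predicted variance positive:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianMCDropoutNet(nn.Module):
    """Hypothetical BNN: each forward pass returns the Gaussian mean and
    variance for a single sampled dropout configuration."""

    def __init__(self, in_features=16, hidden=64, p_drop=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Dropout(p=p_drop),
        )
        self.mean_head = nn.Linear(hidden, 1)
        self.var_head = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.body(x)
        mean = self.mean_head(h)
        var = F.softplus(self.var_head(h)) + 1e-6  # keep the variance positive
        return mean, var
```

Averaging the sampled means across forward passes then estimates the predictive mean, while the spread of those means reflects parameter (epistemic) uncertainty and the predicted variances reflect noise in the likelihood.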

The following figure illustrates MC dropout. Each dropout configuration yields a different output by randomly switching neurons off (gray circles) and on (black circles) with each forward propagation. Multiple forward passes with different dropout configurations yield a predictive distribution over the mean, $p\big(f(x, \omega)\big)$.


    Figure: MC dropout

The number of forward passes through the data, $T$, should be evaluated quantitatively, but 30–100 is an appropriate range to consider (Gal and Ghahramani 2016).
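For example, you can reuse the `mc_dropout_predict` sketch from earlier to check how the estimate stabilizes across that range; the specific values of `T` below are illustrative:

```python
# Illustrative check: the variance estimate should stabilize as T grows.
for t in (10, 30, 50, 100):
    mean, variance = mc_dropout_predict(net, x, num_samples=t)
    print(f"T={t}: average predictive variance = {variance.mean().item():.4f}")
```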