Monte Carlo dropout
One of the most popular ways to estimate uncertainty is by inferring predictive distributions with Bayesian neural networks. To denote a predictive distribution, use

$$p(y \mid x, D),$$

with target $y$, input $x$, and training examples $D = \{(x_i, y_i)\}_{i=1}^{N}$. When you obtain a predictive distribution, you can inspect its variance to uncover uncertainty. Learning a predictive distribution requires learning a distribution over functions or, equivalently, a distribution over the parameters (that is, the parametric posterior distribution $p(\theta \mid D)$).
The Monte Carlo (MC) dropout technique (Gal and Ghahramani 2016) provides a scalable way to learn a predictive distribution. MC dropout works by randomly switching off neurons in a neural network; during training this regularizes the network, and when dropout is kept active at inference time, each dropout configuration corresponds to a different sample from the approximate parametric posterior distribution $q(\theta)$:

$$\hat{\theta}_t \sim q(\theta),$$

where $\hat{\theta}_t$ corresponds to a dropout configuration or, equivalently, a simulation sampled from the approximate parametric posterior $q(\theta)$, as shown in the following figure. Sampling from the approximate posterior enables Monte Carlo integration of the model's likelihood, which uncovers the predictive distribution, as follows:

$$p(y \mid x, D) \approx \frac{1}{T} \sum_{t=1}^{T} p\big(y \mid x, \hat{\theta}_t\big),$$

with $T$ forward passes.
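To make the sampling loop concrete, the following is a minimal sketch in PyTorch (the source does not prescribe a framework, so the framework, the architecture, and the helper names `enable_dropout` and `mc_predict` are illustrative assumptions). Each forward pass with dropout left active draws one $\hat{\theta}_t \sim q(\theta)$; averaging across passes performs the Monte Carlo integration above.

```python
import torch
import torch.nn as nn

class MCDropoutNet(nn.Module):
    """A small regression network with dropout after every hidden layer."""
    def __init__(self, in_dim=1, hidden=64, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

def enable_dropout(model):
    """Switch only the dropout layers back to train mode so they keep
    sampling masks at inference time (the defining trick of MC dropout);
    other train-time behaviors such as batch norm stay in eval mode."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.train()

def mc_predict(model, x, T=50):
    """T stochastic forward passes; each pass is one sample theta_t ~ q(theta)."""
    model.eval()
    enable_dropout(model)
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(T)])  # shape (T, N, 1)
    # Monte Carlo estimates of the predictive mean and (epistemic) variance
    return samples.mean(dim=0), samples.var(dim=0)

# Example: 50 stochastic passes over a batch of test inputs
model = MCDropoutNet()
x_test = torch.randn(8, 1)
mean, var = mc_predict(model, x_test)
```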
For simplicity, the likelihood may be assumed to be Gaussian distributed:

$$p\big(y \mid x, \hat{\theta}_t\big) = \mathcal{N}\big(y;\ \mu_t(x),\ \sigma_t^2(x)\big),$$

with the Gaussian function specified by the mean $\mu_t(x)$ and variance $\sigma_t^2(x)$ parameters, which are output by simulations from the Monte Carlo dropout BNN:

$$\big(\mu_t(x),\ \sigma_t^2(x)\big) = f\big(x, \hat{\theta}_t\big).$$
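Under the Gaussian assumption, the network can output both parameters directly. The sketch below continues the hypothetical PyTorch setup above (reusing `enable_dropout`); the two-headed mean/log-variance design and the variance decomposition, average predicted noise plus spread of the means, are a common convention rather than something the source specifies.

```python
import torch
import torch.nn as nn

class GaussianMCDropoutNet(nn.Module):
    """Dropout network whose head outputs the Gaussian parameters (mu, log sigma^2)."""
    def __init__(self, in_dim=1, hidden=64, p=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
        )
        self.mu = nn.Linear(hidden, 1)
        self.log_var = nn.Linear(hidden, 1)  # predict log sigma^2 so variance stays positive

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.log_var(h)

def gaussian_nll(mu, log_var, y):
    """Negative Gaussian log-likelihood, the natural training loss for this head."""
    return 0.5 * (log_var + (y - mu) ** 2 / log_var.exp()).mean()

def mc_gaussian_predict(model, x, T=50):
    """Average the per-pass Gaussians; reuses enable_dropout from the sketch above."""
    model.eval()
    enable_dropout(model)
    mus, variances = [], []
    with torch.no_grad():
        for _ in range(T):
            mu, log_var = model(x)
            mus.append(mu)
            variances.append(log_var.exp())
    mus, variances = torch.stack(mus), torch.stack(variances)
    # Total predictive variance: average predicted noise + spread of the means.
    return mus.mean(0), variances.mean(0) + mus.var(0)
```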
The following figure illustrates MC dropout. Each dropout configuration yields a different output by randomly switching neurons off (gray circles) and on (black circles) with each forward propagation. Multiple forward passes with different dropout configurations yield a predictive distribution over the mean $p\big(f(x, \theta)\big)$.
The number of forward passes $T$ should be evaluated quantitatively for each application, but 30–100 is an appropriate range to consider (Gal and Ghahramani 2016).
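As a hypothetical illustration of that quantitative check, you can rerun the estimator from the first sketch with increasing $T$ and watch the predictive statistics stabilize:

```python
# Convergence check: the predictive mean and variance estimates should
# settle as T grows; reuses model and mc_predict from the first sketch.
x_test = torch.randn(128, 1)
for T in (10, 30, 100, 300):
    mean, var = mc_predict(model, x_test, T=T)
    print(f"T={T:4d}  avg mean={mean.mean().item():+.4f}  avg var={var.mean().item():.4f}")
```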