Document coverage and accuracy – in domain - AWS Prescriptive Guidance

We compared the predictive performance of deep ensembles (with dropout applied at test time), MC dropout, and a naïve softmax function, as shown in the following graph. After inference, the predictions with the highest uncertainty were discarded at varying thresholds, yielding remaining data coverage that ranged from 10% to 100%. We expected the deep ensemble to identify uncertain predictions more efficiently because of its greater ability to quantify epistemic uncertainty; that is, to identify regions in the data where the model has less experience. This should be reflected in higher accuracy at each data coverage level. Each deep ensemble used 5 models, and we applied inference 20 times per ensemble. For MC dropout, we applied inference 100 times for each model. We used the same set of hyperparameters and the same model architecture for each method.
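The evaluation above ranks predictions by uncertainty and measures accuracy on the retained subset at each coverage level. A minimal sketch of that computation follows; the function and variable names are illustrative, not from the guide's codebase, and the toy scores stand in for real per-example uncertainty estimates (for example, predictive entropy from ensemble members).

```python
# Sketch: accuracy at varying data coverage, given per-example uncertainty
# scores and correctness indicators. Names and data are illustrative.

def accuracy_at_coverage(uncertainties, correct, coverage):
    """Keep the `coverage` fraction of lowest-uncertainty predictions
    and return accuracy on the retained subset."""
    # Sort example indices from most to least confident.
    order = sorted(range(len(uncertainties)), key=lambda i: uncertainties[i])
    n_keep = max(1, round(coverage * len(order)))
    kept = order[:n_keep]
    return sum(correct[i] for i in kept) / len(kept)

# Toy example: the three confident predictions are correct,
# the two uncertain ones are mixed.
unc = [0.1, 0.2, 0.3, 0.8, 0.9]
ok = [1, 1, 1, 0, 1]
print(accuracy_at_coverage(unc, ok, 0.6))  # keeps 3 most confident -> 1.0
print(accuracy_at_coverage(unc, ok, 1.0))  # full coverage -> 0.8
```

Sweeping `coverage` from 0.1 to 1.0 with this function produces one curve per method, which is how the graph below is read.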

Comparison of predictive performance of deep ensembles, MC dropout, and softmax function

The graph appears to show a slight benefit from using deep ensembles and MC dropout compared with the naïve softmax, most notably in the 50-80% data coverage range. Why is the benefit not greater? As mentioned in the deep ensembles section, the strength of deep ensembles comes from the different loss trajectories that the member models follow. In this situation, we are using pretrained models. Although we fine-tune the entire model, the overwhelming majority of the weights are initialized from the pretrained model, and only a few hidden layers are randomly initialized. Consequently, we conjecture that pretraining large models can cause overconfidence because of the limited diversity among ensemble members. To our knowledge, the efficacy of deep ensembles has not previously been tested in transfer learning scenarios, and we see this as an exciting area for future research.
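The diversity argument above shows up directly in the uncertainty score: when ensemble members agree, the entropy of their averaged prediction is low, and when they disagree, it is high. The following sketch, using hypothetical toy probability vectors rather than real model outputs, illustrates the predictive-entropy computation used to rank predictions.

```python
# Sketch: predictive entropy of an ensemble. Average the member
# probability vectors, then take the entropy of the mean distribution.
# Toy vectors are illustrative; real runs would average softmax outputs
# from 5 fine-tuned models x 20 stochastic forward passes.
import math

def predictive_entropy(member_probs):
    """Entropy of the mean of the members' class-probability vectors."""
    n = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(p[j] for p in member_probs) / n for j in range(k)]
    return -sum(p * math.log(p) for p in mean if p > 0)

# Members that agree yield low entropy; disagreement raises it.
agree = [[0.9, 0.1], [0.88, 0.12], [0.92, 0.08]]
disagree = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(predictive_entropy(agree) < predictive_entropy(disagree))  # True
```

If most weights come from the same pretrained checkpoint, the member distributions stay close to one another, the averaged entropy stays low, and the ensemble looks (over)confident even where it has little experience.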