Aleatoric uncertainty - AWS Prescriptive Guidance

Aleatoric uncertainty

Aleatoric uncertainty refers to the inherent randomness in the data that cannot be explained away (in Latin, aleator means a dice player). Examples of data with aleatoric uncertainty include noisy telemetry data, low-resolution images, and social media text. You can assume that the aleatoric uncertainty $\mathbb{E}_{\theta}[s^2]$, the inherent randomness, is either constant (homoscedastic) or variable (heteroscedastic) as a function of the input explanatory variables.

Homoscedastic aleatoric uncertainty

Homoscedastic aleatoric uncertainty, where $\mathbb{E}_{\theta}[s^2]$ is constant, is the simplest case. It is commonly encountered in regression under the modeling assumption $y = f(x) + \epsilon$, where $\epsilon \sim \mathcal{N}(0, s^2 I)$, $I$ is the identity matrix, and $s$ is a constant scalar. Assuming constant aleatoric risk is highly restrictive and rarely reflects reality. This means assuming that the noise $\epsilon$ about a response $y$ is constant and independent of the explanatory variable $x$.

Many phenomena in nature do not exhibit constant randomness. For example, the uncertainty about outcomes in physical systems such as fluid motion is usually a function of kinetic energy. Consider the contrast between the turbulent flow of a large waterfall and the laminar flow of a decorative fountain. The stochasticity (randomness) of a water particle's trajectory depends on its kinetic energy and is therefore not constant. Assuming constant noise can discard valuable information when you model relationships between targets and inputs whose noise varies with the inputs and cannot be explained by the observable information. As a consequence, it is usually not sufficient to assume homoscedastic uncertainty. Unless a phenomenon is known to be homoscedastic, model the inherent noise as a function of the explanatory variables $x$ wherever possible.
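The following minimal sketch makes the contrast concrete. It uses only the Python standard library to simulate $y = f(x) + \epsilon$ twice: once with a constant noise scale and once with a noise scale that grows with $x$. It then estimates the noise variance separately in bins of $x$. The signal `f`, the noise scales (0.3 and 0.05 + 0.5x), and the helper names `simulate` and `binned_noise_variance` are illustrative assumptions and are not part of this guide. Because the sketch knows the true `f`, the residuals are exactly the noise; with real data, `f` would have to be estimated first.

```python
import math
import random
import statistics
from typing import Callable, List, Tuple


def f(x: float) -> float:
    """Hypothetical underlying signal; any deterministic function would do."""
    return math.sin(2 * math.pi * x)


def simulate(n: int, noise_std: Callable[[float], float], seed: int = 0) -> List[Tuple[float, float]]:
    """Draw n pairs (x, y) with y = f(x) + eps and eps ~ N(0, noise_std(x)^2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        y = f(x) + rng.gauss(0.0, noise_std(x))
        data.append((x, y))
    return data


def binned_noise_variance(data: List[Tuple[float, float]], n_bins: int = 4) -> List[float]:
    """Estimate Var[y - f(x)] separately in equal-width bins of x over [0, 1]."""
    bins: List[List[float]] = [[] for _ in range(n_bins)]
    for x, y in data:
        idx = min(int(x * n_bins), n_bins - 1)  # clamp x == 1.0 into the last bin
        bins[idx].append(y - f(x))
    return [statistics.pvariance(residuals) for residuals in bins]


homo = simulate(20_000, lambda x: 0.3)                # homoscedastic: constant s
hetero = simulate(20_000, lambda x: 0.05 + 0.5 * x)   # heteroscedastic: s grows with x

print([round(v, 3) for v in binned_noise_variance(homo)])
print([round(v, 3) for v in binned_noise_variance(hetero)])
```

With the constant scale, every bin should come out close to 0.3² = 0.09. With the growing scale, the binned variance rises from left to right. A homoscedastic model discards exactly this information.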

Heteroscedastic aleatoric uncertainty

Heteroscedastic aleatoric uncertainty treats the inherent randomness in the data as a function of the data itself, $s^2(x)$. To calculate this type of uncertainty, you average the predictive variance over a sample set of model parameters:

$\mathbb{E}_{\theta}[s^2(x,\theta)] \approx \frac{1}{T}\sum_{t=1}^{T} s^2(x,\theta_t)$

where a BNN estimates $s^2(x, \theta)$ for each sampled parameter set $\theta$. Learning aleatoric uncertainty during training encourages BNNs to capture the inherent randomness in the data that can't be explained away. If there is no inherent randomness, $s^2(x)$ should tend toward zero.
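The following Python sketch illustrates the two ideas above under stated assumptions. First, it computes a Monte Carlo estimate of $\mathbb{E}_{\theta}[s^2(x,\theta)]$ by averaging the variance predicted by several stochastic forward passes. Second, it defines a heteroscedastic Gaussian negative log-likelihood. This is one common loss for learning $s^2(x)$ during training; this guide does not prescribe it, so treat it as an assumption. `toy_bnn_forward` is a hypothetical stand-in for a BNN that mimics sampling $\theta$ by randomly perturbing two weights of a one-input model. The names `aleatoric_uncertainty` and `gaussian_nll` are also illustrative.

```python
import math
import random
import statistics
from typing import Callable, Tuple

# One stochastic forward pass: returns (mean, variance) for input x under one sampled theta.
ForwardPass = Callable[[float, random.Random], Tuple[float, float]]


def toy_bnn_forward(x: float, rng: random.Random) -> Tuple[float, float]:
    """Hypothetical stand-in for one BNN forward pass with sampled parameters.

    Sampling theta is mimicked by perturbing two weights; the variance head
    outputs a log-variance, so exp() keeps s^2(x, theta) positive.
    """
    w_mean = 1.0 + rng.gauss(0.0, 0.05)    # sampled weight of the mean head
    w_logvar = 2.0 + rng.gauss(0.0, 0.05)  # sampled weight of the log-variance head
    mean = w_mean * x
    variance = math.exp(w_logvar * x - 4.0)
    return mean, variance


def aleatoric_uncertainty(x: float, forward: ForwardPass,
                          n_samples: int = 100, seed: int = 0) -> float:
    """Monte Carlo estimate of E_theta[s^2(x, theta)]: the average predicted variance."""
    rng = random.Random(seed)
    variances = [forward(x, rng)[1] for _ in range(n_samples)]
    return statistics.fmean(variances)


def gaussian_nll(y: float, mean: float, variance: float) -> float:
    """Negative log-likelihood of y under N(mean, variance), with variance predicted per input."""
    return 0.5 * (math.log(2.0 * math.pi * variance) + (y - mean) ** 2 / variance)


for x in (0.0, 0.5, 1.0):
    print(x, round(aleatoric_uncertainty(x, toy_bnn_forward), 4))
```

For a fixed residual $r = y - \mu$, this loss is minimized at $s^2 = r^2$. So when the mean fits data that has no inherent randomness, training pushes the predicted variance toward zero, which matches the behavior described above.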