Kullback-Leibler Divergence (KL)

The Kullback-Leibler divergence (KL) measures how much the observed label distribution of facet a, P_a(y), diverges from the distribution of facet d, P_d(y). It is also known as the relative entropy of P_a(y) with respect to P_d(y) and quantifies the amount of information lost when moving from P_a(y) to P_d(y).

The formula for the Kullback-Leibler divergence is as follows:

KL(P_a || P_d) = ∑_y P_a(y) * log[P_a(y)/P_d(y)]

It is the expectation of the logarithmic difference between the probabilities P_a(y) and P_d(y), where the expectation is weighted by the probabilities P_a(y). This is not a true distance between the distributions as it is asymmetric and does not satisfy the triangle inequality. The implementation uses natural logarithms, giving KL in units of nats. Using different logarithmic bases gives proportional results but in different units. For example, using base 2 gives KL in units of bits.
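As a minimal sketch of the two properties just described (asymmetry and the dependence of the units on the logarithm base), the following Python snippet evaluates the formula in both directions. The helper name `kl_divergence` and the two distributions are assumptions made for illustration; they are not taken from the implementation.

```python
import math

def kl_divergence(p, q, base=math.e):
    """Sum of p(y) * log(p(y) / q(y)) over outcomes y (hypothetical helper).

    Assumes p and q are equal-length sequences of strictly positive
    probabilities that each sum to 1.
    """
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q))

# Illustrative distributions over two labels (made-up values).
p = [0.9, 0.1]
q = [0.5, 0.5]

kl_pq_nats = kl_divergence(p, q)          # natural logarithm, units of nats
kl_qp_nats = kl_divergence(q, p)          # arguments swapped
kl_pq_bits = kl_divergence(p, q, base=2)  # base 2, units of bits

print(kl_pq_nats, kl_qp_nats)                # the two directions differ
print(kl_pq_bits, kl_pq_nats / math.log(2))  # bits equal nats divided by ln 2
```

Swapping the arguments changes the result, which is why the direction (facet a relative to facet d) matters when reporting the metric.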

For example, assume that one group of loan applicants (facet d) has a 30% approval rate and that the approval rate for the other applicants (facet a) is 80%. The Kullback-Leibler formula gives you the label distribution divergence of facet a from facet d as follows:

KL = 0.8*ln(0.8/0.3) + 0.2*ln(0.2/0.7) = 0.53
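The arithmetic can be checked with a short Python sketch. The variable names are chosen for illustration, and the rejection rates 0.2 and 0.7 are the complements of the stated approval rates.

```python
import math

# Label distributions as [approved, rejected].
p_a = [0.8, 0.2]  # facet a: 80% approval rate
p_d = [0.3, 0.7]  # facet d: 30% approval rate

# One term per label, using the natural logarithm (nats).
kl = sum(pa * math.log(pa / pd) for pa, pd in zip(p_a, p_d))
print(round(kl, 2))  # 0.53
```

The second term is negative because facet a has a lower rejection probability than facet d, but the total is still non-negative.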

There are two terms in the formula here because labels are binary in this example. This measure can be applied to multicategory labels in addition to binary ones. For example, in a college admissions scenario, assume an applicant may be assigned one of three category labels: y_i ∈ {y₀, y₁, y₂} = {rejected, waitlisted, accepted}. The sum then has three terms, one for each category.
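A minimal sketch of the three-category case follows, estimating each facet's label distribution from observed labels before applying the formula. The label counts and the helper `label_distribution` are hypothetical and exist only to make the example concrete.

```python
import math
from collections import Counter

def label_distribution(labels, categories):
    """Fraction of observations in each category (hypothetical helper)."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in categories]

categories = ["rejected", "waitlisted", "accepted"]

# Made-up observed labels for each facet (10 applicants each).
labels_a = ["rejected"] * 2 + ["waitlisted"] * 3 + ["accepted"] * 5
labels_d = ["rejected"] * 5 + ["waitlisted"] * 3 + ["accepted"] * 2

p_a = label_distribution(labels_a, categories)  # [0.2, 0.3, 0.5]
p_d = label_distribution(labels_d, categories)  # [0.5, 0.3, 0.2]

# Three terms, one per category; assumes every category has nonzero
# probability in both facets.
kl = sum(pa * math.log(pa / pd) for pa, pd in zip(p_a, p_d))
print(kl)
```

The waitlisted term contributes nothing here because both facets assign it the same probability.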

The range of values for the KL metric for binary, multicategory, and continuous outcomes is [0, +∞).

Values near zero mean the outcomes are similarly distributed for the different facets.
Positive values mean the label distributions diverge. The more positive the value, the larger the divergence.
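The two ends of this range can be illustrated with a short Python sketch. It follows the usual mathematical conventions for zero probabilities (a term with P_a(y) = 0 contributes 0, and a label with P_a(y) > 0 but P_d(y) = 0 makes the sum unbounded). How a particular implementation treats these cases is not stated here, so this handling and the function name `kl_with_zero_handling` are assumptions.

```python
import math

def kl_with_zero_handling(p_a, p_d):
    """KL(P_a || P_d) in nats with conventional zero handling (hypothetical helper)."""
    total = 0.0
    for pa, pd in zip(p_a, p_d):
        if pa == 0:
            continue          # 0 * log(0 / q) is taken as 0 by convention
        if pd == 0:
            return math.inf   # facet a has mass where facet d has none
        total += pa * math.log(pa / pd)
    return total

# Identical distributions: no divergence.
print(kl_with_zero_handling([0.3, 0.7], [0.3, 0.7]))  # 0.0

# Facet d never receives the first label, but facet a does: unbounded.
print(kl_with_zero_handling([0.5, 0.5], [0.0, 1.0]))  # inf
```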
