Kullback-Leibler Divergence (KL)
The Kullback-Leibler divergence (KL) measures how much the observed label distribution of facet a, P_{a}(y), diverges from the distribution of facet d, P_{d}(y). It is also known as the relative entropy of P_{a}(y) with respect to P_{d}(y), and it quantifies the amount of information lost when P_{d}(y) is used to approximate P_{a}(y).
The formula for the Kullback-Leibler divergence is as follows:
KL(P_{a} ‖ P_{d}) = ∑_{y} P_{a}(y) log[P_{a}(y)/P_{d}(y)]
It is the expectation of the logarithmic difference between the probabilities P_{a}(y) and P_{d}(y), where the expectation is weighted by the probabilities P_{a}(y). This is not a true distance between the distributions as it is asymmetric and does not satisfy the triangle inequality. The implementation uses natural logarithms, giving KL in units of nats. Using different logarithmic bases gives proportional results but in different units. For example, using base 2 gives KL in units of bits.
For example, assume that one group of loan applicants has a 30% approval rate (facet d) and that the approval rate for other applicants (facet a) is 80%. The Kullback-Leibler formula gives you the label distribution divergence of facet a from facet d as follows:
KL = 0.8*ln(0.8/0.3) + 0.2*ln(0.2/0.7) = 0.53
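This calculation can be checked with a short Python sketch; the probabilities below are the approval/non-approval rates from the loan example:

```python
import math

# Binary label distributions from the loan example:
# facet a (other applicants): 80% approved, 20% not approved
# facet d:                    30% approved, 70% not approved
p_a = [0.8, 0.2]
p_d = [0.3, 0.7]

# KL(P_a || P_d) = sum over labels of P_a(y) * ln(P_a(y) / P_d(y)), in nats
kl = sum(pa * math.log(pa / pd) for pa, pd in zip(p_a, p_d))
print(round(kl, 2))  # 0.53
```

Using math.log2 instead of math.log would give the same divergence expressed in bits rather than nats.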
There are two terms in the formula here because the labels are binary in this example. The measure applies to multicategory labels as well as binary ones. For example, in a college admissions scenario, assume an applicant may be assigned one of three category labels: y_{i} = {y_{0}, y_{1}, y_{2}} = {rejected, waitlisted, accepted}.
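The multicategory case sums one term per label. The sketch below generalizes the binary calculation to any number of labels; the three-category admissions proportions are hypothetical values chosen purely for illustration:

```python
import math

def kl_divergence(p_a, p_d):
    """KL(P_a || P_d) in nats. Assumes both distributions are over the
    same ordered label set and that every P_d(y) is nonzero."""
    return sum(pa * math.log(pa / pd) for pa, pd in zip(p_a, p_d))

# Hypothetical proportions over {rejected, waitlisted, accepted}
p_a = [0.5, 0.2, 0.3]  # facet a (assumed values, for illustration only)
p_d = [0.7, 0.2, 0.1]  # facet d (assumed values, for illustration only)

print(kl_divergence(p_a, p_d))  # ≈ 0.16 nats
```

Note the asymmetry mentioned above: kl_divergence(p_a, p_d) and kl_divergence(p_d, p_a) generally differ, which is why KL is not a true distance.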
The range of values for the KL metric for binary, multicategory, and continuous outcomes is [0, +∞).

Values near zero mean the outcomes are similarly distributed for the different facets.

Positive values mean the label distributions diverge; the larger the value, the greater the divergence.