Lp-norm (LP)
The Lp-norm (LP) measures the p-norm distance between the facet distributions of the observed labels in a training dataset. This metric is non-negative and so cannot detect reverse bias.
The formula for the Lp-norm is as follows:
L_{p}(P_{a}, P_{d}) = ( ∑_{y}|P_{a}(y) − P_{d}(y)|^{p})^{1/p}
Where the p-norm distance between the points x and y is defined as follows:
L_{p}(x, y) = (|x_{1} − y_{1}|^{p} + |x_{2} − y_{2}|^{p} + … + |x_{n} − y_{n}|^{p})^{1/p}
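As a minimal sketch of this definition, the p-norm distance can be computed with NumPy; the function name lp_distance and its defaults are illustrative, not part of any library API:

```python
import numpy as np

def lp_distance(x, y, p=2):
    """p-norm distance between two equal-length vectors x and y.

    Sums |x_i - y_i|^p over all components, then takes the 1/p root.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)
```

With p=2 this reduces to the familiar Euclidean distance, and with p=1 to the Manhattan distance.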
The 2-norm is the Euclidean norm. Assume you have an outcome distribution with three categories, for example y_{i} = {y_{0}, y_{1}, y_{2}} = {accepted, waitlisted, rejected} in a multicategory college admissions scenario. You take the sum of the squares of the differences between the outcome counts for facets a and d. The resulting Euclidean distance is calculated as follows:
L_{2}(P_{a}, P_{d}) = [(n_{a}^{(0)} − n_{d}^{(0)})^{2} + (n_{a}^{(1)} − n_{d}^{(1)})^{2} + (n_{a}^{(2)} − n_{d}^{(2)})^{2}]^{1/2}
Where:

n_{a}^{(i)} is the number of ith category outcomes in facet a; for example, n_{a}^{(0)} is the number of facet a acceptances.

n_{d}^{(i)} is the number of ith category outcomes in facet d; for example, n_{d}^{(2)} is the number of facet d rejections.
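The three-category calculation above can be sketched directly; the counts below are made up for illustration (they are not from the source):

```python
import numpy as np

# Hypothetical outcome counts per category (accepted, waitlisted, rejected).
n_a = np.array([30.0, 10.0, 60.0])  # facet a counts (illustrative)
n_d = np.array([20.0, 10.0, 70.0])  # facet d counts (illustrative)

# Euclidean (2-norm) distance: square the per-category differences,
# sum them, and take the square root.
l2 = np.sqrt(np.sum((n_a - n_d) ** 2))
```

Here the acceptance counts differ by 10 and the rejection counts by 10, so l2 equals sqrt(10^2 + 0^2 + 10^2) = sqrt(200).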
The range of LP values for binary, multicategory, and continuous outcomes is [0, √2), where:

Values near zero mean the labels are similarly distributed.

Positive values mean the label distributions diverge; the larger the value, the greater the divergence.
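As a quick sanity check on the stated range, the sketch below (with made-up proportions) computes the 2-norm distance between two nearly disjoint normalized binary label distributions, which lands near the top of the [0, √2) range:

```python
import numpy as np

# Hypothetical normalized binary label distributions (each sums to 1).
p_a = np.array([0.99, 0.01])  # facet a: almost all positive outcomes
p_d = np.array([0.01, 0.99])  # facet d: almost all negative outcomes

# 2-norm distance between the distributions; both components differ
# by 0.98, so the result is 0.98 * sqrt(2), close to the sqrt(2) bound.
l2 = np.sqrt(np.sum((p_a - p_d) ** 2))
```

Identical distributions would instead give a distance of exactly zero, the bottom of the range.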
