Difference in Positive Proportions in Predicted Labels (DPPL)
The difference in positive proportions in predicted labels (DPPL) metric determines whether the model predicts outcomes differently for each facet. It is defined as the difference between the proportion of positive predictions (y' = 1) for facet a and the proportion of positive predictions (y' = 1) for facet d. For example, if the model predictions grant loans to 60% of a middle-aged group (facet a) and 50% of other age groups (facet d), it might be biased against facet d. In this example, you must determine whether the difference of 10 percentage points is material to a case for bias.
A comparison of difference in proportions of labels (DPL), a measure of pretraining
bias, with DPPL, a measure of posttraining bias, assesses whether the bias in positive
proportions that is initially present in the dataset changes after training. If DPPL is
larger than DPL, then bias in positive proportions increased after training. If DPPL is
smaller than DPL, the model did not increase bias in positive proportions after
training. Comparing DPL against DPPL does not guarantee that the model reduces bias
along all dimensions. For example, the model may still be biased when considering other
metrics such as Counterfactual Fliptest (FT) or Accuracy Difference (AD). For more
information about bias detection, see the blog post Learn how Amazon SageMaker Clarify
helps detect bias.
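The DPL-versus-DPPL comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not SageMaker Clarify's implementation: the function names `positive_proportion_gap` and `compare_dpl_dppl` and the toy data are hypothetical, labels are assumed to be binary (0 or 1), and "larger" is interpreted here as larger in absolute value, which is an assumption about how the two signed metrics should be compared.

```python
def positive_proportion_gap(labels, in_facet_a):
    """Share of label 1 in facet a minus share of label 1 in facet d."""
    facet_a = [y for y, is_a in zip(labels, in_facet_a) if is_a]
    facet_d = [y for y, is_a in zip(labels, in_facet_a) if not is_a]
    if not facet_a or not facet_d:
        raise ValueError("both facets must contain at least one member")
    q_a = sum(1 for y in facet_a if y == 1) / len(facet_a)
    q_d = sum(1 for y in facet_d if y == 1) / len(facet_d)
    return q_a - q_d


def compare_dpl_dppl(observed, predicted, in_facet_a):
    """Return (DPL, DPPL, increased) for one facet split.

    DPL uses the observed labels, DPPL uses the model's predicted labels.
    Assumption: the gap "increased" when |DPPL| > |DPL|.
    """
    dpl = positive_proportion_gap(observed, in_facet_a)
    dppl = positive_proportion_gap(predicted, in_facet_a)
    return dpl, dppl, abs(dppl) > abs(dpl)


# Hypothetical toy data: four members of facet a, then four of facet d.
in_facet_a = [True, True, True, True, False, False, False, False]
observed = [1, 1, 0, 0, 1, 0, 0, 0]   # labels in the dataset
predicted = [1, 1, 1, 0, 1, 0, 0, 0]  # labels predicted by the model

dpl, dppl, increased = compare_dpl_dppl(observed, predicted, in_facet_a)
print(dpl, dppl, increased)
```

On this toy data the gap widens after training, so the sketch reports an increase. As the text notes, a favorable comparison on this metric alone says nothing about other bias metrics.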
The formula for the DPPL is:
DPPL = q'_{a} - q'_{d}
Where:

q'_{a} = n'_{a}^{(1)}/n_{a} is the predicted proportion of facet a who get a positive outcome of value 1. In our example, this is the proportion of the middle-aged facet predicted to be granted a loan. Here n'_{a}^{(1)} represents the number of members of facet a who get a positive predicted outcome of value 1, and n_{a} is the number of members of facet a.

q'_{d} = n'_{d}^{(1)}/n_{d} is the predicted proportion of facet d who get a positive outcome of value 1. In our example, this is the proportion of the facet of older and younger people predicted to be granted a loan. Here n'_{d}^{(1)} represents the number of members of facet d who get a positive predicted outcome of value 1, and n_{d} is the number of members of facet d.
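As a worked instance of the formula, the following Python sketch computes DPPL directly from the four counts defined above. The function name `dppl_from_counts` and its parameter names are hypothetical, and the group sizes of 100 are invented for illustration; only the 60% and 50% approval rates come from the loan example.

```python
def dppl_from_counts(n_pos_a, n_a, n_pos_d, n_d):
    """Return DPPL = q'_a - q'_d from raw counts.

    n_pos_a, n_pos_d: members of each facet with a positive prediction (y' = 1)
    n_a, n_d:         total members of each facet
    """
    if n_a <= 0 or n_d <= 0:
        raise ValueError("both facets must contain at least one member")
    if not (0 <= n_pos_a <= n_a and 0 <= n_pos_d <= n_d):
        raise ValueError("positive counts must lie between 0 and the facet size")
    q_a = n_pos_a / n_a  # predicted positive proportion for facet a
    q_d = n_pos_d / n_d  # predicted positive proportion for facet d
    return q_a - q_d


# Loan example: 60 of 100 middle-aged applicants and 50 of 100 other
# applicants are predicted to be granted a loan (group sizes are assumed).
example = dppl_from_counts(60, 100, 50, 100)
print(round(example, 4))  # 0.1
```

The result is rounded before printing because the floating-point subtraction does not yield exactly 0.1.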
If DPPL is close enough to 0, it means that posttraining demographic parity has been achieved.
For binary and multicategory facet labels, the normalized DPPL values range over the interval [-1, 1]. For continuous labels, the values vary over the interval (-∞, +∞).

Positive DPPL values indicate that facet a has a higher proportion of predicted positive outcomes when compared with facet d.
This is referred to as positive bias.

Values of DPPL near zero indicate a more equal proportion of predicted positive outcomes between facets a and d, and a value of zero indicates perfect demographic parity.

Negative DPPL values indicate that facet d has a higher proportion of predicted positive outcomes when compared with facet a. This is referred to as negative bias.
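
The three cases above can be expressed as a small helper. This is only a sketch: the function name `interpret_dppl` is hypothetical, and the default `tolerance` of 0.05 is an arbitrary placeholder, since the text does not specify how close to zero is "close enough". Choose a threshold appropriate to your use case.

```python
def interpret_dppl(dppl, tolerance=0.05):
    """Map a DPPL value to one of the three cases described in the text.

    tolerance is an assumed, user-chosen cutoff for "near zero";
    it is not a value prescribed by the metric's definition.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    if abs(dppl) <= tolerance:
        return "near parity"
    # Positive: facet a is favored. Negative: facet d is favored.
    return "positive bias" if dppl > 0 else "negative bias"


for value in (0.10, 0.0, -0.20):
    print(value, interpret_dppl(value))
```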