Generate Reports for Bias in Pretraining Data in SageMaker Studio
SageMaker Clarify is integrated with Amazon SageMaker Data Wrangler, which helps you identify bias during data preparation without writing your own code. Data Wrangler provides an end-to-end solution to import, prepare, transform, featurize, and analyze data with Amazon SageMaker Studio. For an overview of the Data Wrangler data prep workflow, see Prepare ML Data with Amazon SageMaker Data Wrangler. You specify attributes of interest, such as gender or age, and SageMaker Clarify runs a set of algorithms to detect the presence of bias in those attributes. After the algorithms run, SageMaker Clarify provides a visual report that describes the sources and severity of possible bias so that you can plan steps to mitigate it. For example, if a financial dataset contains few examples of business loans to one age group compared to others, SageMaker Clarify flags the imbalance so that you can avoid training a model that disfavors that age group.
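One pretraining bias metric that Clarify can report, class imbalance (CI), measures the kind of imbalance in the loan example. For a facet value d, it compares the number of rows in that group, n_d, with the number of all other rows, n_a: CI = (n_a - n_d) / (n_a + n_d). Data Wrangler computes this for you; the following is only a minimal sketch of the arithmetic, assuming the facet column is a plain Python list. The function name `class_imbalance` and the sample age groups are hypothetical.

```python
from collections import Counter


def class_imbalance(facets, disadvantaged):
    """Class imbalance (CI) for one facet value (hypothetical helper).

    CI = (n_a - n_d) / (n_a + n_d), where n_d counts rows whose facet
    equals `disadvantaged` and n_a counts every other row.
    The result lies in [-1, 1]; values near +1 mean facet d is
    underrepresented relative to the rest of the data.
    """
    if not facets:
        raise ValueError("facet column is empty")
    n_d = Counter(facets)[disadvantaged]
    n_a = len(facets) - n_d
    return (n_a - n_d) / (n_a + n_d)


# Hypothetical age-group column from a business-loan dataset.
age_groups = ["25-40"] * 80 + ["41-60"] * 15 + ["60+"] * 5

for group in sorted(set(age_groups)):
    print(group, round(class_imbalance(age_groups, group), 2))
```

With this sample data, the "60+" group gets a CI of 0.9, which signals that it is heavily underrepresented. The majority group "25-40" gets a negative value.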
To analyze and report on data bias
To get started with Data Wrangler, see Get Started with Data Wrangler.
- Open Amazon SageMaker Studio and choose Create Data Flow from the Import and prepare your data tile.
- From the Import data tab, choose Amazon S3 and then specify your data source on the Data sources/S3 source page.
- After you import your data, choose the plus sign on the Data flow page and then choose Add Analysis.
- On the Create Analysis page, go to the Configure panel and then choose Bias Report from the Chart menu.
- Configure the bias report by providing the following: a Name; the column to predict, and whether to match it by a value or a threshold; and the column to analyze for bias (the facet), and whether to match it by a value or a threshold.
- Continue configuring the bias report by choosing the bias metrics.
- Choose Check for bias to generate and view the bias report. Scroll down to view all of the reports.
- Choose the caret to the right of the bias metric description to see documentation that can help you interpret the significance of the metric values.
- To view a table summary of the bias metric values, choose Table. To save the report for export, choose Create in the lower-right corner of the page.
- On the page where your data bias reports are stored, choose the Export tab to download the reports.
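The value-or-threshold choices in the configuration step determine how the column to predict and the facet column are each split into two groups before the metrics are computed. The sketch below illustrates that idea under stated assumptions. `to_mask` and `dpl` are hypothetical helpers, and the sample rows are invented. Treating entries strictly above a threshold as True is an assumption of this sketch, not documented Data Wrangler behavior. The sketch then computes difference in proportions of labels (DPL), another Clarify pretraining bias metric, defined as DPL = q_a - q_d, where q_a and q_d are the fractions of positive labels outside and inside facet d.

```python
def to_mask(column, value=None, threshold=None):
    """Binarize a column by exact value or by numeric threshold.

    Assumption of this sketch: with `threshold`, entries strictly
    greater than the threshold map to True.
    """
    if (value is None) == (threshold is None):
        raise ValueError("pass exactly one of value or threshold")
    if value is not None:
        return [x == value for x in column]
    return [x > threshold for x in column]


def dpl(positive, in_facet_d):
    """Difference in proportions of labels: DPL = q_a - q_d.

    q_a and q_d are the fractions of positive labels among rows
    outside and inside facet d, respectively.
    """
    n_d = sum(in_facet_d)
    n_a = len(in_facet_d) - n_d
    if n_a == 0 or n_d == 0:
        raise ValueError("both facet groups must be non-empty")
    pos_d = sum(1 for y, d in zip(positive, in_facet_d) if y and d)
    pos_a = sum(1 for y, d in zip(positive, in_facet_d) if y and not d)
    return pos_a / n_a - pos_d / n_d


# Hypothetical loan data: applicant age and whether the loan was approved.
ages = [25, 32, 38, 45, 51, 63, 70, 29]
approved = ["yes", "yes", "no", "yes", "no", "yes", "no", "yes"]

approved_mask = to_mask(approved, value="yes")  # column to predict, matched by value
older_mask = to_mask(ages, threshold=50)        # facet, matched by threshold

print(round(dpl(approved_mask, older_mask), 2))  # prints 0.47
```

Here 4 of 5 applicants aged 50 or under are approved (q_a = 0.8), compared with 1 of 3 older applicants (q_d is about 0.33). The resulting positive DPL indicates that the older facet receives the positive label less often in this sample.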