GENSEC06-BP01 Implement data purification filters for model training workflows

Data poisoning is best addressed at the data layer, before training or customization takes place. Introduce data purification filters into data pipelines when curating a dataset for training or customization.

Desired outcome: When implemented, this best practice reduces the likelihood of inappropriate or undesirable data being introduced into a model training or customization workflow.

Benefits of establishing this best practice: Apply security at all layers - Security at all layers reduces the risk of subtle security vulnerabilities entering an otherwise advanced workflow.

Level of risk exposed if this best practice is not established: High

Implementation guidance

Data poisoning occurs during pre-training, domain adaptation, or fine-tuning, when poisoned data is introduced into a model, either intentionally or by mistake. Data poisoning is considered successful if the model has learned from the poisoned data. Protect models from poisoning during pre-training and ongoing training steps by isolating your model training environment, infrastructure, and data. Examine and clean data for content that may be considered poisonous before introducing it to a training job. There are several ways to accomplish this, and the right approach depends on the type of data used to train the model. For example, consider using Amazon Transcribe's Toxicity Detection capability for voice data. For text data, consider using the Amazon Bedrock Guardrails API to filter data. Trained models can be tested using toxicity evaluation techniques from fmeval or Amazon SageMaker AI Studio's model evaluation capability. Carefully consider what your use case defines as poisonous, and develop mechanisms that surface this kind of data before it reaches the model during pre-training or post-training steps.
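
As an illustration of the text-data case, the following sketch screens individual records with the Amazon Bedrock Guardrails ApplyGuardrail API before they are added to a training corpus. The guardrail identifier, version, Region, and sample records are placeholders for your own environment.

```python
# Sketch: screen text records with the Amazon Bedrock Guardrails
# ApplyGuardrail API before adding them to a training dataset.
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

GUARDRAIL_ID = "your-guardrail-id"   # placeholder guardrail identifier
GUARDRAIL_VERSION = "1"              # placeholder guardrail version


def is_clean(text: str) -> bool:
    """Return True when the guardrail does not intervene on the record."""
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        source="INPUT",
        content=[{"text": {"text": text}}],
    )
    return response["action"] == "NONE"


# Keep only records that pass the guardrail check; quarantine the rest
# for review rather than silently discarding them.
raw_records = ["A harmless training sentence.", "A record to be screened."]
clean_records, quarantined_records = [], []
for record in raw_records:
    (clean_records if is_clean(record) else quarantined_records).append(record)
```

Quarantining flagged records instead of dropping them keeps an audit trail of what was filtered and why, which helps when tuning the guardrail's policies over time.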

Implementation steps

  1. Identify the data intended for model pre-training or model customization.

  2. Develop filters to check for data which may be considered poisonous to the model.

    • Examples include data which is biased, factually incorrect, hateful, or violent.

    • Other examples include data that is irrelevant to the model's intended purpose.

  3. Consider using a guardrail from Amazon Bedrock Guardrails or a third-party solution to check for subtler signals of poisoning.

  4. Run these checks on the data intended for model pre-training or model customization, remediating issues as they are discovered (see the sketch after these steps).
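
The following is a minimal sketch of steps 2 through 4 for a JSONL text corpus. The file names, record schema, and individual check functions are illustrative placeholders; in practice, replace them with your own filters, such as a toxicity classifier, a guardrail call, or use-case-specific relevance rules.

```python
# Sketch: a batch purification pass over a JSONL training file, assuming
# each line is a JSON object of the form {"text": "..."}.
import json
import re


def looks_irrelevant(text: str) -> bool:
    # Placeholder relevance rule: flag records with no alphabetic content.
    return not re.search(r"[A-Za-z]", text)


def looks_toxic(text: str) -> bool:
    # Placeholder toxicity rule: swap in a classifier or guardrail call here.
    blocklist = {"hateful-term-1", "hateful-term-2"}
    return any(term in text.lower() for term in blocklist)


CHECKS = [looks_irrelevant, looks_toxic]


def purify(in_path: str, out_path: str, quarantine_path: str) -> None:
    """Write clean records to out_path and flagged records to quarantine_path."""
    with open(in_path) as src, open(out_path, "w") as clean, open(quarantine_path, "w") as flagged:
        for line in src:
            record = json.loads(line)
            if any(check(record["text"]) for check in CHECKS):
                flagged.write(line)   # surface for review and remediation
            else:
                clean.write(line)     # safe to pass to the training job


purify("raw_corpus.jsonl", "clean_corpus.jsonl", "quarantine.jsonl")
```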
