Note
Image content filters for Amazon Bedrock Guardrails are in preview release and are subject to change.
Block harmful images with content filters (Preview)
Amazon Bedrock Guardrails can help block inappropriate or harmful images by enabling image as a modality when you configure content filters in a guardrail.
Prerequisites and Limitations
- Support for detecting and blocking harmful images in content filters is currently in preview and is not recommended for production workloads.
- This capability is supported only for images, and is not supported for images with embedded video content.
- This capability is supported only for the Hate, Insults, Sexual, and Violence categories within content filters, and not for any other categories, including Misconduct and Prompt attacks.
- Users can upload images up to a maximum size of 4 MB each, with a maximum of 20 images in a single request (see the validation sketch after this list).
- Only PNG and JPEG formats are supported for image content.
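If you assemble requests programmatically, you may want to enforce these limits client-side before calling the service. The following is a minimal sketch; the helper name validate_images and the extension-based format check are illustrative assumptions, not part of any Bedrock SDK.

import os

MAX_IMAGE_BYTES = 4 * 1024 * 1024   # 4 MB per image (documented preview limit)
MAX_IMAGES_PER_REQUEST = 20         # documented preview limit
SUPPORTED_EXTENSIONS = {'.png', '.jpg', '.jpeg'}  # only PNG and JPEG are supported

def validate_images(image_paths):
    """Check a batch of image files against the documented preview limits
    before sending them to Amazon Bedrock Guardrails."""
    if len(image_paths) > MAX_IMAGES_PER_REQUEST:
        raise ValueError(f"At most {MAX_IMAGES_PER_REQUEST} images are allowed per request")
    for path in image_paths:
        ext = os.path.splitext(path)[1].lower()
        if ext not in SUPPORTED_EXTENSIONS:
            raise ValueError(f"{path}: only PNG and JPEG formats are supported")
        if os.path.getsize(path) > MAX_IMAGE_BYTES:
            raise ValueError(f"{path}: images must be 4 MB or smaller")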
Overview
The detection and blocking of harmful images is supported for the Hate, Insults, Sexual, and Violence categories within content filters, and for images without any text in them. In addition to text, users can select the image modality for the above categories within content filters while creating a guardrail, and set the respective filtering strength to NONE, LOW, MEDIUM, or HIGH. These thresholds are common to both text and image content for these categories if both text and image are selected. Guardrails evaluates images sent as input by users, or generated as output in model responses.
The four supported categories for detection of harmful image content are described below:
- Hate – Describes content that discriminates, criticizes, insults, denounces, or dehumanizes a person or group on the basis of an identity (such as race, ethnicity, gender, religion, sexual orientation, ability, or national origin). It also includes graphic and real-life visual content displaying symbols of hate groups, hateful symbols, and imagery associated with various organizations promoting discrimination, racism, and intolerance.
- Insults – Describes content that includes demeaning, humiliating, mocking, insulting, or belittling language. This type of language is also labeled as bullying. It also encompasses various forms of rude, disrespectful, or offensive hand gestures intended to express contempt, anger, or disapproval.
- Sexual – Describes content that indicates sexual interest, activity, or arousal using direct or indirect references to body parts, physical traits, or sex. It also includes images displaying private parts and sexual activity involving intercourse. This category also encompasses cartoons, anime, drawings, sketches, and other illustrated content with sexual themes.
- Violence – Describes content that includes glorification of, or threats to inflict, physical pain, hurt, or injury toward a person, group, or thing.
The Amazon Bedrock Guardrails image content filter is supported in the following AWS Regions:

| Region |
|---|
| US East (N. Virginia) |
| US East (Ohio) |
| US West (Oregon) |
| AWS GovCloud (US-West) |
| Europe (Ireland) (gated access) |
| Europe (London) |
| Europe (Frankfurt) |
| Asia Pacific (Mumbai) |
| Asia Pacific (Seoul) |
| Asia Pacific (Tokyo) |
| Asia Pacific (Singapore) (gated access) |
| Asia Pacific (Sydney) |
You can use the Amazon Bedrock Guardrails image content filter with the following models; a short invocation sketch follows the table:

| Model name | Model ID |
|---|---|
| Titan Image Generator G1 | amazon.titan-image-generator-v1 |
| Titan Image Generator G1 v2 | amazon.titan-image-generator-v2:0 |
| SD3 Large 1.0 | stability.sd3-large-v1:0 |
| SDXL 1.0 | stability.stable-diffusion-xl-v1 |
| Stable Image Core 1.0 | stability.stable-image-core-v1 |
| Stable Image Ultra 1.0 | stability.stable-image-ultra-v1 |
| Anthropic Claude 3 Haiku | anthropic.claude-3-haiku-20240307-v1:0 |
| Anthropic Claude 3 Opus | anthropic.claude-3-opus-20240229-v1:0 |
| Anthropic Claude 3 Sonnet | anthropic.claude-3-sonnet-20240229-v1:0 |
| Anthropic Claude 3.5 Sonnet | anthropic.claude-3-5-sonnet-20240620-v1:0 |
| Llama 3.2 11B Instruct | meta.llama3-2-11b-instruct-v1:0 |
| Llama 3.2 90B Instruct | meta.llama3-2-90b-instruct-v1:0 |
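For example, you can attach a guardrail with image content filters enabled when invoking one of these models. The following is a minimal sketch using the Converse API; the guardrail ID and image path are placeholders you would replace with your own values.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("/path/to/image.jpg", "rb") as f:
    image_bytes = f.read()

# The guardrail evaluates both the text and the image in the request,
# as well as the model's response, using the configured content filters.
response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{
        "role": "user",
        "content": [
            {"text": "Describe this image."},
            {"image": {"format": "jpeg", "source": {"bytes": image_bytes}}},
        ],
    }],
    guardrailConfig={
        "guardrailIdentifier": "your-guardrail-id",  # placeholder
        "guardrailVersion": "DRAFT",
    },
)
print(response["output"]["message"]["content"])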
Topics
Using the image content filter
Creating or updating a Guardrail with content filters for images
While creating a new guardrail or updating an existing one, you will now see an option to select image (in preview) in addition to the existing text option. The image option is available for the Hate, Insults, Sexual, and Violence categories. By default, the text option is enabled; the image option must be enabled explicitly. You can choose both text and image, or either one, depending on your use case.
Filter classification and blocking levels
Filtering is done based on the confidence classification of user inputs and FM responses. All user inputs and model responses are classified across four strength levels: None, Low, Medium, and High. The filter strength determines the sensitivity of filtering harmful content; as the filter strength increases, the likelihood of filtering harmful content increases and the probability of seeing harmful content in your application decreases. For example, a High filter strength blocks content classified as harmful with Low, Medium, or High confidence, while a Low filter strength blocks only content classified as harmful with High confidence. When both the image and text options are selected, the same filter strength is applied to both modalities for a particular category.
To configure image and text filters for harmful categories, select Configure harmful categories filter.
Note
Image content filters are in preview and are not available if the model doesn't use images in its prompts or responses.
- Select Text and/or Image to filter text or image content in prompts or responses to and from the model.
- Select None, Low, Medium, or High for the level of filtration you want to apply to each category. A setting of High helps block the most text or images that apply to that category of the filter.
- Select Use the same harmful categories filters for responses to use the same filter settings you used for prompts. You can also choose to have different filter levels for prompts and responses by not selecting this option. Select Reset threshold to reset all the filter levels for prompts or responses.
- Select Review and create or Next to create the guardrail.
Configuring content filters for images with API
You can use the guardrail API to configure the image content filter in Amazon Bedrock Guardrails. The example below shows an Amazon Bedrock Guardrails filter with different harmful content categories and filter strengths applied. You can use this template as an example for your own use case.
In the contentPolicyConfig field of the request, filtersConfig is a list of filter objects, as shown in the following example.
Example Python Boto3 code for creating a Guardrail with Image Content Filters
import boto3
import botocore
import json

def main():
    bedrock = boto3.client('bedrock', region_name='us-east-1')

    try:
        create_guardrail_response = bedrock.create_guardrail(
            name='my-image-guardrail',
            contentPolicyConfig={
                'filtersConfig': [
                    {
                        'type': 'SEXUAL',
                        'inputStrength': 'HIGH',
                        'outputStrength': 'HIGH',
                        # IMAGE is supported only for the Hate, Insults,
                        # Sexual, and Violence categories
                        'inputModalities': ['TEXT', 'IMAGE'],
                        'outputModalities': ['TEXT', 'IMAGE']
                    },
                    {
                        'type': 'VIOLENCE',
                        'inputStrength': 'HIGH',
                        'outputStrength': 'HIGH',
                        'inputModalities': ['TEXT', 'IMAGE'],
                        'outputModalities': ['TEXT', 'IMAGE']
                    },
                    {
                        'type': 'HATE',
                        'inputStrength': 'HIGH',
                        'outputStrength': 'HIGH',
                        'inputModalities': ['TEXT', 'IMAGE'],
                        'outputModalities': ['TEXT', 'IMAGE']
                    },
                    {
                        'type': 'INSULTS',
                        'inputStrength': 'HIGH',
                        'outputStrength': 'HIGH',
                        'inputModalities': ['TEXT', 'IMAGE'],
                        'outputModalities': ['TEXT', 'IMAGE']
                    },
                    {
                        # Misconduct and prompt attacks support text only
                        'type': 'MISCONDUCT',
                        'inputStrength': 'HIGH',
                        'outputStrength': 'HIGH',
                        'inputModalities': ['TEXT'],
                        'outputModalities': ['TEXT']
                    },
                    {
                        'type': 'PROMPT_ATTACK',
                        'inputStrength': 'HIGH',
                        'outputStrength': 'NONE',
                        'inputModalities': ['TEXT'],
                        'outputModalities': ['TEXT']
                    }
                ]
            },
            blockedInputMessaging='Sorry, the model cannot answer this question.',
            blockedOutputsMessaging='Sorry, the model cannot answer this question.',
        )

        # createdAt is a datetime, which isn't JSON serializable, so format it first
        create_guardrail_response['createdAt'] = create_guardrail_response['createdAt'].strftime('%Y-%m-%d %H:%M:%S')

        print("Successfully created guardrail with details:")
        print(json.dumps(create_guardrail_response, indent=2))
    except botocore.exceptions.ClientError as err:
        print("Failed while calling CreateGuardrail API with RequestId = " + err.response['ResponseMetadata']['RequestId'])
        raise err

if __name__ == "__main__":
    main()
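To add the image modality to an existing guardrail instead, you can call the UpdateGuardrail API with an updated contentPolicyConfig. The following is a minimal sketch under the assumption that UpdateGuardrail replaces the existing configuration with what you pass (confirm against the API reference); the guardrail ID is a placeholder.

import boto3

bedrock = boto3.client('bedrock', region_name='us-east-1')

# Pass the full target configuration, including every filter you want to keep.
bedrock.update_guardrail(
    guardrailIdentifier='your-guardrail-id',  # placeholder
    name='my-image-guardrail',
    contentPolicyConfig={
        'filtersConfig': [
            {
                'type': 'VIOLENCE',
                'inputStrength': 'HIGH',
                'outputStrength': 'HIGH',
                # Adding IMAGE here enables the image filter for this category
                'inputModalities': ['TEXT', 'IMAGE'],
                'outputModalities': ['TEXT', 'IMAGE']
            }
        ]
    },
    blockedInputMessaging='Sorry, the model cannot answer this question.',
    blockedOutputsMessaging='Sorry, the model cannot answer this question.',
)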
Configuring the image filter to work with the ApplyGuardrail API
You can use content filters for both image and text content with the ApplyGuardrail API. This option lets you apply the content filter settings without invoking an Amazon Bedrock model. You can update the request payload in the script below for various models by following the inference parameters documentation for each Bedrock foundation model supported by Amazon Bedrock Guardrails.
import boto3
import botocore
import json

guardrail_id = 'guardrail-id'
guardrail_version = 'DRAFT'
content_source = 'INPUT'
image_path = '/path/to/image.jpg'

with open(image_path, 'rb') as image:
    image_bytes = image.read()

content = [
    {
        "text": {
            "text": "Hi, can you explain this image art to me?"
        }
    },
    {
        "image": {
            "format": "jpeg",
            "source": {
                "bytes": image_bytes
            }
        }
    }
]
def main():
    bedrock_runtime_client = boto3.client("bedrock-runtime", region_name="us-east-1")

    try:
        print("Making a call to ApplyGuardrail API now")
        response = bedrock_runtime_client.apply_guardrail(
            guardrailIdentifier=guardrail_id,
            guardrailVersion=guardrail_version,
            source=content_source,
            content=content
        )
        print("Received response from ApplyGuardrail API:")
        print(json.dumps(response, indent=2))
    except botocore.exceptions.ClientError as err:
        print("Failed while calling ApplyGuardrail API with RequestId = " + err.response['ResponseMetadata']['RequestId'])
        raise err

if __name__ == "__main__":
    main()
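The response includes an action field that indicates whether the guardrail intervened, along with per-policy assessments. The following is a sketch of how you might inspect it; the exact assessment fields are best confirmed against the ApplyGuardrail API reference.

# Inspect the outcome of the ApplyGuardrail call
if response['action'] == 'GUARDRAIL_INTERVENED':
    # Each assessment lists the content filters that matched,
    # including the category and the classifier's confidence.
    for assessment in response.get('assessments', []):
        for f in assessment.get('contentPolicy', {}).get('filters', []):
            print(f"Matched {f['type']} filter (confidence: {f['confidence']})")
else:
    print("Guardrail did not intervene")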