Create your guardrail
Amazon Bedrock Guardrails consists of a collection of different filtering policies that you can configure to help avoid undesirable and harmful content and remove or mask sensitive information for privacy protection.
You can configure the following policies in a guardrail (a programmatic configuration sketch follows the list):
Content filters — You can configure thresholds to help block input prompts or model responses, separately for text and for images, that contain harmful content such as hate, insults, sexual content, violence, misconduct (including criminal activity), and prompt attacks (prompt injections and jailbreaks). For example, an e-commerce site can design its online assistant to avoid inappropriate language or images, such as hate or violence.
Prompt attacks — Can help you detect and filter prompt attacks such as prompt injections and jailbreaks, which are prompts intended to bypass moderation, override instructions, or generate harmful content.
Denied topics — You can define a set of topics to avoid within your generative AI application. For example, a banking assistant application can be designed to help avoid topics related to illegal investment advice.
Word filters — You can configure a set of custom words or phrases (exact match) that you want to detect and block in the interaction between your users and generative AI applications. For example, you can detect and block profanity as well as specific custom words, such as competitor names or other offensive words.
Sensitive information filters — Can help you detect sensitive content such as Personally Identifiable Information (PII) in standard formats or custom regex entities in user inputs and FM responses. Based on the use case, you can reject inputs containing sensitive information or redact them in FM responses. For example, you can redact users’ personal information while generating summaries from customer and agent conversation transcripts.
Contextual grounding check — Can help you detect and filter hallucinations in model responses when they are not grounded in the source information (that is, they are factually inaccurate or add new information) or are irrelevant to the user’s query. For example, in retrieval-augmented generation (RAG) applications, you can block or flag responses that deviate from the information in the retrieved passages or don’t answer the user’s question.
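Each of these policies maps to its own configuration block when you create a guardrail programmatically. The following is a minimal sketch of those blocks in the shape the boto3 CreateGuardrail request expects; the thresholds, topic definition, custom word, and order-id regex are illustrative placeholders for an e-commerce assistant, not recommended values, so check the current API reference for the full set of options.

```python
# Illustrative policy configuration blocks in the CreateGuardrail request shape.
# Thresholds, topics, words, and the regex below are placeholder examples.

# Content filters (includes the prompt-attack filter, which applies to inputs only).
content_policy = {
    "filtersConfig": [
        {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "MISCONDUCT", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
    ]
}

# Denied topics: a natural-language definition plus sample phrases.
topic_policy = {
    "topicsConfig": [
        {
            "name": "InvestmentAdvice",
            "definition": "Recommendations about specific investments, stocks, or returns.",
            "examples": ["Which stock should I buy right now?"],
            "type": "DENY",
        }
    ]
}

# Word filters: exact-match custom words plus the managed profanity list.
word_policy = {
    "wordsConfig": [{"text": "CompetitorBrand"}],  # hypothetical competitor name
    "managedWordListsConfig": [{"type": "PROFANITY"}],
}

# Sensitive information filters: standard PII entities and a custom regex entity.
sensitive_info_policy = {
    "piiEntitiesConfig": [
        {"type": "EMAIL", "action": "ANONYMIZE"},
        {"type": "PHONE", "action": "ANONYMIZE"},
    ],
    "regexesConfig": [
        {"name": "order-id", "pattern": r"ORD-\d{8}", "action": "ANONYMIZE"}
    ],
}

# Contextual grounding check: grounding and relevance thresholds between 0 and 1.
contextual_grounding_policy = {
    "filtersConfig": [
        {"type": "GROUNDING", "threshold": 0.75},
        {"type": "RELEVANCE", "threshold": 0.75},
    ]
}
```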
Note
All blocked content from the above policies will appear as plain text in Amazon Bedrock model invocation logs, if you have enabled model invocation logging. You can disable model invocation logging if you do not want blocked content to appear as plain text in the logs.
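If you manage logging programmatically, a sketch along the following lines, assuming the boto3 bedrock client and a placeholder Region, shows how you could inspect or turn off the account-level model invocation logging configuration.

```python
import boto3

# Control-plane client; the Region is an assumption for this example.
bedrock = boto3.client("bedrock", region_name="us-east-1")

# Inspect the current model invocation logging configuration, if one is set.
current = bedrock.get_model_invocation_logging_configuration()
print(current.get("loggingConfig"))

# Turn model invocation logging off so blocked content is not written to logs.
# Note: this disables invocation logging for the whole account in this Region.
bedrock.delete_model_invocation_logging_configuration()
```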
A guardrail must contain at least one filter and messaging for when prompts and model responses are blocked. You can opt to use the default messaging. You can add filters and iterate on your guardrail later by following the steps at Modify your guardrail.
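As a sketch of that minimum, assuming the boto3 bedrock client, a guardrail can be created with a single content filter and custom blocked messaging; the name, Region, and message strings below are placeholders you would replace with your own.

```python
import boto3

# Control-plane client; the Region is an assumption for this example.
bedrock = boto3.client("bedrock", region_name="us-east-1")

# Minimal guardrail: one content filter plus the messaging returned on a block.
response = bedrock.create_guardrail(
    name="minimal-guardrail",  # illustrative name
    description="Single hate-speech filter with custom blocked messaging.",
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        ]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that information.",
)

print(response["guardrailId"], response["version"])
```

You can then add more policies with the UpdateGuardrail operation, which mirrors the console steps in Modify your guardrail.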