Create your guardrail
Amazon Bedrock Guardrails consists of a collection of different filtering policies that you can configure to help avoid undesirable and harmful content and remove or mask sensitive information for privacy protection.
You can configure the following policies in a guardrail (a programmatic configuration sketch follows the list):
Content filters — You can configure thresholds to help block input prompts or model responses, separately for text and for images, that contain harmful content such as hate, insults, sexual content, violence, misconduct (including criminal activity), and prompt attacks (prompt injections and jailbreaks). For example, an e-commerce site can design its online assistant to avoid inappropriate language or images, such as hate or violence.
Prompt attacks — Can help you detect and filter prompt attacks such as prompt injections and jailbreaks, which are prompts intended to bypass moderation, override instructions, or generate harmful content.
Denied topics — You can define a set of topics to avoid within your generative AI application. For example, a banking assistant application can be designed to help avoid topics related to illegal investment advice.
Word filters — You can configure a set of custom words or phrases (exact match) that you want to detect and block in the interaction between your users and generative AI applications. For example, you can detect and block profanity as well as specific custom words, such as competitor names or other offensive words.
Sensitive information filters — Can help you detect sensitive content such as Personally Identifiable Information (PII) in standard formats or custom regex entities in user inputs and FM responses. Based on the use case, you can reject inputs containing sensitive information or redact them in FM responses. For example, you can redact users’ personal information while generating summaries from customer and agent conversation transcripts.
Contextual grounding check — Can help you detect and filter hallucinations in model responses when they are not grounded in the source information (that is, they are factually inaccurate or add new information) or are irrelevant to the user’s query. For example, in retrieval-augmented generation (RAG) applications, you can block or flag responses that deviate from the information in the retrieved passages or don’t answer the user’s question.
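Each of these policies maps to its own configuration block when you create a guardrail programmatically. The following is a minimal sketch of those blocks in the shape the boto3 CreateGuardrail request expects; the thresholds, topic definition, custom word, and order-id regex are illustrative placeholders for an e-commerce assistant, not recommended values, so check the current API reference for the full set of options.

```python
# Illustrative policy configuration blocks in the CreateGuardrail request shape.
# Thresholds, topics, words, and the regex below are placeholder examples.

# Content filters (includes the prompt-attack filter, which applies to inputs only).
content_policy = {
    "filtersConfig": [
        {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "MISCONDUCT", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
    ]
}

# Denied topics: a natural-language definition plus sample phrases.
topic_policy = {
    "topicsConfig": [
        {
            "name": "InvestmentAdvice",
            "definition": "Recommendations about specific investments, stocks, or returns.",
            "examples": ["Which stock should I buy right now?"],
            "type": "DENY",
        }
    ]
}

# Word filters: exact-match custom words plus the managed profanity list.
word_policy = {
    "wordsConfig": [{"text": "CompetitorBrand"}],  # hypothetical competitor name
    "managedWordListsConfig": [{"type": "PROFANITY"}],
}

# Sensitive information filters: standard PII entities and a custom regex entity.
sensitive_info_policy = {
    "piiEntitiesConfig": [
        {"type": "EMAIL", "action": "ANONYMIZE"},
        {"type": "PHONE", "action": "ANONYMIZE"},
    ],
    "regexesConfig": [
        {"name": "order-id", "pattern": r"ORD-\d{8}", "action": "ANONYMIZE"}
    ],
}

# Contextual grounding check: grounding and relevance thresholds between 0 and 1.
contextual_grounding_policy = {
    "filtersConfig": [
        {"type": "GROUNDING", "threshold": 0.75},
        {"type": "RELEVANCE", "threshold": 0.75},
    ]
}
```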
Note
All blocked content from the above policies will appear as plain text in Amazon Bedrock model invocation logs, if you have enabled model invocation logging. You can disable model invocation logging if you do not want blocked content to appear as plain text in the logs.
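If you manage logging programmatically, a sketch along the following lines, assuming the boto3 bedrock client and a placeholder Region, shows how you could inspect or turn off the account-level model invocation logging configuration.

```python
import boto3

# Control-plane client; the Region is an assumption for this example.
bedrock = boto3.client("bedrock", region_name="us-east-1")

# Inspect the current model invocation logging configuration, if one is set.
current = bedrock.get_model_invocation_logging_configuration()
print(current.get("loggingConfig"))

# Turn model invocation logging off so blocked content is not written to logs.
# Note: this disables invocation logging for the whole account in this Region.
bedrock.delete_model_invocation_logging_configuration()
```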
A guardrail must contain at least one filter and messaging for when prompts and model responses are blocked. You can opt to use the default messaging. You can add filters and iterate on your guardrail later by following the steps at Modify your guardrail.
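As a sketch of that minimum, assuming the boto3 bedrock client, a guardrail can be created with a single content filter and custom blocked messaging; the name, Region, and message strings below are placeholders you would replace with your own.

```python
import boto3

# Control-plane client; the Region is an assumption for this example.
bedrock = boto3.client("bedrock", region_name="us-east-1")

# Minimal guardrail: one content filter plus the messaging returned on a block.
response = bedrock.create_guardrail(
    name="minimal-guardrail",  # illustrative name
    description="Single hate-speech filter with custom blocked messaging.",
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        ]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that information.",
)

print(response["guardrailId"], response["version"])
```

You can then add more policies with the UpdateGuardrail operation, which mirrors the console steps in Modify your guardrail.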