Create an Amazon Bedrock knowledge base
You can create an Amazon Bedrock knowledge base to retrieve information from your proprietary data and generate responses to answer natural language questions. As part of creating a knowledge base, you configure a data source and a vector store of your choice.
Note
You can’t create a knowledge base with a root user. Log in with an IAM user before starting these steps.
Select the tab corresponding to your method of choice and follow the steps.
- Console
-
To create a knowledge base
-
Sign in to the AWS Management Console using an IAM role with Amazon Bedrock permissions, and open the Amazon Bedrock console at https://console.aws.amazon.com/bedrock/.
-
From the left navigation pane, select Knowledge bases.
-
In the Knowledge bases section, select Create knowledge base.
-
On the Provide knowledge base details page, set up the following configurations:
-
(Optional) In the Knowledge base details section, change the default name and provide a description for your knowledge base.
-
In the IAM permissions section, choose an AWS Identity and Access Management (IAM) role that provides Amazon Bedrock permission to access other AWS services. You can let Amazon Bedrock create the service role or choose a custom role that you have created.
-
(Optional) Add tags to your knowledge base. For more information, see Manage resources using tags.
-
Select Next.
-
-
On the Choose data source page, select your data source to use for the knowledge base:
-
Follow the connection configuration steps for your selected data source. See Supported data sources to select your data source and follow the console connection configuration steps.
-
(Optional) To configure the following advanced settings as part of the data source configuration, expand the Advanced settings - optional section.
For KMS key settings, you can choose either a custom key or use the default provided data encryption key.
While converting your data into embeddings, Amazon Bedrock encrypts your transient data with a key that AWS owns and manages, by default. You can use your own KMS key. For more information, see Encryption of transient data storage during data ingestion.
For data deletion policy settings, you can choose either:
-
Delete: Deletes all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted, only the data. This flag is ignored if an AWS account is deleted.
-
Retain: Retains all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted if you delete a knowledge base or data source resource.
-
-
To configure the following content chunking and parsing settings as part of the data source configuration, go to the Content chunking and parsing section.
Choose one of the following chunking options:
-
Fixed-size chunking: Content is split into chunks of text of approximately the token size that you set. You can set the maximum number of tokens that a chunk must not exceed and the overlap percentage between consecutive chunks.
-
Default chunking: Content is split into chunks of text of up to 300 tokens. If a single document or piece of content contains fewer than 300 tokens, the document is not split further.
-
Hierarchical chunking: Content organized into nested structures of parent-child chunks. You set the maximum parent chunk token size and the maximum child chunk token size. You also set the absolute number of overlap tokens between consecutive parent chunks and consecutive child chunks.
-
Semantic chunking: Content organized into semantically similar text chunks or groups of sentences. You set the maximum number of sentences surrounding the target/current sentence to group together (buffer size). You also set the breakpoint percentile threshold for dividing the text into meaningful chunks. Semantic chunking uses a foundation model. View Amazon Bedrock pricing for information on the cost of foundation models.
-
No chunking: Each document is treated as a single text chunk. You might want to pre-process your documents by splitting them into separate files.
Note
You can’t change the chunking strategy after you have created the data source.
You can choose to use Amazon Bedrock’s foundation model for parsing documents to parse more than standard text. For example, you can parse tabular data within documents with its structure intact. View Amazon Bedrock pricing for information on the cost of foundation models.
You can choose to use an AWS Lambda function to customize your chunking strategy and how your document metadata attributes/fields are treated and ingested. Provide the Amazon S3 bucket location for the Lambda function input and output.
-
-
Select Next.
-
-
On the Select embeddings model and configure vector store page, choose a supported embeddings model to convert your data into vector embeddings for the knowledge base.
-
In the Vector store section, choose one of the following options to store the vector embeddings for your knowledge base:
-
Quick create a new vector store – Amazon Bedrock creates an Amazon OpenSearch Serverless vector search collection for you. With this option, a public vector search collection and vector index are set up for you with the required fields and necessary configurations. After the collection is created, you can manage it in the Amazon OpenSearch Serverless console or through the AWS API. For more information, see Working with vector search collections in the Amazon OpenSearch Service Developer Guide. If you select this option, you can optionally enable the following settings:
-
To enable redundant active replicas, such that the availability of your vector store isn't compromised in case of infrastructure failure, select Enable redundancy (active replicas).
Note
We recommend that you leave this option disabled while you test your knowledge base. When you're ready to deploy to production, we recommend that you enable redundant active replicas. For information about pricing, see Pricing for OpenSearch Serverless.
-
To encrypt the automated vector store with a customer managed key, select Add customer-managed KMS key for Amazon OpenSearch Serverless vector – optional and choose the key. For more information, see Encryption of information passed to Amazon OpenSearch Service.
-
-
Select a vector store you have created – Select the service for the vector store that you have already created. Fill in the fields to allow Amazon Bedrock to map information from the knowledge base to your vector store, so that it can store, update, and manage vector embeddings. For more information about the fields, see Set up your own supported vector store.
Note
If you use a database in Amazon OpenSearch Serverless, Amazon Aurora, or MongoDB Atlas, you need to have configured the fields under Field mapping beforehand. If you use a database in Pinecone or Redis Enterprise Cloud, you can provide names for these fields here and Amazon Bedrock will dynamically create them in the vector store for you.
-
-
Select Next.
-
On the Review and create page, check the configuration and details of your knowledge base. Choose Edit in any section that you need to modify. When you are satisfied, select Create knowledge base.
-
The time it takes to create the knowledge base depends on your specific configurations. When the knowledge base creation has completed, the status of the knowledge base changes to indicate that it is ready or available.
-
- API
-
To create a knowledge base, send a CreateKnowledgeBase request with an Agents for Amazon Bedrock build-time endpoint and provide the name and description of the knowledge base, along with the fields described in the following steps.
Note
If you prefer to let Amazon Bedrock create and manage a vector store for you in Amazon OpenSearch Service, use the console. For more information, see Create an Amazon Bedrock knowledge base.
-
Provide the ARN of the IAM role with permissions to create a knowledge base in the roleArn field.
-
Provide the vector embeddings model to use in the embeddingModelArn field in the knowledgeBaseConfiguration object. See supported models for knowledge bases.
You must enable model access to use a model that's supported for knowledge bases. Take note of your model Amazon Resource Name (ARN) that's required for converting your data into vector embeddings. Copy the model ID for your chosen model for knowledge bases and construct the model ARN using the model (resource) ID, following the provided ARN examples for your model resource type.
-
Provide the configuration for your vector store in the storageConfiguration object. For more information, see Prerequisites for your own vector store for a knowledge base.
-
For an Amazon OpenSearch Service database, use the opensearchServerlessConfiguration object.
-
For a Pinecone database, use the pineconeConfiguration object.
-
For a Redis Enterprise Cloud database, use the redisEnterpriseCloudConfiguration object.
-
For an Amazon Aurora database, use the rdsConfiguration object.
-
For a MongoDB Atlas database, use the mongoDbAtlasConfiguration object.
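The fields described above can be assembled as in the following minimal Python sketch. It assumes the AWS SDK for Python (boto3) and an existing Amazon OpenSearch Serverless collection. Every name, ARN, and field-mapping value shown is a hypothetical placeholder, and the model ID is only an example of an embeddings model ID. The request is built as a plain dictionary so that you can inspect it before sending it.

```python
import json


def embedding_model_arn(region: str, model_id: str) -> str:
    """Build a foundation-model ARN from a Region and a model ID."""
    # Foundation-model ARNs leave the account ID field empty.
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"


def build_create_kb_request(name, role_arn, model_arn, collection_arn, index_name):
    """Assemble the keyword arguments for a CreateKnowledgeBase request."""
    return {
        "name": name,
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {"embeddingModelArn": model_arn},
        },
        "storageConfiguration": {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration": {
                "collectionArn": collection_arn,
                "vectorIndexName": index_name,
                # Hypothetical field names; they must match your vector index.
                "fieldMapping": {
                    "vectorField": "embedding",
                    "textField": "text",
                    "metadataField": "metadata",
                },
            },
        },
    }


def create_knowledge_base(request: dict, region: str) -> str:
    """Send the request and return the new knowledge base ID."""
    import boto3  # imported here so the request builder runs without the SDK

    client = boto3.client("bedrock-agent", region_name=region)
    response = client.create_knowledge_base(**request)
    return response["knowledgeBase"]["knowledgeBaseId"]


# Placeholder values for illustration only.
model_arn = embedding_model_arn("us-east-1", "amazon.titan-embed-text-v1")
request = build_create_kb_request(
    name="example-knowledge-base",
    role_arn="arn:aws:iam::111122223333:role/example-kb-service-role",
    model_arn=model_arn,
    collection_arn="arn:aws:aoss:us-east-1:111122223333:collection/example-id",
    index_name="example-index",
)
print(json.dumps(request, indent=2))
```

Calling create_knowledge_base(request, "us-east-1") would send the request. It needs valid AWS credentials and real resource ARNs.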
-
After you create a knowledge base, create a data source containing the documents or content for your knowledge base. To create the data source, send a CreateDataSource request. See Supported data sources to select your data source and follow the API connection configuration example.
-
Provide the connection information for the data source files in the dataSourceConfiguration field.
-
Specify how to chunk the data sources in the vectorIngestionConfiguration field.
Note
You can't change the chunking configuration after you create the data source.
-
Provide the dataDeletionPolicy for your data source. You can DELETE all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. This flag is ignored if an AWS account is deleted. You can RETAIN all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted if you delete a knowledge base or data source resource.
-
(Optional) While converting your data into embeddings, Amazon Bedrock encrypts your data with a key that AWS owns and manages, by default. To use your own KMS key, include it in the serverSideEncryptionConfiguration object. For more information, see Encryption of knowledge base resources.
-
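As an illustration of the CreateDataSource fields above, the following Python sketch builds a request for an Amazon S3 data source. The choice of S3 is an assumption made for this example. The knowledge base ID, bucket ARN, and token sizes are hypothetical placeholders. The sketch assumes the AWS SDK for Python (boto3) for the actual call.

```python
import json


def fixed_size_chunking(max_tokens: int, overlap_percentage: int) -> dict:
    """Chunking configuration for the fixed-size strategy."""
    return {
        "chunkingStrategy": "FIXED_SIZE",
        "fixedSizeChunkingConfiguration": {
            "maxTokens": max_tokens,
            "overlapPercentage": overlap_percentage,
        },
    }


def hierarchical_chunking(parent_tokens: int, child_tokens: int, overlap_tokens: int) -> dict:
    """Chunking configuration for the hierarchical (parent-child) strategy."""
    return {
        "chunkingStrategy": "HIERARCHICAL",
        "hierarchicalChunkingConfiguration": {
            # First level is the parent chunk size, second is the child chunk size.
            "levelConfigurations": [
                {"maxTokens": parent_tokens},
                {"maxTokens": child_tokens},
            ],
            "overlapTokens": overlap_tokens,
        },
    }


def build_create_data_source_request(kb_id, name, bucket_arn, chunking,
                                     deletion_policy="RETAIN", kms_key_arn=None):
    """Assemble the keyword arguments for a CreateDataSource request."""
    if deletion_policy not in ("RETAIN", "DELETE"):
        raise ValueError("deletion_policy must be RETAIN or DELETE")
    request = {
        "knowledgeBaseId": kb_id,
        "name": name,
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {"bucketArn": bucket_arn},
        },
        "vectorIngestionConfiguration": {"chunkingConfiguration": chunking},
        "dataDeletionPolicy": deletion_policy,
    }
    if kms_key_arn is not None:
        # Optional: use your own KMS key for transient data during ingestion.
        request["serverSideEncryptionConfiguration"] = {"kmsKeyArn": kms_key_arn}
    return request


def create_data_source(request: dict, region: str) -> str:
    """Send the request and return the new data source ID."""
    import boto3  # imported here so the request builder runs without the SDK

    client = boto3.client("bedrock-agent", region_name=region)
    return client.create_data_source(**request)["dataSource"]["dataSourceId"]


# Placeholder values for illustration only.
chunking = fixed_size_chunking(max_tokens=300, overlap_percentage=20)
request = build_create_data_source_request(
    kb_id="EXAMPLEKBID",
    name="example-data-source",
    bucket_arn="arn:aws:s3:::amzn-s3-demo-bucket",
    chunking=chunking,
)
print(json.dumps(request, indent=2))
```

Because the chunking configuration can't be changed after the data source is created, building and reviewing the request before sending it is a cheap safeguard.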
Set up security configurations for your knowledge base
After you've created a knowledge base, you might have to set up the following security configurations:
Set up data access policies for your knowledge base
If you're using a custom role, set up security configurations for your newly created knowledge base. If you let Amazon Bedrock create a service role for you, you can skip this step. Follow the steps in the tab corresponding to the database that you set up.
- Amazon OpenSearch Serverless
-
To restrict access to the Amazon OpenSearch Serverless collection to the knowledge base service role, create a data access policy. You can do so in the following ways:
-
Use the Amazon OpenSearch Service console by following the steps at Creating data access policies (console) in the Amazon OpenSearch Service Developer Guide.
-
Use the AWS API by sending a CreateAccessPolicy request with an OpenSearch Serverless endpoint. For an AWS CLI example, see Creating data access policies (AWS CLI).
Use the following data access policy, specifying the Amazon OpenSearch Serverless collection and your service role:
[
    {
        "Description": "${data access policy description}",
        "Rules": [
            {
                "Resource": [
                    "index/${collection_name}/*"
                ],
                "Permission": [
                    "aoss:DescribeIndex",
                    "aoss:ReadDocument",
                    "aoss:WriteDocument"
                ],
                "ResourceType": "index"
            }
        ],
        "Principal": [
            "arn:aws:iam::${account-id}:role/${kb-service-role}"
        ]
    }
]
-
- Pinecone, Redis Enterprise Cloud or MongoDB Atlas
-
To integrate a Pinecone, Redis Enterprise Cloud, or MongoDB Atlas vector index, attach the following identity-based policy to your knowledge base service role to allow it to access the AWS Secrets Manager secret for the vector index.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:AssociateThirdPartyKnowledgeBase"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "bedrock:ThirdPartyKnowledgeBaseCredentialsSecretArn": "arn:aws:secretsmanager:${region}:${account-id}:secret:${secret-id}"
                }
            }
        }
    ]
}
Set up network access policies for your Amazon OpenSearch Serverless knowledge base
If you use a private Amazon OpenSearch Serverless collection for your knowledge base, it can only be accessed through an AWS PrivateLink VPC endpoint. You can create a private Amazon OpenSearch Serverless collection when you set up your Amazon OpenSearch Serverless vector collection or you can make an existing Amazon OpenSearch Serverless collection (including one that the Amazon Bedrock console created for you) private when you configure its network access policy.
The following resources in the Amazon OpenSearch Service Developer Guide will help you understand the setup required for private Amazon OpenSearch Serverless collections:
-
For more information about setting up a VPC endpoint for a private Amazon OpenSearch Serverless collection, see Access Amazon OpenSearch Serverless using an interface endpoint (AWS PrivateLink).
-
For more information about network access policies in Amazon OpenSearch Serverless, see Network access for Amazon OpenSearch Serverless.
To allow an Amazon Bedrock knowledge base to access a private Amazon OpenSearch Serverless collection, you must edit the network access policy for the Amazon OpenSearch Serverless collection to allow Amazon Bedrock as a source service. Select the tab corresponding to your method of choice and follow the steps.
- Console
-
-
Open the Amazon OpenSearch Service console at https://console.aws.amazon.com/aos/.
-
From the left navigation pane, select Collections. Then choose your collection.
-
In the Network section, select the Associated Policy.
-
Choose Edit.
-
For Select policy definition method, do one of the following:
-
Leave Select policy definition method as Visual editor and configure the following settings in the Rule 1 section:
-
(Optional) In the Rule name field, enter a name for the network access rule.
-
Under Access collections from, select Private (recommended).
-
Select AWS service private access. In the text box, enter bedrock.amazonaws.com.
-
Unselect Enable access to OpenSearch Dashboards.
-
-
Choose JSON and paste the following policy in the JSON editor.
[
    {
        "AllowFromPublic": false,
        "Description": "${network access policy description}",
        "Rules": [
            {
                "ResourceType": "collection",
                "Resource": [
                    "collection/${collection-id}"
                ]
            }
        ],
        "SourceServices": [
            "bedrock.amazonaws.com"
        ]
    }
]
-
-
Choose Update.
-
- API
-
To edit the network access policy for your Amazon OpenSearch Serverless collection, do the following:
-
Send a GetSecurityPolicy request with an OpenSearch Serverless endpoint. Specify the name of the policy and specify the type as network. Note the policyVersion in the response.
-
Send an UpdateSecurityPolicy request with an OpenSearch Serverless endpoint. Minimally, specify the following fields:
Field | Description
name | The name of the policy.
policyVersion | The policyVersion returned to you from the GetSecurityPolicy response.
type | The type of security policy. Specify network.
policy | The policy to use. Specify the following JSON object:
[
    {
        "AllowFromPublic": false,
        "Description": "${network access policy description}",
        "Rules": [
            {
                "ResourceType": "collection",
                "Resource": [
                    "collection/${collection-id}"
                ]
            }
        ],
        "SourceServices": [
            "bedrock.amazonaws.com"
        ]
    }
]
For an AWS CLI example, see Creating data access policies (AWS CLI).
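The two API steps above can be sketched in Python as follows. The sketch assumes the AWS SDK for Python (boto3). The policy name, collection ID, and description are hypothetical placeholders. It reads the current policyVersion with GetSecurityPolicy and passes it to UpdateSecurityPolicy together with the new policy document.

```python
import json


def build_network_policy(collection_id: str, description: str) -> list:
    """Return a network policy that allows Amazon Bedrock as a source service."""
    return [
        {
            "AllowFromPublic": False,
            "Description": description,
            "Rules": [
                {
                    "ResourceType": "collection",
                    "Resource": [f"collection/{collection_id}"],
                }
            ],
            "SourceServices": ["bedrock.amazonaws.com"],
        }
    ]


def update_network_policy(policy_name: str, policy: list, region: str) -> dict:
    """Fetch the current policy version, then update the network policy."""
    import boto3  # imported here so the policy builder runs without the SDK

    client = boto3.client("opensearchserverless", region_name=region)
    current = client.get_security_policy(name=policy_name, type="network")
    version = current["securityPolicyDetail"]["policyVersion"]
    return client.update_security_policy(
        name=policy_name,
        type="network",
        policyVersion=version,
        policy=json.dumps(policy),
    )


# Placeholder values for illustration only.
policy = build_network_policy("example-collection-id", "Private access for Amazon Bedrock")
print(json.dumps(policy, indent=2))
```

Serializing with json.dumps avoids hand-editing mistakes such as trailing commas, which make a policy document invalid JSON.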
-
-
Use the Amazon OpenSearch Service console by following the steps at Creating network policies (console). Instead of creating a network policy, note the Associated policy in the Network subsection of the collection details.