Create a knowledge base
Note
You can’t create a knowledge base with a root user. Log in with an IAM user before starting these steps.
After you set up your data source in Amazon S3 and a vector store of your choice, you can create a knowledge base. Select the tab corresponding to your method of choice and follow the steps.
- Console
-
To create a knowledge base
-
Sign in to the AWS Management Console, and open the Amazon Bedrock console at https://console.aws.amazon.com/bedrock/.
-
From the left navigation pane, select Knowledge base.
-
In the Knowledge bases section, select Create knowledge base.
-
On the Provide knowledge base details page, set up the following configurations:
-
(Optional) In the Knowledge base details section, change the default name and provide a description for your knowledge base.
-
In the IAM permissions section, choose an AWS Identity and Access Management (IAM) role that provides Amazon Bedrock permission to access other AWS services. You can let Amazon Bedrock create the service role or choose a custom role that you have created.
-
(Optional) Add tags to your knowledge base. For more information, see Tag resources.
-
Select Next.
-
-
On the Set up data source page, provide the information for the data source to use for the knowledge base:
-
(Optional) Change the default Data source name.
-
Select Current account or Other account for Data source location.
-
Provide the S3 URI of the object containing the files for the data source that you prepared. If you select Other account, you may need to update the other account's Amazon S3 bucket policy, AWS KMS key policy, and the current account's Knowledge Base role.
Note
Choose an Amazon S3 bucket in the same region as the knowledge base that you're creating. Otherwise, your data source will fail to sync.
-
If you encrypted your Amazon S3 data with a customer managed key, select Add customer-managed AWS KMS key for Amazon S3 data and choose a KMS key to allow Amazon Bedrock to decrypt it. For more information, see Encryption of information passed to Amazon OpenSearch Service.
-
(Optional) To configure the following advanced settings, expand the Advanced settings - optional section.
-
While converting your data into embeddings, Amazon Bedrock encrypts your data with a key that AWS owns and manages, by default. To use your own KMS key, expand Advanced settings, select Customize encryption settings (advanced), and choose a key. For more information, see Encryption of transient data storage during data ingestion.
-
Choose from the following options for the Chunking strategy for your data source:
-
Default chunking – By default, Amazon Bedrock automatically splits your source data into chunks, such that each chunk contains, at most, 300 tokens. If a document contains fewer than 300 tokens, then it is not split any further.
-
Fixed size chunking – Amazon Bedrock splits your source data into chunks of the approximate size that you set. Configure the following options.
-
Max tokens – Amazon Bedrock creates chunks that don't exceed the number of tokens that you choose.
-
Overlap percentage between chunks – Each chunk overlaps with consecutive chunks by the percentage that you choose.
-
-
No chunking – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.
Note
You can't change the chunking strategy after you have created the data source.
-
-
Choose from the following options for the data deletion policy for your data source:
-
Delete: Deletes all underlying data belonging to the data source from the vector store upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted, only the underlying data. This flag is ignored if an AWS account is deleted.
-
Retain: Retains all underlying data in your vector store upon deletion of a knowledge base or data source resource.
-
-
-
Select Next.
-
-
In the Embeddings model section, choose a supported embeddings model to convert your data into vector embeddings for the knowledge base.
-
In the Vector database section, choose one of the following options to store the vector embeddings for your knowledge base:
-
Quick create a new vector store – Amazon Bedrock creates an Amazon OpenSearch Serverless vector search collection for you. With this option, a public vector search collection and vector index are set up for you with the required fields and necessary configurations. After the collection is created, you can manage it in the Amazon OpenSearch Serverless console or through the AWS API. For more information, see Working with vector search collections in the Amazon OpenSearch Service Developer Guide. If you select this option, you can optionally enable the following settings:
-
To enable redundant active replicas, such that the availability of your vector store isn't compromised in case of infrastructure failure, select Enable redundancy (active replicas).
Note
We recommend that you leave this option disabled while you test your knowledge base. When you're ready to deploy to production, we recommend that you enable redundant active replicas. For information about pricing, see Pricing for OpenSearch Serverless.
-
To encrypt the automated vector store with a customer managed key, select Add customer-managed KMS key for Amazon OpenSearch Serverless vector – optional and choose the key. For more information, see Encryption of information passed to Amazon OpenSearch Service.
-
-
Select a vector store you have created – Select the service that contains a vector database that you have already created. Fill in the fields to allow Amazon Bedrock to map information from the knowledge base to your database, so that it can store, update, and manage embeddings. For more information about how these fields map to the fields that you created, see Set up a vector index for your knowledge base in a supported vector store.
Note
If you use a database in Amazon OpenSearch Serverless, Amazon Aurora, or MongoDB Atlas, you need to have configured the fields under Field mapping beforehand. If you use a database in Pinecone or Redis Enterprise Cloud, you can provide names for these fields here and Amazon Bedrock will dynamically create them in the vector store for you.
-
-
Select Next.
-
On the Review and create page, check the configuration and details of your knowledge base. Choose Edit in any section that you need to modify. When you are satisfied, select Create knowledge base.
-
The time it takes to create the knowledge base depends on the amount of data you provided. When the knowledge base is finished being created, the Status of the knowledge base changes to Ready.
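The interaction between the Max tokens and Overlap percentage settings of fixed size chunking can be illustrated with a small sketch. This is not Amazon Bedrock's actual chunking implementation, which the steps above don't describe; the function name `fixed_size_chunks` is hypothetical, and the input is assumed to be an already tokenized list.

```python
def fixed_size_chunks(tokens, max_tokens, overlap_percentage):
    """Split a token list into chunks of at most max_tokens tokens.

    Consecutive chunks share about overlap_percentage percent of max_tokens.
    Illustrative only; not the algorithm that Amazon Bedrock uses.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap_percentage < 100:
        raise ValueError("overlap_percentage must be in the range [0, 100)")

    overlap = max_tokens * overlap_percentage // 100
    step = max_tokens - overlap  # at least 1, because overlap < max_tokens

    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last chunk already reaches the end of the document
        start += step
    return chunks
```

Under these assumptions, a document shorter than `max_tokens` comes back as a single chunk, which mirrors the behavior described for default chunking.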
-
- API
-
To create a knowledge base, send a CreateKnowledgeBase request with an Agents for Amazon Bedrock build-time endpoint and provide a name and, optionally, a description, together with the fields described in the following list.
Note
If you prefer to let Amazon Bedrock create and manage a vector store for you in Amazon OpenSearch Service, use the console. For more information, see Create a knowledge base.
-
Provide the ARN of the IAM role with permissions to create a knowledge base in the roleArn field.
-
Provide the embedding model to use in the embeddingModelArn field in the knowledgeBaseConfiguration object.
-
Provide the configuration for your vector store in the storageConfiguration object. For more information, see Set up a vector index for your knowledge base in a supported vector store.
-
For an Amazon OpenSearch Service database, use the opensearchServerlessConfiguration object.
-
For a Pinecone database, use the pineconeConfiguration object.
-
For a Redis Enterprise Cloud database, use the redisEnterpriseCloudConfiguration object.
-
For an Amazon Aurora database, use the rdsConfiguration object.
-
For a MongoDB Atlas database, use the mongodbConfiguration object.
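As a minimal sketch, a CreateKnowledgeBase request for an Amazon OpenSearch Serverless vector store might be assembled as follows with the AWS SDK for Python (Boto3). The helper names, the field mapping values (`embedding`, `text`, `metadata`), and every ARN you pass in are assumptions for illustration; use the values from your own account and vector index.

```python
def build_create_kb_request(name, role_arn, embedding_model_arn,
                            collection_arn, index_name, description=None):
    """Build keyword arguments for the CreateKnowledgeBase operation."""
    request = {
        "name": name,
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": embedding_model_arn,
            },
        },
        "storageConfiguration": {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration": {
                "collectionArn": collection_arn,
                "vectorIndexName": index_name,
                # Hypothetical field names; they must match your vector index.
                "fieldMapping": {
                    "vectorField": "embedding",
                    "textField": "text",
                    "metadataField": "metadata",
                },
            },
        },
    }
    if description:
        request["description"] = description
    return request


def create_knowledge_base(request, region_name):
    """Send the request. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("bedrock-agent", region_name=region_name)
    return client.create_knowledge_base(**request)
```

Keeping the request builder separate from the network call makes it possible to inspect or log the payload before sending it.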
-
After you create a knowledge base, create a data source from the S3 bucket containing the files for your knowledge base. To create the data source, send a CreateDataSource request.
-
Provide the information for the S3 bucket containing the data source files in the dataSourceConfiguration field.
-
Specify how to chunk the data sources in the vectorIngestionConfiguration field. For more information, see Set up a data source for your knowledge base.
Note
You can't change the chunking configuration after you create the data source.
-
Provide the dataDeletionPolicy for your data source. You can DELETE all underlying data belonging to the data source from the vector store upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted, only the underlying data. This flag is ignored if an AWS account is deleted. You can RETAIN all underlying data in your vector store upon deletion of a knowledge base or data source resource.
-
(Optional) While converting your data into embeddings, Amazon Bedrock encrypts your data with a key that AWS owns and manages, by default. To use your own KMS key, include it in the serverSideEncryptionConfiguration object. For more information, see Encryption of knowledge base resources.
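The CreateDataSource fields described above can be sketched as follows with the AWS SDK for Python (Boto3). The helper names, the default chunking values, and the choice of fixed size chunking are assumptions for illustration, not recommendations from this guide.

```python
def build_create_data_source_request(knowledge_base_id, name, bucket_arn,
                                     max_tokens=300, overlap_percentage=20,
                                     data_deletion_policy="RETAIN",
                                     kms_key_arn=None):
    """Build keyword arguments for the CreateDataSource operation."""
    if data_deletion_policy not in ("DELETE", "RETAIN"):
        raise ValueError("data_deletion_policy must be DELETE or RETAIN")

    request = {
        "knowledgeBaseId": knowledge_base_id,
        "name": name,
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {"bucketArn": bucket_arn},
        },
        # The chunking configuration can't be changed after creation.
        "vectorIngestionConfiguration": {
            "chunkingConfiguration": {
                "chunkingStrategy": "FIXED_SIZE",
                "fixedSizeChunkingConfiguration": {
                    "maxTokens": max_tokens,
                    "overlapPercentage": overlap_percentage,
                },
            },
        },
        "dataDeletionPolicy": data_deletion_policy,
    }
    if kms_key_arn:
        # Optional: use your own KMS key for transient data during ingestion.
        request["serverSideEncryptionConfiguration"] = {"kmsKeyArn": kms_key_arn}
    return request


def create_data_source(request, region_name):
    """Send the request. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("bedrock-agent", region_name=region_name)
    return client.create_data_source(**request)
```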
-
Set up security configurations for your knowledge base
After you've created a knowledge base, you might have to set up the following security configurations:
Set up data access policies for your knowledge base
If you're using a custom role, set up security configurations for your newly created knowledge base. If you let Amazon Bedrock create a service role for you, you can skip this step. Follow the steps in the tab corresponding to the database that you set up.
- Amazon OpenSearch Serverless
-
To restrict access to the Amazon OpenSearch Serverless collection to the knowledge base service role, create a data access policy. You can do so in the following ways:
-
Use the Amazon OpenSearch Service console by following the steps at Creating data access policies (console) in the Amazon OpenSearch Service Developer Guide.
-
Use the AWS API by sending a CreateAccessPolicy request with an OpenSearch Serverless endpoint. For an AWS CLI example, see Creating data access policies (AWS CLI).
Use the following data access policy, specifying the Amazon OpenSearch Serverless collection and your service role:
[
    {
        "Description": "${data access policy description}",
        "Rules": [
            {
                "Resource": [
                    "index/${collection_name}/*"
                ],
                "Permission": [
                    "aoss:DescribeIndex",
                    "aoss:ReadDocument",
                    "aoss:WriteDocument"
                ],
                "ResourceType": "index"
            }
        ],
        "Principal": [
            "arn:aws:iam::${account-id}:role/${kb-service-role}"
        ]
    }
]
-
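As a sketch of the API route, the data access policy above can be generated and submitted with the AWS SDK for Python (Boto3). The helper names and the policy name you pass in are hypothetical; the permissions and resource pattern are copied from the policy shown above.

```python
import json


def build_data_access_policy(collection_name, account_id, kb_service_role,
                             description=""):
    """Return the data access policy document as a Python list."""
    return [
        {
            "Description": description,
            "Rules": [
                {
                    "Resource": [f"index/{collection_name}/*"],
                    "Permission": [
                        "aoss:DescribeIndex",
                        "aoss:ReadDocument",
                        "aoss:WriteDocument",
                    ],
                    "ResourceType": "index",
                }
            ],
            "Principal": [
                f"arn:aws:iam::{account_id}:role/{kb_service_role}"
            ],
        }
    ]


def create_data_access_policy(policy_name, policy, region_name):
    """Send CreateAccessPolicy. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("opensearchserverless", region_name=region_name)
    return client.create_access_policy(
        name=policy_name,
        type="data",
        policy=json.dumps(policy),  # the API expects the policy as a JSON string
    )
```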
- Pinecone, Redis Enterprise Cloud or MongoDB Atlas
-
To integrate a Pinecone, Redis Enterprise Cloud, or MongoDB Atlas vector index, attach the following identity-based policy to your knowledge base service role to allow it to access the AWS Secrets Manager secret for the vector index.
{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "bedrock:AssociateThirdPartyKnowledgeBase"
        ],
        "Resource": "*",
        "Condition": {
            "StringEquals": {
                "bedrock:ThirdPartyKnowledgeBaseCredentialsSecretArn": "arn:aws:secretsmanager:${region}:${account-id}:secret:${secret-id}"
            }
        }
    }]
}
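One way to attach this identity-based policy is as an inline policy on the service role, sketched here with the AWS SDK for Python (Boto3). The helper names and the policy name are hypothetical, and the secret ARN is built in the standard AWS Secrets Manager form (`arn:aws:secretsmanager:region:account-id:secret:secret-id`), which is an assumption about the intended resource.

```python
import json


def build_secret_access_policy(region, account_id, secret_id):
    """Return the identity-based policy document as a Python dict."""
    # Assumption: the secret is addressed by a Secrets Manager ARN.
    secret_arn = f"arn:aws:secretsmanager:{region}:{account_id}:secret:{secret_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["bedrock:AssociateThirdPartyKnowledgeBase"],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "bedrock:ThirdPartyKnowledgeBaseCredentialsSecretArn": secret_arn
                    }
                },
            }
        ],
    }


def attach_inline_policy(role_name, policy_name, policy):
    """Attach the policy inline. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),
    )
```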
Set up network access policies for your Amazon OpenSearch Serverless knowledge base
If you use a private Amazon OpenSearch Serverless collection for your knowledge base, it can only be accessed through an AWS PrivateLink VPC endpoint. You can create a private Amazon OpenSearch Serverless collection when you set up your Amazon OpenSearch Serverless vector collection or you can make an existing Amazon OpenSearch Serverless collection (including one that the Amazon Bedrock console created for you) private when you configure its network access policy.
The following resources in the Amazon OpenSearch Service Developer Guide will help you understand the setup required for a private Amazon OpenSearch Serverless collection:
-
For more information about setting up a VPC endpoint for a private Amazon OpenSearch Serverless collection, see Access Amazon OpenSearch Serverless using an interface endpoint (AWS PrivateLink).
-
For more information about network access policies in Amazon OpenSearch Serverless, see Network access for Amazon OpenSearch Serverless.
To allow an Amazon Bedrock knowledge base to access a private Amazon OpenSearch Serverless collection, you must edit the network access policy for the Amazon OpenSearch Serverless collection to allow Amazon Bedrock as a source service. Select the tab corresponding to your method of choice and follow the steps.
- Console
-
-
Open the Amazon OpenSearch Service console at https://console.aws.amazon.com/aos/.
-
From the left navigation pane, select Collections. Then choose your collection.
-
In the Network section, select the Associated Policy.
-
Choose Edit.
-
For Select policy definition method, do one of the following:
-
Leave Select policy definition method as Visual editor and configure the following settings in the Rule 1 section:
-
(Optional) In the Rule name field, enter a name for the network access rule.
-
Under Access collections from, select Private (recommended).
-
Select AWS service private access. In the text box, enter bedrock.amazonaws.com.
-
Unselect Enable access to OpenSearch Dashboards.
-
-
Choose JSON and paste the following policy in the JSON editor.
[
    {
        "AllowFromPublic": false,
        "Description": "${network access policy description}",
        "Rules": [
            {
                "ResourceType": "collection",
                "Resource": [
                    "collection/${collection-id}"
                ]
            }
        ],
        "SourceServices": [
            "bedrock.amazonaws.com"
        ]
    }
]
-
-
Choose Update.
-
- API
-
To edit the network access policy for your Amazon OpenSearch Serverless collection, do the following:
-
Send a GetSecurityPolicy request with an OpenSearch Serverless endpoint. Specify the name of the policy and specify the type as network. Note the policyVersion in the response.
-
Send an UpdateSecurityPolicy request with an OpenSearch Serverless endpoint. Minimally, specify the following fields:
Field – Description
name – The name of the policy.
policyVersion – The policyVersion returned to you from the GetSecurityPolicy response.
type – The type of security policy. Specify network.
policy – The policy to use. Specify the following JSON object:
[
    {
        "AllowFromPublic": false,
        "Description": "${network access policy description}",
        "Rules": [
            {
                "ResourceType": "collection",
                "Resource": [
                    "collection/${collection-id}"
                ]
            }
        ],
        "SourceServices": [
            "bedrock.amazonaws.com"
        ]
    }
]
For an AWS CLI example, see Creating data access policies (AWS CLI).
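The two API calls above can be sketched with the AWS SDK for Python (Boto3) as follows. The helper names are hypothetical, and the sketch assumes that the whole network policy is replaced with the single rule shown above; merge with the existing policy instead if it contains other rules that you need to keep.

```python
import json


def build_network_policy(collection_id, description=""):
    """Return a network access policy that allows Amazon Bedrock as a source service."""
    return [
        {
            "AllowFromPublic": False,
            "Description": description,
            "Rules": [
                {
                    "ResourceType": "collection",
                    "Resource": [f"collection/{collection_id}"],
                }
            ],
            "SourceServices": ["bedrock.amazonaws.com"],
        }
    ]


def allow_bedrock_access(policy_name, collection_id, region_name):
    """Fetch the current policy version, then update the policy.

    Requires boto3 and valid AWS credentials.
    """
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("opensearchserverless", region_name=region_name)

    # Step 1: read the current policyVersion of the network policy.
    current = client.get_security_policy(name=policy_name, type="network")
    policy_version = current["securityPolicyDetail"]["policyVersion"]

    # Step 2: submit the new policy together with that version.
    return client.update_security_policy(
        name=policy_name,
        type="network",
        policyVersion=policy_version,
        policy=json.dumps(build_network_policy(collection_id)),
    )
```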
-
-
Use the Amazon OpenSearch Service console by following the steps at Creating network policies (console). Instead of creating a network policy, note the Associated policy in the Network subsection of the collection details.