Create a knowledge base - Amazon Bedrock

Create a knowledge base

Note

You can’t create a knowledge base with a root user. Log in with an IAM user before starting these steps.

As part of creating a knowledge base, you configure a data source and a vector store of your choice.

Select the tab corresponding to your method of choice and follow the steps.

Console
To create a knowledge base
  1. Sign in to the AWS Management Console using an IAM role with Amazon Bedrock permissions, and open the Amazon Bedrock console at https://console.aws.amazon.com/bedrock/.

  2. From the left navigation pane, select Knowledge bases.

  3. In the Knowledge bases section, select Create knowledge base.

  4. On the Provide knowledge base details page, set up the following configurations:

    1. (Optional) In the Knowledge base details section, change the default name and provide a description for your knowledge base.

    2. In the IAM permissions section, choose an AWS Identity and Access Management (IAM) role that provides Amazon Bedrock permission to access other AWS services. You can let Amazon Bedrock create the service role or choose a custom role that you have created.

    3. (Optional) Add tags to your knowledge base. For more information, see Tag resources.

    4. Select Next.

  5. On the Choose data source page, select your data source to use for the knowledge base:

    1. Follow the connection configuration steps for your selected data source. See Supported data sources to select your data source and follow the console connection configuration steps.

    2. (Optional) To configure the following advanced settings as part of the data source configuration, expand the Advanced settings - optional section.

      For KMS key settings, you can choose either a custom key or use the default provided data encryption key.

      While converting your data into embeddings, Amazon Bedrock encrypts your transient data with a key that AWS owns and manages, by default. You can use your own KMS key. For more information, see Encryption of transient data storage during data ingestion.

      For data deletion policy settings, you can choose either:

      • Delete: Deletes all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted, only the data. This flag is ignored if an AWS account is deleted.

      • Retain: Retains all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted if you delete a knowledge base or data source resource.

    3. To configure the following content chunking and parsing settings as part of the data source configuration, go to the Content chunking and parsing section.

      Choose one of the following chunking options:

      • Fixed-size chunking: Content is split into chunks of text of approximately the token size that you set. You can set the maximum number of tokens that a chunk must not exceed and the overlap percentage between consecutive chunks.

      • Default chunking: Content is split into chunks of text of up to 300 tokens. If a single document or piece of content contains fewer than 300 tokens, the document is not split further.

      • Hierarchical chunking: Content organized into nested structures of parent-child chunks. You set the maximum parent chunk token size and the maximum child chunk token size. You also set the absolute number of overlap tokens between consecutive parent chunks and consecutive child chunks.

      • Semantic chunking: Content organized into semantically similar text chunks or groups of sentences. You set the maximum number of sentences surrounding the target/current sentence to group together (buffer size). You also set the breakpoint percentile threshold for dividing the text into meaningful chunks. Semantic chunking uses a foundation model. View Amazon Bedrock pricing for information on the cost of foundation models.

      • No chunking: Each document is treated as a single text chunk. You might want to pre-process your documents by splitting them into separate files.

      Note

      You can’t change the chunking strategy after you have created the data source.

      You can choose to use an Amazon Bedrock foundation model to parse documents that contain more than standard text. For example, you can parse tabular data within documents with its structure intact. View Amazon Bedrock pricing for information on the cost of foundation models.

      You can choose to use an AWS Lambda function to customize your chunking strategy and how your document metadata attributes/fields are treated and ingested. Provide the Amazon S3 bucket location for the Lambda function input and output.

    4. Select Next.

  6. On the Select embeddings model and configure vector store page, choose a supported embeddings model to convert your data into vector embeddings for the knowledge base.

  7. In the Vector database section, choose one of the following options to store the vector embeddings for your knowledge base:

    • Quick create a new vector store – Amazon Bedrock creates an Amazon OpenSearch Serverless vector search collection for you. With this option, a public vector search collection and vector index are set up for you with the required fields and necessary configurations. After the collection is created, you can manage it in the Amazon OpenSearch Serverless console or through the AWS API. For more information, see Working with vector search collections in the Amazon OpenSearch Service Developer Guide. If you select this option, you can optionally enable the following settings:

      1. To enable redundant active replicas, such that the availability of your vector store isn't compromised in case of infrastructure failure, select Enable redundancy (active replicas).

        Note

        We recommend that you leave this option disabled while you test your knowledge base. When you're ready to deploy to production, we recommend that you enable redundant active replicas. For information about pricing, see Pricing for OpenSearch Serverless.

      2. To encrypt the automated vector store with a customer managed key, select Add customer-managed KMS key for Amazon OpenSearch Serverless vector – optional and choose the key. For more information, see Encryption of information passed to Amazon OpenSearch Service.

    • Select a vector store you have created – Select the service that contains a vector database that you have already created. Fill in the fields to allow Amazon Bedrock to map information from the knowledge base to your database, so that it can store, update, and manage embeddings. For more information about how these fields map to the fields that you created, see Set up a vector index for your knowledge base in a supported vector store.

      Note

      If you use a database in Amazon OpenSearch Serverless, Amazon Aurora, or MongoDB Atlas, you need to have configured the fields under Field mapping beforehand. If you use a database in Pinecone or Redis Enterprise Cloud, you can provide names for these fields here and Amazon Bedrock will dynamically create them in the vector store for you.

  8. Select Next.

  9. On the Review and create page, check the configuration and details of your knowledge base. Choose Edit in any section that you need to modify. When you are satisfied, select Create knowledge base.

  10. The time it takes to create the knowledge base depends on the amount of data you provided. When creation finishes, the Status of the knowledge base changes to Ready.
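The fixed-size chunking option in the console steps above combines a maximum chunk size with an overlap percentage between consecutive chunks. The following is a minimal sketch of that idea, not the Amazon Bedrock implementation. It assumes the content is already tokenized (whitespace-separated words stand in for tokens here), and the function name and the way the overlap is rounded are hypothetical choices made for illustration.

```python
def fixed_size_chunks(tokens, max_tokens=300, overlap_percentage=20):
    """Split a token list into chunks of at most max_tokens tokens.

    Consecutive chunks share roughly overlap_percentage percent of
    max_tokens. This is an illustrative sketch only; the service's
    tokenizer and exact chunk boundaries may differ.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap_percentage < 100:
        raise ValueError("overlap_percentage must be in [0, 100)")

    overlap = max_tokens * overlap_percentage // 100  # tokens shared by neighbors
    step = max_tokens - overlap                       # always at least 1

    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last chunk already reaches the end of the content
        start += step
    return chunks


# Whitespace-separated words used as stand-in tokens (an assumption).
example_tokens = "one two three four five six seven eight nine ten".split()
example_chunks = fixed_size_chunks(example_tokens, max_tokens=4, overlap_percentage=50)
```

With a maximum of 4 tokens and a 50 percent overlap, each chunk in this sketch starts 2 tokens after the previous one, so neighboring chunks share 2 tokens. A document shorter than the maximum is returned as a single chunk.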

API

To create a knowledge base, send a CreateKnowledgeBase request with an Agents for Amazon Bedrock build-time endpoint and provide a name and, optionally, a description for the knowledge base.

Note

If you prefer to let Amazon Bedrock create and manage a vector store for you in Amazon OpenSearch Service, use the console. For more information, see Create a knowledge base.

  • Provide the ARN of the IAM role with permissions to create a knowledge base in the roleArn field.

  • Provide the embedding model to use in the embeddingModelArn field in the knowledgeBaseConfiguration object.

  • Provide the configuration for your vector store in the storageConfiguration object. For more information, see Set up a vector index for your knowledge base in a supported vector store.

    • For an Amazon OpenSearch Serverless database, use the opensearchServerlessConfiguration object.

    • For a Pinecone database, use the pineconeConfiguration object.

    • For a Redis Enterprise Cloud database, use the redisEnterpriseCloudConfiguration object.

    • For an Amazon Aurora database, use the rdsConfiguration object.

    • For a MongoDB Atlas database, use the mongoDbAtlasConfiguration object.
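The fields listed above can be assembled into a CreateKnowledgeBase request as in the following sketch, which uses the AWS SDK for Python (boto3) and an Amazon OpenSearch Serverless vector store. Every ARN, the index name, and the field-mapping names are placeholders chosen for illustration, and the helper function names are hypothetical. The field-mapping values must match the fields that already exist in your vector index.

```python
def build_create_kb_request(name, role_arn, embedding_model_arn,
                            collection_arn, vector_index_name,
                            description=None):
    """Build the keyword arguments for a CreateKnowledgeBase request."""
    request = {
        "name": name,
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": embedding_model_arn,
            },
        },
        "storageConfiguration": {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration": {
                "collectionArn": collection_arn,
                "vectorIndexName": vector_index_name,
                # Placeholder field names; use the names from your own index.
                "fieldMapping": {
                    "vectorField": "embedding",
                    "textField": "text",
                    "metadataField": "metadata",
                },
            },
        },
    }
    if description:
        request["description"] = description
    return request


def create_knowledge_base(request, region_name="us-east-1"):
    """Send the request and return the new knowledge base ID."""
    import boto3  # imported here so the builder above works without the SDK

    client = boto3.client("bedrock-agent", region_name=region_name)
    response = client.create_knowledge_base(**request)
    return response["knowledgeBase"]["knowledgeBaseId"]


# Placeholder values only; replace them with resources from your account.
example_request = build_create_kb_request(
    name="example-kb",
    role_arn="arn:aws:iam::111122223333:role/example-kb-service-role",
    embedding_model_arn=(
        "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v1"
    ),
    collection_arn="arn:aws:aoss:us-east-1:111122223333:collection/example-id",
    vector_index_name="example-index",
)
```

Calling create_knowledge_base(example_request) would send the request. For the other vector stores, the same pattern applies with the corresponding configuration object and storage type.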

After you create a knowledge base, create a data source containing the documents or content for your knowledge base. To create the data source, send a CreateDataSource request. See Supported data sources to select your data source and follow the API connection configuration example.

  • Provide the connection information for the data source files in the dataSourceConfiguration field.

  • Specify how to chunk the data sources in the vectorIngestionConfiguration field.

    Note

    You can't change the chunking configuration after you create the data source.

  • Provide the dataDeletionPolicy for your data source. Specify DELETE to delete all data from your data source that’s converted into vector embeddings when a knowledge base or data source resource is deleted. This flag is ignored if an AWS account is deleted. Specify RETAIN to retain that data when a knowledge base or data source resource is deleted. Note that the vector store itself is not deleted if you delete a knowledge base or data source resource.

  • (Optional) While converting your data into embeddings, Amazon Bedrock encrypts your data with a key that AWS owns and manages, by default. To use your own KMS key, include it in the serverSideEncryptionConfiguration object. For more information, see Encryption of knowledge base resources.
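As a rough illustration of the CreateDataSource fields described above, the following boto3 sketch builds a request for an Amazon S3 data source with fixed-size chunking. The bucket ARN, knowledge base ID, chunk sizes, and helper names are placeholders and assumptions, not values from this guide. Other data source types use a different dataSourceConfiguration shape.

```python
def build_create_data_source_request(knowledge_base_id, name, bucket_arn,
                                     max_tokens=300, overlap_percentage=20,
                                     data_deletion_policy="RETAIN",
                                     kms_key_arn=None):
    """Build the keyword arguments for a CreateDataSource request (S3)."""
    if data_deletion_policy not in ("DELETE", "RETAIN"):
        raise ValueError("data_deletion_policy must be DELETE or RETAIN")

    request = {
        "knowledgeBaseId": knowledge_base_id,
        "name": name,
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {"bucketArn": bucket_arn},
        },
        # The chunking configuration can't be changed after creation.
        "vectorIngestionConfiguration": {
            "chunkingConfiguration": {
                "chunkingStrategy": "FIXED_SIZE",
                "fixedSizeChunkingConfiguration": {
                    "maxTokens": max_tokens,
                    "overlapPercentage": overlap_percentage,
                },
            },
        },
        "dataDeletionPolicy": data_deletion_policy,
    }
    if kms_key_arn:
        # Optional: encrypt transient data with your own KMS key.
        request["serverSideEncryptionConfiguration"] = {"kmsKeyArn": kms_key_arn}
    return request


def create_data_source(request, region_name="us-east-1"):
    """Send the request and return the new data source ID."""
    import boto3  # imported here so the builder above works without the SDK

    client = boto3.client("bedrock-agent", region_name=region_name)
    response = client.create_data_source(**request)
    return response["dataSource"]["dataSourceId"]
```

Keeping the request builder separate from the call makes it easy to inspect the chunking and deletion settings before anything is sent, which matters because the chunking configuration is fixed once the data source exists.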

Set up security configurations for your knowledge base

After you've created a knowledge base, you might have to set up the following security configurations:

Set up data access policies for your knowledge base

If you're using a custom role, set up security configurations for your newly created knowledge base. If you let Amazon Bedrock create a service role for you, you can skip this step. Follow the steps in the tab corresponding to the database that you set up.

Amazon OpenSearch Serverless

To restrict access to the Amazon OpenSearch Serverless collection to the knowledge base service role, create a data access policy. You can do so in the following ways:

Use the following data access policy, specifying the Amazon OpenSearch Serverless collection and your service role:

[
    {
        "Description": "${data access policy description}",
        "Rules": [
            {
                "Resource": [
                    "index/${collection_name}/*"
                ],
                "Permission": [
                    "aoss:DescribeIndex",
                    "aoss:ReadDocument",
                    "aoss:WriteDocument"
                ],
                "ResourceType": "index"
            }
        ],
        "Principal": [
            "arn:aws:iam::${account-id}:role/${kb-service-role}"
        ]
    }
]
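One possible way to apply this data access policy programmatically is sketched below with boto3. The helper names, policy name, and description are hypothetical, and the sketch assumes that a new data access policy should be created rather than an existing one updated.

```python
import json


def build_data_access_policy(collection_name, account_id, kb_service_role,
                             description="Knowledge base data access"):
    """Return the data access policy shown above as a Python structure."""
    return [
        {
            "Description": description,
            "Rules": [
                {
                    "Resource": [f"index/{collection_name}/*"],
                    "Permission": [
                        "aoss:DescribeIndex",
                        "aoss:ReadDocument",
                        "aoss:WriteDocument",
                    ],
                    "ResourceType": "index",
                }
            ],
            "Principal": [
                f"arn:aws:iam::{account_id}:role/{kb_service_role}"
            ],
        }
    ]


def create_data_access_policy(policy_name, policy, region_name="us-east-1"):
    """Create the policy in Amazon OpenSearch Serverless."""
    import boto3  # imported here so the builder above works without the SDK

    client = boto3.client("opensearchserverless", region_name=region_name)
    return client.create_access_policy(
        name=policy_name,
        type="data",
        policy=json.dumps(policy),  # the API expects the policy as a JSON string
    )
```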
Pinecone, Redis Enterprise Cloud or MongoDB Atlas

To integrate a Pinecone, Redis Enterprise Cloud, or MongoDB Atlas vector index, attach the following identity-based policy to your knowledge base service role to allow it to access the AWS Secrets Manager secret for the vector index.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:AssociateThirdPartyKnowledgeBase"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "bedrock:ThirdPartyKnowledgeBaseCredentialsSecretArn": "arn:aws:secretsmanager:${region}:${account-id}:secret:${secret-id}"
                }
            }
        }
    ]
}
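The following sketch shows one way to attach this identity-based policy to the service role as an inline policy with boto3. The role name, policy name, and helper names are hypothetical. The sketch takes the full ARN of the Secrets Manager secret as a parameter instead of assembling it from its parts.

```python
import json


def build_third_party_kb_policy(secret_arn):
    """Return the identity-based policy for a third-party vector index."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["bedrock:AssociateThirdPartyKnowledgeBase"],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "bedrock:ThirdPartyKnowledgeBaseCredentialsSecretArn": secret_arn
                    }
                },
            }
        ],
    }


def attach_policy_to_role(role_name, policy_name, policy_document):
    """Attach the policy to the knowledge base service role as an inline policy."""
    import boto3  # imported here so the builder above works without the SDK

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document),
    )
```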

Set up network access policies for your Amazon OpenSearch Serverless knowledge base

If you use a private Amazon OpenSearch Serverless collection for your knowledge base, it can only be accessed through an AWS PrivateLink VPC endpoint. You can create a private Amazon OpenSearch Serverless collection when you set up your Amazon OpenSearch Serverless vector collection or you can make an existing Amazon OpenSearch Serverless collection (including one that the Amazon Bedrock console created for you) private when you configure its network access policy.

The following resources in the Amazon OpenSearch Service Developer Guide will help you understand the setup required for private Amazon OpenSearch Serverless collections:

To allow an Amazon Bedrock knowledge base to access a private Amazon OpenSearch Serverless collection, you must edit the network access policy for the Amazon OpenSearch Serverless collection to allow Amazon Bedrock as a source service. Select the tab corresponding to your method of choice and follow the steps.

Console
  1. Open the Amazon OpenSearch Service console at https://console.aws.amazon.com/aos/.

  2. From the left navigation pane, select Collections. Then choose your collection.

  3. In the Network section, select the Associated Policy.

  4. Choose Edit.

  5. For Select policy definition method, do one of the following:

    • Leave Select policy definition method as Visual editor and configure the following settings in the Rule 1 section:

      1. (Optional) In the Rule name field, enter a name for the network access rule.

      2. Under Access collections from, select Private (recommended).

      3. Select AWS service private access. In the text box, enter bedrock.amazonaws.com.

      4. Unselect Enable access to OpenSearch Dashboards.

    • Choose JSON and paste the following policy in the JSON editor.

      [
          {
              "AllowFromPublic": false,
              "Description": "${network access policy description}",
              "Rules": [
                  {
                      "ResourceType": "collection",
                      "Resource": [
                          "collection/${collection-id}"
                      ]
                  }
              ],
              "SourceServices": [
                  "bedrock.amazonaws.com"
              ]
          }
      ]

  6. Choose Update.

API

To edit the network access policy for your Amazon OpenSearch Serverless collection, do the following:

  1. Send a GetSecurityPolicy request with an OpenSearch Serverless endpoint. Specify the name of the policy and specify the type as network. Note the policyVersion in the response.

  2. Send an UpdateSecurityPolicy request with an OpenSearch Serverless endpoint. Minimally, specify the following fields:

    Field          Description
    name           The name of the policy.
    policyVersion  The policyVersion returned in the GetSecurityPolicy response.
    type           The type of security policy. Specify network.
    policy         The policy to use. Specify the following JSON object.

    [
        {
            "AllowFromPublic": false,
            "Description": "${network access policy description}",
            "Rules": [
                {
                    "ResourceType": "collection",
                    "Resource": [
                        "collection/${collection-id}"
                    ]
                }
            ],
            "SourceServices": [
                "bedrock.amazonaws.com"
            ]
        }
    ]
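The two API steps above can be combined as in the following boto3 sketch: read the current policyVersion with GetSecurityPolicy, then pass it to UpdateSecurityPolicy together with the new policy. The helper names and default description are hypothetical. Note that this sketch replaces the whole policy document, so any existing rules that you want to keep must be merged in first.

```python
import json


def build_network_policy(collection_id,
                         description="Private access for Amazon Bedrock"):
    """Return the network access policy shown above as a Python structure."""
    return [
        {
            "AllowFromPublic": False,
            "Description": description,
            "Rules": [
                {
                    "ResourceType": "collection",
                    "Resource": [f"collection/{collection_id}"],
                }
            ],
            "SourceServices": ["bedrock.amazonaws.com"],
        }
    ]


def allow_bedrock_private_access(policy_name, collection_id,
                                 region_name="us-east-1"):
    """Update an existing network policy to allow Amazon Bedrock as a source service."""
    import boto3  # imported here so the builder above works without the SDK

    client = boto3.client("opensearchserverless", region_name=region_name)

    # Step 1: fetch the current policy to learn its policyVersion.
    current = client.get_security_policy(name=policy_name, type="network")
    policy_version = current["securityPolicyDetail"]["policyVersion"]

    # Step 2: submit the new policy along with that version.
    response = client.update_security_policy(
        name=policy_name,
        type="network",
        policyVersion=policy_version,
        policy=json.dumps(build_network_policy(collection_id)),
    )
    return response["securityPolicyDetail"]["policyVersion"]
```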

For an AWS CLI example, see Creating data access policies (AWS CLI).

  • Use the Amazon OpenSearch Service console by following the steps at Creating network policies (console). Instead of creating a network policy, note the Associated policy in the Network subsection of the collection details.