Create a knowledge base

Note

You can’t create a knowledge base with a root user. Log in with an IAM user before starting these steps.

After you set up your data source in Amazon S3 and a vector store of your choice, you can create a knowledge base. Select the tab corresponding to your method of choice and follow the steps.

Console
To create a knowledge base
  1. Sign in to the AWS Management Console, and open the Amazon Bedrock console at https://console.aws.amazon.com/bedrock/.

  2. From the left navigation pane, select Knowledge base.

  3. In the Knowledge bases section, select Create knowledge base.

  4. On the Provide knowledge base details page, set up the following configurations:

    1. (Optional) In the Knowledge base details section, change the default name and provide a description for your knowledge base.

    2. In the IAM permissions section, choose an AWS Identity and Access Management (IAM) role that provides Amazon Bedrock permission to access other AWS services. You can let Amazon Bedrock create the service role or choose a custom role that you have created.

    3. (Optional) Add tags to your knowledge base. For more information, see Tag resources.

    4. Select Next.

  5. On the Set up data source page, provide the information for the data source to use for the knowledge base:

    1. (Optional) Change the default Data source name.

    2. Select Current account or Other account for Data source location.

    3. Provide the S3 URI of the object containing the files for the data source that you prepared. If you select Other account, you might need to update the other account's Amazon S3 bucket policy, AWS KMS key policy, and the current account's Knowledge Base role.

      Note

      Choose an Amazon S3 bucket in the same region as the knowledge base that you're creating. Otherwise, your data source will fail to sync.

    4. If you encrypted your Amazon S3 data with a customer managed key, select Add customer-managed AWS KMS key for Amazon S3 data and choose a KMS key to allow Amazon Bedrock to decrypt it. For more information, see Encryption of information passed to Amazon OpenSearch Service.

    5. (Optional) To configure the following advanced settings, expand the Advanced settings - optional section.

      1. While converting your data into embeddings, Amazon Bedrock encrypts your data with a key that AWS owns and manages, by default. To use your own KMS key, expand Advanced settings, select Customize encryption settings (advanced), and choose a key. For more information, see Encryption of transient data storage during data ingestion.

      2. Choose from the following options for the Chunking strategy for your data source:

        • Default chunking – By default, Amazon Bedrock automatically splits your source data into chunks, such that each chunk contains at most 300 tokens. If a document contains fewer than 300 tokens, it is not split any further.

        • Fixed size chunking – Amazon Bedrock splits your source data into chunks of the approximate size that you set. Configure the following options.

          • Max tokens – Amazon Bedrock creates chunks that don't exceed the number of tokens that you choose.

          • Overlap percentage between chunks – Each chunk overlaps with consecutive chunks by the percentage that you choose.

        • No chunking – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.

        Note

        You can't change the chunking strategy after you have created the data source.

      3. Choose from the following options for the data deletion policy for your data source:

        • Delete: Deletes all underlying data belonging to the data source from the vector store upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted, only the underlying data. This flag is ignored if an AWS account is deleted.

        • Retain: Retains all underlying data in your vector store upon deletion of a knowledge base or data source resource.

    6. Select Next.

  6. In the Embeddings model section, choose a supported embeddings model to convert your data into vector embeddings for the knowledge base.

  7. In the Vector database section, choose one of the following options to store the vector embeddings for your knowledge base:

    • Quick create a new vector store – Amazon Bedrock creates an Amazon OpenSearch Serverless vector search collection for you. With this option, a public vector search collection and vector index is set up for you with the required fields and necessary configurations. After the collection is created, you can manage it in the Amazon OpenSearch Serverless console or through the AWS API. For more information, see Working with vector search collections in the Amazon OpenSearch Service Developer Guide. If you select this option, you can optionally enable the following settings:

      1. To enable redundant active replicas, such that the availability of your vector store isn't compromised in case of infrastructure failure, select Enable redundancy (active replicas).

        Note

        We recommend that you leave this option disabled while you test your knowledge base. When you're ready to deploy to production, we recommend that you enable redundant active replicas. For information about pricing, see Pricing for OpenSearch Serverless.

      2. To encrypt the automated vector store with a customer managed key, select Add customer-managed KMS key for Amazon OpenSearch Serverless vector – optional and choose the key. For more information, see Encryption of information passed to Amazon OpenSearch Service.

    • Select a vector store you have created – Select the service that contains a vector database that you have already created. Fill in the fields to allow Amazon Bedrock to map information from the knowledge base to your database, so that it can store, update, and manage embeddings. For more information about how these fields map to the fields that you created, see Set up a vector index for your knowledge base in a supported vector store.

      Note

      If you use a database in Amazon OpenSearch Serverless, Amazon Aurora, or MongoDB Atlas, you need to have configured the fields under Field mapping beforehand. If you use a database in Pinecone or Redis Enterprise Cloud, you can provide names for these fields here and Amazon Bedrock will dynamically create them in the vector store for you.

  8. Select Next.

  9. On the Review and create page, check the configuration and details of your knowledge base. Choose Edit in any section that you need to modify. When you are satisfied, select Create knowledge base.

  10. The time it takes to create the knowledge base depends on the amount of data that you provided. When creation is complete, the Status of the knowledge base changes to Ready.

API

To create a knowledge base, send a CreateKnowledgeBase request with an Agents for Amazon Bedrock build-time endpoint. Provide a name and, optionally, a description for the knowledge base, and configure the fields described in the following list. A sketch of such a request with the AWS SDK for Python (Boto3) follows the list.

Note

If you prefer to let Amazon Bedrock create and manage a vector store for you in Amazon OpenSearch Service, use the console. For more information, see Create a knowledge base.

  • Provide the ARN of the IAM role with permissions to create a knowledge base in the roleArn field.

  • Provide the embedding model to use in the embeddingModelArn field in the knowledgeBaseConfiguration object.

  • Provide the configuration for your vector store in the storageConfiguration object. For more information, see Set up a vector index for your knowledge base in a supported vector store.

    • For an Amazon OpenSearch Serverless database, use the opensearchServerlessConfiguration object.

    • For a Pinecone database, use the pineconeConfiguration object.

    • For a Redis Enterprise Cloud database, use the redisEnterpriseCloudConfiguration object.

    • For an Amazon Aurora database, use the rdsConfiguration object.

    • For a MongoDB Atlas database, use the mongodbConfiguration object.
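
The following is a minimal sketch of a CreateKnowledgeBase request using the AWS SDK for Python (Boto3), assuming an Amazon OpenSearch Serverless vector store. The Region, role ARN, collection ARN, index name, and field mapping values are placeholders for resources that you have already created.

import boto3

# All ARNs, names, and the Region below are placeholders -- replace them
# with your own resources.
bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

response = bedrock_agent.create_knowledge_base(
    name="my-knowledge-base",
    description="Documents for my knowledge base",
    roleArn="arn:aws:iam::111122223333:role/MyKnowledgeBaseRole",
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            # Embeddings model used to convert your data into vector embeddings.
            "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v1"
        },
    },
    storageConfiguration={
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": "arn:aws:aoss:us-east-1:111122223333:collection/my-collection-id",
            "vectorIndexName": "my-vector-index",
            # These names must match the fields in the vector index that you created.
            "fieldMapping": {
                "vectorField": "my-vector-field",
                "textField": "my-text-field",
                "metadataField": "my-metadata-field",
            },
        },
    },
)

print(response["knowledgeBase"]["knowledgeBaseId"])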

After you create a knowledge base, create a data source from the S3 bucket containing the files for your knowledge base. To create the data source, send a CreateDataSource request. A sketch of such a request with the AWS SDK for Python (Boto3) follows the list below.

  • Provide the information for the S3 bucket containing the data source files in the dataSourceConfiguration field.

  • Specify how to chunk the data sources in the vectorIngestionConfiguration field. For more information, see Set up a data source for your knowledge base.

    Note

    You can't change the chunking configuration after you create the data source.

  • Provide the dataDeletionPolicy for your data source. Specify DELETE to delete all underlying data belonging to the data source from the vector store upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted, only the underlying data. This flag is ignored if an AWS account is deleted. Specify RETAIN to retain all underlying data in your vector store upon deletion of a knowledge base or data source resource.

  • (Optional) While converting your data into embeddings, Amazon Bedrock encrypts your data with a key that AWS owns and manages, by default. To use your own KMS key, include it in the serverSideEncryptionConfiguration object. For more information, see Encryption of knowledge base resources.
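
The following is a minimal sketch of a CreateDataSource request using the AWS SDK for Python (Boto3). The knowledge base ID, bucket ARN, and chunking values are placeholders; adjust them for your own resources.

import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

# The knowledge base ID and bucket ARN below are placeholders.
response = bedrock_agent.create_data_source(
    knowledgeBaseId="KBID123456",
    name="my-data-source",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {
            "bucketArn": "arn:aws:s3:::amzn-s3-demo-bucket",
        },
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            # The chunking strategy can't be changed after the data source is created.
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": 300,
                "overlapPercentage": 20,
            },
        }
    },
    # RETAIN keeps the vectors in your vector store if the data source or
    # knowledge base is deleted; DELETE removes them.
    dataDeletionPolicy="RETAIN",
)

print(response["dataSource"]["dataSourceId"])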

Set up security configurations for your knowledge base

After you've created a knowledge base, you might have to set up the following security configurations:

Set up data access policies for your knowledge base

If you're using a custom role, set up security configurations for your newly created knowledge base. If you let Amazon Bedrock create a service role for you, you can skip this step. Follow the steps in the tab corresponding to the database that you set up.

Amazon OpenSearch Serverless

To restrict access to the Amazon OpenSearch Serverless collection to the knowledge base service role, create a data access policy. You can do so in the following ways:

Use the following data access policy, specifying the Amazon OpenSearch Serverless collection and your service role:

[ { "Description": "${data access policy description}", "Rules": [ { "Resource": [ "index/${collection_name}/*" ], "Permission": [ "aoss:DescribeIndex", "aoss:ReadDocument", "aoss:WriteDocument" ], "ResourceType": "index" } ], "Principal": [ "arn:aws:iam::${account-id}:role/${kb-service-role}" ] } ]
Pinecone, Redis Enterprise Cloud or MongoDB Atlas

To integrate a Pinecone, Redis Enterprise Cloud, or MongoDB Atlas vector index, attach the following identity-based policy to your knowledge base service role to allow it to access the AWS Secrets Manager secret for the vector index.

{ "Version": "2012-10-17", "Statement": [{ "Effect": "Allow", "Action": [ "bedrock:AssociateThirdPartyKnowledgeBase" ], "Resource": "*", "Condition": { "StringEquals": { "bedrock:ThirdPartyKnowledgeBaseCredentialsSecretArn": "arn:aws:iam::${region}:${account-id}:secret:${secret-id}" } } }] }

Set up network access policies for your Amazon OpenSearch Serverless knowledge base

If you use a private Amazon OpenSearch Serverless collection for your knowledge base, it can only be accessed through an AWS PrivateLink VPC endpoint. You can create a private Amazon OpenSearch Serverless collection when you set up your Amazon OpenSearch Serverless vector collection or you can make an existing Amazon OpenSearch Serverless collection (including one that the Amazon Bedrock console created for you) private when you configure its network access policy.

The following resources in the Amazon OpenSearch Service Developer Guide will help you understand the setup required for private Amazon OpenSearch Serverless collections.

To allow an Amazon Bedrock knowledge base to access a private Amazon OpenSearch Serverless collection, you must edit the network access policy for the Amazon OpenSearch Serverless collection to allow Amazon Bedrock as a source service. Select the tab corresponding to your method of choice and follow the steps.

Console
  1. Open the Amazon OpenSearch Service console at https://console.aws.amazon.com/aos/.

  2. From the left navigation pane, select Collections. Then choose your collection.

  3. In the Network section, select the Associated Policy.

  4. Choose Edit.

  5. For Select policy definition method, do one of the following:

    • Leave Select policy definition method as Visual editor and configure the following settings in the Rule 1 section:

      1. (Optional) In the Rule name field, enter a name for the network access rule.

      2. Under Access collections from, select Private (recommended).

      3. Select AWS service private access. In the text box, enter bedrock.amazonaws.com.

      4. Unselect Enable access to OpenSearch Dashboards.

    • Choose JSON and paste the following policy in the JSON editor.

      [ { "AllowFromPublic": false, "Description":"${network access policy description}", "Rules":[ { "ResourceType": "collection", "Resource":[ "collection/${collection-id}" ] }, ], "SourceServices":[ "bedrock.amazonaws.com" ] } ]
  6. Choose Update.

API

To edit the network access policy for your Amazon OpenSearch Serverless collection, do the following:

  1. Send a GetSecurityPolicy request with an OpenSearch Serverless endpoint. Specify the name of the policy and specify the type as network. Note the policyVersion in the response.

  2. Send an UpdateSecurityPolicy request with an OpenSearch Serverless endpoint. Minimally, specify the following fields (a sketch of both calls with the AWS SDK for Python follows the table):

    Field          Description
    name           The name of the policy.
    policyVersion  The policyVersion returned to you in the GetSecurityPolicy response.
    type           The type of security policy. Specify network.
    policy         The policy to use. Specify the following JSON object:

    [
      {
        "AllowFromPublic": false,
        "Description": "${network access policy description}",
        "Rules": [
          {
            "ResourceType": "collection",
            "Resource": [
              "collection/${collection-id}"
            ]
          }
        ],
        "SourceServices": [
          "bedrock.amazonaws.com"
        ]
      }
    ]

For an AWS CLI example, see Creating data access policies (AWS CLI).

  • Alternatively, use the Amazon OpenSearch Service console by following the steps at Creating network policies (console). Instead of creating a new network policy, edit the Associated policy shown in the Network subsection of the collection details.