Connect to Salesforce for your Amazon Bedrock knowledge base

Salesforce is a customer relationship management (CRM) tool for managing support, sales, and marketing teams. You can connect to your Salesforce instance for your Amazon Bedrock knowledge base by using either the AWS Management Console for Amazon Bedrock or the CreateDataSource API (see Amazon Bedrock supported SDKs and AWS CLI).

Note

The Salesforce data source connector is in preview release and is subject to change.

Currently, only the Amazon OpenSearch Serverless vector store is available to use with this data source.

There are limits on how many files can be crawled and on the size (in MB) of each file. See Quotas for knowledge bases.

Supported features

  • Auto detection of main document fields

  • Inclusion/exclusion content filters

  • Incremental content syncs for added, updated, and deleted content

  • OAuth 2.0 authentication

Prerequisites

In Salesforce, make sure you:

  • Take note of your Salesforce instance URL. For example, https://company.salesforce.com/. The instance must be running a Salesforce Connected App.

  • Create a Salesforce Connected App and configure client credentials. Then, for your selected app, copy the consumer key (client ID) and consumer secret (client secret) from the OAuth settings. For more information, see Salesforce documentation on Create a Connected App and Configure a Connected App for the OAuth 2.0 Client Credentials.

    Note

    For Salesforce Connected Apps, under Client Credentials Flow, make sure you search for and select the user’s name or alias for your client credentials in the “Run As” field.
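Before storing the consumer key and secret, you may want to confirm that the Connected App issues tokens with the client credentials flow. The following Python sketch builds (but does not send) such a token request. It assumes the standard Salesforce token endpoint path /services/oauth2/token on your instance URL; the function name build_token_request is hypothetical.

```python
import urllib.parse
import urllib.request


def build_token_request(instance_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build, but do not send, an OAuth 2.0 client credentials token request.

    Assumption: the token endpoint is <instance URL>/services/oauth2/token.
    """
    token_url = instance_url.rstrip("/") + "/services/oauth2/token"
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,  # Connected App consumer key
            "client_secret": client_secret,  # Connected App consumer secret
        }
    ).encode("utf-8")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

If the Connected App and its Run As user are set up correctly, sending this request with urllib.request.urlopen should return a JSON body containing an access_token; an error response usually points to a Connected App configuration problem.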

In your AWS account, make sure you:

  • Store your authentication credentials in an AWS Secrets Manager secret and note the Amazon Resource Name (ARN) of the secret. Follow the Connection configuration instructions on this page for the key-value pairs that your secret must contain.

  • Include the necessary permissions to connect to your data source in your AWS Identity and Access Management (IAM) role/permissions policy for your knowledge base. For information on the required permissions for this data source to add to your knowledge base IAM role, see Permissions to access data sources.
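As an illustration of the kind of statement the knowledge base role needs, the following Python sketch generates an IAM policy document that lets the role read a single secret. It assumes that secretsmanager:GetSecretValue on your Salesforce secret is the data-source-specific permission; the authoritative list is on the Permissions to access data sources page, and the Sid value is hypothetical.

```python
import json


def secret_read_policy(secret_arn: str) -> str:
    """Return a JSON IAM policy document that allows reading one secret.

    Assumption: GetSecretValue is the permission the connector needs on the secret.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSalesforceSecret",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": [secret_arn],  # scope to this one secret only
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

Scoping Resource to a single ARN follows the least-privilege advice below. If your secret is encrypted with a customer managed KMS key, the role likely also needs kms:Decrypt on that key.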

Note

If you use the console, you can go to AWS Secrets Manager to add your secret or use an existing secret as part of the data source configuration step. The IAM role with all the required permissions can be created for you as part of the console steps for creating a knowledge base. After you have configured your data source and other settings, the IAM role with all the required permissions is applied to your specific knowledge base.

We recommend that you regularly refresh or rotate your credentials and secret. For your own security, provide only the access level that is necessary. We do not recommend that you reuse credentials and secrets across data sources.

Connection configuration

To connect to your Salesforce instance, you must provide the necessary configuration information so that Amazon Bedrock can access and crawl your data. You must also follow the Prerequisites.

An example of a configuration for this data source is included in this section.

The following sections describe auto detection of document fields, inclusion/exclusion filters, incremental syncing, and secret authentication credentials, and how each of these works.

Auto detection of main document fields

The data source connector automatically detects and crawls all of the main metadata fields of your documents or content. For example, the data source connector can crawl the document body equivalent of your documents, the document title, the document creation or modification date, or other core fields that might apply to your documents.

Important

If your content includes sensitive information, then Amazon Bedrock could respond using sensitive information.

You can apply filtering operators to metadata fields to help you further improve the relevancy of responses. For example, the "epoch_modification_time" field records when a document was last updated, as the number of seconds that have passed since January 1, 1970 (UTC). To use only the most recent data, filter for documents where "epoch_modification_time" is greater than a certain number. For more information on the filtering operators you can apply to your metadata fields, see Metadata and filtering.
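The "most recent data" filter described above comes down to computing an epoch cutoff. The following Python sketch builds a greaterThan filter on "epoch_modification_time" for documents modified in the last N days; the function name recent_documents_filter is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def recent_documents_filter(days: int, now: Optional[datetime] = None) -> dict:
    """Build a retrieval filter for documents modified in the last `days` days.

    epoch_modification_time counts seconds since January 1, 1970 (UTC).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Seconds since the epoch at the start of the look-back window.
    cutoff = int((now - timedelta(days=days)).timestamp())
    return {"greaterThan": {"key": "epoch_modification_time", "value": cutoff}}
```

You could pass the result to the Retrieve API of a boto3 bedrock-agent-runtime client as retrievalConfiguration={"vectorSearchConfiguration": {"filter": recent_documents_filter(30)}}. Passing now explicitly keeps the cutoff reproducible in tests.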

Inclusion/exclusion filters

You can include or exclude certain content from crawling. For example, you can specify an exclusion prefix/regular expression pattern to skip crawling any file that contains “private” in the file name. You could also specify an inclusion prefix/regular expression pattern to include certain content entities or content types. If you specify both an inclusion filter and an exclusion filter and both match a document, the exclusion filter takes precedence and the document isn’t crawled.

An example of a regular expression pattern to exclude or filter out campaigns that contain "private" in the campaign name: ".*private.*"
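The precedence rule above (exclusion wins when both filters match) can be modeled in a few lines. This Python sketch is only a local approximation for checking your patterns: it assumes each pattern must match the whole value (re.fullmatch) and that, with no inclusion patterns, everything not excluded is crawled. The connector's exact matching semantics and regex dialect may differ.

```python
import re
from typing import Sequence


def is_crawled(value: str, inclusion: Sequence[str], exclusion: Sequence[str]) -> bool:
    """Approximate whether an item is crawled under inclusion/exclusion patterns.

    Assumption: each pattern must match the whole value (re.fullmatch).
    """
    # Exclusion takes precedence: any exclusion match skips the item,
    # even if an inclusion pattern also matches.
    if any(re.fullmatch(p, value) for p in exclusion):
        return False
    # Assumption: with no inclusion patterns, everything not excluded is crawled.
    if not inclusion:
        return True
    return any(re.fullmatch(p, value) for p in inclusion)
```

For example, with inclusion ".*public.*" and exclusion ".*private.*", a campaign named "public private beta" is skipped. Note that these patterns are case sensitive, so ".*private.*" does not match "Private".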

You can apply inclusion/exclusion filters on the following content types:

  • Account: Account number/identifier

  • Attachment: Attachment file name with its extension

  • Campaign: Campaign name and associated identifiers

  • ContentVersion: Document version and associated identifiers

  • Partner: Partner information fields including associated identifiers

  • Pricebook2: Product/price list name

  • Case: Customer inquiry/issue number and other information fields including associated identifiers (please note: can contain personal information, which you can choose to exclude or filter out)

  • Contact: Customer information fields (please note: can contain personal information, which you can choose to exclude or filter out)

  • Contract: Contract name and associated identifiers

  • Document: File name with its extension

  • Idea: Idea information fields and associated identifiers

  • Lead: Potential new customer information fields (please note: can contain personal information, which you can choose to exclude or filter out)

  • Opportunity: Pending sale/deal information fields and associated identifiers

  • Product2: Product information fields and associated identifiers

  • Solution: Solution name for a customer inquiry/issue and associated identifiers

  • Task: Task information fields and associated identifiers

  • FeedItem: Identifier of the Chatter feed post

  • FeedComment: Identifier of the Chatter feed post that the comments belong to

  • Knowledge__kav: Knowledge article version and associated identifiers

  • User: User alias within your organization

  • CollaborationGroup: Chatter group name (unique)
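When you configure filters through the API, each filter entry names one of the object types listed above. The following Python sketch turns a simple mapping into the filterConfiguration shape used in the configuration example later on this page, rejecting object types not in the list and compiling each pattern to catch syntax errors early. The input keys "include" and "exclude" and the function name are hypothetical, and Python's regex dialect may not match the connector's exactly.

```python
import re
from typing import Dict, List

# Object types listed above that accept inclusion/exclusion filters.
SUPPORTED_OBJECT_TYPES = frozenset({
    "Account", "Attachment", "Campaign", "ContentVersion", "Partner",
    "Pricebook2", "Case", "Contact", "Contract", "Document", "Idea",
    "Lead", "Opportunity", "Product2", "Solution", "Task", "FeedItem",
    "FeedComment", "Knowledge__kav", "User", "CollaborationGroup",
})


def pattern_filter_configuration(rules: Dict[str, Dict[str, List[str]]]) -> dict:
    """Turn {objectType: {"include": [...], "exclude": [...]}} into a filterConfiguration."""
    filters = []
    for object_type, patterns in rules.items():
        if object_type not in SUPPORTED_OBJECT_TYPES:
            raise ValueError(f"unsupported objectType: {object_type}")
        entry = {"objectType": object_type}
        for key, field in (("include", "inclusionFilters"), ("exclude", "exclusionFilters")):
            values = list(patterns.get(key, []))
            for pattern in values:
                re.compile(pattern)  # raises re.error on invalid syntax
            if values:
                entry[field] = values
        filters.append(entry)
    return {"type": "PATTERN", "patternObjectFilter": {"filters": filters}}
```

Because object type names are exact API identifiers (for example, Knowledge__kav and Pricebook2), validating them locally can catch typos before you call CreateDataSource.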

Incremental syncing

The data source connector crawls new, modified, and deleted content each time your data source syncs with your knowledge base. Amazon Bedrock can use your data source’s mechanism for tracking content changes and crawl content that changed since the last sync. When you sync your data source with your knowledge base for the first time, all content is crawled by default.

To sync your data source with your knowledge base, use the StartIngestionJob API or select your knowledge base in the console and select Sync within the data source overview section.
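A minimal Python sketch of running a sync from code, assuming a boto3 bedrock-agent client (or any object with the same start_ingestion_job and get_ingestion_job methods). It starts an ingestion job and polls until the job reaches a terminal status. The function name, the 30-second interval, and the injectable sleep parameter are choices made for this example.

```python
import time
from typing import Any, Callable

# Ingestion job statuses after which polling can stop.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_data_source(
    client: Any,
    knowledge_base_id: str,
    data_source_id: str,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start an ingestion job and block until it finishes; return its final status."""
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id, dataSourceId=data_source_id
    )["ingestionJob"]
    status = job["status"]
    while status not in TERMINAL_STATUSES:
        sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
        status = job["status"]
    return status
```

For example: sync_data_source(boto3.client("bedrock-agent"), "your-knowledge-base-id", "your-data-source-id"). Injecting sleep lets you test the polling loop without waiting.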

Important

All data that you sync from your data source becomes available to anyone with bedrock:Retrieve permissions to retrieve the data. This can also include any data with controlled data source permissions. For more information, see Knowledge base permissions.

Secret authentication credentials

For OAuth 2.0 authentication, your secret authentication credentials in AWS Secrets Manager must include these key-value pairs:

  • consumerKey: app client ID

  • consumerSecret: app client secret

  • authenticationUrl: Salesforce instance URL or the URL to request the authentication token from

Note

Your secret in AWS Secrets Manager must be in the same AWS Region as your knowledge base.
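The following Python sketch serializes the three key-value pairs above into a secret string and checks that a secret ARN is in the same Region as the knowledge base. It relies on the standard ARN layout arn:partition:secretsmanager:region:account-id:secret:name; the function names are hypothetical.

```python
import json


def salesforce_secret_string(consumer_key: str, consumer_secret: str, authentication_url: str) -> str:
    """Serialize the key-value pairs the Salesforce connector expects in the secret."""
    return json.dumps({
        "consumerKey": consumer_key,  # Connected App client ID
        "consumerSecret": consumer_secret,  # Connected App client secret
        "authenticationUrl": authentication_url,  # instance URL or token URL
    })


def secret_region(secret_arn: str) -> str:
    """Return the Region field of a Secrets Manager ARN."""
    # arn:partition:secretsmanager:region:account-id:secret:name
    parts = secret_arn.split(":", 6)
    if len(parts) != 7 or parts[0] != "arn" or parts[2] != "secretsmanager":
        raise ValueError(f"not a Secrets Manager ARN: {secret_arn}")
    return parts[3]


def check_same_region(secret_arn: str, knowledge_base_region: str) -> None:
    """Raise ValueError if the secret and the knowledge base are in different Regions."""
    region = secret_region(secret_arn)
    if region != knowledge_base_region:
        raise ValueError(f"secret is in {region}, knowledge base is in {knowledge_base_region}")
```

You could store the string with boto3.client("secretsmanager", region_name="your-region").create_secret(Name="AmazonBedrock-Salesforce", SecretString=...), creating the client in the knowledge base's Region, and then use the returned ARN for the data source.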

Console

The following is an example of a configuration for connecting to Salesforce for your Amazon Bedrock knowledge base. You configure your data source as part of the knowledge base creation steps in the console.

  1. Sign in to the AWS Management Console using an IAM role with Amazon Bedrock permissions, and open the Amazon Bedrock console at https://console.aws.amazon.com/bedrock/.

  2. From the left navigation pane, select Knowledge bases.

  3. In the Knowledge bases section, select Create knowledge base.

  4. Provide the knowledge base details.

    1. Provide the knowledge base name and optional description.

    2. Provide the AWS Identity and Access Management role for the necessary access permissions required to create a knowledge base.

      Note

      The IAM role with all the required permissions can be created for you as part of the console steps for creating a knowledge base. After you have completed the steps for creating a knowledge base, the IAM role with all the required permissions is applied to your specific knowledge base.

    3. Create any tags you want to assign to your knowledge base.

    Go to the next section to configure your data source.

  5. Choose Salesforce as your data source and provide the connection configuration details.

    1. Provide the data source name and optional description.

    2. Provide your Salesforce instance URL. For example, https://company.salesforce.com/. The instance must be running a Salesforce Connected App.

    Check the advanced settings. You can optionally change the default selected settings.

  6. Set your transient data encryption key and data deletion policy in the advanced settings.

    For KMS key settings, you can choose either a custom key or use the default provided data encryption key.

    By default, while converting your data into embeddings, Amazon Bedrock encrypts your transient data with a key that AWS owns and manages. You can use your own KMS key instead. For more information, see Encryption of transient data storage during data ingestion.

    For data deletion policy settings, you can choose either:

    • Delete: Deletes all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted, only the data. This flag is ignored if an AWS account is deleted.

    • Retain: Retains all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted if you delete a knowledge base or data source resource.

    Continue configuring your data source.

  7. Provide the authentication information to connect to your Salesforce instance:

    1. For OAuth 2.0 authentication, go to AWS Secrets Manager to add your secret authentication credentials or use an existing Amazon Resource Name (ARN) for the secret you created. Your secret must contain the Salesforce Connected App consumer key (client ID), consumer secret (client secret), and the Salesforce instance URL or the URL to request the authentication token from. For more information, see Salesforce documentation on Create a Connected App and Configure a Connected App for the OAuth 2.0 Client Credentials.

    Continue configuring your data source.

  8. Choose whether to use filters/regular expression patterns to include or exclude certain content. Otherwise, all standard content is crawled.

    Continue configuring your data source.

  9. Choose either the default or customized chunking and parsing configurations.

    1. If you choose custom settings, select one of the following chunking options:

      • Fixed-size chunking: Content split into chunks of text of your set approximate token size. You set the maximum number of tokens that a chunk must not exceed and the overlap percentage between consecutive chunks.

      • Default chunking: Content split into chunks of text of up to 300 tokens. If a single document or piece of content contains fewer than 300 tokens, the document is not split further.

      • Hierarchical chunking: Content organized into nested structures of parent-child chunks. You set the maximum parent chunk token size and the maximum child chunk token size. You also set the absolute number of overlap tokens between consecutive parent chunks and consecutive child chunks.

      • Semantic chunking: Content organized into semantically similar text chunks or groups of sentences. You set the maximum number of sentences surrounding the target/current sentence to group together (buffer size). You also set the breakpoint percentile threshold for dividing the text into meaningful chunks. Semantic chunking uses a foundation model. View Amazon Bedrock pricing for information on the cost of foundation models.

      • No chunking: Each document is treated as a single text chunk. You might want to pre-process your documents by splitting them into separate files.

      Note

      You can’t change the chunking strategy after you have created the data source.

    2. You can choose to use an Amazon Bedrock foundation model to parse documents that contain more than standard text. For example, you can parse tabular data within documents with its structure intact. View Amazon Bedrock pricing for information on the cost of foundation models.

    3. You can choose to use an AWS Lambda function to customize your chunking strategy and how your document metadata attributes/fields are treated and ingested. Provide the Amazon S3 bucket location for the Lambda function input and output.

    Go to the next section to configure your vector store.

  10. Choose a model for converting your data into vector embeddings.

    Create a vector store to allow Amazon Bedrock to store, update, and manage embeddings. You can quick create a new vector store or select a supported vector store that you have already created. Currently, only the Amazon OpenSearch Serverless vector store is available to use with this data source. If you create a new vector store, an Amazon OpenSearch Serverless vector search collection and index with the required fields are set up for you. If you select a supported vector store that you created, you must map the vector field names and metadata field names.

    Go to the next section to review your knowledge base configurations.

  11. Check the details of your knowledge base. You can edit any section before going ahead and creating your knowledge base.

    Note

    The time it takes to create the knowledge base depends on your specific configurations. When the creation of the knowledge base has completed, the status of the knowledge base changes to indicate that it is ready or available.

    Once your knowledge base is ready and available, sync your data source for the first time and whenever you want to keep your content up to date. Select your knowledge base in the console and select Sync within the data source overview section.
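The chunking options in step 9 and the data deletion policy in step 6 have API counterparts in CreateDataSource: vectorIngestionConfiguration and dataDeletionPolicy ("DELETE" or "RETAIN"). The following Python sketch builds a vectorIngestionConfiguration for each custom chunking strategy. The keyword argument names are hypothetical, while the output keys are intended to follow the CreateDataSource request shape. It assumes that omitting vectorIngestionConfiguration gives you default chunking, so the helper has no separate default option.

```python
from typing import Any, Dict


def chunking_configuration(strategy: str, **params: int) -> Dict[str, Any]:
    """Build a vectorIngestionConfiguration for one chunking strategy.

    Keyword names (max_tokens, overlap_percentage, ...) are hypothetical.
    """
    if strategy == "FIXED_SIZE":
        inner = {"fixedSizeChunkingConfiguration": {
            "maxTokens": params["max_tokens"],
            "overlapPercentage": params["overlap_percentage"],
        }}
    elif strategy == "HIERARCHICAL":
        inner = {"hierarchicalChunkingConfiguration": {
            # First level is the parent chunk, second level the child chunk.
            "levelConfigurations": [
                {"maxTokens": params["parent_max_tokens"]},
                {"maxTokens": params["child_max_tokens"]},
            ],
            "overlapTokens": params["overlap_tokens"],
        }}
    elif strategy == "SEMANTIC":
        inner = {"semanticChunkingConfiguration": {
            "maxTokens": params["max_tokens"],
            "bufferSize": params["buffer_size"],
            "breakpointPercentileThreshold": params["breakpoint_percentile_threshold"],
        }}
    elif strategy == "NONE":
        inner = {}  # each document becomes a single chunk
    else:
        raise ValueError(f"unknown chunking strategy: {strategy}")
    return {"chunkingConfiguration": {"chunkingStrategy": strategy, **inner}}
```

Since the chunking strategy can't be changed after the data source is created, generating this configuration from one reviewed helper can reduce the risk of a typo that forces you to recreate the data source.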

API

The following is an example of a configuration for connecting to Salesforce for your Amazon Bedrock knowledge base. You configure your data source using the API with the AWS CLI or a supported SDK, such as Python. After you call CreateKnowledgeBase, you call CreateDataSource to create your data source with your connection information in dataSourceConfiguration. Remember to also specify your chunking strategy/approach in vectorIngestionConfiguration and your data deletion policy in dataDeletionPolicy.

AWS Command Line Interface

```shell
aws bedrock-agent create-data-source \
    --name "Salesforce connector" \
    --description "Salesforce data source connector for Amazon Bedrock to use content in Salesforce" \
    --knowledge-base-id "your-knowledge-base-id" \
    --data-source-configuration file://salesforce-bedrock-connector-configuration.json \
    --data-deletion-policy "DELETE" \
    --vector-ingestion-configuration '{"chunkingConfiguration":{"chunkingStrategy":"FIXED_SIZE","fixedSizeChunkingConfiguration":{"maxTokens":100,"overlapPercentage":10}}}'
```

salesforce-bedrock-connector-configuration.json

```json
{
    "salesforceConfiguration": {
        "sourceConfiguration": {
            "hostUrl": "https://company.salesforce.com/",
            "authType": "OAUTH2_CLIENT_CREDENTIALS",
            "credentialsSecretArn": "arn:aws:secretsmanager:your-region:your-account-id:secret:AmazonBedrock-Salesforce"
        },
        "crawlerConfiguration": {
            "filterConfiguration": {
                "type": "PATTERN",
                "patternObjectFilter": {
                    "filters": [
                        {
                            "objectType": "Campaign",
                            "inclusionFilters": [".*public.*"],
                            "exclusionFilters": [".*private.*"]
                        }
                    ]
                }
            }
        }
    },
    "type": "SALESFORCE"
}
```
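The same CreateDataSource request can be made from Python. The following sketch only assembles the keyword arguments (so it runs without AWS credentials); the function name and its defaults are hypothetical, and the field names mirror the CLI example above.

```python
from typing import Any, Dict, Optional


def salesforce_data_source_request(
    knowledge_base_id: str,
    host_url: str,
    secret_arn: str,
    name: str = "Salesforce connector",
    filter_configuration: Optional[Dict[str, Any]] = None,
    vector_ingestion_configuration: Optional[Dict[str, Any]] = None,
    data_deletion_policy: str = "DELETE",
) -> Dict[str, Any]:
    """Assemble keyword arguments for a bedrock-agent create_data_source call."""
    if data_deletion_policy not in ("DELETE", "RETAIN"):
        raise ValueError("dataDeletionPolicy must be DELETE or RETAIN")
    salesforce: Dict[str, Any] = {
        "sourceConfiguration": {
            "hostUrl": host_url,
            "authType": "OAUTH2_CLIENT_CREDENTIALS",
            "credentialsSecretArn": secret_arn,
        }
    }
    if filter_configuration is not None:
        salesforce["crawlerConfiguration"] = {"filterConfiguration": filter_configuration}
    request: Dict[str, Any] = {
        "knowledgeBaseId": knowledge_base_id,
        "name": name,
        "dataSourceConfiguration": {"type": "SALESFORCE", "salesforceConfiguration": salesforce},
        "dataDeletionPolicy": data_deletion_policy,
    }
    if vector_ingestion_configuration is not None:
        request["vectorIngestionConfiguration"] = vector_ingestion_configuration
    return request
```

To send it, you could call boto3.client("bedrock-agent").create_data_source(**request) and read the new ID from response["dataSource"]["dataSourceId"], which you then need for StartIngestionJob.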