Semantic question matching using text embeddings LLM

Note

This is an optional feature available as of v5.3.0. We encourage you to try it out on non-production instances initially to validate expected accuracy improvements and to test for any regression issues. See the Cost section for estimates of how this feature affects pricing.

QnABot on AWS can use LLM text embeddings to provide semantic search capabilities. The goal of this feature is to improve question matching accuracy while reducing the amount of tuning required, compared to the default OpenSearch keyword-based matching. Some of the benefits include:

  • Improved FAQ accuracy due to semantic matching compared to keyword matching (comparing the meaning of questions as opposed to comparing the individual words).

  • Fewer training utterances are required to match a diverse set of queries. This results in significantly less tuning to get and maintain good results.

  • Better multi-language support because translated utterances only need to match the original question’s meaning, not the exact wording.

For example, with semantic matching activated, “What’s the address of the White House?” matches “Where does the president live?”, and “How old are you?” matches “What is your age?”. These examples won’t match using the default keyword-based matching because they don’t share any of the same words.
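To illustrate the idea, the following is a minimal sketch (not part of QnABot) that generates embeddings with an Amazon Bedrock model and compares two questions by cosine similarity. The model ID (amazon.titan-embed-text-v1) and Region are assumptions; substitute whichever embeddings model you have access to.

# Minimal illustration (not QnABot code): compare two questions by the
# cosine similarity of their embeddings. Assumes Bedrock model access has
# been granted for amazon.titan-embed-text-v1 in us-east-1.
import json
import math

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text):
    # Titan Embeddings accepts {"inputText": "..."} and returns {"embedding": [...]}
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

q1 = embed("What is the address of the White House?")
q2 = embed("Where does the president live?")
print(cosine(q1, q2))  # semantically similar questions score close to 1

Keyword matching would treat these two questions as unrelated; the embeddings comparison captures that they mean the same thing.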

To enable these expanded semantic search capabilities, QnABot can use:

  • (Preferred) Embeddings from one of several models provided by Amazon Bedrock, selected using the EmbeddingsBedrockModelId CloudFormation parameter. These models provide the best performance and operate on a pay-per-request model. Amazon Bedrock is currently supported only in the following Regions: us-east-1, us-west-2, ap-southeast-1, ap-northeast-1, and eu-central-1.

  • Embeddings from a text embedding model hosted on a pre-built Amazon SageMaker endpoint.

  • Embeddings from a user-provided custom Lambda function.

Note

By choosing to use the generative responses features, you acknowledge that QnABot on AWS engages third-party generative AI models that AWS does not own or otherwise have any control over (“Third-Party Generative AI Models”). Your use of the Third-Party Generative AI Models is governed by the terms provided to you by the Third-Party Generative AI Model providers when you acquired your license to use them (for example, their terms of service, license agreement, acceptable use policy, and privacy policy).

You are responsible for ensuring that your use of the Third-Party Generative AI Models complies with the terms governing them, and with any laws, rules, regulations, policies, or standards that apply to you.

You are also responsible for making your own independent assessment of the Third-Party Generative AI Models that you use, including their outputs and how Third-Party Generative AI Model providers use any data that may be transmitted to them based on your deployment configuration. 

AWS does not make any representations, warranties, or guarantees regarding the Third-Party Generative AI Models, which are “Third-Party Content” under your agreement with AWS.  QnABot on AWS is offered to you as “AWS Content” under your agreement with AWS.

Enabling embeddings support

Using an Amazon Bedrock model (Preferred)

This option uses one of the Amazon Bedrock foundation models to generate text embeddings. QnABot on AWS currently supports several Bedrock embeddings models, selected with the EmbeddingsBedrockModelId CloudFormation parameter.

Note

Access must be requested for the Amazon Bedrock embeddings model that you want to use. This step must be performed for each account and Region where QnABot on AWS is deployed. To request access, navigate to Model Access in the Amazon Bedrock console. Select the models you need access to and request access.

Figure: Request Amazon Bedrock embeddings
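If it is helpful, you can list the embeddings models that Amazon Bedrock offers in your Region with a short boto3 call such as the hedged sketch below. Note that listing models does not confirm that access has been granted; access must still be requested in the console.

# Sketch: list Bedrock foundation models that output embeddings in this Region.
# This shows what is offered; model access must still be requested in the console.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")  # assumption: us-east-1
models = bedrock.list_foundation_models(byOutputModality="EMBEDDING")
for summary in models["modelSummaries"]:
    print(summary["modelId"], "-", summary["modelName"])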

From the CloudFormation console, set the following parameters:

  • Set EmbeddingsAPI to BEDROCK.

  • Set EmbeddingsBedrockModelId to one of the three options.

Figure: Configure Amazon Bedrock embeddings
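If you prefer to script this change rather than use the console, the hedged boto3 sketch below shows the idea. The stack name is an assumption, and in a real update you must carry forward all of your other stack parameters (for example, with UsePreviousValue).

# Sketch only: update an existing QnABot stack to use Bedrock embeddings.
# "QnABot" is an assumed stack name; every other stack parameter should be
# carried forward (UsePreviousValue=True) to avoid resetting it.
import boto3

cfn = boto3.client("cloudformation")
cfn.update_stack(
    StackName="QnABot",  # assumption: your stack name
    UsePreviousTemplate=True,
    Parameters=[
        {"ParameterKey": "EmbeddingsAPI", "ParameterValue": "BEDROCK"},
        {"ParameterKey": "EmbeddingsBedrockModelId",
         "ParameterValue": "amazon.titan-embed-text-v1"},  # example model ID
        # ...plus {"ParameterKey": <name>, "UsePreviousValue": True}
        # for each remaining parameter in your deployment
    ],
    Capabilities=["CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"],  # assumption
)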

Using the built-in Amazon SageMaker model

QnABot on AWS comes bundled with the ability to manage the lifecycle of a pre-built embeddings model hosted on Amazon SageMaker. In this mode, the solution provisions a SageMaker inference endpoint running the SageMaker JumpStart intfloat/e5-large-v2 model from Hugging Face.

To enable this mode, deploy a stack with EmbeddingsAPI set to SAGEMAKER. By default, a single-node ml.m5.xlarge endpoint is provisioned automatically. For large-volume deployments, you can add nodes by setting the SagemakerInitialInstanceCount CloudFormation parameter. See the Cost section for pricing details.

Figure: Semantic search with embeddings

Note
  • These settings cannot be changed through the content designer Settings page. To provision and deprovision the SageMaker instances, you must update your CloudFormation stack.

  • The embeddings model provided by SageMaker for QnABot is intfloat/e5-large-v2, which only supports English. If you are working with a non-English language, use your own embeddings model and provide its Lambda ARN in your deployment. For more information, see the Using a custom Lambda function section.
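As with the Bedrock option, these values can also be changed with a scripted stack update; the short sketch below shows only the parameter values that differ from the Bedrock sketch above. The stack name and instance count are assumptions.

# Sketch only: parameter values for the built-in SageMaker embeddings endpoint.
# Pass these to the same update_stack call shown in the Bedrock example,
# carrying every other parameter forward with UsePreviousValue=True.
sagemaker_embeddings_parameters = [
    {"ParameterKey": "EmbeddingsAPI", "ParameterValue": "SAGEMAKER"},
    # assumption: two ml.m5.xlarge nodes for a higher-volume deployment
    {"ParameterKey": "SagemakerInitialInstanceCount", "ParameterValue": "2"},
]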

Using a custom Lambda function

Users who want to explore alternate pre-trained or fine-tuned embeddings models can integrate a custom-built Lambda function. With a custom Lambda function, you can use your own embeddings model or even connect to an external embeddings API.

Note

If integrating your Lambda function with external resources, evaluate the security implications of sharing data outside of AWS.

To begin, you must create a Lambda function. Your custom Lambda function should accept a JSON object containing the input string and return an array containing the embeddings. Record the length of your embeddings array (also referred to as the dimensions), because you need it when you deploy the stack.

Lambda event input:

{
  // inputType has a value of either 'q' for question or 'a' for answer
  "inputType": "string",
  // inputText is the string on which to generate your custom embeddings
  "inputText": "string"
}

Expected Lambda JSON return object:

{ "embedding": [...] }
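A minimal handler sketch that satisfies this contract is shown below. The generate_embedding helper is hypothetical; replace it with your own model invocation or a call to an external embeddings API.

# Sketch only: a Lambda handler shaped to the QnABot embeddings contract.
# generate_embedding() is a hypothetical helper; supply your own model call.
def generate_embedding(text, input_type):
    # Hypothetical: invoke your embeddings model here. inputType ('q' for
    # question, 'a' for answer) lets models that treat queries and passages
    # differently handle each appropriately.
    raise NotImplementedError

def lambda_handler(event, context):
    # event: {"inputType": "q" | "a", "inputText": "..."}
    embedding = generate_embedding(event["inputText"], event["inputType"])
    # The length of this array must match the EmbeddingsLambdaDimensions
    # CloudFormation parameter.
    return {"embedding": embedding}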

When your Lambda function is ready, you can deploy the stack. To activate your Lambda function for embeddings, deploy the stack with EmbeddingsAPI set to LAMBDA. You must also set EmbeddingsLambdaArn to the ARN of your Lambda function and EmbeddingsLambdaDimensions to the dimensions returned by your Lambda function.
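The sketch below lists only the parameter values that differ from the Bedrock sketch earlier; the ARN and dimensions shown are placeholders for your own function's values.

# Sketch only: parameter values for a custom Lambda embeddings function.
# Pass these to the same update_stack call shown in the Bedrock example.
lambda_embeddings_parameters = [
    {"ParameterKey": "EmbeddingsAPI", "ParameterValue": "LAMBDA"},
    # assumptions: example ARN and dimensions; use your function's values
    {"ParameterKey": "EmbeddingsLambdaArn",
     "ParameterValue": "arn:aws:lambda:us-east-1:111122223333:function:my-embeddings"},
    {"ParameterKey": "EmbeddingsLambdaDimensions", "ParameterValue": "1024"},
]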

Figure: Semantic search with Lambda function

Note

You can't change these settings through the content designer Settings page. To correctly reconfigure your deployment, update your CloudFormation stack to modify these values.

Settings available for text embeddings

Note

Many of these settings depend on the underlying infrastructure being correctly configured. Follow the instructions found at Using the built-in Amazon SageMaker model or Using a custom Lambda function before modifying any of the following settings.

When your QnABot stack is installed with EmbeddingsApi enabled, you can manage several different settings through the content designer Settings page:

  • EMBEDDINGS_ENABLE - To turn on and off use of semantic search using embeddings:

    • Set to FALSE to turn off the use of embeddings-based queries.

    • Set to TRUE to activate the use of embeddings-based queries after previously setting it to FALSE.

Note
  • Setting EMBEDDINGS_ENABLE to TRUE when the stack has EmbeddingsAPI set to DISABLED causes failures, because the QnABot on AWS stack isn't provisioned to support generation of embeddings.

  • EMBEDDINGS_ENABLE defaults to TRUE if EmbeddingsAPI is provisioned to SAGEMAKER or LAMBDA, and defaults to FALSE otherwise.

If you disable embeddings, you will likely also want to re-enable keyword filters by setting ES_USE_KEYWORD_FILTERS to TRUE.

If you add, modify, or import any items in the content designer when EMBEDDINGS_ENABLE is set to FALSE, then embeddings won't get created and you'll need to re-import or re-save those items after re-enabling embeddings.

This setting toggles embeddings on and off; it does not manage the underlying infrastructure. If you choose to permanently turn off embeddings, update the stack as well so that the SageMaker instance is deprovisioned and you don't incur additional costs.

Important

If you update or change your embeddings model (for example, from Amazon Titan Embeddings G1 to Cohere English), or change EmbeddingsApi, the embedding dimensions need to be recalculated and the Q&As in the content designer must be exported and re-imported. We recommend backing up the Q&As using export before making this change. If any discrepancies occur, they can be addressed by importing the exported Q&As.

  • ES_USE_KEYWORD_FILTERS - This setting now defaults to FALSE. Although you can use keyword filters with embeddings-based semantic queries, they limit the power of semantic search by forcing keyword matches (preventing matches based on different words with similar meanings).

  • ES_SCORE_ANSWER_FIELD - If set to TRUE, QnABot on AWS runs embedding vector searches against embeddings generated from the answer field when no match is found on the question fields. This allows QnABot to find matches based on the contents of the answer field as well as the questions. Only the plaintext answer field is used (not the Markdown or SSML alternatives). Tune the individual thresholds for questions and answers using the following additional settings (the sketch after this list illustrates how the thresholds interact):

    • EMBEDDINGS_SCORE_THRESHOLD

    • EMBEDDINGS_SCORE_ANSWER_THRESHOLD

  • EMBEDDINGS_SCORE_THRESHOLD - Change this value to customize the score threshold on question fields. Unlike regular OpenSearch queries, embeddings queries always return scores between 0 and 1, so we can apply a threshold to separate good from bad results.

    • If no question has a similarity score above the threshold set, the match gets rejected and QnABot reverts to:

      1. Trying to find a match using the answer field (only if ES_SCORE_ANSWER_FIELD is set to TRUE).

      2. Amazon Kendra fallback (only if enabled)

      3. no_hits

      The default threshold is 0.7 for BEDROCK and 0.85 for SAGEMAKER, but you can modify this based on your embeddings model and your experiments.

Tip

Use the content designer TEST tab to see the hits ranked by score for your query results.

  • EMBEDDINGS_SCORE_ANSWER_THRESHOLD - Change this value to customize the score threshold on answer fields. This setting is only used when ES_SCORE_ANSWER_FIELD is set to TRUE and QnABot has failed to find a suitable response using the question field.

    • If no answer has a similarity score above the threshold set, the match gets rejected and QnABot reverts to:

      1. Amazon Kendra fallback (only if enabled)

      2. no_hits

      The default threshold is 0.8, but you can modify this based on your embeddings model and your experiments.

Tip

Use the content designer TEST tab and select the Score on answer field checkbox to see the hits ranked by score for your answer field query results.

  • EMBEDDINGS_TEXT_PASSAGE_SCORE_THRESHOLD - Change this value to customize the passage score threshold. This setting is only used if ES_SCORE_TEXT_ITEM_PASSAGES is TRUE.

    • If no text passage has a similarity score above the threshold set, the match gets rejected and QnABot reverts to:

      1. Amazon Kendra fallback (only if enabled)

      2. no_hits

    The default threshold is 0.65 for BEDROCK and 0.8 for SAGEMAKER, but you can modify this based on your embeddings model and your experiments.

Tip

Use the content designer TEST tab and select the Score on answer field checkbox to see the hits ranked by score for your answer field query results.
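To summarize how these thresholds interact, here is an illustrative sketch (not QnABot source code); the function and argument names are hypothetical, and only the documented fallback order is shown.

# Illustrative only (not QnABot source code): how the embeddings score
# thresholds gate the fallback chain described above. Names are hypothetical.
def choose_answer(question_hits, answer_hits, settings):
    # 1. The best question-field match wins if it clears EMBEDDINGS_SCORE_THRESHOLD.
    if question_hits and question_hits[0]["score"] >= settings["EMBEDDINGS_SCORE_THRESHOLD"]:
        return question_hits[0]
    # 2. Otherwise, if ES_SCORE_ANSWER_FIELD is TRUE, accept the best answer-field
    #    match that clears EMBEDDINGS_SCORE_ANSWER_THRESHOLD.
    if (settings["ES_SCORE_ANSWER_FIELD"]
            and answer_hits
            and answer_hits[0]["score"] >= settings["EMBEDDINGS_SCORE_ANSWER_THRESHOLD"]):
        return answer_hits[0]
    # 3. Text passage items are gated the same way by
    #    EMBEDDINGS_TEXT_PASSAGE_SCORE_THRESHOLD when ES_SCORE_TEXT_ITEM_PASSAGES is TRUE.
    # 4. Finally, QnABot falls back to Amazon Kendra (if enabled), then no_hits.
    return None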

Recommendations for tuning with LLMs

When using embeddings in QnABot, we recommend generalizing questions, because more user utterances will match a general statement. For example, the embeddings model will cluster checking and savings with account, so if you want to match both account types, just use account in your questions.

Similarly, for the question and utterance transfer to an agent, consider using transfer to someone, because it matches better with agent, representative, human, person, and so on.

In addition, we recommend tuning the EMBEDDINGS_SCORE_THRESHOLD, EMBEDDINGS_SCORE_ANSWER_THRESHOLD, and EMBEDDINGS_TEXT_PASSAGE_SCORE_THRESHOLD settings. The default values are generalized across multiple models, but you might need to modify them based on your embeddings model and your experiments.

Test using example phrases

 Add Q&As using the QnABot content designer

  1. Choose Add to add a new question of QnA type with an Item ID: EMBEDDINGS.WhiteHouse

    1. Add a single example question/utterance: What is the address of the White House?

    2. Add an Answer: The address is: 1600 Pennsylvania Avenue NW, Washington, DC 20500

    3. Choose CREATE to save the item.

  2. Add another question with an Item ID of EMBEDDINGS.Agent

    1. This time add a few questions/utterances:

      • I want to speak to an agent

      • Representative

      • Operator please

      • Zero (Zero handles the case where a customer presses “0” on their dial pad when integrated with a contact center)

    2. Add an answer: Ok. Let me route you to a representative who can assist you. {{setSessionAttr 'nextAction' 'AGENT'}}

      This Handlebars syntax will set a nextAction session attribute with the value AGENT.

    3. Choose CREATE to save the item.

  3. Select the TEST tab in the content designer UI.

    1. Enter the question, Where does the President live? and choose SEARCH.

    2. Observe that the correct answer has the top ranked score (displayed on the left), even though it does not use any of the same words as the stored example utterance.

    3. Try some other variations, such as, Where's the Whitehouse?, Where's the whitehousw? (with a typo), or Where is the President’s mansion?

    4. To detect when a caller wants to speak with an agent, we entered only a few example phrases into QnABot. Try some tests where you ask for an agent in a variety of different ways.