Semantic question matching using LLM text embeddings - QnABot on AWS

QnABot can use text embeddings generated by large language models (LLMs) to provide semantic search capabilities. The goal of this feature is to improve question matching accuracy while reducing the amount of tuning required compared to the default OpenSearch keyword-based matching. Some of the benefits include:

  • Improved FAQ accuracy with semantic matching vs. keyword matching (comparing the meaning of questions vs. comparing the individual words).

  • Fewer training utterances are required to match a diverse set of queries. This results in significantly less tuning to get and maintain good results.

  • Better multi-language support because translated utterances only need to match the original question’s meaning, not the exact wording.

For example, with semantic matching activated, “What’s the address of the White House?” matches “Where does the president live?” and “How old are you?” matches “What is your age?”. These examples won't match using the default keyword-based matching because they don't share any meaningful keywords.

To enable these expanded semantic search capabilities, QnABot can use:

  • Embeddings from a Text Embedding model hosted on a pre-built Amazon SageMaker endpoint (recommended).

  • Embeddings from a user-provided custom Lambda function.


This is an optional feature available as of v5.3.0. We encourage you to try it out on non-production instances initially to validate expected accuracy improvements and to test for any regression issues. Refer to the Cost section to see estimates of how this feature affects pricing.


By choosing to enable the Semantic question matching using LLM text embeddings, you acknowledge that QnABot on AWS engages third-party generative artificial intelligence (AI) models that AWS does not own or otherwise has any control over (“Third-Party Generative AI Models”). Your use of the Third-Party Generative AI Models is governed by the terms provided to you by the Third-Party Generative AI Model providers when you acquired your license to use them (for example, their terms of service, license agreement, acceptable use policy, and privacy policy). You are responsible for ensuring that your use of the Third-Party Generative AI Models comply with the terms governing them, and any laws, rules, regulations, policies, or standards that apply to you. You are also responsible for making your own independent assessment of the Third-Party Generative AI Models that you use, including their outputs and how Third-Party Generative AI Model providers use any data that may be transmitted to them based on your deployment configuration. AWS does not make any representations, warranties, or guarantees regarding the Third-Party Generative AI Models, which are “Third-Party Content” under your agreement with AWS. QnABot on AWS is offered to you as “AWS Content” under your agreement with AWS.

Using the built-in Amazon SageMaker model

QnABot comes bundled with the ability to manage the lifecycle of a pre-built embeddings model hosted on Amazon SageMaker. In this mode, QnABot provisions a SageMaker inference endpoint running the Hugging Face e5-large model.

To activate this mode, deploy a stack with EmbeddingsAPI set to SAGEMAKER. By default, a one-node ml.m5.xlarge endpoint is provisioned automatically. For large-volume deployments, you can add nodes by setting the SagemakerInitialInstanceCount parameter. See the Cost section for pricing details.

Semantic Search with Embeddings


These settings cannot be changed through the content designer Settings page. To provision and deprovision the SageMaker instances, you must update your CloudFormation stack. 


The embeddings model that QnABot provisions on SageMaker is E5. It supports only the English language, so if you are working with a non-English language, use your own embeddings model and provide its Lambda ARN in your deployment. For more information, see the section below on using a custom Lambda function.

Using a custom Lambda function

Users who wish to explore alternate pretrained or fine-tuned embeddings models can integrate a custom-built Lambda function. By using a custom Lambda function, you can build your own embeddings model or connect to an external embeddings API.


If integrating your Lambda with external resources, evaluate the security implications of sharing data outside of AWS.

To begin, you’ll need to create a valid Lambda function. Your custom Lambda function should accept a JSON object containing the input string and return an array which contains the embeddings. Record the length of your embeddings array because you need it to deploy the stack (this is also referred to as the dimensions).

Lambda event Input:

{
  // inputType has a value of either 'q' for question or 'a' for answer
  "inputType": "string",
  // inputText is the string on which to generate your custom embeddings
  "inputText": "string"
}

Expected Lambda JSON return object:

{"embedding": [...] }
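As an illustration of the input and output contract above, here is a minimal Python sketch of such a Lambda handler. The `toy_embedding` function and the `EMBEDDING_DIMENSIONS` value are hypothetical placeholders that stand in for a real embeddings model or external API call; only the event fields (`inputType`, `inputText`) and the `embedding` return key come from the contract described above.

```python
import hashlib
import math

# Hypothetical size, chosen only for this sketch. Must not exceed 32 here,
# because the toy embedding below reads bytes from a SHA-256 digest.
EMBEDDING_DIMENSIONS = 8


def toy_embedding(text, dimensions=EMBEDDING_DIMENSIONS):
    """Placeholder for a real model call (assumption, not a real model).

    Derives a deterministic unit-length vector from a hash of the text.
    It carries no semantic meaning; replace it with your model's output.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Subtracting 127.5 centers the byte values and guarantees no zero vector.
    raw = [digest[i] - 127.5 for i in range(dimensions)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]


def lambda_handler(event, context):
    """Accept the documented event and return the documented JSON object."""
    input_type = event.get("inputType")
    if input_type not in ("q", "a"):
        raise ValueError("inputType must be 'q' (question) or 'a' (answer)")
    input_text = event["inputText"]
    return {"embedding": toy_embedding(input_text)}


if __name__ == "__main__":
    # Local invocation to record the dimensions needed at deployment time.
    result = lambda_handler({"inputType": "q", "inputText": "What is your age?"}, None)
    print(len(result["embedding"]))  # prints 8
```

Invoking the handler locally, as in the last lines, is one way to confirm the length of the embeddings array before you deploy the stack.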

Once your Lambda function is ready, you can deploy the stack. To activate your Lambda function for embeddings, deploy the stack with EmbeddingsAPI set to LAMBDA. You will also need to set EmbeddingsLambdaArn to the ARN of your Lambda function and EmbeddingsLambdaDimensions to the dimensions returned by your Lambda function.
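If you script your deployment, the three parameters above can be assembled as follows. This is a sketch only: the helper name `build_embeddings_parameters` and the example ARN are hypothetical, and the list-of-dictionaries shape is the one the boto3 CloudFormation client accepts for its `Parameters` argument. No AWS call is made here.

```python
def build_embeddings_parameters(lambda_arn, dimensions):
    """Build the stack parameters for LAMBDA embeddings mode.

    Hypothetical helper for illustration; parameter names are taken from
    the text above, validation rules are assumptions.
    """
    if not lambda_arn.startswith("arn:"):
        raise ValueError("lambda_arn must be a full ARN")
    if not isinstance(dimensions, int) or dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    return [
        {"ParameterKey": "EmbeddingsAPI", "ParameterValue": "LAMBDA"},
        {"ParameterKey": "EmbeddingsLambdaArn", "ParameterValue": lambda_arn},
        # CloudFormation parameter values are passed as strings.
        {"ParameterKey": "EmbeddingsLambdaDimensions", "ParameterValue": str(dimensions)},
    ]


if __name__ == "__main__":
    # Hypothetical ARN and dimensions, for illustration only.
    params = build_embeddings_parameters(
        "arn:aws:lambda:us-east-1:123456789012:function:my-embeddings", 8
    )
    for param in params:
        print(param["ParameterKey"], "=", param["ParameterValue"])
```

The resulting list could then be passed to a stack create or update call alongside the other parameters your deployment requires.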

Semantic Search with Lambda function


You can't change these settings through the content designer Settings page. To correctly reconfigure your deployment, update your CloudFormation stack to modify these values. 

Settings available for text embeddings


Many of these settings depend on the underlying infrastructure being correctly configured. Follow the instructions found at Using the built-in Amazon SageMaker model or Using a custom Lambda Function before modifying any of the settings below. 

Once your QnABot stack is installed with EmbeddingsAPI activated, you can manage several settings through the content designer Settings page:

  • EMBEDDINGS_ENABLE: Turns semantic search using embeddings on or off:

    • Set to FALSE to turn off the use of embeddings-based queries.

    • Set to TRUE to re-activate the use of embeddings-based queries after previously setting it to FALSE.


      EMBEDDINGS_ENABLE defaults to TRUE if EmbeddingsAPI is set to SAGEMAKER or LAMBDA. Otherwise, it defaults to FALSE.

      Setting EMBEDDINGS_ENABLE to TRUE when the stack has EmbeddingsAPI set to DISABLED causes failures because the QnABot stack isn't provisioned to generate embeddings.

    • If you turn off embeddings, also re-activate keyword filters by setting ES_USE_KEYWORD_FILTERS to TRUE.

    • If you add, modify, or import any items in the content designer while EMBEDDINGS_ENABLE is set to FALSE, embeddings aren't created for those items, and you need to re-import or re-save them after re-enabling embeddings. Similarly, if you change your embeddings model or its parameters and need the embeddings to be recalculated, export and re-import the embeddings-related items in your content designer (for example, the questions).


This setting toggles embeddings on and off. It does not manage the underlying infrastructure. If you choose to permanently turn off embeddings, update the stack as well so that the SageMaker instance is deprovisioned and you stop incurring additional costs.

  • ES_USE_KEYWORD_FILTERS: With embeddings activated, this setting should default to FALSE. Although you can use keyword filters with embeddings-based semantic queries, they limit the power of semantic search by forcing keyword matches (preventing matches based on different words with similar meanings).

  • ES_SCORE_ANSWER_FIELD: If set to TRUE, QnABot on AWS runs embedding vector searches against embeddings generated from the answer field if no match is found on the question fields. This allows QnABot to find matches based on the contents of the answer field as well as the questions. Only the plain text answer field is used (not the Markdown or SSML alternatives). Tune the individual thresholds for questions and answers using EMBEDDINGS_SCORE_THRESHOLD and EMBEDDINGS_SCORE_ANSWER_THRESHOLD.



  • EMBEDDINGS_SCORE_THRESHOLD: Change this value to customize the score threshold on question fields. Unlike regular OpenSearch keyword queries, embeddings queries always return scores between 0 and 1, so a threshold can be applied to separate good results from bad ones.

    • If no question has a similarity score above the threshold, QnABot on AWS rejects the match and falls back, in order, to:

      1. A match on the answer field (only if ES_SCORE_ANSWER_FIELD is set to TRUE)

      2. Amazon Kendra fallback (only if enabled)

      3. no_hits

    • The default threshold is 0.85, but you may need to modify this based on your embeddings model and your experiments.


Use the content designer TEST tab to see the hits ranked by score for your query results.

  • EMBEDDINGS_SCORE_ANSWER_THRESHOLD: Change this value to customize the score threshold on answer fields. This setting is only used when ES_SCORE_ANSWER_FIELD is set to TRUE and QnABot has failed to find a suitable response using the question field.

    • If no answer has a similarity score above the threshold, QnABot on AWS rejects the match and falls back, in order, to:

      1. Amazon Kendra fallback (only if enabled)

      2. no_hits

    • The default threshold is 0.80, but you may need to modify this based on your embeddings model and your experiments.


Use the content designer TEST tab and select the Score on answer field checkbox to see the hits ranked by score for your answer field query results.
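The fallback order described by the two threshold settings can be sketched in Python as follows. This is an illustration of the documented order, not QnABot's actual implementation: the `EmbeddingSettings` class, the `resolve_match` function, the `kendra_fallback_enabled` flag, and the FALSE default for the answer-field setting are all assumptions made for this example. The strict "greater than" comparison follows the wording "above the threshold"; the real boundary behavior may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmbeddingSettings:
    # Defaults for the two thresholds are taken from the text above.
    embeddings_score_threshold: float = 0.85
    embeddings_score_answer_threshold: float = 0.80
    # Assumed default for this sketch only.
    es_score_answer_field: bool = False
    # Hypothetical flag standing in for "Amazon Kendra fallback is enabled".
    kendra_fallback_enabled: bool = False


def resolve_match(
    best_question_score: Optional[float],
    best_answer_score: Optional[float],
    settings: EmbeddingSettings,
) -> str:
    """Return which stage of the documented cascade produces the response."""
    # 1. Question fields: accept only scores above the question threshold.
    if (
        best_question_score is not None
        and best_question_score > settings.embeddings_score_threshold
    ):
        return "question"
    # 2. Answer field: consulted only when ES_SCORE_ANSWER_FIELD is TRUE.
    if (
        settings.es_score_answer_field
        and best_answer_score is not None
        and best_answer_score > settings.embeddings_score_answer_threshold
    ):
        return "answer"
    # 3. Amazon Kendra fallback, only if enabled.
    if settings.kendra_fallback_enabled:
        return "kendra_fallback"
    # 4. Nothing matched.
    return "no_hits"


if __name__ == "__main__":
    settings = EmbeddingSettings(es_score_answer_field=True)
    print(resolve_match(0.90, None, settings))  # prints question
    print(resolve_match(0.50, 0.82, settings))  # prints answer
    print(resolve_match(0.50, 0.50, settings))  # prints no_hits
```

Lowering a threshold moves more queries into the earlier stages of the cascade, which is why the TEST tab scores are useful when choosing values for your own model.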

Recommendations for tuning with LLMs

When using embeddings in QnABot, we recommend generalizing questions because more user utterances will match a general statement. For example, the embeddings model clusters “checking” and “savings” with “account”, so if you want to match both account types, refer only to “account” in your questions.

Similarly, for the question or utterance “transfer to an agent”, consider using “transfer to someone”, because it better matches “agent”, “representative”, “human”, “person”, and similar terms.