Semantic question matching using LLM text embeddings - QnABot on AWS

QnABot can use text embeddings to provide semantic search capabilities by using large language models (LLMs). The goals of these features are to improve question matching accuracy while reducing the amount of tuning required when compared to the default OpenSearch keyword-based matching. Some of the benefits include:

  • Improved FAQ accuracy with semantic matching vs. keyword matching (comparing the meaning of questions vs. comparing the individual words).

  • Fewer training utterances are required to match a diverse set of queries. This results in significantly less tuning to get and maintain good results.

  • Better multi-language support because translated utterances only need to match the original question’s meaning, not the exact wording.

For example, with semantic matching activated, “What’s the address of the Whitehouse?” matches “Where does the president live?”, and “How old are you?” matches “What is your age?”. These examples won't match using the default keyword-based matching because they don't share any of the same words.
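Under the hood, semantic matching compares embedding vectors rather than words: phrasings with similar meaning produce vectors that point in similar directions, which cosine similarity captures. The following is an illustrative sketch only; the 3-dimensional vectors are made up for demonstration, whereas a real embeddings model returns vectors with hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (range [-1, 1])."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors, NOT real model output -- purely to show the mechanics.
v_president = [0.9, 0.2, 0.1]   # "Where does the president live?"
v_whitehouse = [0.8, 0.3, 0.2]  # "What's the address of the Whitehouse?"
v_age = [0.1, 0.1, 0.9]         # "How old are you?"

# The semantically related pair scores higher than the unrelated pair.
print(cosine_similarity(v_president, v_whitehouse))  # ~0.98
print(cosine_similarity(v_president, v_age))         # ~0.24
```

A match is accepted when the similarity exceeds a configurable threshold, which is what the EMBEDDINGS_SCORE_THRESHOLD settings described later control.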

To enable these expanded semantic search capabilities, QnABot can use:

  • Embeddings from a Text Embedding model hosted on a pre-built Amazon SageMaker endpoint (recommended).

  • Embeddings from a user provided custom Lambda function.


This is an optional feature available as of v5.3.0. We encourage you to try it out on non-production instances initially to validate expected accuracy improvements and to test for any regression issues. Refer to the Cost section to see estimates of how this feature affects pricing.

Using the built-in Amazon SageMaker model

QnABot comes bundled with the ability to manage the lifecycle of a pre-built embeddings model hosted on Amazon SageMaker. In this mode, QnABot provisions a SageMaker inference endpoint running the Hugging Face e5-large model.

To activate, deploy a stack and set EmbeddingsAPI to SAGEMAKER. By default, a one-node ml.m5.xlarge endpoint is automatically provisioned. For large-volume deployments, users can add nodes by setting the parameter SagemakerInitialInstanceCount. See the Cost section for pricing details.

By setting the parameter SagemakerInitialInstanceCount to 0, a serverless SageMaker endpoint is activated. A serverless endpoint can save money by scaling down to zero when not in use; however, there is a 'cold start' time of approximately 2 minutes during which QnABot requests, imports, or add/modify item operations will time out or be delayed. In this configuration, QnABot creates the endpoint with 4 GB of memory and a maximum concurrency of 50 requests.

Figure: Semantic Search with Embeddings


These settings cannot be changed through the Content Designer settings page. To provision and deprovision the SageMaker instances, you must update your CloudFormation stack. 

Using a custom Lambda function

Users who wish to explore alternate pretrained or fine-tuned embeddings models can integrate a custom-built Lambda function. By using a custom Lambda function, you can build your own embeddings model or even choose to connect to an external embeddings API.


If integrating your Lambda with external resources, evaluate the security implications of sharing data outside of AWS.

To begin, you’ll need to create a valid Lambda function. Your custom Lambda function should accept a JSON object containing the input string and return an array containing the embeddings. Record the length of your embeddings array (also referred to as the dimensions) because you need it to deploy the stack.

Lambda event Input:

{
  // inputType has either a value of 'q' for question or 'a' for answer
  "inputType": "string",
  // inputText is the string on which to generate your custom embeddings
  "inputText": "string"
}

Expected Lambda JSON return object:

{"embedding": [...]}
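A custom Lambda function satisfying this contract might look like the following sketch. The `embed` helper is a placeholder for whatever model or external API you actually call; everything other than the event and return shapes documented above is an assumption:

```python
def embed(text: str) -> list:
    """Placeholder: replace with a call to your embeddings model or
    external embeddings API. The 3-dimensional output here is purely
    illustrative -- set EmbeddingsLambdaDimensions to the length your
    real model returns."""
    return [float(len(text)), 0.0, 1.0]

def lambda_handler(event, context):
    # event["inputType"] is 'q' for questions or 'a' for answers; some
    # models encode queries and passages differently, so you may want to
    # branch on it when calling your model.
    input_type = event["inputType"]
    text = event["inputText"]
    return {"embedding": embed(text)}
```

Whatever model you use, the length of the returned array must match the EmbeddingsLambdaDimensions value you supply at deployment (3 in this toy example).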

Once your Lambda function is ready, you can deploy the stack. To activate your Lambda function for embeddings, deploy the stack with EmbeddingsAPI set to LAMBDA. You will also need to set EmbeddingsLambdaArn to the ARN of your Lambda function and EmbeddingsLambdaDimensions to the dimensions returned by your Lambda function.
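You can set these parameters in the CloudFormation console, or programmatically with boto3. The sketch below only builds the parameter overrides and leaves the actual `update_stack` call commented out; the stack name and Lambda ARN are placeholders, and a real update must also carry UsePreviousValue entries for the stack's other parameters:

```python
def embeddings_lambda_parameters(lambda_arn, dimensions):
    """Build the CloudFormation parameter overrides needed to switch
    QnABot to a custom embeddings Lambda."""
    return [
        {"ParameterKey": "EmbeddingsAPI", "ParameterValue": "LAMBDA"},
        {"ParameterKey": "EmbeddingsLambdaArn", "ParameterValue": lambda_arn},
        {"ParameterKey": "EmbeddingsLambdaDimensions",
         "ParameterValue": str(dimensions)},
    ]

params = embeddings_lambda_parameters(
    # Placeholder ARN -- substitute your function's actual ARN.
    "arn:aws:lambda:us-east-1:123456789012:function:my-embeddings",
    1024,
)

# import boto3
# cfn = boto3.client("cloudformation")
# cfn.update_stack(
#     StackName="my-qnabot-stack",   # placeholder stack name
#     UsePreviousTemplate=True,
#     Parameters=params,             # plus UsePreviousValue entries for
#                                    # every parameter you aren't changing
#     Capabilities=["CAPABILITY_NAMED_IAM"],
# )
```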

Figure: Semantic Search with Lambda function


You can't change these settings through the Content Designer settings page. To correctly reconfigure your deployment, update your CloudFormation stack to modify these values. 

Settings available for text embeddings


Many of these settings depend on the underlying infrastructure being correctly configured. Follow the instructions found at Using the built-in Amazon SageMaker model or Using a custom Lambda Function before modifying any of the settings below. 

Once your QnABot stack is installed with EmbeddingsAPI activated, you can manage several settings through the Content Designer Settings page:

  • EMBEDDINGS_ENABLE: To enable/disable use of semantic search using embeddings:

    • Set to FALSE to turn off the use of embeddings-based queries.

    • Set to TRUE to re-activate the use of embeddings-based queries after previously setting it to FALSE.


      EMBEDDINGS_ENABLE defaults to TRUE if EmbeddingsAPI is set to SAGEMAKER or LAMBDA, and defaults to FALSE otherwise.

      Setting it to TRUE when the stack has EmbeddingsAPI set to DISABLED will cause failures, since the QnABot stack isn't provisioned to support generation of embeddings.

    • If you turn off embeddings, you will also want to re-activate keyword filters by setting ES_USE_KEYWORD_FILTERS to TRUE.

    • If you add, modify, or import any items in Content Designer when EMBEDDINGS_ENABLE is set to FALSE, then embeddings won't get created and you'll need to re-import or re-save those items after re-enabling embeddings.


This setting toggles embeddings on and off; it does not manage the underlying infrastructure. If you choose to permanently turn off embeddings, update the stack as well. This allows you to deprovision the SageMaker instance to prevent incurring additional costs.

  • ES_USE_KEYWORD_FILTERS: This setting now defaults to FALSE. Although you can use keyword filters with embeddings-based semantic queries, they limit the power of semantic search by forcing keyword matches (preventing matches based on different words with similar meanings).

  • ES_SCORE_ANSWER_FIELD: If set to TRUE, QnABot on AWS runs embedding vector searches on embeddings generated on the answer field if no match is found on the question fields. This allows QnABot to find matches based on the contents of the answer field as well as the questions. Only the plain text answer field is used (not the Markdown or SSML alternatives). Tune the individual thresholds for questions and answers using the additional settings of:



  • EMBEDDINGS_SCORE_THRESHOLD: Change this value to customize the score threshold on question fields. Unlike regular OpenSearch queries, embeddings queries always return scores between 0 and 1, so we can apply a threshold to separate good from bad results.

    • If no question has a similarity score above the threshold set, QnABot on AWS rejects the match and reverts to:

      1. Trying to find a match using the answer field (only if ES_SCORE_ANSWER_FIELD is set to TRUE).

      2. Amazon Kendra fallback (only if enabled)

      3. no_hits

    • The default threshold is 0.85, but you may need to modify this based on your embeddings model and your experiments.


Use the Content Designer TEST tab to see the hits ranked by score for your query results.

  • EMBEDDINGS_SCORE_ANSWER_THRESHOLD: Change this value to customize the score threshold on answer fields. This setting is only used when ES_SCORE_ANSWER_FIELD is set to TRUE and QnABot has failed to find a suitable response using the question field.

    • If no answer has a similarity score above the threshold set, QnABot on AWS rejects the match and reverts to:

      1. Amazon Kendra fallback (only if enabled)

      2. no_hits

    • The default threshold is 0.80, but you may need to modify this based on your embeddings model and your experiments.


Use the Content Designer TEST tab and select the Score on answer field checkbox to see the hits ranked by score for your answer field query results.
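The fallback cascade described by the two threshold settings can be sketched as straight-line logic. This is an illustrative model of the behavior documented above, not QnABot's actual internals; the function and variable names are made up:

```python
def choose_answer(question_hits, answer_hits, kendra_enabled,
                  score_answer_field=False,
                  q_threshold=0.85, a_threshold=0.80):
    """Sketch of the matching cascade: question fields first, then
    (optionally) answer fields, then Kendra fallback, then no_hits.
    Hits are (item_id, score) pairs with embeddings scores in [0, 1]."""
    best_q = max(question_hits, key=lambda h: h[1], default=None)
    if best_q and best_q[1] >= q_threshold:        # EMBEDDINGS_SCORE_THRESHOLD
        return best_q[0]
    if score_answer_field:                         # ES_SCORE_ANSWER_FIELD
        best_a = max(answer_hits, key=lambda h: h[1], default=None)
        if best_a and best_a[1] >= a_threshold:    # EMBEDDINGS_SCORE_ANSWER_THRESHOLD
            return best_a[0]
    if kendra_enabled:
        return "kendra_fallback"
    return "no_hits"

print(choose_answer([("faq-1", 0.92)], [], kendra_enabled=False))  # faq-1
print(choose_answer([("faq-1", 0.60)], [("faq-2", 0.83)],
                    kendra_enabled=False, score_answer_field=True))  # faq-2
```

Raising a threshold trades recall for precision: fewer, more confident matches, with more traffic falling through to Kendra or no_hits.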

Recommendations for tuning with LLMs

When using embeddings in QnABot, we recommend generalizing questions because more user utterances will match a general statement. For example, the embeddings model will cluster checking and savings with account, so if you want to match both account types, just refer to account in your questions.

Similarly, for the question/utterance transfer to an agent, consider using transfer to someone, as it will better match with agent, representative, human, person, and so on.