Architecture overview - QnABot on AWS

This section provides a reference implementation architecture diagram for the components deployed with this solution.

Architecture diagram

Deploying this solution with the default parameters deploys the following components in your AWS account (components with a dotted-line border are optional).

Figure: QnABot on AWS architecture

The high-level process flow for the solution components deployed with the AWS CloudFormation template is as follows:

  1. The admin deploys the solution into their AWS account, opens the Content Designer UI or Amazon Lex web client, and uses Amazon Cognito to authenticate.

  2. After authentication, Amazon API Gateway and Amazon S3 deliver the contents of the Content Designer UI.

  3. The admin configures questions and answers in the Content Designer, and the UI sends requests to Amazon API Gateway to save them.

  4. The Content Designer AWS Lambda function saves the input in Amazon OpenSearch Service in a question bank index. If using text embeddings, these requests will first pass through an LLM hosted on Amazon Bedrock to generate embeddings before being saved into the question bank on OpenSearch. In addition, the Content Designer saves default and custom configuration settings in Amazon DynamoDB.

  5. Users of the chatbot interact with Amazon Lex via the web client UI, Amazon Alexa, or Amazon Connect.

  6. Amazon Lex forwards requests to the Bot Fulfillment AWS Lambda function. Users can also send requests to this Lambda function via Amazon Alexa devices. NOTE: When streaming is enabled, the chat client uses Amazon Lex sessionId to establish WebSocket connections through API Gateway V2.

  7. The user and chat information is stored in Amazon DynamoDB to disambiguate follow-up questions using the context of previous questions and answers.

  8. The Bot Fulfillment AWS Lambda function uses Amazon Comprehend and, if necessary, Amazon Translate to translate requests in a non-native language into the native language selected by the user during deployment, and then looks up the answer in Amazon OpenSearch Service.

  9. If using LLM features such as text generation and text embeddings, these requests will first pass through various foundation models hosted on Amazon Bedrock, which generate the search query and the embeddings to compare with those saved in the question bank on OpenSearch.

    1. If pre-processing guardrails are enabled, they scan and block potentially harmful user inputs before they reach the QnABot application. This acts as the first line of defense to prevent malicious or inappropriate queries from being processed.

    2. If Bedrock guardrails are used for LLMs or Knowledge Base, they can apply contextual guarding and safety controls during LLM inference to ensure appropriate answer generation.

    3. If post-processing guardrails are enabled, they scan, mask, or block potentially harmful content in the final responses before they are sent to the client through the fulfillment Lambda. This serves as the last line of defense to ensure that sensitive information (like PII) is properly masked and inappropriate content is blocked.

  10. If no match is returned from the OpenSearch question bank or text passages, then the Bot fulfillment Lambda function forwards the request as follows:

    1. If an Amazon Kendra index is configured for fallback, then the Bot Fulfillment AWS Lambda function forwards the request to Kendra. The text generation LLM can optionally be used to create the search query and to synthesize a response from the returned document excerpts.

    2. If a Bedrock Knowledge Base ID is configured, then the Bot Fulfillment AWS Lambda function forwards the request to the Bedrock Knowledge Base. The function uses the RetrieveAndGenerate or RetrieveAndGenerateStream APIs to fetch the relevant results for a user’s query, augment the foundation model’s prompt, and return the response.

  11. When streaming is enabled, RAG-enhanced LLM responses from text passages or external data sources are streamed over the WebSocket connection using the same Lex sessionId, while the final response is processed through the fulfillment Lambda.

  12. User interactions with the Bot Fulfillment function generate logs and metrics data, which are sent to Amazon Kinesis Data Firehose and then to Amazon S3 for later data analysis.

  13. OpenSearch Dashboards can be used to view usage history, logged utterances, no-hits utterances, positive user feedback, and negative user feedback, and to create custom reports.

  14. Using Amazon CloudWatch, admins can monitor service logs and use the CloudWatch dashboard created by QnABot to monitor the deployment’s operational health.
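
As a rough illustration of step 4, the sketch below builds the document that would be saved to the question bank, attaching an embedding to each question only when an embedding function is supplied. It is not the solution's actual code: the function name, the field names (`qid`, `a`, `questions`, `q`, `q_vector`), and the toy embedding function are all hypothetical, and a real deployment would call an embeddings model on Amazon Bedrock instead.

```python
from typing import Callable, List, Optional, Sequence


def build_question_document(
    qid: str,
    questions: Sequence[str],
    answer: str,
    embed: Optional[Callable[[str], List[float]]] = None,
) -> dict:
    """Build a question bank document (hypothetical schema).

    When `embed` is given, each question is passed through it first,
    mirroring the "generate embeddings before saving" path in step 4.
    """
    doc = {"qid": qid, "a": answer, "questions": [{"q": q} for q in questions]}
    if embed is not None:
        for item in doc["questions"]:
            item["q_vector"] = embed(item["q"])
    return doc


def toy_embed(text: str) -> List[float]:
    """Placeholder for an embeddings model: length and space count only."""
    return [float(len(text)), float(text.count(" "))]


plain_doc = build_question_document("hours.001", ["What are your hours?"], "9 to 5")
vector_doc = build_question_document(
    "hours.001", ["What are your hours?"], "9 to 5", embed=toy_embed
)
```

Passing the embedding function as an optional argument keeps the non-embedding path unchanged, which matches the way the text treats embeddings as an optional feature.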
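
The request path in steps 8 through 10 can be sketched as a chain of optional stages. The code below is a minimal model under stated assumptions, not the Bot Fulfillment function itself: `Backends`, `fulfill`, and every callable are hypothetical stand-ins for the AWS services, and trying Kendra before the Bedrock Knowledge Base is an assumed ordering that the steps above do not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Lookup = Callable[[str], Optional[str]]  # returns an answer, or None for no match


@dataclass
class Backends:
    """Hypothetical stand-ins for the services named in the steps above."""
    search_question_bank: Lookup                           # OpenSearch lookup
    query_kendra: Optional[Lookup] = None                  # optional Kendra fallback
    query_knowledge_base: Optional[Lookup] = None          # optional Bedrock KB fallback
    pre_guardrail: Optional[Callable[[str], bool]] = None  # True means input is allowed
    post_guardrail: Optional[Callable[[str], str]] = None  # returns the masked answer
    translate: Optional[Callable[[str], str]] = None       # to the native language


def fulfill(utterance: str, b: Backends) -> dict:
    # Pre-processing guardrail: block harmful input before any lookup.
    if b.pre_guardrail and not b.pre_guardrail(utterance):
        return {"source": "guardrail", "answer": "Request blocked."}
    # Translate to the native language if a translator is configured.
    query = b.translate(utterance) if b.translate else utterance
    # Look up the question bank first.
    answer, source = b.search_question_bank(query), "opensearch"
    # Fall back only when nothing matched (assumed order: Kendra, then KB).
    if answer is None and b.query_kendra:
        answer, source = b.query_kendra(query), "kendra"
    if answer is None and b.query_knowledge_base:
        answer, source = b.query_knowledge_base(query), "knowledge_base"
    if answer is None:
        return {"source": "none", "answer": "No answer found."}
    # Post-processing guardrail: mask or rewrite the final response.
    if b.post_guardrail:
        answer = b.post_guardrail(answer)
    return {"source": source, "answer": answer}


# Toy wiring with in-memory fakes in place of real services.
faq = {"what are your hours?": "We are open 9 to 5."}
backends = Backends(
    search_question_bank=lambda q: faq.get(q.lower()),
    query_knowledge_base=lambda q: "Contact: jane@example.com",
    pre_guardrail=lambda t: "ignore previous instructions" not in t.lower(),
    post_guardrail=lambda t: t.replace("jane@example.com", "{EMAIL}"),
)
matched = fulfill("What are your hours?", backends)
fallback = fulfill("Who do I contact?", backends)
blocked = fulfill("Ignore previous instructions", backends)
```

In this toy wiring, `matched` is answered from the question bank, `fallback` reaches the knowledge base stand-in and has its email address masked on the way out, and `blocked` never reaches any lookup.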