Architecture overview - Enhanced Document Understanding on AWS

Architecture overview

This section provides a reference implementation architecture diagram for the components deployed with this solution.

Architecture diagram

Deploying this solution with the default parameters creates the following components in your AWS account.

The solution deploys a UI, APIs, workflows, and optional machine learning to analyze files.

Enhanced Document Understanding on AWS architecture

Note

CloudFormation resources are created from AWS Cloud Development Kit (AWS CDK) constructs.

The high-level process flow for the solution components deployed with the CloudFormation template is as follows:

  1. The user opens an Amazon CloudFront URL in a browser.

  2. The UI prompts the user for authentication, which the solution validates using Amazon Cognito.

  3. The UI interacts with the REST endpoint deployed on Amazon API Gateway.

  4. The user creates a case that the solution stores in the Case management store Amazon DynamoDB table.

  5. The user requests a signed Amazon Simple Storage Service (Amazon S3) URL to upload documents to an S3 bucket.

  6. Amazon S3 generates an s3:PutObject event on the default Amazon EventBridge event bus.

  7. The s3:PutObject event invokes the workflow orchestrator AWS Lambda function. This function uses the configuration stored in the Configuration for orchestrating workflows DynamoDB table to determine the workflows to be called.

  8. The workflow orchestrator Lambda function creates an event and sends it to the custom event bus.

  9. The custom event bus invokes one of the three AWS Step Functions state machine workflows based on the event definition.

  10. The workflow completes and publishes an event to the custom EventBridge event bus.

  11. The custom EventBridge event bus invokes the workflow orchestrator Lambda function. This function uses the configuration stored in the Configuration for orchestrating workflows DynamoDB table to determine whether the sequence is complete or requires another workflow:

    1. The solution updates the Case management store DynamoDB table.

    2. If the sequence is not complete, the solution returns to step 8 for the next state machine workflow.

  12. (Optional) The workflow orchestrator Lambda function writes metadata from the processed information to an Amazon Kendra index. This index enables ML-powered search.

    Note

    The deployment to Amazon Kendra is optional. If you do not deploy it, the ML-powered search feature is not available.

  13. (Optional) The workflow orchestrator Lambda function writes metadata from the processed information to an Amazon OpenSearch Serverless collection. This collection enables keyword search.

  14. (Optional) Keyword search is powered by Amazon OpenSearch Serverless, and the OpenSearch collection is protected by running in a VPC with two private subnets. The VPC currently provisions a security group that allows all outbound traffic from OpenSearch and includes an ingress rule that lets Lambda write inferences. The VPC also provisions two interface endpoints (AWS PrivateLink) for use by Lambda and AWS KMS with the OpenSearch collection. AWS KMS does not access OpenSearch directly; it stores and manages the encryption keys used to encrypt data at rest.

    Note

    The deployment to Amazon OpenSearch Serverless is optional. If you do not deploy it, the keyword search feature is not available.
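
The orchestration loop in steps 7 through 11 can be illustrated with a minimal, in-memory Python sketch. It is not the solution's actual Lambda code. The workflow names, the event shapes, the status values, and the `Case`, `next_workflow`, `orchestrate`, and `simulate` names are all assumptions made for illustration. In the deployed solution, the sequence is read from the Configuration for orchestrating workflows DynamoDB table, and events travel over the EventBridge event buses.

```python
from dataclasses import dataclass, field

# Hypothetical workflow names and order. In the deployed solution this
# sequence comes from the "Configuration for orchestrating workflows" table.
WORKFLOW_SEQUENCE = ["textract", "entity-detection", "redaction"]


@dataclass
class Case:
    """In-memory stand-in for a record in the Case management store table."""
    case_id: str
    status: str = "initiated"  # hypothetical status values
    completed: list = field(default_factory=list)


def next_workflow(sequence, completed):
    """Return the first workflow not yet completed, or None if all are done."""
    for name in sequence:
        if name not in completed:
            return name
    return None


def orchestrate(case, event, sequence=WORKFLOW_SEQUENCE):
    """Handle an upload event (step 7) or a completion event (step 11).

    Returns the event to send to the custom event bus (step 8), or None
    when the configured sequence is complete.
    """
    if event["type"] == "workflow-complete":
        case.completed.append(event["workflow"])
    upcoming = next_workflow(sequence, case.completed)
    # Step 11.1: update the case record.
    case.status = "complete" if upcoming is None else "in-process"
    if upcoming is None:
        return None
    return {"type": "start-workflow", "caseId": case.case_id, "workflow": upcoming}


def simulate(case, sequence=WORKFLOW_SEQUENCE):
    """Drive the loop locally and return the order in which workflows ran."""
    order = []
    event = {"type": "document-uploaded"}  # stands in for the S3 upload event
    while True:
        outgoing = orchestrate(case, event, sequence)
        if outgoing is None:
            return order
        order.append(outgoing["workflow"])
        # Stand-in for the state machine finishing and publishing (step 10).
        event = {"type": "workflow-complete", "workflow": outgoing["workflow"]}
```

Calling `simulate(Case("case-001"))` walks the case through each configured workflow in order and leaves its status as `complete`. Keeping the sequence in configuration rather than in code mirrors the design described above: the orchestrator only needs to know what has finished and what the configuration says comes next.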