Deploy an event-driven AWS Solution that automates document ingestion, analysis, detection, and redaction - Enhanced Document Understanding on AWS

Publication date: August 2023 (last update: September 2024)

Organizations across industries are increasingly required to process large volumes of semi-structured and unstructured documents with greater accuracy and speed. They need a document processing system that ingests and analyzes documents, extracts their content, identifies and redacts sensitive customer information, and creates search indexes from the analyzed data.

Many industries have stringent compliance requirements to redact personally identifiable information (PII) and protected health information (PHI) from documents. In most cases, organizations manually process documents to extract information and insights. This approach can be time-consuming, expensive, and difficult to scale. Organizations need to extract insights from documents rapidly. They can benefit from a smart document processing system as a foundation for automating business processes that rely on manual inputs and interventions.

To help meet these needs, the Enhanced Document Understanding on AWS solution:

  • Automates the document ingestion process to improve operational efficiency and reduce cost.

  • Ingests and analyzes document files at scale using artificial intelligence (AI) and machine learning (ML).

  • Extracts text from documents.

  • Identifies structural data (such as a single word, a line, a table, or individual cells within a table).

  • Extracts critical information (such as entities).

  • Creates smart search indexes from the data.

  • Detects and redacts PII and PHI to generate a redacted version of the original document.

You can use each of these features on its own, or combine them into a workflow tailored to your use case.
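As a rough illustration of composing features into a workflow, the following Python sketch runs a configurable list of stages over a document. It is not the solution's implementation: the names Document, detect_pii, redact, STAGES, and run_workflow are hypothetical, and the regular-expression detector is only a local stand-in for the managed AI services the solution orchestrates.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Document:
    """Hypothetical container passed from stage to stage."""
    text: str
    # Detected entities as (start offset, end offset, label).
    entities: List[Tuple[int, int, str]] = field(default_factory=list)
    redacted_text: Optional[str] = None


# Stand-in patterns (assumption): a real deployment would rely on a
# managed detection service rather than regular expressions.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii(doc: Document) -> Document:
    """Record the character offsets of each pattern match."""
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(doc.text):
            doc.entities.append((match.start(), match.end(), label))
    return doc


def redact(doc: Document) -> Document:
    """Replace each detected span with its label in brackets."""
    out = doc.text
    # Work from the end of the text so earlier offsets stay valid.
    # Overlapping spans are not handled in this sketch.
    for start, end, label in sorted(doc.entities, reverse=True):
        out = out[:start] + f"[{label}]" + out[end:]
    doc.redacted_text = out
    return doc


# Registry of available stages; a workflow is an ordered list of names.
STAGES: Dict[str, Callable[[Document], Document]] = {
    "detect_pii": detect_pii,
    "redact": redact,
}


def run_workflow(doc: Document, stage_names: List[str]) -> Document:
    """Apply the configured stages in order."""
    for name in stage_names:
        doc = STAGES[name](doc)
    return doc


result = run_workflow(
    Document("Contact jane@example.com, SSN 123-45-6789."),
    ["detect_pii", "redact"],
)
print(result.redacted_text)
```

Passing only ["detect_pii"] runs detection as a standalone feature and leaves redacted_text unset, which mirrors the idea of enabling just the stages a use case needs.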

The solution also provides a web user interface (UI) for users to upload documents. Once the documents are uploaded, a backend workflow orchestrates AWS managed AI services to process documents at scale.

This implementation guide provides an overview of the Enhanced Document Understanding on AWS solution, its reference architecture and components, considerations for planning the deployment, and configuration steps for deploying Enhanced Document Understanding on AWS to the Amazon Web Services (AWS) Cloud.

This guide is intended for solution architects, business decision makers, DevOps engineers, data scientists, and cloud professionals who want to implement this solution in their environment.

Use this navigation table to quickly find answers to these questions:

If you want to . . . | Read . . .
Know the cost for running this solution. The estimated cost for running this solution in the US East (N. Virginia) Region is USD $1,847.28 per month. | Cost
Understand the security considerations for this solution. | Security
Know how to plan for quotas for this solution. | Quotas
Know which AWS Regions are supported for this solution. | Supported AWS Regions
Know how to configure different workflow options to meet business needs. | Architecture details
View or download the AWS CloudFormation template included in this solution to automatically deploy the infrastructure resources (the “stack”) for this solution. | AWS CloudFormation template
Access the source code and optionally use the AWS Cloud Development Kit (AWS CDK) to deploy the solution. | GitHub repository