What is Amazon Textract?

Amazon Textract helps you add document text detection and analysis to your applications. Using Amazon Textract, you can do the following:

  • Detect typed and handwritten text in a variety of documents, including financial reports, medical records, and tax forms.

  • Extract text, forms, and tables from documents with structured data, using the Amazon Textract Analyze Document API.

  • Specify and extract information from documents using the Queries feature within the Amazon Textract Analyze Document API.

  • Process invoices and receipts with the AnalyzeExpense API.

  • Process ID documents such as driver's licenses and passports issued by the U.S. government, using the AnalyzeID API.

  • Upload and process mortgage loan packages through automatic routing of the document pages to the appropriate Amazon Textract analysis operations, using the Analyze Lending workflow. You can retrieve analysis results for each document page, or you can retrieve summarized results for a set of document pages.

  • Use Custom Queries to customize the pretrained Queries feature with your own data to support your downstream processing needs.

Amazon Textract is based on the same proven, highly scalable, deep-learning technology that was developed by Amazon's computer vision scientists to analyze billions of images and videos daily. You don't need any machine learning expertise to use it, as Amazon Textract includes simple, easy-to-use API operations that can analyze image files and PDF files. Amazon Textract is always learning from new data, and Amazon is continually adding new features to the service.
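
As a rough illustration of the operations listed above, the following Python sketch builds the arguments for an Analyze Document request that asks for forms, tables, and a query. It assumes the AWS SDK for Python (boto3). The helper name build_analyze_document_request, the bucket name, the object key, and the query text are hypothetical placeholders, not part of the service.

```python
from typing import Any, Dict, List, Optional


def build_analyze_document_request(
    bucket: str,
    key: str,
    features: List[str],
    queries: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for an AnalyzeDocument call (hypothetical helper)."""
    request: Dict[str, Any] = {
        # The document is read from Amazon S3; bucket and key are placeholders.
        "Document": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": list(features),
    }
    if queries:
        # Queries are evaluated only when QUERIES is among the feature types.
        if "QUERIES" not in request["FeatureTypes"]:
            request["FeatureTypes"].append("QUERIES")
        request["QueriesConfig"] = {"Queries": [{"Text": q} for q in queries]}
    return request


request = build_analyze_document_request(
    "amzn-s3-demo-bucket",           # placeholder bucket name
    "intake-form.png",               # placeholder object key
    ["FORMS", "TABLES"],
    ["What is the patient name?"],   # placeholder query
)

# With boto3 installed and credentials configured, the call might look like:
#   import boto3
#   client = boto3.client("textract")
#   response = client.analyze_document(**request)
```

Keeping the request construction separate from the call makes it possible to inspect or log the arguments before any billable request is sent.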

The following are common use cases for using Amazon Textract:

  • Creating an intelligent search index – Using Amazon Textract, you can create libraries of the text detected in image and PDF files.

  • Using intelligent text extraction for natural language processing (NLP) – Amazon Textract provides you with control over how text is grouped as an input for NLP applications. It can extract text as words and lines. It also groups text by table cells if Amazon Textract document table analysis is enabled.

  • Accelerating the capture and normalization of data from different sources – Amazon Textract enables text and tabular data extraction from a wide variety of documents, such as financial documents, research reports, and medical notes. With Amazon Textract Analyze Document APIs, you can easily and quickly extract unstructured and structured data from your documents.

  • Automating data capture from forms – Amazon Textract enables structured data to be extracted from forms. With Amazon Textract Analysis APIs, you can build extraction capabilities into existing business workflows so that user data submitted through forms can be extracted into a usable format.

  • Automating document classification and extraction – With Amazon Textract's Analyze Lending document processing API, you can automate the classification of lending documents into various document classes, and then automatically route the classified pages to the correct analysis operation for further processing.
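
To make the grouping of text into lines and words more concrete, here is a minimal Python sketch that walks the Blocks list of a text detection response and pairs each LINE block with its child WORD blocks. The function name lines_with_words is hypothetical, and the sample response is hand-written and heavily trimmed for illustration (real responses carry more fields, such as geometry and confidence scores).

```python
from typing import Any, Dict, List


def lines_with_words(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return each detected line with the words it contains (hypothetical helper)."""
    blocks = response.get("Blocks", [])
    by_id = {block["Id"]: block for block in blocks}
    result = []
    for block in blocks:
        if block.get("BlockType") != "LINE":
            continue
        # A LINE block points to its WORD blocks through CHILD relationships.
        child_ids = [
            child_id
            for rel in block.get("Relationships", [])
            if rel.get("Type") == "CHILD"
            for child_id in rel.get("Ids", [])
        ]
        words = [
            by_id[cid]["Text"]
            for cid in child_ids
            if cid in by_id and by_id[cid].get("BlockType") == "WORD"
        ]
        result.append({"line": block.get("Text", ""), "words": words})
    return result


# Hand-written sample, trimmed to the fields this sketch reads.
sample_response = {
    "Blocks": [
        {"Id": "p1", "BlockType": "PAGE"},
        {
            "Id": "l1",
            "BlockType": "LINE",
            "Text": "Invoice total",
            "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}],
        },
        {"Id": "w1", "BlockType": "WORD", "Text": "Invoice"},
        {"Id": "w2", "BlockType": "WORD", "Text": "total"},
    ]
}

for entry in lines_with_words(sample_response):
    print(entry["line"], entry["words"])
```

Line-level text is often enough for a search index, while word-level tokens are a more natural input for NLP pipelines.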

Some of the benefits of using Amazon Textract include:

  • Integration of document text detection into your apps – Amazon Textract removes the complexity of building text detection capabilities into your applications by making powerful and accurate analysis available with a simple API. You don’t need computer vision or deep learning expertise to use Amazon Textract to detect document text. With Amazon Textract Text APIs, you can easily build text detection into any web, mobile, or connected device application.

  • Scalable document analysis – Amazon Textract enables you to analyze and extract data quickly from millions of documents, which can accelerate decision making.

  • Low cost – With Amazon Textract, you only pay for the documents you analyze. There are no minimum fees or upfront commitments. You can get started for free, and save more as you grow with our tiered pricing model.

With synchronous processing, Amazon Textract can analyze single-page documents for applications where latency is critical. Amazon Textract also provides asynchronous operations to extend support to multipage documents.
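
The asynchronous flow for multipage documents can be sketched as follows: start a job, poll until it leaves the IN_PROGRESS state, then page through the results. This is a simplified sketch, not a production pattern. It assumes a boto3 Textract client is passed in as client, the function name detect_text_async and the polling defaults are hypothetical, and any final status other than SUCCEEDED is treated as a failure.

```python
import time
from typing import Any, Dict, List


def detect_text_async(
    client: Any,
    bucket: str,
    key: str,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> List[Dict[str, Any]]:
    """Run asynchronous text detection and return all blocks (hypothetical helper)."""
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job is no longer in progress, or give up after max_polls.
    for _ in range(max_polls):
        result = client.get_document_text_detection(JobId=job_id)
        status = result["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")

    if status != "SUCCEEDED":
        raise RuntimeError(f"Job {job_id} ended with status {status}")

    # Results can span several pages; follow NextToken until it is absent.
    blocks = list(result.get("Blocks", []))
    while result.get("NextToken"):
        result = client.get_document_text_detection(
            JobId=job_id, NextToken=result["NextToken"]
        )
        blocks.extend(result.get("Blocks", []))
    return blocks


# With boto3 installed and credentials configured, usage might look like:
#   import boto3
#   blocks = detect_text_async(boto3.client("textract"),
#                              "amzn-s3-demo-bucket", "report.pdf")
```

Polling keeps the sketch short. For single-page documents where latency matters, the synchronous operations return the blocks directly and need no job handling.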

Amazon Textract's API operations have quotas that limit how quickly and how often you can use them. If the limit set for your account is frequently exceeded, you can request a limit increase. To change a limit, select the Amazon Textract option in the Service Quotas console. You can use the Quotas Calculator in the Amazon Textract console to determine your quota requirements. To learn more about default quotas that can be changed, see Information on Default Quotas in Amazon Textract.

Other quotas, like file size and languages supported by Amazon Textract, cannot be changed. For more information on set quotas, see Set Quotas in Amazon Textract.
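
Because request rate quotas limit how quickly operations can be called, client code usually needs some way to slow down when a request is throttled. The sketch below retries a call with exponential backoff and jitter. It is an assumption-laden illustration: the helper name call_with_backoff and its defaults are hypothetical, and it assumes the raised exception carries a boto3-style response dictionary with an error code. boto3 also has configurable built-in retry behavior, which may be sufficient on its own.

```python
import random
import time
from typing import Any, Callable

# Error codes treated as "slow down and retry" in this sketch.
THROTTLE_CODES = {"ThrottlingException", "ProvisionedThroughputExceededException"}


def call_with_backoff(
    operation: Callable[[], Any],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], Any] = time.sleep,
) -> Any:
    """Call operation(), retrying throttling errors (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # botocore's ClientError exposes the error code under
            # exc.response["Error"]["Code"]; other exceptions are re-raised.
            response = getattr(exc, "response", None)
            code = response.get("Error", {}).get("Code") if isinstance(response, dict) else None
            if code not in THROTTLE_CODES or attempt == max_attempts:
                raise
            # Exponential backoff with jitter: 1x, 2x, 4x, ... the base delay.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))


# With a boto3 client, usage might look like:
#   response = call_with_backoff(
#       lambda: client.detect_document_text(
#           Document={"S3Object": {"Bucket": "amzn-s3-demo-bucket", "Name": "page.png"}}
#       )
#   )
```

Retries smooth over occasional throttling. If requests are throttled frequently, requesting a quota increase as described above is the more appropriate fix.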

First-Time Amazon Textract Users

If this is your first time using Amazon Textract, we recommend that you read the following sections in order:

  1. How Amazon Textract Works – This section introduces the Amazon Textract components and how they work together for an end-to-end experience.

  2. Getting Started with Amazon Textract – In this section, you set up your account and test the Amazon Textract API.