Analyzing Invoices and Receipts
Amazon Textract extracts relevant data, such as vendor and receiver contact information, from almost any invoice or receipt without the need for templates or configuration. Invoices and receipts come in many different layouts, which makes manual data extraction at scale difficult and time-consuming. Amazon Textract uses ML to understand the context of invoices and receipts. It automatically extracts data such as the invoice or receipt date, invoice or receipt number, item prices, total amount, and payment terms.
Amazon Textract also identifies vendor names that are critical for your workflows but may not be explicitly labeled. For example, Amazon Textract can find the vendor name on a receipt even if it's only indicated within a logo at the top of the page without an explicit key-value pair combination.
Amazon Textract also makes it easy to consolidate input from diverse receipts and invoices that use different words for the same concept. For example, Amazon Textract maps equivalent field names in different documents, such as bill number, invoice number, and receipt number, to a single standard taxonomy value, INVOICE_RECEIPT_ID. In this way, Amazon Textract represents data consistently across different document types. The address fields are categorized as 'receiver', 'supplier', 'vendor', 'bill to', 'ship to', and 'remit to'. When an expense document does not have unique values for each of these categories, Amazon Textract returns only the categories with unique values.
Fields that do not align with the standard taxonomy are categorized as OTHER.
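As a rough illustration of this consolidation, the following Python sketch groups already-extracted fields by their standardized type, so that "Invoice Number" on one document and "Receipt #" on another end up under the same key. The function name `consolidate_by_type` and the sample field dictionaries are hypothetical and hand-written for this example; they only mimic the Type, LabelDetection, and ValueDetection elements described in this section.

```python
def consolidate_by_type(summary_fields):
    """Group field values by standardized type.

    Fields typed OTHER fall back to the label printed on the document,
    because they have no standard taxonomy name.
    """
    consolidated = {}
    for field in summary_fields:
        field_type = field.get("Type", {}).get("Text", "OTHER")
        label = field.get("LabelDetection", {}).get("Text", "").strip()
        value = field.get("ValueDetection", {}).get("Text", "")
        if field_type == "OTHER":
            key = label or "OTHER"
        else:
            key = field_type
        consolidated.setdefault(key, []).append(value)
    return consolidated


# Hand-written sample data (not real Amazon Textract output).
invoice_fields = [
    {"Type": {"Text": "INVOICE_RECEIPT_ID"},
     "LabelDetection": {"Text": "Invoice Number"},
     "ValueDetection": {"Text": "INV-1001"}},
]
receipt_fields = [
    {"Type": {"Text": "INVOICE_RECEIPT_ID"},
     "LabelDetection": {"Text": "Receipt #"},
     "ValueDetection": {"Text": "R-77"}},
    {"Type": {"Text": "OTHER"},
     "LabelDetection": {"Text": "Cashier"},
     "ValueDetection": {"Text": "Alex"}},
]

merged = consolidate_by_type(invoice_fields + receipt_fields)
print(merged)
```

Keying on the standardized type rather than the printed label is what lets downstream code ignore vendor-specific wording.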
Following is a list of standard fields supported by expense analysis operations.
- Invoice Receipt Date: INVOICE_RECEIPT_DATE
- Invoice Receipt ID: INVOICE_RECEIPT_ID
- Invoice Tax Payer ID: TAX_PAYER_ID
- Customer Number: CUSTOMER_NUMBER
- Account Number: ACCOUNT_NUMBER
- Vendor Name: VENDOR_NAME
- Receiver Name: RECEIVER_NAME
- Vendor Address: VENDOR_ADDRESS
- Receiver Address: RECEIVER_ADDRESS
- Order Date: ORDER_DATE
- Due Date: DUE_DATE
- Delivery Date: DELIVERY_DATE
- PO Number: PO_NUMBER
- Payment Terms: PAYMENT_TERMS
- Total: TOTAL
- Amount Due: AMOUNT_DUE
- Amount Paid: AMOUNT_PAID
- Subtotal: SUBTOTAL
- Tax: TAX
- Service Charge: SERVICE_CHARGE
- Gratuity: GRATUITY
- Prior Balance: PRIOR_BALANCE
- Discount: DISCOUNT
- Shipping and Handling Charge: SHIPPING_HANDLING_CHARGE
- Vendor ABN Number: VENDOR_ABN_NUMBER
- Vendor GST Number: VENDOR_GST_NUMBER
- Vendor PAN Number: VENDOR_PAN_NUMBER
- Vendor VAT Number: VENDOR_VAT_NUMBER
- Receiver ABN Number: RECEIVER_ABN_NUMBER
- Receiver GST Number: RECEIVER_GST_NUMBER
- Receiver PAN Number: RECEIVER_PAN_NUMBER
- Receiver VAT Number: RECEIVER_VAT_NUMBER
- Vendor Phone: VENDOR_PHONE
- Receiver Phone: RECEIVER_PHONE
- Vendor URL: VENDOR_URL
- Line Item/Item Description: ITEM
- Line Item/Quantity: QUANTITY
- Line Item/Total Price: PRICE
- Line Item/Unit Price: UNIT_PRICE
- Line Item/ProductCode: PRODUCT_CODE
- Address (Bill To, Ship To, Remit To, Supplier): ADDRESS
- Name (Bill To, Ship To, Remit To, Supplier): NAME
- Core Address (Vendor, Receiver, Bill To, Ship To, Remit To, Supplier): ADDRESS_BLOCK
- Street Address (Vendor, Receiver, Bill To, Ship To, Remit To, Supplier): STREET
- City (Vendor, Receiver, Bill To, Ship To, Remit To, Supplier): CITY
- State (Vendor, Receiver, Bill To, Ship To, Remit To, Supplier): STATE
- Country (Vendor, Receiver, Bill To, Ship To, Remit To, Supplier): COUNTRY
- ZIP Code (Vendor, Receiver, Bill To, Ship To, Remit To, Supplier): ZIP_CODE
The AnalyzeExpense API returns the following elements for a given document page:
- The number of receipts or invoices within a document, represented as ExpenseIndex
- The standardized name for individual fields, represented as Type
- The actual name of the field as it appears on the document, represented as LabelDetection
- The value of the corresponding field, represented as ValueDetection
- The number of pages within the submitted document, represented as Pages
- The page number on which the field, value, or line items are detected, represented as PageNumber
- The geometry, which includes the bounding box and coordinates location of the individual field, value, or line items on the page, represented as Geometry
- The confidence score associated with each piece of data detected on the document, represented as Confidence
- The entire row of individual line items purchased, represented as EXPENSE_ROW
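To show how these elements fit together, here is a minimal Python sketch that flattens a response into one record per detected field. It assumes the response nests fields under the keys ExpenseDocuments, SummaryFields, LineItemGroups, LineItems, and LineItemExpenseFields; those key names come from the AnalyzeExpense API reference rather than from this section, so check them against the response you actually receive. The helper names and the sample response are hypothetical and hand-written.

```python
def flatten_expense_fields(response):
    """Return one flat dict per detected field (summary fields and line items)."""
    rows = []
    for document in response.get("ExpenseDocuments", []):
        index = document.get("ExpenseIndex")
        for field in document.get("SummaryFields", []):
            rows.append(_describe(field, index, line_item=None))
        for group in document.get("LineItemGroups", []):
            for number, item in enumerate(group.get("LineItems", []), start=1):
                for field in item.get("LineItemExpenseFields", []):
                    rows.append(_describe(field, index, line_item=number))
    return rows


def _describe(field, expense_index, line_item):
    """Pick out the elements listed above from a single field."""
    value = field.get("ValueDetection", {})
    return {
        "expense_index": expense_index,
        "line_item": line_item,  # None for summary fields
        "type": field.get("Type", {}).get("Text"),
        "label": field.get("LabelDetection", {}).get("Text"),
        "value": value.get("Text"),
        "confidence": value.get("Confidence"),
        "page": field.get("PageNumber"),
    }


# Hand-written sample shaped like a response (not real service output).
sample_response = {
    "DocumentMetadata": {"Pages": 1},
    "ExpenseDocuments": [{
        "ExpenseIndex": 1,
        "SummaryFields": [{
            "Type": {"Text": "TOTAL"},
            "LabelDetection": {"Text": "Total:"},
            "ValueDetection": {"Text": "$55.64", "Confidence": 99.85},
            "PageNumber": 1,
        }],
        "LineItemGroups": [{
            "LineItems": [{
                "LineItemExpenseFields": [
                    {"Type": {"Text": "ITEM"},
                     "ValueDetection": {"Text": "Coffee", "Confidence": 98.2},
                     "PageNumber": 1},
                    {"Type": {"Text": "PRICE"},
                     "ValueDetection": {"Text": "4.50", "Confidence": 97.9},
                     "PageNumber": 1},
                    {"Type": {"Text": "EXPENSE_ROW"},
                     "ValueDetection": {"Text": "Coffee 4.50", "Confidence": 96.0},
                     "PageNumber": 1},
                ],
            }],
        }],
    }],
}

rows = flatten_expense_fields(sample_response)
for row in rows:
    print(row["type"], row["value"], row["line_item"])
```

A flat list like this is convenient for loading into a table or CSV; the `line_item` number keeps fields from the same purchased row together.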
The following is a portion of the API output for a receipt processed by AnalyzeExpense. It shows the line “Total: $55.64” in the document extracted as the standard field TOTAL. The actual text of the label on the document appears as “Total:”, the label confidence score as 97.1, the page number as 1, and the total value as “$55.64”. The output also includes the bounding box and polygon coordinates:
{
    "Type": { "Text": "TOTAL", "Confidence": 99.94717407226562 },
    "LabelDetection": {
        "Text": "Total:",
        "Geometry": {
            "BoundingBox": { "Width": 0.09809663146734238, "Height": 0.0234375, "Left": 0.36822840571403503, "Top": 0.8017578125 },
            "Polygon": [
                { "X": 0.36822840571403503, "Y": 0.8017578125 },
                { "X": 0.466325044631958, "Y": 0.8017578125 },
                { "X": 0.466325044631958, "Y": 0.8251953125 },
                { "X": 0.36822840571403503, "Y": 0.8251953125 }
            ]
        },
        "Confidence": 97.10792541503906
    },
    "ValueDetection": {
        "Text": "$55.64",
        "Currency": { "Code": "USD" },
        "Geometry": {
            "BoundingBox": { "Width": 0.10395314544439316, "Height": 0.0244140625, "Left": 0.66837477684021, "Top": 0.802734375 },
            "Polygon": [
                { "X": 0.66837477684021, "Y": 0.802734375 },
                { "X": 0.7723279595375061, "Y": 0.802734375 },
                { "X": 0.7723279595375061, "Y": 0.8271484375 },
                { "X": 0.66837477684021, "Y": 0.8271484375 }
            ]
        },
        "Confidence": 99.85165405273438
    },
    "PageNumber": 1
}
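One way to consume a field like the TOTAL example above is to accept its value only when the confidence scores clear a threshold, and then convert the text to a decimal amount. The sketch below uses a trimmed copy of that field (geometry omitted for brevity). The function name `accepted_amount`, the 90.0 default threshold, and the currency-symbol stripping are assumptions made for illustration, not part of the API.

```python
import json
from decimal import Decimal, InvalidOperation

# Trimmed copy of the TOTAL field shown above (Geometry omitted).
SAMPLE_FIELD = json.loads("""
{
    "Type": { "Text": "TOTAL", "Confidence": 99.94717407226562 },
    "LabelDetection": { "Text": "Total:", "Confidence": 97.10792541503906 },
    "ValueDetection": { "Text": "$55.64", "Confidence": 99.85165405273438 },
    "PageNumber": 1
}
""")


def accepted_amount(field, min_confidence=90.0):
    """Return the field value as a Decimal, or None if it should be reviewed.

    The value is rejected when either the type confidence or the value
    confidence is below min_confidence, or when the text is not a number.
    """
    value = field["ValueDetection"]
    lowest = min(field["Type"]["Confidence"], value["Confidence"])
    if lowest < min_confidence:
        return None
    # Assumption: amounts use "$" and "," only as decoration.
    cleaned = value["Text"].replace("$", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


print(accepted_amount(SAMPLE_FIELD))
print(accepted_amount(SAMPLE_FIELD, min_confidence=99.9))
```

Returning `None` instead of raising makes it easy to route low-confidence or unparseable fields to manual review; the right threshold depends on your documents and tolerance for errors.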
You can use synchronous operations to analyze an invoice or receipt. To analyze these documents, call the AnalyzeExpense operation and pass a receipt or invoice to it. AnalyzeExpense returns the entire set of results in a single response. For more information, see Analyzing Invoices and Receipts with Amazon Textract.
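A synchronous call might look like the following Python sketch. It assumes a client object that exposes `analyze_expense(Document={"Bytes": ...})`, which is how the AWS SDK for Python (boto3) Textract client names this operation. To keep the example runnable without AWS credentials, a hypothetical stub client with a canned response stands in for the real one; the helper names are also illustrative.

```python
def analyze_expense_bytes(client, image_bytes):
    """Send one receipt or invoice image to AnalyzeExpense and return the response."""
    return client.analyze_expense(Document={"Bytes": image_bytes})


def summary_by_type(response):
    """Map each standardized field type to its detected value text."""
    summary = {}
    for document in response.get("ExpenseDocuments", []):
        for field in document.get("SummaryFields", []):
            field_type = field.get("Type", {}).get("Text", "OTHER")
            summary[field_type] = field.get("ValueDetection", {}).get("Text", "")
    return summary


class _StubTextractClient:
    """Hypothetical stand-in for a Textract client, for offline runs only."""

    def analyze_expense(self, Document):
        # Canned, hand-written response; a real client would call the service.
        return {
            "DocumentMetadata": {"Pages": 1},
            "ExpenseDocuments": [{
                "ExpenseIndex": 1,
                "SummaryFields": [
                    {"Type": {"Text": "VENDOR_NAME"},
                     "ValueDetection": {"Text": "Example Cafe"}},
                    {"Type": {"Text": "TOTAL"},
                     "ValueDetection": {"Text": "$55.64"}},
                ],
            }],
        }


response = analyze_expense_bytes(_StubTextractClient(), b"placeholder image bytes")
summary = summary_by_type(response)
print(summary)
```

With real credentials and a configured Region, you would replace the stub with `boto3.client("textract")` and pass the bytes of an actual image file.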
To analyze invoices and receipts asynchronously, use StartExpenseAnalysis to start processing an input document file. To get the results, call GetExpenseAnalysis. The results for a given call to StartExpenseAnalysis are returned by GetExpenseAnalysis. For more information and an example, see Processing Documents Asynchronously.
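The start-then-get flow can be sketched as a simple polling loop. This example assumes the boto3-style method names `start_expense_analysis` and `get_expense_analysis`, a `JobId` returned by the start call, a `JobStatus` of IN_PROGRESS or SUCCEEDED in the get response, and `NextToken` pagination. The stub client, the helper name, and the polling parameters are hypothetical, and the sketch treats any status other than SUCCEEDED as an error for simplicity.

```python
import time


def wait_for_expense_analysis(client, bucket, key, poll_seconds=5.0,
                              max_polls=60, sleep=time.sleep):
    """Start an asynchronous expense analysis job and collect its results."""
    job_id = client.start_expense_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )["JobId"]

    for _ in range(max_polls):
        page = client.get_expense_analysis(JobId=job_id)
        status = page["JobStatus"]
        if status == "IN_PROGRESS":
            sleep(poll_seconds)
            continue
        if status != "SUCCEEDED":
            raise RuntimeError(f"Expense analysis ended with status {status}")
        # Follow NextToken until every page of results has been read.
        documents = list(page.get("ExpenseDocuments", []))
        while page.get("NextToken"):
            page = client.get_expense_analysis(
                JobId=job_id, NextToken=page["NextToken"]
            )
            documents.extend(page.get("ExpenseDocuments", []))
        return documents
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")


class _StubAsyncClient:
    """Hypothetical stand-in that finishes on the third poll, with two result pages."""

    def __init__(self):
        self._polls = 0

    def start_expense_analysis(self, DocumentLocation):
        return {"JobId": "job-123"}

    def get_expense_analysis(self, JobId, NextToken=None):
        if NextToken == "page-2":
            return {"JobStatus": "SUCCEEDED",
                    "ExpenseDocuments": [{"ExpenseIndex": 2}]}
        self._polls += 1
        if self._polls < 3:
            return {"JobStatus": "IN_PROGRESS"}
        return {"JobStatus": "SUCCEEDED",
                "ExpenseDocuments": [{"ExpenseIndex": 1}],
                "NextToken": "page-2"}


documents = wait_for_expense_analysis(
    _StubAsyncClient(), "example-bucket", "receipts/scan.pdf",
    sleep=lambda seconds: None,  # skip real waiting in this offline demo
)
print([doc["ExpenseIndex"] for doc in documents])
```

Polling is the simplest approach to demonstrate; for larger workloads, a completion notification is usually preferable to repeated status checks.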