Form Data (Key-Value Pairs) - Amazon Textract

Form Data (Key-Value Pairs)

Amazon Textract can extract form data from documents as key-value pairs. For example, in the following text, Amazon Textract can identify a key (Name:) and a value (Ana Carolina).

Name: Ana Carolina

Detected key-value pairs are returned as Block objects in the responses from AnalyzeDocument and GetDocumentAnalysis. You can use the FeatureTypes input parameter to retrieve information about key-value pairs, tables, or both. For key-value pairs only, use the value FORMS. For an example, see Extracting Key-Value Pairs from a Form Document. For general information about how a document is represented by Block objects, see Text Detection and Document Analysis Response Objects.

Dates found through key-value pair detection are returned exactly as detected on the input document, with most date formats supported.

Block objects with the type KEY_VALUE_SET are the containers for KEY or VALUE Block objects that store information about linked text items detected in a document. You can use the EntityType attribute to determine if a block is a KEY or a VALUE.

  • A KEY object contains information about the key for linked text. For example, Name:. A KEY block has two relationship lists. A relationship of type VALUE is a list that contains the ID of the VALUE block associated with the key. A relationship of type CHILD is a list of IDs for the WORD blocks that make up the text of the key.

  • A VALUE object contains information about the text associated with a key. In the preceding example, Ana Carolina is the value for the key Name:. A VALUE block has a relationship with a list of CHILD blocks that identify WORD blocks. Each WORD block contains one of the words that make up the text of the value. A VALUE object can also contain information about selected elements. For more information, see Selection Elements.

Amazon Textract returns the same confidence value for both KEY and VALUE in a KEY_VALUE_SET, as both KEY and VALUE are evaluated as a pair. It returns a different confidence value for a word in WORD blocks.

Each instance of a KEY_VALUE_SET Block object is a child of the PAGE Block object that corresponds to the current page.

The following diagram shows how the key-value pair Name: Ana Carolina is represented by Block objects.

The following examples show how the key-value pair Name: Ana Carolina is represented by JSON.

The PAGE block has CHILD blocks of type KEY_VALUE_SET for each KEY and VALUE block detected in the document.

{ "Geometry": .... "Relationships": [ { "Type": "CHILD", "Ids": [ "2602b0a6-20e3-4e6e-9e46-3be57fd0844b", "82aedd57-187f-43dd-9eb1-4f312ca30042", "52be1777-53f7-42f6-a7cf-6d09bdc15a30", // Key - Name: "7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c" // Value - Ana Caroline ] } ], "BlockType": "PAGE", "Id": "8136b2dc-37c1-4300-a9da-6ed8b276ea97" // Page identifier },

The following JSON shows that the KEY block (52be1777-53f7-42f6-a7cf-6d09bdc15a30) has a relationship with the VALUE block (7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c). It also has a CHILD block for the WORD block (c734fca6-c4c4-415c-b6c1-30f7510b72ee) that contains the text for the key (Name:).

{ "Relationships": [ { "Type": "VALUE", "Ids": [ "7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c" // Value identifier ] }, { "Type": "CHILD", "Ids": [ "c734fca6-c4c4-415c-b6c1-30f7510b72ee" // Name: ] } ], "Confidence": 51.55965805053711, "Geometry": ...., "BlockType": "KEY_VALUE_SET", "EntityTypes": [ "KEY" ], "Id": "52be1777-53f7-42f6-a7cf-6d09bdc15a30" //Key identifier },

The following JSON shows that VALUE block 7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c has a CHILD list of IDs for the WORD blocks that make up the text of the value (Ana and Carolina).

{ "Relationships": [ { "Type": "CHILD", "Ids": [ "db553509-64ef-4ecf-ad3c-bea62cc1cd8a", // Ana "e5d7646c-eaa2-413a-95ad-f4ae19f53ef3" // Carolina ] } ], "Confidence": 51.55965805053711, "Geometry": ...., "BlockType": "KEY_VALUE_SET", "EntityTypes": [ "VALUE" ], "Id": "7ca7caa6-00ef-4cda-b1aa-5571dfed1a7c" // Value identifier }

The following JSON shows the Block objects for the words Name:, Ana, and Carolina.

{ "Geometry": {...}, "Text": "Name:", "TextType": "PRINTED". "BlockType": "WORD", "Confidence": 99.56285858154297, "Id": "c734fca6-c4c4-415c-b6c1-30f7510b72ee" }, { "Geometry": {...}, "Text": "Ana", "TextType": "PRINTED", "BlockType": "WORD", "Confidence": 99.52057647705078, "Id": "db553509-64ef-4ecf-ad3c-bea62cc1cd8a" }, { "Geometry": {...}, "Text": "Carolina", "TextType": "PRINTED", "BlockType": "WORD", "Confidence": 99.84207916259766, "Id": "e5d7646c-eaa2-413a-95ad-f4ae19f53ef3" },