Tables
Amazon Textract can extract tables from a document, along with the cells, merged cells, and column headers within each table. For example, when the following table is detected in a document, Amazon Textract detects a table with 30 cells, 3 merged cells, and 5 cells that are column headers.

Detected tables are returned as Block objects in the responses from AnalyzeDocument and GetDocumentAnalysis. You can use the FeatureTypes input parameter to retrieve information about key-value pairs, tables, or both. For tables only, use the value TABLES. For an example, see Exporting Tables into a CSV File. For general information about how a document is represented by Block objects, see Text Detection and Document Analysis Response Objects.
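As a sketch of requesting tables only, the following Python function passes FeatureTypes=["TABLES"] to AnalyzeDocument and keeps the TABLE blocks. It assumes a boto3 Textract client (or any object with a compatible analyze_document method) and a document stored in Amazon S3; the helper name analyze_tables and the bucket and object names are hypothetical placeholders.

```python
from typing import Any


def analyze_tables(client: Any, bucket: str, name: str) -> list[dict]:
    """Call AnalyzeDocument for tables only and return the TABLE blocks.

    `client` is assumed to be a boto3 Textract client, for example
    boto3.client("textract"). `analyze_tables` is a hypothetical helper.
    """
    response = client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": name}},
        # "TABLES" requests tables only; ["FORMS", "TABLES"] requests both
        # key-value pairs and tables.
        FeatureTypes=["TABLES"],
    )
    # Every detected item is returned as a Block in one flat "Blocks" list.
    return [b for b in response["Blocks"] if b["BlockType"] == "TABLE"]
```

A call might look like analyze_tables(boto3.client("textract"), "my-bucket", "statement.png"), where the bucket and key are placeholders. Taking the client as a parameter keeps the function easy to test without AWS credentials.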
The following diagram shows how a single cell in a table is represented by Block objects.

A cell contains WORD blocks for detected words and SELECTION_ELEMENT blocks for selection elements, such as check boxes.
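The following minimal sketch reads what a single CELL block contains by following its CHILD relationship: the text of its WORD children and the SelectionStatus of any SELECTION_ELEMENT children. It assumes the response's Blocks list has already been indexed by Id (for example, {b["Id"]: b for b in blocks}); the function name cell_content is hypothetical.

```python
def cell_content(cell: dict, blocks_by_id: dict[str, dict]) -> tuple[str, list[str]]:
    """Return (text, selection_statuses) for one CELL block.

    `cell_content` is a hypothetical helper; `blocks_by_id` maps each
    Block Id to its Block object.
    """
    words: list[str] = []
    selections: list[str] = []
    for rel in cell.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                # SelectionStatus is "SELECTED" or "NOT_SELECTED".
                selections.append(child["SelectionStatus"])
    # Blank cells have no Relationships, so they yield ("", []).
    return " ".join(words), selections
```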
The following is partial JSON for the preceding table, which has 23 cells (counting each merged cell as one cell). The PAGE Block object has a list of CHILD Block IDs for the TABLE block and each LINE of text that's detected.
{ "BlockType": "PAGE", "Geometry": { "BoundingBox": { "Width": 1.0, "Height": 1.0, "Left": 0.0, "Top": 0.0 } }, "Id": "ddfcf314-11bb-4088-ae57-542843c4f7b1", "Relationships": [ { "Type": "CHILD", "Ids": [ "7cf63193-d316-4cf5-8fa3-0210a19a8ef0", "06d7f631-2ce0-4f8e-8a02-b05a54b09900", "05830c30-a8b7-43f4-9744-d0a85db92ca7", "c9ca14ac-f8f1-4d90-8ef9-3a85493d348e", "da6052ad-b763-428b-8a5f-88818a34ed33", "c98ffa0d-19f3-44f9-a2a1-40596fe8a990", "af191f61-793d-47df-aa99-043c2d460873", "394a838c-82eb-4ca4-b1fb-d7c596051d51", "d575c7f2-580e-4e39-8012-09d406a52171", "cccb2ecf-618e-4928-ae7c-668f6508f26c", "566ee5ac-408f-4a1b-aa4d-c4f0e90b9ecc", "6290f376-3431-4c74-8547-edc740bae6e2", "6ece89b1-dcd1-4f0f-91be-523510f19ed1", "ea9677a3-6745-4013-ae85-e6f6d4fce468", "42ebe479-081f-4bdc-ad0d-747e15f6e838", "acfc52ea-46d1-4203-8a6d-c30a00ca9d07", "9c3953ce-1aa1-431a-b663-eb6d32e18ef4", "49160b50-c75a-46b0-a2b9-340e6edd8b40", "c15ade38-231c-422f-8b2a-122937d01277", "63e465d6-c927-4367-895d-4403ba666b2f", "82dd37f1-f17b-4fa1-864a-c2f2daf3bfb5" ] } ] },
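Because a PAGE block's CHILD list mixes LINE and TABLE IDs, one way to find the tables on each page is to index all blocks by Id and filter each page's children by BlockType. This is a minimal sketch, assuming blocks is the Blocks list from the response; child_ids and tables_on_page are hypothetical helper names.

```python
def child_ids(block: dict) -> list[str]:
    """Collect the IDs from all CHILD relationships of a block."""
    ids: list[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            ids.extend(rel["Ids"])
    return ids


def tables_on_page(blocks: list[dict]) -> dict[str, list[dict]]:
    """Map each PAGE block's Id to the TABLE blocks among its children."""
    by_id = {b["Id"]: b for b in blocks}
    result: dict[str, list[dict]] = {}
    for block in blocks:
        if block["BlockType"] == "PAGE":
            # Skip LINE children and any IDs not present in this response.
            result[block["Id"]] = [
                by_id[i]
                for i in child_ids(block)
                if i in by_id and by_id[i]["BlockType"] == "TABLE"
            ]
    return result
```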
To learn more about the table, access the TABLE Block object. The TABLE block includes two types of relationships: CHILD and MERGED_CELL. For the CHILD relationship type, each ID represents a single cell within the table; a merged cell is broken down into the individual cells that are combined to form it. The following JSON shows that the preceding table has 30 cells (6 rows by 5 columns), which are listed in the Ids array. For the MERGED_CELL relationship type, each ID represents a single merged cell within the table. The following JSON shows that the table has 3 merged cells, which are listed in the Ids array.
{ "BlockType": "TABLE", "Confidence": 99.79180145263672, "Geometry": {}, "Id": "82dd37f1-f17b-4fa1-864a-c2f2daf3bfb5", "Relationships": [ { "Type": "CHILD", "Ids": [ "cda64a58-28d2-47d9-857e-bd7fd9f99d57", "3bc1c578-b733-4e69-af6b-a7ae44b4be5d", "c31985ad-959a-42b7-b150-91d1b446261a", "ec1de7f2-f402-48a2-a787-f92e1559dd3a", "28680139-13be-4a36-bb5d-a47d01d5f8bd", "e106eaff-8269-43c7-9217-22fa9af654d1", "12e26ecd-a58a-408b-a2a3-276752bb0fe4", "f5174a5d-6c69-4626-b96b-80bbfa7190c9", "59949cfc-a2bf-4106-9b04-ace67a95ed5f", "09ee8754-be3c-4c6e-bc03-afd54eb376de", "26e02458-559a-4baf-91f8-4c4cd23e19a6", "fb50ddd5-242e-4468-80d3-3802ed61279a", "44546a71-1f3e-4b96-8811-fd049f254bea", "d991cbe5-3f5c-4c97-887c-ae6c907bb33d", "4952bd89-096f-42f0-afe7-eb35a0378089", "00d2969f-79c3-40c3-b37d-bfd49a36c599", "4df5c57d-5677-4fa9-af7d-ca3ae3aa75aa", "4f502514-c231-4722-a3a2-ba10d3c681b4", "f8ea9a9e-afd3-472c-81d3-112b5dbdfa8c", "13d6cebf-e8ff-4195-b47e-28aabde23e46", "da50d1c0-6c3b-432b-a376-b0a55e2ea9d7", "3dbc11a3-c1ba-4b91-ae50-eb372ac60d2c", "35af88fc-fce7-4841-8794-a83aaa7bebde", "a95c0b14-2043-444c-9c02-a1b664735d90", "cf9e1225-31b3-4ddd-8150-63f88ad11d1f", "d0fa361f-b30c-4b75-a616-6971b3695dc0", "bf7eed90-df5e-4c6a-acb9-23c03b941e0e", "b711c551-432b-4e32-bf95-a5e7d2e28fec", "a69add2b-3afe-4dd0-88e8-f4c61bcf4486", "fa3ef61d-0408-4b89-9732-cff3872d004e" ] }, { "Type": "MERGED_CELL", "Ids": [ "7063a4ff-4e60-4d8a-ba44-b79e5c0e22f4", "21f2ef7d-2fda-4d72-b771-4ce67cd505ec", "6b46b39c-9bbc-47f6-af09-3e8c71697dd8" ] } ] }
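The TABLE block's two relationship types can be read by grouping its relationship IDs by Type. The following sketch does that and counts cells, merged cells, rows, and columns; applied to the response described above, the counts should match the 30 cells (6 rows by 5 columns) and 3 merged cells described in the text. It assumes blocks have been indexed by Id; ids_by_type and table_summary are hypothetical names.

```python
def ids_by_type(block: dict) -> dict[str, list[str]]:
    """Group a block's relationship IDs by relationship Type (CHILD, MERGED_CELL, ...)."""
    grouped: dict[str, list[str]] = {}
    for rel in block.get("Relationships", []):
        grouped.setdefault(rel["Type"], []).extend(rel["Ids"])
    return grouped


def table_summary(table: dict, blocks_by_id: dict[str, dict]) -> dict[str, int]:
    """Count cells, merged cells, rows, and columns for one TABLE block."""
    rels = ids_by_type(table)
    cells = [
        blocks_by_id[i]
        for i in rels.get("CHILD", [])
        if blocks_by_id[i]["BlockType"] == "CELL"
    ]
    return {
        "cells": len(cells),
        "merged_cells": len(rels.get("MERGED_CELL", [])),
        # CELL blocks always span 1x1, so the largest indices give the grid size.
        "rows": max((c["RowIndex"] for c in cells), default=0),
        "columns": max((c["ColumnIndex"] for c in cells), default=0),
    }
```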
The Block type for each table cell is CELL. A CELL block always has a RowSpan of 1 and a ColumnSpan of 1. The Block object for each cell includes information about the cell's location relative to other cells in the table (RowIndex and ColumnIndex), as well as geometry information for the location of the cell on the document. In addition, an EntityTypes value of COLUMN_HEADER identifies cells that are column headers in the table. In the preceding example, cda64a58-28d2-47d9-857e-bd7fd9f99d57 is the child ID for the cell that contains the word ‘Date’, and this cell is a column header, as shown in the following JSON.
{ "BlockType": "CELL", "Confidence": 93.32925415039062, "RowIndex": 1, "ColumnIndex": 1, "RowSpan": 1, "ColumnSpan": 1, "Geometry": {...}, "Id": "cda64a58-28d2-47d9-857e-bd7fd9f99d57", "Relationships": [ { "Type": "CHILD", "Ids": [ "b49e883c-bd8b-43e2-aed6-0aa93a52b2b1" ] } ], "EntityTypes": [ "COLUMN_HEADER" ] },
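Because each CELL block carries a RowIndex and ColumnIndex (1-based, as in the example above, where the first cell has RowIndex 1 and ColumnIndex 1), a table can be rebuilt as a row-major grid of text. This is a sketch under the assumption that blocks have been indexed by Id; build_grid is a hypothetical name, and only WORD children are read.

```python
def build_grid(table: dict, blocks_by_id: dict[str, dict]) -> list[list[str]]:
    """Return the table's text as a list of rows (0-based Python indices)."""
    cells = [
        blocks_by_id[i]
        for rel in table.get("Relationships", [])
        if rel["Type"] == "CHILD"
        for i in rel["Ids"]
        if blocks_by_id[i]["BlockType"] == "CELL"
    ]
    n_rows = max((c["RowIndex"] for c in cells), default=0)
    n_cols = max((c["ColumnIndex"] for c in cells), default=0)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        words = [
            blocks_by_id[w]["Text"]
            for rel in cell.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for w in rel["Ids"]
            if blocks_by_id[w]["BlockType"] == "WORD"
        ]
        # Shift the 1-based RowIndex/ColumnIndex to 0-based list positions.
        grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = " ".join(words)
    return grid
```

In this sketch, each cell of a merged cell keeps only its own words, so "Previous" and "Balance" would land in separate columns of row 2.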
The cell that contains the word ‘Deposit’ is not a column header, as shown by the absence of the "EntityTypes": ["COLUMN_HEADER"] field.
{ "BlockType": "CELL", "Confidence": 92.3740005493164, "RowIndex": 1, "ColumnIndex": 4, "RowSpan": 1, "ColumnSpan": 1, "Geometry": {...}, "Id": "ec1de7f2-f402-48a2-a787-f92e1559dd3a", "Relationships": [...] },
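Because non-header cells simply omit EntityTypes, a header check should treat a missing field as an empty list rather than indexing it directly. A minimal sketch, with column_headers as a hypothetical helper name:

```python
def column_headers(cells: list[dict]) -> list[tuple[int, int]]:
    """Return sorted (RowIndex, ColumnIndex) pairs of CELL blocks tagged COLUMN_HEADER."""
    return sorted(
        (c["RowIndex"], c["ColumnIndex"])
        for c in cells
        if c.get("BlockType") == "CELL"
        # EntityTypes is absent on cells that are not column headers.
        and "COLUMN_HEADER" in c.get("EntityTypes", [])
    )
```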
All the merged cells are listed in the TABLE block's relationship of type "MERGED_CELL". In the example table, there are 3 merged cells.
"Type": "MERGED_CELL", "Ids": [ "7063a4ff-4e60-4d8a-ba44-b79e5c0e22f4", "21f2ef7d-2fda-4d72-b771-4ce67cd505ec", "6b46b39c-9bbc-47f6-af09-3e8c71697dd8" ]
To find details about each merged cell, look for the Block object with "BlockType": "MERGED_CELL" and the matching Id. For the merged cell “Previous Balance”, the associated ID is "7063a4ff-4e60-4d8a-ba44-b79e5c0e22f4". Its "ColumnSpan" of 4 shows that 4 cells make up this merged cell. To find the text within the merged cell, follow the IDs in its Ids array to the "BlockType": "CELL" blocks, and then follow each cell's child IDs to its "BlockType": "WORD" blocks.
{ "BlockType": "MERGED_CELL", "Confidence": 92.46913146972656, "RowIndex": 2, "ColumnIndex": 1, "RowSpan": 1, "ColumnSpan": 4, "Geometry": {...}, "Id": "7063a4ff-4e60-4d8a-ba44-b79e5c0e22f4", "Relationships": [ { "Type": "CHILD", "Ids": [ "e106eaff-8269-43c7-9217-22fa9af654d1", "12e26ecd-a58a-408b-a2a3-276752bb0fe4", "f5174a5d-6c69-4626-b96b-80bbfa7190c9", "59949cfc-a2bf-4106-9b04-ace67a95ed5f" ] } ] },
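A MERGED_CELL block's RowIndex, ColumnIndex, RowSpan, and ColumnSpan describe a rectangle of 1x1 cells, so its CHILD cells would be expected to sit on exactly RowSpan x ColumnSpan grid positions (4 positions in row 2 for “Previous Balance”). The following sketch performs that consistency check; it assumes blocks have been indexed by Id, and merged_cell_positions and covers_span are hypothetical names.

```python
def merged_cell_positions(merged: dict) -> set[tuple[int, int]]:
    """Grid positions a MERGED_CELL block spans, from its index and span fields."""
    r0, c0 = merged["RowIndex"], merged["ColumnIndex"]
    return {
        (r, c)
        for r in range(r0, r0 + merged["RowSpan"])
        for c in range(c0, c0 + merged["ColumnSpan"])
    }


def covers_span(merged: dict, blocks_by_id: dict[str, dict]) -> bool:
    """True if the merged cell's CHILD cells occupy exactly the spanned positions."""
    child_positions = {
        (blocks_by_id[i]["RowIndex"], blocks_by_id[i]["ColumnIndex"])
        for rel in merged.get("Relationships", [])
        if rel["Type"] == "CHILD"
        for i in rel["Ids"]
    }
    return child_positions == merged_cell_positions(merged)
```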
At the cell level, the merged cell “Previous Balance” consists of 4 cells.
{ "BlockType": "CELL", "Confidence": 92.35404205322266, "RowIndex": 2, "ColumnIndex": 1, "RowSpan": 1, "ColumnSpan": 1, "Geometry": {...}, "Id": "e106eaff-8269-43c7-9217-22fa9af654d1", "Relationships": [ { "Type": "CHILD", "Ids": [ "c398d626-eff0-4cdd-af52-4f4961585778" ] } ] }, { "BlockType": "CELL", "Confidence": 92.35404205322266, "RowIndex": 2, "ColumnIndex": 2, "RowSpan": 1, "ColumnSpan": 1, "Geometry": {...}, "Id": "12e26ecd-a58a-408b-a2a3-276752bb0fe4", "Relationships": [ { "Type": "CHILD", "Ids": [ "22c9cb13-2016-45b4-a4ce-2a10dd4b3cc4" ] } ] }, { "BlockType": "CELL", "Confidence": 92.35404205322266, "RowIndex": 2, "ColumnIndex": 3, "RowSpan": 1, "ColumnSpan": 1, "Geometry": {...}, "Id": "f5174a5d-6c69-4626-b96b-80bbfa7190c9" }, { "BlockType": "CELL", "Confidence": 92.35404205322266, "RowIndex": 2, "ColumnIndex": 4, "RowSpan": 1, "ColumnSpan": 1, "Geometry": {...}, "Id": "59949cfc-a2bf-4106-9b04-ace67a95ed5f" }
At the word level, there are two words, “Previous” and “Balance”. Because the cells in columns 3 and 4 are blank, no words are associated with them.
{ "BlockType": "WORD", "Confidence": 99.517578125, "Text": "Previous", "TextType": "PRINTED", "Geometry": {...}, "Id": "c398d626-eff0-4cdd-af52-4f4961585778" }, { "BlockType": "WORD", "Confidence": 99.52200317382812, "Text": "Balance", "TextType": "PRINTED", "Geometry": {...}, "Id": "22c9cb13-2016-45b4-a4ce-2a10dd4b3cc4" }
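Putting the traversal together, the text of a merged cell can be recovered by walking MERGED_CELL to its CELL children, ordering them by RowIndex and ColumnIndex, and joining their WORD children; blank cells contribute nothing. This is a sketch assuming blocks have been indexed by Id and that words within a cell are listed in reading order; merged_cell_text is a hypothetical name.

```python
def merged_cell_text(merged: dict, blocks_by_id: dict[str, dict]) -> str:
    """Join the words of a MERGED_CELL's child cells in reading order."""
    cells = [
        blocks_by_id[i]
        for rel in merged.get("Relationships", [])
        if rel["Type"] == "CHILD"
        for i in rel["Ids"]
    ]
    # Order child cells row by row, then column by column.
    cells.sort(key=lambda c: (c["RowIndex"], c["ColumnIndex"]))
    words: list[str] = []
    for cell in cells:
        # Blank cells (such as columns 3 and 4 in the example) have no Relationships.
        for rel in cell.get("Relationships", []):
            if rel["Type"] == "CHILD":
                words.extend(
                    blocks_by_id[w]["Text"]
                    for w in rel["Ids"]
                    if blocks_by_id[w]["BlockType"] == "WORD"
                )
    return " ".join(words)
```

For the “Previous Balance” example, this walk would visit the cells in columns 1 through 4 of row 2 and join “Previous” and “Balance”.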