DocumentReaderConfig - Amazon Comprehend

DocumentReaderConfig

Configuration parameters that control how the service reads text from PDF input documents.

Contents

DocumentReadAction

This enum field takes one of two values, both of which apply to PDF documents:

  • TEXTRACT_DETECT_DOCUMENT_TEXT - The service calls the Amazon Textract DetectDocumentText API on each page of a PDF document.

  • TEXTRACT_ANALYZE_DOCUMENT - The service calls the Amazon Textract AnalyzeDocument API on each page of a PDF document.

Type: String

Valid Values: TEXTRACT_DETECT_DOCUMENT_TEXT | TEXTRACT_ANALYZE_DOCUMENT

Required: Yes
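As a minimal sketch, the smallest valid DocumentReaderConfig sets only the required DocumentReadAction field. The surrounding request fragment below is an assumption for illustration: the S3Uri key and the bucket name are placeholders and are not defined on this page.

```python
import json

# Minimal config: only the required DocumentReadAction field is set.
reader_config = {"DocumentReadAction": "TEXTRACT_DETECT_DOCUMENT_TEXT"}

# Hypothetical enclosing input configuration; the S3 URI is a placeholder.
input_data_config = {
    "S3Uri": "s3://example-bucket/input/",
    "DocumentReaderConfig": reader_config,
}

# Show the JSON shape that would be sent in a request body.
print(json.dumps(input_data_config, indent=2))
```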

DocumentReadMode

This enum field takes one of two values:

  • SERVICE_DEFAULT - Use the service defaults for document reading. For digital PDFs, this means using an internal parser instead of the Amazon Textract APIs.

  • FORCE_DOCUMENT_READ_ACTION - Always use the action specified in DocumentReadAction, including for digital PDFs.

Type: String

Valid Values: SERVICE_DEFAULT | FORCE_DOCUMENT_READ_ACTION

Required: No
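The interaction between DocumentReadMode and DocumentReadAction can be modeled roughly as follows. This is an illustrative sketch only, not the service's implementation: the function name effective_reader and the INTERNAL_PARSER label are hypothetical, and the sketch assumes that non-digital (scanned) PDFs always go through the configured read action.

```python
VALID_MODES = {"SERVICE_DEFAULT", "FORCE_DOCUMENT_READ_ACTION"}


def effective_reader(is_digital_pdf: bool, read_action: str, read_mode: str) -> str:
    """Return which reader would handle a PDF under the given settings.

    "INTERNAL_PARSER" is a made-up label for the service's internal parser.
    """
    if read_mode not in VALID_MODES:
        raise ValueError(f"unknown DocumentReadMode: {read_mode!r}")
    # Service defaults bypass Textract for digital PDFs.
    if read_mode == "SERVICE_DEFAULT" and is_digital_pdf:
        return "INTERNAL_PARSER"
    # Assumption: every other case uses the configured read action.
    return read_action


print(effective_reader(True, "TEXTRACT_ANALYZE_DOCUMENT", "SERVICE_DEFAULT"))
print(effective_reader(True, "TEXTRACT_ANALYZE_DOCUMENT", "FORCE_DOCUMENT_READ_ACTION"))
```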

FeatureTypes

Specifies how the text in an input file should be processed. The values correspond to the Amazon Textract AnalyzeDocument feature types.

Type: Array of strings

Array Members: Minimum number of 1 item. Maximum number of 2 items.

Valid Values: TABLES | FORMS

Required: No
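The constraints listed for the three fields can be checked client-side before a request is sent. The helper below is a hypothetical sketch (build_document_reader_config is not part of any AWS SDK); it only enforces the valid values, the required flag, and the array size limits stated on this page.

```python
VALID_ACTIONS = {"TEXTRACT_DETECT_DOCUMENT_TEXT", "TEXTRACT_ANALYZE_DOCUMENT"}
VALID_MODES = {"SERVICE_DEFAULT", "FORCE_DOCUMENT_READ_ACTION"}
VALID_FEATURES = {"TABLES", "FORMS"}


def build_document_reader_config(read_action, read_mode=None, feature_types=None):
    """Build a DocumentReaderConfig dict, raising ValueError on invalid input."""
    # DocumentReadAction is required and must be a listed value.
    if read_action not in VALID_ACTIONS:
        raise ValueError(f"invalid DocumentReadAction: {read_action!r}")
    config = {"DocumentReadAction": read_action}

    # DocumentReadMode is optional.
    if read_mode is not None:
        if read_mode not in VALID_MODES:
            raise ValueError(f"invalid DocumentReadMode: {read_mode!r}")
        config["DocumentReadMode"] = read_mode

    # FeatureTypes is optional; when present it holds 1 or 2 valid items.
    if feature_types is not None:
        features = list(feature_types)
        if not 1 <= len(features) <= 2:
            raise ValueError("FeatureTypes must contain 1 or 2 items")
        unknown = set(features) - VALID_FEATURES
        if unknown:
            raise ValueError(f"invalid FeatureTypes: {sorted(unknown)}")
        config["FeatureTypes"] = features

    return config


config = build_document_reader_config(
    "TEXTRACT_ANALYZE_DOCUMENT",
    read_mode="FORCE_DOCUMENT_READ_ACTION",
    feature_types=["TABLES", "FORMS"],
)
print(config)
```

Omitted optional fields are left out of the resulting dict rather than filled with guessed defaults, since this page does not state what the service assumes when they are absent.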

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following: