InputDataConfig - Amazon Comprehend API Reference

InputDataConfig

The input properties for an inference job. The document reader config field applies only to non-text inputs for custom analysis.

Contents

DocumentReaderConfig

Provides configuration parameters to override the default actions for extracting text from PDF documents and image files.

Type: DocumentReaderConfig object

Required: No
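
As a rough illustration, an override could be written as a plain Python dictionary and attached to the input configuration. The field names and values in this sketch (DocumentReadAction, DocumentReadMode) are taken from the separate DocumentReaderConfig data type rather than from this page, so treat them as assumptions to verify against that reference. The bucket name is a placeholder.

```python
# Assumed keys and values: they come from the separate DocumentReaderConfig
# data type and should be checked against its reference before use.
document_reader_config = {
    "DocumentReadAction": "TEXTRACT_DETECT_DOCUMENT_TEXT",
    "DocumentReadMode": "SERVICE_DEFAULT",
}

# DocumentReaderConfig is optional, so it is only attached to the
# InputDataConfig dictionary when an override is actually wanted.
input_data_config = {"S3Uri": "s3://example-bucket/scans/"}  # placeholder URI
input_data_config["DocumentReaderConfig"] = document_reader_config
```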

InputFormat

Specifies how the text in an input file should be processed:

  • ONE_DOC_PER_FILE - Each file is considered a separate document. Use this option when you are processing large documents, such as newspaper articles or scientific papers.

  • ONE_DOC_PER_LINE - Each line in a file is considered a separate document. Use this option when you are processing many short documents, such as text messages.

Type: String

Valid Values: ONE_DOC_PER_FILE | ONE_DOC_PER_LINE

Required: No
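
The two valid values can be checked on the client side before a job is submitted. The following is a minimal sketch, not part of any SDK: the helper name build_input_data_config and the bucket and key names are hypothetical, and the resulting dictionary simply mirrors the field names on this page.

```python
# Valid values as listed for InputFormat above.
VALID_INPUT_FORMATS = {"ONE_DOC_PER_FILE", "ONE_DOC_PER_LINE"}


def build_input_data_config(s3_uri, input_format=None):
    """Return an InputDataConfig-shaped dict (hypothetical helper)."""
    config = {"S3Uri": s3_uri}  # S3Uri is the only required field
    if input_format is not None:
        if input_format not in VALID_INPUT_FORMATS:
            raise ValueError(f"unsupported InputFormat: {input_format!r}")
        config["InputFormat"] = input_format
    return config


# Many short documents, one per line (placeholder bucket and key).
sms_config = build_input_data_config(
    "s3://example-bucket/messages.txt", "ONE_DOC_PER_LINE"
)
```

Because InputFormat is not required, the helper leaves the key out entirely when no value is given instead of guessing a default.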

S3Uri

The Amazon S3 URI for the input data. The URI must be in the same Region as the API endpoint that you are calling. The URI can point to a single input file or it can provide the prefix for a collection of data files.

For example, suppose you use the URI s3://bucketName/prefix. If the prefix is a single file, Amazon Comprehend uses that file as input. If more than one file begins with the prefix, Amazon Comprehend uses all of them as input.

Type: String

Length Constraints: Maximum length of 1024.

Pattern: s3://[a-z0-9][\.\-a-z0-9]{1,61}[a-z0-9](/.*)?

Required: Yes
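
The length and pattern constraints above can be checked locally, and the prefix behavior can be pictured as a simple string match over object keys. This is a sketch under two assumptions: that the pattern is meant to match the whole URI, and that prefix selection behaves like a plain starts-with test. The function names and sample keys are hypothetical.

```python
import re

# Pattern and length limit copied from the constraints listed above.
S3_URI_PATTERN = re.compile(r"s3://[a-z0-9][\.\-a-z0-9]{1,61}[a-z0-9](/.*)?")
S3_URI_MAX_LENGTH = 1024


def is_valid_s3_uri(uri):
    """Check a URI against the documented length and pattern constraints."""
    # Assumption: the pattern must match the entire URI, hence fullmatch.
    return len(uri) <= S3_URI_MAX_LENGTH and S3_URI_PATTERN.fullmatch(uri) is not None


def keys_selected_by_prefix(prefix, keys):
    """Illustrate prefix semantics: every key beginning with the prefix is input."""
    return [key for key in keys if key.startswith(prefix)]
```

With these helpers, is_valid_s3_uri("s3://example-bucket/prefix") returns True, while a URI with an uppercase scheme or a two-character bucket name returns False because neither matches the pattern as written.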
