DatasetInputDataConfig - Amazon Comprehend API Reference


Specifies the format and location of the input data for the dataset.

Contents

AugmentedManifests

A list of augmented manifest files that provide training data for your custom model. An augmented manifest file is a labeled dataset that is produced by Amazon SageMaker Ground Truth.

Type: Array of DatasetAugmentedManifestsListItem objects

Required: No
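The following is a minimal sketch of one way to assemble the AugmentedManifests list in Python before sending it to the API. The keys S3Uri, AttributeNames, AnnotationDataS3Uri and SourceDocumentsS3Uri come from our reading of the separate DatasetAugmentedManifestsListItem reference, not from this page, so treat them as assumptions. The helper function, the S3 path and the Ground Truth job name are all hypothetical.

```python
from typing import List, Optional


def augmented_manifest_item(
    s3_uri: str,
    attribute_names: List[str],
    annotation_data_s3_uri: Optional[str] = None,
    source_documents_s3_uri: Optional[str] = None,
) -> dict:
    """Build one AugmentedManifests entry (hypothetical helper).

    Key names are assumed from the DatasetAugmentedManifestsListItem
    reference; they are not defined on the DatasetInputDataConfig page.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {s3_uri!r}")
    if not attribute_names:
        raise ValueError("at least one attribute name is required")
    item = {"S3Uri": s3_uri, "AttributeNames": list(attribute_names)}
    # Optional keys are included only when a value is supplied.
    if annotation_data_s3_uri is not None:
        item["AnnotationDataS3Uri"] = annotation_data_s3_uri
    if source_documents_s3_uri is not None:
        item["SourceDocumentsS3Uri"] = source_documents_s3_uri
    return item


# AugmentedManifests pairs with the AUGMENTED_MANIFEST data format.
input_data_config = {
    "DataFormat": "AUGMENTED_MANIFEST",
    "AugmentedManifests": [
        augmented_manifest_item(
            "s3://example-bucket/labels/output.manifest",  # hypothetical path
            ["my-labeling-job"],  # hypothetical Ground Truth job name
        )
    ],
}
```

Validating the URI scheme locally is only a convenience; the service performs its own validation when the request is made.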

DataFormat

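The format of the input data:

COMPREHEND_CSV: The data format is a two-column CSV file, where the first column contains labels and the second column contains documents.

AUGMENTED_MANIFEST: The data format is an augmented manifest file, a labeled dataset produced by Amazon SageMaker Ground Truth.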

Type: String

Valid Values: COMPREHEND_CSV | AUGMENTED_MANIFEST

Required: No
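To make the COMPREHEND_CSV layout concrete, here is a small Python sketch that checks a DataFormat value against the two documented values and reads a COMPREHEND_CSV body into (label, document) pairs. It assumes the file has no header row and exactly two columns per row; the function names and sample rows are hypothetical.

```python
import csv
import io
from typing import List, Tuple

# The two values listed under "Valid Values" for DataFormat.
VALID_DATA_FORMATS = {"COMPREHEND_CSV", "AUGMENTED_MANIFEST"}


def check_data_format(value: str) -> str:
    """Reject anything other than the documented DataFormat values."""
    if value not in VALID_DATA_FORMATS:
        raise ValueError(
            f"DataFormat must be one of {sorted(VALID_DATA_FORMATS)}, got {value!r}"
        )
    return value


def read_comprehend_csv(text: str) -> List[Tuple[str, str]]:
    """Parse COMPREHEND_CSV text into (label, document) pairs.

    Column 1 holds the label and column 2 the document. This sketch
    assumes no header row and exactly two columns per row.
    """
    pairs = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # csv.reader yields [] for blank lines
        if len(row) != 2:
            raise ValueError(f"row {line_no}: expected 2 columns, got {len(row)}")
        label, document = row
        pairs.append((label, document))
    return pairs


# Quoting lets a document contain commas without adding a column.
sample = (
    'SPORTS,"The match went to extra time, then penalties."\n'
    "POLITICS,Parliament passed the bill.\n"
)
pairs = read_comprehend_csv(sample)
```

Using the standard csv module rather than splitting on commas matters here, because documents frequently contain commas and must be quoted.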

DocumentClassifierInputDataConfig

The input properties for training a document classifier model.

For more information on how the input file is formatted, see Preparing training data in the Comprehend Developer Guide.

Type: DatasetDocumentClassifierInputDataConfig object

Required: No
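As a sketch of how the DocumentClassifierInputDataConfig member might be filled in, the Python below builds the nested object for CSV training data. The S3Uri and LabelDelimiter keys, the single-character delimiter rule, and "|" as the multi-label default are assumptions drawn from the related DatasetDocumentClassifierInputDataConfig reference rather than from this page; the helpers and the S3 location are hypothetical.

```python
from typing import List, Optional


def classifier_input_config(s3_uri: str, label_delimiter: Optional[str] = None) -> dict:
    """Build a DocumentClassifierInputDataConfig value (hypothetical helper).

    Keys are assumed from the DatasetDocumentClassifierInputDataConfig
    reference. LabelDelimiter is only relevant for multi-label training data.
    """
    config = {"S3Uri": s3_uri}
    if label_delimiter is not None:
        # Assumption: the service accepts a single-character delimiter.
        if len(label_delimiter) != 1:
            raise ValueError("label delimiter is assumed to be a single character")
        config["LabelDelimiter"] = label_delimiter
    return config


def split_labels(label_field: str, delimiter: str = "|") -> List[str]:
    """Split a multi-label CSV label column; '|' is assumed as the default."""
    return [label.strip() for label in label_field.split(delimiter) if label.strip()]


input_data_config = {
    "DataFormat": "COMPREHEND_CSV",
    "DocumentClassifierInputDataConfig": classifier_input_config(
        "s3://example-bucket/train/docs.csv",  # hypothetical location
        label_delimiter="|",
    ),
}
```

Splitting labels locally with the same delimiter you declare is a cheap way to spot rows whose label column was written with a different separator.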

EntityRecognizerInputDataConfig

The input properties for training an entity recognizer model.

Type: DatasetEntityRecognizerInputDataConfig object

Required: No
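A comparable Python sketch for the EntityRecognizerInputDataConfig member follows. The nested Documents, Annotations and EntityList objects (each carrying an S3Uri), the InputFormat values ONE_DOC_PER_LINE and ONE_DOC_PER_FILE, and the rule that exactly one of annotations or an entity list is supplied are assumptions based on our reading of the DatasetEntityRecognizerInputDataConfig reference, not facts stated on this page. The helper and S3 paths are hypothetical.

```python
from typing import Optional

# Assumed values for the Documents InputFormat field.
INPUT_FORMATS = {"ONE_DOC_PER_LINE", "ONE_DOC_PER_FILE"}


def entity_recognizer_input_config(
    documents_s3_uri: str,
    annotations_s3_uri: Optional[str] = None,
    entity_list_s3_uri: Optional[str] = None,
    input_format: str = "ONE_DOC_PER_LINE",
) -> dict:
    """Build an EntityRecognizerInputDataConfig value (hypothetical helper)."""
    # Assumption to verify: supply annotations OR an entity list, not both.
    if (annotations_s3_uri is None) == (entity_list_s3_uri is None):
        raise ValueError("supply exactly one of annotations_s3_uri or entity_list_s3_uri")
    if input_format not in INPUT_FORMATS:
        raise ValueError(f"unexpected input format {input_format!r}")
    config = {"Documents": {"S3Uri": documents_s3_uri, "InputFormat": input_format}}
    if annotations_s3_uri is not None:
        config["Annotations"] = {"S3Uri": annotations_s3_uri}
    else:
        config["EntityList"] = {"S3Uri": entity_list_s3_uri}
    return config


input_data_config = {
    "DataFormat": "COMPREHEND_CSV",
    "EntityRecognizerInputDataConfig": entity_recognizer_input_config(
        "s3://example-bucket/ner/documents.txt",  # hypothetical path
        entity_list_s3_uri="s3://example-bucket/ner/entity_list.csv",  # hypothetical path
    ),
}
```

Enforcing the either/or choice in one place keeps a request from silently carrying two competing sources of entity labels.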
