Class: Aws::Comprehend::Types::AugmentedManifestsListItem

Inherits:
Struct
  • Object
Defined in:
gems/aws-sdk-comprehend/lib/aws-sdk-comprehend/types.rb

Overview

Note:

When making an API call, you may pass AugmentedManifestsListItem data as a hash:

{
  s3_uri: "S3Uri", # required
  split: "TRAIN", # accepts TRAIN, TEST
  attribute_names: ["AttributeNamesListItem"], # required
  annotation_data_s3_uri: "S3Uri",
  source_documents_s3_uri: "S3Uri",
  document_type: "PLAIN_TEXT_DOCUMENT", # accepts PLAIN_TEXT_DOCUMENT, SEMI_STRUCTURED_DOCUMENT
}

An augmented manifest file that provides training data for your custom model. An augmented manifest file is a labeled dataset that is produced by Amazon SageMaker Ground Truth.
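
As a rough illustration of the hash shape above, the following sketch builds one list item in plain Ruby and checks the two keys marked as required. The helper name `build_augmented_manifest`, the bucket, and the attribute name are hypothetical placeholders and are not part of aws-sdk-comprehend; the SDK performs its own parameter validation.

```ruby
# Hypothetical helper (not part of the SDK): builds an AugmentedManifestsListItem
# hash and checks the keys that the documentation marks as required.
REQUIRED_KEYS = %i[s3_uri attribute_names].freeze

def build_augmented_manifest(**fields)
  # A key counts as missing when it is absent or holds an empty String/Array.
  missing = REQUIRED_KEYS.select { |key| fields[key].nil? || fields[key].empty? }
  raise ArgumentError, "missing required keys: #{missing.join(', ')}" unless missing.empty?

  fields
end

item = build_augmented_manifest(
  s3_uri: "s3://example-bucket/manifests/train.manifest", # placeholder location
  attribute_names: ["my-labeling-job"],                   # placeholder attribute name
  split: "TRAIN"
)
```

The optional keys (`split`, `annotation_data_s3_uri`, `source_documents_s3_uri`, `document_type`) can simply be left out of the hash when their defaults are acceptable.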

Constant Summary

SENSITIVE = []

Instance Attribute Summary

Instance Attribute Details

#annotation_data_s3_uri ⇒ String

The S3 prefix to the annotation files that are referred to in the augmented manifest file.

Returns:

  • (String)


# File 'gems/aws-sdk-comprehend/lib/aws-sdk-comprehend/types.rb', line 89

class AugmentedManifestsListItem < Struct.new(
  :s3_uri,
  :split,
  :attribute_names,
  :annotation_data_s3_uri,
  :source_documents_s3_uri,
  :document_type)
  SENSITIVE = []
  include Aws::Structure
end

#attribute_names ⇒ Array<String>

The JSON attribute that contains the annotations for your training documents. The number of attribute names that you specify depends on whether your augmented manifest file is the output of a single labeling job or a chained labeling job.

If your file is the output of a single labeling job, specify the LabelAttributeName key that was used when the job was created in Ground Truth.

If your file is the output of a chained labeling job, specify the LabelAttributeName key for one or more jobs in the chain. Each LabelAttributeName key provides the annotations from an individual job.
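
To make the single versus chained case concrete, here is a minimal sketch that checks whether the attribute names you plan to pass actually appear as keys in a manifest record. The record shape, the names `job-a` and `job-b`, and the helper `missing_attributes` are assumptions for illustration only; real manifest lines are produced by Ground Truth and may carry additional keys.

```ruby
require "json"

# One illustrative manifest line. "job-a" and "job-b" stand in for the
# LabelAttributeName keys of two jobs in a chain (assumed names).
manifest_line = <<~JSON
  {"source": "Example text.", "job-a": {"annotations": []}, "job-b": {"annotations": []}}
JSON

# Hypothetical check: returns the requested attribute names that are NOT
# present as top-level keys in the parsed record.
def missing_attributes(line, attribute_names)
  record = JSON.parse(line)
  attribute_names.reject { |name| record.key?(name) }
end

single_job  = ["job-a"]           # output of a single labeling job
chained_job = ["job-a", "job-b"]  # one or more jobs from a chain

missing_attributes(manifest_line, single_job)  # => []
missing_attributes(manifest_line, chained_job) # => []
missing_attributes(manifest_line, ["job-c"])   # => ["job-c"]
```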

Returns:

  • (Array<String>)


#document_type ⇒ String

The type of augmented manifest: PLAIN_TEXT_DOCUMENT or SEMI_STRUCTURED_DOCUMENT. If you don't specify a type, the default is PLAIN_TEXT_DOCUMENT.

  • PLAIN_TEXT_DOCUMENT: A document type that represents any Unicode text that is encoded in UTF-8.

  • SEMI_STRUCTURED_DOCUMENT: A document type with positional and structural context, like a PDF. For training with Amazon Comprehend, only PDFs are supported. For inference, Amazon Comprehend supports PDF, DOCX, and TXT files.
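
The default described above can be sketched in plain Ruby as follows. The constant and helper names are hypothetical and only mirror the documented behavior; the actual default is applied by the service, not by this code.

```ruby
# Accepted values as listed in the documentation.
DOCUMENT_TYPES = %w[PLAIN_TEXT_DOCUMENT SEMI_STRUCTURED_DOCUMENT].freeze
DEFAULT_DOCUMENT_TYPE = "PLAIN_TEXT_DOCUMENT"

# Hypothetical helper: returns the document type that applies to an item hash,
# falling back to the documented default when :document_type is not set.
def effective_document_type(item)
  type = item[:document_type] || DEFAULT_DOCUMENT_TYPE
  raise ArgumentError, "unknown document_type: #{type}" unless DOCUMENT_TYPES.include?(type)

  type
end

effective_document_type({})                                            # => "PLAIN_TEXT_DOCUMENT"
effective_document_type({ document_type: "SEMI_STRUCTURED_DOCUMENT" }) # => "SEMI_STRUCTURED_DOCUMENT"
```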

Returns:

  • (String)


#s3_uri ⇒ String

The Amazon S3 location of the augmented manifest file.

Returns:

  • (String)


#source_documents_s3_uri ⇒ String

The S3 prefix to the source files (PDFs) that are referred to in the augmented manifest file.

Returns:

  • (String)


#split ⇒ String

The purpose of the data in the augmented manifest: training or testing. If you don't specify a value, the default is TRAIN.

TRAIN - all of the documents in the manifest will be used for training. If no test documents are provided, Amazon Comprehend will automatically reserve a portion of the training documents for testing.

TEST - all of the documents in the manifest will be used for testing.
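
As a small sketch of how the split values interact, the code below groups a list of manifest item hashes by their effective split and notes whether any TEST manifest is present. The helper `group_by_split`, the bucket, and the attribute name are hypothetical; the grouping only mirrors the rules described above and does not call the service.

```ruby
DEFAULT_SPLIT = "TRAIN" # documented default when :split is omitted

# Hypothetical helper: groups manifest item hashes by their effective split.
def group_by_split(items)
  items.group_by { |item| item[:split] || DEFAULT_SPLIT }
end

manifests = [
  { s3_uri: "s3://example-bucket/a.manifest", attribute_names: ["job-a"] },               # split omitted
  { s3_uri: "s3://example-bucket/b.manifest", attribute_names: ["job-a"], split: "TEST" }
]

groups = group_by_split(manifests)

# Per the description above, when no TEST manifest is supplied, Amazon Comprehend
# reserves a portion of the training documents for testing.
auto_reserved_test_set = !groups.key?("TEST")
```

Here `auto_reserved_test_set` is `false` because one manifest is explicitly marked TEST; dropping that item from the list would make it `true`.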

Returns:

  • (String)

