Comprehend: New-COMPDocumentClassifier Cmdlet

Synopsis

Calls the Amazon Comprehend CreateDocumentClassifier API operation.

Syntax

New-COMPDocumentClassifier
-DocumentClassifierName <String>
-InputDataConfig_AugmentedManifest <AugmentedManifestsListItem[]>
-ClientRequestToken <String>
-DataAccessRoleArn <String>
-InputDataConfig_DataFormat <DocumentClassifierDataFormat>
-DocumentReaderConfig_DocumentReadAction <DocumentReadAction>
-DocumentReaderConfig_DocumentReadMode <DocumentReadMode>
-InputDataConfig_DocumentType <DocumentClassifierDocumentTypeFormat>
-DocumentReaderConfig_FeatureType <String[]>
-OutputDataConfig_FlywheelStatsS3Prefix <String>
-OutputDataConfig_KmsKeyId <String>
-InputDataConfig_LabelDelimiter <String>
-LanguageCode <LanguageCode>
-Mode <DocumentClassifierMode>
-ModelKmsKeyId <String>
-ModelPolicy <String>
-Documents_S3Uri <String>
-InputDataConfig_S3Uri <String>
-OutputDataConfig_S3Uri <String>
-VpcConfig_SecurityGroupId <String[]>
-VpcConfig_Subnet <String[]>
-Tag <Tag[]>
-Documents_TestS3Uri <String>
-InputDataConfig_TestS3Uri <String>
-VersionName <String>
-VolumeKmsKeyId <String>
-Select <String>
-Force <SwitchParameter>
-ClientConfig <AmazonComprehendConfig>

Description

Creates a new document classifier that you can use to categorize documents. To create a classifier, you provide a set of training documents that are labeled with the categories that you want to use. For more information, see Training classifier models in the Comprehend Developer Guide.
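A minimal sketch of a basic multi-class classifier trained from a COMPREHEND_CSV file. The parameters are collected in a hashtable for splatting. The classifier name, bucket, account ID, and role name are hypothetical placeholders, and the IAM role is assumed to already grant Amazon Comprehend read access to the bucket.

```powershell
# Hypothetical names: replace the bucket, prefixes, account ID, and role with your own.
$classifierParams = @{
    DocumentClassifierName     = 'support-ticket-classifier'
    DataAccessRoleArn          = 'arn:aws:iam::111122223333:role/ComprehendDataAccessRole'
    LanguageCode               = 'en'
    # Two-column CSV (label, document); COMPREHEND_CSV is also the default format
    InputDataConfig_DataFormat = 'COMPREHEND_CSV'
    InputDataConfig_S3Uri      = 's3://amzn-s3-demo-bucket/training/train.csv'
    # Prefix under which the service writes output.tar.gz (the confusion matrix)
    OutputDataConfig_S3Uri     = 's3://amzn-s3-demo-bucket/output/'
}
```

Splatting the table, as in New-COMPDocumentClassifier @classifierParams, starts training and, by default, returns the new classifier's ARN as a string. Training runs asynchronously, so the ARN is returned before the model is ready.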

Parameters

-ClientConfig <AmazonComprehendConfig>

Specifies an AmazonComprehendConfig object that configures the underlying Amazon Comprehend service client used by this cmdlet.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-ClientRequestToken <String>

A unique identifier for the request. If you don't set the client request token, Amazon Comprehend generates one.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-DataAccessRoleArn <String>

The Amazon Resource Name (ARN) of the IAM role that grants Amazon Comprehend read access to your input data.

Required?	True
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-DocumentClassifierName <String>

The name of the document classifier.

Required?	True
Position?	1
Accept pipeline input?	True (ByValue, ByPropertyName)

-DocumentReaderConfig_DocumentReadAction <DocumentReadAction>

This field defines the Amazon Textract API operation that Amazon Comprehend uses to extract text from PDF files and image files. Enter one of the following values:

TEXTRACT_DETECT_DOCUMENT_TEXT - The Amazon Comprehend service uses the DetectDocumentText API operation.
TEXTRACT_ANALYZE_DOCUMENT - The Amazon Comprehend service uses the AnalyzeDocument API operation.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	InputDataConfig_DocumentReaderConfig_DocumentReadAction

-DocumentReaderConfig_DocumentReadMode <DocumentReadMode>

Determines the text extraction actions for PDF files. Enter one of the following values:

SERVICE_DEFAULT - Use the Amazon Comprehend service defaults for PDF files.
FORCE_DOCUMENT_READ_ACTION - Amazon Comprehend uses the Textract API specified by DocumentReadAction for all PDF files, including digital PDF files.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	InputDataConfig_DocumentReaderConfig_DocumentReadMode

-DocumentReaderConfig_FeatureType <String[]>

Specifies the type of Amazon Textract features to apply. If you choose TEXTRACT_ANALYZE_DOCUMENT as the read action, you must specify one or both of the following values:

TABLES - Returns additional information about any tables that are detected in the input document.
FORMS - Returns additional information about any forms that are detected in the input document.

Starting with version 4 of the SDK this property will default to null. If no data for this property is returned from the service the property will also be null. This was changed to improve performance and allow the SDK and caller to distinguish between a property not set or a property being empty to clear out a value. To retain the previous SDK behavior set the AWSConfigs.InitializeCollections static property to true.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	InputDataConfig_DocumentReaderConfig_FeatureTypes

-Documents_S3Uri <String>

The Amazon S3 URI of the location that contains the training documents referenced in the CSV file specified by -InputDataConfig_S3Uri.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	InputDataConfig_Documents_S3Uri

-Documents_TestS3Uri <String>

The Amazon S3 URI of the location that contains the test documents referenced in the test CSV file specified by -InputDataConfig_TestS3Uri. This parameter is not required if you do not specify a test CSV file.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	InputDataConfig_Documents_TestS3Uri

-Force <SwitchParameter>

This parameter overrides confirmation prompts to force the cmdlet to continue its operation. This parameter should always be used with caution.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-InputDataConfig_AugmentedManifest <AugmentedManifestsListItem[]>

A list of augmented manifest files that provide training data for your custom model. An augmented manifest file is a labeled dataset that is produced by Amazon SageMaker Ground Truth. This parameter is required if you set -InputDataConfig_DataFormat to AUGMENTED_MANIFEST.

Starting with version 4 of the SDK this property will default to null. If no data for this property is returned from the service the property will also be null. This was changed to improve performance and allow the SDK and caller to distinguish between a property not set or a property being empty to clear out a value. To retain the previous SDK behavior set the AWSConfigs.InitializeCollections static property to true.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	InputDataConfig_AugmentedManifests

-InputDataConfig_DataFormat <DocumentClassifierDataFormat>

The format of your training data:

COMPREHEND_CSV: A two-column CSV file, where labels are provided in the first column and documents are provided in the second. If you use this value, you must provide the -InputDataConfig_S3Uri parameter in your request.
AUGMENTED_MANIFEST: A labeled dataset that is produced by Amazon SageMaker Ground Truth. This file is in JSON Lines format. Each line is a complete JSON object that contains a training document and its associated labels. If you use this value, you must provide the -InputDataConfig_AugmentedManifest parameter in your request.

If you don't specify a value, Amazon Comprehend uses COMPREHEND_CSV as the default.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
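A sketch of preparing COMPREHEND_CSV training lines in PowerShell: label in the first column, document text in the second. It assumes standard RFC 4180 quoting (the document field wrapped in double quotes, embedded quotes doubled) and writes no header row; the helper name ConvertTo-QuotedCsvField and the sample labels are hypothetical.

```powershell
# Hypothetical helper: wrap a CSV field in double quotes and double any embedded quotes.
function ConvertTo-QuotedCsvField {
    param([string]$Value)
    '"' + $Value.Replace('"', '""') + '"'
}

# Hypothetical training examples (one label per document for multi-class mode).
$examples = @(
    [pscustomobject]@{ Label = 'BILLING';  Text = 'I was charged twice, please refund.' }
    [pscustomobject]@{ Label = 'SHIPPING'; Text = 'Where is my "express" order?' }
)

# One CSV line per example: label, then the quoted document text.
$csvLines = foreach ($example in $examples) {
    $example.Label + ',' + (ConvertTo-QuotedCsvField $example.Text)
}
```

The lines can then be written with Set-Content (on PowerShell 7, -Encoding utf8NoBOM produces UTF-8 without a byte-order mark) and uploaded to the location passed to -InputDataConfig_S3Uri, for example with Write-S3Object from the AWS.Tools.S3 module.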

-InputDataConfig_DocumentType <DocumentClassifierDocumentTypeFormat>

The type of input documents for training the model. Provide plain-text documents to create a plain-text model, and provide semi-structured documents to create a native document model.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
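A hedged sketch of the parameters for a native document model. It assumes the training CSV at -InputDataConfig_S3Uri refers to labeled files stored under -Documents_S3Uri; check the Comprehend Developer Guide for the exact CSV layout native models expect. All names are hypothetical, and the DocumentReaderConfig entries are optional overrides rather than requirements.

```powershell
# Hypothetical names throughout. A native model is trained on semi-structured files such as PDFs.
$nativeParams = @{
    DocumentClassifierName                  = 'invoice-classifier'
    DataAccessRoleArn                       = 'arn:aws:iam::111122223333:role/ComprehendDataAccessRole'
    LanguageCode                            = 'en'
    InputDataConfig_DocumentType            = 'SEMI_STRUCTURED_DOCUMENT'
    # CSV that assigns labels to the training documents
    InputDataConfig_S3Uri                   = 's3://amzn-s3-demo-bucket/native/train.csv'
    # Location of the training documents referenced by that CSV
    Documents_S3Uri                         = 's3://amzn-s3-demo-bucket/native/documents/'
    # Optional: use Textract AnalyzeDocument for all PDFs, with table and form features
    DocumentReaderConfig_DocumentReadAction = 'TEXTRACT_ANALYZE_DOCUMENT'
    DocumentReaderConfig_DocumentReadMode   = 'FORCE_DOCUMENT_READ_ACTION'
    DocumentReaderConfig_FeatureType        = @('TABLES', 'FORMS')
    OutputDataConfig_S3Uri                  = 's3://amzn-s3-demo-bucket/native/output/'
}
```

Pass the table with splatting: New-COMPDocumentClassifier @nativeParams.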

-InputDataConfig_LabelDelimiter <String>

Indicates the delimiter used to separate each label for training a multi-label classifier. The default delimiter between labels is a pipe (|). You can use a different character as a delimiter (if it's an allowed character) by specifying it with this parameter. If the training documents use a delimiter other than the default or the delimiter you specify, the labels on that line are combined into a single unique label, such as LABELLABELLABEL.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
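In a multi-label CSV, all labels for one document share the label column, joined with the delimiter. A small sketch of building that column and rejecting labels that already contain the delimiter character, which would make the split ambiguous; the labels are hypothetical, and the check is a local precaution rather than a documented service rule.

```powershell
# Hypothetical labels that all apply to one training document.
$labels = @('BILLING', 'REFUND', 'URGENT')

# The service default; a different character would be passed with -InputDataConfig_LabelDelimiter.
$delimiter = '|'

# A label that itself contains the delimiter could not be split back out unambiguously.
$conflicts = @($labels | Where-Object { $_.Contains($delimiter) })
if ($conflicts.Count -gt 0) {
    throw "Labels contain the delimiter '$delimiter': $($conflicts -join ', ')"
}

# Value for the label column of this document's CSV line.
$labelColumn = $labels -join $delimiter
```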

-InputDataConfig_S3Uri <String>

The Amazon S3 URI for the input data. The S3 bucket must be in the same Region as the API endpoint that you are calling. The URI can point to a single input file or it can provide the prefix for a collection of input files.

For example, if you use the URI s3://bucketName/prefix and the prefix matches a single file, Amazon Comprehend uses that file as input. If more than one file begins with the prefix, Amazon Comprehend uses all of them as input.

This parameter is required if you set -InputDataConfig_DataFormat to COMPREHEND_CSV.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-InputDataConfig_TestS3Uri <String>

This specifies the Amazon S3 location that contains the test annotations for the document classifier. The URI must be in the same Amazon Web Services Region as the API endpoint that you are calling.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-LanguageCode <LanguageCode>

The language of the input documents. You can specify any of the languages supported by Amazon Comprehend. All documents must be in the same language.

Required?	True
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-Mode <DocumentClassifierMode>

Indicates the mode in which the classifier will be trained. The classifier can be trained in multi-class (single-label) mode or multi-label mode. Multi-class mode identifies a single class label for each document and multi-label mode identifies one or more class labels for each document. Multiple labels for an individual document are separated by a delimiter. The default delimiter between labels is a pipe (|).

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
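A minimal sketch of the parameters for a multi-label classifier. The names are hypothetical, and the semicolon delimiter is an assumption chosen for illustration; omit -InputDataConfig_LabelDelimiter entirely if your CSV uses the default pipe.

```powershell
# Hypothetical names. MULTI_LABEL lets each training document carry one or more labels.
$multiLabelParams = @{
    DocumentClassifierName         = 'ticket-topics'
    DataAccessRoleArn              = 'arn:aws:iam::111122223333:role/ComprehendDataAccessRole'
    LanguageCode                   = 'en'
    Mode                           = 'MULTI_LABEL'
    # Only needed when the label column does not use the default pipe (|);
    # ';' is assumed here to be one of the allowed delimiter characters.
    InputDataConfig_LabelDelimiter = ';'
    InputDataConfig_S3Uri          = 's3://amzn-s3-demo-bucket/multilabel/train.csv'
    OutputDataConfig_S3Uri         = 's3://amzn-s3-demo-bucket/multilabel/output/'
}
```

Pass the table with splatting: New-COMPDocumentClassifier @multiLabelParams.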

-ModelKmsKeyId <String>

ID for the KMS key that Amazon Comprehend uses to encrypt trained custom models. The ModelKmsKeyId can be either of the following formats:

KMS Key ID: "1234abcd-12ab-34cd-56ef-1234567890ab"
Amazon Resource Name (ARN) of a KMS Key: "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-ModelPolicy <String>

The resource-based policy to attach to your custom document classifier model. You can use this policy to allow another Amazon Web Services account to import your custom model.

Provide your policy as a JSON body that you enter as a UTF-8 encoded string without line breaks. To provide valid JSON, enclose the attribute names and values in double quotes. If the JSON body is also enclosed in double quotes, then you must escape the double quotes that are inside the policy:

"{\"attribute\": \"value\", \"attribute\": [\"value\"]}"

To avoid escaping quotes, you can use single quotes to enclose the policy and double quotes to enclose the JSON names and values:

'{"attribute": "value", "attribute": ["value"]}'

In PowerShell, the single-quoted form is the simpler choice: PowerShell does not treat \" as an escape inside a double-quoted string (its escape character is the backtick), so the backslash-escaped form above does not work as written.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
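Instead of writing the JSON by hand, the policy can be built as a nested ordered dictionary and serialized with ConvertTo-Json -Compress, which emits a single line. This is a sketch: the importing account ID 444455556666 is a placeholder, and the statement shape (an AWS account principal, the comprehend:ImportModel action, Resource "*") is an assumed minimal import policy; confirm the policy your scenario needs in the Comprehend Developer Guide.

```powershell
# Hypothetical principal: the account that is allowed to import the model.
$importingAccount = 'arn:aws:iam::444455556666:root'

$policy = [ordered]@{
    Version   = '2012-10-17'
    Statement = @(
        [ordered]@{
            Effect    = 'Allow'
            Principal = [ordered]@{ AWS = @($importingAccount) }
            Action    = 'comprehend:ImportModel'
            Resource  = '*'
        }
    )
}

# -Compress keeps the JSON on one line; -Depth 5 is enough for the nested Principal array.
$modelPolicy = ConvertTo-Json -InputObject $policy -Depth 5 -Compress
```

The string can be passed directly as -ModelPolicy $modelPolicy; because it is never re-parsed by a shell, no extra quote escaping is needed.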

-OutputDataConfig_FlywheelStatsS3Prefix <String>

The Amazon S3 prefix for the data lake location of the flywheel statistics.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-OutputDataConfig_KmsKeyId <String>

ID for the Amazon Web Services Key Management Service (KMS) key that Amazon Comprehend uses to encrypt the output results from an analysis job. The KmsKeyId can be one of the following formats:

KMS Key ID: "1234abcd-12ab-34cd-56ef-1234567890ab"
Amazon Resource Name (ARN) of a KMS Key: "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
KMS Key Alias: "alias/ExampleAlias"
ARN of a KMS Key Alias: "arn:aws:kms:us-west-2:111122223333:alias/ExampleAlias"

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
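The four accepted formats can be told apart by their shape. The hypothetical helper Get-KmsKeyIdFormat below is a minimal sketch that matches each documented form with a regular expression; it is not how the service validates keys, and it does not cover every identifier KMS accepts (multi-Region key IDs with the mrk- prefix, for example).

```powershell
# Hypothetical helper: report which documented format a KMS key identifier uses.
function Get-KmsKeyIdFormat {
    param([Parameter(Mandatory)][string]$KeyId)
    switch -Regex ($KeyId) {
        # arn:aws (or a partition such as aws-cn) ... :key/<key id>
        '^arn:aws[a-z-]*:kms:[a-z0-9-]+:\d{12}:key/[0-9a-f-]{36}$' { return 'KeyArn' }
        # ARN of an alias
        '^arn:aws[a-z-]*:kms:[a-z0-9-]+:\d{12}:alias/.+$' { return 'AliasArn' }
        # Bare alias name
        '^alias/.+$' { return 'Alias' }
        # Bare key ID (8-4-4-4-12 hexadecimal groups)
        '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$' { return 'KeyId' }
        default { return 'Unknown' }
    }
}
```

By their descriptions, -ModelKmsKeyId and -VolumeKmsKeyId accept only the KeyId and KeyArn forms, while -OutputDataConfig_KmsKeyId also accepts the two alias forms.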

-OutputDataConfig_S3Uri <String>

Specifies the Amazon S3 location where Amazon Comprehend writes the confusion matrix and other output files when it creates the custom classifier. The URI must be in the same Region as the API endpoint that you are calling. The location is used as the prefix for the actual location of the output file.

When the custom classifier job is finished, the service creates the output file in a directory specific to the job. The S3Uri field contains the location of the output file, called output.tar.gz. It is a compressed archive that contains the confusion matrix.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-Select <String>

Use the -Select parameter to control the cmdlet output. The default value is 'DocumentClassifierArn'. Specifying -Select '*' will result in the cmdlet returning the whole service response (Amazon.Comprehend.Model.CreateDocumentClassifierResponse). Specifying the name of a property of type Amazon.Comprehend.Model.CreateDocumentClassifierResponse will result in that property being returned. Specifying -Select '^ParameterName' will result in the cmdlet returning the selected cmdlet parameter value.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-Tag <Tag[]>

Tags to associate with the document classifier. A tag is a key-value pair that adds metadata to a resource used by Amazon Comprehend. For example, a tag with "Sales" as the key might be added to a resource to indicate its use by the sales department.

Starting with version 4 of the SDK this property will default to null. If no data for this property is returned from the service the property will also be null. This was changed to improve performance and allow the SDK and caller to distinguish between a property not set or a property being empty to clear out a value. To retain the previous SDK behavior set the AWSConfigs.InitializeCollections static property to true.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	Tags
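Tags can be written as hashtables with Key and Value entries. This sketch relies on PowerShell's hashtable-to-object conversion turning each hashtable into an Amazon.Comprehend.Model.Tag when the -Tag parameter is bound; the tag keys and values are hypothetical.

```powershell
# Hypothetical tags; each hashtable supplies the Key and Value properties of one Tag.
$tagParams = @{
    Tag = @(
        @{ Key = 'Department'; Value = 'Sales' }
        @{ Key = 'Environment'; Value = 'Test' }
    )
}
```

The table can be splatted alongside other parameter tables, or the same array can be passed directly to -Tag.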

-VersionName <String>

The version name given to the newly created classifier. Version names can have a maximum of 256 characters. Alphanumeric characters, hyphens (-), and underscores (_) are allowed. The version name must be unique among all models with the same classifier name in the same Amazon Web Services account and Region.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
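A local pre-check can catch obviously invalid version names before a request is sent. The hypothetical helper Test-ClassifierVersionName below applies only the character and length rules stated above; the service's own validation remains authoritative, and uniqueness can only be checked by the service.

```powershell
# Hypothetical helper: checks a version name against the documented rules
# (at most 256 characters; letters, digits, hyphens, and underscores only).
function Test-ClassifierVersionName {
    param([Parameter(Mandatory)][string]$Name)
    ($Name.Length -le 256) -and ($Name -cmatch '^[A-Za-z0-9_-]+$')
}
```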

-VolumeKmsKeyId <String>

ID for the Amazon Web Services Key Management Service (KMS) key that Amazon Comprehend uses to encrypt data on the storage volume attached to the ML compute instance(s) that process the analysis job. The VolumeKmsKeyId can be either of the following formats:

KMS Key ID: "1234abcd-12ab-34cd-56ef-1234567890ab"
Amazon Resource Name (ARN) of a KMS Key: "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-VpcConfig_SecurityGroupId <String[]>

The ID of a security group on an instance of your private VPC. Security groups act as a virtual firewall that controls inbound and outbound traffic and provide security for the resources that you’ll be accessing in the VPC. The ID is preceded by "sg-", for instance: "sg-03b388029b0a285ea". For more information, see Security Groups for your VPC.

Starting with version 4 of the SDK this property will default to null. If no data for this property is returned from the service the property will also be null. This was changed to improve performance and allow the SDK and caller to distinguish between a property not set or a property being empty to clear out a value. To retain the previous SDK behavior set the AWSConfigs.InitializeCollections static property to true.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	VpcConfig_SecurityGroupIds

-VpcConfig_Subnet <String[]>

The ID for each subnet being used in your private VPC. A subnet is a subset of the range of IPv4 addresses used by the VPC and is specific to a given Availability Zone in the VPC’s Region. The ID is preceded by "subnet-", for instance: "subnet-04ccf456919e69055". For more information, see VPCs and Subnets.

Starting with version 4 of the SDK this property will default to null. If no data for this property is returned from the service the property will also be null. This was changed to improve performance and allow the SDK and caller to distinguish between a property not set or a property being empty to clear out a value. To retain the previous SDK behavior set the AWSConfigs.InitializeCollections static property to true.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	VpcConfig_Subnets
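Both VPC parameters take arrays, so several security groups or subnets can be supplied. A minimal sketch with a local check on the ID prefixes described above; the first IDs come from the examples in the descriptions, and the second subnet ID is hypothetical.

```powershell
# Security group and subnet IDs for the VPC that the training job should use.
$vpcParams = @{
    VpcConfig_SecurityGroupId = @('sg-03b388029b0a285ea')
    VpcConfig_Subnet          = @('subnet-04ccf456919e69055', 'subnet-0a1b2c3d4e5f67890')
}

# Collect any IDs that do not carry the expected prefix.
$badIds = @(
    $vpcParams.VpcConfig_SecurityGroupId | Where-Object { $_ -notlike 'sg-*' }
    $vpcParams.VpcConfig_Subnet          | Where-Object { $_ -notlike 'subnet-*' }
)
if ($badIds.Count -gt 0) { throw "Unexpected VPC resource IDs: $($badIds -join ', ')" }
```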

Common Credential and Region Parameters

-AccessKey <String>

The AWS access key for the user account. This can be a temporary access key if the corresponding session token is supplied to the -SessionToken parameter.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	AK

-Credential <AWSCredentials>

An AWSCredentials object instance containing access and secret key information, and optionally a token for session-based credentials.

Required?	False
Position?	Named
Accept pipeline input?	True (ByValue, ByPropertyName)

-EndpointUrl <String>

The endpoint to make the call against.

Note: This parameter is primarily for internal AWS use and should not be specified for normal usage. The cmdlets normally determine which endpoint to call based on the region specified to the -Region parameter or set as default in the shell (via Set-DefaultAWSRegion). Only specify this parameter if you must direct the call to a specific custom endpoint.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-NetworkCredential <PSCredential>

Used with SAML-based authentication when ProfileName references a SAML role profile. Contains the network credentials to be supplied during authentication with the configured identity provider's endpoint. This parameter is not required if the user's default network identity can or should be used during authentication.

Required?	False
Position?	Named
Accept pipeline input?	True (ByValue, ByPropertyName)

-ProfileLocation <String>

Used to specify the name and location of the ini-format credential file (shared with the AWS CLI and other AWS SDKs).

If this optional parameter is omitted, this cmdlet searches the encrypted credential file used by the AWS SDK for .NET and AWS Toolkit for Visual Studio first. If the profile is not found, the cmdlet searches the ini-format credential file at the default location: (user's home directory)\.aws\credentials.

If this parameter is specified, this cmdlet searches only the ini-format credential file at the location given.

Because the current folder can vary in a shell or during script execution, you should specify a fully qualified path instead of a relative path.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	AWSProfilesLocation, ProfilesLocation

-ProfileName <String>

The user-defined name of an AWS credentials or SAML-based role profile containing credential information. The profile is expected to be found in the secure credential file shared with the AWS SDK for .NET and AWS Toolkit for Visual Studio. You can also specify the name of a profile stored in the .ini-format credential file used with the AWS CLI and other AWS SDKs.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	StoredCredentials, AWSProfileName

-Region <Object>

The system name of an AWS region or an AWSRegion instance. This governs the endpoint that will be used when calling service operations. Note that the AWS resources referenced in a call are usually region-specific.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	RegionToCall

-SecretKey <String>

The AWS secret key for the user account. This can be a temporary secret key if the corresponding session token is supplied to the -SessionToken parameter.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	SK, SecretAccessKey

-SessionToken <String>

The session token if the access and secret keys are temporary session-based credentials.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	ST
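The credential and Region parameters can be kept in their own splatting table. A minimal sketch with a hypothetical profile name and Region, building the fully qualified -ProfileLocation path recommended above from the home directory.

```powershell
# Hypothetical profile name and Region.
$commonParams = @{
    ProfileName     = 'comprehend-admin'
    # Fully qualified path to the shared ini-format credential file
    ProfileLocation = [System.IO.Path]::Combine($HOME, '.aws', 'credentials')
    Region          = 'us-west-2'
}
```

PowerShell accepts more than one splatted table in a single command, so this table can be combined with a hypothetical table of cmdlet parameters, as in New-COMPDocumentClassifier @classifierParams @commonParams, provided the two tables do not name the same parameter.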

Outputs

System.String or Amazon.Comprehend.Model.CreateDocumentClassifierResponse

By default, this cmdlet returns a System.String object containing the DocumentClassifierArn. The full service response (type Amazon.Comprehend.Model.CreateDocumentClassifierResponse) can be returned by specifying -Select '*'.

Amazon Comprehend
Available in AWS.Tools.Comprehend, AWSPowerShell.NetCore and AWSPowerShell