Comprehend: New-COMPDataset Cmdlet | AWS Tools for PowerShell

Synopsis

Calls the Amazon Comprehend CreateDataset API operation.

Syntax

New-COMPDataset
-FlywheelArn <String>
-InputDataConfig_AugmentedManifest <DatasetAugmentedManifestsListItem[]>
-ClientRequestToken <String>
-InputDataConfig_DataFormat <DatasetDataFormat>
-DatasetName <String>
-DatasetType <DatasetType>
-Description <String>
-Documents_InputFormat <InputFormat>
-DocumentClassifierInputDataConfig_LabelDelimiter <String>
-DocumentClassifierInputDataConfig_S3Uri <String>
-Annotations_S3Uri <String>
-Documents_S3Uri <String>
-EntityList_S3Uri <String>
-Tag <Tag[]>
-Select <String>
-Force <SwitchParameter>
-ClientConfig <AmazonComprehendConfig>

Description

Creates a dataset to upload training or test data for a model associated with a flywheel. For more information about datasets, see Flywheel overview in the Amazon Comprehend Developer Guide.
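
As a minimal sketch of the call described above, the following creates a training dataset for a document classifier flywheel from CSV input. The flywheel ARN, dataset name, and bucket path are placeholders (assumptions, not values from this page), and the example assumes the AWS.Tools.Comprehend module is installed and default credentials and a Region are already configured.

```powershell
# Placeholder values: replace the ARN, name, and S3 prefix with your own.
$params = @{
    FlywheelArn                             = 'arn:aws:comprehend:us-east-1:111122223333:flywheel/example-flywheel'
    DatasetName                             = 'example-training-data'
    DatasetType                             = 'TRAIN'
    Description                             = 'Training documents for the example flywheel'
    InputDataConfig_DataFormat              = 'COMPREHEND_CSV'
    DocumentClassifierInputDataConfig_S3Uri = 's3://amzn-s3-demo-bucket/classifier/train/'
}

# With the default -Select value, the cmdlet returns the new dataset's ARN as a string.
$datasetArn = New-COMPDataset @params
$datasetArn
```

Splatting the parameters from a hashtable keeps the long parameter names readable; passing them inline works the same way.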

Parameters

-Annotations_S3Uri <String>

Specifies the Amazon S3 location where the training documents for an entity recognizer are located. The URI must be in the same Region as the API endpoint that you are calling.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	InputDataConfig_EntityRecognizerInputDataConfig_Annotations_S3Uri

-ClientConfig <AmazonComprehendConfig>

Amazon.PowerShell.Cmdlets.COMP.AmazonComprehendClientCmdlet.ClientConfig

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-ClientRequestToken <String>

A unique identifier for the request. If you don't set the client request token, Amazon Comprehend generates one.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
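
If you prefer to supply the token yourself rather than let Amazon Comprehend generate one, a GUID is one simple way to produce a unique value. This is a sketch only: the ARN, dataset name, and bucket path are placeholders, and using a GUID is an illustrative choice, not a requirement stated on this page.

```powershell
# Generate a unique token once and keep it in a variable so it can be logged
# or reused if the same request has to be sent again.
$token = [guid]::NewGuid().ToString()

New-COMPDataset `
    -FlywheelArn 'arn:aws:comprehend:us-east-1:111122223333:flywheel/example-flywheel' `
    -DatasetName 'example-training-data' `
    -DocumentClassifierInputDataConfig_S3Uri 's3://amzn-s3-demo-bucket/classifier/train/' `
    -ClientRequestToken $token
```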

-DatasetName <String>

Name of the dataset.

Required?	True
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-DatasetType <DatasetType>

The dataset type. You can specify that the data in a dataset is for training the model or for testing the model.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-Description <String>

Description of the dataset.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-DocumentClassifierInputDataConfig_LabelDelimiter <String>

Indicates the delimiter used to separate each label for training a multi-label classifier. The default delimiter between labels is a pipe (|). You can use a different character as a delimiter (if it's an allowed character) by specifying it under Delimiter for labels. If the training documents use a delimiter other than the default or the delimiter you specify, the labels on that line will be combined to make a single unique label, such as LABELLABELLABEL.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	InputDataConfig_DocumentClassifierInputDataConfig_LabelDelimiter

-DocumentClassifierInputDataConfig_S3Uri <String>

The Amazon S3 URI for the input data. The S3 bucket must be in the same Region as the API endpoint that you are calling. The URI can point to a single input file or it can provide the prefix for a collection of input files. For example, if you use the URI S3://bucketName/prefix and the prefix is a single file, Amazon Comprehend uses that file as input. If more than one file begins with the prefix, Amazon Comprehend uses all of them as input. This parameter is required if you set DataFormat to COMPREHEND_CSV.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	InputDataConfig_DocumentClassifierInputDataConfig_S3Uri

-Documents_InputFormat <InputFormat>

Specifies how the text in an input file should be processed. This is optional, and the default is ONE_DOC_PER_LINE.
ONE_DOC_PER_FILE - Each file is considered a separate document. Use this option when you are processing large documents, such as newspaper articles or scientific papers.
ONE_DOC_PER_LINE - Each line in a file is considered a separate document. Use this option when you are processing many short documents, such as text messages.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	InputDataConfig_EntityRecognizerInputDataConfig_Documents_InputFormat

-Documents_S3Uri <String>

Specifies the Amazon S3 location where the documents for the dataset are located.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	InputDataConfig_EntityRecognizerInputDataConfig_Documents_S3Uri

-EntityList_S3Uri <String>

Specifies the Amazon S3 location where the entity list is located.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	InputDataConfig_EntityRecognizerInputDataConfig_EntityList_S3Uri
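
The entity recognizer parameters (-Documents_S3Uri, -Documents_InputFormat, -Annotations_S3Uri, and -EntityList_S3Uri) can be combined as in the following sketch, which supplies documents together with an annotations file. All ARNs, names, and S3 paths are placeholders, and the choice of annotations rather than an entity list is only for illustration.

```powershell
# Placeholder values: replace with your own flywheel ARN and S3 locations.
$params = @{
    FlywheelArn                = 'arn:aws:comprehend:us-east-1:111122223333:flywheel/example-ner-flywheel'
    DatasetName                = 'example-entity-test-data'
    DatasetType                = 'TEST'
    InputDataConfig_DataFormat = 'COMPREHEND_CSV'
    Documents_S3Uri            = 's3://amzn-s3-demo-bucket/entities/documents/'
    Documents_InputFormat      = 'ONE_DOC_PER_LINE'   # each line of each file is one document
    Annotations_S3Uri          = 's3://amzn-s3-demo-bucket/entities/annotations.csv'
}

New-COMPDataset @params
```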

-FlywheelArn <String>

The Amazon Resource Name (ARN) of the flywheel to receive the data.

Required?	True
Position?	1
Accept pipeline input?	True (ByValue, ByPropertyName)
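
Because -FlywheelArn is at position 1 and accepts pipeline input by value, the ARN can be passed positionally or piped in. The sketch below shows both forms; the ARN, dataset names, and bucket path are placeholders.

```powershell
$flywheelArn = 'arn:aws:comprehend:us-east-1:111122223333:flywheel/example-flywheel'
$s3Uri       = 's3://amzn-s3-demo-bucket/classifier/train/'

# Positional: the first unnamed argument binds to -FlywheelArn.
New-COMPDataset $flywheelArn -DatasetName 'example-dataset-a' -DocumentClassifierInputDataConfig_S3Uri $s3Uri

# Pipeline (ByValue): the piped string binds to -FlywheelArn.
$flywheelArn | New-COMPDataset -DatasetName 'example-dataset-b' -DocumentClassifierInputDataConfig_S3Uri $s3Uri
```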

-Force <SwitchParameter>

This parameter overrides confirmation prompts to force the cmdlet to continue its operation. This parameter should always be used with caution.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-InputDataConfig_AugmentedManifest <DatasetAugmentedManifestsListItem[]>

A list of augmented manifest files that provide training data for your custom model. An augmented manifest file is a labeled dataset that is produced by Amazon SageMaker Ground Truth. Starting with version 4 of the SDK this property will default to null. If no data for this property is returned from the service the property will also be null. This was changed to improve performance and allow the SDK and caller to distinguish between a property not set or a property being empty to clear out a value. To retain the previous SDK behavior set the AWSConfigs.InitializeCollections static property to true.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	InputDataConfig_AugmentedManifests
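
A rough sketch of building one augmented manifest item and passing it to the cmdlet follows. It assumes the AWS.Tools.Comprehend module is imported so that the Amazon.Comprehend.Model types are loaded; the S3 path and the attribute name are placeholders, and the S3Uri and AttributeNames properties are the ones this sketch assumes are needed for a basic manifest.

```powershell
Import-Module AWS.Tools.Comprehend

# Build the manifest item. The attribute name is a placeholder for the
# label attribute written by your Ground Truth labeling job.
$manifest = New-Object Amazon.Comprehend.Model.DatasetAugmentedManifestsListItem
$manifest.S3Uri = 's3://amzn-s3-demo-bucket/manifests/output.manifest'

# Collections default to null in SDK version 4, so create the list explicitly.
$attributeNames = New-Object 'System.Collections.Generic.List[string]'
$attributeNames.Add('example-labeling-job')
$manifest.AttributeNames = $attributeNames

New-COMPDataset `
    -FlywheelArn 'arn:aws:comprehend:us-east-1:111122223333:flywheel/example-flywheel' `
    -DatasetName 'example-manifest-data' `
    -InputDataConfig_DataFormat 'AUGMENTED_MANIFEST' `
    -InputDataConfig_AugmentedManifest $manifest
```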

-InputDataConfig_DataFormat <DatasetDataFormat>

COMPREHEND_CSV: The data format is a two-column CSV file, where the first column contains labels and the second column contains documents.
AUGMENTED_MANIFEST: The data format is a labeled dataset that is produced by Amazon SageMaker Ground Truth.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-Select <String>

Use the -Select parameter to control the cmdlet output. The default value is 'DatasetArn'. Specifying -Select '*' will result in the cmdlet returning the whole service response (Amazon.Comprehend.Model.CreateDatasetResponse). Specifying the name of a property of type Amazon.Comprehend.Model.CreateDatasetResponse will result in that property being returned. Specifying -Select '^ParameterName' will result in the cmdlet returning the selected cmdlet parameter value.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
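
The three -Select forms described above can be illustrated as follows. This is a sketch with placeholder ARN, dataset names, and S3 path; each call creates a separate dataset, so in practice you would pick one form.

```powershell
$common = @{
    FlywheelArn                             = 'arn:aws:comprehend:us-east-1:111122223333:flywheel/example-flywheel'
    DocumentClassifierInputDataConfig_S3Uri = 's3://amzn-s3-demo-bucket/classifier/train/'
}

# Default ('DatasetArn'): returns the ARN string.
$arn = New-COMPDataset @common -DatasetName 'example-select-default'

# '*': returns the whole CreateDatasetResponse object.
$response = New-COMPDataset @common -DatasetName 'example-select-all' -Select '*'
$response.DatasetArn

# '^ParameterName': returns the value passed to that cmdlet parameter.
$name = New-COMPDataset @common -DatasetName 'example-select-param' -Select '^DatasetName'
```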

-Tag <Tag[]>

Tags for the dataset. Starting with version 4 of the SDK this property will default to null. If no data for this property is returned from the service the property will also be null. This was changed to improve performance and allow the SDK and caller to distinguish between a property not set or a property being empty to clear out a value. To retain the previous SDK behavior set the AWSConfigs.InitializeCollections static property to true.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	Tags
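
Tags are passed as Amazon.Comprehend.Model.Tag objects. The following sketch builds a single tag; the key, value, ARN, dataset name, and S3 path are placeholders, and it assumes the AWS.Tools.Comprehend module is imported so the type is available.

```powershell
Import-Module AWS.Tools.Comprehend

# One tag with placeholder key and value.
$tag = New-Object Amazon.Comprehend.Model.Tag
$tag.Key   = 'Project'
$tag.Value = 'example-project'

New-COMPDataset `
    -FlywheelArn 'arn:aws:comprehend:us-east-1:111122223333:flywheel/example-flywheel' `
    -DatasetName 'example-tagged-data' `
    -DocumentClassifierInputDataConfig_S3Uri 's3://amzn-s3-demo-bucket/classifier/train/' `
    -Tag $tag
```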

Common Credential and Region Parameters

-AccessKey <String>

The AWS access key for the user account. This can be a temporary access key if the corresponding session token is supplied to the -SessionToken parameter.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	AK

-Credential <AWSCredentials>

An AWSCredentials object instance containing access and secret key information, and optionally a token for session-based credentials.

Required?	False
Position?	Named
Accept pipeline input?	True (ByValue, ByPropertyName)

-EndpointUrl <String>

The endpoint to make the call against. Note: This parameter is primarily for internal AWS use and should not be specified for normal usage. The cmdlets normally determine which endpoint to call based on the region specified to the -Region parameter or set as default in the shell (via Set-DefaultAWSRegion). Only specify this parameter if you must direct the call to a specific custom endpoint.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)

-NetworkCredential <PSCredential>

Used with SAML-based authentication when ProfileName references a SAML role profile. Contains the network credentials to be supplied during authentication with the configured identity provider's endpoint. This parameter is not required if the user's default network identity can or should be used during authentication.

Required?	False
Position?	Named
Accept pipeline input?	True (ByValue, ByPropertyName)

-ProfileLocation <String>

Used to specify the name and location of the ini-format credential file (shared with the AWS CLI and other AWS SDKs). If this optional parameter is omitted, this cmdlet will search the encrypted credential file used by the AWS SDK for .NET and AWS Toolkit for Visual Studio first. If the profile is not found, the cmdlet will search in the ini-format credential file at the default location: (user's home directory)\.aws\credentials. If this parameter is specified, this cmdlet will only search the ini-format credential file at the location given. As the current folder can vary in a shell or during script execution, it is advised that you specify a fully qualified path instead of a relative path.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	AWSProfilesLocation, ProfilesLocation

-ProfileName <String>

The user-defined name of an AWS credentials or SAML-based role profile containing credential information. The profile is expected to be found in the secure credential file shared with the AWS SDK for .NET and AWS Toolkit for Visual Studio. You can also specify the name of a profile stored in the .ini-format credential file used with the AWS CLI and other AWS SDKs.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	StoredCredentials, AWSProfileName

-Region <Object>

The system name of an AWS region or an AWSRegion instance. This governs the endpoint that will be used when calling service operations. Note that the AWS resources referenced in a call are usually region-specific.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	RegionToCall

-SecretKey <String>

The AWS secret key for the user account. This can be a temporary secret key if the corresponding session token is supplied to the -SessionToken parameter.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	SK, SecretAccessKey

-SessionToken <String>

The session token if the access and secret keys are temporary session-based credentials.

Required?	False
Position?	Named
Accept pipeline input?	True (ByPropertyName)
Aliases	ST
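
The common parameters above are typically combined per call, as in this sketch that selects a stored profile and a Region explicitly. The profile name, Region, ARN, dataset name, and S3 path are all placeholders; the profile is assumed to exist already in one of the credential files described under -ProfileName.

```powershell
# Use a named credential profile and an explicit Region for this call only.
New-COMPDataset `
    -FlywheelArn 'arn:aws:comprehend:us-west-2:111122223333:flywheel/example-flywheel' `
    -DatasetName 'example-training-data' `
    -DocumentClassifierInputDataConfig_S3Uri 's3://amzn-s3-demo-bucket/classifier/train/' `
    -ProfileName 'example-profile' `
    -Region 'us-west-2'
```

The Region passed here should match the Region of the flywheel and of the S3 bucket, since the S3 URI parameters require the bucket to be in the same Region as the endpoint being called.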

Outputs

System.String or Amazon.Comprehend.Model.CreateDatasetResponse

This cmdlet returns a System.String object. The service call response (type Amazon.Comprehend.Model.CreateDatasetResponse) can be returned by specifying '-Select *'.

Amazon Comprehend
Available in AWS.Tools.Comprehend, AWSPowerShell.NetCore and AWSPowerShell
