AWS Tools for Windows PowerShell
Command Reference

Synopsis

Calls the AWS Glue CreateMLTransform API operation.

Syntax

New-GLUEMLTransform
-Name <String>
-FindMatchesParameters_AccuracyCostTradeoff <Double>
-Description <String>
-FindMatchesParameters_EnforceProvidedLabel <Boolean>
-GlueVersion <String>
-InputRecordTable <GlueTable[]>
-MaxCapacity <Double>
-MaxRetry <Int32>
-NumberOfWorker <Int32>
-FindMatchesParameters_PrecisionRecallTradeoff <Double>
-FindMatchesParameters_PrimaryKeyColumnName <String>
-Role <String>
-Timeout <Int32>
-Parameters_TransformType <TransformType>
-WorkerType <WorkerType>
-Select <String>
-PassThru <SwitchParameter>
-Force <SwitchParameter>

Description

Creates an AWS Glue machine learning transform. This operation creates the transform and all the necessary parameters to train it. Call this operation as the first step in the process of using a machine learning transform (such as the FindMatches transform) for deduplicating data. You can provide an optional Description, in addition to the parameters that you want to use for your algorithm. You must also specify certain parameters for the tasks that AWS Glue runs on your behalf as part of learning from your data and creating a high-quality machine learning transform. These parameters include Role, and optionally, AllocatedCapacity, Timeout, and MaxRetries. For more information, see Jobs.
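
The following is a minimal sketch of a typical call, not a definitive recipe. The transform name, description, and primary key column are placeholders invented for illustration, and the sketch assumes that a module providing New-GLUEMLTransform is installed and that credentials are configured. The call is wrapped in a hypothetical function so that loading the script sends nothing to AWS until the function is invoked.

```powershell
# Hypothetical helper: wraps the call so that loading this script sends nothing to AWS.
function New-ExampleFindMatchesTransform {
    param(
        [Parameter(Mandatory)] [string] $RoleArn,       # placeholder IAM role name or ARN
        [Parameter(Mandatory)] [string] $DatabaseName,  # placeholder Data Catalog database
        [Parameter(Mandatory)] [string] $TableName      # placeholder Data Catalog table
    )

    # GlueTable is the element type expected by -InputRecordTable.
    $table = New-Object Amazon.Glue.Model.GlueTable
    $table.DatabaseName = $DatabaseName
    $table.TableName    = $TableName

    # Splatting keeps the long parameter list readable.
    $transformArgs = @{
        Name                                       = 'example-dedupe-transform'
        Description                                = 'Example FindMatches transform'
        Role                                       = $RoleArn
        InputRecordTable                           = @($table)
        Parameters_TransformType                   = 'FIND_MATCHES'
        FindMatchesParameters_PrimaryKeyColumnName = 'id'
        GlueVersion                                = '1.0'
        MaxCapacity                                = 10
        Timeout                                    = 2880
    }

    # By default the cmdlet returns the TransformId as a string.
    New-GLUEMLTransform @transformArgs
}
```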

Parameters

-Description <String>
A description of the machine learning transform that is being defined. The default is an empty string.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
-FindMatchesParameters_AccuracyCostTradeoff <Double>
The value that is selected when tuning your transform for a balance between accuracy and cost. A value of 0.5 means that the system balances accuracy and cost concerns. A value of 1.0 means a bias purely for accuracy, which typically results in a higher cost, sometimes substantially higher. A value of 0.0 means a bias purely for cost, which results in a less accurate FindMatches transform, sometimes with unacceptable accuracy. Accuracy measures how well the transform finds true positives and true negatives. Increasing accuracy requires more machine resources and cost, but it also results in increased recall. Cost measures how many compute resources, and thus money, are consumed to run the transform.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases: Parameters_FindMatchesParameters_AccuracyCostTradeoff
-FindMatchesParameters_EnforceProvidedLabel <Boolean>
The value to switch on or off to force the output to match the labels provided by users. If the value is True, the find matches transform forces the output to match the provided labels. The results override the normal conflation results. If the value is False, the find matches transform does not ensure that all the provided labels are respected, and the results rely on the trained model. Note that setting this value to True may increase the conflation execution time.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases: Parameters_FindMatchesParameters_EnforceProvidedLabels
-FindMatchesParameters_PrecisionRecallTradeoff <Double>
The value selected when tuning your transform for a balance between precision and recall. A value of 0.5 means no preference, a value of 1.0 means a bias purely for precision, and a value of 0.0 means a bias purely for recall. Because this is a tradeoff, choosing values close to 1.0 results in very low recall, and choosing values close to 0.0 results in very low precision. The precision metric indicates how often your model is correct when it predicts a match. The recall metric indicates how often your model predicts a match when there is an actual match.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases: Parameters_FindMatchesParameters_PrecisionRecallTradeoff
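
To make the two metrics concrete, the sketch below computes precision and recall from counts of true positives, false positives, and false negatives. The function name and the sample counts are hypothetical and are not part of the cmdlet; the formulas are the standard definitions and are assumed to correspond to the description of this parameter.

```powershell
# Hypothetical helper, not part of AWS Tools for PowerShell.
function Get-MatchQuality {
    param(
        [int] $TruePositive,   # predicted matches that are real matches
        [int] $FalsePositive,  # predicted matches that are not real matches
        [int] $FalseNegative   # real matches that the model missed
    )
    [pscustomobject]@{
        # How often the model is correct when it predicts a match.
        Precision = $TruePositive / ($TruePositive + $FalsePositive)
        # How often the model predicts a match when there is an actual match.
        Recall    = $TruePositive / ($TruePositive + $FalseNegative)
    }
}

# Made-up counts: 80 correct matches, 20 wrong matches, 120 missed matches.
$quality = Get-MatchQuality -TruePositive 80 -FalsePositive 20 -FalseNegative 120
$quality.Precision   # 80 / 100
$quality.Recall      # 80 / 200
```

High precision combined with low recall, as in these made-up counts, is the kind of result that values of PrecisionRecallTradeoff close to 1.0 favor.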
-FindMatchesParameters_PrimaryKeyColumnName <String>
The name of a column that uniquely identifies rows in the source table. Used to help identify matching records.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases: Parameters_FindMatchesParameters_PrimaryKeyColumnName
-Force <SwitchParameter>
This parameter overrides confirmation prompts to force the cmdlet to continue its operation. This parameter should always be used with caution.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
-GlueVersion <String>
This value determines which version of AWS Glue this machine learning transform is compatible with. Glue 1.0 is recommended for most customers. If the value is not set, the Glue compatibility defaults to Glue 0.9. For more information, see AWS Glue Versions in the developer guide.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
-InputRecordTable <GlueTable[]>
A list of AWS Glue table definitions used by the transform.
Required? True
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases: InputRecordTables
-MaxCapacity <Double>
The number of AWS Glue data processing units (DPUs) that are allocated to task runs for this transform. You can allocate from 2 to 100 DPUs; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page. MaxCapacity is mutually exclusive with NumberOfWorkers and WorkerType.
  • If either NumberOfWorkers or WorkerType is set, then MaxCapacity cannot be set.
  • If MaxCapacity is set, then neither NumberOfWorkers nor WorkerType can be set.
  • If WorkerType is set, then NumberOfWorkers is required (and vice versa).
  • MaxCapacity and NumberOfWorkers must both be at least 1.
When the WorkerType field is set to a value other than Standard, the MaxCapacity field is set automatically and becomes read-only.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
-MaxRetry <Int32>
The maximum number of times to retry a task for this transform after a task run fails.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases: MaxRetries
-Name <String>
The unique name that you give the transform when you create it.
Required? True
Position? 1
Accept pipeline input? True (ByValue, ByPropertyName)
-NumberOfWorker <Int32>
The number of workers of a defined WorkerType that are allocated when this task runs. If WorkerType is set, then NumberOfWorkers is required (and vice versa).
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases: NumberOfWorkers
-Parameters_TransformType <TransformType>
The type of machine learning transform. For information about the types of machine learning transforms, see Creating Machine Learning Transforms.
Required? True
Position? Named
Accept pipeline input? True (ByPropertyName)
-PassThru <SwitchParameter>
Changes the cmdlet behavior to return the value passed to the Name parameter. The -PassThru parameter is deprecated; use -Select '^Name' instead. This parameter will be removed in a future version.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
-Role <String>
The name or Amazon Resource Name (ARN) of the IAM role with the required permissions. The required permissions include both AWS Glue service role permissions to AWS Glue resources, and Amazon S3 permissions required by the transform.
  • This role needs AWS Glue service role permissions to allow access to resources in AWS Glue. See Attach a Policy to IAM Users That Access AWS Glue.
  • This role needs permission to your Amazon Simple Storage Service (Amazon S3) sources, targets, temporary directory, scripts, and any libraries used by the task run for this transform.
Required? True
Position? Named
Accept pipeline input? True (ByPropertyName)
-Select <String>
Use the -Select parameter to control the cmdlet output. The default value is 'TransformId'. Specifying -Select '*' will result in the cmdlet returning the whole service response (Amazon.Glue.Model.CreateMLTransformResponse). Specifying the name of a property of type Amazon.Glue.Model.CreateMLTransformResponse will result in that property being returned. Specifying -Select '^ParameterName' will result in the cmdlet returning the selected cmdlet parameter value.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
-Timeout <Int32>
The timeout of the task run for this transform in minutes. This is the maximum time that a task run for this transform can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
-WorkerType <WorkerType>
The type of predefined worker that is allocated when this task runs. Accepts a value of Standard, G.1X, or G.2X.
  • For the Standard worker type, each worker provides 4 vCPUs, 16 GB of memory, a 50 GB disk, and 2 executors per worker.
  • For the G.1X worker type, each worker provides 4 vCPUs, 16 GB of memory, a 64 GB disk, and 1 executor per worker.
  • For the G.2X worker type, each worker provides 8 vCPUs, 32 GB of memory, a 128 GB disk, and 1 executor per worker.
MaxCapacity is mutually exclusive with NumberOfWorkers and WorkerType.
  • If either NumberOfWorkers or WorkerType is set, then MaxCapacity cannot be set.
  • If MaxCapacity is set, then neither NumberOfWorkers nor WorkerType can be set.
  • If WorkerType is set, then NumberOfWorkers is required (and vice versa).
  • MaxCapacity and NumberOfWorkers must both be at least 1.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
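
The capacity rules listed for MaxCapacity and WorkerType can be checked before calling the cmdlet. The following is a sketch of such a pre-flight check that follows the rules as they are stated in those two parameter descriptions. Test-GlueCapacitySetting is a hypothetical helper, not part of AWS Tools for PowerShell, and the service remains the authority on which combinations it accepts.

```powershell
# Hypothetical pre-flight check; returns an array of rule violations (empty when valid).
function Test-GlueCapacitySetting {
    param(
        [Nullable[double]] $MaxCapacity,
        [Nullable[int]]    $NumberOfWorker,
        [string]           $WorkerType
    )
    $problems       = @()
    $hasMaxCapacity = $null -ne $MaxCapacity
    $hasWorkers     = $null -ne $NumberOfWorker
    $hasWorkerType  = -not [string]::IsNullOrEmpty($WorkerType)

    # MaxCapacity is mutually exclusive with the worker-based options.
    if ($hasMaxCapacity -and ($hasWorkers -or $hasWorkerType)) {
        $problems += 'MaxCapacity cannot be combined with NumberOfWorker or WorkerType.'
    }
    # WorkerType and NumberOfWorker are required together.
    if ($hasWorkerType -ne $hasWorkers) {
        $problems += 'WorkerType and NumberOfWorker must be set together.'
    }
    if ($hasMaxCapacity -and $MaxCapacity -lt 1) {
        $problems += 'MaxCapacity must be at least 1.'
    }
    if ($hasWorkers -and $NumberOfWorker -lt 1) {
        $problems += 'NumberOfWorker must be at least 1.'
    }
    if ($hasWorkerType -and $WorkerType -notin 'Standard', 'G.1X', 'G.2X') {
        $problems += "WorkerType '$WorkerType' is not one of Standard, G.1X, G.2X."
    }
    # The leading comma keeps an empty or single-item result as an array.
    return ,$problems
}

$validDpu     = Test-GlueCapacitySetting -MaxCapacity 10
$validWorkers = Test-GlueCapacitySetting -WorkerType 'G.2X' -NumberOfWorker 5
$mixed        = Test-GlueCapacitySetting -MaxCapacity 10 -WorkerType 'G.1X'
```

In this sketch, $validDpu and $validWorkers are empty, while $mixed holds two messages: one for combining MaxCapacity with WorkerType and one for the missing NumberOfWorker.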

Common Credential and Region Parameters

-AccessKey <String>
The AWS access key for the user account. This can be a temporary access key if the corresponding session token is supplied to the -SessionToken parameter.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases: AK
-Credential <AWSCredentials>
An AWSCredentials object instance containing access and secret key information, and optionally a token for session-based credentials.
Required? False
Position? Named
Accept pipeline input? True (ByValue, ByPropertyName)
-EndpointUrl <String>
The endpoint to make the call against. Note: This parameter is primarily for internal AWS use and is neither required nor recommended for normal usage. The cmdlets normally determine which endpoint to call based on the region specified to the -Region parameter or set as default in the shell (via Set-DefaultAWSRegion). Only specify this parameter if you must direct the call to a specific custom endpoint.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
-NetworkCredential <PSCredential>
Used with SAML-based authentication when ProfileName references a SAML role profile. Contains the network credentials to be supplied during authentication with the configured identity provider's endpoint. This parameter is not required if the user's default network identity can or should be used during authentication.
Required? False
Position? Named
Accept pipeline input? True (ByValue, ByPropertyName)
-ProfileLocation <String>
Used to specify the name and location of the ini-format credential file (shared with the AWS CLI and other AWS SDKs). If this optional parameter is omitted, this cmdlet will search the encrypted credential file used by the AWS SDK for .NET and AWS Toolkit for Visual Studio first. If the profile is not found, the cmdlet will then search the ini-format credential file at the default location: (user's home directory)\.aws\credentials. If this parameter is specified, this cmdlet will only search the ini-format credential file at the location given. As the current folder can vary in a shell or during script execution, it is advised that you specify a fully qualified path instead of a relative path.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases: AWSProfilesLocation, ProfilesLocation
-ProfileName <String>
The user-defined name of an AWS credentials or SAML-based role profile containing credential information. The profile is expected to be found in the secure credential file shared with the AWS SDK for .NET and AWS Toolkit for Visual Studio. You can also specify the name of a profile stored in the .ini-format credential file used with the AWS CLI and other AWS SDKs.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases: StoredCredentials, AWSProfileName
-Region <Object>
The system name of an AWS region or an AWSRegion instance. This governs the endpoint that will be used when calling service operations. Note that the AWS resources referenced in a call are usually region-specific.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases: RegionToCall
-SecretKey <String>
The AWS secret key for the user account. This can be a temporary secret key if the corresponding session token is supplied to the -SessionToken parameter.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases: SK, SecretAccessKey
-SessionToken <String>
The session token if the access and secret keys are temporary session-based credentials.
Required? False
Position? Named
Accept pipeline input? True (ByPropertyName)
Aliases: ST
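
The common parameters can be collected once and splatted onto each call. The sketch below builds such a set with a fully qualified credential file path, as advised for -ProfileLocation. The profile name and Region are placeholder values chosen for illustration, and the variable names are assumptions of this example.

```powershell
# Build an absolute path to the default ini-format credential file,
# so the result does not depend on the shell's current folder.
$credentialFile = Join-Path -Path $HOME -ChildPath (Join-Path '.aws' 'credentials')

# Placeholder values; substitute your own profile name and Region.
$commonArgs = @{
    ProfileName     = 'example-profile'
    ProfileLocation = $credentialFile
    Region          = 'us-east-1'
}

# The hashtable can then be splatted alongside the cmdlet's own parameters, for example:
# New-GLUEMLTransform -Name 'example-dedupe-transform' <other parameters> @commonArgs
```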

Outputs

This cmdlet returns a System.String object. The service call response (type Amazon.Glue.Model.CreateMLTransformResponse) can also be referenced from properties attached to the cmdlet entry in the $AWSHistory stack.
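
As a sketch of how -Select changes what comes back, the hypothetical wrapper below returns either the default TransformId string or the whole response object. $TransformArgs stands for a hashtable of the cmdlet parameters described in this reference; the function name and its switch are illustrative and are not part of the module.

```powershell
# Hypothetical wrapper; defining it does not call AWS.
function New-TransformWithResponse {
    param(
        [Parameter(Mandatory)] [hashtable] $TransformArgs,
        [switch] $FullResponse
    )
    if ($FullResponse) {
        # '*' returns the whole Amazon.Glue.Model.CreateMLTransformResponse object.
        New-GLUEMLTransform @TransformArgs -Select '*'
    }
    else {
        # Default selection is 'TransformId', returned as a System.String.
        New-GLUEMLTransform @TransformArgs
    }
}
```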

Supported Version

AWS Tools for PowerShell: 2.x.y.z