New-SMTransformJob
-TransformJobName <String>
-TransformOutput_Accept <String>
-TransformOutput_AssembleWith <AssemblyType>
-BatchStrategy <BatchStrategy>
-TransformInput_CompressionType <CompressionType>
-TransformInput_ContentType <String>
-DataCaptureConfig_DestinationS3Uri <String>
-Environment <Hashtable>
-ExperimentConfig_ExperimentName <String>
-DataCaptureConfig_GenerateInferenceId <Boolean>
-DataProcessing_InputFilter <String>
-TransformResources_InstanceCount <Int32>
-TransformResources_InstanceType <TransformInstanceType>
-ModelClientConfig_InvocationsMaxRetry <Int32>
-ModelClientConfig_InvocationsTimeoutInSecond <Int32>
-DataProcessing_JoinSource <JoinSource>
-DataCaptureConfig_KmsKeyId <String>
-TransformOutput_KmsKeyId <String>
-MaxConcurrentTransform <Int32>
-MaxPayloadInMB <Int32>
-ModelName <String>
-DataProcessing_OutputFilter <String>
-ExperimentConfig_RunName <String>
-S3DataSource_S3DataType <S3DataType>
-TransformOutput_S3OutputPath <String>
-S3DataSource_S3Uri <String>
-TransformInput_SplitType <SplitType>
-Tag <Tag[]>
-ExperimentConfig_TrialComponentDisplayName <String>
-ExperimentConfig_TrialName <String>
-TransformResources_VolumeKmsKeyId <String>
-Select <String>
-PassThru <SwitchParameter>
-Force <SwitchParameter>
-ClientConfig <AmazonSageMakerConfig>
Starts a transform job. A transform job uses a trained model to get inferences on a dataset and saves these results to an Amazon S3 location that you specify. In the request body, you provide the following:
- TransformJobName - Identifies the transform job. The name must be unique within an Amazon Web Services Region in an Amazon Web Services account.
- ModelName - Identifies the model to use. ModelName must be the name of an existing Amazon SageMaker model in the same Amazon Web Services Region and Amazon Web Services account. For information on creating a model, see CreateModel.
- TransformInput - Describes the dataset to be transformed and the Amazon S3 location where it is stored.
- TransformOutput - Identifies the Amazon S3 location where you want Amazon SageMaker to save the results from the transform job.
- TransformResources - Identifies the ML compute instances for the transform job.
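The following is a minimal sketch of such a request, assuming an existing model named my-model; the job name and bucket are hypothetical placeholders. Splatting a hashtable keeps the call readable, and the same $params hashtable is extended in the sketches further down this page.
# Minimal sketch (hypothetical names): one ml.m5.large instance transforms
# CSV records stored under an S3 prefix and writes .out files to the output path.
$params = @{
    TransformJobName                 = 'my-transform-job'
    ModelName                        = 'my-model'
    S3DataSource_S3DataType          = 'S3Prefix'
    S3DataSource_S3Uri               = 's3://amzn-s3-demo-bucket/input/'
    TransformInput_ContentType       = 'text/csv'
    TransformOutput_S3OutputPath     = 's3://amzn-s3-demo-bucket/output/'
    TransformResources_InstanceType  = 'ml.m5.large'
    TransformResources_InstanceCount = 1
}
New-SMTransformJob @params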
-BatchStrategy <BatchStrategy>
Specifies the number of records to include in a mini-batch for an HTTP inference request. A record is a single unit of input data that inference can be made on. For example, a single line in a CSV file is a record. To enable the batch strategy, you must set the SplitType property to Line, RecordIO, or TFRecord. To use only one record when making an HTTP invocation request to a container, set BatchStrategy to SingleRecord and SplitType to Line. To fit as many records in a mini-batch as can fit within the MaxPayloadInMB limit, set BatchStrategy to MultiRecord and SplitType to Line.
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
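For example, extending the $params sketch from the description above to send exactly one record per invocation request:
# Sketch: one newline-delimited record per HTTP request.
$params['BatchStrategy']            = 'SingleRecord'
$params['TransformInput_SplitType'] = 'Line'
New-SMTransformJob @params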
-ClientConfig <AmazonSageMakerConfig>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-DataCaptureConfig_DestinationS3Uri <String>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-DataCaptureConfig_GenerateInferenceId <Boolean>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-DataCaptureConfig_KmsKeyId <String>
The KmsKeyId can be any of the following formats:
Key ID: 1234abcd-12ab-34cd-56ef-1234567890ab
Key ARN: arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab
Alias name: alias/ExampleAlias
Alias name ARN: arn:aws:kms:us-west-2:111122223333:alias/ExampleAlias
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-DataProcessing_InputFilter <String>
A JSONPath expression used to select a portion of the input data to pass to the algorithm. Use the InputFilter parameter to exclude fields, such as an ID column, from the input. If you want SageMaker to pass the entire input dataset to the algorithm, accept the default value $.
Examples: "$", "$[1:]", "$.features"
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-DataProcessing_JoinSource <JoinSource>
Specifies the source of the data to join with the transformed data. The valid values are None and Input. The default value is None, which specifies not to join the input with the transformed data. If you want the batch transform job to join the original input data with the transformed data, set JoinSource to Input. You can specify OutputFilter as an additional filter to select a portion of the joined dataset and store it in the output file.
For JSON or JSONLines objects, such as a JSON array, SageMaker adds the transformed data to the input JSON object in an attribute called SageMakerOutput. The joined result for JSON must be a key-value pair object. If the input is not a key-value pair object, SageMaker creates a new JSON file. In the new JSON file, the input data is stored under the SageMakerInput key and the results are stored in SageMakerOutput.
For CSV data, SageMaker takes each row as a JSON array and joins the transformed data with the input by appending each transformed row to the end of the input. The joined data has the original input data followed by the transformed data, and the output is a CSV file.
For information on how joining is applied, see Workflow for Associating Inferences with Input Records.
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-DataProcessing_OutputFilter <String>
A JSONPath expression used to select a portion of the joined dataset to save in the output file for a batch transform job. If you want SageMaker to store the entire input dataset in the output file, leave the default value, $. If you specify indexes that aren't within the dimension size of the joined dataset, you get an error.
Examples: "$", "$[0,5:]", "$['id','SageMakerOutput']"
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
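Taken together with InputFilter and JoinSource, here is a hedged sketch for CSV input whose first column is an ID (the filter expressions are illustrative, and $params is the hashtable from the earlier sketch):
# Sketch: hide the ID column from the model, join predictions back onto
# the input rows, then keep only the ID and the appended prediction.
$params['DataProcessing_InputFilter']  = '$[1:]'
$params['DataProcessing_JoinSource']   = 'Input'
$params['DataProcessing_OutputFilter'] = '$[0,-1]'
$params['TransformInput_SplitType']    = 'Line'
New-SMTransformJob @params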
-Environment <Hashtable>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-ExperimentConfig_ExperimentName <String>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-ExperimentConfig_RunName <String>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-ExperimentConfig_TrialComponentDisplayName <String>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-ExperimentConfig_TrialName <String>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-Force <SwitchParameter>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-MaxConcurrentTransform <Int32>
The maximum number of parallel requests that can be sent to each instance in a transform job. If MaxConcurrentTransforms is set to 0 or left unset, Amazon SageMaker checks the optional execution-parameters to determine the settings for your chosen algorithm. If the execution-parameters endpoint is not enabled, the default value is 1. For more information on execution-parameters, see How Containers Serve Requests. For built-in algorithms, you don't need to set a value for MaxConcurrentTransforms.
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
Aliases | MaxConcurrentTransforms |
-MaxPayloadInMB <Int32>
The maximum allowed size of the payload, in MB. A payload is the data portion of a record (without metadata). The value in MaxPayloadInMB must be greater than, or equal to, the size of a single record. To estimate the size of a record in MB, divide the size of your dataset by the number of records. To ensure that the records fit within the maximum payload size, we recommend using a slightly larger value. The default value is 6 MB. The value of MaxPayloadInMB cannot be greater than 100 MB. If you specify the MaxConcurrentTransforms parameter, the value of (MaxConcurrentTransforms * MaxPayloadInMB) also cannot exceed 100 MB.
For cases where the payload might be arbitrarily large and is transmitted using HTTP chunked encoding, set the value to 0. This feature works only in supported algorithms. Currently, Amazon SageMaker built-in algorithms do not support HTTP chunked encoding.
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
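As a quick worked check of that product limit, extending the earlier $params sketch: 16 concurrent transforms x 6 MB payloads = 96 MB, which is within the 100 MB cap, while 17 x 6 = 102 MB would be rejected.
# Sketch: explicit throughput settings whose product stays under 100 MB.
$params['MaxConcurrentTransform'] = 16
$params['MaxPayloadInMB']         = 6
New-SMTransformJob @params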
-ModelClientConfig_InvocationsMaxRetry <Int32>
The maximum number of retries when invocation requests are failing. The default value is 3.
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
Aliases | ModelClientConfig_InvocationsMaxRetries |
-ModelClientConfig_InvocationsTimeoutInSecond <Int32>
The timeout value in seconds for an invocation request. The default value is 600.
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
Aliases | ModelClientConfig_InvocationsTimeoutInSeconds |
-ModelName <String>
The name of the model that you want to use for the transform job. ModelName must be the name of an existing Amazon SageMaker model within an Amazon Web Services Region in an Amazon Web Services account.
Required? | True |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-PassThru <SwitchParameter>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-S3DataSource_S3DataType <S3DataType>
If you choose S3Prefix, S3Uri identifies a key name prefix. Amazon SageMaker uses all objects with the specified key name prefix for batch transform.
If you choose ManifestFile, S3Uri identifies an object that is a manifest file containing a list of object keys that you want Amazon SageMaker to use for batch transform.
The following values are compatible: ManifestFile, S3Prefix
The following value is not compatible: AugmentedManifestFile
Required? | True |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
Aliases | TransformInput_DataSource_S3DataSource_S3DataType |
-S3DataSource_S3Uri <String>
Depending on the value specified for the S3DataType, identifies either a key name prefix or a manifest. For example:
A key name prefix: s3://bucketname/exampleprefix
A manifest: s3://bucketname/example.manifest
The manifest is an S3 object which is a JSON file with the following format:
[ {"prefix": "s3://customer_bucket/some/prefix/"},
"relative/path/to/custdata-1",
"relative/path/custdata-2",
...
"relative/path/custdata-N"
]
The preceding JSON matches the following S3Uris:
s3://customer_bucket/some/prefix/relative/path/to/custdata-1
s3://customer_bucket/some/prefix/relative/path/custdata-2
...
s3://customer_bucket/some/prefix/relative/path/custdata-N
The complete set of S3Uris in this manifest constitutes the input data for the channel for this datasource. The object that each S3Uri points to must be readable by the IAM role that Amazon SageMaker uses to perform tasks on your behalf.
Required? | True |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
Aliases | TransformInput_DataSource_S3DataSource_S3Uri |
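To point the job at such a manifest instead of a key name prefix, the earlier $params sketch could be adjusted as follows (bucket and key are hypothetical):
# Sketch: read the input object keys from a manifest file.
$params['S3DataSource_S3DataType'] = 'ManifestFile'
$params['S3DataSource_S3Uri']      = 's3://customer_bucket/example.manifest'
New-SMTransformJob @params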
-Select <String>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-Tag <Tag[]>
(Optional) An array of key-value pairs. For more information, see Using Cost Allocation Tags in the Amazon Web Services Billing and Cost Management User Guide.
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
Aliases | Tags |
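A hedged sketch of attaching a tag, assuming the Amazon.SageMaker.Model.Tag type that the module exposes from the underlying AWS SDK:
# Sketch: tag the job for cost allocation, then submit it.
$tag = New-Object Amazon.SageMaker.Model.Tag
$tag.Key   = 'project'
$tag.Value = 'demo'
$params['Tag'] = @($tag)
New-SMTransformJob @params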
-TransformInput_CompressionType <CompressionType>
If your transform data is compressed, specify the compression type. Amazon SageMaker automatically decompresses the data for the transform job accordingly. The default value is None.
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-TransformInput_ContentType <String>
The multipurpose internet mail extension (MIME) type of the data. Amazon SageMaker uses the MIME type with each http call to transfer data to the transform job.
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-TransformInput_SplitType <SplitType>
The method to use to split the transform job's data files into smaller batches. Splitting is necessary when the total size of each object is too large to fit in a single request. You can also use data splitting to improve performance by processing multiple concurrent mini-batches. The default value for SplitType is None, which indicates that input data files are not split, and request payloads contain the entire contents of an input object. Set the value of this parameter to Line to split records on a newline character boundary. SplitType also supports a number of record-oriented binary data formats. Currently, the supported record formats are RecordIO and TFRecord.
When splitting is enabled, the size of a mini-batch depends on the values of the BatchStrategy and MaxPayloadInMB parameters. When the value of BatchStrategy is MultiRecord, Amazon SageMaker sends the maximum number of records in each request, up to the MaxPayloadInMB limit. If the value of BatchStrategy is SingleRecord, Amazon SageMaker sends individual records in each request.
Some data formats represent a record as a binary payload wrapped with extra padding bytes. When splitting is applied to a binary data format, padding is removed if the value of BatchStrategy is set to SingleRecord. Padding is not removed if the value of BatchStrategy is set to MultiRecord.
For more information about RecordIO, see Create a Dataset Using RecordIO in the MXNet documentation. For more information about TFRecord, see Consuming TFRecord data in the TensorFlow documentation.
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
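Combining these settings, a sketch for RecordIO input packed into multi-record batches; the content type shown is illustrative, and $params is the hashtable from the first sketch:
# Sketch: binary RecordIO records, packed into payloads of up to 6 MB each.
$params['TransformInput_SplitType']   = 'RecordIO'
$params['TransformInput_ContentType'] = 'application/x-recordio'
$params['BatchStrategy']              = 'MultiRecord'
$params['MaxPayloadInMB']             = 6
New-SMTransformJob @params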
-TransformJobName <String>
The name of the transform job. The name must be unique within an Amazon Web Services Region in an Amazon Web Services account.
Required? | True |
Position? | 1 |
Accept pipeline input? | True (ByValue, ByPropertyName) |
-TransformOutput_Accept <String>
The MIME type used to specify the output data. Amazon SageMaker uses the MIME type with each http call to transfer data from the transform job.
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-TransformOutput_AssembleWith <AssemblyType>
Defines how to assemble the results of the transform job as a single S3 object. Choose a format that is most convenient to you. To concatenate the results in binary format, specify None. To add a newline character at the end of every transformed record, specify Line.
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-TransformOutput_KmsKeyId <String>
The KmsKeyId can be any of the following formats:
Key ID: 1234abcd-12ab-34cd-56ef-1234567890ab
Key ARN: arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab
Alias name: alias/ExampleAlias
Alias name ARN: arn:aws:kms:us-west-2:111122223333:alias/ExampleAlias
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-TransformOutput_S3OutputPath <String>
The Amazon S3 path where you want Amazon SageMaker to store the results of the transform job. For example, s3://bucket-name/key-name-prefix.
For every S3 object used as input for the transform job, batch transform stores the transformed data with an .out suffix in a corresponding subfolder in the location in the output prefix. For example, for the input data stored at s3://bucket-name/input-name-prefix/dataset01/data.csv, batch transform stores the transformed data at s3://bucket-name/output-name-prefix/input-name-prefix/data.csv.out. Batch transform doesn't upload partially processed objects. For an input S3 object that contains multiple records, it creates an .out file only if the transform job succeeds on the entire file. When the input contains multiple S3 objects, the batch transform job processes the listed S3 objects and uploads only the output for successfully processed objects. If any object fails in the transform job, batch transform marks the job as failed to prompt investigation.
Required? | True |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-TransformResources_InstanceCount <Int32>
The number of ML compute instances to use in the transform job. The default value is 1, and the maximum is 100. For distributed transform jobs, specify a value greater than 1.
Required? | True |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-TransformResources_InstanceType <TransformInstanceType>
The ML compute instance type for the transform job. If you are using built-in algorithms to transform moderately sized datasets, we recommend using ml.m4.xlarge or ml.m5.large instance types.
Required? | True |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-TransformResources_VolumeKmsKeyId <String>
The Amazon Web Services Key Management Service (Amazon Web Services KMS) key that Amazon SageMaker uses to encrypt model data on the storage volume attached to the ML compute instance(s) that run the batch transform job. You can't request a VolumeKmsKeyId when using an instance type with local storage.
For a list of instance types that support local instance storage, see Instance Store Volumes. For more information about local instance storage encryption, see SSD Instance Store Volumes.
The VolumeKmsKeyId can be any of the following formats:
Key ID: 1234abcd-12ab-34cd-56ef-1234567890ab
Key ARN: arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab
Alias name: alias/ExampleAlias
Alias name ARN: arn:aws:kms:us-west-2:111122223333:alias/ExampleAlias
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
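Extending the earlier $params sketch, the output objects and the instance storage volume could each be encrypted with a customer managed key supplied in any of the formats above (the aliases here are hypothetical):
# Sketch: customer managed KMS keys for output objects and the ML volume.
$params['TransformOutput_KmsKeyId']          = 'alias/ExampleAlias'
$params['TransformResources_VolumeKmsKeyId'] = 'alias/ExampleAlias'
New-SMTransformJob @params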
Common Credential and Region Parameters
-AccessKey <String>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
Aliases | AK |
-Credential <AWSCredentials>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByValue, ByPropertyName) |
-EndpointUrl <String>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
-NetworkCredential <PSCredential>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByValue, ByPropertyName) |
-ProfileLocation <String>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
Aliases | AWSProfilesLocation, ProfilesLocation |
-ProfileName <String>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
Aliases | StoredCredentials, AWSProfileName |
-Region <Object>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
Aliases | RegionToCall |
-SecretKey <String>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
Aliases | SK, SecretAccessKey |
-SessionToken <String>
Required? | False |
Position? | Named |
Accept pipeline input? | True (ByPropertyName) |
Aliases | ST |
AWS Tools for PowerShell: 2.x.y.z