You are viewing documentation for version 2 of the AWS SDK for Ruby. Version 3 documentation can be found here.

Class: Aws::SageMaker::Types::DataProcessing

Inherits:
Struct
  • Object
Defined in:
(unknown)

Overview

Note:

When passing DataProcessing as input to an Aws::Client method, you can use a vanilla Hash:

{
  input_filter: "JsonPath",
  output_filter: "JsonPath",
  join_source: "Input", # accepts Input, None
}

The data structure used to combine the input data and transformed data from the batch transform output into a joined dataset and to store it in an output file. It also contains information on how to filter the input data and the joined dataset. For more information, see Batch Transform I/O Join.
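As a minimal sketch of assembling this structure in plain Ruby, the snippet below builds the hash and rejects any join_source other than the two documented values. The helper name build_data_processing is hypothetical and not part of the SDK; the filter strings are taken from the examples on this page.

```ruby
# Hypothetical helper (not part of the AWS SDK): builds a DataProcessing hash
# and checks join_source against the two values documented on this page.
def build_data_processing(input_filter: "$", output_filter: "$", join_source: "None")
  valid_join_sources = %w[Input None]
  unless valid_join_sources.include?(join_source)
    raise ArgumentError, "join_source must be one of: #{valid_join_sources.join(', ')}"
  end

  {
    input_filter: input_filter,
    output_filter: output_filter,
    join_source: join_source,
  }
end

# Pass only the features to the algorithm, join the result with the input,
# and keep just the id and the prediction in the output file.
data_processing = build_data_processing(
  input_filter: "$.features",
  join_source: "Input",
  output_filter: "$.['id','SageMakerOutput']"
)
```

The resulting hash would typically be supplied as the data_processing parameter of Aws::SageMaker::Client#create_transform_job, alongside the other parameters that call requires.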

Instance Attribute Summary

Instance Attribute Details

#input_filter ⇒ String

A JSONPath expression used to select a portion of the input data to pass to the algorithm. Use the InputFilter parameter to exclude fields, such as an ID column, from the input. If you want Amazon SageMaker to pass the entire input dataset to the algorithm, accept the default value $.

Examples: "$", "$[1:]", "$.features"
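To make the effect of these expressions concrete, the sketch below reproduces two of them with ordinary Ruby operations on sample records. It only illustrates what each expression selects; it does not evaluate JSONPath and is not how Amazon SageMaker applies the filter. The sample records are invented for illustration.

```ruby
require "json"

# One CSV record, already split into columns; the first column is an ID.
csv_row = ["id-001", "0.5", "1.2", "3.4"]

# Rough equivalent of input_filter "$[1:]": keep everything from index 1 on,
# which drops the ID column.
features_from_csv = csv_row.drop(1)
# => ["0.5", "1.2", "3.4"]

# One JSON record with an ID and a features attribute.
json_record = JSON.parse('{"id": "id-001", "features": [0.5, 1.2, 3.4]}')

# Rough equivalent of input_filter "$.features": keep only that attribute.
features_from_json = json_record["features"]
# => [0.5, 1.2, 3.4]
```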

Returns:

  • (String)

    A JSONPath expression used to select a portion of the input data to pass to the algorithm.

#join_source ⇒ String

Specifies the source of the data to join with the transformed data. The valid values are None and Input. The default value is None, which specifies not to join the input with the transformed data. If you want the batch transform job to join the original input data with the transformed data, set JoinSource to Input. To join input and output, the batch transform job must satisfy the Requirements for Using Batch Transform I/O Join.

For JSON or JSONLines objects, such as a JSON array, Amazon SageMaker adds the transformed data to the input JSON object in an attribute called SageMakerOutput. The joined result for JSON must be a key-value pair object. If the input is not a key-value pair object, Amazon SageMaker creates a new JSON file. In the new JSON file, the input data is stored under the SageMakerInput key and the results are stored under the SageMakerOutput key.

For CSV files, Amazon SageMaker appends the transformed data to the end of the input data and stores the result in the output file. Each joined record contains the input data followed by the transformed data, and the output is a CSV file.
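The join rules described above can be modeled with a short Ruby sketch. The functions join_json and join_csv are hypothetical names used only for illustration; they mirror the described behavior on in-memory values and are not the service's implementation.

```ruby
# Hypothetical model of the JSON join: a key-value input gains a
# SageMakerOutput attribute; any other input is wrapped in a new object
# with SageMakerInput and SageMakerOutput keys.
def join_json(input, output)
  if input.is_a?(Hash)
    input.merge("SageMakerOutput" => output)
  else
    { "SageMakerInput" => input, "SageMakerOutput" => output }
  end
end

# Hypothetical model of the CSV join: the transformed columns are appended
# after the input columns of the same record.
def join_csv(input_row, output_row)
  input_row + output_row
end

join_json({ "id" => 1, "features" => [0.5] }, { "score" => 0.9 })
# => {"id"=>1, "features"=>[0.5], "SageMakerOutput"=>{"score"=>0.9}}

join_json([0.5, 1.2], [0.9])
# => {"SageMakerInput"=>[0.5, 1.2], "SageMakerOutput"=>[0.9]}

join_csv(["id-001", "0.5"], ["0.9"])
# => ["id-001", "0.5", "0.9"]
```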

Returns:

  • (String)

    Specifies the source of the data to join with the transformed data.

#output_filter ⇒ String

A JSONPath expression used to select a portion of the joined dataset to save in the output file for a batch transform job. If you want Amazon SageMaker to store the entire input dataset in the output file, leave the default value, $. If you specify indexes that aren't within the dimension size of the joined dataset, you get an error.

Examples: "$", "$[0,5:]", "$.['id','SageMakerOutput']"
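As a rough illustration of what two of these expressions keep, the sketch below applies equivalent plain-Ruby selections to sample joined records. It assumes "$[0,5:]" reads as "index 0 plus everything from index 5 on"; the sample data is invented, and this is not how Amazon SageMaker evaluates the filter.

```ruby
# A joined JSON record: the input attributes plus SageMakerOutput.
joined_record = {
  "id" => "id-001",
  "features" => [0.5, 1.2],
  "SageMakerOutput" => { "score" => 0.9 },
}

# Rough equivalent of output_filter "$.['id','SageMakerOutput']":
# keep only the two named attributes.
wanted_keys = ["id", "SageMakerOutput"]
filtered_record = joined_record.select { |key, _value| wanted_keys.include?(key) }
# => {"id"=>"id-001", "SageMakerOutput"=>{"score"=>0.9}}

# A joined CSV record with seven columns (indexes 0 through 6).
joined_row = ["id-001", "a", "b", "c", "d", "e", "f"]

# Rough equivalent of output_filter "$[0,5:]" under the assumption above:
# keep column 0, then columns 5 through the end.
filtered_row = joined_row.values_at(0) + joined_row.drop(5)
# => ["id-001", "e", "f"]
```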

Returns:

  • (String)

    A JSONPath expression used to select a portion of the joined dataset to save in the output file for a batch transform job.