Editing the data transform node - AWS Glue Studio

AWS Glue Studio provides a set of built-in transforms that you can use to process your data. Your data passes from one node in the job diagram to another in a data structure called a DynamicFrame, which is an extension to an Apache Spark SQL DataFrame.

In the pre-populated diagram for a job, between the data source and data target nodes is the Transform - ApplyMapping node. You can configure this transform node to modify your data, or you can use additional transforms.
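As a rough illustration of what configuring an ApplyMapping node amounts to, the following plain-Python sketch models records as dictionaries and a mapping as a list of entries. It is not the AWS Glue API: the function name `apply_mapping`, the `(source_key, target_key, cast)` entry shape, and the sample records are all assumptions made for this example.

```python
# Plain-Python sketch of the idea behind ApplyMapping; this is NOT the AWS Glue API.
# Each mapping entry is (source_key, target_key, cast); all names are hypothetical.

def apply_mapping(records, mappings):
    """Rename keys, cast values, and drop any key that has no mapping entry."""
    output = []
    for record in records:
        mapped = {}
        for source_key, target_key, cast in mappings:
            if source_key in record:
                mapped[target_key] = cast(record[source_key])
        output.append(mapped)
    return output


# Hypothetical sample data standing in for the data source node.
records = [
    {"id": "1", "name": "Ana", "tmp": "x"},
    {"id": "2", "name": "Ben", "tmp": "y"},
]

mappings = [
    ("id", "customer_id", int),  # rename the key and change its type
    ("name", "name", str),       # keep the key unchanged
    # "tmp" has no entry, so it is dropped from the output
]

result = apply_mapping(records, mappings)
print(result)
```

In this sketch, a key is dropped simply by leaving it out of the mapping list, which mirrors choosing which keys to drop in the node's configuration.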

The following built-in transforms are available with AWS Glue Studio:

  • ApplyMapping: Map data property keys in the data source to data property keys in the data target. You can rename keys, modify the data types for keys, and choose which keys to drop from the dataset.

  • SelectFields: Choose the data property keys that you want to keep.

  • DropFields: Choose the data property keys that you want to drop.

  • RenameField: Rename a single data property key.

  • Spigot: Write samples of the data to an Amazon S3 bucket.

  • Join: Join two datasets into one dataset using a comparison phrase on the specified data property keys. You can use inner, outer, left, right, left semi, and left anti joins.

  • SplitFields: Split data property keys into two DynamicFrames. Output is a collection of DynamicFrames: one with selected data property keys, and one with the remaining data property keys.

  • SelectFromCollection: Choose one DynamicFrame from a collection of DynamicFrames. The output is the selected DynamicFrame.

  • FillMissingValues: Locate records in the dataset that have missing values and add a new field with a suggested value that is determined by imputation.

  • Filter: Create a new dataset that contains only the records that meet a filter condition.

  • DropNullFields: Remove columns from the dataset if all values in the column are null.

  • SQL: Enter a Spark SQL query in a text entry field to transform the data. The output is a single DynamicFrame.

  • Aggregate: Perform a calculation (such as average, sum, min, or max) on selected fields and rows, and create a new field with the calculated values.

  • Custom transform: Enter code into a text entry field to use custom transforms. The output is a collection of DynamicFrames.
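To make a few of the transforms listed above more concrete, here is a minimal plain-Python sketch of how SelectFields, DropFields, SplitFields, SelectFromCollection, and DropNullFields behave conceptually. It is not the AWS Glue API: datasets are modeled as lists of dictionaries, a collection as a dictionary of named datasets, and the function names, the collection keys `selected` and `remaining`, and the sample data are all hypothetical.

```python
# Plain-Python sketch of a few transform behaviors; this is NOT the AWS Glue API.
# A dataset is a list of dicts; a "collection" is a dict of named datasets.

def select_fields(records, keys):
    """Keep only the listed keys (the idea behind SelectFields)."""
    return [{k: r[k] for k in keys if k in r} for r in records]


def drop_fields(records, keys):
    """Remove the listed keys (the idea behind DropFields)."""
    return [{k: v for k, v in r.items() if k not in keys} for r in records]


def split_fields(records, keys):
    """Return a collection of two datasets (the idea behind SplitFields)."""
    return {
        "selected": select_fields(records, keys),   # hypothetical collection key
        "remaining": drop_fields(records, keys),    # hypothetical collection key
    }


def select_from_collection(collection, name):
    """Pick one dataset out of a collection (the idea behind SelectFromCollection)."""
    return collection[name]


def drop_null_fields(records):
    """Remove keys whose value is None in every record (the idea behind DropNullFields)."""
    all_keys = {k for r in records for k in r}
    null_keys = {k for k in all_keys if all(r.get(k) is None for r in records)}
    return drop_fields(records, null_keys)


# Hypothetical sample data.
orders = [
    {"order_id": 1, "city": "Lima", "total": 20.0, "coupon": None},
    {"order_id": 2, "city": "Oslo", "total": 35.5, "coupon": None},
]

collection = split_fields(orders, ["order_id", "total"])
totals = select_from_collection(collection, "selected")
cleaned = drop_null_fields(orders)

print(totals)
print(cleaned)
```

The sketch shows why a transform that outputs a collection, such as SplitFields, is typically followed by SelectFromCollection: downstream steps operate on a single dataset, so one member of the collection has to be chosen first.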