
AWS Glue PySpark transforms reference


AWS Glue provides the following built-in transforms that you can use in PySpark ETL operations. Your data passes from transform to transform in a data structure called a DynamicFrame, which is an extension to an Apache Spark SQL DataFrame. The DynamicFrame contains your data, and you reference its schema to process your data.

Most of these transforms also exist as methods of the DynamicFrame class. For more information, see DynamicFrame transforms.

Data integration transforms

To use the data integration (DI) transforms with AWS Glue 4.0 and later, create or update the job argument with key --enable-glue-di-transforms and value true.
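The sketch below shows one way to apply this setting. It merges the flag into a job's existing DefaultArguments map, which is how job arguments appear in the AWS Glue API (for example, through boto3's create_job). The helper name with_di_transforms and the sample arguments are hypothetical.

```python
import json

# Job argument that enables the DI transforms on AWS Glue 4.0 and later
DI_TRANSFORMS_ARG = "--enable-glue-di-transforms"


def with_di_transforms(default_arguments=None):
    """Return a copy of a job's DefaultArguments with the DI transforms flag set.

    Hypothetical helper: existing arguments are preserved, and the input
    dictionary is not modified.
    """
    merged = dict(default_arguments or {})
    # Glue job arguments are string-to-string maps, so the value is "true", not True
    merged[DI_TRANSFORMS_ARG] = "true"
    return merged


if __name__ == "__main__":
    # Hypothetical existing job arguments (the bucket name is illustrative only)
    existing = {
        "--job-language": "python",
        "--TempDir": "s3://example-bucket/temp/",
    }
    print(json.dumps(with_di_transforms(existing), indent=2))
```

Merging matters when you update an existing job. The AWS Glue UpdateJob API replaces the stored job definition with the one you send, so any arguments left out of the request are lost.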

Example job script:

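from pyspark.sql import SparkSession
# Provides math_functions and the other DI transform modules
from awsgluedi.transforms import *

# Reuse the active Spark session and its context (a Glue job or notebook may already have one)
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

input_df = spark.createDataFrame(
    [(5,), (0,), (-1,), (2,), (None,)],
    ["source_column"],
)

try:
    # Apply the IsEven transform; true_string/false_string label the result in target_column
    df_output = math_functions.IsEven.apply(
        data_frame=input_df,
        spark_context=sc,
        source_column="source_column",
        target_column="target_column",
        value=None,
        true_string="Even",
        false_string="Not even",
    )
    df_output.show()
except Exception:
    print("Unexpected error occurred")
    raise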

Example: interactive sessions using notebooks

%idle_timeout 2880
%glue_version 4.0
%worker_type G.1X
%number_of_workers 5
%region eu-west-1
%%configure
{
    "--enable-glue-di-transforms": "true"
}
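from pyspark.sql import SparkSession
# Provides math_functions and the other DI transform modules
from awsgluedi.transforms import *

# Reuse the active Spark session and its context (the notebook session may already have one)
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

input_df = spark.createDataFrame(
    [(5,), (0,), (-1,), (2,), (None,)],
    ["source_column"],
)

try:
    # Apply the IsEven transform; true_string/false_string label the result in target_column
    df_output = math_functions.IsEven.apply(
        data_frame=input_df,
        spark_context=sc,
        source_column="source_column",
        target_column="target_column",
        value=None,
        true_string="Even",
        false_string="Not even",
    )
    df_output.show()
except Exception:
    print("Unexpected error occurred")
    raise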

Example: sessions using the AWS CLI

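aws glue create-session --default-arguments "--enable-glue-di-transforms=true"

This command shows only the argument that enables the DI transforms. The create-session command also requires a session ID (--id), an IAM role (--role), and a command (--command).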
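The following sketch builds the same request for the AWS Glue CreateSession API from Python, including the parameters the CLI line omits. The settings mirror the notebook magics: Glue 4.0, G.1X workers, 5 workers, a 2880-minute idle timeout, and the eu-west-1 Region. The function name, session ID, and role ARN are hypothetical. Sending the request with boto3 is shown only as a comment, so the block runs without AWS credentials.

```python
import json


def build_create_session_request(session_id, role_arn):
    """Build keyword arguments for the Glue CreateSession API with DI transforms enabled.

    Hypothetical helper; the values mirror the notebook magics in this section.
    """
    return {
        "Id": session_id,
        "Role": role_arn,
        # Spark (ETL) session running Python 3
        "Command": {"Name": "glueetl", "PythonVersion": "3"},
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": 5,
        "IdleTimeout": 2880,  # minutes
        # Same flag as the CLI example, passed as a string-to-string map
        "DefaultArguments": {"--enable-glue-di-transforms": "true"},
    }


if __name__ == "__main__":
    request = build_create_session_request(
        "di-transforms-demo",  # hypothetical session ID
        "arn:aws:iam::123456789012:role/ExampleGlueSessionRole",  # hypothetical role
    )
    print(json.dumps(request, indent=2))
    # With boto3 installed and credentials configured, the request could be sent as:
    #   boto3.client("glue", region_name="eu-west-1").create_session(**request)
```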


Maven: Bundle the plugin with your Spark applications

While developing Spark applications locally, you can bundle the transforms dependency with your Spark applications and Spark distributions (version 3.3) by adding the plugin dependency to your Maven pom.xml.

<repositories>
    ...
    <repository>
        <id>aws-glue-etl-artifacts</id>
        <url>https://aws-glue-etl-artifacts.s3.amazonaws.com/release/</url>
    </repository>
</repositories>
...
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>AWSGlueTransforms</artifactId>
    <version>4.0.0</version>
</dependency>

Alternatively, you can download the binaries directly from the AWS Glue Maven artifacts repository and include them in your Spark application, as follows.

#!/bin/bash
sudo wget -v https://aws-glue-etl-artifacts.s3.amazonaws.com/release/com/amazonaws/AWSGlueTransforms/4.0.0/AWSGlueTransforms-4.0.0.jar -P /usr/lib/spark/jars/
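The download URL follows the standard Maven repository layout. As a result, the coordinates in the pom.xml fragment (groupId com.amazonaws, artifactId AWSGlueTransforms, version 4.0.0) map directly to the JAR path. The sketch below shows that mapping, which could help if you script the download. The function name maven_jar_url is hypothetical, and the sketch does not check whether any other version actually exists in the repository.

```python
# Repository URL from the pom.xml fragment above
REPO_URL = "https://aws-glue-etl-artifacts.s3.amazonaws.com/release/"


def maven_jar_url(group_id, artifact_id, version, repo_url=REPO_URL):
    """Return a JAR URL using the standard Maven layout:
    <repo>/<groupId with dots as slashes>/<artifactId>/<version>/<artifactId>-<version>.jar
    """
    group_path = group_id.replace(".", "/")
    path = f"{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"
    # Normalize the trailing slash so both "release" and "release/" work
    return repo_url.rstrip("/") + "/" + path


if __name__ == "__main__":
    print(maven_jar_url("com.amazonaws", "AWSGlueTransforms", "4.0.0"))
    # → https://aws-glue-etl-artifacts.s3.amazonaws.com/release/com/amazonaws/AWSGlueTransforms/4.0.0/AWSGlueTransforms-4.0.0.jar
```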
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.