Serverless ETL on AWS Glue FAQ - AWS Prescriptive Guidance

Serverless ETL on AWS Glue FAQ

This section provides answers to commonly raised questions about serverless ETL on AWS Glue.

When should I use Python shell instead of Apache Spark for AWS Glue jobs?

Use Python shell for simple ETL jobs or small datasets that don’t need the distributed computing capabilities of Apache Spark. Use Apache Spark for more complex ETL jobs or large datasets that need the parallel, distributed processing that Spark is optimized for.
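The choice between the two shows up directly in the job definition. The following is a minimal Python sketch that assumes you create jobs through the AWS Glue CreateJob API (for example, with boto3). The function names, job names, role ARN, S3 paths, and the version and capacity values are hypothetical placeholders, not recommendations.

```python
from typing import Any, Dict


def python_shell_job(name: str, role_arn: str, script_s3: str,
                     glue_version: str = "3.0") -> Dict[str, Any]:
    """CreateJob parameters for a single-node Python shell job (small, simple ETL)."""
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": glue_version,   # placeholder; check the AWS Glue versions page
        "Command": {
            "Name": "pythonshell",     # Python shell job type
            "ScriptLocation": script_s3,
            "PythonVersion": "3.9",    # placeholder; must pair with the Glue version
        },
        # Python shell jobs are sized in DPUs through MaxCapacity (0.0625 or 1).
        "MaxCapacity": 0.0625,
    }


def spark_etl_job(name: str, role_arn: str, script_s3: str,
                  glue_version: str = "4.0", workers: int = 2) -> Dict[str, Any]:
    """CreateJob parameters for a distributed Apache Spark job (complex or large ETL)."""
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": glue_version,   # placeholder; check the AWS Glue versions page
        "Command": {
            "Name": "glueetl",         # Spark ETL job type
            "ScriptLocation": script_s3,
            "PythonVersion": "3",
        },
        # Spark jobs scale out across workers; MaxCapacity is not set alongside these.
        "WorkerType": "G.1X",
        "NumberOfWorkers": workers,
    }


if __name__ == "__main__":
    import json

    role = "arn:aws:iam::123456789012:role/ExampleGlueRole"  # hypothetical role
    print(json.dumps(python_shell_job(
        "small-csv-cleanup", role, "s3://example-bucket/scripts/cleanup.py"), indent=2))
    print(json.dumps(spark_etl_job(
        "large-join", role, "s3://example-bucket/scripts/join.py"), indent=2))
```

To create a job from one of these dictionaries, you could pass it to `boto3.client("glue").create_job(**params)` after confirming the Glue and Python version pairing on the AWS Glue versions page.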

What is the recommended AWS Glue version for my project?

We generally recommend using the latest version of AWS Glue. The AWS Glue versions page lists the differences between versions and the Python and Spark versions that each one supports.
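Following the latest-version recommendation is easier when you can see which jobs lag behind. The following minimal Python sketch assumes that your job definitions are dictionaries shaped like the `Jobs` entries returned by the AWS Glue GetJobs API (each with `Name` and, when set, `GlueVersion`). The function names and the minimum version are hypothetical. Version strings are compared numerically rather than as text.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "4.0" into (4,) so versions compare numerically ("10.0" > "4.0")."""
    parts = [int(part) for part in version.split(".")]
    while parts and parts[-1] == 0:  # treat "4" and "4.0" as the same version
        parts.pop()
    return tuple(parts)


def outdated_jobs(jobs: Iterable[Dict[str, Any]],
                  minimum: str) -> List[Tuple[str, Optional[str]]]:
    """Return (name, GlueVersion) for jobs below `minimum` or with no GlueVersion set."""
    floor = parse_version(minimum)
    flagged: List[Tuple[str, Optional[str]]] = []
    for job in jobs:
        version = job.get("GlueVersion")
        # Jobs without an explicit GlueVersion are flagged so that you can pin one.
        if version is None or parse_version(version) < floor:
            flagged.append((job["Name"], version))
    return flagged


if __name__ == "__main__":
    # Hypothetical job records shaped like GetJobs "Jobs" entries.
    sample = [
        {"Name": "legacy-import", "GlueVersion": "2.0"},
        {"Name": "nightly-join", "GlueVersion": "4.0"},
        {"Name": "unpinned-job"},
    ]
    print(outdated_jobs(sample, minimum="4.0"))
    # → [('legacy-import', '2.0'), ('unpinned-job', None)]
```

Against a live account, you could collect the job definitions with the boto3 `get_jobs` paginator, for example `[job for page in boto3.client("glue").get_paginator("get_jobs").paginate() for job in page["Jobs"]]`, and pass that list in. The `"4.0"` minimum is only an example. Set it to the latest version listed on the AWS Glue versions page.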