AWS Glue
Developer Guide

Authoring Jobs in AWS Glue

A job is the business logic that performs the extract, transform, and load (ETL) work in AWS Glue. When you start a job, AWS Glue runs a script that extracts data from sources, transforms the data, and loads it into targets. You can create jobs in the ETL section of the AWS Glue console. For more information, see Working with Jobs on the AWS Glue Console.

The following diagram summarizes the basic workflow and steps involved in authoring a job in AWS Glue:

[Diagram: workflow showing how to author a job with AWS Glue in six basic steps.]

Workflow Overview

When you author a job, you supply details about data sources, targets, and other information. The result is a generated script, in PySpark or Scala, that uses the Apache Spark API. You can then store your job definition in the AWS Glue Data Catalog.

The following describes the overall process of authoring jobs in AWS Glue:

  1. You choose the data sources for your job. The tables that represent your data source must already be defined in your Data Catalog. If the source requires a connection, the connection is also referenced in your job.

  2. You choose the data targets of your job. The tables that represent the data target can be defined in your Data Catalog, or your job can create the target tables when it runs. You choose a target location when you author the job. If the target requires a connection, the connection is also referenced in your job.

  3. You customize the job-processing environment by providing arguments for your job and generated script. For more information, see Adding Jobs in AWS Glue.

  4. AWS Glue generates an initial script, which you can then edit to add transforms. For more information, see Built-In Transforms.

  5. You specify how your job is invoked: on demand, by a time-based schedule, or by an event. For more information, see Triggering Jobs in AWS Glue.

  6. Based on your input, AWS Glue generates a PySpark or Scala script. You can tailor the script based on your business needs. For more information, see Editing Scripts in AWS Glue.
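
The steps above can also be carried out programmatically rather than on the console. The following is a minimal sketch, not the console's own implementation, of how the job arguments and connections (steps 2 and 3) and the invocation method (step 5) might be assembled as request parameters for the boto3 `create_job` and `create_trigger` calls. The helper names (`build_job_request`, `build_trigger_request`, `submit`) and all example values (job name, role ARN, bucket, connection name, `--target_path` argument) are hypothetical. The data sources and targets themselves are referenced inside the script, so they do not appear in the job request.

```python
from typing import Any, Dict, List, Optional


def build_job_request(
    name: str,
    role_arn: str,
    script_location: str,
    connections: Optional[List[str]] = None,
    arguments: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's glue.create_job (hypothetical helper)."""
    request: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        # "glueetl" is the command name for an Apache Spark ETL job.
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        # Step 3: arguments passed to the job and its script on every run.
        "DefaultArguments": dict(arguments or {}),
    }
    if connections:
        # Steps 1 and 2: connections required by the sources or targets.
        request["Connections"] = {"Connections": list(connections)}
    return request


def build_trigger_request(job_name: str, schedule: Optional[str] = None) -> Dict[str, Any]:
    """Assemble keyword arguments for glue.create_trigger (hypothetical helper).

    Without a schedule the trigger is on demand; with one it is time-based.
    """
    request: Dict[str, Any] = {
        "Name": f"{job_name}-trigger",
        "Actions": [{"JobName": job_name}],
    }
    if schedule is None:
        request["Type"] = "ON_DEMAND"
    else:
        request.update(Type="SCHEDULED", Schedule=schedule, StartOnCreation=True)
    return request


def submit(job_request: Dict[str, Any], trigger_request: Dict[str, Any]) -> None:
    """Send both requests to AWS Glue. Requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the builders above work without boto3 installed

    glue = boto3.client("glue")
    glue.create_job(**job_request)
    glue.create_trigger(**trigger_request)


# Example values are placeholders, not real resources.
example_job = build_job_request(
    name="example-etl-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_location="s3://example-bucket/scripts/example_etl_job.py",
    connections=["example-jdbc-connection"],
    arguments={"--target_path": "s3://example-bucket/output/"},
)
example_trigger = build_trigger_request("example-etl-job", schedule="cron(0 12 * * ? *)")
```

Keeping the request builders separate from `submit` means the parameters can be inspected or reviewed before anything is sent to AWS Glue. Calling `submit(example_job, example_trigger)` would create the job and a trigger that runs it daily at 12:00 UTC, assuming the role, script, and connection actually exist in your account.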
