Creating ETL jobs with AWS Glue Studio

You can use the simple visual interface in AWS Glue Studio to create your ETL jobs. You use the Jobs page to create new jobs. You can also use a script editor or notebook to work directly with code in the AWS Glue Studio ETL job script.

On the Jobs page, you can see all the jobs that you have created either with AWS Glue Studio or AWS Glue. You can view, manage, and run your jobs on this page.

Start the job creation process

You use the visual editor to create and customize your jobs. When you create a new job, you can start with an empty canvas, start with a job that contains a data source node, a transform node, and a data target node, or write an ETL script directly.

To create a job in AWS Glue Studio
  1. Sign in to the AWS Management Console and open the AWS Glue Studio console at https://console.aws.amazon.com/gluestudio/.

  2. You can either choose Create and manage jobs from the AWS Glue Studio landing page, or you can choose Jobs from the navigation pane.

    The Jobs page appears.

  3. In the Create job section, choose a configuration option for your job.

    • Visual with a blank canvas – To create a job starting with an empty canvas

    • Visual with a source and target – To create a job starting with a source node only, or with source, transform, and target nodes

      You then choose the data source type. You can also choose the data target type, or you can choose the Choose later option from the Target drop-down list to start with only a data source node in the graph.

    • Spark script editor – For those familiar with programming and writing ETL scripts, choose this option to create a new Spark ETL job. You then have the option of writing Python or Scala code in a script editor window, or uploading an existing script from a local file. If you choose to use the script editor, you can't use the visual job editor to design or edit your job.

      A Spark job is run in an Apache Spark environment managed by AWS Glue. By default, new scripts are coded in Python. To write a new Scala script, see Creating and editing Scala scripts in AWS Glue Studio.

    • Python Shell script editor – For those familiar with programming and writing ETL scripts, choose this option to create a new Python shell job. You write code in a script editor window starting with a template (boilerplate), or you can upload an existing script from a local file. If you choose to use the Python shell editor, you can't use the visual job editor to design or edit your job.

      A Python shell job runs Python scripts as a shell and supports a Python version that depends on the AWS Glue version you choose for the job. You can use these jobs to schedule and run tasks that don't require an Apache Spark environment.

    • Jupyter Notebook – For those familiar with programming and writing ETL scripts, choose this option to create a new Python or Scala job script using a notebook interface based on Jupyter notebook. You write code in a notebook. If you choose to use the notebook interface to create your job, you can't use the visual job editor to design or edit your job.

      You can also use a command line interface to easily configure a notebook for authoring jobs.

  4. Choose Create to create a job in the editing interface that you selected.

    (Screenshot: the Jobs page in AWS Glue Studio. In the Create job section, the Visual with a source and target option is selected; the Source drop-down list shows available data source types such as AWS Glue Data Catalog, Amazon S3, Amazon Kinesis, Apache Kafka, Relational DB, Amazon Redshift, MySQL, and PostgreSQL; the Target drop-down list shows Amazon S3; the Create button is highlighted near the top right.)
  5. If you chose the Jupyter notebook option, the Create job in Jupyter notebook page appears instead of the job editor interface. You must provide additional information before creating a notebook authoring session. For more information about how to specify this information, see Getting started with notebooks in AWS Glue Studio.
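The job types created in the console steps above can also be created programmatically through the AWS Glue CreateJob API (for example, with boto3's Glue client). As a rough sketch, the console's Spark and Python shell options correspond to different Command names in the request; the job name, IAM role, and script location below are hypothetical placeholders:

```python
def make_create_job_request(name, role_arn, script_location, job_type):
    """Build the kwargs for boto3's glue_client.create_job(**kwargs).

    job_type is "glueetl" for a Spark ETL job or "pythonshell" for a
    Python shell job. All concrete values used here are illustrative
    assumptions, not values from the console walkthrough.
    """
    command = {
        "Name": job_type,                # selects the job's run environment
        "ScriptLocation": script_location,
    }
    if job_type == "glueetl":
        command["PythonVersion"] = "3"   # new Spark scripts default to Python
    return {
        "Name": name,
        "Role": role_arn,
        "Command": command,
    }

# Hypothetical example: the equivalent of choosing "Spark script editor"
spark_job = make_create_job_request(
    "my-spark-etl-job",                           # hypothetical job name
    "arn:aws:iam::123456789012:role/MyGlueRole",  # hypothetical role ARN
    "s3://my-bucket/scripts/my_job.py",           # hypothetical script path
    "glueetl",
)
# Submitting it would look like: boto3.client("glue").create_job(**spark_job)
```

The same helper with "pythonshell" builds a Python shell job request, which matches the console's Python Shell script editor option.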

Create jobs that use a connector

After you have added a connector to AWS Glue Studio and created a connection for that connector, you can create a job that uses the connection for the data source.

For detailed instructions, see Authoring jobs with custom connectors.
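If you create jobs programmatically instead of in the console, a connection created for a connector is attached to the job through the Connections field of the CreateJob request. A minimal sketch, assuming a connection name of your own choosing (the names below are hypothetical):

```python
def attach_connections(create_job_request, connection_names):
    """Return a copy of a CreateJob request dict with Glue connections added.

    The Glue CreateJob API accepts a Connections structure listing the names
    of connections the job uses, such as one created for a custom connector
    in AWS Glue Studio.
    """
    request = dict(create_job_request)
    request["Connections"] = {"Connections": list(connection_names)}
    return request

# Hypothetical example: attach a connector-backed connection to a job request
request = attach_connections(
    {"Name": "my-connector-job"},           # minimal, hypothetical request
    ["my-custom-connector-connection"],     # hypothetical connection name
)
```

The job script can then reference that connection as its data source, as described in Authoring jobs with custom connectors.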

Next steps for creating a job in AWS Glue Studio

You use the visual job editor to configure nodes for your job. Each node represents an action, such as reading data from the source location or applying a transform to the data. Each node you add to your job has properties that provide information about either the data location or the transform.

The next steps for creating and managing your jobs are: