Creating ETL jobs with AWS Glue Studio

You can use the simple visual interface in AWS Glue Studio to create your ETL jobs. You use the Jobs page to create new jobs. You can also use a script editor to work directly with code in the AWS Glue Studio ETL job script.

On the Jobs page, you can see all the jobs that you have created either with AWS Glue Studio or AWS Glue. You can view, manage, and run your jobs on this page.

Start the job creation process

You use the visual editor to create and customize your jobs. When you create a new job, you can start with an empty canvas, start with a job that already has a data source, transform, and data target node, or write an ETL script.

To create a job in AWS Glue Studio

  1. Sign in to the AWS Management Console and open the AWS Glue Studio console at https://console.aws.amazon.com/gluestudio/.

  2. Choose Create and manage jobs on the AWS Glue Studio landing page, or choose Jobs in the navigation pane.

    The Jobs page appears.

  3. In the Create job section, choose a configuration option for your job.

    • To create a job starting with an empty canvas, choose Visual with a blank canvas.

    • To create a job starting with a source node, or with source, transform, and target nodes, choose Visual with a source and target.

      You then choose the data source type. You can also choose the data target type, or you can choose the Choose later option to start with only a data source node in the graph.

    • If you are familiar with programming and writing ETL scripts, choose Spark script editor to create a new Spark ETL job. You can then write Python or Scala code in a script editor window, or upload an existing script from a local file. If you choose the script editor, you can't use the visual job editor.

      By default, new scripts are coded in Python. To write a new Scala script, see Creating and editing Scala scripts in AWS Glue Studio.

    • Alternatively, choose Python Shell script editor to create a new Python shell job. You can then write code in a script editor window, starting from a template (boilerplate), or upload an existing script from a local file. If you choose the Python shell editor, you can't use the visual job editor.

  4. Choose Create to open the visual job editor.

    
    Screenshot: the Jobs page of AWS Glue Studio. In the "Create job" section, the "Visual with a source and target" option is selected. The other create job options are "Visual with a blank canvas", "Spark script editor", and "Python Shell script editor". Beneath the create job options, the Source drop-down list shows the available data source types: AWS Glue Data Catalog, Amazon S3, Amazon Kinesis, Apache Kafka, Relational DB, Amazon Redshift, MySQL, and PostgreSQL, with more not shown in the screenshot. To the right of the Source drop-down list, the Target drop-down list shows "Amazon S3". The Create button is highlighted in orange near the top right of the image.
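The two script editor options above have rough counterparts outside the console. The following is a minimal sketch, not part of the console procedure, of how a script-based job might be defined with the AWS SDK for Python (Boto3) `create_job` call. The job name, IAM role ARN, and script location are hypothetical placeholders, and the helper names `build_job_request` and `create_job` are introduced here only for illustration. The sketch assumes that the script has already been uploaded to Amazon S3 and that the role has the permissions AWS Glue needs. Optional settings such as `GlueVersion` and worker configuration are omitted.

```python
# Maps the console's script editor options to AWS Glue job command names.
COMMAND_NAMES = {
    "spark": "glueetl",            # Spark script editor
    "python_shell": "pythonshell"  # Python Shell script editor
}


def build_job_request(name, role_arn, script_location, job_type="spark"):
    """Build the keyword arguments for the Glue create_job API call."""
    if job_type not in COMMAND_NAMES:
        raise ValueError(f"Unsupported job type: {job_type}")
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": COMMAND_NAMES[job_type],
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
    }


def create_job(request, region_name=None):
    """Send the request to AWS Glue. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the request builder works without boto3

    glue = boto3.client("glue", region_name=region_name)
    return glue.create_job(**request)["Name"]


# Hypothetical values, for illustration only.
request = build_job_request(
    name="example-etl-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_location="s3://amzn-s3-demo-bucket/scripts/example.py",
)
```

Calling `create_job(request)` would submit the definition. A job defined this way is a script job, so, as with the script editor options above, it would not be expected to open in the visual job editor.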

Create jobs that use a connector

After you have added a connector to AWS Glue Studio and created a connection for that connector, you can create a job that uses the connection for the data source.

For detailed instructions, see Authoring jobs with custom connectors.

Next steps for creating a job in AWS Glue Studio

You use the visual job editor to configure nodes for your job. Each node represents an action, such as reading data from the source location or applying a transform to the data. Each node you add to your job has properties that provide information about either the data location or the transform.
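To make the node model concrete, here is a hedged sketch of how a three-node job (source, transform, target) might look as a Python ETL script that uses the AWS Glue PySpark library. It is an illustration under assumptions, not the exact script that AWS Glue Studio generates. The database, table, column names, and S3 path are hypothetical, and the `run` function is a wrapper introduced here so that the AWS Glue modules are imported only inside the job runtime.

```python
# Hypothetical names, used only for illustration.
SOURCE_DATABASE = "example_db"
SOURCE_TABLE = "example_table"
TARGET_PATH = "s3://amzn-s3-demo-bucket/output/"

# Transform node properties:
# (source column, source type, target column, target type)
MAPPINGS = [
    ("id", "long", "id", "long"),
    ("name", "string", "customer_name", "string"),
]


def run(argv):
    """Run a source -> transform -> target job inside the AWS Glue runtime."""
    # These modules are available in the AWS Glue job environment.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Data source node: read a table from the AWS Glue Data Catalog.
    source = glue_context.create_dynamic_frame.from_catalog(
        database=SOURCE_DATABASE,
        table_name=SOURCE_TABLE,
        transformation_ctx="source",
    )

    # Transform node: rename and retype columns.
    mapped = ApplyMapping.apply(
        frame=source,
        mappings=MAPPINGS,
        transformation_ctx="mapped",
    )

    # Data target node: write the result to Amazon S3 as Parquet.
    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": TARGET_PATH},
        format="parquet",
        transformation_ctx="target",
    )

    job.commit()
```

In an actual job script, you would call `run(sys.argv)` at the end of the file (after `import sys`). Each call in the function corresponds to one node, and its keyword arguments play the role of that node's properties: the data location for the source and target, and the mapping for the transform.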

The next steps for creating and managing your jobs are: