Creating and Running Workflows in AWS Glue

You can use the AWS Glue console to create, visualize, and run workflows. For information about managing workflows using the AWS Glue API, see Workflows.

Creating and Building Out a Workflow Using the AWS Glue Console

A workflow contains jobs, crawlers, and triggers. Before creating a workflow, create the jobs and crawlers that the workflow will include. It is best to specify run-on-demand crawlers for workflows. You can create new triggers while you are building out your workflow, or you can clone existing triggers into the workflow. When you clone a trigger, all the catalog objects associated with the trigger (the jobs or crawlers that fire it and the jobs or crawlers that it starts) are added to the workflow.

You build out your workflow by adding triggers to the workflow graph, and defining the watched events and actions for each trigger. You begin with a start trigger, which can be either an on-demand or schedule trigger, and complete the graph by adding event (conditional) triggers.

Step 1: Create the workflow

  1. Sign in to the AWS Management Console and open the AWS Glue console at https://console.aws.amazon.com/glue/.

  2. In the navigation pane, under ETL, choose Workflows.

  3. Choose Add workflow.

  4. On the Add a new ETL workflow page, enter a workflow name and optional description.

  5. (Optional) Choose Add property and add default workflow run properties.

    Default run properties are made available as arguments to all jobs in the workflow. For more information, see Getting and Setting Workflow Run Properties.

  6. (Optional) For Max concurrency, enter the maximum number of concurrent workflow runs to allow for this workflow.

    You can use this parameter to prevent unwanted multiple updates to data, to control costs, and in some cases, to prevent exceeding the maximum number of concurrent runs of any of the component jobs. If you leave this parameter blank, there is no limit to the number of concurrent workflow runs.

  7. (Optional) Add tags to the workflow. Each tag must have a key. The tag value is optional.

    For more information, see AWS Tags in AWS Glue.

  8. Choose Add workflow.

    The new workflow appears in the list on the Workflows page.
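
The same settings can also be supplied through the AWS Glue API instead of the console. The following is a minimal Python sketch, assuming a boto3 Glue client (for example, `boto3.client("glue")`) is passed in as `glue_client`. The helper names and the workflow name `nightly-etl` are hypothetical and used only for illustration.

```python
def build_create_workflow_request(name, description=None,
                                  default_run_properties=None,
                                  max_concurrent_runs=None, tags=None):
    """Assemble keyword arguments for the Glue CreateWorkflow operation.

    Only the name is required; every other setting mirrors an optional
    field on the Add a new ETL workflow page.
    """
    request = {"Name": name}
    if description:
        request["Description"] = description
    if default_run_properties:
        # Default run properties are passed as arguments to all jobs.
        request["DefaultRunProperties"] = dict(default_run_properties)
    if max_concurrent_runs is not None:
        # Leaving this out means no limit on concurrent workflow runs.
        request["MaxConcurrentRuns"] = int(max_concurrent_runs)
    if tags:
        # Each tag needs a key; the value may be an empty string.
        request["Tags"] = dict(tags)
    return request


def create_workflow(glue_client, **settings):
    """Create the workflow and return the name reported by the service.

    glue_client is assumed to be a boto3 Glue client.
    """
    request = build_create_workflow_request(**settings)
    return glue_client.create_workflow(**request)["Name"]
```

For example, `create_workflow(glue_client, name="nightly-etl", max_concurrent_runs=1)` would request a workflow that allows only one run at a time.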

Step 2: Add a start trigger

  1. On the Workflows page, select your new workflow. In the tabs at the bottom, choose Graph.

  2. Choose Add trigger, and in the Add trigger dialog box, do one of the following:

    • Choose Add new, and complete the Add trigger form, selecting Schedule or On demand for Trigger Type. Then choose Add.

      The trigger appears on the graph, along with a placeholder node (labeled Add node). At this point, the trigger is not yet saved.

      
      [Figure: A graph with two rectangular nodes: a trigger and a placeholder node. An arrow points from the trigger node to the placeholder node.]

    • Choose Clone existing, and choose a trigger to clone. Then choose Add.

      The trigger appears on the graph, along with the jobs and crawlers that it watches and the jobs and crawlers that it starts.

      If you mistakenly selected the wrong trigger, select the trigger on the graph, and then choose Remove.

  3. If you added a new trigger, complete these steps:

    1. Do one of the following:

      • Choose the placeholder node (Add node).

      • Ensure that the start trigger is selected, and on the Action menu above the graph, choose Add jobs/crawlers to trigger.

    2. In the Add job(s) and crawler(s) to trigger dialog box, select one or more jobs or crawlers, and then choose Add.

      The trigger is saved, and the selected jobs or crawlers appear on the graph with connectors from the trigger.

      If you mistakenly added the wrong jobs or crawlers, you can select either the trigger or a connector and choose Remove.
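
A start trigger can also be described as a request to the Glue CreateTrigger operation. The sketch below only builds the request dictionary; it assumes you would pass the result to a boto3 Glue client as `glue_client.create_trigger(**request)`. The function name and its parameters are hypothetical.

```python
def build_start_trigger_request(workflow_name, trigger_name,
                                job_names=(), crawler_names=(),
                                schedule=None, start_on_creation=False):
    """Build CreateTrigger arguments for a workflow start trigger.

    With no schedule the trigger is on-demand; with a schedule
    (a cron expression such as "cron(0 2 * * ? *)") it is scheduled.
    """
    # Each action starts exactly one job or one crawler.
    actions = [{"JobName": name} for name in job_names]
    actions += [{"CrawlerName": name} for name in crawler_names]
    if not actions:
        raise ValueError("a start trigger needs at least one job or crawler")

    request = {
        "Name": trigger_name,
        "WorkflowName": workflow_name,
        "Type": "SCHEDULED" if schedule else "ON_DEMAND",
        "Actions": actions,
    }
    if schedule:
        request["Schedule"] = schedule
        if start_on_creation:
            # Only meaningful for scheduled triggers in this sketch;
            # on-demand triggers are started explicitly.
            request["StartOnCreation"] = True
    return request
```

Keeping the request construction separate from the API call makes it easy to inspect what would be sent before anything is created.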

Step 3: (Optional) Add more triggers

Continue to build out your workflow by adding more triggers. To zoom in or out or to enlarge the graph canvas, use the icons to the right of the graph. For each trigger to add, complete the following steps:

  1. Do one of the following:

    • To clone an existing trigger, ensure that no node on the graph is selected, and on the Action menu, choose Add trigger.

    • To add a new trigger that watches a particular job or crawler on the graph, select the job or crawler node, and then choose the Add trigger placeholder node.

      You can add more jobs or crawlers to watch for this trigger in a later step.

  2. In the Add trigger dialog box, do one of the following:

    • Choose Add new, and complete the Add trigger form. Then choose Add.

      The trigger appears on the graph. You will complete the trigger in a later step.

    • Choose Clone existing, and choose a trigger to clone. Then choose Add.

      The trigger appears on the graph, along with the jobs and crawlers that it watches and the jobs and crawlers that it starts.

      If you mistakenly chose the wrong trigger, select the trigger on the graph, and then choose Remove.

  3. If you added a new trigger, complete these steps:

    1. Select the new trigger.

      As the following graph shows, placeholder nodes appear for (1) events to watch and (2) actions.

      
      [Figure: A graph with many nodes, two of which are placeholder nodes that are called out as numbers 1 and 2.]

    2. (Optional if the trigger already watches an event. Use this step to add more jobs or crawlers to watch.) Choose the events-to-watch placeholder node, and in the Add job(s) and crawler(s) to watch dialog box, select one or more jobs or crawlers. Choose an event to watch (SUCCEEDED, FAILED, etc.), and choose Add.

    3. Ensure that the trigger is selected, and choose the actions placeholder node.

    4. In the Add job(s) and crawler(s) to trigger dialog box, select one or more jobs or crawlers, and choose Add.

      The selected jobs and crawlers appear on the graph, with connectors from the trigger.
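
An event (conditional) trigger pairs watched events with actions. As a rough illustration, the following sketch builds a CreateTrigger request of type CONDITIONAL, assuming the result would be passed to a boto3 Glue client as `glue_client.create_trigger(**request)`. The function and parameter names are hypothetical.

```python
def build_conditional_trigger_request(workflow_name, trigger_name,
                                      watched_jobs=None, watched_crawlers=None,
                                      job_actions=(), crawler_actions=(),
                                      logical="AND"):
    """Build CreateTrigger arguments for an event (conditional) trigger.

    watched_jobs and watched_crawlers map a job or crawler name to the
    state to watch for, for example {"load": "SUCCEEDED"}.
    logical is "AND" (all conditions) or "ANY" (at least one).
    """
    conditions = []
    for job_name, state in (watched_jobs or {}).items():
        conditions.append(
            {"LogicalOperator": "EQUALS", "JobName": job_name, "State": state})
    for crawler_name, state in (watched_crawlers or {}).items():
        conditions.append(
            {"LogicalOperator": "EQUALS", "CrawlerName": crawler_name,
             "CrawlState": state})
    if not conditions:
        raise ValueError("a conditional trigger needs at least one watched event")

    # Actions are the jobs and crawlers the trigger starts when it fires.
    actions = [{"JobName": name} for name in job_actions]
    actions += [{"CrawlerName": name} for name in crawler_actions]
    if not actions:
        raise ValueError("a conditional trigger needs at least one action")

    return {
        "Name": trigger_name,
        "WorkflowName": workflow_name,
        "Type": "CONDITIONAL",
        "Predicate": {"Logical": logical, "Conditions": conditions},
        "Actions": actions,
    }
```

The two placeholder nodes in the console graph correspond to the `Predicate` (events to watch) and `Actions` entries of this request.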

Running a Workflow

If the start trigger for a workflow is an on-demand trigger, you can run the workflow from the AWS Glue console, the AWS Command Line Interface (AWS CLI), or the AWS Glue API.

To run a workflow (console)

  1. Open the AWS Glue console at https://console.aws.amazon.com/glue/.

  2. In the navigation pane, under ETL, choose Workflows.

  3. Choose a workflow. On the Actions menu, choose Run.

    You can monitor the workflow run status on the AWS Glue console, under the Last run status column.

    If the workflow fails, do the following:

    1. Ensure that the workflow is selected, and choose the History tab.

    2. Under History, select the most recent run and choose View run details.

    3. Select a failed job or crawler in the dynamic (runtime) graph, and review the Job details or Crawler details pane at the right. Failed nodes are either red or yellow. For descriptions of the Status values, see AWS Glue Job Run Statuses.
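
The failure review above can also be approximated from the API. The following sketch assumes a boto3 Glue client is passed in as `glue_client` and that the run is fetched with GetWorkflowRun and `IncludeGraph=True`. The helper names are hypothetical, and the sets of states treated as failures are an assumption of this example, not an official definition.

```python
# States this sketch treats as failures (an assumption for illustration).
FAILED_JOB_STATES = {"FAILED", "TIMEOUT", "ERROR"}
FAILED_CRAWL_STATES = {"FAILED", "ERROR"}


def failed_nodes(run):
    """Return (type, name, state, message) for failed nodes in a run graph.

    run is the "Run" dictionary from a GetWorkflowRun response that was
    requested with the graph included.
    """
    failures = []
    for node in run.get("Graph", {}).get("Nodes", []):
        node_type = node.get("Type")
        if node_type == "JOB":
            for job_run in node.get("JobDetails", {}).get("JobRuns", []):
                state = job_run.get("JobRunState")
                if state in FAILED_JOB_STATES:
                    failures.append((node_type, node.get("Name"), state,
                                     job_run.get("ErrorMessage", "")))
        elif node_type == "CRAWLER":
            for crawl in node.get("CrawlerDetails", {}).get("Crawls", []):
                state = crawl.get("State")
                if state in FAILED_CRAWL_STATES:
                    failures.append((node_type, node.get("Name"), state,
                                     crawl.get("ErrorMessage", "")))
    return failures


def fetch_failed_nodes(glue_client, workflow_name, run_id):
    """Fetch one workflow run with its graph and list the failed nodes."""
    response = glue_client.get_workflow_run(
        Name=workflow_name, RunId=run_id, IncludeGraph=True)
    return failed_nodes(response["Run"])
```

Trigger nodes are skipped because only job and crawler nodes carry run states in this sketch.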

To run a workflow (AWS CLI)

  • Enter the following command. Replace <workflow-name> with the workflow to run.

    aws glue start-workflow-run --name <workflow-name>

    If the workflow is successfully started, the command returns the run ID.
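
The AWS Glue API offers the equivalent StartWorkflowRun operation. The following is a minimal Python sketch, assuming a boto3 Glue client is passed in as `glue_client`. The helper name and the optional `run_properties` argument are illustrative choices for this example.

```python
def start_workflow_run(glue_client, workflow_name, run_properties=None):
    """Start a workflow run and return its run ID.

    glue_client is assumed to be a boto3 Glue client. run_properties,
    if given, is a dict of string keys and values for this run.
    """
    request = {"Name": workflow_name}
    if run_properties:
        request["RunProperties"] = dict(run_properties)
    response = glue_client.start_workflow_run(**request)
    return response["RunId"]
```

As with the CLI command, the returned run ID identifies the run when you later look up its status.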