Querying Workflows Using the AWS Glue API - AWS Glue

Querying Workflows Using the AWS Glue API

AWS Glue provides a rich API for managing workflows. You can retrieve a static view of a workflow or a dynamic view of a running workflow using the AWS Glue API. For more information, see Workflows.

Querying Static Views

Use the GetWorkflow API operation to get a static view that indicates the design of a workflow. This operation returns a directed graph consisting of nodes and edges, where a node represents a trigger, a job, or a crawler. Edges define the relationships between nodes. They are represented by connectors (arrows) on the graph in the AWS Glue console.

You can also use this operation with popular graph-processing libraries such as NetworkX, igraph, JGraphT, and the Java Universal Network/Graph (JUNG) Framework. Because all these libraries represent graphs similarly, minimal transformations are needed.

The static view returned by this API is the most up-to-date view according to the latest definition of triggers associated with the workflow.

Graph Definition

A workflow graph G is an ordered pair (N, E), where N is a set of nodes and E a set of edges. Node is a vertex in the graph identified by a unique number. A node can be of type trigger, job, or crawler. For example: {name:T1, type:Trigger, uniqueId:1}, {name:J1, type:Job, uniqueId:2}.

Edge is a 2-tuple of the form (src, dest), where src and dest are nodes and there is a directed edge from src to dest.

Example of Querying a Static View

Consider a conditional trigger T, which triggers job J2 upon completion of job J1.

J1 ---> T ---> J2

Nodes: J1, T, J2

Edges: (J1, T), (T, J2)

Querying Dynamic Views

Use the GetWorkflowRun API operation to get a dynamic view of a running workflow. This operation returns the same static view of the graph along with metadata related to the workflow run.

For run, nodes representing jobs in the GetWorkflowRun call have a list of job runs initiated as part of the latest run of the workflow. You can use this list to display the run status of each job in the graph itself. For downstream dependencies that are not yet run, this field is set to null. The graphed information makes you aware of the current state of any workflow at any point of time.

The dynamic view returned by this API is based on the static view that was present when the workflow run was started.

Runtime nodes example: {name:T1, type: Trigger, uniqueId:1}, {name:J1, type:Job, uniqueId:2, jobDetails:{jobRuns}}, {name:C1, type:Crawler, uniqueId:3, crawlerDetails:{crawls}}

Example 1: Dynamic View

The following example illustrates a simple two-trigger workflow.

  • Nodes: t1, j1, t2, j2

  • Edges: (t1, j1), (j1, t2), (t2, j2)

The GetWorkflow response contains the following.

{ Nodes : [ { "type" : Trigger, "name" : "t1", "uniqueId" : 1 }, { "type" : Job, "name" : "j1", "uniqueId" : 2 }, { "type" : Trigger, "name" : "t2", "uniqueId" : 3 }, { "type" : Job, "name" : "j2", "uniqueId" : 4 } ], Edges : [ { "sourceId" : 1, "destinationId" : 2 }, { "sourceId" : 2, "destinationId" : 3 }, { "sourceId" : 3, "destinationId" : 4 } }

The GetWorkflowRun response contains the following.

{ Nodes : [ { "type" : Trigger, "name" : "t1", "uniqueId" : 1, "jobDetails" : null, "crawlerDetails" : null }, { "type" : Job, "name" : "j1", "uniqueId" : 2, "jobDetails" : [ { "id" : "jr_12334", "jobRunState" : "SUCCEEDED", "errorMessage" : "error string" } ], "crawlerDetails" : null }, { "type" : Trigger, "name" : "t2", "uniqueId" : 3, "jobDetails" : null, "crawlerDetails" : null }, { "type" : Job, "name" : "j2", "uniqueId" : 4, "jobDetails" : [ { "id" : "jr_1233sdf4", "jobRunState" : "SUCCEEDED", "errorMessage" : "error string" } ], "crawlerDetails" : null } ], Edges : [ { "sourceId" : 1, "destinationId" : 2 }, { "sourceId" : 2, "destinationId" : 3 }, { "sourceId" : 3, "destinationId" : 4 } }

Example 2: Multiple Jobs with a Conditional Trigger

The following example shows a workflow with multiple jobs and a conditional trigger (t3).

Consider Flow: T(t1) ---> J(j1) ---> T(t2) ---> J(j2) | | | | >+------> T(t3) <-----+ | | J(j3) Graph generated: Nodes: t1, t2, t3, j1, j2, j3 Edges: (t1, j1), (j1, t2), (t2, j2), (j1, t3), (j2, t3), (t3, j3)