AWS Glue blueprint classes reference

The libraries for AWS Glue blueprints define three classes that you use in your workflow layout script: Job, Crawler, and Workflow.

Job class

The Job class represents an AWS Glue ETL job.

Mandatory constructor arguments

The following are mandatory constructor arguments for the Job class.

| Argument name | Type | Description |
| --- | --- | --- |
| Name | str | Name to assign to the job. AWS Glue adds a randomly generated suffix to the name to distinguish the job from those created by other blueprint runs. |
| Role | str | Amazon Resource Name (ARN) of the role that the job should assume while executing. |
| Command | dict | Job command, as specified in the JobCommand structure in the API documentation. |

Optional constructor arguments

The following are optional constructor arguments for the Job class.

| Argument name | Type | Description |
| --- | --- | --- |
| DependsOn | dict | The workflow entities that the job depends on. For more information, see Using the DependsOn argument. |
| WaitForDependencies | str | Indicates whether the job should wait until all of the entities it depends on complete before executing, or until any one of them completes. For more information, see Using the WaitForDependencies argument. Omit if the job depends on only one entity. |
| (Job properties) | - | Any of the job properties listed in the Job structure in the AWS Glue API documentation (except CreatedOn and LastModifiedOn). |
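
As a minimal sketch of how these arguments fit together, the keyword arguments for a Job can be assembled as a plain dictionary and checked before use. The job name, role ARN, and S3 path are hypothetical placeholders. The `{entity: state}` form of DependsOn is an assumption based on the Using the DependsOn argument topic; in a real layout script the key would be a Job or Crawler object rather than the stand-in used here.

```python
# Stand-in for an upstream Crawler object created earlier in a layout script.
# Assumption: DependsOn maps entity objects (not names) to the state to wait for.
upstream_crawler = object()

job_args = {
    # Mandatory arguments
    "Name": "sample_etl_job",
    "Role": "arn:aws:iam::123456789012:role/ExampleGlueRole",  # hypothetical ARN
    "Command": {  # fields from the JobCommand structure
        "Name": "glueetl",
        "ScriptLocation": "s3://example-bucket/scripts/etl.py",  # hypothetical path
        "PythonVersion": "3",
    },
    # Optional arguments
    "DependsOn": {upstream_crawler: "SUCCEEDED"},
    # WaitForDependencies is omitted because there is only one dependency.
    "WorkerType": "G.1X",    # job property from the Job structure
    "NumberOfWorkers": 2,    # job property from the Job structure
}

# Properties the reference says cannot be passed to the constructor.
EXCLUDED_JOB_PROPERTIES = {"CreatedOn", "LastModifiedOn"}

missing = [k for k in ("Name", "Role", "Command") if k not in job_args]
rejected = sorted(EXCLUDED_JOB_PROPERTIES & job_args.keys())

print("missing mandatory arguments:", missing)
print("excluded properties passed:", rejected)
```

In a layout script, a dictionary like this would presumably be unpacked into the constructor as `Job(**job_args)`.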

Crawler class

The Crawler class represents an AWS Glue crawler.

Mandatory constructor arguments

The following are mandatory constructor arguments for the Crawler class.

| Argument name | Type | Description |
| --- | --- | --- |
| Name | str | Name to assign to the crawler. AWS Glue adds a randomly generated suffix to the name to distinguish the crawler from those created by other blueprint runs. |
| Role | str | ARN of the role that the crawler should assume while running. |
| Targets | dict | Collection of targets to crawl. Targets class constructor arguments are defined in the CrawlerTargets structure in the API documentation. All Targets constructor arguments are optional, but you must pass at least one. |

Optional constructor arguments

The following are optional constructor arguments for the Crawler class.

| Argument name | Type | Description |
| --- | --- | --- |
| DependsOn | dict | The workflow entities that the crawler depends on. For more information, see Using the DependsOn argument. |
| WaitForDependencies | str | Indicates whether the crawler should wait until all of the entities it depends on complete before running, or until any one of them completes. For more information, see Using the WaitForDependencies argument. Omit if the crawler depends on only one entity. |
| (Crawler properties) | - | Any of the crawler properties listed in the Crawler structure in the AWS Glue API documentation, except State, CrawlElapsedTime, CreationTime, LastUpdated, LastCrawl, and Version. |
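
The following sketch shows one way to assemble and sanity-check Crawler keyword arguments before constructing the object. The crawler name, role ARN, bucket, and database name are hypothetical, and `check_crawler_args` is an illustrative helper, not part of the blueprint libraries. The Targets value uses the S3Targets field from the CrawlerTargets structure.

```python
# Crawler properties the reference lists as exceptions (not accepted).
EXCLUDED_CRAWLER_PROPERTIES = {
    "State", "CrawlElapsedTime", "CreationTime",
    "LastUpdated", "LastCrawl", "Version",
}
MANDATORY_CRAWLER_ARGS = ("Name", "Role", "Targets")


def check_crawler_args(args):
    """Return a list of problems found in a dict of Crawler keyword arguments."""
    problems = [f"missing mandatory argument: {k}"
                for k in MANDATORY_CRAWLER_ARGS if k not in args]
    problems += [f"property not accepted: {k}"
                 for k in sorted(EXCLUDED_CRAWLER_PROPERTIES & args.keys())]
    # All Targets arguments are optional, but at least one must be passed.
    if "Targets" in args and not args["Targets"]:
        problems.append("Targets must contain at least one target type")
    return problems


crawler_args = {
    "Name": "sample_crawler",
    "Role": "arn:aws:iam::123456789012:role/ExampleGlueRole",  # hypothetical ARN
    "Targets": {"S3Targets": [{"Path": "s3://example-bucket/input/"}]},
    "DatabaseName": "example_db",  # crawler property from the Crawler structure
    "TablePrefix": "raw_",         # crawler property from the Crawler structure
}

print(check_crawler_args(crawler_args))
```

Passing one of the excluded properties, such as State, would be reported by the helper as a problem.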

Workflow class

The Workflow class represents an AWS Glue workflow. The workflow layout script returns a Workflow object. AWS Glue creates a workflow based on this object.

Mandatory constructor arguments

The following are mandatory constructor arguments for the Workflow class.

| Argument name | Type | Description |
| --- | --- | --- |
| Name | str | Name to assign to the workflow. |
| Entities | Entities | A collection of entities (jobs and crawlers) to include in the workflow. The Entities class constructor accepts a Jobs argument, which is a list of Job objects, and a Crawlers argument, which is a list of Crawler objects. |

Optional constructor arguments

The following are optional constructor arguments for the Workflow class.

| Argument name | Type | Description |
| --- | --- | --- |
| Description | str | See Workflow structure. |
| DefaultRunProperties | dict | See Workflow structure. |
| OnSchedule | str | A cron expression. |
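
To illustrate the shape of the Workflow arguments, the sketch below uses two stand-in dataclasses, `SketchEntities` and `SketchWorkflow`. These are hypothetical and only mirror the constructor keywords documented above; they are not the AWS Glue blueprint classes. The `cron(...)` form of the schedule assumes the same syntax that AWS Glue uses for time-based schedules.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchEntities:
    """Stand-in mirroring the Jobs and Crawlers arguments of Entities."""
    Jobs: list = field(default_factory=list)
    Crawlers: list = field(default_factory=list)


@dataclass
class SketchWorkflow:
    """Stand-in mirroring the documented Workflow constructor arguments."""
    Name: str
    Entities: SketchEntities
    Description: Optional[str] = None
    DefaultRunProperties: Optional[dict] = None
    OnSchedule: Optional[str] = None


workflow = SketchWorkflow(
    Name="sample_workflow",
    # Placeholders stand in for real Job and Crawler objects.
    Entities=SketchEntities(Jobs=["<Job object>"], Crawlers=["<Crawler object>"]),
    Description="Crawl the input data, then run the ETL job",
    DefaultRunProperties={"stage": "dev"},  # hypothetical run property
    OnSchedule="cron(0 12 * * ? *)",        # assumed AWS Glue cron syntax
)

print(workflow.Name, len(workflow.Entities.Jobs), len(workflow.Entities.Crawlers))
```

In an actual layout script, the equivalent Workflow object is what the script returns to AWS Glue.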

Class methods

All three classes include the following methods.

validate()

Validates the properties of the object. If it finds errors, it outputs a message and exits. If there are no errors, it generates no output. For the Workflow class, validate() also calls validate() on every entity in the workflow.
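
The behavior described above (silent on success, message and exit on error, applied to every entity for a workflow) can be sketched roughly as follows. This is an illustration only: the functions operate on plain dictionaries of constructor arguments, check nothing beyond the mandatory argument names, and are not the library's implementation.

```python
import sys

# Mandatory constructor arguments taken from the tables in this reference.
MANDATORY = {
    "job": ("Name", "Role", "Command"),
    "crawler": ("Name", "Role", "Targets"),
}


def validate_entity(kind, args):
    """Print a message and exit if a mandatory argument is missing."""
    missing = [k for k in MANDATORY[kind] if k not in args]
    if missing:
        print(f"{kind} {args.get('Name', '<unnamed>')}: missing {missing}")
        sys.exit(1)


def validate_workflow(name, jobs, crawlers):
    """Validate the workflow itself, then every entity it contains."""
    if not name:
        print("workflow: missing ['Name']")
        sys.exit(1)
    for job in jobs:
        validate_entity("job", job)
    for crawler in crawlers:
        validate_entity("crawler", crawler)


good_job = {"Name": "j", "Role": "r", "Command": {"Name": "glueetl"}}
good_crawler = {"Name": "c", "Role": "r", "Targets": {"S3Targets": []}}

# Produces no output because every entity has its mandatory arguments.
validate_workflow("sample_workflow", [good_job], [good_crawler])
```

Because the process exits on the first error, a caller that wants to collect every problem would need to catch SystemExit or use a different checking strategy.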

to_json()

Serializes the object to JSON. Also calls validate(). For the Workflow class, the JSON object includes job and crawler lists, and a list of triggers generated by the job and crawler dependency specifications.
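
As a rough sketch of how a dependency specification might become a trigger in the serialized output, the example below builds a dictionary shaped like the Trigger structure in the AWS Glue API (Type, Predicate, Conditions, Actions). The helper `trigger_for`, the trigger name, and the top-level keys Jobs, Crawlers, and Triggers are assumptions made for illustration; the exact JSON that to_json() produces is not specified here.

```python
import json


def trigger_for(name, kind, depends_on, wait_for="AND"):
    """Build a Trigger-shaped dict for one dependent entity.

    depends_on is a list of (kind, name, state) tuples, where kind is
    "job" or "crawler". wait_for plays the role of WaitForDependencies.
    """
    conditions = []
    for dep_kind, dep_name, state in depends_on:
        if dep_kind == "job":
            conditions.append(
                {"LogicalOperator": "EQUALS", "JobName": dep_name, "State": state})
        else:
            conditions.append(
                {"LogicalOperator": "EQUALS", "CrawlerName": dep_name, "CrawlState": state})
    action = {"JobName": name} if kind == "job" else {"CrawlerName": name}
    return {
        "Name": f"{name}_trigger",  # hypothetical naming scheme
        "Type": "CONDITIONAL",
        "Predicate": {"Logical": wait_for, "Conditions": conditions},
        "Actions": [action],
    }


layout = {
    "Jobs": [{"Name": "sample_etl_job"}],
    "Crawlers": [{"Name": "sample_crawler"}],
    "Triggers": [
        trigger_for("sample_etl_job", "job",
                    [("crawler", "sample_crawler", "SUCCEEDED")]),
    ],
}

serialized = json.dumps(layout, indent=2)
print(serialized)
```

Here the job's single dependency on the crawler yields one conditional trigger whose action starts the job.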