
AWS Glue blueprint classes reference

The libraries for AWS Glue blueprints define three classes that you use in your workflow layout script: Job, Crawler, and Workflow.

Topics

Job class
Crawler class
Workflow class
Class methods

Job class

The Job class represents an AWS Glue ETL job.

Mandatory constructor arguments

The following are mandatory constructor arguments for the Job class.

Argument name	Type	Description
`Name`	`str`	Name to assign to the job. AWS Glue adds a randomly generated suffix to the name to distinguish the job from those created by other blueprint runs.
`Role`	`str`	Amazon Resource Name (ARN) of the role that the job should assume while executing.
`Command`	`dict`	Job command, as specified in the JobCommand structure in the API documentation.

Optional constructor arguments

The following are optional constructor arguments for the Job class.

Argument name	Type	Description
`DependsOn`	`dict`	Dictionary of the workflow entities that the job depends on. For more information, see Using the DependsOn argument.
`WaitForDependencies`	`str`	Indicates whether the job waits for all of the entities it depends on to complete before it runs, or for any one of them. For more information, see Using the WaitForDependencies argument. Omit if the job depends on only one entity.
(Job properties)	-	Any of the job properties listed in the Job structure in the AWS Glue API documentation, except `CreatedOn` and `LastModifiedOn`.
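
As a rough illustration of the two tables above, the following sketch assembles mandatory and optional constructor arguments for a job. The import path `awsglue.blueprint.job`, the job name, the role ARN, and the script location are assumptions made for the example. The fallback class exists only so that the snippet runs outside an AWS Glue blueprint environment; it is not the library's implementation.

```python
try:
    # Assumed import path for the blueprint libraries.
    from awsglue.blueprint.job import Job
except ImportError:
    class Job:
        """Local stand-in so the sketch runs anywhere; not the real Job class."""

        def __init__(self, **kwargs):
            self.__dict__.update(kwargs)


# Mandatory arguments: Name, Role, and Command.
job_args = {
    "Name": "example_etl_job",  # hypothetical; AWS Glue appends a random suffix
    "Role": "arn:aws:iam::123456789012:role/ExampleGlueRole",  # placeholder ARN
    "Command": {  # fields follow the JobCommand structure
        "Name": "glueetl",
        "ScriptLocation": "s3://example-bucket/scripts/example_etl_job.py",
        "PythonVersion": "3",
    },
}

# Optional job properties from the Job structure are passed the same way.
job_args["MaxRetries"] = 1
job_args["Timeout"] = 60  # minutes

etl_job = Job(**job_args)
```

Keeping the arguments in a dictionary is a design choice for this sketch only; passing them directly as keyword arguments is equivalent.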

Crawler class

The Crawler class represents an AWS Glue crawler.

Mandatory constructor arguments

The following are mandatory constructor arguments for the Crawler class.

Argument name	Type	Description
`Name`	`str`	Name to assign to the crawler. AWS Glue adds a randomly generated suffix to the name to distinguish the crawler from those created by other blueprint runs.
`Role`	`str`	ARN of the role that the crawler should assume while running.
`Targets`	`dict`	Collection of targets to crawl. `Targets` class constructor arguments are defined in the CrawlerTargets structure in the API documentation. All `Targets` constructor arguments are optional, but you must pass at least one.

Optional constructor arguments

The following are optional constructor arguments for the Crawler class.

Argument name	Type	Description
`DependsOn`	`dict`	Dictionary of the workflow entities that the crawler depends on. For more information, see Using the DependsOn argument.
`WaitForDependencies`	`str`	Indicates whether the crawler waits for all of the entities it depends on to complete before it runs, or for any one of them. For more information, see Using the WaitForDependencies argument. Omit if the crawler depends on only one entity.
(Crawler properties)	-	Any of the crawler properties listed in the Crawler structure in the AWS Glue API documentation, with the following exceptions: `State`, `CrawlElapsedTime`, `CreationTime`, `LastUpdated`, `LastCrawl`, and `Version`.
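
A crawler can be sketched in the same way. The import path `awsglue.blueprint.crawler`, the crawler name, the role ARN, the Amazon S3 path, and the database name below are assumptions for illustration, and the fallback class is only a local stand-in, not the library's implementation. The example passes a single target type, which is enough to satisfy the rule that `Targets` needs at least one argument.

```python
try:
    # Assumed import path for the blueprint libraries.
    from awsglue.blueprint.crawler import Crawler
except ImportError:
    class Crawler:
        """Local stand-in so the sketch runs anywhere; not the real Crawler class."""

        def __init__(self, **kwargs):
            self.__dict__.update(kwargs)


# Mandatory arguments: Name, Role, and Targets.
crawler_args = {
    "Name": "example_raw_crawler",  # hypothetical; AWS Glue appends a random suffix
    "Role": "arn:aws:iam::123456789012:role/ExampleGlueRole",  # placeholder ARN
    "Targets": {  # keys follow the CrawlerTargets structure
        "S3Targets": [{"Path": "s3://example-bucket/raw/"}],
    },
}

# Optional crawler properties from the Crawler structure.
crawler_args["DatabaseName"] = "example_db"
crawler_args["TablePrefix"] = "raw_"

raw_crawler = Crawler(**crawler_args)
```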

Workflow class

The Workflow class represents an AWS Glue workflow. The workflow layout script returns a Workflow object. AWS Glue creates a workflow based on this object.

Mandatory constructor arguments

The following are mandatory constructor arguments for the Workflow class.

Argument name	Type	Description
`Name`	`str`	Name to assign to the workflow.
`Entities`	`Entities`	A collection of entities (jobs and crawlers) to include in the workflow. The `Entities` class constructor accepts a `Jobs` argument, which is a list of `Job` objects, and a `Crawlers` argument, which is a list of `Crawler` objects.

Optional constructor arguments

The following are optional constructor arguments for the Workflow class.

Argument name	Type	Description
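`Description`	`str`	Description of the workflow. See the Workflow structure in the AWS Glue API documentation.
`DefaultRunProperties`	`dict`	Properties to use as part of each run of the workflow. See the Workflow structure in the AWS Glue API documentation.
`OnSchedule`	`str`	A `cron` expression that sets the schedule for running the workflow.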
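
The following sketch shows how the three classes might fit together in a layout script that returns a Workflow object. Several details are assumptions rather than facts stated on this page: the import paths, the `generate_layout(user_params, system_params)` function signature, the parameter names in `user_params`, the form of the `DependsOn` dictionary (entity object mapped to a required state), and the `cron(...)` wrapper around the schedule. The fallback classes are local stand-ins so that the snippet runs outside AWS Glue.

```python
try:
    # Assumed import paths for the blueprint libraries.
    from awsglue.blueprint.workflow import Workflow, Entities
    from awsglue.blueprint.job import Job
    from awsglue.blueprint.crawler import Crawler
except ImportError:
    class _StandIn:
        """Local stand-in so the sketch runs anywhere; not the real classes."""

        def __init__(self, **kwargs):
            self.__dict__.update(kwargs)

    class Workflow(_StandIn):
        pass

    class Entities(_StandIn):
        pass

    class Job(_StandIn):
        pass

    class Crawler(_StandIn):
        pass


def generate_layout(user_params, system_params):
    """Build a two-step workflow: crawl the input, then run a job."""
    role = user_params["RoleArn"]  # hypothetical parameter names throughout

    crawler = Crawler(
        Name="example_raw_crawler",
        Role=role,
        Targets={"S3Targets": [{"Path": user_params["InputPath"]}]},
        DatabaseName="example_db",
    )

    job = Job(
        Name="example_transform_job",
        Role=role,
        Command={
            "Name": "glueetl",
            "ScriptLocation": user_params["ScriptLocation"],
            "PythonVersion": "3",
        },
        # Assumed form: maps the entity object to the state it must reach.
        # WaitForDependencies is omitted because there is only one dependency.
        DependsOn={crawler: "SUCCEEDED"},
    )

    return Workflow(
        Name=user_params["WorkflowName"],
        Entities=Entities(Jobs=[job], Crawlers=[crawler]),
        Description="Crawl raw data, then transform it.",
        OnSchedule="cron(0 2 * * ? *)",  # assumed cron(...) form
    )


params = {
    "WorkflowName": "example_workflow",
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleGlueRole",
    "InputPath": "s3://example-bucket/raw/",
    "ScriptLocation": "s3://example-bucket/scripts/transform.py",
}
workflow = generate_layout(params, {})
```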

Class methods

All three classes include the following methods.

validate(): Validates the properties of the object. If it finds errors, it outputs a message and exits. If it finds none, it generates no output. For the Workflow class, validate() also calls validate() on every entity in the workflow.
to_json(): Serializes the object to JSON. Also calls validate(). For the Workflow class, the JSON object includes the job and crawler lists, and a list of triggers generated from the job and crawler dependency specifications.
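
The behavior described above can be modeled with a small toy example. The classes `ToyEntity` and `ToyWorkflow` are hypothetical and are not the blueprint library's code. They only mimic the documented contract: validate() is silent on success and prints a message and exits on error, a workflow validates each of its entities, and to_json() also calls validate(). The order of validation and serialization is assumed, and the toy omits the generated trigger list.

```python
import json
import sys


class ToyEntity:
    """Toy model of a job or crawler; not the blueprint library's code."""

    required = ("Name", "Role")

    def __init__(self, **props):
        self.props = props

    def validate(self):
        # Print a message and exit on error; stay silent otherwise.
        missing = [key for key in self.required if key not in self.props]
        if missing:
            print(f"{self.props.get('Name', '<unnamed>')}: missing {missing}")
            sys.exit(1)

    def to_json(self):
        self.validate()  # to_json() also validates (order assumed here)
        return json.dumps(self.props, sort_keys=True)


class ToyWorkflow(ToyEntity):
    required = ("Name", "Entities")

    def validate(self):
        super().validate()
        # A workflow also validates every entity it contains.
        for entity in self.props["Entities"]:
            entity.validate()

    def to_json(self):
        self.validate()
        body = {
            "Name": self.props["Name"],
            "Entities": [json.loads(e.to_json()) for e in self.props["Entities"]],
        }
        return json.dumps(body, sort_keys=True)


good = ToyEntity(Name="job_a", Role="arn:aws:iam::123456789012:role/ExampleGlueRole")
bad = ToyEntity(Name="job_b")  # Role is missing

print(ToyWorkflow(Name="wf", Entities=[good]).to_json())

try:
    ToyWorkflow(Name="wf", Entities=[good, bad]).validate()
except SystemExit as exit_request:
    print("validation stopped the script with exit code", exit_request.code)
```

Because validation exits rather than raising an ordinary exception, a script that wants to keep running after a failed check has to catch `SystemExit`, as the last lines of the toy example do.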
