Crawler

Specifies a crawler program that examines a data source and uses classifiers to try to determine its schema. If successful, the crawler records metadata concerning the data source in the AWS Glue Data Catalog.

Contents

Classifiers

A list of UTF-8 strings that specify the custom classifiers that are associated with the crawler.

Type: Array of strings

Length Constraints: Minimum length of 1. Maximum length of 255.

Pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\t]*

Required: No

Configuration

Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see Setting crawler configuration options.

Type: String

Required: No
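To illustrate what a versioned JSON string can look like, the following minimal sketch builds one with Python's standard json module. The keys shown (Version, CrawlerOutput, Partitions, AddOrUpdateBehavior, Grouping, TableGroupingPolicy) are assumptions drawn from the crawler configuration options topic and should be verified there. build_configuration is a hypothetical helper name, not part of any AWS SDK.

```python
import json


def build_configuration(inherit_partitions=True, combine_schemas=False):
    """Return a crawler Configuration value as a JSON string.

    The option names are assumptions; check them against the
    "Setting crawler configuration options" topic before use.
    """
    config = {"Version": 1.0}  # the string is versioned, so Version comes first
    if inherit_partitions:
        # Assumed option: partitions inherit metadata from their table.
        config["CrawlerOutput"] = {
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
        }
    if combine_schemas:
        # Assumed option: group compatible schemas into a single table.
        config["Grouping"] = {"TableGroupingPolicy": "CombineCompatibleSchemas"}
    return json.dumps(config)


if __name__ == "__main__":
    print(build_configuration())
```

The field is typed as String, so the JSON document is passed in serialized form rather than as a nested object.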

CrawlElapsedTime

If the crawler is running, contains the total time elapsed since the last crawl began.

Type: Long

Required: No

CrawlerSecurityConfiguration

The name of the SecurityConfiguration structure to be used by this crawler.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 128.

Required: No

CreationTime

The time that the crawler was created.

Type: Timestamp

Required: No

DatabaseName

The name of the database in which the crawler's output is stored.

Type: String

Required: No

Description

A description of the crawler.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 2048.

Pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\r\n\t]*

Required: No

LakeFormationConfiguration

Specifies whether the crawler should use AWS Lake Formation credentials instead of the IAM role credentials.

Type: LakeFormationConfiguration object

Required: No

LastCrawl

The status of the last crawl, along with error information if an error occurred.

Type: LastCrawlInfo object

Required: No

LastUpdated

The time that the crawler was last updated.

Type: Timestamp

Required: No

LineageConfiguration

A configuration that specifies whether data lineage is enabled for the crawler.

Type: LineageConfiguration object

Required: No

Name

The name of the crawler.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 255.

Pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\t]*

Required: No
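The length constraints and pattern for Name can be checked on the client before a request is made. The sketch below is one possible translation into Python, not an official validator. It assumes that the surrogate-pair range in the documented pattern (\uD800\uDC00-\uDBFF\uDFFF) corresponds to the code points U+10000 through U+10FFFF, and that length is counted in code points. is_valid_crawler_name is a hypothetical helper name.

```python
import re

# Python strings hold code points rather than UTF-16 units, so the
# surrogate-pair range is written as U+10000 to U+10FFFF (an assumption
# about how the documented pattern should be read).
NAME_PATTERN = re.compile(
    r"[\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF\t]*"
)


def is_valid_crawler_name(name: str) -> bool:
    """Check a name against the documented length limits and pattern."""
    if not 1 <= len(name) <= 255:
        return False
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["sales-data crawler", "", "line\nbreak"]:
        print(repr(candidate), is_valid_crawler_name(candidate))
```

The pattern admits tabs but not carriage returns or line feeds, which is why the last candidate is rejected.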

RecrawlPolicy

A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.

Type: RecrawlPolicy object

Required: No

Role

The Amazon Resource Name (ARN) of an IAM role that's used to access customer resources, such as Amazon Simple Storage Service (Amazon S3) data.

Type: String

Required: No

Schedule

For scheduled crawlers, the schedule on which the crawler runs.

Type: Schedule object

Required: No

SchemaChangePolicy

The policy that specifies update and delete behaviors for the crawler.

Type: SchemaChangePolicy object

Required: No

State

Indicates whether the crawler is running, or whether a run is pending.

Type: String

Valid Values: READY | RUNNING | STOPPING

Required: No
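A common use of State is to wait for a crawler to return to READY before starting dependent work. The following sketch polls a caller-supplied function that returns a Crawler structure as a dictionary. wait_until_ready and fetch_crawler are hypothetical names. With boto3, fetch_crawler might wrap a GetCrawler call and return its "Crawler" member, but that wiring is an assumption and is not shown here.

```python
import time


def wait_until_ready(fetch_crawler, poll_seconds=5.0, timeout_seconds=600.0,
                     sleep=time.sleep):
    """Poll until the crawler reports State == "READY".

    fetch_crawler is any zero-argument callable that returns a dict shaped
    like the Crawler structure (hypothetical; supplied by the caller).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        # State is not a required member, so read it defensively.
        state = fetch_crawler().get("State")
        if state == "READY":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"crawler still in state {state!r} after {timeout_seconds} s"
            )
        sleep(poll_seconds)


if __name__ == "__main__":
    # Simulated sequence of states, in place of real API calls.
    simulated = iter(["RUNNING", "STOPPING", "READY"])
    print(wait_until_ready(lambda: {"State": next(simulated)}, poll_seconds=0.0))
```

Passing sleep as a parameter keeps the helper testable without real delays.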

TablePrefix

The prefix added to the names of tables that are created.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 128.

Required: No

Targets

A collection of targets to crawl.

Type: CrawlerTargets object

Required: No

Version

The version of the crawler.

Type: Long

Required: No
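Because every member of this structure is marked "Required: No", code that consumes a Crawler should not assume that any key is present. The sketch below reads a dictionary shaped like the structure and falls back to defaults. summarize_crawler is a hypothetical helper, the sample values are made up, and the nested keys Status (inside LastCrawl) and S3Targets (inside Targets) are assumptions about the LastCrawlInfo and CrawlerTargets objects, which are documented separately.

```python
def summarize_crawler(crawler):
    """Build a small summary from a Crawler-shaped dict, tolerating gaps."""
    last_crawl = crawler.get("LastCrawl") or {}
    targets = crawler.get("Targets") or {}
    return {
        "name": crawler.get("Name", "<unnamed>"),
        "state": crawler.get("State", "UNKNOWN"),
        "database": crawler.get("DatabaseName"),
        "table_prefix": crawler.get("TablePrefix", ""),
        "classifiers": list(crawler.get("Classifiers", [])),
        # "Status" is an assumed member of the LastCrawlInfo object.
        "last_status": last_crawl.get("Status"),
        # Report only the target kinds that actually list something.
        "target_kinds": sorted(k for k, v in targets.items() if v),
    }


if __name__ == "__main__":
    sample = {  # illustrative values only
        "Name": "example-crawler",
        "State": "READY",
        "DatabaseName": "example_db",
        "Targets": {"S3Targets": [{"Path": "s3://example-bucket/data/"}]},
    }
    print(summarize_crawler(sample))
```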
