AWS CloudFormation
User Guide (API Version 2010-05-15)

AWS::Glue::Crawler

The AWS::Glue::Crawler resource specifies an AWS Glue crawler. For more information, see Cataloging Tables with a Crawler and Crawler Structure in the AWS Glue Developer Guide.

Syntax

To declare this entity in your AWS CloudFormation template, use the following syntax:

JSON

{
  "Type" : "AWS::Glue::Crawler",
  "Properties" : {
    "Role" : String,
    "Classifiers" : [ String, ... ],
    "Description" : String,
    "SchemaChangePolicy" : SchemaChangePolicy,
    "Schedule" : Schedule,
    "DatabaseName" : String,
    "Targets" : Targets,
    "TablePrefix" : String,
    "Name" : String
  }
}

YAML

Type: "AWS::Glue::Crawler"
Properties:
  Role: String
  Classifiers:
    - String
  Description: String
  SchemaChangePolicy: SchemaChangePolicy
  Schedule: Schedule
  DatabaseName: String
  Targets: Targets
  TablePrefix: String
  Name: String
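To make the syntax concrete, the following Python sketch builds a minimal crawler resource that contains only the three required properties (Role, DatabaseName, and Targets) and prints it as a JSON template. The helper name crawler_resource, the role ARN, the database name, and the S3 path are hypothetical placeholders. The S3Targets list shape follows the example template at the end of this page.

```python
import json


def crawler_resource(role_arn, database_name, s3_paths, name=None):
    """Build a minimal AWS::Glue::Crawler resource (hypothetical helper).

    Only the required properties are set; Name is added when given.
    """
    properties = {
        "Role": role_arn,
        "DatabaseName": database_name,
        "Targets": {"S3Targets": [{"Path": path} for path in s3_paths]},
    }
    if name is not None:
        # Changing Name on an existing stack requires resource replacement.
        properties["Name"] = name
    return {"Type": "AWS::Glue::Crawler", "Properties": properties}


template = {
    "Resources": {
        "MyCrawler": crawler_resource(
            "arn:aws:iam::123456789012:role/MyGlueRole",  # placeholder ARN
            "mydatabase",                                  # placeholder database
            ["s3://example-bucket/data/"],                 # placeholder S3 path
        )
    }
}

print(json.dumps(template, indent=2))
```

Name is deliberately optional in the helper. Setting it pins the crawler name, but any later change to it replaces the crawler.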

Properties

Role

The Amazon Resource Name (ARN) of an IAM role that's used to access customer resources, such as Amazon S3 data.

Required: Yes

Type: String

Update requires: No interruption
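In most templates, Role comes from Fn::GetAtt on an AWS::IAM::Role resource, as in the example at the end of this page. CloudFormation resolves that value at deploy time. When a template hard-codes the ARN instead, a quick shape check can catch obvious mistakes, such as passing an S3 ARN. The following Python sketch uses a hypothetical helper, looks_like_role_arn. Its regular expression is a simplified approximation of the IAM role ARN format, not an exhaustive validator.

```python
import re

# Simplified shape of an IAM role ARN:
#   arn:<partition>:iam::<12-digit account ID>:role/<optional path/><role name>
# This approximates IAM naming rules; it is not an exhaustive validator.
ROLE_ARN = re.compile(r"arn:aws(-[a-z]+)*:iam::\d{12}:role/[\w+=,.@/-]+")


def looks_like_role_arn(value: str) -> bool:
    """Return True if a literal string has the general shape of an IAM role ARN."""
    return ROLE_ARN.fullmatch(value) is not None


print(looks_like_role_arn("arn:aws:iam::123456789012:role/MyGlueRole"))  # True
print(looks_like_role_arn("MyGlueRole"))                                   # False
```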

Classifiers

A list of UTF-8 strings that specify the custom classifiers that are associated with the crawler.

Required: No

Type: List of String values

Update requires: No interruption

Description

A description of the crawler and where it should be used. It must match the URI address multi-line string pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\r\n\t]*

Required: No

Type: String

Update requires: No interruption

SchemaChangePolicy

The policy that specifies update and delete behaviors for the crawler.

Required: No

Type: AWS Glue Crawler SchemaChangePolicy

Update requires: No interruption
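SchemaChangePolicy is an object with two keys, UpdateBehavior and DeleteBehavior. The following Python sketch checks a policy against the allowed values listed in the AWS Glue API reference for SchemaChangePolicy. UpdateBehavior accepts LOG or UPDATE_IN_DATABASE. DeleteBehavior accepts LOG, DELETE_FROM_DATABASE, or DEPRECATE_IN_DATABASE. The function check_schema_change_policy is a hypothetical helper, not part of any AWS SDK.

```python
# Allowed values taken from the AWS Glue API reference for SchemaChangePolicy.
UPDATE_BEHAVIORS = {"LOG", "UPDATE_IN_DATABASE"}
DELETE_BEHAVIORS = {"LOG", "DELETE_FROM_DATABASE", "DEPRECATE_IN_DATABASE"}


def check_schema_change_policy(policy: dict) -> list:
    """Return a list of problems found in a SchemaChangePolicy; empty means it looks valid."""
    problems = []
    for key in policy:
        if key not in ("UpdateBehavior", "DeleteBehavior"):
            problems.append(f"unknown key: {key}")
    update = policy.get("UpdateBehavior")
    if update is not None and update not in UPDATE_BEHAVIORS:
        problems.append(f"invalid UpdateBehavior: {update}")
    delete = policy.get("DeleteBehavior")
    if delete is not None and delete not in DELETE_BEHAVIORS:
        problems.append(f"invalid DeleteBehavior: {delete}")
    return problems


# The policy used in the example template on this page:
print(check_schema_change_policy(
    {"UpdateBehavior": "UPDATE_IN_DATABASE", "DeleteBehavior": "LOG"}))  # []
```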

Schedule

The schedule for the crawler.

Required: No

Type: AWS Glue Crawler Schedule

Update requires: No interruption
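Schedule is an object whose ScheduleExpression holds a cron(...) string. The example template on this page uses "cron(0/10 * ? * MON-FRI *)". The following Python sketch splits such an expression into its six fields: minutes, hours, day-of-month, month, day-of-week, and year. It also enforces the AWS cron rule that day-of-month and day-of-week cannot both be specified, so one of them must be '?'. The helper cron_fields is hypothetical. It checks only the overall shape, not the value ranges within each field.

```python
import re

# Glue time-based schedules use six space-separated fields inside cron(...):
# minutes, hours, day-of-month, month, day-of-week, year.
CRON = re.compile(r"cron\((.+)\)")
FIELD_NAMES = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def cron_fields(expression: str) -> dict:
    """Split a cron(...) schedule into named fields after a basic shape check."""
    match = CRON.fullmatch(expression.strip())
    if match is None:
        raise ValueError(f"not a cron(...) expression: {expression!r}")
    fields = match.group(1).split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    parsed = dict(zip(FIELD_NAMES, fields))
    # Day-of-month and day-of-week cannot both be set; one of them must be '?'.
    if "?" not in (parsed["day_of_month"], parsed["day_of_week"]):
        raise ValueError("day-of-month or day-of-week must be '?'")
    return parsed


# ScheduleExpression from the example template: every 10 minutes, Monday to Friday.
print(cron_fields("cron(0/10 * ? * MON-FRI *)"))
```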

DatabaseName

The name of the database where the crawler's output is stored.

Required: Yes

Type: String

Update requires: No interruption

Targets

The crawler targets.

Required: Yes

Type: AWS Glue Crawler Targets

Update requires: No interruption

TablePrefix

The table prefix that's used for catalog tables that are created.

Required: No

Type: String

Update requires: No interruption

Name

The name of the crawler. It must match the single-line string pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\t]*

Required: No

Type: String

Update requires: Replacement
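The Description and Name patterns differ only in whether carriage return (\r) and line feed (\n) are allowed. The documented patterns express characters outside the Basic Multilingual Plane as a UTF-16 surrogate-pair range. Python strings are sequences of code points rather than UTF-16 units, so the following sketch writes that range as the equivalent \U00010000-\U0010FFFF. The helper names is_valid_description and is_valid_name are hypothetical.

```python
import re

# Python strings hold code points, not UTF-16 units, so the surrogate-pair range
# \uD800\uDC00-\uDBFF\uDFFF from the documented patterns is written here as the
# equivalent supplementary range \U00010000-\U0010FFFF.
MULTI_LINE = re.compile(r"[\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF\r\n\t]*")
SINGLE_LINE = re.compile(r"[\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF\t]*")


def is_valid_description(text: str) -> bool:
    """Description uses the multi-line pattern (CR and LF allowed)."""
    return MULTI_LINE.fullmatch(text) is not None


def is_valid_name(text: str) -> bool:
    """Name uses the single-line pattern (no CR or LF)."""
    return SINGLE_LINE.fullmatch(text) is not None


print(is_valid_description("AWS Glue Crawler Test\nruns on weekdays"))  # True
print(is_valid_name("testcrawler1"))                                   # True
print(is_valid_name("test\ncrawler"))                                  # False
```

Both patterns end in *, so they also accept an empty string. Neither one limits length.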

Return Values

Ref

When the logical ID of this resource is provided to the Ref intrinsic function, Ref returns the crawler name.

For more information about using the Ref function, see Ref.
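A common use of Ref is to expose the crawler name in a stack's Outputs section. The following Python sketch builds such an entry as a dictionary. The helper crawler_name_output and the output key CrawlerName are hypothetical. In the example template on this page, Name is set explicitly, so a Ref to MyCrawler2 would resolve to "testcrawler1".

```python
import json


def crawler_name_output(logical_id: str) -> dict:
    """Build a hypothetical Outputs entry whose value is Ref on the crawler (its name)."""
    return {
        "CrawlerName": {
            "Description": "Name of the AWS Glue crawler",
            "Value": {"Ref": logical_id},
        }
    }


# MyCrawler2 is the crawler's logical ID in the example template on this page.
print(json.dumps({"Outputs": crawler_name_output("MyCrawler2")}, indent=2))
```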

Examples

The following example creates a crawler for an Amazon S3 target.

JSON

{
  "Description": "AWS Glue Crawler Test",
  "Resources": {
    "MyRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": { "Service": [ "glue.amazonaws.com" ] },
              "Action": [ "sts:AssumeRole" ]
            }
          ]
        },
        "Path": "/",
        "Policies": [
          {
            "PolicyName": "root",
            "PolicyDocument": {
              "Version": "2012-10-17",
              "Statement": [
                { "Effect": "Allow", "Action": "*", "Resource": "*" }
              ]
            }
          }
        ]
      }
    },
    "MyDatabase": {
      "Type": "AWS::Glue::Database",
      "Properties": {
        "CatalogId": { "Ref": "AWS::AccountId" },
        "DatabaseInput": {
          "Name": "dbCrawler",
          "Description": "TestDatabaseDescription",
          "LocationUri": "TestLocationUri",
          "Parameters": {
            "key1": "value1",
            "key2": "value2"
          }
        }
      }
    },
    "MyClassifier": {
      "Type": "AWS::Glue::Classifier",
      "Properties": {
        "GrokClassifier": {
          "Name": "CrawlerClassifier",
          "Classification": "wikiData",
          "GrokPattern": "%{NOTSPACE:language} %{NOTSPACE:page_title} %{NUMBER:hits:long} %{NUMBER:retrieved_size:long}"
        }
      }
    },
    "MyS3Bucket": {
      "Type": "AWS::S3::Bucket",
      "Properties": {
        "BucketName": "crawlertesttarget",
        "AccessControl": "BucketOwnerFullControl"
      }
    },
    "MyCrawler2": {
      "Type": "AWS::Glue::Crawler",
      "Properties": {
        "Name": "testcrawler1",
        "Role": { "Fn::GetAtt": [ "MyRole", "Arn" ] },
        "DatabaseName": { "Ref": "MyDatabase" },
        "Classifiers": [ { "Ref": "MyClassifier" } ],
        "Targets": {
          "S3Targets": [
            { "Path": { "Ref": "MyS3Bucket" } }
          ]
        },
        "SchemaChangePolicy": {
          "UpdateBehavior": "UPDATE_IN_DATABASE",
          "DeleteBehavior": "LOG"
        },
        "Schedule": {
          "ScheduleExpression": "cron(0/10 * ? * MON-FRI *)"
        }
      }
    }
  }
}

YAML

Resources:
  MyRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              Service:
                - "glue.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        - PolicyName: "root"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: "Allow"
                Action: "*"
                Resource: "*"
  MyDatabase:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseInput:
        Name: "dbCrawler"
        Description: "TestDatabaseDescription"
        LocationUri: "TestLocationUri"
        Parameters:
          key1: "value1"
          key2: "value2"
  MyClassifier:
    Type: AWS::Glue::Classifier
    Properties:
      GrokClassifier:
        Name: "CrawlerClassifier"
        Classification: "wikiData"
        GrokPattern: "%{NOTSPACE:language} %{NOTSPACE:page_title} %{NUMBER:hits:long} %{NUMBER:retrieved_size:long}"
  MyS3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: "crawlertesttarget"
      AccessControl: "BucketOwnerFullControl"
  MyCrawler2:
    Type: AWS::Glue::Crawler
    Properties:
      Name: "testcrawler1"
      Role: !GetAtt MyRole.Arn
      DatabaseName: !Ref MyDatabase
      Classifiers:
        - !Ref MyClassifier
      Targets:
        S3Targets:
          - Path: !Ref MyS3Bucket
      SchemaChangePolicy:
        UpdateBehavior: "UPDATE_IN_DATABASE"
        DeleteBehavior: "LOG"
      Schedule:
        ScheduleExpression: "cron(0/10 * ? * MON-FRI *)"
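The crawler in this example depends on other resources through Ref and Fn::GetAtt. Before deploying a variation of it, you can check that every referenced logical ID is actually defined in the template. The following Python sketch does this with two hypothetical helpers, find_refs and dangling_refs, which are not part of any AWS tool. The sketch works on the JSON form only, because YAML short-form tags such as !Ref need a loader that understands them. It treats IDs beginning with AWS::, such as AWS::AccountId, as pseudo parameters.

```python
import json


def find_refs(node):
    """Yield logical IDs named by Ref and Fn::GetAtt anywhere in a JSON template fragment."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Ref" and isinstance(value, str):
                yield value
            elif key == "Fn::GetAtt" and isinstance(value, list) and value:
                yield value[0]  # ["MyRole", "Arn"] -> "MyRole"
            elif key == "Fn::GetAtt" and isinstance(value, str):
                yield value.split(".", 1)[0]  # "MyRole.Arn" -> "MyRole"
            else:
                yield from find_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_refs(item)


def dangling_refs(template: dict) -> set:
    """Return referenced IDs that are neither resources, parameters, nor AWS:: pseudo parameters."""
    known = set(template.get("Resources", {})) | set(template.get("Parameters", {}))
    return {ref for ref in find_refs(template)
            if ref not in known and not ref.startswith("AWS::")}


# A trimmed copy of the crawler example with MyDatabase deliberately left out.
trimmed = json.loads("""
{
  "Resources": {
    "MyRole": {"Type": "AWS::IAM::Role"},
    "MyCrawler2": {
      "Type": "AWS::Glue::Crawler",
      "Properties": {
        "Role": {"Fn::GetAtt": ["MyRole", "Arn"]},
        "DatabaseName": {"Ref": "MyDatabase"}
      }
    }
  }
}
""")
print(dangling_refs(trimmed))  # {'MyDatabase'}
```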