CreateCrawler
Creates a new crawler with specified targets, role, configuration, and optional schedule.
At least one crawl target must be specified, in the S3Targets field, the JdbcTargets field, or the DynamoDBTargets field.
Request Syntax
{
   "Classifiers": [ "string" ],
   "Configuration": "string",
   "CrawlerSecurityConfiguration": "string",
   "DatabaseName": "string",
   "Description": "string",
   "LakeFormationConfiguration": {
      "AccountId": "string",
      "UseLakeFormationCredentials": boolean
   },
   "LineageConfiguration": {
      "CrawlerLineageSettings": "string"
   },
   "Name": "string",
   "RecrawlPolicy": {
      "RecrawlBehavior": "string"
   },
   "Role": "string",
   "Schedule": "string",
   "SchemaChangePolicy": {
      "DeleteBehavior": "string",
      "UpdateBehavior": "string"
   },
   "TablePrefix": "string",
   "Tags": {
      "string" : "string"
   },
   "Targets": {
      "CatalogTargets": [
         {
            "ConnectionName": "string",
            "DatabaseName": "string",
            "DlqEventQueueArn": "string",
            "EventQueueArn": "string",
            "Tables": [ "string" ]
         }
      ],
      "DeltaTargets": [
         {
            "ConnectionName": "string",
            "CreateNativeDeltaTable": boolean,
            "DeltaTables": [ "string" ],
            "WriteManifest": boolean
         }
      ],
      "DynamoDBTargets": [
         {
            "Path": "string",
            "scanAll": boolean,
            "scanRate": number
         }
      ],
      "HudiTargets": [
         {
            "ConnectionName": "string",
            "Exclusions": [ "string" ],
            "MaximumTraversalDepth": number,
            "Paths": [ "string" ]
         }
      ],
      "IcebergTargets": [
         {
            "ConnectionName": "string",
            "Exclusions": [ "string" ],
            "MaximumTraversalDepth": number,
            "Paths": [ "string" ]
         }
      ],
      "JdbcTargets": [
         {
            "ConnectionName": "string",
            "EnableAdditionalMetadata": [ "string" ],
            "Exclusions": [ "string" ],
            "Path": "string"
         }
      ],
      "MongoDBTargets": [
         {
            "ConnectionName": "string",
            "Path": "string",
            "ScanAll": boolean
         }
      ],
      "S3Targets": [
         {
            "ConnectionName": "string",
            "DlqEventQueueArn": "string",
            "EventQueueArn": "string",
            "Exclusions": [ "string" ],
            "Path": "string",
            "SampleSize": number
         }
      ]
   }
}
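As an illustration of the syntax above, the following Python sketch assembles a minimal request body containing the three required fields (Name, Role, Targets) plus two optional ones. The helper name build_create_crawler_request and every value passed to it (crawler name, role name, bucket path, database name) are hypothetical and are not part of the API.

```python
import json


def build_create_crawler_request(name, role, s3_paths, database_name=None, schedule=None):
    """Assemble a CreateCrawler request body that uses S3 targets only."""
    if not s3_paths:
        # The operation requires at least one crawl target.
        raise ValueError("at least one crawl target must be specified")

    request = {
        "Name": name,  # required
        "Role": role,  # required
        "Targets": {   # required
            "S3Targets": [{"Path": path} for path in s3_paths],
        },
    }
    # Optional fields are included only when the caller supplies them.
    if database_name is not None:
        request["DatabaseName"] = database_name
    if schedule is not None:
        request["Schedule"] = schedule
    return request


request = build_create_crawler_request(
    name="example-crawler",                 # hypothetical crawler name
    role="ExampleGlueCrawlerRole",          # hypothetical IAM role name
    s3_paths=["s3://example-bucket/raw/"],  # hypothetical S3 path
    database_name="example_db",             # hypothetical database name
    schedule="cron(15 12 * * ? *)",         # every day at 12:15 UTC
)
print(json.dumps(request, indent=2))
```

If the AWS SDK for Python is in use, a dictionary shaped like this can typically be passed as keyword arguments, for example boto3.client("glue").create_crawler(**request). That call is not exercised in this sketch.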
Request Parameters
For information about the parameters that are common to all actions, see Common Parameters.
The request accepts the following data in JSON format.
- Classifiers
-
A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification.
Type: Array of strings
Length Constraints: Minimum length of 1. Maximum length of 255.
Pattern:
[\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\t]*
Required: No
- Configuration
-
Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see Setting crawler configuration options.
Type: String
Required: No
- CrawlerSecurityConfiguration
-
The name of the SecurityConfiguration structure to be used by this crawler.
Type: String
Length Constraints: Minimum length of 0. Maximum length of 128.
Required: No
- DatabaseName
-
The AWS Glue database where results are written, such as: arn:aws:daylight:us-east-1::database/sometable/*.
Type: String
Required: No
- Description
-
A description of the new crawler.
Type: String
Length Constraints: Minimum length of 0. Maximum length of 2048.
Pattern:
[\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\r\n\t]*
Required: No
- LakeFormationConfiguration
-
Specifies AWS Lake Formation configuration settings for the crawler.
Type: LakeFormationConfiguration object
Required: No
- LineageConfiguration
-
Specifies data lineage configuration settings for the crawler.
Type: LineageConfiguration object
Required: No
- Name
-
Name of the new crawler.
Type: String
Length Constraints: Minimum length of 1. Maximum length of 255.
Pattern:
[\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\t]*
Required: Yes
- RecrawlPolicy
-
A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.
Type: RecrawlPolicy object
Required: No
- Role
-
The IAM role or Amazon Resource Name (ARN) of an IAM role used by the new crawler to access customer resources.
Type: String
Required: Yes
- Schedule
-
A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
Type: String
Required: No
- SchemaChangePolicy
-
The policy for the crawler's update and deletion behavior.
Type: SchemaChangePolicy object
Required: No
- TablePrefix
-
The table prefix used for catalog tables that are created.
Type: String
Length Constraints: Minimum length of 0. Maximum length of 128.
Required: No
- Tags
-
The tags to use with this crawler request. You may use tags to limit access to the crawler. For more information about tags in AWS Glue, see AWS Tags in AWS Glue in the developer guide.
Type: String to string map
Map Entries: Minimum number of 0 items. Maximum number of 50 items.
Key Length Constraints: Minimum length of 1. Maximum length of 128.
Value Length Constraints: Minimum length of 0. Maximum length of 256.
Required: No
- Targets
-
A collection of targets to crawl.
Type: CrawlerTargets object
Required: Yes
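The length and count constraints listed above can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, not the service's own validation: the function name validate_create_crawler_request is hypothetical, the Classifiers length limit is assumed to apply to each string in the array, and the Pattern constraints are not checked.

```python
def validate_create_crawler_request(request):
    """Return a list of constraint violations; an empty list means none were found."""
    problems = []

    # Name, Role, and Targets are the parameters marked "Required: Yes".
    for field in ("Name", "Role", "Targets"):
        if field not in request:
            problems.append(f"{field} is required")

    # (field, minimum length, maximum length) as documented per parameter.
    length_limits = [
        ("Name", 1, 255),
        ("CrawlerSecurityConfiguration", 0, 128),
        ("Description", 0, 2048),
        ("TablePrefix", 0, 128),
    ]
    for field, low, high in length_limits:
        value = request.get(field)
        if value is not None and not low <= len(value) <= high:
            problems.append(f"{field} length must be between {low} and {high}")

    # Assumption: the 1..255 limit applies to each classifier name.
    for classifier in request.get("Classifiers", []):
        if not 1 <= len(classifier) <= 255:
            problems.append("Classifiers entry length must be between 1 and 255")

    tags = request.get("Tags", {})
    if len(tags) > 50:
        problems.append("Tags may contain at most 50 entries")
    for key, value in tags.items():
        if not 1 <= len(key) <= 128:
            problems.append("Tags key length must be between 1 and 128")
        if len(value) > 256:
            problems.append("Tags value length must be at most 256")

    return problems


# Hypothetical request that omits Role and carries an over-long tag value.
issues = validate_create_crawler_request(
    {"Name": "example-crawler", "Targets": {}, "Tags": {"team": "x" * 300}}
)
for issue in issues:
    print(issue)
```

Returning a list instead of raising on the first failure lets a caller report every violation at once. The service remains the authority on what it accepts.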
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
For information about the errors that are common to all actions, see Common Errors.
- AlreadyExistsException
-
A resource to be created or added already exists.
HTTP Status Code: 400
- InvalidInputException
-
The input provided was not valid.
HTTP Status Code: 400
- OperationTimeoutException
-
The operation timed out.
HTTP Status Code: 400
- ResourceNumberLimitExceededException
-
A resource numerical limit was exceeded.
HTTP Status Code: 400
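Because all four errors share HTTP status code 400, a caller has to branch on the error name, not the status. The following sketch shows one way to do that. The handling policy (retry only on a timeout, treat an existing crawler as a separate case) is an assumption made for illustration, and classify_error is a hypothetical helper. With the AWS SDK for Python, the error name is usually available as response["Error"]["Code"] on a botocore ClientError.

```python
# Error names and descriptions as listed for this operation.
CREATE_CRAWLER_ERRORS = {
    "AlreadyExistsException": "A resource to be created or added already exists.",
    "InvalidInputException": "The input provided was not valid.",
    "OperationTimeoutException": "The operation timed out.",
    "ResourceNumberLimitExceededException": "A resource numerical limit was exceeded.",
}

# Assumption: only a timeout is worth retrying with the same request.
RETRYABLE = {"OperationTimeoutException"}


def classify_error(code):
    """Map an error name to a coarse handling category."""
    if code not in CREATE_CRAWLER_ERRORS:
        return "unknown"  # for example, one of the common errors
    if code in RETRYABLE:
        return "retry"
    if code == "AlreadyExistsException":
        return "exists"  # a crawler with this name is already present
    return "fix-request"  # change the input or free up quota before resending


for code in CREATE_CRAWLER_ERRORS:
    print(code, "->", classify_error(code))
```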