AWS Glue
Developer Guide

Crawler Scheduler API

The Crawler Scheduler API describes AWS Glue crawler data types, along with the API for creating, deleting, updating, and listing crawlers.

Data Types

Schedule Structure

A scheduling object using a cron statement to schedule an event.

Fields

  • ScheduleExpression – UTF-8 string.

    A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers. For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).

  • State – UTF-8 string (valid values: SCHEDULED | NOT_SCHEDULED | TRANSITIONING).

    The state of the schedule.

Operations

UpdateCrawlerSchedule Action (Python: update_crawler_schedule)

Updates the schedule of a crawler using a cron expression.

Request

  • CrawlerNameRequired: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Name of the crawler whose schedule to update.

  • Schedule – UTF-8 string.

    The updated cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers. For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).

Response

  • No Response parameters.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • VersionMismatchException

  • SchedulerTransitioningException

  • OperationTimeoutException

StartCrawlerSchedule Action (Python: start_crawler_schedule)

Changes the schedule state of the specified crawler to SCHEDULED, unless the crawler is already running or the schedule state is already SCHEDULED.

Request

  • CrawlerNameRequired: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Name of the crawler to schedule.

Response

  • No Response parameters.

Errors

  • EntityNotFoundException

  • SchedulerRunningException

  • SchedulerTransitioningException

  • NoScheduleException

  • OperationTimeoutException

StopCrawlerSchedule Action (Python: stop_crawler_schedule)

Sets the schedule state of the specified crawler to NOT_SCHEDULED, but does not stop the crawler if it is already running.

Request

  • CrawlerNameRequired: UTF-8 string, not less than 1 or more than 255 bytes long, matching the Single-line string pattern.

    Name of the crawler whose schedule state to set.

Response

  • No Response parameters.

Errors

  • EntityNotFoundException

  • SchedulerNotRunningException

  • SchedulerTransitioningException

  • OperationTimeoutException