
Working with Crawlers on the AWS Glue Console

A crawler accesses your data store, extracts metadata, and creates table definitions in the AWS Glue Data Catalog. The Crawlers tab in the AWS Glue console lists all the crawlers that you create. The list displays status and metrics from the last run of your crawler.

To add a crawler using the console:

  1. Sign in to the AWS Management Console and open the AWS Glue console at https://console.aws.amazon.com/glue/. Choose the Crawlers tab.

  2. Choose Add crawler, and follow the instructions in the Add crawler wizard.

    Note

    To get step-by-step guidance for adding a crawler, see the Add a crawler tutorial in the console. You can also use the Add crawler wizard to create or modify an IAM role with an attached policy that grants permissions for your Amazon S3 data stores.
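The wizard steps above can also be expressed programmatically. The following is a minimal Python sketch, assuming you use the AWS SDK for Python (boto3), whose Glue client exposes a create_crawler call. The helper function is hypothetical and not part of boto3. The crawler name, role, database, and bucket are also hypothetical placeholders.

```python
from typing import Any, Dict, List, Optional


def build_create_crawler_request(
    name: str,
    role: str,
    database: str,
    s3_path: str,
    exclusions: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's glue.create_crawler (hypothetical helper)."""
    target: Dict[str, Any] = {"Path": s3_path}
    if exclusions:
        # Exclude patterns are interpreted relative to Path.
        target["Exclusions"] = list(exclusions)
    return {
        "Name": name,              # must be unique
        "Role": role,              # IAM role the crawler assumes
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [target]},
    }


# Hypothetical values. With credentials configured, you would pass the result to
# boto3.client("glue").create_crawler(**request).
request = build_create_crawler_request(
    name="sales-crawler",
    role="AWSGlueServiceRole-Sales",
    database="sales_db",
    s3_path="s3://example-bucket/sales/",
)
```

Keeping the request as a plain dictionary makes it easy to review or test before any AWS call is made.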

For Amazon S3 data stores, an exclude pattern is relative to the include path. For more information about glob patterns, see Which Data Stores Can I Crawl?.
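Here is a concrete case of "relative to the include path." With an include path of s3://mybucket/myfolder/, the exclude pattern departments/** skips everything under s3://mybucket/myfolder/departments/. The sketch below approximates that resolution in Python. It is a simplified model, not AWS Glue's actual matcher. It handles only `*` (within one folder level), `**` (across folder levels), and `?` (one character), and ignores other glob syntax such as brace groups. The function names are hypothetical.

```python
import re
from typing import List


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified glob ('**', '*', '?') into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # may cross folder boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # stays within one folder level
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")    # exactly one character, not '/'
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def is_excluded(include_path: str, object_uri: str, exclusions: List[str]) -> bool:
    """Check an object against exclude patterns evaluated relative to include_path."""
    prefix = include_path if include_path.endswith("/") else include_path + "/"
    if not object_uri.startswith(prefix):
        return False  # outside the include path, so it is not crawled at all
    relative = object_uri[len(prefix):]
    return any(glob_to_regex(p).match(relative) for p in exclusions)
```

For example, under this model, `*.json` excludes only JSON files directly below the include path, while `**.json` also reaches nested folders.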

When you crawl a JDBC data store, a connection is required. For more information, see Working with Connections on the AWS Glue Console. As with Amazon S3, an exclude path is relative to the include path. For example, to exclude a table in your JDBC data store, enter the table name in the exclude path.
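For a JDBC store, the create_crawler request takes a JdbcTargets entry instead of S3Targets. The following is a minimal sketch. It assumes a Glue connection named my-jdbc-connection already exists. It also assumes the include path uses the database/schema/% form, where % matches every table in the schema. The helper, connection, database, schema, and table names are all hypothetical.

```python
from typing import Any, Dict, List


def build_jdbc_target(connection_name: str, include_path: str,
                      excluded_tables: List[str]) -> Dict[str, Any]:
    """Build one JdbcTargets entry for glue.create_crawler (hypothetical helper)."""
    return {
        "ConnectionName": connection_name,    # the required Glue connection
        "Path": include_path,                 # e.g. "hrdb/public/%" for every table in a schema
        "Exclusions": list(excluded_tables),  # table names, relative to Path
    }


targets = {
    "JdbcTargets": [build_jdbc_target("my-jdbc-connection", "hrdb/public/%", ["audit_log"])]
}
```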

Viewing Crawler Results

To view the results of a crawler, find the crawler name in the list and choose the Logs link. This link takes you to Amazon CloudWatch Logs, where you can see which tables were created in the AWS Glue Data Catalog and any errors that were encountered. You can manage the log retention period in the CloudWatch console; the default retention is Never Expire. For more information about how to change the retention period, see Change Log Data Retention in CloudWatch Logs.
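Retention can also be set through the CloudWatch Logs API. The sketch below assumes boto3, whose CloudWatch Logs client provides put_retention_policy. It also assumes crawler logs go to the /aws-glue/crawlers log group, the usual default; confirm this by following your crawler's Logs link. CloudWatch Logs accepts only specific retentionInDays values, such as 30 or 365. The helper name is hypothetical.

```python
from typing import Any, Dict

# Assumed default log group for AWS Glue crawlers; verify via the crawler's Logs link.
CRAWLER_LOG_GROUP = "/aws-glue/crawlers"


def build_retention_request(days: int, log_group: str = CRAWLER_LOG_GROUP) -> Dict[str, Any]:
    """Keyword arguments for boto3's logs.put_retention_policy (hypothetical helper)."""
    if days <= 0:
        raise ValueError("retention must be a positive number of days")
    return {"logGroupName": log_group, "retentionInDays": days}


# With credentials configured:
# boto3.client("logs").put_retention_policy(**build_retention_request(30))
```

A retention policy applies to the whole log group, so it affects every crawler that writes to that group.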

To see details of a crawler, choose the crawler name in the list. Crawler details include the information you defined when you created the crawler with the Add crawler wizard. When a crawler run completes, choose the Tables tab to see the tables that were created by your crawler in the database you specified.

Note

The crawler assumes the permissions of the IAM role that you specify when you define it. This IAM role must have permissions to extract data from your data store and write to the Data Catalog. The AWS Glue console lists only IAM roles whose trust policy allows the AWS Glue service principal to assume them. For more information about providing roles for AWS Glue, see Using Identity-Based Policies (IAM Policies).
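A role appears in the console's role list only if its trust policy lets the AWS Glue service principal, glue.amazonaws.com, assume it. Below is a minimal sketch of such a trust policy, built in Python. You could pass the resulting JSON as the AssumeRolePolicyDocument parameter of boto3's iam.create_role. The trust policy only controls who may assume the role. Access to your data store and the Data Catalog must still be granted by attaching permissions policies.

```python
import json


def glue_trust_policy() -> str:
    """Return a trust policy (JSON) that allows the AWS Glue service to assume a role."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy)


# Usage (requires credentials; the role name is hypothetical):
# boto3.client("iam").create_role(RoleName="AWSGlueServiceRole-Sales",
#                                 AssumeRolePolicyDocument=glue_trust_policy())
```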

The following are some important properties of a crawler and metrics from its last run:

Name

When you create a crawler, you must give it a unique name.

Schedule

You can choose to run your crawler on demand or choose a frequency with a schedule. For more information about scheduling a crawler, see Scheduling a Crawler.
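AWS Glue schedules use a six-field cron expression, cron(Minutes Hours Day-of-month Month Day-of-week Year), evaluated in UTC. The sketch below builds a daily schedule string. The helper is hypothetical. In boto3, you would pass the result as the Schedule parameter of create_crawler or update_crawler_schedule.

```python
def daily_schedule(hour: int, minute: int = 0) -> str:
    """Return a Glue cron expression that fires once a day at hour:minute UTC."""
    if not 0 <= hour <= 23 or not 0 <= minute <= 59:
        raise ValueError("hour must be 0-23 and minute 0-59")
    # Day-of-month is '*', so Day-of-week must be '?'; the Year field is '*'.
    return f"cron({minute} {hour} * * ? *)"
```

For example, daily_schedule(12, 15) yields cron(15 12 * * ? *), which runs every day at 12:15 UTC.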

Status

A crawler can be ready, starting, stopping, scheduled, or schedule paused. A running crawler progresses from starting to stopping. You can resume or pause a schedule attached to a crawler.
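In the API, boto3's glue.get_crawler reports the crawler's State (READY, RUNNING, or STOPPING). When a schedule exists, it also reports the schedule's State (for example, SCHEDULED or NOT_SCHEDULED). The sketch below derives a console-style status from that response. The label mapping is an approximation of the console, not part of the API, and the function name is hypothetical. To pause or resume a schedule, boto3 provides stop_crawler_schedule and start_crawler_schedule, each taking CrawlerName.

```python
from typing import Any, Dict


def console_status(get_crawler_response: Dict[str, Any]) -> str:
    """Approximate the console's Status column from a get_crawler response."""
    crawler = get_crawler_response["Crawler"]
    state = crawler.get("State")
    if state == "RUNNING":
        return "Starting"          # assumed label for an in-progress run
    if state == "STOPPING":
        return "Stopping"
    schedule = crawler.get("Schedule") or {}
    if schedule.get("State") == "SCHEDULED":
        return "Scheduled"
    if schedule.get("ScheduleExpression") and schedule.get("State") == "NOT_SCHEDULED":
        return "Schedule paused"   # a schedule exists but is stopped
    return "Ready"
```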

Logs

Links to any available logs from the last run of the crawler.

Last runtime

The time the crawler took to complete its most recent run.

Median runtime

The median duration of the crawler's runs since it was created.

Tables updated

The number of tables in the AWS Glue Data Catalog that were updated by the latest run of the crawler.

Tables added

The number of tables that were added to the AWS Glue Data Catalog by the latest run of the crawler.
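The same metrics are available from boto3's glue.get_crawler_metrics(CrawlerNameList=[...]). Its CrawlerMetricsList entries include LastRuntimeSeconds, MedianRuntimeSeconds, TablesUpdated, and TablesCreated. The following is a minimal sketch that maps one entry to the console's metric names. The helper functions are hypothetical, and the sample values are made up.

```python
from typing import Any, Dict


def format_seconds(seconds: float) -> str:
    """Render a duration in seconds as 'M min S s'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes} min {secs} s"


def summarize_metrics(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Map one CrawlerMetricsList entry to the console's metric names."""
    return {
        "Name": entry["CrawlerName"],
        "Last runtime": format_seconds(entry.get("LastRuntimeSeconds", 0.0)),
        "Median runtime": format_seconds(entry.get("MedianRuntimeSeconds", 0.0)),
        "Tables updated": entry.get("TablesUpdated", 0),
        "Tables added": entry.get("TablesCreated", 0),
    }


# Sample entry shaped like a get_crawler_metrics result (values are made up).
sample = {"CrawlerName": "sales-crawler", "LastRuntimeSeconds": 125.0,
          "MedianRuntimeSeconds": 98.4, "TablesUpdated": 2, "TablesCreated": 1}
summary = summarize_metrics(sample)
```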
